Review of Evaluation of surgical procedures for sex reassignment: a systematic review

This is a systematic review of research on gender reassignment surgery, published in 2009. It shows clearly that we need more research in this area.

The research is not strong enough to evaluate the efficacy of gender reassignment surgery in general. In addition, we do not have a way to evaluate particular surgeries.

From the abstract:

“The evidence concerning gender reassignment surgery in both MTF and FTM transsexualism has several limitations in terms of: (a) lack of controlled studies, (b) evidence has not collected data prospectively, (c) high loss to follow up and (d) lack of validated assessment measures. Some satisfactory outcomes were reported, but the magnitude of benefit and harm for individual surgical procedures cannot be estimated accurately using the current available evidence.”

The authors reviewed all the articles they could find on gender reassignment surgeries from 1980 onwards. The review took place in October and November 2005.

The great strength of this review is that the authors looked at individual surgical procedures. Too often, studies lump all gender reassignment surgeries together and then evaluate whether they were effective as a group. It is possible that some surgeries are more helpful for people’s mental well-being than others. In addition, some surgeries may have better physical outcomes or fewer risks than others. Physical outcomes could certainly affect people’s mental well-being as well.

They did not find enough good studies looking at individual surgeries; there is a great need for more such studies. We need to know what the complications and problems of the various surgeries are. Are some techniques better than others? Do some medical centers have better physical outcomes than others?

Only a few of the studies reported on patients’ well-being, mental health, or satisfaction; these studies had the same methodological weaknesses as the others.

This is the main finding of the review – we don’t have great data and we need further research. You can read more about some of the specific surgical procedures here.

The authors discuss the quality of research and directions for future research; I have included their discussion below.


In the first section concerning MTF surgical procedures, 38 published papers met the inclusion criteria (23 case series and 15 case studies) with an additional 13 papers excluded (four case series, three case studies, four reviews, one prospective non-randomized controlled study, one expert opinion). The level of included evidence was of poor quality. There was a clear lack of randomized controlled evidence and only one excluded study included a control group comparison. No studies met the inclusion criteria for labiaplasty, orchidectomy or penectomy procedures. A large amount of evidence is available reporting vaginoplasty and clitoroplasty procedures. Some complications have been reported. All the studies report, to various degrees, satisfactory outcomes in terms of being able to have penetrative sexual intercourse and achieving sexual fulfilment.

In the second section concerning FTM surgical procedures, 44 published papers met the inclusion criteria (26 case series, 17 case studies, one cohort study) with an additional 19 papers being excluded (seven reviews, five expert opinions, four case series, three case studies). The majority of included evidence was of poor quality. Many of the studies reported good satisfactory outcomes with few complications for each of the individual procedures. The main outcomes reported were the ability to perform penetrative sexual intercourse and achieve orgasm. Another key factor requested by many FTM patients was the ability to void whilst standing. Whilst successful results were reported by many studies for phalloplasty procedures, an inability to perform sexual penetration due to the construction of a small phallus was a common problem reported following the metoidioplasty procedure. Some of the FTM core surgical procedures are frequently completed along with other surgery, making it difficult to assess the effectiveness of each procedure alone. Furthermore, the assessment of effectiveness is also confounded by the lack of controlled evidence, unclear outcome measures, and a reliance on case series and case studies.

Six previous reviews have reported the clinical effectiveness of GRS. Six reviewed evidence in MTF patients and three of these also reviewed evidence in FTM patients. Of these, three were systematic reviews. These earlier reviews provide a summary of approximately 172 individual studies. Two recent unpublished reports provided a brief summary of some of the reviews. Several key points were raised in these previous reviews. The first related to the quality of the evidence and study design. Concerns were raised about the lack of randomized controlled evidence, the majority of evidence involved case studies and case series, with few studies using group comparisons, standardized measures or the follow up of participants. A second concern related to the validity of findings. Many studies involved a combination of different surgical procedures. Thirdly, there was concern about the validity of outcome measures. Despite many reports of positive outcomes of patients, there was little consensus of how to measure effectiveness. The large range of outcomes reported across studies makes it difficult to accurately evaluate the overall outcomes of individual surgical procedures.

Several previous reviews reported a controlled study which compared 20 patients having immediate surgery with 20 patients awaiting surgery for penectomy, orchidectomy and the construction of a neovagina. The remaining studies reflect lower grades of evidence, and had further problems in their design such as selected patient groups, retrospective analysis and losses to follow up. Conclusions from the reviews are understandably tentative, but highlight improvements in patients across most studies, although 10–15% of patients with transsexualism who undergo GRS have poor outcomes.

The quality of evidence included in this review has been poor due to the lack of concealment of allocation, completeness of follow up and blinding. As well as the fundamental limitation in study design, several other issues regarding the interpretation of the evidence are worth consideration. Firstly, all the reviews, and many of the individual studies within them, examine different types of GRS. The Mate-Kole study, for example, is essentially an evaluation of three surgical techniques. Clearly, trying to reach a robust conclusion about GRS as a whole is not possible when the combination of techniques varies across studies. Secondly, the patient populations within, and across studies, are heterogeneous and we have little idea about the referral, diagnosis, assessment and selection processes that precede inclusion within the studies. Consequently, Brown concludes that a lengthy differential diagnosis and a specialized approach to interviewing gender dysphoric patients are needed. Thirdly, the choice of outcome measures varies across studies, with very little use of validated health-related quality of life (QOL) measures. This complicates further our ability to draw conclusions, and also limits the commissioners’ ability to identify studies that use outcomes that are relevant to their role. Finally this review has focused on a subset of surgical procedures that are used within this field. Whilst these are considered to be the most routine, it is recognized that other procedures are currently used and these too need to be critically appraised in future reviews.

No published evidence on cost-effectiveness was found. Best and Stein speculate that some cost offsets are possible following surgery due to the reduced need for psychiatric and hormonal treatment, but no evidence is available for this. The lack of generic QOL measures means that measures of cost-effectiveness that can be used to assess value for money relative to other healthcare interventions are not possible.

When trying to consider all of the evidence together, there is a dilemma regarding its interpretation. Reviews of heterogeneous patient groups and interventions clearly give the greatest depth of evidence, but give little in the way of specific information that is of use to purchasers. In contrast, studies of individual techniques have a more limited evidence base but allow us to focus on specific clinical questions with more consistent reporting. But these provide information on purchasing decisions that are less realistic, as some procedures are unlikely to be purchased in isolation. In between these extremes, are sets of studies that investigate various combinations of multiple procedures, but matching these studies to the activity of different providers and patients, is extremely complex.

Taking this reasoning further, some would argue that assessment of GRS in isolation is difficult to interpret, as it is the final step in a longer treatment process. This is more contentious, as many patients do not reach the point of referral for surgery and many do not wish to undergo any surgery. Also, taking this argument to its extreme would require studies of the effectiveness of treatment from initial diagnosis to the end of post-surgical follow up; such studies do not exist.

Despite these difficulties in interpretation of review evidence the conclusion about the strength of evidence regarding GRS appears clear: little robust evidence exists.

Future research

There is a need for good quality controlled trials based on clearly defined diagnosis and assessment criteria.

An important consideration for future studies is how best to evaluate the effectiveness of a surgical procedure. One possibility is assessment of patient satisfaction and regret following surgery. More importantly is the need for standardised measures to assess the outcome of surgery. One suitable method, which has received limited research, is the use of QOL measures in samples before and after GRS. Rakic et al. investigated several aspects of QOL after GRS in 32 patients with transsexualism (22 MTF, 10 FTM). Four aspects of QOL were examined: sexual activity; attitude towards the patients’ own body; relationships with other people; and occupational functioning. For the majority of persons with transsexualism, QOL improved after surgery in terms of these aspects. All patients (100%) were satisfied with their GRS. However, only 20 patients (62%) were satisfied with how their bodies looked. In a study by Barrett, they used the General Health Questionnaire and assessments of depression in patient groups. More controlled studies using this type of experimental design are needed to provide a better measure of surgical effectiveness.

For many patients undergoing GRS, their desire is to look ‘normal’ and be capable of having a normal sexual relationship. The results presented in this review have provided little evidence on how successful individual surgical procedures are in achieving these goals. Further research is needed to investigate these specific outcome measures of satisfaction and function.

In conclusion, we have confirmed the findings from previous reviews that the evidence to support GRS has several limitations in terms of: (a) lack of controlled studies; (b) evidence has not collected data prospectively; (c) high loss to follow up; and (d) lack of validated assessment measures. We have extended these findings from previous reviews by providing a summary of the evidence available for each of the ‘core’ procedures for MTF and FTM transsexualism. In the majority of studies a large number of persons with transsexualism experience a successful outcome in terms of subjective well being, cosmesis, and sexual function. We conclude that the magnitude of benefit and harm cannot be estimated accurately using the current available evidence.

Original Source:

Evaluation of surgical procedures for sex reassignment: a systematic review by Sutcliffe PA, Dixon S, Akehurst RL, Wilkinson A, Shippam A, White S, Richards R, Caddy CM in J Plast Reconstr Aesthet Surg. 2009 Mar;62(3):294-306.


