ORIGINAL RESEARCH
From the Early Pregnancy and Gynaecology Ultrasound Unit, Department of Obstetrics and Gynaecology, King's College Hospital; the Department of Obstetrics and Gynaecology, Greenwich District Hospital; the Department of Obstetrics and Gynaecology, University Hospital Lewisham; and the Department of Public Health Sciences, Guy's, King's, and St. Thomas' School of Medicine, King's College Hospital, London, United Kingdom.
Address reprint requests to: Davor Jurkovic, MD, PhD, Early Pregnancy and Gynaecology Ultrasound Unit, Department of Obstetrics and Gynaecology, King's College Hospital, Denmark Hill, London SE5 8RX, United Kingdom, E-mail: davor.jurkovic{at}kcl.ac.uk
| Abstract |
|---|
Methods: This was a prospective collaborative study. Women were recruited from three hospitals and all assessments were performed at the Gynaecology Ultrasound Unit, King's College Hospital. One hundred women with known adnexal masses were examined preoperatively. The demographic, biochemical, and sonographic data recorded for each patient included age, menopausal status, CA 125 levels, ultrasound morphology, and Doppler blood flow analysis. The diagnosis of malignancy was made for each woman using three logistic regression models previously described by Alcazar et al, Tailor et al, and Timmerman et al. Variables used in these models were then combined to form a new model. The results were compared with the final histopathologic diagnosis.
Results: Sixty-seven women had benign tumors and 33 had ovarian cancer. Women with malignant tumors were older than those with benign masses. There were significant differences in CA 125 levels, presence of papillary proliferations, and ascites between the two groups. The sensitivities and specificities achieved by the models were, respectively, 45% and 93% with Tailor et al's model, 9% and 99% with Alcazar et al's model, and 73% and 91% with Timmerman et al's model. The new combined model did not significantly improve on the performance of Timmerman et al's model.
Conclusion: All models performed less well than originally reported. Combining the models did not lead to a significant improvement in performance. Larger sample sizes that incorporate all types of ovarian tumors are necessary to design more accurate diagnostic models.
Accurate preoperative discrimination of benign and malignant adnexal masses remains difficult despite recent advances in medical imaging. The quality of cytoreductive surgery is one of the most important prognostic factors and it is argued that surgery for ovarian cancer should be performed in specialized oncology centers.1 However, the presence of an adnexal mass is a common problem encountered in clinical practice and referral of all women with such masses would be both impractical and unnecessary. Preoperative detection of malignancy would allow selective referrals to the appropriate centers for optimal care, whereas those women with benign masses could be offered more conservative surgery. In the past, individual sonographic, demographic, and biochemical variables were used to distinguish between benign and malignant adnexal masses.2,3 However, used alone, the diagnostic accuracy of these variables was poor. Subsequently, variables were combined to form morphologic scoring systems that assess various morphologic features identified on gray-scale ultrasonography.4,5
More recently, in an attempt to improve diagnostic accuracy further, multivariable logistic regression models have been introduced that combine various sonographic, demographic, and biochemical data.6–9 In the initial reports, these models achieved high sensitivities and specificities in the diagnosis of ovarian cancer. Furthermore, they enable calculations of the probability of malignancy in an adnexal mass, which is helpful when counseling patients. In this study we tested the accuracy of three such models on a large number of women with adnexal masses.6–8 We analyzed each model individually and examined whether combining models might further improve their diagnostic performance.
| Materials and Methods |
|---|
/6). After this, the entire tumor was surveyed by color Doppler imaging. The ultrasound equipment was initially set at maximum sensitivity to detect blood flow. By gradually increasing the pulse repetition frequency, low-velocity signals were filtered out. A pulsed Doppler gate was placed over the areas within the tumor with the highest blood velocity. Adjustments were made to the angle of the probe until the audible signal gave the highest pitch. Flow velocity waveforms were obtained and the peak systolic velocity, time-averaged maximum velocity, pulsatility index (PI), and resistance index (RI) were calculated electronically. In women with bilateral tumors, the side with the highest blood flow was included for analysis. A blood sample was then taken for the measurement of CA 125 levels (Immuno-1 analyzer; Bayer, Tarrytown, NY).
All tumor specimens were examined histologically and classified according to the World Health Organization classification.10 Tumors were staged surgically at the time of the operation by the attending gynecologic surgeon according to the International Federation of Gynecology and Obstetrics (FIGO) staging system.11
The probability of malignancy in adnexal masses was estimated using three previously described logistic regression models. The first model was reported by Tailor et al in 1997 (LR1) and was based on a combination of demographic and ultrasound variables to diagnose ovarian malignancy.6 The LR1 model calculates the probability of malignancy of an adnexal mass using the formula $P = 1/(1 + e^{-z})$, where $e$ is the base of natural logarithms and $z = (0.1273 \times \text{age}) + (0.2794 \times \text{time-averaged maximum velocity [TAMXV]}) + (4.4136 \times \text{papillary projection score}) - 14.2046$. The papillary projection score is given a value of 1 or 0 depending on the presence or absence of papillae.6 A probability of greater than 50% was taken to be diagnostic of malignancy.
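As a rough illustration of how the LR1 formula is applied, the following Python sketch evaluates $z$ and the resulting probability for one patient. The function names, the example patient, and the assumption that TAMXV is supplied in the same units as in the original model are ours and are not taken from the study.

```python
import math


def lr1_probability(age_years, tamxv, papillary_projection):
    """Probability of malignancy under the LR1 formula quoted in the text.

    age_years: age in years.
    tamxv: time-averaged maximum velocity (units assumed to match the
           original model; the text does not state them).
    papillary_projection: True if papillae are present, else False.
    """
    papillary_score = 1 if papillary_projection else 0
    z = (0.1273 * age_years
         + 0.2794 * tamxv
         + 4.4136 * papillary_score
         - 14.2046)
    return 1.0 / (1.0 + math.exp(-z))


def lr1_is_malignant(probability, cutoff=0.5):
    """Apply the >50% decision rule stated in the text."""
    return probability > cutoff


# Hypothetical patient, for illustration only.
p = lr1_probability(age_years=50, tamxv=10.0, papillary_projection=True)
print(f"P = {p:.3f}, classified malignant: {lr1_is_malignant(p)}")
```

Because the cutoff is strict, a probability of exactly 50% would be classified as benign in this sketch.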
The second model was proposed by Alcazar et al in 1998 (LR2). This model combines morphologic features with Doppler blood flow variables.7 The LR2 model uses the same formula, except $z = -5.002 + (4.263 \times \text{Score}) + (3.095 \times \text{CD})$. The score was the morphologic score assessed using the scoring system proposed by Sassone et al.4 A final score of 9 or greater was given the value of 1, whereas a final score of less than 9 was assigned a value of 0. The CD represented the color Doppler findings: an RI value of over 0.45 was given the value of 0, and an RI value equal to or less than 0.45 was given the value of 1. A probability of greater than 75% was taken to be diagnostic of malignancy.
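Because both LR2 inputs are dichotomized, the model can only produce four distinct probabilities. The sketch below, with function names of our own choosing, enumerates them; under the coding described above, only a mass with both a morphologic score of 9 or greater and an RI of 0.45 or less exceeds the 75% cutoff.

```python
import math


def lr2_probability(morphology_score, resistance_index):
    """Probability of malignancy under the LR2 formula quoted in the text."""
    score = 1 if morphology_score >= 9 else 0      # Sassone score >= 9 -> 1
    cd = 1 if resistance_index <= 0.45 else 0      # RI <= 0.45 -> 1
    z = -5.002 + 4.263 * score + 3.095 * cd
    return 1.0 / (1.0 + math.exp(-z))


def lr2_is_malignant(probability, cutoff=0.75):
    """Apply the >75% decision rule stated in the text."""
    return probability > cutoff


# Enumerate the four possible input combinations. The raw values 5, 10,
# 0.60, and 0.40 are arbitrary representatives of each category.
for raw_score in (5, 10):
    for ri in (0.60, 0.40):
        p = lr2_probability(raw_score, ri)
        print(raw_score, ri, round(p, 3), lr2_is_malignant(p))
```

This makes the dependence on the RI threshold explicit: a high morphologic score alone is not enough to reach the cutoff.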
The third model was described by Timmerman et al in 1999 (LR3) and includes tumor marker CA 125, as well as morphologic and demographic data.8 The LR3 model uses the same formula, except $z = (0.5948 \times \text{menopausal status}) + (0.0205 \times \text{CA 125}) + (0.5446 \times \text{ascites}) - (0.762 \times \text{unilocularity}) - (1.1606 \times \text{smooth}) + (1.5409 \times \text{papillary projection score}) + (0.7633 \times \text{bilateral masses}) - 1.0889$. Values of 1 or 0 are assigned for the presence or absence of the following variables: ascites, bilateral masses, unilocular cysts, and internal smooth walls of the mass. Menopausal status is given a value of 0 or 1 depending on whether the woman is pre- or postmenopausal.8 A final probability of greater than 50% was taken to be diagnostic of malignancy.
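The LR3 formula can be sketched in the same way. The function and parameter names below are ours. The two usage cases are modeled on false-positive results reported in the Discussion (a postmenopausal woman with a unilocular, smooth-walled cyst and a CA 125 level of 1067 kU/L, and a premenopausal woman with papillary proliferations and a CA 125 level of 17 kU/L); any feature not reported for those cases is assumed to be absent, which is our assumption.

```python
import math


def lr3_probability(postmenopausal, ca125, ascites, unilocular,
                    smooth, papillary_projection, bilateral):
    """Probability of malignancy under the LR3 formula quoted in the text.

    All arguments except ca125 (kU/L) are booleans coded as 1/0.
    """
    z = (0.5948 * int(postmenopausal)
         + 0.0205 * ca125
         + 0.5446 * int(ascites)
         - 0.762 * int(unilocular)
         - 1.1606 * int(smooth)
         + 1.5409 * int(papillary_projection)
         + 0.7633 * int(bilateral)
         - 1.0889)
    return 1.0 / (1.0 + math.exp(-z))


# Postmenopausal, unilocular, smooth-walled cyst, CA 125 = 1067 kU/L.
# Ascites, papillae, and bilaterality are ASSUMED absent.
p_high_ca125 = lr3_probability(True, 1067, False, True, True, False, False)

# Premenopausal, papillary proliferations, CA 125 = 17 kU/L.
# All other features are ASSUMED absent.
p_papillary = lr3_probability(False, 17, False, False, False, True, False)

print(p_high_ca125 > 0.5, p_papillary > 0.5)
```

In the first case the CA 125 term alone contributes more than 21 to $z$, which outweighs the protective unilocular and smooth-wall terms and illustrates how a single grossly elevated marker can dominate the prediction.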
Logistic regression was then used to combine the variables used in the models to create four new models: one combining variables from LR1 and LR2, one combining variables from LR1 and LR3, one combining variables from LR2 and LR3, and one combining variables from LR1, LR2, and LR3, in an attempt to improve diagnostic accuracy.
A database file was set up using Microsoft Excel (Redmond, WA) for Windows to facilitate data entry and retrieval. Statistical analysis was carried out using SPSS for Windows (Version 6.0; SPSS Inc., Chicago, IL). The means of unpaired groups of data were compared using the Mann-Whitney U test or Student t test. The proportions of benign and malignant cases with various morphologic characteristics were compared using the Yates-corrected χ² test. Receiver operating characteristic (ROC) curves were constructed with the results. The performances of the three models were compared by analyzing the areas under the ROC curves, using the methods of Hanley and McNeil.12,13 It was calculated that a sample of 100 women would enable us to detect an odds ratio (OR) of 5.0 for a given predictor variable in a multiple logistic regression model, with 80% power at the 5% significance level.
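For readers unfamiliar with the ROC comparison, the sketch below shows one way the quantities involved can be computed: the area under the curve as a Mann-Whitney statistic, the Hanley and McNeil standard error of a single area, and the z statistic for two areas derived from the same cases. It is not the authors' SPSS procedure. The correlation `r` between the two areas must be supplied by the user (Hanley and McNeil obtain it from a lookup table), and the area of 0.90 and all scores in the usage lines are hypothetical.

```python
from math import sqrt


def auc_mann_whitney(malignant_scores, benign_scores):
    """Area under the ROC curve as the probability that a malignant case
    scores higher than a benign case (ties count one half)."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


def hanley_mcneil_se(auc, n_malignant, n_benign):
    """Standard error of a single area (Hanley and McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_malignant - 1) * (q1 - auc ** 2)
                + (n_benign - 1) * (q2 - auc ** 2)) / (n_malignant * n_benign)
    return sqrt(variance)


def z_for_correlated_areas(auc1, se1, auc2, se2, r):
    """z statistic for two areas from the same cases; r is the correlation
    between the areas and is an input here, not estimated."""
    return (auc1 - auc2) / sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)


# Hypothetical scores and a hypothetical area of 0.90, combined with the
# study's group sizes (33 malignant, 67 benign).
auc_demo = auc_mann_whitney([0.9, 0.8, 0.4], [0.1, 0.3, 0.5, 0.2])
se_demo = hanley_mcneil_se(0.90, n_malignant=33, n_benign=67)
print(auc_demo, se_demo)
```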
| Results |
|---|
The second combined model included variables from LR1 and LR3. The value for $z$ was calculated using the following formula: $z = (4.1379 \times \text{menopausal status}) + (-0.0789 \times \text{age}) + (2.5191 \times \text{ascites}) + (0.2214 \times \text{bilateral}) + (0.0013 \times \text{CA 125}) + (-1.4995 \times \text{unilocularity}) + (0.0262 \times \text{TAMXV}) + (3.5768 \times \text{papillary projection score}) + (-2.8723 \times \text{smooth}) + 1.9213$. This model achieved a 79% sensitivity and 94% specificity.
The third model combined variables from LR2 and LR3, where $z = (2.1679 \times \text{menopausal status}) + (2.8941 \times \text{ascites}) + (0.0013 \times \text{CA 125}) + (0.1499 \times \text{bilateral}) + (-1.4350 \times \text{unilocularity}) + (3.8600 \times \text{papillary projection score}) + (-2.7390 \times \text{smooth}) + (-0.0139 \times \text{score}) + (1.8962 \times \text{CD}) - 0.8216$. This model achieved a 73% sensitivity and 84% specificity.
The fourth model was generated by combining variables from all three models, with $z$ calculated as follows: $z = (-0.0725 \times \text{age}) + (0.0184 \times \text{TAMXV}) + (3.6824 \times \text{papillary projection score}) + (3.9781 \times \text{menopausal status}) + (2.6358 \times \text{ascites}) + (0.2002 \times \text{bilateral}) + (0.0013 \times \text{CA 125}) + (-1.5192 \times \text{unilocularity}) + (-2.8796 \times \text{smooth}) + (-0.0310 \times \text{score}) + (0.9761 \times \text{CD}) + 1.7507$. This model achieved a 79% sensitivity and 94% specificity.
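The combined models have the same linear form as the original three, so they can be evaluated with a single table-driven helper. The sketch below does this for the fourth model. The dictionary keys, helper names, and example patient are ours, and we assume that "score" and "CD" use the same 0/1 coding as in LR2, which the text does not state explicitly. No decision cutoff is applied because none is given for the combined models.

```python
import math

# Coefficients of the fourth combined model, copied from the text.
COMBINED_ALL = {
    "age": -0.0725,
    "tamxv": 0.0184,
    "papillary": 3.6824,
    "postmenopausal": 3.9781,
    "ascites": 2.6358,
    "bilateral": 0.2002,
    "ca125": 0.0013,
    "unilocular": -1.5192,
    "smooth": -2.8796,
    "score": -0.0310,   # ASSUMED coded 0/1 as in LR2
    "cd": 0.9761,       # ASSUMED coded 0/1 as in LR2
}
COMBINED_ALL_INTERCEPT = 1.7507


def linear_predictor(coefficients, intercept, features):
    """Return z = intercept + sum(coefficient * feature value)."""
    missing = set(coefficients) - set(features)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    return intercept + sum(coefficients[name] * features[name]
                           for name in coefficients)


def probability(z):
    """Logistic transform P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient, for illustration only.
example = {
    "age": 60, "tamxv": 20.0, "papillary": 1, "postmenopausal": 1,
    "ascites": 0, "bilateral": 0, "ca125": 100, "unilocular": 0,
    "smooth": 0, "score": 1, "cd": 0,
}
z_example = linear_predictor(COMBINED_ALL, COMBINED_ALL_INTERCEPT, example)
print(round(z_example, 4), round(probability(z_example), 3))
```

Swapping in the coefficient table of another combined model reuses the same two helpers unchanged.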
An ROC curve was constructed using these results. The best accuracy was achieved by combining the models LR1 and LR3. However, the performance of this model was not significantly better than using LR3 alone. The addition of the variables used in LR2 did not improve the sensitivity and specificity.
| Discussion |
|---|
The LR2 model failed to detect 30 of the 33 cases of ovarian cancer. This model is largely influenced by impedance to blood flow in the tumors, with an RI of over 0.45 being given a value of 0 and an RI of less than or equal to 0.45 a value of 1. In our study, this led to a large number of false-negative results. Indeed, the poor diagnostic performance of indices of impedance to flow has been reported in other studies. Tekay et al3 found a significant overlap between the PI and RI values in benign and malignant tumors. Using an RI value of 0.5 as diagnostic of malignancy, only a 46% sensitivity and 89% specificity were achieved. Similar results were reported by other investigators.14,15 They concluded that no particular PI or RI cutoff value can be defined to enable accurate differentiation between benign and malignant adnexal masses.
Meanwhile, the LR3 model failed to diagnose nine cases of ovarian cancer. These comprised two borderline tumors, four nonepithelial ovarian cancers, and three epithelial cancers. The two borderline tumors occurred in premenopausal women and appeared on ultrasound scan as unilocular cysts without the typical morphologic features of malignancy such as papillary proliferations, wall irregularities, multilocularity, and ascites. The three invasive epithelial ovarian tumors that were wrongly classified as benign also had no ultrasound features of malignancy. All four nonepithelial tumors had smooth internal walls with no evidence of papillary proliferations. None of those tumors were bilateral and ascites was absent. Three of those four cases had low CA 125 levels.
Previous studies have suggested that the malignant potential of unilocular cysts is low.2,15 Granberg et al found that of 499 (49%) unilocular, 438 (43%) multilocular, and 80 (8%) solid tumors studied, only 1% of the unilocular cysts were malignant compared with 44% of multilocular cysts.2 Thirty-nine percent of the solid tumors were malignant. In addition, this group reported that risk of malignancy rises with increasing cyst size.2,16 They concluded that unilocular cysts less than 5 cm in diameter were unlikely to be malignant. However, in our study three of 21 invasive epithelial tumors and two of six borderline tumors appeared as unilocular cysts. Of these, two of the invasive epithelial tumors and one of the borderline tumors were less than 5 cm in diameter. The four nonepithelial tumors misdiagnosed were multilocular but lacked other features of malignancy.
The LR1 model had five false-positive results. Four of those false-positive results occurred in premenopausal women whose tumors were highly vascular. The fifth case was that of a vascular thecoma in a postmenopausal woman. The LR2 model gave only one false-positive result. An endometrioma was incorrectly classified as malignant. The RI was less than 0.45 in that case. The LR3 model gave six false-positive results. Two of those were cases of benign cystadenomas occurring in young premenopausal women. Both of those cysts showed the presence of papillary proliferations on scan. The CA 125 levels were 17 kU/L in both of those women. In another two cases bilateral endometriotic cysts were present with raised CA 125 levels; that led to a false-positive diagnosis. In the third case of endometriosis, papillary proliferations were noted within the cyst, leading to a false-positive diagnosis. The CA 125 levels again were raised. The final false-positive result occurred in a postmenopausal woman with a benign cystadenoma. The cyst was unilocular and smooth-walled. However, the CA 125 levels were grossly elevated at 1067 kU/L.
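The error counts given in this discussion can be checked against the sensitivities and specificities reported for the models. The short sketch below does so for LR2 and LR3, using the group sizes from the Results (33 malignant, 67 benign) and the false-negative and false-positive counts quoted above; the function name is ours.

```python
def sensitivity_specificity(n_malignant, n_benign,
                            false_negatives, false_positives):
    """Return (sensitivity, specificity) as fractions."""
    true_positives = n_malignant - false_negatives
    true_negatives = n_benign - false_positives
    return true_positives / n_malignant, true_negatives / n_benign


# LR2: 30 of 33 cancers missed, 1 false-positive result.
lr2_sens, lr2_spec = sensitivity_specificity(33, 67, 30, 1)

# LR3: 9 cancers missed, 6 false-positive results.
lr3_sens, lr3_spec = sensitivity_specificity(33, 67, 9, 6)

for name, sens, spec in (("LR2", lr2_sens, lr2_spec),
                         ("LR3", lr3_sens, lr3_spec)):
    print(f"{name}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Rounded to whole percentages, these agree with the 9% and 99% reported for LR2 and the 73% and 91% reported for LR3.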
Although CA 125 levels are found to be raised in over 80% of women with invasive epithelial ovarian cancers, it is well documented that CA 125 lacks specificity and is often raised in women with benign conditions such as endometriosis or pelvic inflammatory disease.17 As demonstrated by our results, CA 125 may affect diagnostic accuracy of models, which limits the usefulness of that parameter when combined to form an integral part of such models. We found that using a cutoff value for CA 125 of over 35 kU/L achieved a diagnostic sensitivity and specificity of 76% and 82%, respectively. That result is comparable to those previously reported.17,18
It appears from these results that when tested prospectively, none of the models achieved the level of diagnostic accuracy that would allow their implementation into routine clinical practice. The relatively poor prospective performance of the logistic regression models may be due to several factors. All models were originally designed based on retrospective data analysis, so each is the best-fit model for its own data set. Clearly, our data set is markedly different from those used to develop the three models, which may account for the differences in performance. In fact, the original data sets are also different from each other, leading to different variables being retained in the different models. This in itself reflects the heterogeneity of tumors encountered in everyday practice. Combining variables used in the different models might possibly overcome this problem. However, our results showed that combining variables did not lead to a significant improvement in diagnostic performance compared with using the LR3 model alone. We found that adding the variables used in LR3 to the other models improved their performance, whereas adding the variables from LR1 or from LR2 to the LR3 model did not significantly improve performance. When considering implementation of new combined models into clinical practice, one must remember that the proposed models will not perform as well when applied to new populations as they did in the population in which they were developed.
Subjective assessment of adnexal masses by experts achieved high sensitivities and specificities in a recent study.19 A 96% sensitivity and 90% specificity were achieved by experienced operators taking into account both ultrasound findings and clinical history. Valentin20 suggested in a recent editorial that subjective assessment was as good as, if not superior to, the mathematical models published to date. However, subjective assessment is dependent on the experience of the operator and therefore, in practical terms, reliable assessment may be limited to only a few specialized centers.
The introduction of diagnostic models would allow a uniform approach when assessing adnexal masses. The models would ensure reproducibility of diagnosis and reduce dependence on operator experience. Another important advantage is the ability to estimate the probability of malignancy, which is helpful in both decision making and patient counseling. Therefore, the search for optimum diagnostic models should continue. Alternatives are logistic regression models designed to investigate nonlinear relationships or the use of artificial neural networks. Initial reports of neural network models appear to be encouraging.8,21 However, all published neural networks were designed using a limited sample size and therefore must be tested on a larger number of women before they can be introduced into clinical practice.
| Footnotes |
|---|
Received September 7, 1999. Received in revised form December 28, 1999. Accepted January 10, 2000.
| References |
|---|
2. Granberg S, Wikland M, Jansson I. Macroscopic characterization of ovarian tumors and the relation to the histological diagnosis: Criteria used for ultrasound evaluation. Gynecol Oncol 1989;35:139–44.
3. Tekay A, Jouppila P. Validity of pulsatility and resistance indices in classification of adnexal tumors with transvaginal color Doppler ultrasound. Ultrasound Obstet Gynecol 1992;2:338–44.
4. Sassone M, Timor-Tritsch IE, Artner A, Westhoff C, Warren W. Transvaginal sonographic characterization of ovarian disease: Evaluation of a new scoring system to predict ovarian malignancy. Obstet Gynecol 1991;78:70–6.
5. Lerner JP, Timor-Tritsch IE, Federman A, Abramovich G. Transvaginal ultrasonographic characterization of ovarian masses with an improved, weighted scoring system. Am J Obstet Gynecol 1994;170:81–5.
6. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41–7.
7. Alcazar JL, Jurado M. Using a logistic model to predict malignancy of adnexal masses based on menopausal status, ultrasound morphology, and color Doppler findings. Gynecol Oncol 1998;69:146–50.
8. Timmerman D, Verrelst H, Bourne TH, De Moor B, Collins WP, Vergote I, et al. Artificial neural network models for the preoperative discrimination between malignant and benign adnexal masses. Ultrasound Obstet Gynecol 1999;13:17–25.
9. Schutter EMJ, Sohn C, Kristen P, Mobus V, Crombach G, Kaufmann M, et al. Estimation of probability of malignancy using a logistic model combining physical examination, ultrasound, serum CA 125, and serum CA 72-4 in postmenopausal women with a pelvic mass: An international multicenter study. Gynecol Oncol 1998;69:56–63.
10. Serov SF, Scully RE, Sobin LH. Histological typing of ovarian tumors. International histological classification of tumors, vol. 9. Geneva: World Health Organization, 1973:17–54.
11. Shepherd JH. Revised FIGO staging for gynaecological cancer. Br J Obstet Gynaecol 1988;96:889–92.
12. Hanley JA, McNeil BL. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839–43.
13. Hanley JA, McNeil BL. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
14. Valentin L, Sladkevicius P, Marsal K. Limited contribution of Doppler velocimetry to the differential diagnosis of extrauterine pelvic tumors. Obstet Gynecol 1994;83:425–33.
15. Prompeler HJ, Madjar H, Sauerbrei W, Lattermann U, Pfleiderer A. Quantitative flow measurements for classification of ovarian tumors by transvaginal color Doppler sonography in postmenopausal patients. Ultrasound Obstet Gynecol 1994;4:406–13.
16. Granberg S, Norstrom A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224–9.
17. Vasilev SA, Schaerth JB, Campeau J, Morrow CP. Serum CA 125 levels in preoperative evaluation of pelvic masses. Obstet Gynecol 1988;71:751–6.
18. Di-Xia C, Schwartz PE, Xinguo L, Zhan Y. Evaluation of CA 125 levels in differentiating malignant tumors from benign tumors in patients with pelvic masses. Obstet Gynecol 1988;72:23–7.
19. Timmerman D, Schwarzler P, Collins WP, Claerhout F, Coenen M, Amant I, et al. Subjective assessment of adnexal masses with the use of ultrasonography: An analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11–6.
20. Valentin L. High-quality gynecological ultrasound can be highly beneficial, but poor-quality gynecological ultrasound can do harm (editorial). Ultrasound Obstet Gynecol 1999;13:1–7.
21. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using an artificial neural network. Br J Obstet Gynaecol 1999;106:21–30.