Accurate preoperative classification of ovarian tumours is central to deciding whether patients require referral to oncology services or can be managed in a general hospital. Ultrasound-based strategies are widely used to support this distinction, but their performance varies by model, threshold and patient group. A systematic review and meta-analysis published in Ultrasound in Obstetrics & Gynecology compared the Risk of Malignancy Index (RMI), Logistic Regression model 2 (LR2), the IOTA Simple Rules, the ADNEX model and subjective assessment. Most approaches showed good diagnostic performance, with subjective assessment and Simple Rules followed by subjective assessment providing the strongest overall results. ADNEX offered a practical alternative when a strategy less dependent on operator expertise was preferred.
Subjective Assessment Shows Strongest Performance
The comparison covered 99 eligible studies and more than 42,000 ovarian tumours, around a quarter of which were malignant. Most data were collected prospectively, and the included settings ranged from oncology centres to mixed and non-oncology environments. The assessed models were selected because they are among the most widely used and clinically implemented ultrasound-based approaches for differentiating benign and malignant ovarian tumours.
Subjective assessment achieved the strongest overall diagnostic performance. It was the only approach in the overall population with both sensitivity and specificity above 90%. This reflects the value of expert ultrasound interpretation based on image recognition and clinical information. However, its performance depends on examiner experience, training and exposure.
Simple Rules followed by subjective assessment of inconclusive cases performed similarly to subjective assessment alone. This two-step strategy applies Simple Rules first and reserves expert subjective assessment for cases that remain inconclusive. In the included data, fewer than one in five cases required this second step. The approach therefore maintains high diagnostic accuracy while reducing the workload for experienced examiners.
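The two-step logic can be illustrated with a minimal Python sketch. It assumes the commonly described form of the IOTA Simple Rules, in which a tumour is classified as benign when only benign features are present, as malignant when only malignant features are present, and as inconclusive otherwise. The function names, the feature counts as inputs and the expert callback are hypothetical and are not taken from the review.

```python
from typing import Callable


def classify_with_simple_rules(n_benign_features: int, n_malignant_features: int) -> str:
    """Return 'benign', 'malignant' or 'inconclusive' from feature counts.

    Assumption: only benign features -> benign, only malignant features ->
    malignant, both or neither -> inconclusive.
    """
    if n_malignant_features > 0 and n_benign_features == 0:
        return "malignant"
    if n_benign_features > 0 and n_malignant_features == 0:
        return "benign"
    return "inconclusive"


def two_step_assessment(
    n_benign_features: int,
    n_malignant_features: int,
    expert_assessment: Callable[[], str],
) -> str:
    """Apply Simple Rules first; call the expert only for inconclusive cases."""
    result = classify_with_simple_rules(n_benign_features, n_malignant_features)
    if result == "inconclusive":
        # Second step: the (hypothetical) expert callback supplies the classification.
        return expert_assessment()
    return result


# Illustrative call: both feature types are present, so the expert decides.
print(two_step_assessment(1, 1, expert_assessment=lambda: "malignant"))
```

Because the expert callback is invoked only on the inconclusive branch, the sketch mirrors how the combined strategy limits expert review to a minority of cases.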
ADNEX Offers an Operator-Independent Option
The ADNEX model also performed well, although its accuracy depended on the selected cut-off. Lower thresholds increased sensitivity and reduced the likelihood of missed malignancies, but they also lowered specificity. Higher thresholds improved specificity but reduced sensitivity. The strongest balance between the two appeared around the 20% cut-off.
This pattern illustrates the practical trade-off between identifying malignancies and limiting false-positive results. ADNEX provides risk estimates as percentages and avoids inconclusive results, which may make it useful when a strategy less dependent on operator expertise is preferred. It also allows threshold selection according to the clinical setting and the relative importance placed on sensitivity or specificity.
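The effect of the cut-off can be made concrete with a short Python sketch. It assumes that a tumour counts as test-positive when its predicted malignancy risk is at or above the cut-off. The risk values and outcomes below are invented toy data chosen only to show the direction of the trade-off. They are not results from the review, and the helper name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    risks: Sequence[float], malignant: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Compute sensitivity and specificity when risk >= cutoff is test-positive."""
    tp = fn = tn = fp = 0
    for risk, is_malignant in zip(risks, malignant):
        test_positive = risk >= cutoff
        if is_malignant and test_positive:
            tp += 1
        elif is_malignant:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted risks: six benign tumours, then six malignant tumours.
toy_risks = [0.02, 0.04, 0.08, 0.12, 0.18, 0.30, 0.07, 0.15, 0.25, 0.45, 0.70, 0.90]
toy_malignant = [False] * 6 + [True] * 6

for cutoff in (0.05, 0.10, 0.20, 0.30):
    sens, spec = sensitivity_specificity(toy_risks, toy_malignant, cutoff)
    print(f"cut-off {cutoff:.0%}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy example, raising the cut-off from 5% to 30% lowers sensitivity from 1.00 to 0.50 while specificity rises from 0.33 to 0.83, the same direction of trade-off described for ADNEX.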
Logistic Regression model 2 also showed good diagnostic performance at its commonly used threshold, but ADNEX offers additional multiclass risk prediction and more flexibility in cut-off selection. The Risk of Malignancy Index showed the weakest performance among the assessed approaches. Although its specificity remained comparatively high, its sensitivity was lower than that of the other ultrasound-based strategies. This lower sensitivity creates a less favourable profile when the aim is to reduce missed malignancies before treatment decisions are made.
Menopausal Status and Prevalence Affect Accuracy
Diagnostic performance differed by menopausal status. Specificity was higher in premenopausal women than in postmenopausal women across all strategies. This difference was particularly evident for LR2 and ADNEX. In postmenopausal women, lower specificity may increase false-positive results, especially when low ADNEX thresholds are used.
Sensitivity tended to be lower in premenopausal women than in postmenopausal women. The difference was especially clear for RMI, which had notably weaker sensitivity in premenopausal women. In both premenopausal and postmenopausal groups, subjective assessment and Simple Rules followed by subjective assessment achieved the strongest overall performance.
Disease prevalence also affected model performance. Specificity was generally lower in populations with higher ovarian cancer prevalence. Across low- and high-prevalence settings, subjective assessment and Simple Rules followed by subjective assessment retained the strongest diagnostic profile. ADNEX, Simple Rules (with inconclusive cases treated as malignant) and LR2 followed in overall performance. These findings show that model selection and threshold choice may need to account for patient mix, menopausal status and the expected prevalence of malignancy in the assessed population.
Ultrasound-based assessment can support preoperative differentiation between benign and malignant ovarian tumours, but model choice remains important. Subjective assessment and Simple Rules followed by subjective assessment provide the best overall diagnostic performance, with the combined strategy reducing reliance on expert review for every case. ADNEX offers a strong alternative when a strategy less dependent on examiner expertise is needed. RMI performs less well than the other assessed approaches, mainly because of lower sensitivity. Menopausal status, malignancy prevalence and selected ADNEX cut-off all influence the balance between sensitivity and specificity.
Source: Ultrasound in Obstetrics and Gynecology
References:
Lems E, Koch AH, Delvaux EJLG et al. (2026) Diagnostic accuracy of ultrasound models for assessment of ovarian tumors: systematic review and meta-analysis. Ultrasound Obstet Gynecol, 67:590–603