The Ovarian-Adnexal Reporting and Data System (O-RADS) for MRI provides a structured method for stratifying malignancy risk in adnexal masses. Endorsed by the American College of Radiology, it categorises findings into five levels of risk, aiming to enhance decision-making and communication between radiologists and referring clinicians. Despite its value, adoption in routine clinical practice remains inconsistent, as radiologists vary in how they interpret and apply O-RADS rules, and existing calculators require manual interaction, which can be inefficient.
With recent advances in natural language processing, particularly through large language models (LLMs), there is an opportunity to automate the calculation of O-RADS MRI scores directly from radiology report descriptions. A recent study published in Radiology examined the performance of an optimised LLM-based approach, including a hybrid strategy combining AI-driven feature classification with deterministic scoring logic, in assigning O-RADS scores from pelvic MRI reports.
The Promise of Hybrid Automation in Radiology
In this retrospective single-centre study, two LLM-based strategies were evaluated using pelvic MRI reports from a regional cancer centre. All reports described at least one nonphysiologic adnexal lesion and were drawn from two time periods: before and after the implementation of the O-RADS MRI system. The first approach, termed "LLM only", employed GPT-4 with few-shot learning and direct prompting with O-RADS rules to assign scores. The second, referred to as the "hybrid" model, used GPT-4 to classify lesion features extracted from report text, which were then processed by a deterministic formula to generate the final O-RADS score.
The hybrid model demonstrated superior accuracy across internal test sets. In the set of 173 lesions where radiologists had previously assigned O-RADS scores, the hybrid model achieved 97% accuracy, outperforming the LLM-only model at 90%. For lesions where an O-RADS score had originally been reported (n = 158), the hybrid model again surpassed both the LLM-only model and the original radiologist, with an accuracy of 97%, compared to 89% and 88%, respectively. The hybrid model performed particularly well in categorising O-RADS 2 lesions, correctly classifying 98% of such cases, in contrast to the radiologist’s 86%. Additionally, the model maintained high accuracy when tested on 183 lesions from earlier reports predating the implementation of O-RADS MRI, achieving 95% compared to 87% for the LLM-only strategy. These findings support the hybrid model's robustness in handling diverse report styles and its potential for retrospective or prospective clinical use.
Strategic Optimisation: The Key to Model Performance
The study highlighted the necessity of strategic optimisation for LLM applications in radiology. Out-of-the-box models often struggle with the complexity of clinical scoring systems. To address this, the hybrid approach utilised prompt engineering and structured classification of features such as lesion size, location, enhancement characteristics and tissue composition. GPT-4 was used to produce structured outputs in JSON format, which were subsequently fed into a deterministic scoring algorithm based on O-RADS MRI rules.
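The second, deterministic stage of such a pipeline can be sketched as a plain rule-based function over the LLM's JSON output. The sketch below is a heavily simplified illustration: the feature names and the handful of rules shown are assumptions for demonstration, not the published algorithm or the full O-RADS MRI lexicon.

```python
import json

def assign_orads(features: dict) -> int:
    """Assign a simplified O-RADS MRI score from structured lesion features.

    NOTE: illustrative only. The real O-RADS MRI decision rules are far more
    detailed; the feature keys used here are hypothetical.
    """
    if features.get("no_lesion", False):
        return 1  # normal ovaries, no lesion described
    if features.get("peritoneal_implants", False):
        return 5  # peritoneal disease implies the highest risk category
    if features.get("solid_enhancing_tissue", False):
        # Risk of solid tissue depends on its enhancement (time-intensity) curve
        tic = features.get("time_intensity_curve")  # "low" / "intermediate" / "high"
        return {"low": 3, "intermediate": 4, "high": 5}.get(tic, 4)
    if features.get("simple_fluid_unilocular", False):
        return 2  # unilocular simple-fluid cyst: almost certainly benign
    return 3  # other cystic lesions default to the low-risk category here

# Example structured output, as might be produced by the LLM feature stage
llm_json = json.dumps({"solid_enhancing_tissue": True,
                       "time_intensity_curve": "high"})
print(assign_orads(json.loads(llm_json)))  # -> 5
```

Keeping the scoring logic in deterministic code, as the study's hybrid strategy does, means the LLM only has to solve the narrower feature-extraction task, and every final score can be traced back to explicit rules.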
This strategy allowed the model to excel where standard LLMs fall short. The hybrid model demonstrated almost perfect agreement with the expert-reviewed reference standard (κ = 0.95), compared with strong agreement for the original radiologist scores (κ = 0.91) and the LLM-only model (κ = 0.87). Discrepancies in hybrid model outputs were rare: of the 158 lesions with original scores, the hybrid model differed in 21 cases, and in 81% of those it matched the reference standard. The model's output also offered interpretability by revealing which specific features were used to determine each score. The most common causes of discrepancies were misclassification of features such as solid enhancing tissue and fluid content. Even with these limitations, the model demonstrated consistent performance across datasets, including reports using less standardised language from the period prior to O-RADS MRI implementation.
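The κ values reported above are Cohen's kappa, a chance-corrected measure of agreement between two raters. As a minimal sketch (the score lists below are invented, not the study's data), it can be computed as:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Proportion of cases on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical O-RADS scores from a model and a reference standard
model_scores = [2, 2, 3, 4, 5, 2]
reference    = [2, 2, 3, 4, 5, 3]
print(round(cohens_kappa(model_scores, reference), 2))  # -> 0.77
```

Values above roughly 0.8 are conventionally described as "almost perfect" agreement, which is why the hybrid model's κ = 0.95 is a notably strong result.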
Clinical Implications and Future Integration
The hybrid model’s accuracy in classifying O-RADS 2 lesions suggests its utility in reducing unnecessary interventions. Correct identification of benign findings could prevent overtreatment or redundant imaging. Importantly, the study found that only a minority of MRI reports originally included an O-RADS score, underscoring the opportunity for automated tools to increase adoption. The hybrid model showed consistent performance even in reports lacking standard O-RADS terminology, suggesting its potential as a retrospective auditing or prospective decision support tool.
Integration of such models into dictation software or clinical systems could support radiologists by automating part of the scoring process and aligning reporting with expert standards. While external validation and broader generalisability remain necessary, this study demonstrates a clear path forward for implementing LLM-based applications that enhance diagnostic consistency. The transparency of the hybrid model further strengthens its appeal by enabling clinicians to understand the rationale behind the automated scores. Moreover, its strong performance across different lesion types and reporting periods indicates its potential resilience to regional variations in practice.
The study demonstrated that large language models, when strategically optimised, can accurately assign O-RADS MRI scores from pelvic MRI reports. The hybrid approach, combining LLM feature classification with a deterministic formula, consistently outperformed both a simpler LLM-only model and the original radiologist assessments. With high agreement with the reference standard and adaptability across reporting styles, this hybrid method may support greater consistency and adoption of O-RADS MRI in clinical practice. Further evaluation in diverse clinical settings and application to other reporting systems would be a valuable next step.
Source: Radiology
Image Credit: Freepik