Artificial intelligence has shown strong performance in mammographic screening, yet implementation remains cautious because even high-performing models can miss cancers. A key challenge is knowing when the model’s prediction is reliable enough to act on without human input. Researchers evaluated a strategy that pairs a probability of malignancy score with an explicit measure of prediction uncertainty, allowing the AI to make recall decisions only when confident and routing the rest to radiologist double reading. The aim was to test whether such a hybrid workflow could lower reading workload while maintaining cancer detection and recall rates in a national screening context.
Uncertainty-Guided Model and Reading Workflow
The team developed a noncommercial mammography interpretation pipeline that detects suspicious regions, classifies them with a ConvNeXt-tiny network, and aggregates findings into an examination-level probability of malignancy on a 1–100 scale. Alongside this score, the system generated an uncertainty estimate focused on the region classification stage, which was considered the main source of potential error at examination level given the region detector’s high sensitivity. Eight candidate uncertainty metrics were explored, derived either from Monte Carlo dropout distributions or from the probability output itself, and computed from either the most suspicious region or all regions. The dataset comprised 41,471 digital screening examinations from 15,524 women in the Dutch national programme from July 2003 to August 2018; images were acquired on Hologic-Lorad Selenia systems. Examinations had been double read with arbitration, and 2-year follow-up identified screen-detected and interval cancers. Ethical approval was waived.
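The Monte Carlo dropout idea described above can be illustrated with a minimal sketch: run repeated stochastic forward passes of the region classifier and summarise the spread of the per-region probabilities. The function and variable names here (`mc_dropout_probs`, `toy_forward`) are hypothetical, and the toy classifier simply adds noise to stand in for dropout variability; the study's actual network and metrics are not reproduced.

```python
import numpy as np

def mc_dropout_probs(forward, x, n_samples=20, rng=None):
    """Run n_samples stochastic forward passes (dropout kept active at
    inference) and stack per-region malignancy probabilities into an
    array of shape (n_samples, n_regions)."""
    if rng is None:
        rng = np.random.default_rng(0)
    return np.stack([forward(x, rng) for _ in range(n_samples)])

def toy_forward(region_scores, rng):
    """Toy stand-in for the region classifier: perturbs each region's
    base probability to mimic dropout-induced variability."""
    base = np.asarray(region_scores, dtype=float)
    return np.clip(base + rng.normal(0.0, 0.05, size=base.shape), 0.0, 1.0)

regions = [0.12, 0.81, 0.33]                  # hypothetical per-region scores
samples = mc_dropout_probs(toy_forward, regions, n_samples=50)
mean_p = samples.mean(axis=0)                 # mean probability per region
var_p = samples.var(axis=0)                   # one candidate uncertainty metric
```

From the sample matrix, candidate metrics such as per-region variance or the entropy of the mean probability can be computed over either the most suspicious region or all regions, mirroring the eight variants the authors explored.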
A hybrid strategy required two thresholds: one for probability of malignancy and one for uncertainty. If uncertainty exceeded its threshold, the examination was routed to double reading irrespective of the malignancy score. If uncertainty was below the threshold, the AI’s malignancy score alone determined recall. Thresholds were optimised on half of the dataset and tested on the remaining half, using bootstrap resampling to compare cancer detection, recall, sensitivity and specificity with standard double reading. The optimisation sought to minimise the proportion sent to radiologists while keeping cancer detection at or above, and recall at or below, the corresponding means for double reading.
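The two-threshold routing rule can be sketched as a small decision function. The function name and threshold values below are illustrative placeholders, not the study's optimised operating points.

```python
def route(pom, uncertainty, t_pom, t_unc):
    """Route one examination under the hybrid strategy.

    If uncertainty exceeds its threshold, defer to radiologist double
    reading regardless of the malignancy score; otherwise the AI's
    probability of malignancy (pom, 1-100 scale) alone decides recall.
    Returns (pathway, decision), with decision None when deferred.
    """
    if uncertainty > t_unc:
        return ("double_read", None)
    return ("ai_only", "recall" if pom >= t_pom else "no_recall")

# Illustrative thresholds only; the real values were fitted on half
# the dataset to minimise the share routed to radiologists.
print(route(pom=90, uncertainty=0.05, t_pom=50, t_unc=0.3))
```

In the study, these two thresholds were jointly optimised on the training half so that cancer detection stayed at or above, and recall at or below, the double-reading means.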
Workload Reduction Without Compromising Accuracy
Three uncertainty metrics enabled a split in which a portion of examinations could be acted on by AI alone while maintaining screening performance. The best performer was the entropy of the mean probability of malignancy for the most suspicious region. With this metric, 61.9% of examinations were referred for radiologist double reading and 38.1% were decided by AI, yielding a cancer detection rate of 6.6 per 1000 examinations and a recall rate of 23.7 per 1000. These outcomes did not differ significantly from standard double reading at 6.7 and 23.9 per 1000, respectively. Among recalled women, 19.0% would have been recalled by AI alone under this configuration.
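The best-performing metric has a simple form: take the most suspicious region's mean malignancy probability (averaged over the Monte Carlo dropout passes) and compute its binary entropy, which peaks when the probability sits at 0.5 and vanishes near 0 or 1. The sketch below uses natural-log entropy (nats); the paper's exact formulation and scaling are not specified here, so treat this as an assumed implementation.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Binary entropy in nats; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)))

def exam_uncertainty(region_mean_probs):
    """Entropy of the MC-dropout mean probability of the most
    suspicious region, used as the examination-level uncertainty."""
    return binary_entropy(max(region_mean_probs))

# A region near 0.5 is maximally ambiguous; one near 0.9 is confident.
uncertain_exam = exam_uncertainty([0.10, 0.52])
confident_exam = exam_uncertainty([0.10, 0.90])
```

An examination whose top region hovers near 0.5 thus gets high entropy and is routed to radiologists, while confidently benign or malignant examinations can be decided by the AI alone.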
Receiver operating characteristic analyses showed that the AI model’s discrimination depended on its certainty classification. For examinations labeled uncertain, the model achieved an area under the curve of 0.87, whereas for examinations labeled certain the area under the curve was 0.96. At matched specificity within each group, AI sensitivity was lower than radiologists’ in the uncertain group, but comparable in the certain group, where AI sensitivity of 85.4% did not differ from double reading at 88.9%. Cancer prevalence was similar in the uncertain and certain groups at 8.6 and 9.8 per 1000, respectively. Breast density patterns differed between groups, with density C or D more frequent among uncertain cases, while age at diagnosis, tumour size and cancer type showed no clear differences.
Comparators, Alternative Splits and Limitations
Standalone AI at a point optimised to maintain cancer detection comparable to double reading produced a markedly higher recall rate of 52.8 per 1000 and lower specificity of 95.4%, underscoring the value of uncertainty-based routing rather than unconditional automation. A more conventional hybrid that ignored uncertainty and sent only the highest AI scores for double reading required radiologist review of 91.1% of examinations to preserve cancer detection and recall rates similar to double reading, offering far less workload relief than the uncertainty-guided split. Only one screen-detected cancer with a confident AI prediction would have been missed by the hybrid strategy; in contrast, cases that AI would have incorrectly dismissed under AI-only reading were captured by the hybrid because their predictions were classified as uncertain and thus sent to radiologists.
Several constraints frame interpretation. Uncertainty estimation was limited to the classification stage and did not model uncertainty from region detection. The retrospective design did not capture potential changes in radiologist behaviour under a new prevalence mix, nor did it measure reading time, so a 38.1% reduction in examinations does not necessarily equate to an equivalent reduction in minutes. All data came from a single screening unit and a single vendor’s digital mammography systems, which may limit generalisability to other settings or to digital breast tomosynthesis. Nonetheless, uncertainty metrics that were simpler to compute performed at least as well as more intensive methods in separating cases into higher and lower AI performance bands, suggesting feasibility for broader architectures.
An uncertainty-aware hybrid reading workflow allowed AI to make recall decisions only when confident and referred the remainder to double reading, reducing the share of examinations requiring radiologist interpretation to 61.9% while maintaining cancer detection and recall at levels comparable to standard double reading. Performance gains concentrated in the AI-certain subset, where discrimination approached that of double reading, supporting selective automation driven by explicit uncertainty quantification. These results indicate a pragmatic pathway to ease screening workload without compromising outcomes, particularly in programmes seeking scalable, audited use of AI for examination triage and recall decisions.
Source: Radiology
Image Credit: iStock