Breast cancer is the most frequently diagnosed cancer among women worldwide. Mammographic screening remains a key strategy in early detection, helping reduce mortality and morbidity. However, limitations such as false-positive and false-negative results, overdiagnosis and radiologist workload continue to challenge screening programmes. Furthermore, a shortage of trained breast radiologists exacerbates these issues. Artificial intelligence, particularly models based on deep learning, has emerged as a promising approach to assist radiologists by improving diagnostic accuracy and efficiency. Using retrospective screening data from BreastScreen Norway, two AI models—one commercially available and one developed in-house—were evaluated to assess their ability to detect breast cancer and accurately localise suspicious findings on mammograms. 

 

Cancer Detection by AI Models 
The study included 129,434 screening mammograms from 42,300 women aged 50–69 years, collected between 2008 and 2018. Model A, a commercially available tool (Lunit INSIGHT MMG), and Model B, developed in-house by the Norwegian Computing Center, were evaluated as stand-alone readers. Their performance was assessed at two detection thresholds, under which the 3.2% and 11.1% of examinations with the highest AI scores were considered positive. These proportions correspond to the recall and consensus rates, respectively, in the study population.
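To make the threshold definition concrete, the Python sketch below flags the highest-scoring share of examinations as positive. It is an illustration only: the function names and the synthetic scores are assumptions introduced here, and the study's actual procedure (for example, how tied scores were handled) is not described in this summary.

```python
def cutoff_for_top_fraction(scores, fraction):
    """Return the score at or above which roughly `fraction` of exams fall."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Number of exams to flag; always flag at least one.
    n_flagged = max(1, round(len(ranked) * fraction))
    return ranked[n_flagged - 1]


def flag_exams(scores, fraction):
    """Mark each exam True if its score reaches the cut-off for `fraction`."""
    cutoff = cutoff_for_top_fraction(scores, fraction)
    # Tied scores at the cut-off are all flagged, so slightly more than
    # `fraction` of exams can be positive when ties occur.
    return [score >= cutoff for score in scores]


# Hypothetical scores for 1,000 exams (synthetic, not study data).
scores = [i / 1000 for i in range(1000)]
recall_flags = flag_exams(scores, 0.032)     # 3.2% threshold (recall rate)
consensus_flags = flag_exams(scores, 0.111)  # 11.1% threshold (consensus rate)
print(sum(recall_flags), sum(consensus_flags))
```

Defining the cut-off by rank rather than by a fixed score means the share of flagged examinations matches the radiologists' recall or consensus rate, which is what allows the stand-alone AI results to be compared with screening practice.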

 

The AUC values for both models were 0.97 when analysing screen-detected cancers and 0.93 when including interval cancers. At the 11.1% threshold, Model A identified 92.4% (685 of 741) of screen-detected cancers and 45.6% (95 of 208) of interval cancers. Model B showed similar performance, detecting 93.7% (694 of 741) of screen-detected and 44.7% (93 of 208) of interval cancers. At the 3.2% threshold, both models identified 76.3% (565 of 741) of screen-detected cancers and 18.7% (39 of 208) of interval cancers. Combined, the two models identified 53.8% (112 of 208) of interval cancers at the 11.1% threshold.

 

These findings suggest both models can detect a substantial proportion of breast cancers in a screening setting, with markedly higher sensitivity at the broader 11.1% threshold, which mirrors the consensus rate in the study population. The overlap in detected cases between the models was incomplete, indicating that combining AI outputs could improve detection further.
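The arithmetic behind the combined result can be sketched in a few lines of Python. The counts are those reported above for the 11.1% threshold; the function names are introduced here for illustration, and the overlap is derived by inclusion-exclusion on the assumption that the 112 combined detections are the union of the two models' detections. It is not a figure quoted from the study.

```python
def sensitivity_pct(detected, total):
    """Share of cancers detected, as a percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * detected / total, 1)


def detected_by_both(n_a, n_b, n_either):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return n_a + n_b - n_either


# Screen-detected cancers (741 in total) at the 11.1% threshold.
screen_a = sensitivity_pct(685, 741)
screen_b = sensitivity_pct(694, 741)

# Interval cancers (208 in total): 95 found by Model A, 93 by Model B,
# 112 by at least one of them.
interval_either = sensitivity_pct(112, 208)
both = detected_by_both(95, 93, 112)
only_a = 95 - both
only_b = 93 - both
print(screen_a, screen_b, interval_either, both, only_a, only_b)
```

Under that assumption, most interval cancers flagged by one model were also flagged by the other, and the gain from combining the models comes from the smaller group that only one of them detected. These derived counts describe all interval cancers and need not match the subset selected for radiologist review.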

 

Accuracy of Lesion Localisation
A radiologist review was conducted to assess the spatial accuracy of AI markings. From the dataset, 98 screen-detected and 102 interval cancers were selected for detailed evaluation. Cases were categorised into four groups based on whether the cancers were identified by one or both models. 

 

For the 48 screen-detected cancers identified by both models, all markings were correctly located in at least one mammographic view. For cancers identified by only one model, localisation was correct in all 20 cases for Model A and in 87% (26 of 30) of cases for Model B. Among interval cancers identified by both models, Model A and Model B marked the correct location in 82% and 79% of cases, respectively.

 

Among interval cancers identified by only one model, Model A achieved correct localisation in 68% (13 of 19) of cases and Model B in 60% (9 of 15). Of all 208 interval cancers, 21.6% (45 of 208) were identified by at least one model, correctly marked, and classified during radiologist review as either false negative or showing minimal signs of malignancy.

 

Radiologists classified interval cancers into four categories: true negative, minimal sign nonspecific, minimal sign significant and false negative. For interval cancers identified by both models, 21% were classified as false negative and 32% as minimal sign significant. Model A correctly marked all 14 false-negative cases in this group. For cancers identified by only one model, fewer cases were classified as false negatives. Among all correctly marked and reviewed interval cancers identified by either model, 44.1% were considered retrospectively visible. 

 

The correct identification and localisation of retrospectively visible cancers indicate that AI may detect subtle mammographic features that can be missed in clinical practice. These results support the potential for AI to contribute meaningfully to reducing interval cancer rates.

 

Model Comparison and Workflow Implications 
Despite both models showing high diagnostic accuracy, their detection profiles were not identical. Cases identified by both models were more often correctly localised and more frequently classified as having minimal or overt signs of malignancy by radiologists. This suggests that integrating the outputs of multiple AI systems could improve performance consistency and diagnostic confidence. 

 

Two potential clinical workflows could benefit from these findings. The first is replacing one radiologist in the double-reading process with an AI model, and the second is using AI for triage to reduce overall reading volume. A Swedish trial demonstrated that AI-supported triage reduced radiologist workload without increasing false-positive rates, while slightly improving cancer detection. In this study, combining two AI models led to fewer undetected cancers and improved localisation accuracy for overlapping cases. 

 

While the results are promising, the study acknowledges limitations. Model B was trained solely on BreastScreen Norway data, and although the test centre was excluded from training, this homogeneity might limit generalisability. Moreover, the evaluation did not consider false-positive markings, and tumour characteristics were not analysed because subgroup sizes were limited. Further work is needed to explore how these AI capabilities translate to clinical benefit in prospective or real-world settings.

 

Both AI models demonstrated strong stand-alone performance in detecting and localising breast cancer on screening mammograms. Their ability to correctly mark interval cancers, particularly those later classified as false negatives or as showing minimal signs, shows potential for improving screening accuracy and reducing missed diagnoses. However, broader implementation will depend on further studies, including prospective validation and workflow integration trials. These findings support the continued development and clinical evaluation of AI tools to enhance breast cancer screening outcomes.

 

Source: Radiology: Artificial Intelligence 


References:

Martiniussen MA, Larsen M, Hovda T et al. (2025) Performance of Two Deep Learning–based AI Models for Breast Cancer Detection and Localization on Screening Mammograms from BreastScreen Norway. Radiology: Artificial Intelligence, 7:3.


