AI severity scores are increasingly integrated into radiological workflows, offering a quantifiable measure of the likelihood of pathology. While these tools have demonstrated potential to enhance diagnostic performance, their current implementation often lacks the context needed for accurate interpretation. Uncertainty about how these scores are produced and what they truly signify can undermine their effectiveness. Six human factors, ranging from inconsistent scoring systems to perceptual mismatches, highlight the limitations of AI scores used in isolation. Understanding these constraints and identifying ways to address them is essential to ensure that AI supports rather than complicates radiologists' decision-making.
Interpretative Variability and Cognitive Challenges
Radiologists are confronted with a growing number of AI tools, each presenting its own severity score system. These scoring systems differ not only between vendors but also between successive versions of the same system. An AI score of 3 may carry different implications depending on which algorithm produced it or whether the system has recently been updated. Such inconsistencies demand significant cognitive effort from radiologists, who may be required to interpret several score types within the same clinical session. This variability can generate confusion, increase mental load and contribute to alarm fatigue.
Even when a system remains consistent over time, radiologists bring their own variability to the interpretation process. Two clinicians may interpret the same score differently, leading to diverging decisions about further testing or treatment. A single radiologist's interpretation may also shift over time as they gain experience with the system or adapt to its updates. This interplay makes it difficult to rely on scores without further guidance, and these inconsistencies reduce the potential of AI scores to provide reliable, standardised support across radiological practice.
Lack of Context and Distribution Awareness
Another major limitation lies in the opacity of score distributions. Without knowing where a particular score fits within a broader dataset, it becomes nearly impossible to assess the true significance of the number. A score that appears moderate may correspond to a high or low level of actual risk, depending on the underlying distribution, which may differ between training and local clinical populations. This lack of shared reference points leaves room for misinterpretation and hinders confident decision-making.
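To see how much the meaning of a single number can shift between populations, the short sketch below compares where the same score falls within two hypothetical score distributions, one standing in for a vendor's training cohort and one for a local clinical cohort. The distributions, score scale and cut-off are illustrative assumptions only, not values from any real system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical severity scores (0-10 scale) from two different populations:
# a vendor's training cohort and a local clinical cohort (assumed shapes).
training_scores = rng.normal(loc=3.0, scale=1.5, size=10_000).clip(0, 10)
local_scores = rng.normal(loc=5.0, scale=1.5, size=10_000).clip(0, 10)

score = 4.0  # the same "moderate-looking" score reported by the AI tool

# Percentile of this score within each distribution.
training_pct = (training_scores < score).mean() * 100
local_pct = (local_scores < score).mean() * 100

print(f"Score {score} sits at the {training_pct:.0f}th percentile of the training cohort,")
print(f"but only the {local_pct:.0f}th percentile of the local cohort.")
```

In this toy comparison the same score looks unusually high in one population and unremarkable in the other, which is precisely why a raw number without its distribution offers so little guidance.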
Moreover, AI systems often employ visual aids, such as coloured heatmaps, to illustrate the severity of findings. These visual cues are not universally understood and may be culturally or perceptually misleading. For example, red might imply danger in some settings but not in others, and red-green colourblindness affects a significant proportion of the population. Display hardware variability and degradation further compound the risk of misinterpretation. All of these factors point to the conclusion that AI scores, when presented without context or standardisation, lack the clarity necessary for effective clinical integration.
Towards a Solution: Embedding Diagnostic Value
To overcome these challenges, a structured approach is proposed: supplementing AI severity scores with the false discovery rate (FDR) and false omission rate (FOR) for each score threshold. The FDR represents the probability that a positive AI finding is incorrect, while the FOR indicates the chance that a negative result misses a true pathology. Together, these metrics provide clinicians with contextual probabilities tied to their specific patient populations. These values can be derived from retrospective local data, ensuring their relevance and improving interpretative reliability.
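As an illustration of how these two rates could be derived from a local retrospective audit, the sketch below computes FDR and FOR at several example thresholds. The score scale, thresholds and simulated outcomes are hypothetical placeholders that an institution would replace with its own confirmed case records.

```python
import numpy as np

def fdr_for_at_threshold(scores, labels, threshold):
    """Compute false discovery rate (FDR) and false omission rate (FOR)
    when scores at or above `threshold` are treated as positive.
    `labels` are ground-truth outcomes (1 = pathology present)."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    positive = scores >= threshold

    tp = np.sum(positive & (labels == 1))
    fp = np.sum(positive & (labels == 0))
    fn = np.sum(~positive & (labels == 1))
    tn = np.sum(~positive & (labels == 0))

    # FDR: of the cases flagged positive, how many had no pathology?
    fdr = fp / (fp + tp) if (fp + tp) > 0 else float("nan")
    # FOR: of the cases called negative, how many actually had pathology?
    fo_rate = fn / (fn + tn) if (fn + tn) > 0 else float("nan")
    return fdr, fo_rate

# Hypothetical local audit: AI scores (0-10) and confirmed outcomes.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=500)
scores = np.where(labels == 1,
                  rng.normal(6.5, 1.5, size=500),
                  rng.normal(3.5, 1.5, size=500)).clip(0, 10)

for threshold in (3, 5, 7):
    fdr, fo_rate = fdr_for_at_threshold(scores, labels, threshold)
    print(f"threshold {threshold}: FDR = {fdr:.2f}, FOR = {fo_rate:.2f}")
```

Computed over an institution's own historical cases, a table of this kind would let a radiologist read off, for any reported score, how often a positive call at that level has proved wrong locally and how often a negative call has missed disease.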
Presenting FDR and FOR values alongside AI scores offers radiologists a clearer picture of the real diagnostic implications of a given result. This approach acknowledges that raw scores alone are insufficient and that interpretability hinges on transparency and localisation. If radiologists can consistently assess the risk of false positives and negatives at each threshold, decision-making will become more evidence-based and standardised. This could also improve both inter- and intra-rater reliability, addressing the variability that currently undermines confidence in AI-assisted diagnostics.
A robust experimental design has been proposed to test this hypothesis. Radiologists would evaluate the same set of images under two conditions: first with AI scores alone, and then with scores accompanied by the corresponding FDR and FOR values. Their accuracy, consistency and diagnostic decisions would then be compared across the two conditions. The outcomes of such a study would provide empirical support for incorporating these additional metrics into clinical AI systems, aligning with regulatory efforts to increase transparency and safety.
AI severity scores have the potential to support radiological diagnosis, but their effectiveness is limited when used in isolation. Variability in scoring systems, radiologist interpretation and score meaning all contribute to uncertainty. Without context, these scores cannot be relied upon to guide clinical decisions. A promising solution lies in augmenting scores with the false discovery and omission rates tied to local data, offering radiologists a clearer, standardised understanding of what each score truly means. This approach promises to improve diagnostic performance, reduce cognitive burden and facilitate more consistent clinical decisions across the board.
Source: European Radiology Experimental