Prostate MRI is used in parts of Europe before biopsy to help avoid unnecessary procedures and support more targeted sampling when findings look suspicious. Interest is growing in extending MRI into screening settings, where large volumes and limited specialist capacity make efficient reporting essential. Artificial intelligence has been promoted as a way to standardise reads, reduce workload and potentially identify clearly negative scans. A key concern is that tools developed in clinical populations, where cancer prevalence is higher, may behave differently in screening, where false positives can trigger avoidable follow-up and biopsies. Research drawing on a screening trial dataset assessed whether an AI system built for biparametric prostate MRI could match radiologists for detecting clinically significant disease. The results point to useful discrimination overall, but a clear specificity gap that matters in screening pathways.
Screening Context and the Reference for Clinically Significant Disease
The evaluation used prostate MRI examinations from a screening trial cohort, reflecting the lower cancer prevalence expected in screening. After exclusions for missing or non-interpretable imaging, the analysis included just over 1,300 MRI examinations from a little over 1,200 men. Clinically significant prostate cancer was defined as ISUP grade 2 or higher, using histopathology as the primary reference standard. Follow-up over several years supported outcome ascertainment, and a subset of men underwent systematic biopsies regardless of MRI findings, strengthening confidence in some negative cases.
Radiologist interpretation came from routine consensus reading, scored with PI-RADS v2. Although the clinical protocol was multiparametric, the comparison with AI was anchored to the biparametric PI-RADS assessment. Additional review was used for discordant scenarios, including men with low PI-RADS scores who were later found to have clinically significant cancer and men with higher PI-RADS scores whose biopsies did not show clinically significant disease. Where stronger tissue evidence was available, such as prostatectomy histology, it superseded biopsy findings.
Model Approach and How Performance Was Compared
The AI system was a deep-learning segmentation model trained on biparametric MRI inputs, using T2-weighted imaging alongside diffusion information, including a high b-value sequence and an apparent diffusion coefficient map. The dataset was split into a larger training set and a smaller test set. The model produced a case-level likelihood score for clinically significant disease and candidate lesion outputs derived from the predicted probability map.
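The mapping from a voxelwise probability map to a case-level score and candidate lesions can be sketched as below. This is a minimal illustration with a synthetic map; treating the map's maximum as the case score and thresholding voxels for candidate lesions are assumed conventions for this sketch, not details confirmed by the source.

```python
import numpy as np

# Synthetic voxelwise probability map (slices x rows x cols),
# standing in for a segmentation model's output.
prob_map = np.zeros((16, 128, 128), dtype=np.float32)
prob_map[8, 60:64, 60:64] = 0.92   # a small synthetic "lesion" region

# Case-level likelihood: here taken as the map's maximum (assumed convention).
case_score = float(prob_map.max())

# Candidate-lesion voxels: those exceeding an illustrative cutoff.
lesion_mask = prob_map >= 0.5

print(f"case score: {case_score:.2f}, candidate voxels: {int(lesion_mask.sum())}")
```

In practice, candidate lesions would additionally be grouped into connected components and filtered by size, but the case score and voxel mask capture the two output types the study describes.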
Performance for clinically significant cancer detection was assessed at the case level using the area under the receiver operating characteristic curve (AUROC), reported at around 0.83. To understand how this might translate into clinical decisions, specificity was compared at sensitivity levels aligned with common PI-RADS thresholds used by radiologists. Rather than focusing on a single operating point, the analysis matched sensitivity between the AI system and radiologist thresholds, then examined how often each approach correctly identified men without clinically significant cancer.
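The matching procedure amounts to choosing, for each radiologist sensitivity level, the most stringent AI score threshold that still reaches that sensitivity, then reading off the AI's specificity there. A minimal sketch with synthetic scores (function name and data are illustrative, not from the study):

```python
import numpy as np

def specificity_at_sensitivity(y_true, scores, target_sens):
    """Find the highest score threshold whose sensitivity meets
    target_sens, and report the specificity at that threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos, neg = y_true == 1, y_true == 0
    # Candidate thresholds: every observed score, most stringent first.
    for t in np.sort(np.unique(scores))[::-1]:
        pred = scores >= t
        sens = pred[pos].mean()
        if sens >= target_sens:
            spec = (~pred[neg]).mean()
            return t, sens, spec

# Toy cohort: ~10% prevalence, cases scoring somewhat higher on average.
rng = np.random.default_rng(0)
labels = np.r_[np.ones(30), np.zeros(270)]
scores = np.r_[rng.normal(0.7, 0.15, 30), rng.normal(0.4, 0.15, 270)]
print(specificity_at_sensitivity(labels, scores, target_sens=0.90))
```

Comparing specificities at matched sensitivity, rather than at each method's default cutoff, is what makes the radiologist-versus-AI contrast fair: both are forced to catch the same fraction of cancers before false-positive rates are compared.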
This design is relevant to screening pathways because a small change in specificity can have an outsized impact on downstream workload when the majority of scans are negative. In a screening population with roughly an 11% prevalence of clinically significant disease in the overall cohort, the burden created by false positives becomes a central operational and patient safety consideration.
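The arithmetic behind that concern is straightforward. With roughly 11% prevalence, about 890 of every 1,000 screened men are free of clinically significant cancer, so each percentage point of lost specificity adds roughly nine false positives per 1,000. The specificity values below are invented purely to show the calculation; they are not figures from the study.

```python
def false_positives_per_1000(prevalence, specificity, n=1000):
    """False positives expected per n screened men at a given
    disease prevalence and test specificity."""
    negatives = n * (1 - prevalence)
    return negatives * (1 - specificity)

prev = 0.11  # approximate cohort prevalence of clinically significant cancer
for spec in (0.80, 0.70):  # illustrative specificities, not study results
    fp = false_positives_per_1000(prev, spec)
    print(f"specificity {spec:.0%}: about {fp:.0f} false positives per 1,000 screened")
```

A ten-point specificity gap at these illustrative values translates to nearly 90 additional men per 1,000 flagged for follow-up, which is why a modest-looking specificity shortfall dominates screening workload.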
Specificity Shortfalls and the Pattern of False Positives
While the AI system showed good overall discrimination, it was less specific than radiologists at comparable sensitivity levels. When sensitivity was matched to the radiologist approach at lower PI-RADS thresholds, radiologists maintained higher specificity, while the AI system produced a larger share of false positives. At more stringent thresholds, the gap narrowed, but the overall pattern remained that the AI system tended to label more cases as suspicious when clinically significant cancer was not present.
The source material links false positives to regions showing restricted diffusion, a signal that can be associated with malignancy but is not exclusive to cancer. Examples described include restricted diffusion in benign structures and appearances influenced by imaging artefact. In practice, radiologists can incorporate anatomical context and recognise patterns that suggest benignity or artefactual change, whereas the AI system was more likely to elevate such findings into candidate lesions. These errors matter in screening because an AI-driven increase in suspicious classifications can drive additional diagnostic steps, including biopsies, undermining the efficiency gains that motivate MRI-based screening approaches.
The findings also reinforce a broader implementation point: tools trained and tuned in clinical settings may require re-optimisation for screening, where the balance between sensitivity and specificity has different consequences. In this context, performance that looks strong on a global metric can still lead to an operationally problematic number of false positives, particularly if the intended use is triage or decision support.
In a screening trial MRI cohort, a deep-learning AI tool for biparametric prostate MRI achieved an AUROC of about 0.83 for clinically significant cancer detection, but delivered lower specificity than radiologists when sensitivities were matched to PI-RADS v2 thresholds. The practical implication for screening pathways is that false positives can escalate follow-up and biopsy burden in a population where most scans are negative. The results underline the need for validation in true screening populations, careful selection of operating thresholds and close attention to the specific false-positive patterns that may erode efficiency and patient benefit.
Source: European Radiology
Image Credit: iStock
References:
Langkilde F, Gren M, Wallström J et al. (2025) Evaluation of AI for prostate cancer detection in biparametric-MRI screening population data. Eur Radiol: In Press.