Rising MRI volumes and increasingly complex examinations continue to put pressure on radiology services. In response, researchers developed a deep learning (DL) model to analyse routine knee MRI and assessed its value for resident radiologists. The model targets 23 conditions across cartilage, menisci, bone marrow, ligaments and other soft tissues, reflecting typical clinical reporting needs. Training and internal evaluation used 3121 studies from a high-volume radiology practice, and external testing used 458 studies from a university hospital. Performance was quantified with the area under the receiver operating characteristic curve (AUC), sensitivity and specificity, and a controlled reader study measured the model’s effect on residents’ accuracy, reading efficiency and inter-reader agreement. The results indicate that multi-condition assistance can raise sensitivity, improve inter-reader agreement and shorten reading times, while highlighting the importance of transparency about performance for each individual condition.

 

Model Built for Routine Clinical Protocols 

The model was designed around non-contrast routine knee MRI protocols comprising axial, coronal and sagittal fat-saturated proton density-weighted sequences plus a sagittal T1-weighted sequence. To standardise inputs, images underwent histogram normalisation, intensity clipping and rescaling, and were then resized to 256 × 256 pixels and 32 slices. The architecture adapted a multi-sequence transformer approach: a ResNet18 backbone enhanced with residual 3D blocks encodes each slice as a context-aware token, and a transformer then performs the classification. Labels for the 23 condition categories, most binary and some ternary, were derived from clinical reports using a structured template and multi-step quality control by board-certified radiologists, aiming to balance annotation scale against reliability.
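The input standardisation described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `preprocess_volume`, the 1st/99th percentile clipping bounds, the (slices, height, width) axis order and the nearest-neighbour resampling are all hypothetical choices, and percentile clipping followed by min-max rescaling only approximates the histogram normalisation, whose exact form is not given in this summary.

```python
import numpy as np


def preprocess_volume(vol, out_shape=(32, 256, 256), lo_pct=1.0, hi_pct=99.0):
    """Hypothetical sketch: clip intensities, rescale to [0, 1], resample.

    `vol` is a 3D array ordered (slices, height, width). The percentile
    bounds are assumed values, not parameters reported by the study.
    """
    vol = np.asarray(vol, dtype=np.float32)
    # Intensity clipping at assumed percentiles to suppress outlier voxels
    lo, hi = map(float, np.percentile(vol, [lo_pct, hi_pct]))
    vol = np.clip(vol, lo, hi)
    # Min-max rescaling to [0, 1]; a constant volume maps to all zeros
    span = hi - lo
    vol = (vol - lo) / span if span > 0 else np.zeros_like(vol)
    # Nearest-neighbour index selection along each axis to reach out_shape
    # (32 slices of 256 x 256 pixels, as described in the text)
    idx = [np.linspace(0, size - 1, n).round().astype(int)
           for size, n in zip(vol.shape, out_shape)]
    return vol[np.ix_(*idx)]
```

Nearest-neighbour selection is used only to keep the sketch dependency-light; an interpolating resampler would be a more typical choice for real images.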

 


 

Two datasets underpinned development and testing. The internal cohort included 3121 studies from 3018 adults scanned between 2012 and 2019 on 1.5-T and 3-T systems. The external test set comprised 458 studies from 429 adults scanned between 2022 and 2023 on four scanners at a university hospital. Neither cohort applied additional exclusion criteria, so both reflect the real-world case mix of routine practice.

 

Performance Holds Across Datasets 

In five-fold cross-validation on the internal data, the model reached an AUC of at least 0.85 for 8 conditions and at least 0.75 for 18 conditions. Median internal performance was AUC 0.85 (range 0.57–0.99), sensitivity 76% (27%–97%) and specificity 81% (57%–99%). External testing showed robust generalisation, with a median AUC of 0.78 (0.57–0.99), sensitivity of 70% (13%–94%) and specificity of 80% (48%–99%); per condition, external AUCs differed from internal ones by a mean absolute 0.05 ± 0.03.
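The headline metrics above can be made concrete with a short, dependency-free Python sketch. It computes a per-condition AUC as the probability that a randomly chosen positive study scores higher than a randomly chosen negative one (the Mann-Whitney formulation), sensitivity and specificity at a decision threshold, a median-and-range summary across conditions, and the mean absolute per-condition AUC gap between two datasets. The function names, the 0.5 threshold and the example values are illustrative assumptions; the summary does not state how the study chose its operating points.

```python
from statistics import mean, median, stdev


def auc(scores, labels):
    """Mann-Whitney AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold count as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def summarise(values):
    """Median and (min, max) range, the format used for per-condition results."""
    return median(values), (min(values), max(values))


def mean_abs_auc_gap(internal, external):
    """Mean and SD of |internal AUC - external AUC| over shared conditions."""
    gaps = [abs(internal[c] - external[c]) for c in internal if c in external]
    return mean(gaps), stdev(gaps)
```

For example, `summarise` applied to a list of per-condition AUCs returns the median and range in the form reported above.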

 

Conditions with distinct imaging features performed strongly. The status post anterior cruciate ligament (ACL) reconstruction category achieved near-perfect AUC, and performance was also high for medial collateral ligament (MCL) tears, effusion and Baker’s cyst. Meniscal tears and retropatellar cartilage pathology achieved moderate performance on external testing. More subtle or heterogeneous entities, such as posterior cruciate ligament (PCL) pathology, lateral collateral ligament (LCL) findings other than tears, tibial cartilage pathology and meniscal degeneration, showed lower AUCs and wider confidence intervals. Despite demographic, protocol and scanner differences between centres, 11 of the 23 conditions showed no significant AUC difference between datasets, indicating stable performance for a substantial subset of targets.
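The summary does not say which test the study used to compare AUCs between datasets. As one common alternative, swapped in here purely for illustration, a percentile bootstrap over two independent test sets gives a confidence interval for the AUC difference; an interval that contains zero is consistent with no significant difference. The function name `bootstrap_auc_diff_ci`, the (score, label) record format, the default of 2000 resamples and the fixed seed are assumptions.

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff_ci(set_a, set_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for AUC(set_a) - AUC(set_b); each set is a list of (score, label).

    The two datasets are resampled independently, as for separate cohorts.
    Resamples lacking either class are skipped because AUC is undefined there.
    """
    for data in (set_a, set_b):
        if len({y for _, y in data}) < 2:
            raise ValueError("each dataset needs both positive and negative cases")
    rng = random.Random(seed)
    diffs = []
    while len(diffs) < n_boot:
        a = [rng.choice(set_a) for _ in set_a]
        b = [rng.choice(set_b) for _ in set_b]
        if len({y for _, y in a}) < 2 or len({y for _, y in b}) < 2:
            continue
        auc_a = auc([s for s, _ in a], [y for _, y in a])
        auc_b = auc([s for s, _ in b], [y for _, y in b])
        diffs.append(auc_a - auc_b)
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Small test sets produce wide intervals with this method, which mirrors the wider confidence intervals reported for the rarer and subtler conditions.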

 

Assistance Shifts Resident Accuracy, Sensitivity and Speed 

A reader study with four residents (two inexperienced and two experienced) compared unassisted and assisted reporting on 50 external cases, using a tricolour interface that indicated per-condition model reliability (green: AUC > 0.85; yellow: 0.75–0.85; red: < 0.75). When all conditions were included, assistance increased sensitivity and inter-reader agreement at both experience levels. Inexperienced residents also improved in overall accuracy, although specificity fell in both groups, reflecting the inclusion of low-AUC conditions.
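The tricolour reliability bands and one way of quantifying inter-reader agreement can be sketched in Python. The band thresholds follow the text; the function names are hypothetical, and Fleiss' kappa is an assumed choice of agreement statistic, since the summary does not name the measure the study used.

```python
def reliability_colour(auc):
    """Map a condition's AUC to the tricolour bands described in the text."""
    if auc > 0.85:
        return "green"   # high reliability
    if auc >= 0.75:
        return "yellow"  # moderate reliability (0.75-0.85 inclusive)
    return "red"         # low reliability


def fleiss_kappa(ratings):
    """Fleiss' kappa for items each rated by the same number of readers.

    `ratings` holds one row of category counts per item; for a binary finding,
    [3, 1] means three readers reported "present" and one reported "absent".
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Mean observed pairwise agreement across items
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement from the pooled category proportions
    totals = [sum(row[j] for row in ratings) for j in range(n_cats)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)
```

With four readers and a binary finding, each of the 50 cases would contribute one row of two counts summing to 4.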

 

Restricting the analysis to conditions with AUC ≥ 0.75 gave a clearer picture. With assistance, inexperienced readers showed significant gains in accuracy, sensitivity and specificity. Experienced readers improved in sensitivity without significant changes in accuracy or specificity, so assistance was a net benefit for them as well.

Reading times also fell. Mean time per study decreased by 10% for experienced residents (4.7 to 4.2 minutes, p = 0.045) and by 4% for inexperienced residents, the latter without statistical significance.

Behavioural metrics were consistent with these outcomes. Residents accepted alternative model suggestions more often in higher-AUC categories, and accepted changes were more likely to be correct when the model’s reliability was greater. In low-AUC conditions, acceptance rates exceeded the corresponding gain in correctness, indicating limited benefit and underlining the need for clear reliability signalling.
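The arithmetic and behavioural analyses above can be illustrated with a small Python sketch: the relative change in mean reading time, the restriction to conditions with AUC ≥ 0.75, and per-tier rates of accepting a model suggestion alongside how often accepted changes were correct. The function names and the (tier, accepted, correct) record format are hypothetical; the study's actual analysis code is not described in this summary.

```python
def percent_change(before, after):
    """Relative change in percent; negative values mean a reduction."""
    return (after - before) / before * 100


def eligible_conditions(aucs, min_auc=0.75):
    """Conditions kept when restricting analysis to AUC >= min_auc."""
    return sorted(c for c, a in aucs.items() if a >= min_auc)


def acceptance_by_tier(records):
    """Acceptance rate and correctness of accepted changes per reliability tier.

    Each record is a hypothetical (tier, accepted, correct_after_change) tuple.
    Returns {tier: (acceptance_rate, share_of_accepted_changes_that_were_correct)}.
    """
    stats = {}
    for tier, accepted, correct in records:
        shown, taken, right = stats.get(tier, (0, 0, 0))
        stats[tier] = (shown + 1,
                       taken + int(accepted),
                       right + int(accepted and correct))
    return {tier: (taken / shown, right / taken if taken else 0.0)
            for tier, (shown, taken, right) in stats.items()}
```

With the rounded means reported above, `percent_change(4.7, 4.2)` gives about -10.6%, close to the reported 10% reduction; the small gap may reflect rounding of the reported means.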

 

A multi-condition DL model tailored to routine knee MRI demonstrated solid performance, external generalisation and meaningful assistance effects for residents. Benefits concentrated in conditions where model AUC was at least moderate (≥ 0.75): sensitivity and inter-reader agreement improved, and reading was faster, significantly so only for experienced residents. Slight reductions in specificity emerged when low-reliability conditions were included, reinforcing the value of condition-level performance transparency and cautious adoption. For healthcare teams seeking scalable support for knee MRI reporting, these findings point to targeted deployment where the model is strongest, alongside ongoing refinement for subtle and rare pathologies and evaluation in richer clinical contexts.

 

Source: European Radiology 

Image Credit: iStock


References:

Vuskov R, Hermans A, Pixberg M et al. (2025) Comprehensive deep learning-assisted multi-condition analysis of knee MRI studies improves resident radiologist performance. Eur Radiol: In Press. 



