Prostate cancer detection relies heavily on multiparametric magnetic resonance imaging (mpMRI), particularly T2-weighted imaging (T2WI). However, image quality (IQ) varies across scans, often due to motion artifacts or hardware limitations. This variability can prompt unnecessary repeat scans, consuming time and resources. In response, researchers developed a deep learning (DL) model to automatically assess the quality of axial T2WI prostate MRI scans. The model aims to replicate radiologist performance in evaluating scan quality and determining when rescans are warranted, potentially streamlining clinical workflows and reducing redundant imaging.
Deep Learning Model Design and Dataset Preparation
The study retrospectively analysed 1,412 prostate MRI scans acquired between 2017 and 2021 at three sites. These scans were selected from a larger dataset of over 27,000 exams, with preference given to poor-quality images to ensure balanced representation across the quality spectrum. Image quality was categorised on a 0–3 scale by four experienced uroradiologists, with 0 and 1 deemed nondiagnostic (requiring rescanning) and 2 and 3 as diagnostic (no rescan needed).
To train and validate the DL model, the researchers divided the dataset into training (1,006 exams), validation (203 exams) and testing (203 exams) subsets. They evaluated 11 different convolutional neural network (CNN) architectures, ultimately selecting the 3D DenseNet_169 model for its superior performance. Preprocessing included using only the 16 central axial slices of each scan, cropping around the prostate using a segmentation model and resampling to a consistent resolution. Image intensities were normalised scan-by-scan to standardise input.
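The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration and not the study's code: it assumes NumPy, volumes stored as (slices, height, width) arrays, a binary prostate mask already produced by some segmentation model, a hypothetical 8-pixel crop margin and z-score normalisation (the article does not say which normalisation was used). Resampling to a common resolution is left out.

```python
import numpy as np

def central_slices(volume, n_slices=16):
    """Keep the n_slices central axial slices of a (slices, height, width) array."""
    depth = volume.shape[0]
    if depth < n_slices:
        raise ValueError("volume has fewer slices than requested")
    start = (depth - n_slices) // 2
    return volume[start:start + n_slices]

def crop_to_mask(volume, mask, margin=8):
    """Crop in-plane to the mask's bounding box plus a margin (margin is an assumption)."""
    rows = np.where(mask.any(axis=(0, 2)))[0]
    cols = np.where(mask.any(axis=(0, 1)))[0]
    if rows.size == 0:
        raise ValueError("mask is empty")
    r0 = max(int(rows[0]) - margin, 0)
    r1 = min(int(rows[-1]) + margin + 1, volume.shape[1])
    c0 = max(int(cols[0]) - margin, 0)
    c1 = min(int(cols[-1]) + margin + 1, volume.shape[2])
    return volume[:, r0:r1, c0:c1]

def normalise(volume):
    """Z-score the intensities of a single scan (assumed normalisation scheme)."""
    v = volume.astype(np.float64)
    std = v.std()
    return (v - v.mean()) / std if std > 0 else v - v.mean()

def preprocess(volume, mask):
    """Central slices, then crop around the prostate, then per-scan normalisation."""
    volume = central_slices(volume)
    mask = central_slices(mask)
    return normalise(crop_to_mask(volume, mask))
```

Normalising each scan on its own, rather than across the whole dataset, keeps differences in scanner gain between sites from leaking into the input.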
The model used mean squared error as the loss function to preserve the ordinal nature of the IQ scores, so that a prediction far from the true score is penalised more than a near miss. Because the classes were imbalanced (only 8.4% of the scans were rated IQ=0), the researchers applied oversampling with augmentation techniques such as left-right flipping, rotation and translation. An exponential learning rate scheduler and early stopping were incorporated to prevent overfitting during training.
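A small sketch may help show why a squared-error loss suits ordinal scores and how oversampling could work. Everything here is illustrative: the function names are hypothetical, balancing every class up to the size of the largest class is an assumption (the article does not give the oversampling ratio), and only the flip augmentation is shown.

```python
import math
import random

def mse(predictions, targets):
    """Mean squared error between continuous predictions and integer IQ scores."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def to_iq_score(prediction, lowest=0, highest=3):
    """Round a continuous model output to the nearest valid IQ score."""
    rounded = math.floor(prediction + 0.5)
    return min(max(rounded, lowest), highest)

def oversample(labels, seed=0):
    """Return sample indices in which each class appears as often as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    target = max(len(members) for members in by_class.values())
    indices = []
    for members in by_class.values():
        indices.extend(members)
        # Draw extra copies of minority-class samples with replacement.
        indices.extend(rng.choices(members, k=target - len(members)))
    rng.shuffle(indices)
    return indices

def flip_left_right(image):
    """Mirror a 2D slice (list of rows); one of the augmentations named above."""
    return [row[::-1] for row in image]
```

Under this loss, predicting 3 for a scan rated 0 costs nine times as much as predicting 1, whereas a plain classification loss would treat both mistakes alike. In practice the duplicated minority-class samples would be passed through an augmentation such as `flip_left_right` so that the copies are not identical.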
Performance Evaluation and Comparison with Radiologists
On the test set, the DL model predicted IQ scores with 57.1% accuracy and a Cohen’s κ of 0.658, comparable to the inter-rater reliability among expert radiologists (κ = 0.738). When simplifying the prediction to a binary decision of diagnostic versus nondiagnostic quality, the model achieved an accuracy of 78.3% and a κ of 0.537. The area under the curve (AUC) for predicting the need for rescanning reached 0.867, indicating strong discriminatory capability.
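For readers who want to see how such agreement figures are computed, the sketch below implements Cohen's κ in plain Python, with optional linear or quadratic weighting, together with the binary rescan rule from the 0–3 scale. The article does not say which variant the study used. An unweighted κ can never exceed raw accuracy, so a four-level κ of 0.658 alongside 57.1% accuracy suggests a weighted variant, but that is an inference and the weighting options here are assumptions. The example ratings are invented.

```python
def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa; weights may be None (unweighted), "linear" or "quadratic"."""
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Joint distribution of the two raters' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if i == j:
            return 0.0
        if weights is None:
            return 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    actual = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    by_chance = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    return 1.0 if by_chance == 0 else 1.0 - actual / by_chance

def needs_rescan(iq_score):
    """IQ 0 and 1 are nondiagnostic (rescan); IQ 2 and 3 are diagnostic."""
    return iq_score <= 1

# Invented ratings for eight scans, for illustration only.
radiologist = [0, 1, 2, 3, 3, 2, 1, 0]
model = [0, 1, 2, 3, 2, 2, 1, 1]
accuracy = sum(a == b for a, b in zip(radiologist, model)) / len(model)
kappa_plain = cohen_kappa(radiologist, model, [0, 1, 2, 3])
kappa_quadratic = cohen_kappa(radiologist, model, [0, 1, 2, 3], "quadratic")
kappa_binary = cohen_kappa([needs_rescan(s) for s in radiologist],
                           [needs_rescan(s) for s in model], [False, True])
```

In this toy example both disagreements are off by a single IQ level, so the quadratic-weighted κ is higher than the unweighted one, and the binary rescan decision agrees perfectly because neither miss crosses the boundary between IQ 1 and IQ 2.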
Performance varied across IQ levels, with the model performing best in identifying high-quality scans (IQ=3). Index Balanced Accuracy, which accounts for class imbalance, ranged from 0.25 for IQ=0 to 0.59 for IQ=3. For rescan decisions, balanced accuracies for diagnostic and nondiagnostic classes were comparable at 0.61 and 0.59, respectively. Five-fold cross-validation showed consistent performance, with the refined DenseNet_169 model maintaining stable accuracy across data splits.
Importantly, the DL model’s predictions aligned well with radiologist assessments, suggesting it could effectively replicate human judgement in real-world settings. Moreover, radiologist consensus was used to label test scans, enhancing the reliability of model evaluation.
Clinical Impact and Operational Efficiency
To assess clinical utility, the study compared the model's predictions against real-world rescan decisions made by MRI technologists at one of the study sites. This site had adopted a motion-resistant imaging sequence (PROPELLER) for cases where the initial T2WI scan was suspected of being nondiagnostic. Among 174 cases, technologists ordered rescans for 73 scans that radiologists later deemed diagnostic, leading to a 63% unnecessary rescan rate. The DL model, operating at the same 86% sensitivity as the technologists, could reduce that rate to 28%, demonstrating its potential to improve scanner efficiency and reduce patient burden.
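The comparison at matched sensitivity can be illustrated with a short sketch. The article does not state the denominator of the unnecessary rescan rate; since 73 of 174 would be roughly 42%, the 63% figure is presumably a share of the scans that were actually repeated, and the code below assumes that definition. The function names, scores and labels are all hypothetical, with higher scores meaning "more likely nondiagnostic".

```python
def sensitivity(flags, nondiagnostic):
    """Share of truly nondiagnostic scans that were flagged for a rescan."""
    hits = [f for f, t in zip(flags, nondiagnostic) if t]
    if not hits:
        raise ValueError("no nondiagnostic scans in the sample")
    return sum(hits) / len(hits)

def unnecessary_rescan_rate(flags, nondiagnostic):
    """Share of flagged scans that were in fact diagnostic (assumed definition)."""
    flagged = [t for f, t in zip(flags, nondiagnostic) if f]
    if not flagged:
        return 0.0
    return sum(1 for t in flagged if not t) / len(flagged)

def threshold_for_sensitivity(scores, nondiagnostic, target):
    """Highest score threshold whose flags reach at least the target sensitivity."""
    for threshold in sorted(set(scores), reverse=True):
        flags = [s >= threshold for s in scores]
        if sensitivity(flags, nondiagnostic) >= target:
            return threshold
    raise ValueError("target sensitivity cannot be reached")

# Invented data for ten scans: radiologist truth, technologist calls, model scores.
truth = [True, True, False, True, True, False, True, False, False, False]
technologist = [True, True, True, True, False, True, True, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]

tech_sensitivity = sensitivity(technologist, truth)
threshold = threshold_for_sensitivity(scores, truth, tech_sensitivity)
model_flags = [s >= threshold for s in scores]
tech_rate = unnecessary_rescan_rate(technologist, truth)
model_rate = unnecessary_rescan_rate(model_flags, truth)
```

Matching the technologists' sensitivity first makes the comparison fair: both catch the same share of genuinely nondiagnostic scans, so any difference in the unnecessary rescan rate reflects how many diagnostic scans each one flags by mistake.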
Time savings from fewer unnecessary rescans could be significant. Even avoiding two redundant scans per day across ten prostate MRI exams could conserve ten minutes or more of scanner time, depending on the sequence used. This could enhance throughput in busy imaging centres without compromising diagnostic standards.
Despite promising results, the researchers acknowledged several implementation challenges. For the model to be adopted in clinical settings, trust must be established among radiologists and technologists. The model's output would ideally be integrated into MRI systems to deliver real-time feedback immediately after scan acquisition. Such integration may require specialised hardware to support rapid image segmentation and assessment.
The study demonstrated that a deep learning model can assess the quality of axial T2-weighted prostate MRI scans and determine the necessity of rescanning, closely matching expert radiologist decisions. With substantial agreement on IQ scores and promising performance in binary diagnostic assessments, the model has the potential to reduce unnecessary rescans and optimise MRI workflow. Broader adoption of such tools, however, hinges on clinical trust and technological integration into existing imaging infrastructure.
Source: European Radiology Experimental