The accuracy of radiology reports is critical to ensuring effective patient care, but the process of proofreading these reports is both time-intensive and prone to human error. Large language models (LLMs) offer a potential solution by automating the detection of common errors. While closed-source LLMs such as GPT-4 have demonstrated strong performance, concerns around data privacy restrict their application in clinical settings. Open-source models, which can be locally hosted within hospital infrastructures, provide an alternative that addresses these privacy concerns.
A recent study published in European Radiology compares the effectiveness of open-source and closed-source LLMs in detecting errors in radiology reports, evaluating their performance in various categories of errors and imaging modalities. The goal is to determine whether open-source models can serve as a viable substitute for commercial closed-source alternatives while maintaining accuracy and compliance with data security regulations.
Comparison of Processing Time and Accuracy
The study evaluated two closed-source models (GPT-4 and GPT-4o) and two open-source models (Llama 3-70b and Mixtral 8x22b). The open-source models were significantly faster, with an average processing time of 6 seconds per report compared to 13 seconds for the closed-source models. This difference in processing speed suggests that open-source models may offer practical benefits for institutions that prioritise efficiency. Despite their speed, however, the closed-source models outperformed the open-source alternatives in error detection accuracy: GPT-4o achieved the highest detection rate at 88%, followed by GPT-4 at 83%. Among the open-source models, Llama 3-70b performed better at 79%, while Mixtral 8x22b had the lowest accuracy at 73%. These findings highlight a trade-off, with open-source models offering faster processing at the cost of lower error detection accuracy.
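As a rough sketch of how such a comparison can be run, the hypothetical harness below times a model on reports containing known, deliberately inserted errors and returns mean latency and detection rate. The `detect_errors` callable is a stand-in for whichever model is under test and is not part of the study's published methodology.

```python
import time
from typing import Callable

def benchmark(detect_errors: Callable[[str], list[str]],
              reports: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (mean seconds per report, error detection rate).

    `reports` pairs each report text with the error that was deliberately
    inserted into it; a report counts as a hit if the model's output
    mentions that error.
    """
    hits, elapsed = 0, 0.0
    for text, inserted_error in reports:
        start = time.perf_counter()
        findings = detect_errors(text)              # one model call per report
        elapsed += time.perf_counter() - start
        hits += any(inserted_error in f for f in findings)
    return elapsed / len(reports), hits / len(reports)
```

Averaging per-report wall-clock time in this way is the natural route to figures like the 6- and 13-second means quoted above, though the study's exact protocol may differ.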
Further analysis revealed that while open-source models performed adequately, their lower accuracy rates could lead to undetected errors in clinical practice. The implications of these findings are important, as even minor errors in radiology reports can lead to misinterpretations that impact patient diagnoses and treatment. The results suggest that while open-source models may be suitable for certain applications, additional refinement is needed to improve their accuracy and reliability to match commercial alternatives.
Performance Across Imaging Modalities and Error Types
The study analysed error detection performance across different imaging modalities, including X-ray, ultrasound, CT and MRI. Closed-source models consistently outperformed open-source models across all modalities, with the most significant differences observed in X-ray and CT/MRI reports. The higher accuracy of closed-source models in these modalities indicates that they may be better suited for detecting complex and subtle errors, which require a more advanced understanding of radiological terminology and structured medical language.
Across specific error types, numerical errors were detected with the highest accuracy (88%), while typographical errors (75%), findings-impression discrepancies (73%) and interpretation errors (70%) were identified less reliably. This suggests that LLMs are particularly effective at identifying structured numerical inconsistencies but may struggle with errors requiring contextual understanding. The variation in detection rates across error types highlights an important limitation of LLM-based proofreading: while the models excel at detecting certain objective errors, they may fall short on more nuanced issues such as interpretative inconsistencies or discrepancies between findings and final impressions. This underscores the importance of human oversight in LLM-assisted error detection workflows, particularly in cases where contextual understanding is critical.
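For illustration, once each test report is labelled with its error category and whether the model flagged the error, per-category rates like those above can be tallied in a few lines; the data here is a made-up miniature, not the study's dataset.

```python
from collections import defaultdict

# Made-up (category, was_detected) pairs from a labelled test run.
results = [
    ("numerical", True), ("numerical", True),
    ("typographical", True), ("typographical", False),
    ("findings-impression", False), ("interpretation", True),
]

totals: dict[str, int] = defaultdict(int)
detected: dict[str, int] = defaultdict(int)
for category, hit in results:
    totals[category] += 1
    detected[category] += hit  # bools count as 0 or 1

for category, n in totals.items():
    print(f"{category:>20}: {detected[category] / n:.0%} detected")
```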
Implications for Clinical Use and Data Privacy
While closed-source LLMs demonstrate superior performance, their reliance on cloud-based infrastructure presents privacy risks that limit their practical application in clinical environments. Many healthcare institutions are subject to strict data protection regulations that prohibit patient data from being processed by external servers. Open-source LLMs, on the other hand, can be integrated into local hospital infrastructures, ensuring compliance with data security regulations. Although their error detection accuracy is lower, ongoing refinement and domain-specific training could enhance their reliability.
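A minimal sketch of what such local integration might look like, assuming the model is served inside the hospital network through an OpenAI-compatible endpoint (as tools such as Ollama or vLLM provide); the endpoint URL, model tag and example report are illustrative, not drawn from the study.

```python
from openai import OpenAI

# A locally hosted server (e.g. Ollama or vLLM) exposing an OpenAI-compatible
# API; no report text ever leaves the hospital network.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

report = (
    "CT abdomen: 12 mm hypodense lesion in hepatic segment VII. "
    "Impression: 21 mm hepatic lesion, likely a cyst."  # deliberate numerical error
)

response = client.chat.completions.create(
    model="llama3:70b",  # illustrative local model tag
    messages=[
        {"role": "system", "content": (
            "You are a radiology proofreading assistant. List any numerical, "
            "typographical, findings-impression or interpretation errors in "
            "the report, or reply 'no errors found'."
        )},
        {"role": "user", "content": report},
    ],
)
print(response.choices[0].message.content)
```

Because the request never leaves the local network, the report text is never exposed to an external provider, which is the core privacy argument for open-source deployment.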
These models offer a promising alternative for automated error detection in radiology reporting, reducing workload while safeguarding patient data. Given the growing interest in privacy-compliant artificial intelligence tools, the adoption of open-source LLMs could help bridge the gap between automation and data security. However, the findings indicate that additional fine-tuning of open-source models is necessary before they can reliably match the accuracy of their closed-source counterparts. Future research should explore ways to improve open-source models’ ability to detect complex errors while preserving their inherent privacy advantages. Additionally, testing these models in real-world clinical environments may provide insights into their practical viability and integration into existing radiology workflows.
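One plausible route to the fine-tuning the study calls for is parameter-efficient adaptation on de-identified, in-house report corpora. The sketch below uses LoRA adapters from the Hugging Face peft library; the base model name and hyperparameters are illustrative assumptions, not the study's recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative base model; a 70B checkpoint needs substantial GPU memory,
# and smaller variants are adapted in exactly the same way.
base = "meta-llama/Meta-Llama-3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters train a small fraction of the weights, keeping the
# fine-tune cheap enough to run and host inside hospital infrastructure.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common default
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base weights
```

Because only the small adapter weights are trained, such a fine-tune can be performed and hosted entirely within hospital infrastructure, preserving the privacy advantage discussed above.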
The study highlights the potential of LLMs in automating error detection in radiology reports, with closed-source models currently achieving higher accuracy but open-source models providing a privacy-compliant alternative. While further improvements in open-source models are necessary to match the performance of commercial alternatives, their ability to operate within secure hospital environments makes them a viable option for clinical adoption. Continued advancements in fine-tuning and model optimisation will be crucial in bridging the accuracy gap and enhancing the role of LLMs in radiology workflows. By refining open-source models and integrating them into hospital infrastructures, healthcare institutions may be able to leverage automated error detection while maintaining full control over patient data. This balance between accuracy and privacy is essential to ensure that LLMs contribute to improved radiology reporting without compromising sensitive medical information.
Source: European Radiology