Radiology reports are traditionally written for clinicians and often contain highly technical terminology. As patients increasingly access their medical records through digital platforms, often enabled by legislative frameworks, many encounter language that is difficult to interpret. Limited health literacy, unfamiliarity with medical jargon and complex reporting styles can lead to confusion, anxiety and reduced satisfaction with care. Automated approaches have been proposed to address this gap. Large language models, trained on extensive text datasets, can generate human-like language and are being explored to simplify radiology reports for patient use. A systematic review and meta-analysis evaluated how patients, members of the public and medical professionals rate large language model–rewritten radiology reports, focusing on perceived understanding, readability, accuracy and safety across imaging modalities and clinical contexts.
Study Scope and Evaluation Framework
The review identified 38 eligible studies published between 2022 and 2025, generating 12,922 simplified reports assessed by 508 evaluators, including 387 lay people and 121 clinicians. Reports spanned six imaging modalities, most commonly MRI and CT, and covered multiple subspecialties, with musculoskeletal imaging most frequently represented. Most reports were produced in English and 92% of studies used OpenAI GPT models, with GPT-4 the most common version.
Primary outcomes included patient or lay self-reported understanding and clinician-assessed quality, typically measured on five-point Likert scales. Secondary outcomes comprised readability metrics such as Flesch–Kincaid Grade Level, Flesch Reading Ease Score and Automated Readability Index, along with word counts and error rates. Across studies, lay participants rated simplified reports substantially higher for perceived understanding than original radiologist-authored reports. The pooled mean Likert score for original reports was 2.16, compared with 4.04 for simplified reports. In studies directly comparing both versions, the pooled mean difference was 2.00, indicating markedly improved perceived comprehension.
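A pooled mean difference of this kind is a standard meta-analytic quantity. As an illustrative sketch only, and assuming a simple inverse-variance fixed-effect model (the review's actual pooling model and per-study data are not reproduced here, so the study figures below are hypothetical), per-study Likert mean differences can be combined like this:

```python
import math

def pooled_mean_difference(studies):
    """Inverse-variance (fixed-effect) pooling of per-study mean
    differences. Each study is a (mean_difference, standard_error) pair."""
    weights = [1 / se**2 for _, se in studies]          # weight = 1 / variance
    pooled = sum(w * md for (md, _), w in zip(studies, weights)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))             # SE of the pooled estimate
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)  # 95% CI
    return pooled, ci

# Hypothetical per-study Likert mean differences (simplified minus original)
studies = [(1.8, 0.2), (2.1, 0.15), (2.3, 0.25)]
md, (lo, hi) = pooled_mean_difference(studies)
```

Precise studies (small standard errors) dominate the pooled estimate; a random-effects model, as is common when heterogeneity is high, would additionally widen the weights by a between-study variance term.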
Clinicians evaluated simplified reports for accuracy, completeness, simplicity, releasability and potential for harm. Pooled mean accuracy was 4.45 and completeness 4.53, suggesting high confidence in clinical fidelity. Ratings for simplicity were similarly positive. However, releasability and absence of potential harm received lower mean scores, reflecting concerns regarding unsupervised dissemination and workflow integration.
Readability Gains and Clinical Risk
Readability analyses demonstrated consistent improvements across imaging modalities. For CT reports, the pooled mean difference in Flesch–Kincaid Grade Level was −6.20; for X-ray it was −5.07 and for MRI −5.0. These shifts corresponded to a move from university-level language towards school-level reading age for CT and X-ray reports. Flesch Reading Ease and Automated Readability Index scores showed parallel improvements.
Despite these gains, readability metrics rely on structural features such as word and sentence length and may not capture true comprehension. Radiology reports contain abbreviations, numbers and complex punctuation, which can affect scoring stability. Improved readability was often accompanied by longer text, as models added definitions and explanatory phrases. While this expansion may enhance clarity, it may also increase cognitive load and clinician review burden.
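The point about structural features follows directly from the metric definitions: Flesch–Kincaid Grade Level, Flesch Reading Ease and the Automated Readability Index are closed-form functions of sentence length, word length and syllable counts, with no model of meaning at all. A minimal sketch (the syllable counter is a crude vowel-group heuristic; production tools use dictionaries or hyphenation rules, and the example sentences are illustrative):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of vowels as syllables (minimum 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    wps = len(words) / len(sentences)                     # words per sentence
    spw = sum(count_syllables(w) for w in words) / len(words)  # syllables/word
    cpw = sum(len(w) for w in words) / len(words)         # characters per word
    return {
        # Flesch-Kincaid Grade Level (approximate US school grade)
        "fkgl": 0.39 * wps + 11.8 * spw - 15.59,
        # Flesch Reading Ease (higher = easier)
        "fre": 206.835 - 1.015 * wps - 84.6 * spw,
        # Automated Readability Index (character-based grade estimate)
        "ari": 4.71 * cpw + 0.5 * wps - 21.43,
    }

original = "Degenerative spondylolisthesis demonstrates multilevel foraminal stenosis."
simplified = "The bones in the lower back have slipped and the nerve openings are narrowed."
```

Because the formulas see only word and sentence lengths, a simplified report that is longer but uses short common words scores as "easier" even if the added explanatory text increases the total reading burden, which is exactly the limitation noted above.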
Error rates were assessed in 17 studies. The pooled rate for any error was 7.2%. For clinically significant errors, defined as those potentially altering diagnosis or severity, the pooled rate was 0.9%. Although low, this indicates that a small proportion of simplified reports contained errors with potential clinical impact. Sensitivity analyses showed higher mean accuracy ratings for GPT-4 compared with GPT-3.5. Accuracy ratings were similar when assessed by radiologists and non-radiologists.
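Pooling event rates like these across studies is typically done on a transformed scale rather than by simple averaging. A minimal fixed-effect sketch using the logit transform with inverse-variance weights (the per-study counts below are hypothetical; the review's own data and model are not reproduced here):

```python
import math

def pool_proportions(studies):
    """Pool event proportions across studies on the logit scale with
    inverse-variance weights (fixed-effect sketch)."""
    logits, weights = [], []
    for events, total in studies:
        e, t = events + 0.5, total + 1.0       # continuity correction avoids log(0)
        p = e / t
        logits.append(math.log(p / (1 - p)))   # logit of the study proportion
        weights.append(1 / (1 / e + 1 / (t - e)))  # inverse of logit variance
    pooled_logit = sum(l * w for l, w in zip(logits, weights)) / sum(weights)
    return math.exp(pooled_logit) / (1 + math.exp(pooled_logit))  # back-transform

# Hypothetical study-level counts: (reports with any error, reports assessed)
rate = pool_proportions([(12, 150), (5, 90), (20, 300)])
```

Working on the logit scale keeps the pooled estimate inside the 0–1 range and down-weights small studies, which matters when individual studies report rare events such as clinically significant errors.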
Governance, Workflow and Patient Expectations
The findings highlight both promise and complexity in implementation. Clinicians expressed confidence in the accuracy and completeness of simplified reports but were more cautious regarding immediate release to patients. Responsibility for verification remains unresolved, with radiologists best positioned to assess technical correctness and referring physicians to contextualise findings within overall care.
Timing of release introduces further considerations. Immediate access may cause distress before clinician consultation, while delayed access may reduce engagement. Governance, liability, quality assurance and equitable dissemination remain open questions. Digital access must account for varying literacy and digital proficiency.
Patient-related outcomes extended beyond understanding. Pooled mean satisfaction with simplified reports was 3.81, and empathy conveyed in simplified reports achieved a pooled mean of 3.61, indicating moderately positive perceptions. In one study, trust was rated slightly higher for radiologist-authored reports than for simplified versions.
No included study evaluated patients independently applying language models to their own reports or incorporated patient co-design in developing simplified outputs. Most studies were small and single-centre, with limited reporting of participant demographics. Heterogeneity across studies was high, reflecting differences in modalities, specialties, prompting strategies and assessment frameworks.
Across 38 studies, large language model simplification of radiology reports was associated with substantially improved patient-perceived understanding and enhanced readability while maintaining high clinician-rated accuracy and completeness. Error rates were low, though a small proportion of clinically significant errors was observed. Ratings for releasability and safety were more cautious, underscoring governance and workflow challenges. The evidence supports the potential of language model–based simplification to advance patient-centred radiology communication, provided that implementation is accompanied by careful oversight and further evaluation in real-world settings.
Source: The Lancet Digital Health
Image Credit: iStock