Large language models are attracting growing interest in radiology thanks to their ability to perform advanced natural language tasks. These tools support report generation, summarisation, data labelling and clinical decision support. However, their tendency to generate inaccurate or misleading information, commonly referred to as hallucination, remains a critical barrier. To address it, two key techniques, prompt engineering and fine-tuning, are being explored to enhance model reliability, performance and clinical integration. Applied strategically, they allow radiology teams to align large language models more closely with domain-specific standards and improve output accuracy.

 

Prompt Engineering for Reliable Output 
Prompt engineering is the practice of crafting and refining the text instructions given to a model to elicit useful, structured outputs. In radiology, prompts can guide language models to perform specific tasks such as summarising findings, assigning severity scores or producing structured reports. Techniques range from basic zero-shot prompting, which gives no prior examples, to more complex strategies such as few-shot and chain-of-thought prompting, which guide the model with worked patterns or explicit reasoning steps.
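To make the distinction concrete, the snippet below sketches how zero-shot and few-shot prompts for report summarisation might be assembled. The report text, examples and send_to_model() helper are illustrative inventions for this article, not material drawn from the study.

```python
# Minimal sketch of zero-shot vs few-shot prompt construction for a
# report-summarisation task. REPORT and send_to_model() are hypothetical
# placeholders, not part of any specific API.

REPORT = "CT chest: 8 mm spiculated nodule in the right upper lobe. No effusion."

# Zero-shot: the task is described, but no examples are given.
zero_shot = (
    "Summarise the key findings of this radiology report in one sentence:\n"
    f"{REPORT}"
)

# Few-shot: worked examples show the model the expected output pattern.
few_shot = (
    "Summarise the key findings of each radiology report in one sentence.\n\n"
    "Report: MRI brain: 2 cm enhancing lesion, left frontal lobe, with oedema.\n"
    "Summary: Single enhancing left frontal lesion with surrounding oedema.\n\n"
    "Report: CXR: clear lungs, normal cardiac silhouette.\n"
    "Summary: No acute cardiopulmonary findings.\n\n"
    f"Report: {REPORT}\n"
    "Summary:"
)

# send_to_model(prompt) would call whichever LLM endpoint is in use.
```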

 


 

Well-designed prompts are essential for improving both the structure and relevance of the model's response. Zero-shot prompting is simple and flexible but performs inconsistently in complex tasks. Few-shot prompting embeds examples into the prompt, helping the model mimic structured responses. Chain-of-thought prompting adds transparency by encouraging stepwise reasoning, which is particularly useful in differential diagnosis or resectability assessments. 
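A chain-of-thought prompt for a staging-style question might look like the following sketch; the report content and numbered reasoning steps are hypothetical, intended only to show how stepwise reasoning is requested.

```python
# Hypothetical chain-of-thought prompt for a resectability-style assessment.
# Asking for explicit intermediate steps makes the model's reasoning visible.
cot_prompt = (
    "You are assisting with a staging assessment.\n"
    "Report: Pancreatic head mass abutting the SMV over less than 180 degrees; "
    "no arterial involvement; no distant lesions.\n\n"
    "Reason step by step before answering:\n"
    "1. List the vascular findings.\n"
    "2. Map each finding to the relevant staging criterion.\n"
    "3. State the resectability category and the findings that justify it."
)
```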

 

Iterative optimisation of prompts can significantly reduce errors. Domain experts often guide this process, identifying misinterpretations and refining prompts based on output evaluations. Automated systems such as LangChain or DSPy can reduce manual effort by generating optimised prompts programmatically. These approaches help radiologists leverage large language models more safely by aligning model behaviour with specific clinical needs. 
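The loop below is a minimal, framework-free sketch of this expert-in-the-loop refinement. call_model() and expert_review() are placeholder stubs standing in for an LLM endpoint and a radiologist's evaluation; tools such as DSPy or LangChain automate comparable loops with far less manual wiring.

```python
# Sketch of iterative prompt refinement guided by expert feedback.

def call_model(prompt: str, case: str) -> str:
    return f"[model output for: {case[:30]}...]"  # stub: replace with a real LLM call

def expert_review(outputs: list[str]) -> list[str]:
    return []  # stub: return corrections such as "always report lesion size in mm"

def refine_prompt(prompt: str, cases: list[str], max_rounds: int = 3) -> str:
    """Iteratively fold reviewer corrections back into the prompt."""
    for _ in range(max_rounds):
        outputs = [call_model(prompt, case) for case in cases]
        issues = expert_review(outputs)
        if not issues:          # no misinterpretations found: stop early
            break
        prompt += "\nAdditional instructions:\n" + "\n".join(issues)
    return prompt
```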

 

Fine-Tuning for Specialised Applications 
Unlike prompt engineering, fine-tuning alters the internal weights of a model by training it on domain-specific data. This enables the model to better understand specialised language, context and workflows relevant to radiology. Several fine-tuning strategies exist, each suited to different computational and data constraints. 

 

Traditional full fine-tuning updates all model parameters using labelled datasets but demands substantial resources. Instruction tuning improves performance by teaching models how to follow structured instructions, such as linking radiological findings with impressions. Parameter-efficient methods like LoRA and QLoRA offer cost-effective alternatives by updating only small portions of the model, reducing training time and memory requirements. 
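As an illustration of the parameter-efficient approach, the following sketch configures LoRA adapters with the Hugging Face PEFT library. The base checkpoint and target modules are assumptions made for the example, not choices prescribed by the study.

```python
# A minimal LoRA setup with Hugging Face PEFT: only small adapter matrices
# are trained while the base model's weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed checkpoint; substitute any causal language model.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```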

 

More advanced strategies include reinforcement learning from human feedback, which guides models using reward systems based on expert preferences. Though effective, this method depends heavily on human input, which can be resource-intensive. Direct preference optimisation offers a simpler alternative, using binary preferences to refine outputs without full reinforcement loops. 
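The appeal of direct preference optimisation is that the preference signal reduces to a single differentiable loss. The sketch below shows that loss in PyTorch, assuming per-sequence log-probabilities have already been computed under the policy being trained and a frozen reference model.

```python
# The core of direct preference optimisation, reduced to its loss function.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen: torch.Tensor, policy_rejected: torch.Tensor,
             ref_chosen: torch.Tensor, ref_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # How much more the policy prefers each answer than the reference does.
    chosen_ratio = policy_chosen - ref_chosen
    rejected_ratio = policy_rejected - ref_rejected
    # A binary preference becomes a differentiable objective: no reward
    # model or reinforcement loop is required.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```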

 

Factual correctness remains a major concern in radiology applications. Models can generate plausible but incorrect information that may affect patient care. Several tools and techniques, including supervised fine-tuning and retrieval-augmented generation, are being explored to enhance factual accuracy. These methods inject relevant facts into the model's processing pipeline or train it on curated datasets, reducing the risk of hallucination. 
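A schematic of the retrieval-augmented approach is shown below. The guideline snippets and the word-overlap retriever are deliberately simplistic placeholders for a real embedding-based search over a curated knowledge base.

```python
# Schematic retrieval-augmented generation: fetch relevant facts, then
# instruct the model to answer only from them.

GUIDELINES = [
    "Fleischner 2017: solid nodules 6-8 mm in low-risk patients: CT at 6-12 months.",
    "Lung-RADS 4A: 8-15 mm solid nodule at baseline screening.",
]

def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    q_words = set(question.lower().split())
    # Stub ranking by word overlap; a real system would use vector embeddings.
    return sorted(corpus,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, GUIDELINES))
    # Injecting retrieved facts grounds the answer and reduces hallucination.
    return (f"Use only the context below to answer.\nContext:\n{context}\n\n"
            f"Question: {question}")

print(build_prompt("What follow-up is advised for a 7 mm solid nodule?"))
```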

 

Clinical Integration and Challenges 
The practical use of large language models in radiology extends across multiple tasks, from simplifying technical language for patients to structuring clinical reports and prioritising findings. These models enable radiologists to focus on complex cases by automating repetitive tasks, ultimately improving efficiency and care quality. 

 

Integrating fine-tuning with prompt optimisation in modular pipelines can further enhance performance. Strategies that alternate between refining prompts and updating model weights have demonstrated superior outcomes compared to methods that rely on a single technique. Open-source models also offer a privacy-preserving alternative to commercial tools. Their adaptability and lower computational demands make them suitable for smaller institutions or secure environments governed by strict data regulations. 

 

Despite their promise, challenges remain. The temperature setting, which controls output variability, has shown inconsistent effects on performance. Lower temperatures often lead to more deterministic outputs, while higher settings may improve accuracy in niche diagnostic tasks. Identifying optimal configurations for different use cases remains an open question. 
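Mechanically, temperature rescales the model's output logits before sampling, as the short sketch below illustrates with made-up values.

```python
# How temperature reshapes the next-token distribution: logits are divided
# by the temperature before the softmax. The logits here are illustrative.
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5])
print(softmax_with_temperature(logits, 0.2))  # sharply peaked: near-deterministic
print(softmax_with_temperature(logits, 1.5))  # flatter: more varied sampling
```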

 

Moreover, there is no consensus on how to evaluate large language model outputs in radiology. Common metrics borrowed from general natural language processing are supplemented by domain-specific scores, but their reliability is still under debate. Explainability frameworks like LIME and SHAP can highlight model decision paths, yet these systems do not always produce verifiable explanations. Transparent evaluation frameworks are essential for broader clinical adoption. 
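As a flavour of how such tools are applied, the sketch below runs LIME over a toy report classifier. The classify() stub stands in for a real model, and the resulting word weights are local estimates of influence rather than verifiable explanations.

```python
# Minimal LIME sketch for a text classifier: LIME perturbs the input text
# and fits a local surrogate to estimate which words drove the prediction.
import numpy as np
from lime.lime_text import LimeTextExplainer

def classify(texts: list[str]) -> np.ndarray:
    # Stub: "abnormal" probability rises if the word "nodule" appears.
    probs = [0.9 if "nodule" in t.lower() else 0.1 for t in texts]
    return np.array([[1 - p, p] for p in probs])

explainer = LimeTextExplainer(class_names=["normal", "abnormal"])
exp = explainer.explain_instance(
    "Spiculated nodule in the right upper lobe.", classify, num_features=3
)
print(exp.as_list())  # word-level weights: local estimates, not proofs
```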

 

Bias in training data presents another significant risk. Some models exhibit skewed associations between diseases and demographic groups, which may reinforce existing healthcare disparities. Ongoing research is needed to characterise and address these biases. The rapid evolution of medical knowledge also poses a risk of model obsolescence, as language models require regular updates to reflect current guidelines and findings.

 

The integration of large language models into radiology depends on their ability to produce accurate, explainable and domain-specific outputs. Prompt engineering and fine-tuning are complementary strategies that support this goal by aligning model performance with clinical expectations. Continued focus on evaluation standards, interdisciplinary collaboration and bias mitigation will be crucial to ensure these models contribute to safe and equitable care across radiology practices. 

 

Source: Radiology Advances

Image Credit: iStock


References:

Vahdati S, Mahmoudi E, Ganjizadeh A et al. (2025) Decoding Large Language Models for Radiology: Strategies for Fine-Tuning and Prompt Engineering. Radiology Advances: umaf024. 


