Retrieving structured information from radiology reports remains a major challenge within clinical and research settings. Traditional keyword-based systems, despite widespread use, often fail to capture the underlying clinical meaning due to their reliance on superficial lexical matches. These limitations affect radiologists attempting to locate relevant prior reports for comparison and researchers seeking representative datasets. 

 

Although large language models (LLMs) have made strides in natural language understanding, embedding models designed specifically for radiology remain underdeveloped due to the lack of scalable annotation methods. To address this gap, the RadSearch model was introduced as a semantic search system trained with a scalable approach using actual radiology report data. The model achieved strong performance in several retrieval tasks and significantly improved the diagnostic accuracy of an LLM when its retrieved reports were supplied as context, offering a practical, domain-specific advance in medical text retrieval.

 

Developing a Scalable Training Approach 
RadSearch was trained using a contrastive learning approach that eliminated the need for manual annotation. This method utilised pairs of text segments from the same radiology report—specifically, the findings section and the impression section—as positive training examples. Negative examples were generated by pairing findings with unrelated impressions from different reports. This process created a robust set of training pairs that preserved semantic context while avoiding the limitations of lexicon-based or manually curated datasets. 
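
The pairing scheme described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `build_training_pairs`, the dictionary keys, the `negatives_per_report` parameter and the example reports are all hypothetical, and the source does not state how many negatives were sampled per report.

```python
import random

def build_training_pairs(reports, negatives_per_report=1, seed=0):
    """Return (findings, impression, label) triples.

    label 1: findings and impression come from the same report (positive).
    label 0: the impression comes from a different report (negative).
    The number of negatives per report is an assumption, not from the source.
    """
    rng = random.Random(seed)
    pairs = []
    for i, report in enumerate(reports):
        # Positive pair: both sections belong to the same report.
        pairs.append((report["findings"], report["impression"], 1))
        # Negative pairs: same findings, impression sampled from another report.
        other_indices = [j for j in range(len(reports)) if j != i]
        sample_size = min(negatives_per_report, len(other_indices))
        for j in rng.sample(other_indices, sample_size):
            pairs.append((report["findings"], reports[j]["impression"], 0))
    return pairs

# Hypothetical, invented report snippets for illustration only.
reports = [
    {"findings": "Filling defect in the right lower lobe pulmonary artery.",
     "impression": "Acute pulmonary embolism."},
    {"findings": "Gallbladder wall thickening with pericholecystic fluid.",
     "impression": "Findings consistent with acute cholecystitis."},
    {"findings": "Saccular outpouching at the anterior communicating artery.",
     "impression": "Anterior communicating artery aneurysm."},
]
pairs = build_training_pairs(reports)
```

Because the labels come from the report structure itself, no human annotator is needed to decide which texts belong together.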

 


 

The training set included 16,690 reports collected retrospectively from the University of Alabama at Birmingham. A customised Python parser was used to extract relevant sections, and reports lacking complete data were excluded. RadSearch employed a Siamese network architecture with weight initialisation from RadBERT-RoBERTa-4m. By relying solely on native report sections, this strategy enabled efficient model development while retaining domain specificity, allowing RadSearch to capture the unique language patterns used in radiological reporting. 
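
The section-extraction step might look something like the sketch below. The source does not describe the actual parser, so the `FINDINGS:` and `IMPRESSION:` header conventions, the regular expression and the function name `parse_report` are assumptions made purely for illustration; real reports vary widely in layout.

```python
import re

# Assumed convention: section headers are upper-case words followed by a colon
# at the start of a line, e.g. "FINDINGS:" or "IMPRESSION:".
SECTION_PATTERN = re.compile(
    r"^(FINDINGS|IMPRESSION):\s*(.*?)(?=^[A-Z][A-Z ]+:|\Z)",
    re.DOTALL | re.MULTILINE,
)

def parse_report(text):
    """Return the findings and impression sections, or None if either is missing."""
    sections = {
        name.lower(): body.strip()
        for name, body in SECTION_PATTERN.findall(text)
    }
    # Reports lacking a complete findings or impression section are excluded.
    if not sections.get("findings") or not sections.get("impression"):
        return None
    return {"findings": sections["findings"], "impression": sections["impression"]}

# Invented example text, not real patient data.
complete = (
    "EXAM: CT chest\n"
    "FINDINGS: No filling defect.\nLungs are clear.\n"
    "IMPRESSION: No pulmonary embolism.\n"
)
incomplete = "EXAM: CT chest\nFINDINGS: Lungs are clear.\n"

parsed = parse_report(complete)
skipped = parse_report(incomplete)
```

Here `parsed` holds both sections, while `skipped` is `None` because the second report has no impression and would be dropped from the training set.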

 

Evaluation Across Diverse Retrieval Tasks 
RadSearch’s performance was benchmarked against established embedding models, including GTE-large, All MPNet Base and MS MARCO DistilBERT Base. Evaluation tasks included findings-to-impression matching, matching reports of the same examination type, retrieving reports based on free-text clinical queries and enhancing LLM diagnosis. 

 

In the task of matching findings to their corresponding impressions, RadSearch retrieved the correct impression for 52.0% of queries in the internal test set and 39.3% in an external set, far exceeding the comparator models. Examination type matching was assessed using mean average precision (mAP): RadSearch scored 48.6%, above most comparators but below GTE-large, which achieved 52.0%.

The model also performed well on free-text clinical queries. Simulated queries were created across six diagnostic categories, and the reports retrieved for them were manually reviewed by a radiology resident to assess semantic relevance. RadSearch retrieved reports matching the general finding in 83.0% of cases and the queried location in 89.8% of cases. It performed consistently well across categories such as aneurysm, pulmonary embolism and cholecystitis. Although it underperformed GTE-large in two subcategories (intracranial haemorrhage and the unique features of spinal fractures), it was more accurate on the majority of evaluated metrics.
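
For readers unfamiliar with the two metrics mentioned above, the following sketch shows one common way to compute top-1 retrieval accuracy and mean average precision. The function names and toy inputs are hypothetical, and the study may have used a different mAP variant (for example, a fixed cut-off), which the source does not specify.

```python
def top1_accuracy(ranked_ids_per_query, correct_id_per_query):
    """Fraction of queries whose top-ranked result is the correct item."""
    hits = sum(
        1
        for ranked, correct in zip(ranked_ids_per_query, correct_id_per_query)
        if ranked and ranked[0] == correct
    )
    return hits / len(correct_id_per_query)

def average_precision(relevance):
    """Average precision for one query.

    relevance: 0/1 flags in ranked order (1 = relevant result).
    Normalised by the number of relevant results in the list (an assumption).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(relevance_lists):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)

# Toy data: two queries, with the correct impression IDs "a" and "d".
accuracy = top1_accuracy([["a", "b"], ["c", "d"]], ["a", "d"])
map_score = mean_average_precision([[1, 0, 1], [0, 1]])
```

In this toy case only the first query ranks its correct item first, so `accuracy` is 0.5, and `map_score` averages the two per-query precision values.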

 

Improving LLM Diagnostic Performance 
RadSearch was further assessed in its ability to enhance the diagnostic capabilities of an LLM, specifically Llama 3.1 8B Instruct. The LLM was prompted with report finding descriptions and asked to identify the most likely diagnosis. Without embedding model support, the LLM achieved a diagnostic accuracy of 30%. When supplemented with RadSearch results, accuracy increased to 61%, outperforming the 47% obtained when using GTE-large for retrieval. This demonstrates the model’s potential for integration in retrieval-augmented generation pipelines, providing LLMs with richer, more relevant context. 
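
A retrieval-augmented setup of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the study's pipeline: the three-dimensional embeddings are hand-made stand-ins for real model output, the prompt wording is invented, and the helper names (`cosine_similarity`, `retrieve`, `build_prompt`) are hypothetical. The source does not describe how retrieved reports were formatted for the LLM.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, corpus, k=2):
    """Return the k reports whose embeddings are most similar to the query."""
    ranked = sorted(
        corpus,
        key=lambda doc: cosine_similarity(query_vec, doc["embedding"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(findings, retrieved):
    """Assemble a prompt that places retrieved reports ahead of the new findings."""
    context = "\n".join(
        f"- Findings: {d['findings']} | Impression: {d['impression']}"
        for d in retrieved
    )
    return (
        "Similar prior reports:\n" + context
        + "\n\nNew findings: " + findings
        + "\nMost likely diagnosis:"
    )

# Toy corpus with hand-made embeddings (assumption: a real system would
# obtain these vectors from the embedding model).
corpus = [
    {"findings": "Filling defect in a segmental pulmonary artery.",
     "impression": "Pulmonary embolism.", "embedding": [0.9, 0.1, 0.0]},
    {"findings": "Gallbladder wall thickening with pericholecystic fluid.",
     "impression": "Acute cholecystitis.", "embedding": [0.0, 0.9, 0.2]},
    {"findings": "Saccular outpouching of a cerebral artery.",
     "impression": "Cerebral aneurysm.", "embedding": [0.1, 0.0, 0.95]},
]
query_embedding = [0.8, 0.2, 0.1]
top_reports = retrieve(query_embedding, corpus, k=2)
prompt = build_prompt("Filling defect in the left lower lobe artery.", top_reports)
```

The resulting `prompt` string would then be passed to the LLM, so that the model sees impressions from similar prior reports before proposing a diagnosis.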

 

By identifying and retrieving radiology reports with comparable findings, RadSearch helped the LLM produce more accurate diagnostic outputs, suggesting potential clinical applications in decision support. Its consistent performance across retrieval scenarios further strengthens its suitability for integration in radiology workflows and educational tools. 

 

RadSearch presents a meaningful advancement in radiology-specific semantic search. Its training method avoids the scalability limitations of manual annotation while leveraging the natural structure of radiology reports to generate semantically aligned training pairs. Across a range of evaluation tasks, including findings-to-impression matching, examination type matching and free-text query retrieval, RadSearch delivered strong results, although GTE-large scored higher on examination type matching. Moreover, supplementing a large language model with RadSearch's retrieved reports raised diagnostic accuracy from 30% to 61%, highlighting its practical utility in clinical and research contexts.

 

Although retrospective in design and not yet deployed in real-world settings, RadSearch offers a reproducible and scalable framework for semantic search in medical text. Future work may explore sentence-level training for increased specificity, but even in its current form, RadSearch provides a valuable resource for radiologists, researchers and AI developers. 

 

Source: Radiology 



References:

Savage CH, Chaudhari G, Smith AD et al. (2025) RadSearch, a Semantic Search Model for Accurate Radiology Report Retrieval with Large Language Model Integration. Radiology, 315:1 


