Accurate protocol selection for abdominal CT examinations is a critical component of radiology practice. Appropriate configuration of acquisition parameters and effective use of contrast media depend on choosing the correct protocol, which directly influences diagnostic precision and clinical outcomes. Despite its importance, the task of protocolling remains highly resource-intensive. Radiologists, trainees and technologists are routinely responsible for assigning hundreds of CT protocols each week, which contributes significantly to noninterpretive workload. These frequent interruptions not only consume valuable time but also divert attention from image interpretation, creating operational challenges for efficiency and focus. 

 

With the increasing capabilities of large language models (LLMs), particularly those optimised for reasoning tasks, new opportunities have emerged to streamline radiology workflows. Unlike earlier-generation models, newer reasoning-oriented LLMs are designed to handle complex, multistep tasks with greater precision. Their training incorporates reinforcement learning and chain-of-thought strategies tailored for fields such as science, mathematics and medicine. When guided by local protocol guidelines, such models have the potential to automate abdominal CT protocol assignment in a manner that is both consistent with clinical standards and adaptable to departmental needs. 

 

Model Performance and Methodology 
To assess the viability of LLMs in this context, 300 consecutive abdominal CT requisitions for adult patients were analysed. These cases, submitted in April 2024, included a balanced representation of emergency, inpatient and outpatient settings. De-identified requisition data, including free-text clinical indications, were processed using three distinct LLMs: GPT-4o, o1-mini and o3-mini. The models operated through a vendor-provided application programming interface (API), using a shared system prompt embedded with few-shot examples and institution-specific guidelines. Each model produced a protocol assignment, a triage priority and recommendations regarding the use of intravenous and enteric contrast media. 

 

Two experienced abdominal radiologists established consensus reference standards for each case. These standards encompassed 20 unique protocol types, with routine abdomen-pelvis, chest-abdomen-pelvis and renal colic protocols being most frequently assigned. Performance was evaluated based on exact match accuracy with the reference standards. The top-performing model, o1-mini, achieved an accuracy of 87.7 percent for protocol selection, 83.0 percent for priority assignment, 92.3 percent for intravenous contrast recommendations and 91.7 percent for enteric contrast media suggestions. The remaining two models performed comparably, with no more than 1.5 percentage points difference across any task. 

 

Furthermore, protocol selection accuracy remained consistent across clinical contexts. Emergency, inpatient and outpatient requisitions all exhibited similar performance metrics, with no statistically significant variation for any model. This consistency suggests that, when equipped with appropriate prompts, LLMs are capable of functioning reliably across different care environments, irrespective of patient location or urgency. 

 

Patterns of Misclassification and Cost Considerations 
Despite overall strong performance, specific error patterns were observed. The o3-mini model, for instance, frequently misassigned specialised protocols in place of the more commonly required routine abdomen-pelvis selection. It also tended to suggest incorrect protocols instead of appropriate CTA-related examinations in some cases. Additionally, inaccurate enteric contrast media recommendations occurred in 19 cases, reflecting potential misalignment between the prompt and evolving clinical practices—particularly the trend toward reduced use of enteric contrast in emergency imaging. 

 

Must Read: Dual-Energy CT for Abdominal Cancer Monitoring  

 

These limitations highlight the importance of prompt optimisation and human oversight in deployment. Errors were more prevalent in less common, more complex protocols, underscoring the value of clinician review for non-routine cases. Nevertheless, the models demonstrated potential to reduce interruptions in protocol assignment without undermining clinical safety when integrated thoughtfully into existing workflows. 

 

Operational cost was notably low. Processing all 300 cases required approximately $1.21 (around €1.11) using GPT-4o, and approximately $0.27 (around €0.25) for both o1-mini and o3-mini. Given the frequency of protocol assignments in radiology departments, such low per-use costs reinforce the practicality of LLMs as a scalable solution for administrative task reduction. The ability to adapt these tools to department-specific protocols through tailored prompting further enhances their applicability in diverse healthcare settings. 

 

Limitations and Practical Implications 
The analysis focused exclusively on abdominal CT requisitions from a single institution. Nearly half of all cases were assigned a routine abdomen-pelvis protocol, which may have inflated accuracy scores due to the relative simplicity of these cases. Furthermore, model decisions were based solely on requisition text, without access to supplementary clinical data from electronic medical records. This limitation contrasts with the typical radiologist workflow, which often relies on additional clinical context during protocol selection. 

 

Prompt design did not incorporate recent shifts in clinical practice, such as the reduction of enteric contrast use in emergency departments, contributing to some model errors in that area. The LLMs also operated in isolation, without integration into live clinical environments or electronic medical record systems, which would be necessary for real-time protocol assignment. Additionally, the reference standard was created for investigational purposes and did not reflect protocols clinically assigned at the time of imaging, introducing the potential for discrepancies in evaluation. In some clinical scenarios, multiple protocol options may be equally appropriate, further complicating the definition of a single reference standard. 

 

Large language models demonstrated strong performance in automating abdominal CT protocoling, including tasks such as protocol selection, prioritisation and contrast media recommendations. When supported by site-specific guidelines and structured prompts, models such as o1-mini and o3-mini achieved high accuracy at minimal cost, with processing expenses for 300 cases amounting to just $0.27 (around €0.25). Most errors were limited to less frequent, more complex protocols, suggesting that with appropriate safeguards and oversight, LLMs can meaningfully reduce the administrative burden on radiology departments. Wider implementation will require real-time integration, prompt refinement and broader validation across additional imaging contexts and institutions. 

 

Source: American Journal of Roentgenology 

Image Credit: iStock


References:

Sacoransky E, Azizi N, Yu E et al. (2025) General Purpose and Reasoning Large Language Models for Automated Abdominal CT Protocoling. AJR 2025: Accepted manuscript.  



Latest Articles

abdominal CT, CT protocol automation, radiology AI, large language models, contrast media, UK radiology, protocol assignment, AI in imaging Large language models improve abdominal CT protocol accuracy, cut admin load, and reduce cost in radiology.