Incidental detection of pancreatic cystic lesions (PCLs) is increasingly common during abdominal imaging. While many of these lesions are benign, some carry the potential for malignancy, necessitating routine surveillance. The American College of Radiology (ACR) Incidental Findings Committee (IFC) provides a detailed algorithm to guide follow-up recommendations for such lesions. However, implementing these recommendations in clinical practice remains challenging. Their complexity, reliance on multimodal inputs such as flowcharts and tables, and need for contextual awareness across serial examinations make them difficult to automate using traditional text-based artificial intelligence (AI) tools. To overcome these limitations, researchers evaluated a novel multimodal large language model (LLM) capable of integrating text and visual elements through a method called flowchart embedding. 

 

Embedding Clinical Knowledge into AI Workflows 

Conventional LLMs are designed to process and generate text, limiting their usefulness when clinical decision-making depends on multimodal information. The ACR IFC algorithm, for instance, includes key diagnostic and follow-up logic embedded in diagrams and tables that a plain-text model cannot interpret. To address this, the study employed a multimodal LLM known as GPT-4o, which can convert image-based content from clinical guidance documents into machine-readable text using vision-to-text capabilities. By structuring these inputs as JSON objects, the model was able to retrieve and use flowchart data dynamically, enabling more accurate and contextually aware recommendations. 
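To make the idea of a flowchart stored as JSON more concrete, here is a minimal sketch in Python. It is not the study's implementation: the node names, the 15 mm cut-off and the recommendation strings are hypothetical placeholders and do not reproduce the ACR IFC algorithm. In the study the LLM itself retrieved and reasoned over the JSON. The deterministic walker below only illustrates why a structured representation leaves less room for ambiguity than free text.

```python
import json

# Hypothetical, heavily simplified flowchart. Node names, the 15 mm cut-off
# and the recommendation strings are placeholders, NOT the ACR IFC algorithm.
FLOWCHART_JSON = """
{
  "start": {"question": "worrisome_features", "yes": "refer", "no": "check_size"},
  "check_size": {"question": "size_mm_at_least_15", "yes": "short_interval", "no": "long_interval"},
  "refer": {"recommendation": "PLACEHOLDER: specialist evaluation"},
  "short_interval": {"recommendation": "PLACEHOLDER: shorter imaging interval"},
  "long_interval": {"recommendation": "PLACEHOLDER: longer imaging interval"}
}
"""


def walk_flowchart(flowchart, findings, node_id="start"):
    """Follow yes/no edges until a node carrying a recommendation is reached."""
    path = [node_id]
    node = flowchart[node_id]
    while "recommendation" not in node:
        answer = "yes" if findings[node["question"]] else "no"
        node_id = node[answer]
        path.append(node_id)
        node = flowchart[node_id]
    return node["recommendation"], path


flowchart = json.loads(FLOWCHART_JSON)
# Findings as they might be extracted from a report (illustrative values).
findings = {"worrisome_features": False, "size_mm_at_least_15": True}
recommendation, path = walk_flowchart(flowchart, findings)
print(path, recommendation)
```

Because every decision point is an explicit key, the path taken through the flowchart can be logged alongside the recommendation, which is one way the output could be made auditable.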

 

 

The researchers divided the study into two tasks. Task 1 involved using the LLM to review individual radiology reports and produce follow-up recommendations using three different knowledge retrieval methods: default knowledge (in which the model was only linked to the guidance document), plain-text retrieval-augmented generation (RAG) and flowchart embedding. Task 2 tested the model’s ability to evaluate serial imaging studies, detect interval changes in PCLs and update the follow-up schedule accordingly. In both tasks, the LLM was given structured prompts encouraging step-by-step reasoning to improve accuracy and transparency.
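The three retrieval methods differ mainly in what context accompanies the report. The sketch below shows one way such prompts could be assembled. The function name, the method labels, the wording of the context lines and the reasoning steps are all assumptions made for illustration. The study's actual prompts are not given in this article.

```python
# Hypothetical reasoning steps; the study's real prompt wording is not known here.
REASONING_STEPS = [
    "1. State whether a pancreatic cystic lesion is present.",
    "2. Extract size, location, MPD communication and worrisome features.",
    "3. Match the findings to the guidance.",
    "4. Give the follow-up modality and timing.",
]


def build_prompt(report_text, method, guideline_text="", flowchart_json=""):
    """Assemble a step-by-step prompt for one of three retrieval methods."""
    if method == "default_knowledge":
        # The model is only pointed at the guidance, with no retrieved content.
        context = "Follow the ACR IFC guidance for pancreatic cystic lesions."
    elif method == "plain_text_rag":
        context = "Guidance excerpts (plain text):\n" + guideline_text
    elif method == "flowchart_embedding":
        context = "Guidance flowchart (JSON):\n" + flowchart_json
    else:
        raise ValueError(f"unknown retrieval method: {method}")
    steps = "Reason step by step:\n" + "\n".join(REASONING_STEPS)
    return "\n\n".join([context, steps, "Radiology report:\n" + report_text])


prompt = build_prompt(
    report_text="Illustrative report text.",
    method="flowchart_embedding",
    flowchart_json='{"start": {"question": "worrisome_features"}}',
)
print(prompt)
```

Keeping the report and the reasoning steps identical across methods means any difference in output can be attributed to the retrieved context alone.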

 

Flowchart Embedding Outperforms Traditional Methods 

In task 1, the model achieved high accuracy when using flowchart embedding to identify key PCL characteristics, including lesion presence, size, location, main pancreatic duct (MPD) communication and worrisome features, reaching 98.0–99.0% accuracy across these features. More importantly, it achieved an accuracy of 89.9–91.9% for follow-up recommendations, as independently assessed by three radiologists. This was markedly better than the default knowledge method, which ranged from 39.9% to 42.4%, and the plain-text RAG method, which scored only 23.7–25.3%.

 

The advantage of flowchart embedding lies in its ability to incorporate structured guideline logic directly into the decision-making process. Plain-text RAG failed to accurately reflect clinical intent in many cases, often generating ambiguous or incorrect recommendations. For example, it produced unclear recommendations regarding imaging modality or timing in more than one-third of cases. In contrast, flowchart embedding eliminated ambiguity and provided complete, interpretable results. 

 

Task 2 extended the system’s capabilities to assess changes over time using two sequential imaging reports. The LLM achieved 96.5% accuracy in detecting interval change in PCLs and 81.2% accuracy in providing revised follow-up schedules. This performance was achieved through a two-stage prompting strategy, where the model first summarised initial findings and recommendations, then compared them against the follow-up report. It evaluated whether lesions remained stable or had changed and adjusted the surveillance interval accordingly. Most errors in follow-up scheduling were minor, often limited to simple miscalculations. 
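The comparison and rescheduling step described above comes down to a size comparison followed by date arithmetic, which is where simple miscalculations can creep in. The sketch below is an illustration only: the growth threshold, the two surveillance intervals and the choice to count from the follow-up exam date are hypothetical parameters, not values from the ACR IFC guidance or the study.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def revise_follow_up(initial, follow_up, growth_threshold_mm,
                     stable_interval_months, changed_interval_months):
    """Compare two summarised reports and schedule the next exam.

    All thresholds and intervals are caller-supplied placeholders.
    """
    growth_mm = follow_up["size_mm"] - initial["size_mm"]
    interval_change = growth_mm >= growth_threshold_mm
    months = changed_interval_months if interval_change else stable_interval_months
    return {
        "interval_change": interval_change,
        "growth_mm": growth_mm,
        # Assumption: the next interval is counted from the follow-up exam.
        "next_exam": add_months(follow_up["exam_date"], months),
    }


# Stage 1 output (summary of the initial report) and the follow-up findings.
initial = {"size_mm": 12, "exam_date": date(2024, 1, 31)}
follow_up = {"size_mm": 13, "exam_date": date(2024, 8, 31)}
result = revise_follow_up(
    initial, follow_up,
    growth_threshold_mm=2,        # placeholder
    stable_interval_months=12,    # placeholder
    changed_interval_months=6,    # placeholder
)
print(result)
```

Delegating the date calculation to a small deterministic helper like `add_months`, rather than asking the language model to do the arithmetic, is one possible way to avoid scheduling errors of this kind.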

 

Clinical Value and Broader Implications 

The success of flowchart embedding demonstrates the potential of multimodal LLMs to support clinical workflows by automating complex decision-making tasks. This is especially relevant in the case of incidental findings, where consistency and adherence to guidelines can be difficult to maintain across busy clinical settings. By accurately interpreting both unstructured report narratives and structured guideline components, the model offers a means of standardising care and reducing unnecessary imaging. 

 

Moreover, the system is adaptable. While the study focused on the ACR IFC guidance for PCLs, the same methodology could be applied to other conditions managed through structured algorithms. As the underlying design is agnostic to the specific guideline, different or updated clinical documents could be substituted with minimal modification. This flexibility supports ongoing clinical use and future integration into radiology information systems. 

 

Although the study had limitations, including a single-centre setting and a relatively small sample size, its findings suggest that the approach is robust to varied inputs. The model performed well regardless of whether the radiology reports were structured or unstructured and across both CT and MRI modalities. Additionally, in task 2, more than one-third of the follow-up scans were conducted for unrelated indications, reinforcing the model’s value in real-world scenarios where imaging does not always follow a standardised pathway.

 

The use of a multimodal LLM with flowchart embedding represents a significant step forward in the automation of radiologic decision support. By integrating structured and unstructured information, the model generated accurate, guideline-concordant follow-up recommendations for pancreatic cystic lesions. It also demonstrated the ability to process serial imaging data, assess interval changes and adjust surveillance schedules accordingly. These capabilities suggest that such systems could play an important role in enhancing clinical efficiency, improving care consistency and ultimately supporting evidence-based practice across a broader range of medical imaging contexts. 

 

Source: American Journal of Roentgenology 



References:

Zhu Z, Liu J, Hong CW et al. (2025) Multimodal Large Language Model With Knowledge Retrieval Using Flowchart Embedding for Forming Follow-Up Recommendations for Pancreatic Cystic Lesions. AJR. Accepted manuscript. doi:10.2214/AJR.25.32729


