Chest x-ray interpretation requires visual assessment, diagnostic reasoning and clear communication, yet many AI systems treat these steps separately. Some models align whole images with full radiology narratives, which makes their outputs hard to verify against specific anatomical areas. A recent study in the IEEE Journal of Biomedical and Health Informatics describes a prompt-guided vision-language framework designed to connect user queries, image regions and diagnostic outputs in chest x-rays. The framework accepts both written prompts and selected image areas, allowing focused inspection of clinically relevant regions. It combines localisation, diagnosis and explanation in a single workflow, with the aim of making chest x-ray AI more interactive, anatomically specific and transparent.

 

Prompt-Guided Regional Reasoning

The framework addresses a limitation of many medical vision-language models: the language they generate is not always clearly tied to precise visual evidence. Standard models often learn from whole chest x-ray images paired with complete radiology narratives, which can limit their ability to recognise small, localised findings. Radiology narratives also tend to focus on abnormal findings and may omit normal structures, which makes training more difficult.

 

The proposed model accepts two types of input. A user can either enter a written prompt, such as a query about opacity in a specific lung zone, or select an image area by its coordinates. The system processes the prompt and the image together to identify the relevant region. A Detection Transformer (DETR)-based component links the query to image areas and produces localised outputs that feed the subsequent classification and explanation steps.
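As a rough illustration of what "linking a query with image areas" means, the sketch below scores a set of candidate-region embeddings against a prompt embedding with cosine similarity and returns the best-matching box. The embeddings, boxes and the helper `match_prompt_to_region` are hypothetical; the article describes this linkage as handled by a learned Detection Transformer-based component, not by a fixed similarity rule like this one.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x_min, y_min, x_max, y_max)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_prompt_to_region(
    prompt_emb: Sequence[float],
    query_embs: List[Sequence[float]],
    boxes: List[Box],
) -> Tuple[Box, float]:
    """Return the box whose region embedding is most similar to the prompt.

    Hypothetical helper: a fixed cosine rule standing in for learned matching.
    """
    scores = [cosine(prompt_emb, q) for q in query_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return boxes[best], scores[best]


# Toy example: three candidate regions; the prompt is closest to the second one.
prompt = [0.9, 0.1, 0.0]
queries = [[0.0, 1.0, 0.0], [1.0, 0.2, 0.0], [0.0, 0.0, 1.0]]
boxes = [(0, 0, 50, 50), (60, 40, 120, 110), (10, 80, 40, 120)]
box, score = match_prompt_to_region(prompt, queries, boxes)
print(box)  # (60, 40, 120, 110)
```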

 

The workflow is organised around three functions. Prompt-Guided Localization identifies areas relevant to the query. Region-Level Diagnosis classifies findings in these selected areas. Region-Aware Explanation generates short descriptions linked to specific regions of interest. This structure allows each generated statement to be checked against the anatomical area it describes, instead of remaining a broad image-level statement.
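The practical benefit of this structure is that every sentence stays attached to a region. A minimal data-structure sketch of that idea follows; the class `RegionFinding`, its fields and the example values are hypothetical and only illustrate how a region-linked output could be rendered for checking, not how the authors represent their outputs.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class RegionFinding:
    """One region-level output: where, what, how confident, and the explanation."""
    box: Box            # result of the localisation step
    label: str          # result of the region-level diagnosis step
    probability: float  # classifier confidence for `label`
    sentence: str       # result of the region-aware explanation step


def report_lines(findings: List[RegionFinding]) -> List[str]:
    """Render each explanation next to the box it refers to, so a reader
    can check every statement against an image region."""
    return [
        f"{f.sentence} [{f.label}, p={f.probability:.2f}, box={f.box}]"
        for f in findings
    ]


findings = [
    RegionFinding((30, 150, 110, 210), "pleural effusion", 0.87,
                  "Blunting of the left costophrenic angle."),
]
for line in report_lines(findings):
    print(line)
```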

 

Training and Evaluation Across Datasets

Training proceeds in two stages. The first stage teaches the model the correspondence between chest x-ray images and related text. It uses several learning objectives that bring matching image and text representations closer together and push mismatched pairs apart. Combining consecutive DeepSpeed micro-batches increases the number of examples available for this comparison, which supports more robust learning.
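Pulling matched image-text pairs together while pushing mismatched ones apart is commonly implemented as a symmetric InfoNCE (CLIP-style) contrastive loss. The NumPy sketch below assumes that form and shows why more comparison examples matter: every other text in the pooled batch acts as a negative for a given image. The article does not give the paper's exact objectives, temperature or how micro-batches are combined, so `temperature=0.07` and the simple concatenation of micro-batch embeddings are assumptions.

```python
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(img: np.ndarray, txt: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: row i of `img` matches row i of `txt`,
    and every other row in the batch serves as a negative."""
    img, txt = l2_normalise(img), l2_normalise(txt)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float((loss_i2t + loss_t2i) / 2)


rng = np.random.default_rng(0)
# Two hypothetical micro-batches of 4 image/text embedding pairs each.
micro_batches = [(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))) for _ in range(2)]

# Assumption: embeddings from consecutive micro-batches are concatenated,
# so each image is compared against 7 negatives instead of 3.
pooled_img = np.concatenate([m[0] for m in micro_batches])
pooled_txt = np.concatenate([m[1] for m in micro_batches])
print(pooled_img.shape)  # (8, 8)
print(info_nce(pooled_img, pooled_txt))
```

In a real multi-GPU setup the pooled embeddings would also need gradients handled correctly; this sketch only shows the loss arithmetic.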

 

The second stage focuses on region-level tasks. Image and text features are processed together, allowing the model to identify regions linked to medical findings. The selected region features are then used for classification and explanation. The system also handles images that contain fewer relevant findings than there are query slots, so that the unused slots do not interfere with region-level text generation.
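DETR-style models emit a fixed number of query slots, so an image with only two findings leaves the remaining slots unmatched. A common way to keep those slots out of a per-region text loss is a validity mask. The sketch below assumes that approach and a per-slot loss already computed upstream; `masked_mean_loss` and the numbers are hypothetical.

```python
from typing import Sequence


def masked_mean_loss(slot_losses: Sequence[float], matched: Sequence[bool]) -> float:
    """Average a per-slot text-generation loss over matched slots only.

    Unmatched (unused) slots get weight zero, so they cannot push the model
    towards generating text for regions that contain no finding.
    """
    if len(slot_losses) != len(matched):
        raise ValueError("one mask entry is needed per query slot")
    kept = [loss for loss, m in zip(slot_losses, matched) if m]
    return sum(kept) / len(kept) if kept else 0.0


# Five query slots, but only two were matched to findings in this image.
losses = [2.1, 1.9, 7.4, 6.8, 9.0]
mask = [True, True, False, False, False]
print(masked_mean_loss(losses, mask))
```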

 

The framework was trained and assessed using several publicly available chest x-ray datasets. MIMIC-CXR served as the main training dataset, while CheXpert, NIH ChestX-ray14, PadChest and BRAX added further diversity. External evaluation used datasets including VinDr-CXR, Chest ImaGenome, MIMIC-CXR Lung and IU-Xray. All images from the same patient were kept within a single split to prevent data leakage. Evaluation covered localisation, diagnosis, explanation, disease classification and grounded output generation.
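Keeping every image from one patient in a single split is usually done by assigning splits per subject rather than per image. The sketch below hashes a subject identifier to choose the split deterministically; the field names, split fractions and hashing scheme are assumptions for illustration, not the paper's protocol.

```python
import hashlib
from typing import Dict, List


def split_for_subject(subject_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically map a subject to 'train', 'val' or 'test'.

    Every image sharing a subject_id receives the same split, which prevents
    images of one patient from leaking across splits.
    """
    digest = hashlib.sha256(subject_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def assign_splits(images: List[Dict[str, str]]) -> Dict[str, List[str]]:
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for img in images:
        splits[split_for_subject(img["subject_id"])].append(img["image_id"])
    return splits


images = [
    {"image_id": "img_001", "subject_id": "p10"},
    {"image_id": "img_002", "subject_id": "p10"},  # same patient, later study
    {"image_id": "img_003", "subject_id": "p11"},
]
print(assign_splits(images))
```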

 

Performance Gains and Remaining Limits

The framework performed strongly across several tasks. On MS-CXR, it improved prompt-guided localisation over several baseline approaches, reaching a mean Intersection-over-Union (mIoU) of 51.97. For region-level diagnosis on the same dataset, it reached an AUROC of 82.23, exceeding ChEX in that comparison. It also maintained its performance on an external structured chest x-ray dataset.
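Intersection-over-Union compares a predicted box with a reference box: the area they share divided by the area they cover together. Averaging it over all prompts gives the mean IoU; a figure such as 51.97 is presumably that mean expressed on a 0 to 100 scale. A minimal sketch, assuming axis-aligned boxes in (x_min, y_min, x_max, y_max) pixel coordinates:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou_percent(preds: List[Box], refs: List[Box]) -> float:
    """Mean IoU over paired predictions and references, scaled to 0-100."""
    return 100.0 * sum(iou(p, r) for p, r in zip(preds, refs)) / len(preds)


preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
refs = [(5, 0, 15, 10), (0, 0, 10, 10)]
print(round(mean_iou_percent(preds, refs), 2))  # 66.67
```

The first pair overlaps on half of each box (IoU of 1/3) and the second matches exactly (IoU of 1), giving a mean of about 66.67.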

 

The explanation task also showed improvements. On MS-CXR, the model exceeded ChEX on measures of both label accuracy and language similarity. On Chest ImaGenome, it also maintained strong performance on key explanation measures. For disease classification, performance was assessed on pneumonia, pneumothorax and multi-label chest pathology datasets. On NIH ChestX-ray14, the mean AUROC across 14 pathologies was 0.741.
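AUROC measures how often a model scores a randomly chosen positive case above a randomly chosen negative one, so 0.5 is chance and 1.0 is perfect ranking. On a multi-label dataset such as NIH ChestX-ray14 it is computed per pathology and then averaged. The pure-Python sketch below uses the pairwise (Mann-Whitney) definition with ties counted as half; the labels and scores are made up, and the paper may well have used a library implementation instead.

```python
from typing import Dict, List, Tuple


def auroc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:  # O(P * N) pairwise comparison, fine for illustration
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def mean_auroc(per_label: Dict[str, Tuple[List[int], List[float]]]) -> float:
    """Average the per-pathology AUROC values, as reported for multi-label benchmarks."""
    values = [auroc(y, s) for y, s in per_label.values()]
    return sum(values) / len(values)


# Two hypothetical pathologies instead of the 14 in NIH ChestX-ray14.
data = {
    "effusion": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    "pneumothorax": ([1, 0, 0, 0], [0.3, 0.5, 0.1, 0.2]),
}
print(mean_auroc(data))
```

Here the effusion scores rank perfectly (1.0) and the pneumothorax positive beats two of three negatives (2/3), so the mean is about 0.833.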

 

For grounded output generation on MIMIC-CXR, 74.6% of generated sentences were linked to matched region boxes. Qualitative examples connected findings such as pleural effusion, pneumonia and catheter position to the corresponding image regions. Limitations remain. The model depends on well-formed prompts and may perform less effectively when queries are vague or anatomically imprecise. Generating full structured output remains limited by the current decoder design. The Detection Transformer backbone also adds inference latency, and the text supervision may carry a bias towards abnormal findings.
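The 74.6% grounding figure is a proportion: generated sentences that could be paired with a matching region box, divided by all generated sentences. The article does not state the matching rule, so the sketch below assumes each sentence is reduced to a (finding label, predicted box) pair and counts as grounded when its box overlaps a reference box for the same finding at an IoU of at least 0.5. The threshold, the label-based pairing and the data are all hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def grounding_rate(
    sentences: List[Tuple[str, Optional[Box]]],
    references: List[Tuple[str, Box]],
    threshold: float = 0.5,  # assumed matching threshold
) -> float:
    """Share of generated sentences whose box matches a same-label reference box."""
    grounded = 0
    for label, box in sentences:
        if box is None:  # sentence generated without any region
            continue
        if any(ref_label == label and iou(box, ref_box) >= threshold
               for ref_label, ref_box in references):
            grounded += 1
    return grounded / len(sentences)


generated = [
    ("pleural effusion", (20, 150, 100, 210)),
    ("catheter", (140, 40, 160, 120)),
    ("pneumonia", None),
    ("pneumonia", (200, 60, 240, 100)),
]
reference = [
    ("pleural effusion", (25, 150, 105, 210)),
    ("catheter", (140, 40, 162, 120)),
    ("pneumonia", (60, 60, 110, 120)),
]
print(f"{grounding_rate(generated, reference):.1%}")  # 50.0%
```

Two of the four sentences land on a matching box; the unboxed sentence and the misplaced pneumonia box both count against the rate.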

 

The framework offers a region-aware approach to chest x-ray AI by connecting prompts, image evidence and generated diagnostic descriptions. Its design moves beyond broad image-level analysis by making localisation, diagnosis and explanation part of one prompt-guided workflow. Results across public datasets show gains in localisation, region-level diagnosis and grounded explanation, while limitations remain around prompt quality, output coherence, inference speed and abnormal-content bias. The model points towards more verifiable and interactive AI support for radiology workflows.

 

Source: IEEE Journal of Biomedical and Health Informatics

Image Credit: iStock


References:

Liu L, Luo S, Li X et al. (2026) A Prompt-Guided Vision-Language Framework for Interpretable and Region-Aware Disease Diagnosis in Chest x-rays. IEEE Journal of Biomedical and Health Informatics. doi: 10.1109/JBHI.2026.3686304.



