A study from the University of California, San Diego, describes an approach to improving respiratory support decisions in intensive care units by integrating a large language model with an existing deep learning system. The work addresses a difficult clinical question: whether patients at high risk of requiring invasive mechanical ventilation should receive high-flow nasal cannula (HFNC) or noninvasive ventilation (NIV) as their initial respiratory support.
The research team had previously developed RepFlow-CFR, a deep counterfactual model that estimates individualised treatment effects for patients receiving either HFNC or NIV. Although the model predicted outcomes well, its decision-making was not transparent. It was also not constrained to follow clinical guidelines, so it could produce recommendations that conflict with evidence-based practice.
To address these limitations, the researchers enhanced RepFlow-CFR by incorporating Claude 3.5 Sonnet, a large language model deployed in a HIPAA-compliant environment. The language model was configured to enforce adherence to established clinical guidelines, specifically the European Respiratory Society and American Thoracic Society 2017 guidelines for NIV and the European Respiratory Society 2022 guidelines for HFNC. The system was designed to assess whether RepFlow-CFR recommendations aligned with guideline-based criteria, generate independent treatment recommendations, and provide justifications citing relevant guideline statements.
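The guideline-checking workflow can be sketched as a small pipeline. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the names `recommend`, `toy_checker` and `Recommendation`, the rule that the counterfactual model picks whichever treatment has the lower predicted risk of invasive ventilation, and the policy that the guideline-based choice overrides the model when the two disagree are all hypothetical. In the study the checker role is played by Claude 3.5 Sonnet; here it is a plain function with an invented placeholder rule that is not a criterion from the ERS/ATS or ERS guidelines.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical checker signature: (patient record, model's choice) ->
# (guideline-based choice, justification). In the study an LLM fills this role.
GuidelineChecker = Callable[[dict, str], Tuple[str, str]]


@dataclass
class Recommendation:
    model_choice: str   # "HFNC" or "NIV", chosen by the counterfactual model
    final_choice: str   # choice after the guideline check
    aligned: bool       # True if the model's choice matched the guideline choice
    justification: str  # free-text rationale returned by the checker


def recommend(risk_if_hfnc: float, risk_if_niv: float,
              patient: dict, check: GuidelineChecker) -> Recommendation:
    # Individualised treatment effect on the predicted risk of invasive
    # ventilation; a negative value means HFNC is predicted to lower that risk.
    ite = risk_if_hfnc - risk_if_niv
    model_choice = "HFNC" if ite < 0 else "NIV"  # ties default to NIV (arbitrary)
    guideline_choice, why = check(patient, model_choice)
    # Assumed policy: the guideline-based choice is final when the two disagree.
    return Recommendation(model_choice, guideline_choice,
                          guideline_choice == model_choice, why)


def toy_checker(patient: dict, model_choice: str) -> Tuple[str, str]:
    # Placeholder for the LLM. The flag below is invented for illustration
    # and is not a criterion from the ERS/ATS 2017 or ERS 2022 guidelines.
    if patient.get("toy_niv_flag"):
        return "NIV", "Placeholder rule: toy flag requires NIV."
    return model_choice, "Placeholder rule: no conflict found."


r = recommend(0.20, 0.35, {"toy_niv_flag": True}, toy_checker)
print(r.model_choice, r.final_choice, r.aligned)  # → HFNC NIV False
```

Keeping the checker as a pluggable callable separates the predictive model from the guideline layer, so the placeholder could in principle be replaced by a call to a language model without changing `recommend`.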
The study analysed 1,261 intensive care unit encounters at UC San Diego Health between January 2016 and December 2023. The researchers compared recommendations from both the original RepFlow-CFR model and the language model-enhanced version against actual treatment decisions. They evaluated clinical outcomes, particularly the need for invasive mechanical ventilation and death or discharge to hospice, comparing rates between cases where treatment was concordant with the recommendations and those where it was discordant.
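At its simplest, the concordance analysis splits encounters by whether the treatment received matched the recommendation and compares outcome rates between the two groups. The Python sketch below is a minimal, unadjusted version of that idea; the `Encounter` record and `imv_rates` function are hypothetical, the toy records are invented for illustration rather than taken from the study, and the published analysis may have used adjustment or other methods not described in this summary.

```python
from typing import Iterable, NamedTuple, Tuple


class Encounter(NamedTuple):
    recommended: str  # treatment recommended by the model, e.g. "HFNC"
    received: str     # treatment the patient actually received
    imv: bool         # True if the patient later needed invasive ventilation


def imv_rates(encounters: Iterable[Encounter], arm: str) -> Tuple[float, float]:
    """Return (concordant rate, discordant rate) of invasive ventilation,
    among encounters for which `arm` was the recommended treatment."""
    events = {True: 0, False: 0}  # keyed by "was treatment concordant?"
    totals = {True: 0, False: 0}
    for e in encounters:
        if e.recommended != arm:
            continue
        concordant = e.received == e.recommended
        totals[concordant] += 1
        events[concordant] += int(e.imv)

    def rate(key: bool) -> float:
        # NaN signals an empty group rather than a true zero rate.
        return events[key] / totals[key] if totals[key] else float("nan")

    return rate(True), rate(False)


# Toy records invented for illustration; not study data.
toy = [
    Encounter("HFNC", "HFNC", False),
    Encounter("HFNC", "HFNC", True),
    Encounter("HFNC", "NIV", True),
    Encounter("NIV", "NIV", False),
]
print(imv_rates(toy, "HFNC"))  # → (0.5, 1.0)
```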
The results favoured the language model-enhanced approach. Treatments concordant with the enhanced recommendations were associated with substantially lower rates of invasive mechanical ventilation. Among patients for whom HFNC was recommended, invasive mechanical ventilation occurred in 24.47% of concordant cases compared with 52.94% of discordant cases, with a reported 97.33% relative risk increase when treatment diverged from the recommendations. Concordance was also associated with lower rates of death or discharge to hospice.
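Relative risk increase is conventionally computed as (rate in the discordant group ÷ rate in the concordant group) − 1. The short Python sketch below applies that textbook formula to the two crude rates quoted for HFNC-recommended patients; the function name `compare_rates` is illustrative. The crude calculation gives about 116% rather than 97.33%, so the reported figure may reflect a different estimate (for example, one adjusted for patient characteristics), which this summary does not specify.

```python
def compare_rates(concordant_pct: float, discordant_pct: float) -> dict:
    """Crude (unadjusted) comparison of two event rates given in percent."""
    relative_risk = discordant_pct / concordant_pct
    return {
        # Absolute difference, in percentage points.
        "risk_difference_pts": discordant_pct - concordant_pct,
        # How many times more frequent the event is in the discordant group.
        "relative_risk": relative_risk,
        # Relative risk increase, expressed as a percentage.
        "relative_risk_increase_pct": (relative_risk - 1) * 100,
    }


# Invasive ventilation rates quoted for HFNC-recommended patients.
result = compare_rates(24.47, 52.94)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```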
To assess clinical validity and safety, the researchers conducted a structured chart review of 20 cases. The review found that 95% of the language model's recommendations aligned with clinical guidelines, and physicians agreed with 65% of the final recommendations. However, errors were identified in 11 of the 20 cases. Most errors were rated as low or moderate risk, but two were judged as potentially causing severe harm. The language model showed excellent question comprehension in all cases and retrieved the correct evidence in 95% of cases, although some recommendations contained incorrect content or omitted clinically important information.
The study concludes that integrating large language models for guideline enforcement improves the interpretability and clinical alignment of counterfactual models in respiratory support decision-making. This hybrid framework not only enhances concordance with real-world practice but may also improve patient outcomes. Future work should focus on refining contraindication detection, expanding validation through prospective clinical trials, and integrating the system into electronic health records for real-time clinical decision support.
Source: Critical Care
References:
Lu X et al. (2025) Enhancing predictive modeling for respiratory support with LLM-driven guideline adherence. Critical Care.