Delirium remains a frequently missed yet clinically significant complication in older hospitalised adults. Associated with increased mortality, morbidity and prolonged hospital stays, it places a serious burden on healthcare systems. Despite the availability of validated diagnostic tools such as the Confusion Assessment Method (CAM), delirium often goes unrecognised in its early stages because of inconsistent screening, resource limitations and the subtlety of clinical signs. At Mount Sinai Hospital, a quality improvement initiative aimed to address these challenges by developing and deploying a machine learning (ML) model that automates delirium risk stratification. The model integrates structured electronic medical record (EMR) data with natural language processing (NLP) features, enabling clinicians to better prioritise high-risk patients and improve outcomes.
Developing a Multimodal Risk Prediction Model
The ML model was created using EMR data from patients aged 60 and above, admitted to non-intensive care units between 2016 and 2020. Structured data such as laboratory results, vital signs and demographics were combined with unstructured clinical notes, which were processed using NLP techniques to identify key terms and phrases indicative of delirium. These features were selected and refined based on frequency and relevance, then used to train a model capable of stratifying patients by risk level.
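The idea of combining structured values with note-derived term features can be illustrated with a minimal sketch. Everything named here is hypothetical: the term list, the field names and the simple word-count approach are assumptions for illustration only, not the vocabulary or NLP pipeline used in the study.

```python
import re
from collections import Counter

# Hypothetical delirium-related terms (assumption); the study's actual
# vocabulary is not listed in this article.
DELIRIUM_TERMS = ("confused", "agitated", "disoriented", "hallucinating")


def note_term_counts(note):
    """Count how often each term in DELIRIUM_TERMS appears in a clinical note."""
    tokens = Counter(re.findall(r"[a-z]+", note.lower()))
    return [tokens[term] for term in DELIRIUM_TERMS]


def fused_features(structured, note):
    """Concatenate structured values (in sorted key order) with note term counts."""
    structured_part = [structured[key] for key in sorted(structured)]
    return structured_part + [float(count) for count in note_term_counts(note)]


# Illustrative patient record; the values are invented for the example.
example = fused_features(
    {"age": 78.0, "heart_rate": 92.0, "sodium": 131.0},
    "Patient confused overnight, remains confused and agitated this morning.",
)
print(example)
```

A vector like this could then be passed to any classifier; the article does not specify which model family was trained on the combined features.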
The development process followed a vertically integrated approach, ensuring that the model was compatible with existing clinical workflows from the outset. Rather than building the model in isolation, developers worked alongside clinical teams to design and refine each stage of the process. This collaborative method produced three successive model versions, each improving upon the last. The final version, referred to as the fusion multimodal model, merged the most predictive elements from both structured and unstructured data sources, achieving a high degree of accuracy in pilot evaluations.
Deployment into clinical practice began in early 2023. The model runs daily risk assessments for all eligible inpatients, providing a colour-coded risk indicator within the EMR interface. Patients identified as high risk are prioritised for CAM screening by trained assessors. Once assessed, they are temporarily excluded from further prediction cycles to avoid redundant alerts, with reassessment scheduled five days later. This automated system ensures more efficient use of assessment resources, enabling staff to focus on those most likely to benefit from early intervention.
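The daily cycle described above, in which recently assessed patients are skipped and the remaining high-risk patients are prioritised, can be sketched as follows. This is an illustration under stated assumptions: the record fields (`id`, `risk`, `last_cam`) are hypothetical, and using 0.55 as the high-risk cut-off is an assumption borrowed from the reported probability threshold rather than a documented rule for the colour bands.

```python
from datetime import date, timedelta
from typing import Optional

REASSESS_AFTER_DAYS = 5     # from the article: reassessment five days after a CAM screen
HIGH_RISK_THRESHOLD = 0.55  # assumption: reuses the reported probability threshold


def eligible_for_scoring(last_cam: Optional[date], today: date) -> bool:
    """A patient is scored unless a CAM assessment happened in the last five days."""
    if last_cam is None:
        return True
    return today - last_cam >= timedelta(days=REASSESS_AFTER_DAYS)


def daily_worklist(patients, today):
    """Return IDs of eligible high-risk patients, highest predicted risk first."""
    flagged = [
        (p["risk"], p["id"])
        for p in patients
        if eligible_for_scoring(p["last_cam"], today)
        and p["risk"] >= HIGH_RISK_THRESHOLD
    ]
    return [patient_id for _, patient_id in sorted(flagged, reverse=True)]


# Invented example records (hypothetical field names and values).
today = date(2023, 3, 10)
patients = [
    {"id": "A", "risk": 0.80, "last_cam": None},
    {"id": "B", "risk": 0.90, "last_cam": date(2023, 3, 8)},  # assessed 2 days ago
    {"id": "C", "risk": 0.60, "last_cam": date(2023, 3, 5)},  # assessed 5 days ago
    {"id": "D", "risk": 0.30, "last_cam": None},              # below threshold
]
print(daily_worklist(patients, today))
```

Patient B is skipped despite the highest score because the CAM assessment is still within the five-day window, which is the mechanism the article credits with avoiding redundant alerts.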
Impact on Clinical Workflow and Detection Rates
Following deployment, the model was evaluated over a 13-month period, comparing its effect on workflow and clinical outcomes against a historical pre-deployment cohort. A total of 32,284 admissions were analysed, including 7,023 inpatients assessed for outcome comparisons. The results demonstrated a significant improvement in delirium detection rates, with the median monthly identification rate rising from 4.42 percent to 17.17 percent. This nearly fourfold increase (about 3.9 times the earlier rate) suggests that the model successfully highlighted cases that might otherwise have been overlooked.
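The size of the change can be checked directly from the two reported medians. The short calculation below uses only the figures quoted above; the variable names are illustrative.

```python
# Median monthly delirium identification rates reported in the article (percent).
pre_rate = 4.42
post_rate = 17.17

fold_change = post_rate / pre_rate    # relative increase
absolute_change = post_rate - pre_rate  # increase in percentage points

print(f"fold change: {fold_change:.2f}x")
print(f"absolute change: {absolute_change:.2f} percentage points")
```

The ratio works out to roughly 3.9, and the absolute rise is 12.75 percentage points.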
This change in detection rates also reflects a more focused approach to CAM assessments. Rather than relying on single assessments conducted indiscriminately across the inpatient population, clinicians were now guided by real-time predictions to target those most at risk. This allowed for more consistent identification and monitoring of delirium, without increasing workload or assessment frequency. The visual presentation of risk scores within the EMR made the tool accessible to frontline staff, supporting their clinical decision-making without introducing unnecessary complexity.
In terms of model performance, the fusion multimodal model achieved an area under the receiver operating characteristic curve (AUROC) of 0.94 during clinical use, indicating a high level of discrimination between patients with and without delirium. At a probability threshold of 0.55, sensitivity was 83 percent and specificity was 90 percent. These figures suggest that the model performs robustly in real-world use and support its integration into daily clinical routines.
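For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity follow from predicted probabilities once a threshold is fixed. The probabilities and labels are invented toy data, not study data; only the 0.55 threshold comes from the article.

```python
def sensitivity_specificity(probs, labels, threshold=0.55):
    """Return (sensitivity, specificity) for binary labels at a given threshold.

    Assumes at least one positive and one negative label, otherwise a
    division by zero occurs.
    """
    tp = fn = tn = fp = 0
    for prob, label in zip(probs, labels):
        predicted_positive = prob >= threshold
        if label and predicted_positive:
            tp += 1  # delirium present, flagged
        elif label:
            fn += 1  # delirium present, missed
        elif predicted_positive:
            fp += 1  # no delirium, flagged
        else:
            tn += 1  # no delirium, not flagged
    return tp / (tp + fn), tn / (tn + fp)


# Toy data for illustration only: 1 = delirium confirmed, 0 = not confirmed.
toy_probs = [0.90, 0.70, 0.60, 0.40, 0.50, 0.20, 0.10, 0.58]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_probs, toy_labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold would flag more patients, raising sensitivity at the cost of specificity; the AUROC summarises that trade-off across all possible thresholds.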
Clinical Outcomes and Broader Implications
Beyond detection, the model's implementation was associated with several noteworthy clinical outcomes. Length of stay was longer in the post-deployment cohort, a likely reflection of higher delirium prevalence and comorbidity, but other indicators pointed towards improved medication management. While a greater proportion of patients received medications such as opiates, benzodiazepines and antipsychotics, the daily doses administered were significantly lower than in the pre-deployment group. For example, daily doses of diazepam and olanzapine were both lower, in line with best practices that discourage excessive psychotropic use in older adults.
This change may reflect a more measured approach to pharmacological management, enabled by earlier detection and better risk characterisation. By identifying patients earlier and with greater accuracy, clinicians could intervene sooner, possibly mitigating the need for more aggressive treatment. It also suggests that the model may be supporting a shift in clinical culture towards more cautious prescribing in the context of delirium.
Nevertheless, the authors acknowledged limitations regarding generalisability. The model was developed within a system that includes a dedicated delirium service and unique multicomponent intervention protocols. Its successful adoption elsewhere may depend on the presence of similar infrastructure or the willingness of clinicians to adapt existing practices. Further evaluation in external settings would be necessary to confirm its broader applicability.
The implementation of an ML-based multimodal model for delirium risk stratification at Mount Sinai Hospital has shown clear benefits in clinical practice. By combining structured EMR data with NLP-derived insights, the model enhanced the detection of delirium, improved screening workflows and contributed to more judicious medication use. Its success illustrates the potential of AI to support clinical decision-making in a targeted, efficient manner. However, the need for external validation remains. Future efforts should focus on adapting and testing similar models across diverse hospital environments, ensuring that their benefits can be realised more broadly across healthcare systems.
Source: JAMA Network Open