Early identification of respiratory deterioration in intensive care unit (ICU) patients remains a central clinical challenge. Intubation for mechanical ventilation is often required with little warning, and delayed recognition increases clinical risk. Artificial intelligence (AI) models have demonstrated strong predictive performance for short-term clinical events, including the need for intubation. However, limited transparency and poor alignment with clinical reasoning have constrained their uptake in routine care. Explainable AI (XAI) aims to close this gap by making model outputs interpretable and clinically meaningful. For time-critical ICU decisions, explanations must reflect how physiological variables evolve over time and how those changes influence risk estimates. Aligning explanation design with established clinical workflows is therefore essential to support trust, usability and acceptance among clinicians.

 

Time-Series Modelling of Near-Term Intubation Risk

A near-term prediction framework was developed to estimate the probability of intubation within the following hour, based on physiological data from the preceding six hours. The end of this six-hour window represented the current clinical assessment point, reflecting routine ICU reassessment rather than immediate procedural guidance. Outcome labelling occurred after the observation window to prevent leakage of post-decision information into the model inputs.
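
As a rough illustration of this windowing scheme, the sketch below builds one six-hour feature window and a one-hour-ahead label while rejecting windows that would overlap post-decision data. The per-patient DataFrame, the make_window helper and its arguments are hypothetical, not the authors' code.

```python
# Minimal sketch of the sliding-window setup: a six-hour observation window
# of hourly physiological data, labelled 1 if intubation occurs within the
# following hour. Assumes a per-patient DataFrame with a sorted DatetimeIndex.
import pandas as pd

OBS_HOURS = 6      # length of the observation window
HORIZON_HOURS = 1  # prediction horizon after the window


def make_window(patient, intubation_time, window_end):
    """Return (features, label) for one assessment point, or None if the
    window would include post-decision data (label leakage)."""
    window_start = window_end - pd.Timedelta(hours=OBS_HOURS)
    horizon_end = window_end + pd.Timedelta(hours=HORIZON_HOURS)

    # Inputs come strictly from before (or at) the assessment point.
    features = patient.loc[window_start:window_end]

    if intubation_time is not None and intubation_time <= window_end:
        return None  # patient already intubated: window is invalid
    label = int(intubation_time is not None
                and window_end < intubation_time <= horizon_end)
    return features, label
```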

 

Data were extracted from the MIMIC-III database and included adult ICU patients (aged 18 years or older) with stays exceeding seven hours, a duration that ensured adequate capture of lower-frequency variables such as partial pressure of oxygen. To address class imbalance, the cohort was constructed using a 1:2 ratio of intubated to non-intubated patients, resulting in a final sample of 4,608 individuals. Ten physiological variables selected by clinicians served as model inputs, organised as hourly time series. Frequently recorded values were aggregated into hourly medians, and missing data were handled using last observation carried forward or population-mean imputation to maintain a consistent input structure.
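
These preprocessing steps map naturally onto a few pandas operations. The sketch below assumes a hypothetical long-format table of charted events with columns stay_id, charttime, variable and value; the schema, helper names and 1:2 sampling function are illustrative, not the study's actual pipeline.

```python
# Hourly-median aggregation, LOCF within each stay, then population-mean
# imputation, as described above (schema is an assumption).
import pandas as pd


def to_hourly_series(events):
    """events: long-format DataFrame with columns stay_id, charttime,
    variable, value. Returns one row per (stay_id, hour), one column
    per physiological variable."""
    hourly = (events
              .assign(hour=events["charttime"].dt.floor("h"))
              .groupby(["stay_id", "variable", "hour"])["value"]
              .median()                 # aggregate frequent readings
              .unstack("variable"))     # one column per variable
    # Last observation carried forward within each ICU stay ...
    hourly = hourly.groupby(level="stay_id").ffill()
    # ... then population means for values never observed.
    return hourly.fillna(hourly.mean())


def sample_cohort(cases, controls, seed=42):
    """Assemble the 1:2 case-to-control cohort used to curb imbalance."""
    return pd.concat([cases,
                      controls.sample(n=2 * len(cases), random_state=seed)])
```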

 

Multiple machine learning models were trained to classify the multivariate time series, including Logistic Regression, Decision Trees, Random Forest, eXtreme Gradient Boosting, Explainable Boosting Machine, Long Short-Term Memory networks and Convolutional Neural Networks. Models were compared by area under the receiver operating characteristic curve (AUC) on a held-out test set.
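
A minimal scikit-learn sketch of such a comparison for the tabular learners follows, with placeholder data standing in for the real cohort; the hyperparameters, train/test split and flattening choice are assumptions, and the deep models are omitted for brevity.

```python
# Bake-off of tabular classifiers on flattened time-series windows,
# compared by test-set AUC (all settings here are illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder features: 10 variables x 6 hourly steps, one row per window.
rng = np.random.default_rng(0)
X = rng.normal(size=(4608, 10 * 6))
y = rng.integers(0, 2, size=4608)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=6),
    "Random Forest": RandomForestClassifier(n_estimators=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```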

 

Explainability Grounded in Physiological Trajectories

Random Forest achieved the highest test AUC of 0.94, marginally outperforming deep learning approaches. This result indicated that, for this task and dataset size, tree-based models captured relevant temporal patterns effectively without the added complexity of deep architectures. Random Forest was therefore selected as the final model for explanation development.

 


 

SHapley Additive exPlanations (SHAP) was used to generate interpretable outputs. SHAP assigns a contribution value to each input feature for every prediction, supporting both global and patient-level interpretation and enabling examination of how specific physiological values influenced estimated risk at a given time point. Global analysis identified fraction of inspired oxygen (FiO2) at the current hour as the most influential variable, followed by heart rate and earlier values of FiO2 and respiratory rate. Local explanations illustrated how extreme or rising values contributed to high predicted intubation risk for individual patients.
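
Continuing from the comparison sketch above, SHAP attribution for a fitted Random Forest typically looks like the following; the version-handling branch and the top-feature printout are illustrative conveniences, not the authors' code.

```python
# Global and local SHAP attributions for the selected Random Forest,
# reusing the fitted model and test matrix from the earlier sketch.
import numpy as np
import shap

rf = models["Random Forest"]
explainer = shap.TreeExplainer(rf)          # exact tree-path SHAP
shap_values = explainer.shap_values(X_test)

# Depending on the shap version, binary classifiers return either a list
# with one array per class or a single 3-D array; keep the positive class.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif shap_values.ndim == 3:
    shap_values = shap_values[..., 1]

# Global view: mean absolute SHAP value per (variable, hour) feature.
global_importance = np.abs(shap_values).mean(axis=0)
print("Top features by mean |SHAP|:",
      np.argsort(global_importance)[::-1][:5])

# Local view: per-feature contributions pushing one patient's risk
# estimate up or down at this assessment point.
patient_contributions = shap_values[0]
```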

 

Standard SHAP plots are static and poorly suited to longitudinal data, limiting their clinical relevance in ICU settings where trend interpretation is central. To address this limitation, three time-aware visual explanation formats were designed: a time-aware force plot, a temporal importance bar chart and a dual-encoded heatmap. Each format integrated historical physiological trends with model attribution over time, allowing clinicians to compare AI reasoning with their own pattern-recognition processes.
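
A minimal matplotlib sketch of the dual-encoding idea, not a reconstruction of the authors' figure: cell colour carries the SHAP attribution while the number printed in each cell carries the raw physiological value, so trend and attribution can be read together. Variable names and data below are placeholders.

```python
# Illustrative dual-encoded heatmap: colour = SHAP attribution,
# cell text = raw physiological value over the six-hour window.
import matplotlib.pyplot as plt
import numpy as np

variables = ["FiO2", "Heart rate", "Resp. rate", "SpO2"]   # illustrative subset
hours = [f"t-{h}h" for h in range(5, -1, -1)]              # six hourly steps
rng = np.random.default_rng(1)
values = rng.normal(size=(len(variables), len(hours)))     # fake raw trends
attributions = rng.normal(scale=0.05, size=values.shape)   # fake SHAP values

fig, ax = plt.subplots(figsize=(7, 3))
im = ax.imshow(attributions, cmap="coolwarm",
               vmin=-0.15, vmax=0.15, aspect="auto")
ax.set_xticks(range(len(hours)))
ax.set_xticklabels(hours)
ax.set_yticks(range(len(variables)))
ax.set_yticklabels(variables)
# Second encoding: the raw physiological value printed inside each cell.
for i in range(len(variables)):
    for j in range(len(hours)):
        ax.text(j, i, f"{values[i, j]:.1f}",
                ha="center", va="center", fontsize=8)
fig.colorbar(im, ax=ax, label="SHAP contribution to intubation risk")
plt.tight_layout()
plt.show()
```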

 

Clinician Preferences and Perceived Usability

Clinician evaluation was conducted using an online survey distributed via Prolific and administered through Qualtrics. The survey combined seven multiple-choice comprehension questions with Likert-scale items adapted from the Technology Acceptance Model 3 (TAM3), assessing Perceived Usefulness, Perceived Ease of Use and behavioural intention. A total of 217 responses were collected, of which 206 were retained after quality filtering.

 

Overall, the heatmap emerged as the preferred explanation format, selected by approximately 55% of participants. Force plots were preferred by 30%, while 15% selected the bar chart. Average comprehension accuracy across all formats was 78%. Force plots achieved the highest comprehension score at 92%, followed by bar charts at 90%, while heatmaps scored 76%. Despite lower clarity during brief exposure, heatmaps received the highest ratings for Perceived Usefulness, Perceived Ease of Use and behavioural intention, each with mean scores close to 3.8.

 

The preference for heatmaps was attributed to their ability to combine raw physiological values and model contributions in a single view. This integration supported rapid pattern recognition and reduced the cognitive effort required to connect trends with risk estimates. In contrast, force plots and bar charts required clinicians to mentally integrate separate sources of information, making them less suited to time-pressured ICU environments.

 

A time-aware explainable AI framework was developed to support near-term intubation risk assessment in intensive care. The approach combined high-performing prediction using Random Forest with SHAP-based explanations adapted to reflect physiological trajectories over time. Clinician evaluation showed that explanation format strongly influenced acceptance, with dual-encoded heatmaps rated highest for usefulness and ease of use despite lower initial clarity. These findings highlight the importance of workflow-aligned explainability and suggest that clinically intuitive visualisations may be as critical as predictive accuracy for the adoption of AI decision support in critical care settings.

 

Source: International Journal of Medical Informatics

Image Credit: iStock


References:

Xian T, Mehandjiev N, Constantinides P et al. (2026) Clinician preferences for explainable AI in critical care: a comparative study of interpretable models and visualizations for intubation decision support. International Journal of Medical Informatics; 210:106287.



