Sepsis remains a major cause of morbidity and mortality in intensive care settings, and timely identification is needed to support intervention. A recent systematic review published in the International Journal of Medical Informatics evaluates machine learning approaches for early detection and prediction of sepsis, focusing on model explainability and alignment with clinically recognised biomarkers. The analysis covers 37 studies published between 2019 and mid-2025, all based on Sepsis-3 criteria and adult ICU populations. The findings highlight a rapid increase in the use of explainability techniques in predictive models, alongside persistent limitations in how key biological markers are represented. These patterns reflect both advances in data-driven methods and constraints linked to electronic health record data.
Rising Adoption of Explainable Machine Learning
Explainability methods show a marked increase in adoption over time, with the odds of a study using them rising by approximately 67% per year. Usage is substantially higher in the period from 2023 to 2025 than in earlier years. Models incorporating explainability also achieve higher average methodological quality scores than those without such techniques. This trend reflects a broader shift towards transparency and clinical interpretability in machine learning for sepsis prediction.
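A 67% yearly rise in odds is not the same as a 67% rise in the share of studies, because odds and probabilities are on different scales. The short sketch below illustrates the conversion. The starting share of 20% is a hypothetical value chosen for illustration and is not a figure from the review; only the odds ratio of roughly 1.67 per year comes from the text above.

```python
def prob_to_odds(p):
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def project_probability(p0, odds_ratio_per_year, years):
    """Apply a constant yearly odds ratio to a starting probability."""
    odds = prob_to_odds(p0) * odds_ratio_per_year ** years
    return odds_to_prob(odds)


# Hypothetical baseline: 20% of studies use explainability in year 0.
# The yearly odds ratio of 1.67 corresponds to "odds rising by about 67%".
BASELINE_SHARE = 0.20
ODDS_RATIO = 1.67

for year in range(0, 5):
    share = project_probability(BASELINE_SHARE, ODDS_RATIO, year)
    print(f"year {year}: projected share = {share:.1%}")
```

Under this assumed baseline the projected share climbs quickly at first and then flattens as it approaches 100%, which is why the effect is reported in odds rather than percentage points.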
Explainability approaches include post-hoc techniques such as SHAP and LIME, as well as intrinsically interpretable models. These methods provide insight into how input features contribute to predictions, enabling closer inspection of model behaviour at both global and individual levels. Simpler feature-importance measures are not considered sufficient for detailed interpretation, as they do not explain how specific variables influence individual predictions.
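The difference between individual and global explanations can be illustrated with exact Shapley values, the quantity that SHAP approximates. The sketch below does not use the SHAP library and is not the method of any study in the review. The scoring function, its coefficients, the baseline values and the patient values are all hypothetical and have no clinical meaning.

```python
from itertools import combinations
from math import factorial

FEATURES = ["heart_rate", "temperature", "respiratory_rate"]


def toy_risk_score(x):
    """Hypothetical score with one interaction term; not a clinical model."""
    score = (
        0.02 * (x["heart_rate"] - 80)
        + 0.5 * (x["temperature"] - 37.0)
        + 0.05 * (x["respiratory_rate"] - 16)
    )
    if x["heart_rate"] > 100 and x["respiratory_rate"] > 22:
        score += 0.5  # extra risk only when both signs are abnormal
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature coalition."""
    n = len(FEATURES)

    def value(subset):
        # Features in the coalition take the patient's value, others the baseline.
        mixed = {f: instance[f] if f in subset else baseline[f] for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


def global_importance(model, instances, baseline):
    """Mean absolute Shapley value per feature across several patients."""
    totals = {f: 0.0 for f in FEATURES}
    for inst in instances:
        for f, v in shapley_values(model, inst, baseline).items():
            totals[f] += abs(v)
    return {f: t / len(instances) for f, t in totals.items()}


baseline = {"heart_rate": 80, "temperature": 37.0, "respiratory_rate": 16}
patient = {"heart_rate": 115, "temperature": 38.5, "respiratory_rate": 26}

local = shapley_values(toy_risk_score, patient, baseline)
print("individual explanation:", local)
print("global importance:", global_importance(toy_risk_score, [patient, baseline], baseline))
```

The per-patient attributions sum to the difference between the patient's score and the baseline score, and the interaction bonus is shared between heart rate and respiratory rate. A single feature-importance ranking would not show that split for an individual patient.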
Despite this progress, heterogeneity remains across implementations. Different models apply varying methods, datasets and evaluation strategies, limiting direct comparison. The observed increase reflects adoption of specific explainability tools rather than a comprehensive shift across all interpretability approaches. Variability in study design and dataset characteristics also constrains the strength of temporal conclusions.
Mismatch Between Model Features and Clinical Biomarkers
Model explanations frequently prioritise variables that are consistently available in electronic health records, particularly vital signs such as heart rate, temperature and respiratory rate. These variables dominate feature importance rankings across studies, even though they are less specific to sepsis pathophysiology. In contrast, clinically validated biomarkers such as C-reactive protein and procalcitonin are rarely identified as top predictors.
C-reactive protein appears as an input feature in only a small number of models, while procalcitonin is included in even fewer. Their absence among leading predictive features highlights a gap between model outputs and established clinical understanding. This disparity reflects limitations in how these biomarkers are captured in large datasets. Missing values are common, with some databases reporting high levels of incomplete data for these variables.
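How incomplete a variable is can be checked with a simple per-feature missingness count. The sketch below is a minimal illustration on made-up records. The field names (`crp`, `pct`) and every value are hypothetical and do not come from any database discussed in the review.

```python
def missing_fraction(records, features):
    """Return the share of records in which each feature is absent or None."""
    n = len(records)
    if n == 0:
        raise ValueError("no records supplied")
    return {
        f: sum(1 for r in records if r.get(f) is None) / n
        for f in features
    }


# Hypothetical ICU stays: vital signs are always charted,
# while CRP and procalcitonin (pct) are only measured occasionally.
toy_records = [
    {"heart_rate": 92, "crp": None, "pct": None},
    {"heart_rate": 110, "crp": 85.0, "pct": None},
    {"heart_rate": 78, "crp": None, "pct": None},
    {"heart_rate": 101, "crp": None, "pct": 2.1},
]

report = missing_fraction(toy_records, ["heart_rate", "crp", "pct"])
for feature, fraction in report.items():
    print(f"{feature}: {fraction:.0%} missing")
```

A variable that is absent in most records gives a model little to learn from, so it is unlikely to rank highly in any explanation regardless of its clinical value.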
Temporal sampling also affects representation. Biomarker levels can fluctuate, and peak values may not be recorded depending on testing frequency. As a result, models trained on retrospective datasets often fail to capture biologically meaningful signals. This constraint reduces the ability of explainability methods to reflect clinically relevant mechanisms.
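The effect of testing frequency on recorded peaks can be shown with a toy simulation. The curve below is a hypothetical rise-and-fall shape peaking at 12 hours. It is not a pharmacokinetic model of any real biomarker, and the sampling intervals are assumptions chosen for illustration.

```python
import math

PEAK_HOUR = 12      # hypothetical time of the true peak
PEAK_VALUE = 100.0  # hypothetical peak level, arbitrary units
WIDTH_HOURS = 3     # hypothetical spread of the rise and fall


def toy_biomarker(t_hours):
    """Bell-shaped level over time; illustrative only."""
    return PEAK_VALUE * math.exp(-((t_hours - PEAK_HOUR) ** 2) / (2 * WIDTH_HOURS ** 2))


def observed_peak(interval_hours, duration_hours=48, first_sample_hour=0):
    """Highest value seen when sampling at a fixed interval."""
    times = range(first_sample_hour, duration_hours + 1, interval_hours)
    return max(toy_biomarker(t) for t in times)


for interval in (6, 12, 24):
    peak = observed_peak(interval)
    print(f"sampling every {interval} h: observed peak = {peak:.2f} (true peak = {PEAK_VALUE})")
```

In this setup, samples every 6 or 12 hours happen to land on the peak, while daily samples at hours 0, 24 and 48 record only near-baseline values. A model trained on the daily series would never see the signal.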
The dominance of routinely collected variables leads to explanations that mirror data availability rather than biological specificity. Bridging this gap requires datasets that include more complete and temporally aligned biomarker measurements. Improved data capture would support models that better align predictive features with clinical reasoning.
Data and Methodological Constraints Limit Generalisability
Most models rely on structured time series data derived from electronic health records, including vital signs, laboratory values and demographic information. While this approach captures the evolving nature of sepsis, it introduces challenges related to data quality and consistency. Irregular sampling and missing values complicate both model training and interpretation, affecting robustness and generalisability.
Feature selection varies widely across models, with input variables ranging from small, curated sets to large collections exceeding one thousand features. Limited overlap in selected features raises concerns about stability across populations and settings. A significant proportion of models use local institutional datasets, which may not generalise to broader clinical environments.
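One simple way to quantify overlap between the feature sets of two models is the Jaccard index, the size of the intersection divided by the size of the union. The review does not state that it used this measure, and the feature lists below are hypothetical examples rather than those of any reviewed model.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature collections (1.0 = identical, 0.0 = disjoint)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def pairwise_overlap(models):
    """Jaccard index for every pair of named feature sets."""
    return {
        (name_a, name_b): jaccard(models[name_a], models[name_b])
        for name_a, name_b in combinations(sorted(models), 2)
    }


# Hypothetical feature sets for three models.
toy_models = {
    "model_a": {"heart_rate", "temperature", "respiratory_rate", "lactate"},
    "model_b": {"heart_rate", "crp", "wbc", "age"},
    "model_c": {"heart_rate", "temperature", "respiratory_rate", "age"},
}

for pair, score in pairwise_overlap(toy_models).items():
    print(pair, round(score, 2))
```

Low pairwise scores across many models would point to the instability described above, since each model is effectively learning from a different view of the patient.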
External validation is not consistently performed, reducing confidence in cross-institution applicability. Reproducibility also remains limited, with relatively few models sharing code or datasets. Data access restrictions and privacy considerations contribute to these limitations, but the low level of code sharing indicates additional barriers to transparency.
Methodological quality varies, with scores ranging from moderate to high adherence to established criteria. Common strengths include adequate sample sizes, use of feature engineering and reporting of evaluation metrics. However, gaps persist in areas such as explainability, external validation and detailed reporting of model parameters.
Retrospective study designs dominate, limiting the ability to establish causal relationships and introducing potential biases linked to clinical workflows and documentation practices.
Machine learning models for sepsis prediction demonstrate increasing use of explainability techniques and strong reliance on electronic health record data. However, model outputs remain largely driven by readily available variables rather than clinically specific biomarkers. This creates a disconnect between predictive performance and biological relevance. Limitations in data completeness, feature selection and reproducibility further constrain clinical applicability. More comprehensive datasets and prospective designs are required to align predictive models with pathophysiology and support meaningful integration into clinical decision-making.
Source: International Journal of Medical Informatics
References:
Papapanagiotou I, Karalis A, Kokkoris S et al. (2026) Machine learning for early detection and prediction of sepsis: explainability and key sepsis biomarkers representation—A systematic review. International Journal of Medical Informatics; 214: 106420.