Artificial intelligence systems in healthcare offer the promise of improving clinical outcomes, from predicting mortality to optimising hospital workflows. However, a major barrier to the effective deployment of these models is performance degradation due to data shifts. These shifts occur when the data a model sees in practice differ significantly from the data it was trained on. In healthcare, such shifts can arise from changes in patient demographics, hospital processes or external events like pandemics, and they risk introducing harmful biases or reducing prediction accuracy. 

 

A recent study involving seven hospitals in Toronto developed a robust, label-agnostic monitoring pipeline to detect and remediate these shifts in real time. By combining this pipeline with transfer learning and continual learning strategies, the study provides a framework for maintaining model performance and ensuring the safe integration of AI into clinical settings. 

 

Detecting Data Shifts in Hospital Environments 

To understand how data shifts affect clinical AI, researchers analysed electronic health records from over 143,000 patients admitted to general internal medicine wards across seven hospitals. The dataset covered a ten-year period and included both academic and community hospitals. The team developed a label-agnostic monitoring pipeline capable of detecting harmful shifts in real time. The approach paired a black box shift estimator with maximum mean discrepancy testing, enabling shifts to be identified without outcome labels. The detected shifts were driven by various factors, including hospital type, demographic variables, admission sources and laboratory assay changes. Notably, younger age groups, admissions from nursing homes or acute care, and transitions between hospital types showed significant distributional changes. Temporal shifts linked to the COVID-19 pandemic and laboratory test upgrades, such as the move to high-sensitivity troponin assays, also affected the data distribution. 
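
As a concrete illustration of the testing component, the sketch below implements a maximum mean discrepancy (MMD) permutation test in Python on synthetic feature windows. In the study's black box shift estimator setup, the test is applied to the outputs of the deployed model rather than to raw features, and the kernel bandwidth, window sizes and sample counts here are placeholder assumptions rather than values from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def mmd_statistic(X, Y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (
        rbf_kernel(X, X, gamma).mean()
        + rbf_kernel(Y, Y, gamma).mean()
        - 2.0 * rbf_kernel(X, Y, gamma).mean()
    )

def mmd_permutation_test(source, target, gamma=1.0, n_permutations=500, seed=0):
    """p-value for H0: source and target come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd_statistic(source, target, gamma)
    pooled = np.vstack([source, target])
    n = len(source)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        stat = mmd_statistic(pooled[perm[:n]], pooled[perm[n:]], gamma)
        if stat >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)

# Example: compare a reference window with a live deployment window.
if __name__ == "__main__":
    rng = np.random.default_rng(42)
    reference = rng.normal(0.0, 1.0, size=(200, 10))  # training-era data
    live = rng.normal(0.3, 1.0, size=(200, 10))       # shifted live data
    p = mmd_permutation_test(reference, live)
    print(f"MMD test p-value: {p:.3f}")  # small p => flag a data shift
```

A small p-value flags that the live window is unlikely to come from the same distribution as the reference window, without any outcome labels being observed.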

 

The pipeline proved especially useful in simulating clinical deployment scenarios. For example, when the model was trained in academic hospitals and evaluated in community hospitals, performance degraded unless adaptations were made. Similarly, subtle shifts linked to fewer brain natriuretic peptide (BNP) or D-dimer tests resulted in measurable drops in predictive accuracy for subgroups such as patients with respiratory or neurological disorders. This monitoring system was critical in identifying when and where performance dips were likely to occur, offering a foundation for targeted interventions. 
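
To make the sliced-evaluation idea concrete, the hedged sketch below trains a classifier on synthetic stand-ins for academic-hospital data, scores it on a shifted community-hospital sample, and compares overall AUROC against AUROC within one subgroup. The data, model choice and subgroup flag are illustrative assumptions, not the study's actual cohorts or features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

# Hypothetical stand-ins: features, outcomes, and a subgroup flag
# (e.g. respiratory vs. other diagnoses) at two hospital types.
X_train, y_train = rng.normal(size=(2000, 15)), rng.integers(0, 2, 2000)
X_eval = rng.normal(0.2, 1.0, size=(500, 15))  # shifted evaluation site
y_eval = rng.integers(0, 2, 500)
subgroup = rng.integers(0, 2, 500).astype(bool)  # True = respiratory cohort

# Train at the "academic" site, evaluate at the "community" site.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
scores = model.predict_proba(X_eval)[:, 1]

# Overall vs. subgroup AUROC: a gap here localises the performance dip.
print(f"overall AUROC:  {roc_auc_score(y_eval, scores):.2f}")
print(f"subgroup AUROC: {roc_auc_score(y_eval[subgroup], scores[subgroup]):.2f}")
```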

 

Mitigating Model Degradation with Transfer and Continual Learning 

Once data shifts were identified, the study evaluated methods to restore or maintain model performance. Transfer learning emerged as a key strategy. By pretraining a model on data from one hospital type and fine-tuning it for another, researchers improved model accuracy across sites. Specifically, community hospitals benefited the most when models were pretrained on their own data or fine-tuned using community-specific information. This finding highlighted that generalising across hospital types is not straightforward; differences in care settings and patient populations mean that models must be adapted thoughtfully. Training on data from all sites improved overall performance, but the benefits were not uniform across subgroups. 
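
The pretrain-then-fine-tune pattern is compact to express in code. The sketch below uses scikit-learn's warm_start mechanism to continue training a network on a second site's data after pretraining on the first; the architecture, cohort sizes and iteration budgets are illustrative assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical feature matrices and outcome labels for two hospital types.
X_academic, y_academic = rng.normal(size=(5000, 20)), rng.integers(0, 2, 5000)
X_community, y_community = rng.normal(size=(800, 20)), rng.integers(0, 2, 800)

# Step 1: pretrain on the larger academic-hospital cohort.
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=50,
                      warm_start=True, random_state=0)
model.fit(X_academic, y_academic)

# Step 2: fine-tune on community data; warm_start=True keeps the learned
# weights, so this fit() call adapts the model rather than restarting.
model.max_iter = 20  # shorter fine-tuning budget to limit forgetting
model.fit(X_community, y_community)

print(f"Community-site accuracy: {model.score(X_community, y_community):.2f}")
```

Keeping the fine-tuning budget short is one simple guard against overwriting what the model learned during pretraining.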

 

Continual learning, triggered by data drift, offered another layer of adaptability. Instead of retraining models at fixed intervals, updates were triggered when the system detected significant shifts in incoming data. This drift-triggered updating strategy proved more effective than maintaining static models, especially during periods of disruption such as the COVID-19 pandemic. The best results were achieved with models that updated every 120 days using recent data from a 60-day window. Selective updating using only positively or correctly predicted encounters was less effective than using all available encounters. Importantly, updating too frequently or with overly long training periods introduced risks such as overfitting or forgetting previous patterns. The approach balanced responsiveness with model stability, ensuring continued performance without sacrificing generalisability. 
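
In pseudocode terms, drift-triggered updating reduces to a small loop. The skeleton below is a sketch rather than the study's implementation: it checks a label-agnostic drift test on a 120-day cadence and, when the test fires, retrains on a trailing 60-day window. detect_shift, retrain and the stream format are hypothetical placeholders.

```python
from datetime import timedelta
import numpy as np

EVAL_EVERY = timedelta(days=120)   # update cadence reported as best
TRAIN_WINDOW = timedelta(days=60)  # recent-data window used for updates
ALPHA = 0.05                       # drift-test significance threshold

def drift_triggered_update(model, stream, detect_shift, retrain):
    """Walk a time-ordered encounter stream; retrain when drift is detected.

    `stream` yields (timestamp, features, label) tuples; `detect_shift` is
    a label-agnostic test returning a p-value for the supplied window;
    `retrain` refits the model on a window of encounters. All four
    arguments are placeholders for study-specific components.
    """
    buffer = []      # recent encounters, trimmed to TRAIN_WINDOW
    last_eval = None
    for ts, x, y in stream:
        buffer.append((ts, x, y))
        buffer = [(t, xs, ys) for t, xs, ys in buffer
                  if ts - t <= TRAIN_WINDOW]
        if last_eval is None:
            last_eval = ts
        if ts - last_eval >= EVAL_EVERY:
            recent = np.array([xs for _, xs, _ in buffer])
            if detect_shift(recent) < ALPHA:
                retrain(model, buffer)  # update on the 60-day window only
            last_eval = ts
    return model
```

In practice, detect_shift would wrap a test like the MMD procedure above, and retrain would refit the production model under whatever clinical governance process applies.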

 

Implications for Clinical Practice and Future Research 

The proactive approach outlined in this study addresses key challenges in deploying clinical AI safely. Many existing models are evaluated in controlled environments but fail to perform when exposed to the dynamic nature of real-world clinical settings. The study’s label-agnostic monitoring system removes the dependency on immediate outcome labels, a common bottleneck in healthcare. It enables real-time surveillance and supports timely interventions before performance drops result in clinical harm. Moreover, by integrating continual learning and transfer learning, the strategy allows for ongoing model optimisation without starting from scratch each time data changes. 

 

However, the work also underscores several limitations and areas for further research. While the framework demonstrated robustness across multiple hospitals within one region, its generalisability to other healthcare systems with different demographics or data structures remains to be tested. The pipeline relies on standard imputation techniques; more sophisticated methods such as multiple imputation by chained equations (MICE) could enhance accuracy. Additionally, integrating social determinants of health could improve fairness assessments across populations. Finally, the regulatory landscape surrounding adaptive clinical AI remains underdeveloped, and guidance is needed to ensure safe, ethical updates to deployed systems. 
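
On the imputation point, a MICE-style procedure is available off the shelf: scikit-learn's IterativeImputer implements chained-equations imputation. The fragment below applies it to a hypothetical lab-value matrix with missing entries; the data and settings are purely illustrative.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
labs = rng.normal(size=(100, 5))             # hypothetical lab-value matrix
labs[rng.random(labs.shape) < 0.2] = np.nan  # ~20% of entries missing

# sample_posterior=True draws each imputation from the predictive
# distribution, the chained-equations behaviour MICE is known for.
imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=0)
labs_imputed = imputer.fit_transform(labs)
```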

 

The responsible deployment of clinical AI systems requires not only technical excellence but also resilience to the realities of evolving healthcare environments. The study presents a comprehensive solution to the problem of data shifts, showing that label-agnostic monitoring, paired with adaptive learning strategies, can maintain model performance and reduce potential harm. By focusing on deployment-readiness and proactive evaluation, the framework bridges a critical gap between development and practice. Such approaches will be essential if clinical AI technologies are to deliver on their promise of improving patient care across diverse and changing healthcare settings. 

 

Source: JAMA Network Open 

Image Credit: iStock


References:

Subasri V, Krishnan A, Kore A et al. (2025) Detecting and Remediating Harmful Data Shifts for the Responsible Deployment of Clinical AI Models. JAMA Netw Open, 8(6):e2513685. 


