Long-term success after kidney transplantation depends on recognising evolving risk as health trajectories change. Models based only on pre-transplant characteristics can miss complications that arise during follow-up, leading to underestimation or overestimation of risk. Researchers working with the Swiss Transplant Cohort Study developed a dynamic, interpretable machine learning approach that updates predictions annually using newly available clinical and laboratory data. The framework comprises a baseline model for the first post-transplant year and a follow-up model that recalibrates risk each subsequent year. Across more than a decade of follow-up, the approach improved discrimination for predicting graft loss and death compared with baseline-only methods, while preserving transparency on which variables most influenced individual predictions. The findings support more responsive, personalised risk assessment in routine transplant care.

 

Dynamic Modelling Enhances Predictive Accuracy

The framework combines two stages tailored to the availability of information over time. The baseline model estimates the risk of death or graft loss during the first year using only pre-transplant data, which is typically the extent of information available immediately after surgery. For each event-free year thereafter, the follow-up model incorporates baseline features alongside time-updated measurements such as laboratory values, rejection episodes and newly recorded comorbidities to estimate next-year risk. This annual, instance-based design aligns with real-world transplant follow-up and avoids propagating earlier risk scores into later predictions.
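
The annual, instance-based design can be illustrated with a minimal sketch. The record layout, the field names and the helper `build_instances` are hypothetical and are not taken from the study's pipeline. The sketch only shows how each event-free year could yield one row that pairs baseline features with that year's measurements and a next-year label, without carrying forward any earlier risk score.

```python
def build_instances(baseline, yearly, event_year, observed_years):
    """Turn one patient's history into one row per prediction point.

    baseline       : dict of pre-transplant features (hypothetical names)
    yearly         : list where yearly[k - 1] holds measurements taken at the end of year k
    event_year     : 1-based year of death or graft loss, or None if event-free
    observed_years : number of post-transplant years with known outcome status
    """
    rows = []
    # Baseline model: year-1 risk from pre-transplant data only.
    if observed_years >= 1:
        rows.append({"model": "baseline", "predict_year": 1, **baseline,
                     "label": int(event_year == 1)})
    # Follow-up model: one row per event-free year, predicting the next year.
    for k, measurements in enumerate(yearly, start=1):
        event_free = event_year is None or event_year > k
        if not event_free or k + 1 > observed_years:
            break  # no longer at risk, or the next year's outcome is unknown
        rows.append({"model": "follow_up", "predict_year": k + 1,
                     **baseline, **measurements,
                     "label": int(event_year == k + 1)})
    return rows


# Invented example patient: graft loss occurs during year 3.
rows = build_instances(
    baseline={"recipient_age": 54, "donor_age": 60},
    yearly=[{"egfr": 55}, {"egfr": 48}, {"egfr": 30}],
    event_year=3,
    observed_years=3,
)
for row in rows:
    print(row)
```

Each row is scored independently, so a prediction for year 3 depends on the measurements available at the end of year 2 rather than on the score issued a year earlier.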

 

Five machine learning approaches were evaluated within this structure: Logistic Regression, Support Vector Machine, Multilayer Perceptron, LightGBM and a tabular foundation model (TabPFN). Performance was assessed with nested cross-validation using the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC). Incorporating follow-up data consistently enhanced discrimination. For death, AUROC values rose from approximately 0.60–0.70 at baseline to 0.75–0.80 or higher with follow-up integration. For graft loss, baseline AUROCs were generally below 0.70 but exceeded 0.80 once longitudinal signals were included. AUPRC, which is particularly informative for low-incidence outcomes, nearly doubled for some models when time-varying data were used.
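
For readers less familiar with the two metrics, the following sketch computes them from first principles on invented toy data. It is not the study's evaluation code and does not reproduce the nested cross-validation. The AUPRC here is the step-wise "average precision" estimate and assumes no tied scores. A useful reference point is that a model ranking patients at random has an expected AUROC of 0.5 but an expected AUPRC roughly equal to the event rate, which is why AUPRC values that look small can still represent a large gain for rare outcomes.

```python
def auroc(labels, scores):
    """Probability that a random event is scored above a random non-event.

    Ties between an event and a non-event count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes distinct scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each event
    return total / n_pos


# Invented toy data: 10 patients, 2 events (event rate 0.2).
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.35, 0.12, 0.80]

print("AUROC:", auroc(labels, scores))
print("AUPRC:", average_precision(labels, scores))
print("Event rate (random-ranking AUPRC):", sum(labels) / len(labels))
```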

 

LightGBM delivered strong overall results with practical advantages for implementation. In the follow-up setting it achieved AUROC values up to 0.896 for graft loss and 0.797 for death, with competitive AUPRC, supporting reliable ranking of at-risk patients. Year-by-year evaluation showed rising discrimination as more longitudinal data accrued. For death, AUROC increased from 0.64 in year 1 to the high 0.80s in later years, while AUPRC rose from just over 0.04 to peaks between 0.20 and 0.30. For graft loss, AUROC began at 0.68 and peaked around 0.95 by year 8, with AUPRC increasing from about 0.08 to around 0.49 by year 6. Some variability appeared in later years due to smaller at-risk cohorts, but overall trends supported continuous risk monitoring.

 

Key Predictors and Interpretability

Interpretability was addressed through SHAP (SHapley Additive exPlanations) analyses, providing insight into how individual variables influenced predictions. For death prediction, recipient age, estimated glomerular filtration rate (eGFR) and an indicator of cardiopulmonary disease were among the most impactful contributors, with additional signals from blood pressure, lipid measures and other markers of metabolic health. These results underline the combined importance of baseline characteristics and accumulating follow-up information when estimating mortality risk.
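
The idea behind SHAP can be shown with a small, self-contained sketch that computes exact Shapley values by enumerating feature coalitions. The risk function `toy_risk`, its coefficients, the example patient and the reference values are all invented for illustration and have no connection to the study's fitted models. Absent features are replaced by a single reference value, which is a simplification; practical SHAP tooling relies on faster algorithms and a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, reference):
    """Exact Shapley attribution of model(x) - model(reference) to each feature."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the patient's value, the rest the reference.
        inputs = {f: (x[f] if f in coalition else reference[f]) for f in names}
        return model(inputs)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(v):
    """Hypothetical risk score with one interaction term (illustration only)."""
    return (0.02 * (v["age"] - 50)
            - 0.03 * (v["egfr"] - 60)
            + 0.5 * v["cardiopulmonary"]
            + 0.01 * v["cardiopulmonary"] * (v["age"] - 50))


patient = {"age": 68, "egfr": 35, "cardiopulmonary": 1}
reference = {"age": 50, "egfr": 60, "cardiopulmonary": 0}

phi = shapley_values(toy_risk, patient, reference)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the property that lets an individual prediction be broken down variable by variable.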

 

For graft loss, renal function measures dominated. eGFR, creatinine and proteinuria were consistently the strongest predictors, reflecting the centrality of real-time kidney function in short-term graft viability. Donor-related factors, including donor age and cold ischaemia time, and markers of chronic kidney disease progression added further prognostic value. Analyses suggested that the ranking of key variables remained broadly stable over time, highlighting the persistent relevance of kidney function and systemic health markers across the follow-up horizon.

 

The explanatory outputs support clinical decision-making by clarifying why a particular patient’s risk estimate is high or low at a given time point. Because the models leverage familiar clinical variables rather than obscure or untested biomarkers, the interpretability reduces barriers to adoption and aligns with existing transplant monitoring workflows.

 

Consistency Across Subgroups and Clinical Utility

Robustness across patient categories was examined through subgroup analyses in the follow-up period. For death prediction, AUROC values remained in a consistent range of 0.72–0.89 across subgroups defined by variables such as sex, donor type, blood group, ethnicity and transplant centre, with lower but stable AUPRC values reflecting the rarity of events. For graft loss, AUROC typically ranged from 0.85 to 0.90 with AUPRC values around 0.25–0.40. Targeted evaluations within donor categories showed comparable results between related and unrelated living donors and between deceased donor subtypes, supporting generalisability across common clinical scenarios.

 

Translating continuous risk scores into actionable decisions requires selecting operating points that fit local priorities and resources. The team examined recall-driven thresholds to reduce missed events in critical outcomes. For graft loss, choosing a recall of 60% yielded a precision of 15.8% with specificity above 95%. Increasing recall to 80% captured more true events but reduced precision to about 4.7%, illustrating the expected trade-off in low-incidence settings. Extending prediction windows to 3, 5 or 10 years increased the event rate and improved precision at similar recall levels, though the primary focus remained next-year prediction to match routine follow-up intervals. Routine performance monitoring and recalibration were recommended to address potential subgroup differences and to maintain alignment with evolving patient populations.
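
The recall-driven choice of an operating point can be sketched as follows. The data are invented and the helper names are hypothetical; the sketch simply picks the highest score threshold that still flags the required share of true events, then reports the precision and specificity that follow from it. With tied scores the achieved recall may exceed the target.

```python
from math import ceil


def threshold_for_recall(labels, scores, target_recall):
    """Highest threshold at which recall reaches target_recall (flag if score >= threshold)."""
    event_scores = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    # Number of events that must be flagged; rounding guards against float noise.
    needed = max(1, ceil(round(target_recall * len(event_scores), 9)))
    return event_scores[needed - 1]


def operating_point(labels, scores, threshold):
    """Recall, precision and specificity when flagging scores >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return {"recall": tp / (tp + fn),
            "precision": tp / (tp + fp),
            "specificity": tn / (tn + fp)}


# Invented toy cohort: 4 events among 20 patients.
event_scores = [0.90, 0.60, 0.30, 0.10]
non_event_scores = [0.70, 0.50, 0.45, 0.40, 0.35, 0.25, 0.20, 0.18,
                    0.15, 0.12, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03]
labels = [1] * len(event_scores) + [0] * len(non_event_scores)
scores = event_scores + non_event_scores

for target in (0.50, 0.75):
    t = threshold_for_recall(labels, scores, target)
    print(target, t, operating_point(labels, scores, t))
```

Raising the recall target lowers the threshold, so more non-events are flagged and precision falls, which is the same trade-off reported for graft loss above.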

 

An interpretable, continuously updated machine learning framework improved the prediction of death and graft loss after kidney transplantation compared with static, baseline-only approaches. By integrating annual clinical and laboratory updates, the model captured changing risk profiles, identified consistently important predictors such as renal function and comorbidities, and delivered robust performance across patient subgroups. LightGBM’s combination of discrimination and interpretability supports practical deployment. Because it aligns with annual transplant follow-up, the next-year prediction design offers a feasible way to prioritise monitoring and interventions for patients at elevated risk, while allowing centres to select thresholds suited to their capacity and risk tolerance. This approach provides a foundation for broader validation and integration into clinical decision support to enhance personalised transplant care.

 

Source: npj Digital Medicine


References:

Fan B, Schürch M, Tian Y et al. (2025) Enhancing post-kidney transplant prognostication: an interpretable machine learning approach for longitudinal outcome prediction. npj Digit Med; 8, 684.


