Colorectal cancer remains a major global health burden, with histopathological review of haematoxylin and eosin-stained whole-slide images forming the diagnostic backbone for tumour detection, grading and subtype assessment. The rapid growth in slide volumes across multi-centre cohorts has increased pressure on pathology services while exposing the limitations of conventional computational pathology tools. Many existing models struggle when applied outside their training domain, often producing overconfident predictions and relying on spurious correlations linked to staining protocols or scanner characteristics. These weaknesses constrain clinical trust and complicate deployment across heterogeneous healthcare settings. A recently developed uncertainty-aware and causally adaptive foundation model addresses these challenges by combining principled uncertainty estimation with test-time adaptation strategies designed to improve robustness and interpretability in colorectal cancer pathology.

 


 

Performance Across Classification and Segmentation Tasks

The proposed framework is built on a pathology foundation backbone trained on large-scale histopathology data and evaluated across several public colorectal cancer datasets. In classification tasks covering tumour versus normal discrimination and subtype prediction, the model demonstrates consistently higher discriminative performance than existing foundation approaches. Improvements are observed not only in aggregate accuracy metrics but also in stability across difficult borderline cases, where conventional models often misclassify mildly atypical normal tissue as malignant. By integrating uncertainty-aware prediction heads, the model avoids excessive confidence in such cases and produces probability estimates that better reflect underlying ambiguity.

 

Beyond slide-level and patch-level classification, the framework also supports fine-grained gland segmentation, a task closely linked to colorectal cancer grading. On datasets with pixel-level gland annotations, the model preserves glandular contours with greater continuity and fewer false detections in surrounding stroma. Irregular or small glands, which are frequently under-segmented by baseline methods, are more consistently delineated. These gains indicate that the combination of uncertainty modelling and causal adaptation benefits not only high-level diagnostic decisions but also detailed morphological analysis that underpins downstream clinical assessments.

 

Uncertainty Quantification and Clinical Confidence

A defining feature of the model is its explicit decomposition of uncertainty into epistemic and aleatoric components. Epistemic uncertainty reflects limited model knowledge, often arising from distribution shifts or insufficient representation in training data, while aleatoric uncertainty captures irreducible noise related to tissue heterogeneity, staining variability or acquisition artefacts. By modelling both sources separately and combining them into a total uncertainty measure, the framework produces calibrated confidence estimates aligned with actual predictive accuracy.
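The article does not say which estimator the model uses for this decomposition. As a rough illustration only, the sketch below uses one widely used entropy-based split over several stochastic predictions for the same patch: total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the individual predictions, and the epistemic part is the difference between the two. The function names and the toy inputs are assumptions made for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(q * math.log(q) for q in probs if q > 0.0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for a single input.

    member_probs: list of class-probability vectors, one per ensemble
    member or stochastic forward pass (a hypothetical setup, not
    necessarily the one used in the paper).
    Returns (total, aleatoric, epistemic).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n for c in range(k)]
    total = entropy(mean)                                  # entropy of the averaged prediction
    aleatoric = sum(entropy(p) for p in member_probs) / n  # average per-member entropy
    epistemic = total - aleatoric                          # disagreement between members
    return total, aleatoric, epistemic


# Members agree that the patch is ambiguous: uncertainty is mostly aleatoric.
agree = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
# Members are individually confident but disagree: mostly epistemic.
disagree = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99]]

for name, preds in (("agree", agree), ("disagree", disagree)):
    print(name, decompose_uncertainty(preds))
```

In this toy setting both inputs have the same total uncertainty, yet only the second would be flagged as a likely out-of-distribution case, which is the distinction the decomposition is meant to expose.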

 

Quantitative evaluation shows substantially improved calibration compared with established uncertainty baselines, with lower expected calibration error and improved probabilistic scores. Reliability analyses indicate that confidence levels closely track observed accuracy across the prediction range, reducing the prevalence of high-confidence errors. Patch-level uncertainty maps further illustrate how epistemic uncertainty highlights out-of-distribution morphologies such as mucinous regions or atypical glandular patterns, while aleatoric uncertainty correlates with staining noise and technical artefacts. This spatial correspondence between uncertainty and diagnostically ambiguous regions mirrors pathologist intuition and supports safer human–AI collaboration.
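For readers unfamiliar with the metric, expected calibration error groups predictions into confidence bins and averages the gap between mean confidence and observed accuracy in each bin, weighted by bin size. The sketch below is a minimal version with equal-width bins; the bin count and the example values are assumptions, and the paper's exact evaluation protocol is not given in this article.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin expected calibration error.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: 1 (or True) where the prediction was right, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Overconfident toy model: claims 95% confidence but is right 3 times out of 4.
overconfident = expected_calibration_error([0.95] * 4, [1, 1, 1, 0])
# Calibrated toy model: claims 75% confidence and is right 3 times out of 4.
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
print(overconfident, calibrated)
```

A lower value means that stated confidence tracks observed accuracy more closely, which is the property the reliability analyses describe.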

 

Simulated deferral scenarios provide additional insight into clinical utility. When the most uncertain cases are flagged for expert review, overall diagnostic accuracy improves and high-confidence errors decrease. Although such simulations do not replace reader studies, they illustrate how calibrated uncertainty can function as a triage mechanism, directing human attention to cases where automated predictions are least reliable.
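A deferral simulation of this kind can be sketched in a few lines: rank cases by uncertainty, hand the most uncertain fraction to a human reader, and measure accuracy on the cases the model keeps. The function name, the deferral fraction and the toy values below are assumptions for illustration and do not reproduce the paper's protocol.

```python
def accuracy_after_deferral(uncertainty, correct, defer_fraction):
    """Accuracy on the cases kept after deferring the most uncertain ones.

    uncertainty: one score per case (higher means less reliable).
    correct: 1 where the automated prediction was right, else 0.
    defer_fraction: share of cases sent for expert review, in [0, 1].
    Returns None when every case is deferred.
    """
    n = len(correct)
    order = sorted(range(n), key=lambda i: uncertainty[i])  # most certain first
    n_keep = n - int(round(defer_fraction * n))
    kept = order[:n_keep]
    if not kept:
        return None
    return sum(correct[i] for i in kept) / len(kept)


# Toy cohort of five cases; the single error is also the most uncertain case.
scores = [0.1, 0.9, 0.2, 0.8, 0.3]
hits = [1, 0, 1, 1, 1]

for fraction in (0.0, 0.2, 0.4):
    print(fraction, accuracy_after_deferral(scores, hits, fraction))
```

The benefit depends entirely on uncertainty being informative: if errors were spread evenly across the uncertainty range, deferring cases would reduce workload savings without improving accuracy on the retained set.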

 

Robustness Through Causal Test-Time Adaptation

Robust deployment across institutions requires models that can adapt to domain shifts without compromising stability. The framework addresses this requirement through causal test-time adaptation, which explicitly targets spurious correlations arising from non-biological factors such as staining intensity or scanner differences. By factorising representations into content-related and style-related components and applying causal interventions to the latter, the model encourages predictions that depend on biologically meaningful morphology rather than superficial appearance.
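How the model performs this factorisation is not described here. A common proxy in the style-transfer literature treats the per-channel mean and standard deviation of a feature map as "style" and the standardised values as "content"; the sketch below uses that proxy to show what an intervention on style alone looks like. All names and values are hypothetical.

```python
import math


def split_content_style(channel):
    """Factorise one feature channel into (content, style).

    Style is taken to be the channel's (mean, std); content is the
    standardised signal. This is an illustrative proxy, not the
    factorisation used in the paper.
    """
    n = len(channel)
    mean = sum(channel) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in channel) / n) + 1e-8
    content = [(x - mean) / std for x in channel]
    return content, (mean, std)


def intervene_on_style(channel, new_style):
    """Keep the content of `channel` but impose a different (mean, std)."""
    content, _ = split_content_style(channel)
    new_mean, new_std = new_style
    return [c * new_std + new_mean for c in content]


# Toy feature channel and a hypothetical "other scanner" style.
features = [1.0, 2.0, 3.0, 4.0]
restyled = intervene_on_style(features, (10.0, 2.0))

original_content, _ = split_content_style(features)
restyled_content, restyled_style = split_content_style(restyled)
print(restyled_style)
print(max(abs(a - b) for a, b in zip(original_content, restyled_content)))
```

Under this proxy, a predictor that reads only the content component sees essentially the same input before and after the intervention, which is the behaviour the causal adaptation is intended to encourage.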

 

When transferred from a source dataset to external centres with distinct acquisition characteristics, the model maintains higher accuracy than entropy-based adaptation methods and exhibits reduced performance variability across sites. Visual analyses show improved alignment of feature representations across domains after adaptation, while intervention studies demonstrate suppression of stain-driven activations in favour of glandular and epithelial structures. Prediction variance under repeated interventions is markedly reduced, indicating more stable decision boundaries in unseen environments.
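Prediction variance under repeated interventions can be measured by perturbing only the appearance of an input many times and recording how much the output moves. The sketch below does this for two toy predictors, one that reads mean intensity and one that reads a standardised shape statistic; the affine "stain" perturbation, the predictors and the parameter ranges are all assumptions made for this example.

```python
import math
import random


def restyle(channel, gain, offset):
    """Hypothetical stain intervention: an affine change of intensity."""
    return [gain * x + offset for x in channel]


def prediction_variance(predict, channel, n_interventions=50, seed=0):
    """Variance of predict() over randomly sampled style interventions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_interventions):
        gain = rng.uniform(0.5, 1.5)
        offset = rng.uniform(-1.0, 1.0)
        preds.append(predict(restyle(channel, gain, offset)))
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def style_sensitive(channel):
    """Toy predictor that depends on overall intensity."""
    return sigmoid(sum(channel) / len(channel))


def style_invariant(channel):
    """Toy predictor that depends only on the standardised shape."""
    n = len(channel)
    mean = sum(channel) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in channel) / n) or 1.0
    return sigmoid(sum(((x - mean) / std) ** 3 for x in channel) / n)


patch = [0.2, 0.4, 0.9, 0.1]
var_sensitive = prediction_variance(style_sensitive, patch)
var_invariant = prediction_variance(style_invariant, patch)
print(var_sensitive, var_invariant)
```

The intensity-based predictor changes its output with every perturbation, whereas the shape-based one stays essentially constant; a low variance of this kind is what the article refers to as more stable decision boundaries.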

 

The adaptation process is designed for online use, allowing incremental updates as new slides are processed. Compared with entropy-only approaches, causal adaptation converges more rapidly and with fewer oscillations in accuracy and predictive entropy. This stability is particularly relevant for real-world workflows where slides arrive sequentially and acquisition conditions may change over time.
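To make the entropy-only baseline concrete, the sketch below adapts a single parameter of a toy binary classifier online, taking one gradient step on predictive entropy per incoming sample. It illustrates the baseline family only; the causal variant is not shown because the article gives no implementation details. The model form, parameter names and learning rate are assumptions.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli prediction, clipped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def adapt_online(stream, weight=1.0, shift=0.0, lr=0.5):
    """Entropy-only online adaptation of one hypothetical parameter.

    The toy model is p = sigmoid(weight * (x - shift)). Only `shift`
    is updated, with one gradient step per incoming feature value.
    Returns the final shift and the entropy recorded before each step.
    """
    history = []
    for x in stream:
        z = weight * (x - shift)
        p = 1.0 / (1.0 + math.exp(-z))
        history.append(binary_entropy(p))
        # dH/dz = -z * p * (1 - p) and dz/dshift = -weight
        grad = weight * z * p * (1.0 - p)
        shift -= lr * grad
    return shift, history


# Twenty identical feature values arriving one at a time.
final_shift, entropies = adapt_online([0.5] * 20)
print(final_shift, entropies[0], entropies[-1])
```

Note that this objective only sharpens whatever the model already leans towards, so it can reinforce an incorrect prediction as easily as a correct one; this is one reason entropy-only adaptation can be unstable when acquisition conditions drift.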

 

By unifying uncertainty decomposition, calibration and causal test-time adaptation within a foundation model architecture, the proposed framework addresses several persistent barriers to clinical adoption of computational pathology in colorectal cancer. Across classification, segmentation and cross-domain evaluation, the model demonstrates improved accuracy, better-calibrated confidence estimates and enhanced robustness to domain shift. Uncertainty maps and failure analyses further support interpretability and error localisation, reinforcing the potential for safe human–AI collaboration. While broader validation in prospective and geographically diverse cohorts remains necessary, the approach provides a coherent and practical pathway towards more reliable and trustworthy deployment of artificial intelligence in routine colorectal cancer pathology.

 

Source: npj Digital Medicine



References:

Lou S, Mo G, Zhang X et al. (2025) Uncertainty-aware and causal test-time adaptive foundation model for robust colorectal cancer pathology diagnosis. npj Digit Med: In Press.


