Whole Slide Images (WSIs) play a central role in digital pathology by providing detailed visual information that supports diagnosis, subtyping and treatment planning. Their size and complexity have encouraged the use of deep learning methods that analyse slides as collections of smaller image tiles. Many existing models are trained on a single tumour type, which can limit generalisability and increase the risk of overfitting when datasets are small. Combining cohorts from different cancers offers a potential solution by expanding training data and capturing broader morphological variation. At the same time, differences between cancer types introduce challenges related to imbalance and representation learning, making multi-cancer modelling difficult without dedicated architectural and training strategies.
Challenges in Multi-Cancer Representation Learning
Training across multiple cancer cohorts introduces diversity that can strengthen feature learning but also complicates optimisation. Differences in cohort size, subtype distribution and morphological characteristics can create imbalance at several levels, including cancer type, slide count and tile representation. Such imbalance can influence how models learn decision boundaries and may lead to biased predictions.
Another difficulty arises from representation misalignment. Tile encoders trained on histopathology images often preserve strong cancer-type information within learned features. When cohorts are combined without additional safeguards, models may rely on cancer identity as a shortcut for prediction rather than learning morphology directly associated with classification targets such as molecular subtype. This behaviour can reduce performance when the relationship between cancer type and classification label varies across cohorts.
Independent training for each cancer avoids some of these issues but requires maintaining multiple models and does not benefit from shared morphological information. Simple joint training reduces this operational burden but may degrade performance across cohorts. Addressing these limitations requires approaches that capture both shared and cancer-specific patterns while reducing bias introduced by cohort heterogeneity.
Cancer-Aware Attention and Debiasing Strategy
A proposed multi-cancer framework introduces a Cancer-Aware Attention module integrated into a Vision Transformer tile encoder, forming a Cancer-Aware Vision Transformer architecture. The design separates attention queries into shared and cancer-specific components while maintaining common key and value projections. This structure allows the model to learn global morphological patterns across datasets while also capturing cancer-specific characteristics when relevant.
A learnable query-attention mechanism assigns token-level importance to shared and cancer-specific queries, enabling flexible emphasis depending on image context. Cancer-specific query projections are updated using gradients from their corresponding cohorts, while the shared query representation is updated using all available samples. The additional computational requirement is described as limited, with overhead mainly related to storing cancer-specific parameters.
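The described mechanism can be illustrated with a minimal single-head attention sketch in numpy: each token's query is a learned, token-level mixture of a shared query projection and a cancer-specific one, while the key and value projections are common to all cohorts. The parameter names, the sigmoid gate, and all dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 16                      # embedding dimension (illustrative)
n_tokens, n_cancers = 8, 3

# Shared key/value projections, a shared query projection, and one
# query projection per cancer type (all hypothetical parameter names).
W_k = rng.normal(0, 0.1, (d, d))
W_v = rng.normal(0, 0.1, (d, d))
W_q_shared = rng.normal(0, 0.1, (d, d))
W_q_cancer = rng.normal(0, 0.1, (n_cancers, d, d))
w_gate = rng.normal(0, 0.1, d)   # scores each token's query mixing weight

def cancer_aware_attention(x, cancer_id):
    """x: (n_tokens, d) tile tokens; cancer_id selects the specific query."""
    q_shared = x @ W_q_shared
    q_spec = x @ W_q_cancer[cancer_id]
    # Token-level gate: how much each token leans on the shared query.
    alpha = 1 / (1 + np.exp(-(x @ w_gate)))           # (n_tokens,)
    q = alpha[:, None] * q_shared + (1 - alpha)[:, None] * q_spec
    k, v = x @ W_k, x @ W_v
    attn = softmax(q @ k.T / np.sqrt(d))
    return attn @ v

x = rng.normal(size=(n_tokens, d))
out = cancer_aware_attention(x, cancer_id=1)
print(out.shape)   # (8, 16)
```

Only the query path branches per cancer, which matches the reported observation that the extra cost is mostly the storage of cancer-specific parameters: gradients for `W_q_cancer[c]` flow only from cohort `c`, while all samples update the shared weights.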
To reduce cancer-type shortcuts in slide-level representations, the framework incorporates adversarial mutual-information minimisation during training. Mutual information between slide representations and cancer-type associations is estimated using a smoothed lower-bound estimator implemented through a score network. Training alternates between improving this estimator and optimising the classification objective combined with a weighted mutual-information penalty. Evidence from preliminary analysis indicates that mutual-information minimisation reduces the ability to predict cancer type from learned representations while improving downstream classification performance.
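The alternating objective can be sketched with a Donsker-Varadhan-style lower bound, a common choice for score-network MI estimators (the paper's smoothed variant may differ): a small network T(z, c) scores paired versus shuffled (representation, cancer-type) samples, the estimator ascends the bound, and the encoder descends a weighted copy of it. The two-layer MLP and all sizes here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def score(z, c, W1, W2):
    """Tiny score network T(z, c): concatenate the slide representation
    with a one-hot cancer code, then a two-layer MLP (hypothetical)."""
    h = np.tanh(np.concatenate([z, c], axis=-1) @ W1)
    return h @ W2   # one scalar score per sample

def mi_lower_bound(Z, C, W1, W2):
    """Donsker-Varadhan-style bound: E_joint[T] - log E_marginal[e^T].
    The marginal term shuffles cancer codes to break the pairing."""
    t_joint = score(Z, C, W1, W2)
    C_shuffled = C[rng.permutation(len(C))]
    t_marg = score(Z, C_shuffled, W1, W2)
    return t_joint.mean() - np.log(np.exp(t_marg).mean())

n, d, n_cancers, hidden = 64, 8, 3, 16
Z = rng.normal(size=(n, d))                    # slide representations
labels = rng.integers(0, n_cancers, size=n)
C = np.eye(n_cancers)[labels]                  # one-hot cancer types
W1 = rng.normal(0, 0.1, (d + n_cancers, hidden))
W2 = rng.normal(0, 0.1, (hidden,))

mi_hat = mi_lower_bound(Z, C, W1, W2)
# Training would alternate: ascend mi_hat in (W1, W2), then descend
# classification_loss + lambda * mi_hat in the encoder parameters.
print(float(mi_hat))
```

For representations drawn independently of the cancer labels, as here, the bound hovers near zero; representations that encode cancer identity push it up, which is exactly the signal the encoder is penalised for.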
The framework also introduces hierarchical balancing across cancer types, slides and tiles. During encoder pretraining, weighting is distributed evenly across these levels. During multiple instance learning (MIL) training, weights are balanced across cancer-class combinations and then across slides within each combination. Sample weights are clipped to a limited statistical range, and batch normalisation is applied during training, supporting stable optimisation and reducing overrepresentation of dominant cohorts.
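A toy version of this hierarchical weighting: each cancer-class combination receives equal total weight, which is split evenly across its slides and then across each slide's tiles, with a final clip. The mean ± 3σ clip rule is an assumption standing in for the unspecified "limited statistical range".

```python
import numpy as np
from collections import Counter

def hierarchical_weights(cancer_class, slide_ids, clip_sigma=3.0):
    """Toy hierarchical balancing over tiles: equal total weight per
    cancer-class combination, split evenly across the slides in that
    combination, then across each slide's tiles; finally clipped
    around the mean (clip rule is an assumption)."""
    tiles_per_slide = Counter(slide_ids)
    slides_per_combo = {}
    for cc, sid in set(zip(cancer_class, slide_ids)):
        slides_per_combo.setdefault(cc, set()).add(sid)
    w = np.array([
        1.0 / (len(slides_per_combo[cc]) * tiles_per_slide[sid])
        for cc, sid in zip(cancer_class, slide_ids)
    ])
    w *= len(w) / w.sum()                  # normalise to mean weight 1
    lo = max(w.mean() - clip_sigma * w.std(), 0.0)
    return np.clip(w, lo, w.mean() + clip_sigma * w.std())

# Six colorectal-MSI tiles across two slides vs two stomach-MSI tiles.
cancer_class = ["CRC_MSI"] * 6 + ["STAD_MSI"] * 2
slide_ids    = ["s1"] * 4 + ["s2"] * 2 + ["s3"] * 2
w = hierarchical_weights(cancer_class, slide_ids)
print(w.round(2))   # tiles from smaller slides and cohorts weigh more
```

In this example the two cancer-class combinations end up with equal total weight despite contributing six and two tiles, which is the overrepresentation correction the text describes.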
Evaluation Across Multi-Cancer Cohorts
Evaluation uses a multi-cancer dataset derived from TCGA cohorts, including colorectal adenocarcinoma, stomach adenocarcinoma and endometrial carcinoma. Two classification tasks are examined: microsatellite stable (MSS) versus microsatellite instable (MSI), and genomically stable (GS) versus chromosomal instability (CIN). Slides are H&E-stained diagnostic images that undergo tissue segmentation followed by tiling into fixed-size patches at a standard magnification level. Slides that yield too few tissue tiles are excluded from training.
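The preprocessing steps can be sketched on a toy tissue mask: slide the tile grid over the segmentation mask, keep tiles whose tissue fraction clears a threshold, and drop slides that yield too few tiles. Tile size and both thresholds are illustrative, not the paper's values.

```python
import numpy as np

def extract_tiles(mask, tile=4, min_tissue=0.5, min_tiles=3):
    """Toy tiling: slide-level tissue mask -> list of tile coordinates.
    Keeps tiles whose tissue fraction is at least `min_tissue`; returns
    None when the slide yields fewer than `min_tiles` tiles."""
    H, W = mask.shape
    coords = []
    for y in range(0, H - tile + 1, tile):
        for x in range(0, W - tile + 1, tile):
            if mask[y:y + tile, x:x + tile].mean() >= min_tissue:
                coords.append((y, x))
    return coords if len(coords) >= min_tiles else None

mask = np.zeros((8, 8))
mask[:, :4] = 1.0      # left half is tissue
mask[:4, 4:] = 1.0     # plus the top-right quadrant
print(extract_tiles(mask))   # [(0, 0), (0, 4), (4, 0)]
```

An all-background mask returns `None`, mirroring the exclusion of slides with insufficient tissue.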
Training uses nested cross-validation with patient-level stratification, with validation sets sampled from the training folds. Model selection is based on patient-level AUC, with balanced accuracy as a secondary measure. Ensemble bagging of top-performing models is applied within each configuration.
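The key constraint, that all slides from one patient stay in the same fold and that folds keep the label mix, can be sketched as follows; the round-robin split rule is an assumption standing in for the paper's exact procedure, and the full nested loop (inner validation sampled from the remaining training patients) is omitted.

```python
import random
from collections import defaultdict

def patient_level_folds(patients, labels, k=3, seed=0):
    """Toy patient-level stratified split: patients (not slides) are
    shuffled and dealt round-robin into k folds within each label
    group, so every slide from a patient shares that patient's fold."""
    rng = random.Random(seed)
    patient_label = dict(zip(patients, labels))   # one label per patient
    by_label = defaultdict(list)
    for p, y in patient_label.items():
        by_label[y].append(p)
    folds = defaultdict(list)
    for y, ps in sorted(by_label.items()):
        rng.shuffle(ps)
        for i, p in enumerate(ps):
            folds[i % k].append(p)
    return [sorted(folds[i]) for i in range(k)]

# Slides map to patients; each patient carries a single MSI/MSS label.
patients = ["p1", "p1", "p2", "p3", "p4", "p5", "p6"]
labels   = ["MSI", "MSI", "MSS", "MSS", "MSI", "MSS", "MSI"]
folds = patient_level_folds(patients, labels)
print(folds)   # three folds, each patient in exactly one of them
```

Splitting at the patient level prevents leakage of near-duplicate tissue between training and test sets, which is why model selection is also reported at patient-level AUC.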
Results indicate that naïve joint training across cancers can reduce performance compared with independent models in some cohorts, particularly for the MSS/MSI task. The cancer-aware framework improves AUC across colorectal, stomach and endometrial cohorts relative to both independent and joint baselines. For GS/CIN classification, improvements are observed across cohorts, with particularly strong performance reported for endometrial carcinoma.
Experiments across multiple MIL backbones show consistent gains when the cancer-aware encoder is used. Ablation analysis demonstrates that cancer-aware pretraining contributes to improved results compared with a standard Vision Transformer encoder. Comparisons with several foundation encoders indicate competitive performance despite a smaller embedding dimension. Sensitivity analysis shows that the optimal mutual-information loss weight differs between the two classification tasks, and hierarchical balancing outperforms alternative sampling and weighting strategies.
Multi-cancer training of histopathology models can improve generalisation but introduces challenges related to imbalance, representation bias and cohort heterogeneity. A framework combining cancer-aware attention, adversarial mutual-information minimisation and hierarchical balancing demonstrates improved performance across TCGA colorectal, stomach and endometrial cohorts for MSS/MSI and GS/CIN classification tasks. The approach reduces reliance on cancer-type signals while preserving shared morphological learning across datasets. Reported limitations include dependence on TCGA-based cohorts and the need for external validation, as well as opportunities to refine weighting strategies and interpretability analysis of cancer-aware attention mechanisms.
Source: Medical Image Analysis
Image Credit: iStock