Early identification of patients who will not achieve complete response within three months after concurrent chemoradiotherapy is clinically relevant in locally advanced nasopharyngeal carcinoma, given the poorer outcomes reported for non-complete responders. MRI underpins diagnosis and staging, yet conventional radiomics can be constrained by time-consuming tumour annotation and hand-crafted feature pipelines. Deep learning can reduce manual feature engineering, but approaches for early response prediction have faced limitations in validation and workflow automation. Using pre-treatment multi-sequence MRI from two centres, researchers developed an end-to-end framework that automatically segments tumours, derives 2.5D imaging representations and applies a Transformer-based model, alongside a combined approach that integrates an imaging-derived feature with a single clinical predictor to support early response stratification.

 

Two-Centre Cohorts and Early Response Assessment

The work drew on two independent patient cohorts from separate hospitals. The main cohort included 128 eligible patients, divided into a training set of 89 and an internal validation set of 39. An external test cohort comprised 56 patients. Included patients had pathologically confirmed stage III–IVA squamous cell nasopharyngeal carcinoma staged under the 8th edition American Joint Committee on Cancer system, completed a full course of radical concurrent chemoradiotherapy and had multiparametric MRI available before and after treatment, alongside complete clinical data. Exclusions were applied for poor-quality imaging or missing sequences, prior malignancy or prior treatment for nasopharyngeal carcinoma and specified medical history factors including long-term steroid use or immune-related disease.

 

Early therapy response was assessed on MRI three months after treatment using RECIST 1.1 criteria. Two radiologists categorised outcomes as complete response (CR) or non-CR, with non-CR encompassing partial response, stable disease and progressive disease. Agreement between readers for response assessment was high, with Cohen’s κ reported at 0.93. Baseline characteristics were reported as not significantly different between the training and internal validation sets, supporting the use of these subsets for model development and internal evaluation.
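Cohen's κ, the agreement statistic quoted above, compares observed agreement between the two readers against the agreement expected by chance. A minimal sketch, using invented CR / non-CR calls for illustration (the study itself reported κ = 0.93 on its real reads):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(freq_a[lab] * freq_b[lab] for lab in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy reads for eight patients (not the study data).
reader_1 = ["CR", "CR", "non-CR", "CR", "non-CR", "CR", "CR", "non-CR"]
reader_2 = ["CR", "CR", "non-CR", "CR", "CR",     "CR", "CR", "non-CR"]
print(round(cohens_kappa(reader_1, reader_2), 3))  # → 0.714
```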

 


Automated Segmentation and 2.5D Transformer Feature Building

Multiparametric MRI sequences were used, including T1-weighted imaging, contrast-enhanced T1, T2-weighted imaging and diffusion-related sequences including ADC. Tumour regions were initially delineated on contrast-enhanced T1 images by two experienced radiation oncologists, with consensus procedures and senior review, and segmentation agreement was reported as high (Cohen’s κ = 0.86). Other sequences were aligned to the contrast-enhanced T1 reference space.

 

To reduce manual workload for downstream processing, three automatic segmentation approaches were compared. The selected model achieved Dice coefficients in the mid-0.8 range, and it was used to generate 3D tumour masks for ROI cropping and subsequent modelling. This automated segmentation stage underpinned the broader aim of a more scalable workflow while retaining an ROI-focused analysis.
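The Dice coefficient used to compare the automatic masks against the manual reference measures overlap between two binary segmentations; values in the mid-0.8 range indicate substantial agreement. A self-contained sketch on flattened toy masks:

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    intersection = sum(x * y for x, y in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2 * intersection / total if total else 1.0

# Toy masks: auto-segmentation vs manual reference (invented values).
auto_mask   = [0, 1, 1, 1, 0, 0, 1, 0]
manual_mask = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice(auto_mask, manual_mask))  # → 0.75
```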

 

A 2.5D strategy was then used to balance spatial context with modelling efficiency. Rather than modelling the full volume directly, a central slice representing the maximum tumour cross-section was paired with a limited set of neighbouring slices drawn from nearby positions. Slice-level features were extracted via transfer learning from pre-trained convolutional networks, then aggregated using a Transformer-based multi-instance learning framework to produce a patient-level prediction. This design aimed to retain informative tumour context across slices while limiting the computational and data demands associated with fully 3D architectures.
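The slice-selection step of the 2.5D strategy can be sketched as follows. The exact neighbour count and clamping policy are not specified in the article, so `n_neighbours` and the boundary handling here are assumptions for illustration:

```python
def select_25d_slices(mask_volume, n_neighbours=2):
    """Return indices of the slice with the largest tumour cross-section
    plus its neighbours, clamped to the volume bounds.
    mask_volume: list of per-slice binary masks (nested 0/1 lists)."""
    areas = [sum(sum(row) for row in sl) for sl in mask_volume]
    centre = max(range(len(areas)), key=areas.__getitem__)
    lo = max(0, centre - n_neighbours)
    hi = min(len(areas), centre + n_neighbours + 1)
    return list(range(lo, hi))

# Toy 5-slice volume: slice index 2 carries the largest tumour area.
volume = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[1, 1], [1, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [0, 0]],
]
print(select_25d_slices(volume, n_neighbours=1))  # → [1, 2, 3]
```

The selected slices would then each pass through the pre-trained CNN, with the Transformer aggregating the resulting slice-level features into one patient-level prediction.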

 

Predictive Performance and Clinical Integration

Three signatures were evaluated: a Clinical model, a Transformer model based on imaging-derived features and a Combined model integrating imaging and clinical information. Clinical feature selection identified tumour volume as the single significant predictor retained after multivariate analysis. The reported odds ratio for tumour volume was approximately 1.12, with a 95% confidence interval around 1.03–1.22 and p = 0.024. The Clinical model was trained using an ensemble classifier on tumour volume alone.

 

The Combined model integrated tumour volume with a Transformer-derived imaging feature after standardisation and used logistic regression for classification. Performance was assessed using ROC analysis, calibration testing and decision curve analysis to examine potential clinical utility across a range of threshold probabilities.
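The structure of such a combined model is a logistic regression over standardised inputs. The sketch below is a minimal illustration only: the standardisation statistics and coefficients are invented placeholders, not the fitted values from the study:

```python
import math

def zscore(x, mean, sd):
    """Standardise a raw feature value."""
    return (x - mean) / sd

def combined_probability(tumour_volume, transformer_feature,
                         vol_stats=(30.0, 10.0),   # (mean, sd): placeholder values
                         feat_stats=(0.0, 1.0),    # placeholder values
                         coeffs=(0.4, 1.5, 1.0)):  # (intercept, w_feat, w_vol): placeholders
    """Logistic combination of one clinical and one imaging-derived feature."""
    b0, w_feat, w_vol = coeffs
    z = (b0
         + w_feat * zscore(transformer_feature, *feat_stats)
         + w_vol * zscore(tumour_volume, *vol_stats))
    return 1 / (1 + math.exp(-z))  # predicted probability of non-CR
```

With these placeholder coefficients a larger tumour volume or a higher imaging feature value pushes the predicted probability upward, mirroring the positive odds ratio reported for tumour volume.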

 

Across cohorts, the Transformer model showed strong discrimination in development and internal validation, with AUC values in the mid-to-high 0.9 range and a lower AUC in external testing (reported at 0.83). The Clinical model showed weaker generalisation, with the external test AUC reported at 0.66. The Combined model matched or exceeded the Transformer-only approach, including in external evaluation, where an AUC of 0.87 was reported. In the external cohort, overall accuracy for both imaging-driven approaches was reported as high (0.893 for the Transformer and Combined models), while the clinical-only model had substantially lower accuracy (0.625). Calibration results indicated acceptable agreement between predicted and observed outcomes for the Combined model, with Hosmer–Lemeshow testing reported as non-significant, and decision curve analysis suggested a higher net benefit for the Combined approach over relevant threshold ranges.
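The AUC values quoted above correspond to the probability that a randomly chosen non-CR patient receives a higher risk score than a randomly chosen CR patient. A stdlib sketch of that pairwise (Mann–Whitney) formulation, on invented scores:

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positive vs negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = non-CR, 0 = CR (invented scores, not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(auc(scores, labels), 3))  # → 0.889
```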

 

To support interpretability, Grad-CAM (gradient-weighted class activation mapping) heat maps were used to visualise the image regions that contributed most strongly to model predictions. These maps were overlaid on the original inputs to identify the anatomical areas emphasised by the model, presented as an interpretability layer alongside the performance metrics.
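The core of Grad-CAM is simple: each feature-map channel is weighted by the average of its gradients with respect to the predicted class, the weighted maps are summed, and negative values are clamped. A toy stdlib sketch on tiny hand-made activations and gradients (real Grad-CAM would pull these from a trained CNN's backward pass):

```python
def grad_cam(activations, gradients):
    """Toy Grad-CAM: weight each channel's activation map by its
    global-average-pooled gradient, sum over channels, clamp negatives.
    activations, gradients: [channel][row][col] nested lists."""
    n_ch = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    weights = [sum(sum(row) for row in gradients[c]) / (h * w) for c in range(n_ch)]
    return [[max(0.0, sum(weights[c] * activations[c][i][j] for c in range(n_ch)))
             for j in range(w)] for i in range(h)]

# Two 2x2 channels: channel 0 supports the prediction, channel 1 opposes it.
acts  = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [0.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # → [[1.0, 0.0], [0.0, 1.0]]
```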

 

An automated, two-centre MRI workflow combining tumour segmentation, 2.5D feature construction and a Transformer-based aggregation model demonstrated strong performance for predicting early response in locally advanced nasopharyngeal carcinoma treated with concurrent chemoradiotherapy. Tumour volume emerged as the sole retained clinical predictor and integrating it with an imaging-derived feature improved external discrimination relative to clinical-only modelling, with the combined approach reporting an external AUC of 0.87. Calibration and decision curve findings supported the potential utility of the combined signature as a decision-support input for early risk stratification. Reported limitations included retrospective design, a relatively small sample size, lack of long-term follow-up and reliance on MRI without histopathological verification, with plans noted for broader multicentre validation and incorporation of additional modalities such as PET-CT radiomics and histopathological sampling.

 

Source: Insights into Imaging

Image Credit: iStock


References:

Shi K, Chen C, Fei Y et al. (2025) Imaging-based transformer model predicts early therapy response in advanced nasopharyngeal carcinoma: a dual-center study. Insights into Imaging, 16:267.


