Preoperative assessment of ground-glass nodules (GGNs) in lung adenocarcinoma (LUAD) remains clinically important because pathological invasiveness can influence both surgical planning and prognosis. Pathology provides the reference standard, yet practical constraints in defining invasion areas before surgery keep CT central to decision-making. Conventional CT descriptors such as size, volume and attenuation have been used to distinguish invasive from non-invasive disease, but conclusions across studies have been inconsistent and inter-observer agreement can be limited. Radiomics offers high-throughput quantitative imaging features, while deep learning (DL) can learn representations directly from images, though each approach has recognised limitations in stability and generalisability. A multi-centre retrospective analysis evaluated whether combining radiomics and DL features within a multiple-instance learning (MIL) framework could strengthen CT-based prediction of GGN invasiveness.
Invasiveness Drives Management and Prognosis
The 2021 World Health Organization classification describes a spectrum from precursor glandular lesions (PGL) to minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IAC). PGL includes atypical adenomatous hyperplasia (AAH) and adenocarcinoma in situ (AIS), which are described as stages in the evolution of GGNs. In practical care pathways, non-invasive adenocarcinoma (NIAC), which here groups AAH, AIS and MIA, is typically managed with conservative sublobar resection or long-term CT follow-up and has a five-year disease-free survival rate of almost 100%. IAC, by contrast, can require more aggressive surgery such as lobectomy and extended lymph node dissection, with a reported five-year disease-free survival rate of approximately 49–84%.
Despite the clinical stakes, there is no unified standard for CT-based discrimination of invasiveness. Reported predictors have included nodule size, density, volume and CT value, but the relative value of these features can vary, and clinical interpretation can be affected by reader subjectivity. Radiomics seeks to quantify tumour heterogeneity through structured imaging features, yet performance can be influenced by segmentation variability, radiologist judgement, radiation dose and reconstruction differences. DL approaches can reduce reliance on manual feature design, but interpretability and cross-centre robustness remain recurrent concerns. The central challenge is achieving a model that performs reliably across institutions while remaining grounded in routine CT inputs.
A Multi-Centre CT Pipeline with Feature Fusion
The analysis retrospectively included 1247 GGNs from 1182 patients across six hospitals between January 2013 and June 2021. Inclusion required a maximum lesion diameter under 3 cm, preoperative thin-section CT with a slice thickness of 1–1.25 mm and confirmation of primary LUAD. Lesions were excluded for prior treatment or biopsy before baseline CT, a CT-to-surgery interval exceeding two weeks, severe motion artefacts, or incomplete imaging, clinical or pathological information. Pathological diagnosis was used to assign lesions to invasive and non-invasive groups.
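The eligibility rules above can be written down as a simple filter. The following is a minimal sketch only: the `Nodule` record and its field names are hypothetical, and "two weeks" is assumed to mean 14 days.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    # Hypothetical record layout; field names are illustrative assumptions.
    max_diameter_cm: float
    slice_thickness_mm: float
    days_ct_to_surgery: int
    primary_luad_confirmed: bool
    prior_treatment_or_biopsy: bool
    severe_motion_artefact: bool
    complete_records: bool  # imaging, clinical and pathological data present


def is_eligible(n: Nodule) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    return (
        n.primary_luad_confirmed
        and n.max_diameter_cm < 3.0                # maximum diameter under 3 cm
        and 1.0 <= n.slice_thickness_mm <= 1.25    # thin-section CT
        and n.days_ct_to_surgery <= 14             # assumed: two weeks = 14 days
        and not n.prior_treatment_or_biopsy
        and not n.severe_motion_artefact
        and n.complete_records
    )


# Made-up example lesions, not study data.
ok = Nodule(1.8, 1.25, 7, True, False, False, True)
late = Nodule(1.8, 1.25, 20, True, False, False, True)
print(is_eligible(ok), is_eligible(late))
```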
Two experienced radiologists reviewed CT images while blinded to pathology, measuring each GGN twice and resolving disagreements through consultation with a chief physician. Lesions were manually segmented on CT slices and combined into a volume of interest while avoiding adjacent vessels, bronchi, mediastinum and chest wall structures. Reproducibility was assessed using intra- and inter-observer correlation coefficients, with agreement considered satisfactory above 0.85 and disagreements resolved by a senior radiologist.
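To illustrate how a reproducibility check against the 0.85 cut-off might look, here is a minimal sketch of one common agreement statistic. The article does not state which coefficient form was used, so the choice of the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is an assumption, as are the function name and the example measurements.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per lesion, one column per reader.

    Assumes at least two lesions, at least two readers, and some variation
    in the measurements (otherwise the denominator is zero).
    """
    n = len(ratings)        # number of lesions
    k = len(ratings[0])     # number of readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-lesion mean square
    msc = ss_cols / (k - 1)                 # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up diameters (mm) from two readers; the second reader reads 1 mm higher.
offset = [[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]]
value = icc_2_1(offset)
print(round(value, 3), "satisfactory" if value > 0.85 else "needs review")
```

Because this form measures absolute agreement, a systematic offset between readers lowers the coefficient even when their rankings of lesions match perfectly.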
Five modelling strategies were developed and compared, spanning radiomics, DL and MIL-based fusion. Radiomics features were extracted and filtered before training machine learning classifiers. DL models were built in both 3D and 2.5D settings, with data augmentation used to reduce overfitting. MIL was used as a fusion strategy by combining features from the DL approaches, then integrating DL-derived multi-instance features with radiomics features to form a hybrid MIL-DL-Rad model. This design aimed to capture complementary information while reducing dependence on any single feature type.
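The fusion idea can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation: each nodule is treated as a "bag" of per-instance DL probabilities (for example, one per slice or patch), the bag is summarised with simple statistics and a histogram, and the result is concatenated with a radiomics vector. The function names, the choice of summary statistics and the example values are all hypothetical.

```python
from statistics import mean


def mil_bag_features(instance_probs, n_bins=4):
    """Summarise per-instance probabilities into a fixed-length bag vector."""
    if not instance_probs:
        raise ValueError("a bag needs at least one instance")
    counts = [0] * n_bins
    for p in instance_probs:
        # Clamp so that p == 1.0 falls in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    hist = [c / len(instance_probs) for c in counts]
    return [mean(instance_probs), max(instance_probs), min(instance_probs)] + hist


def fuse(instance_probs, radiomics_features):
    """Concatenate bag-level DL features with radiomics features."""
    return mil_bag_features(instance_probs) + list(radiomics_features)


# Made-up inputs: five instance-level probabilities and three radiomics values.
vector = fuse([0.2, 0.7, 0.9, 0.85, 0.4], [0.31, 12.5, 4.2])
print(len(vector), vector)
```

The fused vector has the same length for every nodule regardless of how many instances it contains, which is what lets a conventional classifier such as ExtraTrees be trained on it.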
Consistent Discrimination Across External Test Sets
Across the full dataset, invasive lesions outnumbered non-invasive lesions, with 841 invasive and 406 non-invasive nodules. Several measured variables differed between groups, including age, nodule volume, multiple diameter measures and CT value, while lesion location did not show a significant difference. Model evaluation used a training set, a validation set and three test sets grouped by hospital, supporting assessment beyond the development cohort.
The integrated MIL-DL-Rad approach delivered the most consistent overall performance, and an ExtraTrees classifier was selected as the preferred configuration when balancing discrimination and other performance metrics. Reported area under the curve (AUC) values for MIL-DL-Rad were 0.936 in training and 0.881 in validation, with external test-set AUCs of 0.926, 0.868 and 0.918. In contrast, several non-integrated approaches showed stronger performance in the training cohort but less consistent rankings in external testing.
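For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen invasive nodule receives a higher predicted score than a randomly chosen non-invasive one, with ties counted as half. The sketch below computes it directly from that definition; the function name and the toy labels and scores are illustrative and are not study data.

```python
def auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example: 1 = invasive, 0 = non-invasive.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; rank-based implementations are preferable for large cohorts.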
Calibration curves indicated alignment between predicted probabilities and observed outcomes for the hybrid model, supporting use in individual risk estimation rather than only cohort-level separation. Decision curve analysis showed higher net benefit across most threshold probability ranges in the validation and test cohorts, indicating potential clinical utility for supporting intervention decisions. Statistical comparisons using DeLong tests indicated significant differences between the hybrid model and several comparators across multiple cohorts. Net reclassification improvement and integrated discrimination improvement analyses were reported as largely positive, consistent with improved prediction performance relative to alternative approaches.
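Decision curve analysis rests on the net benefit statistic, which weighs true positives against false positives at a chosen threshold probability pt: net benefit = TP/N - (FP/N) x pt/(1 - pt). The sketch below shows that standard formula alongside the usual "treat all" reference strategy; the function names and toy data are illustrative assumptions, and the study's own analysis code is not described in the article.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of intervening on cases with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: intervene on every case regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example: 1 = invasive, 0 = non-invasive.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2]
print(net_benefit(labels, probs, 0.5))       # 2/4 - (1/4) * 1 = 0.25
print(net_benefit_treat_all(labels, 0.5))    # 0.5 - 0.5 * 1 = 0.0
```

A decision curve is obtained by repeating this calculation over a range of thresholds; a model is considered useful where its curve lies above both the "treat all" and "treat none" (net benefit of zero) references.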
Limitations were acknowledged. Cohort imbalance and variation in external test set sizes constrained statistical power and limited inferences about generalisability. The retrospective design introduced potential selection bias. Manual segmentation supported consistent feature extraction within the dataset but may limit scalability. Planned future work includes automated segmentation, prospective data collection, broader multi-centre collaboration and potential integration of additional data sources, including pathomics and clinical information.
A multi-centre CT-based approach that fused radiomics and DL representations within an MIL framework demonstrated robust performance for predicting the invasiveness of GGNs in LUAD across multiple hospital-defined cohorts. The MIL-DL-Rad configuration with an ExtraTrees classifier combined strong discrimination with favourable calibration and decision-analytic net benefit in external testing. For healthcare teams, the findings support feature fusion as a practical route to more stable preoperative risk stratification using routine CT data, with relevance to tailoring surgical strategy and follow-up planning while reinforcing the need for prospective validation and workflow-ready segmentation approaches.
Source: Insights into Imaging