Radiograph-based assessment of bone tumours can be difficult because benign and non-benign lesions may look similar, while specialist expertise is not always available in routine care settings. A 2026 analysis published in Insights into Imaging assessed whether deep learning models pretrained on medical images could improve bone tumour classification on radiographs. The work used data from 2,338 patients across four centres and compared medical-image pretraining with natural-image pretraining, radiomics and clinical modelling. It also tested model stability when tumour locations were slightly shifted, used heatmaps to show where the model focused and explored whether model outputs could support radiologists. The strongest performance came from a deep learning model using RadImageNet, a medical imaging dataset for pretraining.

Medical Image Pretraining Shows Stronger Results

The dataset included patients with histopathologically confirmed bone tumours. One centre provided the development group, while three other centres provided an independent external test group. The internal and external datasets differed in patient age, tumour type and tumour location. These differences created a demanding setting for assessing whether a model trained in one context could perform across other centres.

The comparison included two deep learning architectures, ResNet50 and InceptionV3. Each used transfer learning from either RadImageNet or ImageNet. RadImageNet represents medical-image pretraining, while ImageNet represents natural-image pretraining. A radiomics model used features extracted from tumour regions, and a clinical model used patient and tumour information. A combined model also linked the best image-based model with selected clinical features.

The best external performance came from ResNet50 pretrained on RadImageNet. It outperformed the same architecture pretrained on ImageNet, the InceptionV3 models and the radiomics model. The radiomics approach performed better on the internal test set than on the external test set, indicating weaker generalisability across centres. The clinical model alone also performed poorly on external testing. Combining clinical features with the strongest image model did not improve performance on the external set.

These findings favour medical-image pretraining over natural-image pretraining for this radiograph task. They also show that image-based deep learning performed more consistently than radiomics when applied beyond the development centre.

Robustness Varies by Tumour and Setting

The internal and external datasets differed not only in patient and tumour characteristics but also in image properties. Mean image intensity remained similar, while external images showed higher contrast. This difference added another test of whether the models could cope with variation across centres and imaging conditions.
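
As an illustration of how such image properties can be summarised, the sketch below uses the mean pixel value for intensity and the standard deviation of pixel values (often called RMS contrast) for contrast. That definition of contrast is an assumption, since the article does not say how it was measured. The helper name `intensity_stats` is hypothetical and the pixel grids are toy values, not study data.

```python
from statistics import fmean, pstdev


def intensity_stats(pixels):
    """Return (mean intensity, RMS contrast) for a 2D grid of pixel values.

    Assumption: contrast is taken as the population standard deviation
    of pixel values; the study may have used a different definition.
    """
    flat = [value for row in pixels for value in row]
    return fmean(flat), pstdev(flat)


# Toy 2x2 "images" (illustrative values only, not study data).
image_a = [[100, 110], [90, 100]]
image_b = [[50, 150], [150, 50]]

mean_a, contrast_a = intensity_stats(image_a)
mean_b, contrast_b = intensity_stats(image_b)
print(f"image A: mean={mean_a:.1f}, contrast={contrast_a:.1f}")
print(f"image B: mean={mean_b:.1f}, contrast={contrast_b:.1f}")
```

The two toy images share the same mean but differ in spread, which mirrors the situation described above: similar brightness overall, different contrast.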

Robustness testing applied repeated random shifts to the tumour bounding boxes to simulate variation in lesion localisation. This matters because tumour boundaries may not be marked in exactly the same way in different clinical settings. Models pretrained on RadImageNet remained more stable under these shifts than models pretrained on ImageNet, and ResNet50 with RadImageNet was the most stable overall.
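
A minimal sketch of this kind of perturbation test is shown below. It is not the study's procedure: the shift size (a fraction of the box dimensions), the number of trials, the function names `jitter_box` and `flip_rate`, and the toy models are all assumptions made for illustration. The idea is to shift a box randomly many times and count how often the prediction changes from the unshifted baseline.

```python
import random


def jitter_box(box, max_shift_frac, width, height, rng):
    """Shift an (x_min, y_min, x_max, y_max) box by a random offset.

    The offset is at most max_shift_frac of the box size in each axis
    and is clamped so the shifted box stays inside the image.
    """
    x_min, y_min, x_max, y_max = box
    dx = rng.uniform(-max_shift_frac, max_shift_frac) * (x_max - x_min)
    dy = rng.uniform(-max_shift_frac, max_shift_frac) * (y_max - y_min)
    dx = max(-x_min, min(dx, width - x_max))
    dy = max(-y_min, min(dy, height - y_max))
    return (x_min + dx, y_min + dy, x_max + dx, y_max + dy)


def flip_rate(predict, box, width, height,
              n_trials=100, max_shift_frac=0.1, seed=0):
    """Fraction of jittered boxes whose prediction differs from baseline."""
    rng = random.Random(seed)
    baseline = predict(box)
    flips = sum(
        predict(jitter_box(box, max_shift_frac, width, height, rng)) != baseline
        for _ in range(n_trials)
    )
    return flips / n_trials


# Hypothetical stand-ins for a classifier that takes a box and returns a label.
def constant_model(box):
    return 1  # ignores the box entirely, so it can never flip


def edge_sensitive_model(box):
    return 1 if box[0] >= 100 else 0  # depends on the exact left edge


box = (100, 100, 200, 300)
stable = flip_rate(constant_model, box, width=1024, height=1024)
fragile = flip_rate(edge_sensitive_model, box, width=1024, height=1024)
print(f"constant model flip rate: {stable:.2f}")
print(f"edge-sensitive model flip rate: {fragile:.2f}")
```

A lower flip rate corresponds to the more stable behaviour reported for the RadImageNet-pretrained models. In a real evaluation `predict` would crop the radiograph to the box and run the trained network.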

Error analysis showed that model performance depended on tumour subtype, location and lesion size. Osteosarcoma was detected more robustly than several other tumour categories. Some broader benign and non-benign groups had higher error rates. Giant cell tumours showed a marked increase in external-set errors, mainly as false negatives.

Anatomical location also affected performance. Lesions in the limbs were classified more accurately than lesions in the axial skeleton. More complex overlapping anatomy in areas such as the spine and pelvis made classification harder. Lesion size also mattered. Smaller lesions were more often missed, while larger lesions were detected more reliably.
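
A stratified error analysis of this kind can be sketched as follows. The record fields, the function name `error_rate_by_group` and the four toy cases are hypothetical and made up for illustration; they are not study data and do not reproduce the reported figures.

```python
from collections import defaultdict


def error_rate_by_group(records, group_key):
    """Return the misclassification rate for each value of group_key."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record["predicted"] != record["actual"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Toy cases (made-up values): 1 = non-benign, 0 = benign.
cases = [
    {"location": "limb", "size": "large", "actual": 1, "predicted": 1},
    {"location": "limb", "size": "small", "actual": 0, "predicted": 0},
    {"location": "axial", "size": "small", "actual": 1, "predicted": 0},
    {"location": "axial", "size": "large", "actual": 1, "predicted": 1},
]

by_location = error_rate_by_group(cases, "location")
by_size = error_rate_by_group(cases, "size")
print(by_location)
print(by_size)
```

Grouping the same predictions by different keys (subtype, location, size) is what lets an analysis separate where a model is weak from how weak it is overall.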

These results identify practical weak points. The model was not equally strong across all tumour groups and anatomical settings. Small lesions, axial locations and some tumour categories remained more challenging, even when the strongest medical-pretrained model was used.

Heatmaps Support Supervised Use

Heatmap visualisation showed the areas that influenced the ResNet50 RadImageNet model’s predictions. In correctly classified cases, the model often focused on diagnostically relevant regions. In some non-benign tumours, attention centred on the transition between tumour and normal tissue, where aggressive changes were visible. In cases where the model made an incorrect classification, the highlighted regions did not focus on useful diagnostic areas.

An auxiliary diagnostic experiment assessed whether model predictions and heatmaps could support radiologists. Two attending musculoskeletal radiologists reviewed cases without model assistance and then reviewed them again with model outputs after a washout period. The two readers had different results. Model assistance did not significantly improve performance for one radiologist. The other radiologist improved on the internal test set and showed a trend towards improvement on the external test set.

These results indicate that model support may not benefit all readers in the same way. Differences in experience, baseline performance and interaction with model outputs may all have contributed. The findings also support supervised use rather than autonomous screening: the radiologist identifies and marks the lesion, while the model provides a prediction and heatmap to support characterisation.

The proposed role differs by setting. In specialist centres, the model may support experts by helping reduce performance dips. In general practice settings, it may help triage suspicious lesions while supporting the dismissal of benign cases. However, lower external sensitivity means the model should not replace clinical judgement.
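
For readers less familiar with the metric, sensitivity is the share of truly non-benign cases that the model flags, and specificity is the share of benign cases it correctly clears. The sketch below computes both from paired labels. The function name and the toy labels are assumptions for illustration, not study results.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for paired label sequences."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy labels (made-up values): 1 = non-benign, 0 = benign.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Each false negative lowers sensitivity, which is why missed non-benign lesions weigh so heavily against using a model without radiologist oversight.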

Medical-image pretraining improved deep learning performance for bone tumour classification on radiographs. ResNet50 pretrained on RadImageNet performed better than radiomics, clinical modelling and natural-image pretraining in external testing, while also showing stronger stability when lesion localisation varied. Heatmaps added interpretability in correctly classified cases, and model assistance helped one radiologist but not the other. The results support domain-specific pretraining as a useful direction for supervised radiograph-based decision support, while small lesions, axial locations and some tumour categories remain important limitations.

Source: Insights into Imaging


References:

Li Z, Wang H, Wei G et al. (2026) Medical image pretraining-based transfer learning for generalizable and robust diagnosis of bone tumors on radiographs: a multi-center study. Insights Imaging, 17:94.



