Commercial AI devices for detecting lung cancer on chest radiographs are moving closer to routine clinical use, but device selection remains difficult when products built for similar purposes perform differently. Chest radiography is a critical entry point for lung cancer diagnostic pathways, especially in settings with rising imaging demand and radiologist shortages. A 2026 independent comparison published in Radiology tested seven commercial devices on 5235 chest radiographs from adults referred from primary care at a single UK centre. Diagnostic accuracy differed across devices to a clinically and statistically significant degree, with variation in sensitivity, specificity, positive predictive value and agreement with radiologist reporting.
Representative Primary Care Dataset
The dataset comprised consecutive posteroanterior chest radiographs acquired at Sheffield Teaching Hospitals NHS Foundation Trust between July 2020 and February 2021. Eligible radiographs came from adult patients referred from primary care for any indication. Radiographs were excluded when patients had withdrawn consent for data use through the NHS National Data Opt-out or when an earlier radiograph from the same patient had already entered the dataset. Clinical information included age, sex, self-reported ethnicity, request details and radiologist reporting. Local lung multidisciplinary team records from July 2020 to August 2021 supplied the confirmed cancer diagnosis.
The final sample included 5235 radiographs from 5235 patients. The median age was 60 years, 53.4% of patients were female and 79.4% were White. Radiologists had provided reports for all included radiographs, with 91.0% reported by the same seven thoracic radiologists. Lung tumours appeared in radiologist reporting for 2.2% of radiographs, and cancer suspicion appeared in 2.8%.
Multidisciplinary team decisions confirmed lung cancer in 1.6% of patients. Tumours were retrospectively visible on the radiograph in 1.4% of patients, matching the previously reported prevalence among UK primary care chest radiograph referrals. Devices from Annalise.ai, Gleamer, InferVision, Milvue, Oxipit, Qure.ai and Rayscape entered testing. Their outputs used different classifications, so tumour and cancer outputs required standardisation before comparison.
Wide Variation in Diagnostic Accuracy
All seven devices provided classification outputs for tumour detection, and six provided continuous model scores. When visible confirmed cancer served as the reference standard, receiver operating characteristic analysis produced area under the curve values ranging from 0.80 to 0.94. Nine of 15 pairwise comparisons showed differences between devices. Against radiologist-reported tumours, values ranged from 0.81 to 0.88, with differences in four of 15 pairwise comparisons.
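For readers less familiar with the metric, the area under the curve can be read as the probability that a randomly chosen cancer case receives a higher model score than a randomly chosen non-cancer case. The Python sketch below illustrates that definition on made-up scores. The function name, scores, labels and device placeholders are illustrative assumptions, not data or code from the study. It also shows that six devices with continuous scores form 15 possible pairs, which is consistent with the 15 pairwise comparisons reported.

```python
from itertools import combinations


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and reference labels (1 = visible confirmed cancer).
example_scores = [0.9, 0.4, 0.3, 0.2, 0.6]
example_labels = [1, 1, 0, 0, 0]
example_auc = auc(example_scores, example_labels)

# Placeholder names for the six devices that returned continuous scores.
devices = ["A", "B", "C", "D", "E", "F"]
device_pairs = list(combinations(devices, 2))

print(round(example_auc, 3), len(device_pairs))
```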
Performance also differed markedly in practical diagnostic terms. When visible confirmed cancer served as the benchmark, the most sensitive tumour-detection outputs identified more than three-quarters of visible confirmed cancers, while the least sensitive identified about one-fifth. Specificity ranged from just under 60% to more than 98%. Positive predictive value remained low across devices, from 1.5% to 28.4%, while negative predictive value stayed consistently high, from 98.7% to 99.7%. Overall accuracy ranged from 58.7% to 97.5%. Radiologist-reported tumour identification had sensitivity of 59.7%, specificity of 98.6% and accuracy of 98.1%.
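The measures quoted above all derive from the same four counts: true positives, false positives, false negatives and true negatives. The Python sketch below shows the standard definitions. The counts are hypothetical, chosen only to mimic a low-prevalence setting, and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table (assumes no zero denominators)."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # share of cancers that were flagged
        "specificity": tn / (tn + fp),  # share of non-cancers correctly cleared
        "ppv": tp / (tp + fp),          # share of positive outputs that were cancer
        "npv": tn / (tn + fn),          # share of negative outputs that were not cancer
        "accuracy": (tp + tn) / total,  # share of all outputs that were correct
    }


# Hypothetical counts for illustration: 70 cancers among 5000 patients.
metrics = diagnostic_metrics(tp=50, fp=500, fn=20, tn=4430)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts, positive predictive value is about 9% even though sensitivity is about 71% and specificity about 90%, because true cancers are rare relative to false alarms. The same arithmetic explains why negative predictive value stays high when prevalence is low.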
Compared with radiologist reports, three devices detected more cancerous tumours, while four detected fewer. Additional false-positive results for tumour detection ranged from 10 to 2039. For cancer detection, only two devices provided classification outputs, with additional false-positive results ranging from 51 to 249. Agreement across devices was limited, with minimal agreement among low-specificity and standard-specificity models, and no agreement among high-specificity and standard-specificity models.
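The article does not name the statistic used to measure agreement between devices. Cohen's kappa is one common choice for comparing two binary classifiers and is assumed here purely for illustration. The Python sketch below computes it from a hypothetical cross-tabulation of two devices' outputs. The function name and all counts are invented for the example.

```python
def cohens_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two binary classifiers, from their 2x2 cross-tabulation."""
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n  # proportion of cases where outputs match
    a_pos = (both_pos + a_only) / n       # device A positive rate
    b_pos = (both_pos + b_only) / n       # device B positive rate
    # Agreement expected by chance if the two devices were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both devices give one constant output")
    return (observed - expected) / (1 - expected)


# Hypothetical cross-tabulation of 100 radiographs read by two devices.
kappa = cohens_kappa(both_pos=20, a_only=5, b_only=10, both_neg=65)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance. This matters when most radiographs are negative, because two devices will then agree often simply by both returning negative outputs.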
Deployment Choices Depend on Intended Use
The differences between devices matter because a positive AI output can alter downstream care. All devices increased false-positive results compared with radiologist reporting. False positives can lead to additional investigations, even when experienced radiologists retain oversight, because clinical users may hesitate to overrule AI results owing to automation bias and accountability concerns.
If positive AI results triaged patients directly to CT, device choice would produce markedly different volumes of extra CT examinations. In the sample of 5235 patients, a worst-case scenario in which each false-positive result caused an additional CT scan would translate into £1,200 (approximately €1,385) to £244,000 (approximately €281,500) in additional financial cost at the institution and 11 to 2222 kg CO2 equivalent in additional carbon footprint. False-positive analysis found that erroneous detection of other pathologies was the most common cause, accounting for a median of 71% of false-positive results.
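The worst-case figures scale linearly with the number of additional false positives, which ranged from 10 to 2039 for tumour detection. The Python sketch below makes that arithmetic explicit. The per-scan cost of £120 and per-scan footprint of 1.09 kg CO2 equivalent are assumptions back-calculated from the reported ranges. The article does not state them, and the function and parameter names are hypothetical.

```python
def worst_case_ct_burden(extra_false_positives, cost_per_ct_gbp=120.0, co2e_per_ct_kg=1.09):
    """Worst case: every additional false positive triggers exactly one extra CT scan.

    The per-scan defaults are assumptions inferred from the reported ranges,
    not values stated in the source.
    """
    return {
        "extra_ct_scans": extra_false_positives,
        "cost_gbp": extra_false_positives * cost_per_ct_gbp,
        "co2e_kg": extra_false_positives * co2e_per_ct_kg,
    }


# Reported extremes of additional false positives for tumour detection.
low = worst_case_ct_burden(10)
high = worst_case_ct_burden(2039)

print(low)
print(high)
```

Under these assumed per-scan values, the outputs land close to the reported ranges, with small differences attributable to rounding. Substituting local CT costs and emission factors would give an institution-specific estimate.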
Clinical pathway design changes which performance profile looks preferable. A pathway that sends patients with radiographic appearances suspicious for lung cancer straight to CT may favour higher specificity and positive predictive value. A pathway that prioritises reporting worklists or supports radiologist accuracy may favour higher sensitivity and negative predictive value. Device benchmarking under the same conditions therefore becomes central to selecting technology for clinical deployment.
Commercial AI devices for lung cancer detection on chest radiographs do not perform equally in a representative primary care sample. Some devices detected more cancerous tumours than radiologist reporting, while others detected fewer, and every device added false-positive results. The wide spread in sensitivity, specificity, positive predictive value and agreement shows that device selection cannot rely on a product's stated intended use alone. The most suitable profile depends on how the device will sit within the diagnostic pathway. Head-to-head evaluation under comparable clinical conditions remains essential before deployment decisions.
Source: Radiology
References:
Maiter A, Taylor J, Metherall P et al. (2026) Independent head-to-head comparison of commercial artificial intelligence devices for lung cancer detection on chest radiographs. Radiology, 319(2):e252205.