Phantom-based research is widely used in diagnostic radiology to evaluate imaging systems, refine protocols and validate image analysis without exposing patients or animals to risk. Physical or computational models of human tissues and anatomy enable controlled investigation across radiography, mammography, computed tomography, magnetic resonance imaging, ultrasound, nuclear medicine and radiotherapy. Poorly defined objectives, inadequate phantom selection, weak acquisition protocols and suboptimal analysis can, however, lead to misleading or non-reproducible findings. Recent guidance on phantom studies in medical imaging sets out recommendations to strengthen design, reporting and interpretation, with an emphasis on reproducibility and clear links to clinically relevant questions.
Role and Classification of Phantoms
Phantoms are models that reproduce specific aspects of human tissues or anatomical structures under controlled conditions. They can be physical or computational and are used to assess system performance, test reconstruction and analysis methods, optimise workflows, support quality assurance and contribute to the development of artificial intelligence tools. In diagnostic radiology they allow investigation of image quality, detectability and dose relationships while avoiding the ethical concerns associated with human or animal research. Formal ethics approval is not usually required, but workload, cost and the potential impact on clinical equipment availability should still be taken into account.
Clear, specific objectives form the basis of reliable phantom research. Phantom experiments may be designed to validate technologies, compare systems, optimise acquisition or reconstruction, or support routine quality control by monitoring parameters such as spatial resolution, contrast, noise and dose efficiency. The guidance stresses that objectives should be measurable and stated consistently from the background through to the conclusions. Vague aims encourage unfocused experimentation, selective reporting and post-hoc exploration of findings with limited value.
Physical phantoms are grouped into synthetic, mixed and biological categories. Synthetic phantoms include standard designs made from materials that approximate water or tissue and anthropomorphic models that reproduce anatomy and tissue heterogeneity, sometimes using three-dimensional printing. Mixed phantoms embed biological specimens within synthetic structures to combine realistic texture with stable and reproducible geometry. Biophantoms rely entirely on biological materials from animals or plants, such as excised organs or commonly available vegetables. Each category balances realism, stability and reproducibility differently, and selection should match the research question and be reported transparently.
Designing Robust and Reproducible Experiments
Reproducibility is a central theme in the guidance. Commercial synthetic and mixed phantoms, produced under controlled conditions, are favoured when standardisation and comparability across centres are important, as they support consistent measurements over time. Homemade synthetic or mixed phantoms can address specific questions but demand detailed description of fabrication, materials and preparation in the main methods section. Biophantoms may better approximate some tissue properties but are sensitive to variability and time-dependent change, which restricts their suitability for quantitative work and positions them mainly for proof-of-concept or interventional applications.
Acquisition protocols are closely tied to the primary objective. For system characterisation, a clearly documented reference acquisition is recommended, followed by systematic variation of relevant parameters such as tube voltage, exposure settings, reconstruction kernel, slice thickness, frame rate or sequence configuration, depending on modality. Repeated acquisitions under identical conditions help assess stability, and an acceptable coefficient of variation for physical parameters is provided as an example benchmark. Any deviations from planned protocols and technical problems should be documented to support interpretation and reproducibility.
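As a rough illustration of the stability check described above, the sketch below computes a coefficient of variation across repeated acquisitions. The function name and the measurement values are hypothetical, and no acceptance limit is coded because the benchmark value is not stated here.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample coefficient of variation (SD / mean) as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # statistics.stdev uses the sample (n - 1) standard deviation
    return 100.0 * statistics.stdev(values) / abs(mean)


# Hypothetical mean ROI values from five repeated acquisitions
# under identical conditions (arbitrary units, illustrative only)
repeated_roi_means = [100.0, 102.0, 98.0, 101.0, 99.0]

cv_percent = coefficient_of_variation(repeated_roi_means)
print(f"CV = {cv_percent:.2f}%")
```

The resulting percentage would then be compared against whatever limit the study protocol defines in advance.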
When comparing systems or technologies, use of the same phantom and maximally harmonised acquisition settings is encouraged. Where exact matching is not feasible, the guidance recommends explicit description of equivalence criteria and potential confounders. Baseline clinical protocols can be supplemented by additional acquisitions to allow robust comparison, and multiple images per condition strengthen statistical power. For dose or contrast optimisation, protocols should link well-defined image quality metrics to systematic variation of radiation levels or contrast agent parameters, with repeated acquisitions at each setting to quantify variability. Planning of endpoints and sample size is also addressed, with emphasis on clearly defined primary and secondary endpoints and transparent data reporting, including treatment of intra-condition variability and inter-reader agreement.
Image Analysis, Statistics and Clinical Orientation
Image analysis is divided into quantitative and qualitative components. Quantitative analysis uses numerical metrics aligned with the objectives and supported by standardised acquisition, phantom positioning, region-of-interest placement and system calibration. Common metrics include signal-to-noise ratio, contrast-to-noise ratio, noise power spectrum, spatial resolution characterised by modulation transfer function and task-based detectability index. Each reflects a particular dimension of performance, from overall clarity and lesion visibility to noise texture, fine detail and task-specific detectability. Consistent definitions, repeated measurements and clear reporting are needed for meaningful comparison between systems, protocols or reconstruction methods.
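Two of the metrics listed above can be sketched in a few lines. Definitions of signal-to-noise and contrast-to-noise ratio vary between studies, so the formulas below (ROI mean over ROI standard deviation, and absolute mean difference over background standard deviation) are one common convention assumed for illustration, not the definitions prescribed by the guidance. The pixel values are hypothetical.

```python
import statistics


def snr(roi):
    """Signal-to-noise ratio: mean of the ROI divided by its sample SD."""
    return statistics.mean(roi) / statistics.stdev(roi)


def cnr(target_roi, background_roi):
    """Contrast-to-noise ratio: absolute mean difference over background SD."""
    contrast = abs(statistics.mean(target_roi) - statistics.mean(background_roi))
    return contrast / statistics.stdev(background_roi)


# Hypothetical pixel values sampled from two regions of interest
target = [60.0, 62.0, 58.0, 61.0, 59.0]
background = [40.0, 42.0, 38.0, 41.0, 39.0]

snr_target = snr(target)
cnr_value = cnr(target, background)
print(f"SNR = {snr_target:.2f}, CNR = {cnr_value:.2f}")
```

Whichever definition is chosen, stating it explicitly and applying it identically across systems is what makes the resulting numbers comparable.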
Qualitative assessment complements numerical metrics by capturing human perception of image quality, diagnostic confidence and lesion visibility. The guidance highlights careful reader selection and training, use of more than two readers to limit individual bias and assessment of inter-reader reliability with appropriate statistics. Blinding and randomisation are considered essential, with anonymised and randomly ordered images used to avoid recognition of equipment or protocol and to minimise order effects. Clearly defined rating criteria and Likert-type scales, whether absolute or relative, should be described so that grading of noise, contrast, lesion detectability and artefacts is reproducible.
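One possible way to prepare an anonymised, randomly ordered reading session is sketched below. The function name, the "CASE-" code format and the image identifiers are hypothetical. A fixed seed per reader is assumed so that the order can be reproduced and reported.

```python
import random


def build_reading_list(image_ids, seed):
    """Shuffle images and assign neutral codes in presentation order.

    Returns (reading_order, key): the codes shown to the reader, and a
    code -> original image mapping kept by the study coordinator only.
    """
    rng = random.Random(seed)  # seeded so the order is reproducible
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    key = {f"CASE-{i:03d}": image_id
           for i, image_id in enumerate(shuffled, start=1)}
    reading_order = list(key)  # dicts preserve insertion order
    return reading_order, key


# Hypothetical image identifiers that would reveal system and protocol
images = ["scannerA_low_dose", "scannerA_standard",
          "scannerB_low_dose", "scannerB_standard"]

# A different seed per reader gives each reader an independent order
order_reader1, key_reader1 = build_reading_list(images, seed=1)
order_reader2, key_reader2 = build_reading_list(images, seed=2)
print(order_reader1)
```

Readers see only the neutral codes. The key is used after scoring to link ratings back to equipment and protocol.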
Statistical recommendations include reporting of frequency distributions for ordinal data, preferential use of non-parametric tests for analysis of Likert-scale scores and careful justification where parametric tests are applied. Provision of confidence intervals, exact p-values and explicit handling of outliers and excluded data is encouraged. Sensitivity analyses can be useful when conclusions depend on assumptions such as the treatment of neutral categories. Throughout, a strong clinical orientation is maintained. Phantom investigations enable controlled and reproducible work that would be impractical, unethical or too costly in humans or animals, but they cannot replace clinical validation. Improved image quality alone does not guarantee better diagnostic performance or outcomes, so alignment between phantom objectives and clinically relevant questions remains essential.
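The reporting of ordinal data described above can be illustrated with a small sketch that tabulates Likert scores and applies an exact two-sided sign test to paired ratings. The sign test is used here only because it is a simple non-parametric test that fits in a few lines. The guidance as summarised does not name a specific test, and the scores and function names are hypothetical.

```python
from collections import Counter
from math import comb


def frequency_distribution(scores, scale=(1, 2, 3, 4, 5)):
    """Count how often each Likert level occurs, including unused levels."""
    counts = Counter(scores)
    return {level: counts.get(level, 0) for level in scale}


def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test for paired ordinal scores; ties are dropped."""
    diffs = [b - a for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(1 for d in diffs if d > 0)
    # Probability of a split at least as extreme under p = 0.5, doubled
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical image quality scores (1 = poor, 5 = excellent)
# given by one reader to the same eight phantom images under two protocols
protocol_a = [3, 3, 2, 4, 3, 2, 3, 3]
protocol_b = [4, 4, 3, 4, 4, 3, 4, 2]

freq_a = frequency_distribution(protocol_a)
freq_b = frequency_distribution(protocol_b)
p_value = sign_test_p_value(protocol_a, protocol_b)
print(freq_a, freq_b, f"exact p = {p_value}")
```

Reporting the full distributions alongside the exact p-value keeps the analysis transparent. A rank-based alternative such as the Wilcoxon signed-rank test would also use the size of each difference rather than only its direction.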
The guidance offers a structured approach to more rigorous and clinically meaningful phantom research in medical imaging. Clarifying objectives, selecting and documenting phantoms appropriately, standardising acquisition protocols, strengthening quantitative and qualitative image analysis and applying robust statistical methods can improve the reliability and comparability of results. Maintaining a clear link between phantom-based findings and clinical questions, while acknowledging the limits of phantom models, helps ensure that such work supports technology validation, system comparison, protocol optimisation and quality control without overstating implications for patient care.
Source: European Radiology Experimental