Clinical imaging generates vast volumes of data with potential value beyond immediate diagnosis and care delivery. While structured electronic health records and genomic datasets are routinely reused for research, radiological imaging has remained harder to access at scale because of technical complexity, governance constraints and the risk of disrupting clinical operations. Vanderbilt University Medical Center has addressed these challenges by developing ImageVU, a dedicated research imaging infrastructure designed to support secondary and opportunistic use of radiology data. The platform integrates clinical imaging with metadata-driven cohort discovery while maintaining regulatory compliance and protecting patient privacy. By embedding imaging within an established institutional research ecosystem, ImageVU aims to make large-scale reuse of MRI, CT and PET data feasible for a broad range of translational research activities.
Building a Dedicated Research Imaging Infrastructure
ImageVU was designed to operate independently of clinical picture archiving and communication systems while passively receiving a complete stream of imaging data. Clinical CT, MR and PET studies are duplicated in real time and routed to a dedicated Research PACS without altering routine clinical workflows. This approach avoids performance degradation in clinical systems that are optimised for diagnostic use rather than high-volume data retrieval. A separate backfill process retrieves historical or missed studies in a throttled manner to prevent interference with care delivery.
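The throttled backfill described above can be pictured as a simple rate limiter. The following is a minimal sketch, not ImageVU's implementation: the function name `throttled_backfill`, the `fetch` callback and the `max_per_minute` parameter are all hypothetical, and a real backfill host would issue DICOM retrieval requests rather than call a Python function.

```python
import time
from typing import Callable, Iterable, List


def throttled_backfill(
    study_ids: Iterable[str],
    fetch: Callable[[str], None],
    max_per_minute: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Fetch studies one at a time, never exceeding max_per_minute.

    `fetch` stands in for whatever retrieves a study from the clinical
    archive (hypothetical). `sleep` and `clock` are injectable so the
    pacing logic can be tested without real waiting.
    """
    if max_per_minute <= 0:
        raise ValueError("max_per_minute must be positive")

    min_interval = 60.0 / max_per_minute  # seconds between retrievals
    fetched: List[str] = []
    next_allowed = clock()

    for study_id in study_ids:
        # Pause until the next retrieval slot opens.
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        fetch(study_id)
        fetched.append(study_id)
        # Schedule the next slot relative to whichever is later:
        # the planned slot or the actual time the fetch finished.
        next_allowed = max(next_allowed, clock()) + min_interval

    return fetched
```

Spacing requests evenly, rather than sending them in bursts, is one way to keep a historical backfill from competing with diagnostic traffic. The appropriate rate would depend on the capacity of the clinical system.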
The core architecture comprises four interconnected components: a Research PACS, an Ad Hoc Backfill Host, a Cloud Storage System and a De-Identification System. Together, these components support secure ingestion, temporary buffering, long-term encrypted storage and on-demand de-identification of imaging data. The Research PACS stores imaging sessions briefly before transferring them to cloud storage, preserving original DICOM formats and compression schemes. Local solid-state storage provides resilience during interruptions, while automated processes monitor and manage data movement.
By maintaining a single identified archive and applying de-identification only when access is requested, the infrastructure balances storage costs with flexibility. This design avoids the expense of parallel identified and de-identified archives while supporting a range of research requirements, from fully de-identified projects to studies requiring identified data under appropriate approvals.
Metadata-Driven Discovery and Research Access
A central feature of ImageVU is the integration of imaging metadata into Vanderbilt’s existing research data warehouses. Extracted metadata are stored in two parallel layers: the Research Derivative, which contains identified clinical data, and the Synthetic Derivative, which provides a de-identified, date-shifted repository. Imaging-specific views within these environments allow researchers to discover cohorts using structured attributes such as modality, study date and body region, alongside other clinical variables.
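To illustrate what metadata-driven cohort discovery might look like in practice, here is a small sketch using an in-memory SQLite database. The table name `imaging_study`, its columns and the sample rows are invented for illustration and do not reflect the actual schema of the Research Derivative or Synthetic Derivative.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of imaging metadata (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE imaging_study ("
        "patient_id TEXT, modality TEXT, body_region TEXT, study_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO imaging_study VALUES (?, ?, ?, ?)",
        [
            ("P001", "MR", "BRAIN", "2021-03-14"),
            ("P001", "CT", "CHEST", "2022-01-02"),
            ("P002", "MR", "BRAIN", "2019-07-30"),
            ("P003", "MR", "BRAIN", "2021-11-05"),
            ("P003", "MR", "BRAIN", "2021-12-01"),
        ],
    )
    return conn


def find_cohort(
    conn: sqlite3.Connection,
    modality: str,
    body_region: str,
    start_date: str,
    end_date: str,
) -> List[str]:
    """Return distinct patient IDs with a matching study in the date range.

    Dates are ISO 8601 strings (YYYY-MM-DD), so string comparison
    orders them chronologically.
    """
    rows = conn.execute(
        "SELECT DISTINCT patient_id FROM imaging_study "
        "WHERE modality = ? AND body_region = ? "
        "AND study_date BETWEEN ? AND ? "
        "ORDER BY patient_id",
        (modality, body_region, start_date, end_date),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(find_cohort(conn, "MR", "BRAIN", "2021-01-01", "2021-12-31"))
# prints ['P001', 'P003']
```

In a real warehouse, the same query would typically be joined with other clinical variables, such as diagnoses or laboratory results, to refine the cohort further.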
Researchers access these capabilities through web-based discovery tools aligned with institutional governance processes. Identified projects use the Research Derivative for cohort definition, while de-identified projects rely on the Synthetic Derivative under non-human subjects research determinations. Imaging data themselves remain in separate PACS and cloud storage systems, with retrieval and preparation managed through a controlled backend workflow. This separation ensures that only approved datasets are released and that privacy protections are applied consistently.
De-identification follows a multi-stage process. Automated processing removes standard identifiers and overlays, custom filters exclude image types that are difficult to anonymise and manual review is applied when required by project scope or data sharing agreements. The level of de-identification is aligned with institutional review board determinations, ranging from Synthetic Derivative–style anonymisation to retention of identifiers for approved uses. This layered approach reduces residual risk while preserving scientific utility.
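The multi-stage process above can be sketched as a small pipeline over series metadata. This is an illustrative assumption, not ImageVU's actual logic: the identifier list, the `image_kind` field, the excluded kinds and the function name `deidentify` are all hypothetical, and a production system would operate on DICOM files and pixel data rather than plain dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, List

# Keys named after common DICOM attributes; the selection is illustrative only.
IDENTIFYING_KEYS = {"PatientName", "PatientID", "PatientBirthDate", "AccessionNumber"}

# Hypothetical labels for image types that are hard to anonymise,
# for example captures with patient details burned into the pixels.
EXCLUDED_KINDS = {"SCREEN_CAPTURE", "SCANNED_DOCUMENT"}


@dataclass
class ReleaseResult:
    released: List[Dict[str, str]]  # series metadata cleared for release
    excluded_count: int             # series dropped by the image-type filter
    needs_manual_review: bool       # whether a human check is still required


def deidentify(
    series_list: List[Dict[str, str]],
    retain_identifiers: bool = False,
    manual_review: bool = False,
) -> ReleaseResult:
    """Apply the stages in order: filter, strip identifiers, flag for review.

    `retain_identifiers` models an approved identified project, in which
    case nothing is filtered or stripped (an assumption of this sketch).
    """
    released: List[Dict[str, str]] = []
    excluded_count = 0

    for series in series_list:
        if retain_identifiers:
            released.append(dict(series))
            continue
        # Stage 1: drop image types that cannot be anonymised reliably.
        if series.get("image_kind") in EXCLUDED_KINDS:
            excluded_count += 1
            continue
        # Stage 2: remove identifying attributes from what remains.
        released.append(
            {key: value for key, value in series.items() if key not in IDENTIFYING_KEYS}
        )

    # Stage 3: manual review is signalled here, not performed.
    return ReleaseResult(released, excluded_count, manual_review)
```

Passing the level of de-identification as parameters mirrors the idea that one identified archive can serve projects with different approvals, with the strictness decided at release time.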
Institutional Impact and Research Enablement
Since its implementation, ImageVU has demonstrated measurable institutional impact. By December 2024, the platform had processed 12.9 million MRI and CT series from more than 1.36 million studies involving over 453,000 patients. It had supported 75 project requests, delivered more than 50 terabytes of imaging data to 55 investigators and contributed to 66 published research papers. These figures reflect sustained use across multiple years and research domains.
Beyond volume metrics, the platform has improved compliance and standardisation. Centralised de-identification workflows limit the number of individuals handling protected health information and enforce consistent security controls. Dedicated informatics support provides guidance on imaging phenotyping and cohort refinement, which is particularly valuable for investigators without specialist imaging expertise. Early involvement of technical teams improves data quality and study design.
The shared infrastructure also supports scalability and cost efficiency. Automated pipelines enable batch processing of imaging data without requiring each project to build its own high-performance computing environment. Centralised storage and computation lower per-study costs, making imaging-based research more accessible to pilot projects and early-career investigators. Rapid turnaround for small cohorts supports proof-of-concept work and research in rare diseases where populations are limited.
ImageVU illustrates how a purpose-built research imaging infrastructure can integrate radiological data into broader secondary use ecosystems without compromising clinical operations or privacy requirements. By combining dedicated technical architecture with metadata-driven discovery and governed access processes, the platform expands the role of imaging in translational research. Its impact on research activity at the medical center demonstrates the value of aligning imaging reuse with existing institutional frameworks for clinical and genomic data. The design choices and lessons emerging from ImageVU offer a practical reference for organisations seeking to unlock the research potential of routinely acquired imaging while maintaining security, compliance and operational stability.
Source: Journal of Biomedical Informatics