The emergence of digital medicine has been tightly interwoven with the proliferation of deep learning models, which require extensive and diverse datasets for effective development and validation. However, stringent privacy regulations, particularly in healthcare, limit the accessibility and sharing of real patient data. This tension between data needs and privacy preservation has intensified interest in synthetic health records (SHRs).
SHRs are artificially generated datasets designed to mirror real-world electronic health records (EHRs) while ensuring that no identifiable patient information is retained. A recent scoping review offers a comprehensive analysis of deep learning models capable of generating synthetic medical text, time series and longitudinal data. It explores their methodological strengths, data modalities, objectives and performance metrics, shedding light on both the promises and challenges of SHR implementation in healthcare innovation.
Synthetic Time Series for Physiological Modelling
Synthetic time series data generation is pivotal for modelling physiological signals such as electrocardiograms (ECG) and electroencephalograms (EEG), which are widely used in diagnostic and monitoring contexts. In this domain, 22 studies focused on time series generation, with the majority targeting data scarcity and privacy concerns. GAN-based models were dominant due to their capacity to replicate temporal dynamics, though issues such as mode collapse and sensitivity to hyperparameters were noted.
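To make that setup concrete, the sketch below shows a minimal recurrent GAN for a single-channel physiological signal. It is written in PyTorch purely for illustration; the architecture, layer sizes and training loop are simplified assumptions of ours, not a model taken from any of the reviewed studies.

```python
# Minimal GAN sketch for physiological time series (illustrative only;
# the reviewed studies use far more elaborate architectures).
import torch
import torch.nn as nn

SEQ_LEN, NOISE_DIM, HIDDEN = 256, 32, 64  # hypothetical sizes

class Generator(nn.Module):
    """Maps a noise sequence to a synthetic single-channel signal (ECG-like)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(NOISE_DIM, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, 1)

    def forward(self, z):               # z: (batch, SEQ_LEN, NOISE_DIM)
        h, _ = self.rnn(z)
        return torch.tanh(self.out(h))  # (batch, SEQ_LEN, 1)

class Discriminator(nn.Module):
    """Scores whether a sequence looks real or synthetic."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(1, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, 1)

    def forward(self, x):               # x: (batch, SEQ_LEN, 1)
        _, h = self.rnn(x)
        return self.out(h.squeeze(0))   # raw logits

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                   # real: (batch, SEQ_LEN, 1) signals
    batch = real.size(0)
    z = torch.randn(batch, SEQ_LEN, NOISE_DIM)
    fake = G(z)

    # Discriminator update: real -> 1, fake -> 0.
    loss_d = (bce(D(real), torch.ones(batch, 1))
              + bce(D(fake.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: try to fool the discriminator.
    loss_g = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

The adversarial loop above is also where the cited weaknesses appear in practice: if the generator locks onto a narrow set of waveforms, mode collapse, or if training diverges under small hyperparameter changes.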
Diffusion models emerged as an alternative, demonstrating superior performance in capturing long-term dependencies in physiological patterns. Applications of synthetic time series extend beyond simple data replication; they support minority class generation, improve model robustness in low-resource settings and aid in imputation where signal gaps exist. Nevertheless, evaluating fidelity remains challenging, especially when translating synthetic patterns into clinically meaningful outputs. Fidelity is often assessed visually or by benchmarking with classifiers, but the lack of standardised metrics for re-identification risk complicates the validation process.
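Classifier benchmarking is frequently framed as a "train on synthetic, test on real" (TSTR) check: a model fitted only on synthetic data should still score well on held-out real data if the synthetic signals carry useful structure. A minimal sketch, assuming placeholder feature arrays and scikit-learn, might look like this:

```python
# Illustrative "train on synthetic, test on real" (TSTR) check with scikit-learn.
# X_syn/y_syn and X_real/y_real are placeholder arrays; in practice they would be
# windows or summary features extracted from ECG/EEG recordings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_syn, y_syn = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)
X_real, y_real = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_syn, y_syn)                        # train only on synthetic data
auc = roc_auc_score(y_real, clf.predict_proba(X_real)[:, 1])
print(f"TSTR AUROC on real data: {auc:.3f}")
```

An AUROC close to the baseline of a classifier trained on real data would indicate high utility; a large gap points to structure missing from the synthetic signals. Note that this measures usefulness, not privacy, which is exactly the gap in standardised metrics the review highlights.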
Preserving Privacy with Synthetic Longitudinal Data
Longitudinal data encapsulate patient trajectories across multiple visits and timepoints, offering a rich source for understanding chronic disease progression, treatment patterns and long-term health outcomes. In the reviewed studies, 17 papers explored synthetic longitudinal data generation, with privacy protection cited as the primary motivation. GANs again featured prominently, often combined with graph-based and probabilistic approaches to capture the complex interdependencies within patient histories. Some models applied autoregressive or mixed-type architectures to handle the variety of data types—demographics, diagnoses, vitals—within EHRs. Yet, many public datasets used for training are ICU-centric, thereby underrepresenting non-acute cases and demographic diversity.
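As a loose illustration of the autoregressive, mixed-type idea, each synthetic visit can be sampled conditioned on static demographics and on the previous visit. Every variable, threshold and probability in the toy sampler below is invented for demonstration and is not drawn from the review.

```python
# Toy autoregressive sampler for mixed-type longitudinal records (illustrative;
# all variables and probabilities are invented for demonstration purposes).
import numpy as np

rng = np.random.default_rng(42)
DIAGNOSES = ["hypertension", "diabetes", "none"]

def sample_patient(n_visits=4):
    # Static demographics sampled once per patient.
    patient = {"age": int(rng.integers(30, 85)),
               "sex": str(rng.choice(["F", "M"])),
               "visits": []}
    prev_dx = "none"
    for _ in range(n_visits):
        # Diagnosis depends on age and on the previous visit (simple autoregression).
        p_chronic = 0.2 + 0.004 * patient["age"] + (0.3 if prev_dx != "none" else 0.0)
        dx = str(rng.choice(DIAGNOSES[:2])) if rng.random() < min(p_chronic, 0.95) else "none"
        # Vitals drawn conditionally on the sampled diagnosis.
        sbp = rng.normal(150 if dx == "hypertension" else 125, 10)
        patient["visits"].append({"diagnosis": dx, "systolic_bp": round(float(sbp), 1)})
        prev_dx = dx
    return patient

print(sample_patient())
```

Deep generative models replace these hand-written conditionals with learned dependencies, but the ordering of the sampling steps, demographics first, then diagnoses, then measurements, mirrors how the mixed-type architectures described above factor a patient history.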
Limitations in dataset generalisability and the linguistic homogeneity of English-language records present challenges to scalability. Moreover, while fidelity and utility of generated data are typically assessed, few studies adequately quantify the risk of re-identification. The absence of universally accepted performance metrics for privacy evaluation hinders robust model comparison and slows adoption in regulatory-sensitive environments.
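One commonly discussed heuristic, sketched below under our own assumptions rather than as a metric endorsed by the review, is a distance-to-closest-record comparison: if synthetic records lie much closer to the generator's training records than genuinely unseen records do, memorisation, and therefore elevated re-identification risk, is likely.

```python
# Illustrative distance-to-closest-record (DCR) check; arrays are placeholders
# standing in for numerically encoded patient records.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
train = rng.normal(size=(1000, 15))      # records the generator was trained on
holdout = rng.normal(size=(300, 15))     # real records never seen in training
synthetic = rng.normal(size=(300, 15))   # generated records

nn_index = NearestNeighbors(n_neighbors=1).fit(train)

def median_dcr(records):
    """Median distance from each record to its closest training record."""
    dist, _ = nn_index.kneighbors(records)
    return float(np.median(dist))

# Synthetic distances far below the hold-out baseline suggest memorisation,
# i.e. an elevated re-identification risk.
print("holdout DCR:  ", round(median_dcr(holdout), 3))
print("synthetic DCR:", round(median_dcr(synthetic), 3))
```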
Generating Clinical Text: Opportunities and Constraints
Clinical narratives provide nuanced insight into patient conditions, physician assessments and care decisions. Generating synthetic medical text requires handling linguistic variability, contextual coherence and embedded domain knowledge. Of the 13 reviewed studies in this area, most employed large language models (LLMs), particularly GPT-style architectures, which have shown considerable success in producing coherent synthetic notes across multiple languages. These models demonstrated strong potential in both privacy preservation and addressing data scarcity. Yet LLMs carry substantial computational costs and show limitations in complex reasoning, both of which affect the accuracy and reliability of synthetic clinical narratives.
Chain-of-thought prompting has been proposed to enhance reasoning in generated text, but its effectiveness in multi-modal healthcare contexts remains inconclusive. Furthermore, the reproducibility of results is often hampered by inaccessible codebases and undocumented hyperparameter choices. While synthetic clinical text can support de-identification, enhance named entity recognition and enrich underrepresented clinical scenarios, achieving consistent quality and reliability remains a work in progress.
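By way of illustration only, a chain-of-thought style prompt for synthetic note generation asks the model to reason about the case before writing the note. The snippet below uses the Hugging Face transformers text-generation pipeline with a placeholder model name; it is a sketch of the prompting pattern, not a recipe from the reviewed studies.

```python
# Sketch of a chain-of-thought style prompt for synthetic clinical notes.
# "gpt2" is only a placeholder; any instruction-tuned text-generation model
# available through Hugging Face transformers could be substituted.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "You are generating a fully synthetic discharge summary. No real patient exists.\n"
    "Case parameters: 67-year-old with community-acquired pneumonia, 4-day admission.\n"
    "First, reason step by step about the expected timeline of symptoms, findings and\n"
    "treatments. Then write the discharge summary, keeping it consistent with that\n"
    "reasoning and free of any identifying details.\n"
)

note = generator(prompt, max_new_tokens=300, do_sample=True, temperature=0.8)
print(note[0]["generated_text"])
```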
The generation of synthetic health records represents a critical enabler for data-driven healthcare, offering viable solutions to the persistent challenges of data scarcity, class imbalance and patient privacy. This scoping review reveals that while generative adversarial networks dominate time series modelling, longitudinal data benefit from probabilistic and graph-based methods, and clinical text is best served by large language models. However, across all modalities there is a clear need for robust, standardised performance metrics that address fidelity, utility and privacy.
Current gaps in evaluation, data generalisability and model reproducibility limit the immediate application of SHRs in clinical practice. Bridging these gaps requires methodological refinement, regulatory alignment and interdisciplinary collaboration between technologists, clinicians and policymakers. As synthetic data generation matures, it holds the promise of transforming digital medicine by making high-quality, privacy-respecting data widely accessible for innovation and research.
Source: npj Digital Medicine
Image Credit: Freepik