Federated health data systems allow institutions to keep data within their own environments while still supporting cross-site querying and analysis. This model helps protect data sovereignty and confidentiality, but it also makes data quality assessment more difficult. In these settings, central access to raw data is often not possible, even though reliable quality metrics remain essential for secondary use in research, healthcare delivery and public health policy. A proof-of-concept framework addresses this tension by combining federated data quality checks with differential privacy. The approach enables individual nodes to compute local metrics and share only privacy-protected results, creating a way to assess quality without exposing sensitive records.

The implementation was tested on a synthetic dataset designed to resemble pseudonymised health data in HL7 FHIR format. It included 1,000 Patient resources and 10,000 Specimen resources, with checks focused on six dimensions that can be evaluated objectively and automatically: accuracy, completeness, consistency, timeliness, validity and uniqueness.
How the Framework Measures Data Quality
The framework uses a predefined library of nine checks that reflect common data problems and can be applied across multiple datasets. These checks include incompatible diagnosis and gender combinations, implausible birth dates, missing gender information, missing conditions, unsupported gender values, outdated records, invalid ICD-10 codes and duplicate patient identifiers. One additional check measures survival rate by gender, introducing stratification into the reporting process.
Each check is defined conceptually and then implemented through Clinical Quality Language (CQL) or Java functions using the FHIR API. The architecture supports editing and adding new CQL-based checks when the target FHIR store can execute them, which allows some flexibility beyond the initial proof-of-concept. Both raw and obfuscated results are stored locally, but only the privacy-protected values are made visible externally. This separation supports local monitoring and troubleshooting while preventing disclosure of sensitive information outside the node.
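To make this concrete, a completeness check such as "missing gender information" can be expressed as a small Java function over FHIR Patient resources. The sketch below uses the HAPI FHIR R4 model classes and is illustrative only; the class name and signature are assumptions, not the framework's actual code.

```java
// Illustrative completeness check over FHIR Patient resources,
// written against the HAPI FHIR R4 model; names are hypothetical.
import org.hl7.fhir.r4.model.Patient;

import java.util.List;

public class MissingGenderCheck {

    /** Returns the raw (pre-obfuscation) fraction of patients
     *  with no gender recorded. */
    public static double run(List<Patient> patients) {
        if (patients.isEmpty()) {
            return 0.0;
        }
        long missing = patients.stream()
                .filter(p -> !p.hasGender())
                .count();
        return (double) missing / patients.size();
    }
}
```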
The framework evaluates utility by comparing obfuscated outputs with the corresponding raw values. In the proof-of-concept, most checks used a privacy budget of ε = 0.2, while the stratified check used ε = 0.3. The total report budget was capped at ε = 2.0. All checks were designed at patient level, so the sensitivity parameter was set to 1, reflecting the assumption that each patient contributes to a given metric no more than once. Where more granular checks are needed, such as sample-level assessment, that sensitivity would need to increase to reflect multiple contributions per person.
Differential Privacy and the Reporting Process
Differential privacy protects against reidentification by adding calibrated noise to results before they are disclosed. This matters even when only aggregated metrics are shared, because rare diagnoses, unusual attribute combinations or very small counts can still create disclosure risks. In the framework, each node computes its own quality metrics locally and then applies differential privacy before publication. This makes it possible to expose semi-public quality summaries while keeping individual-level information protected.
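As a generic illustration of that step, a patient-level count can be protected with the Laplace mechanism, in which the noise scale is the sensitivity divided by the privacy budget ε. The sketch below uses the proof-of-concept's reported parameters (sensitivity 1, ε = 0.2) but is standard textbook differential privacy, not the framework's exact implementation.

```java
// Minimal sketch of the Laplace mechanism for a patient-level count;
// standard differential privacy, not the framework's exact code.
import java.util.Random;

public class LaplaceMechanism {

    private static final Random RNG = new Random();

    /** Samples Laplace(0, scale) noise by inverse transform sampling. */
    static double laplaceNoise(double scale) {
        double u = RNG.nextDouble() - 0.5;  // uniform on [-0.5, 0.5)
        return -scale * Math.signum(u) * Math.log(1.0 - 2.0 * Math.abs(u));
    }

    /** Adds noise calibrated to sensitivity / epsilon before disclosure. */
    static double obfuscate(double rawValue, double sensitivity, double epsilon) {
        return rawValue + laplaceNoise(sensitivity / epsilon);
    }

    public static void main(String[] args) {
        // Patient-level check: each patient contributes at most once,
        // so sensitivity = 1; with epsilon = 0.2 the noise scale is 5.
        double noisy = obfuscate(37.0, 1.0, 0.2);
        System.out.printf("raw=37, obfuscated=%.1f%n", noisy);
    }
}
```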
The reporting process becomes more complex when multiple checks or stratified outputs are combined. Each additional query consumes part of the total privacy budget, and stratification increases that cost further because the budget must be divided across strata. As a result, higher granularity improves interpretability in one sense but reduces precision in another, because each result carries more noise. Small subgroups are especially vulnerable to this effect. Rare conditions in younger populations or limited numbers of patients over 90 years of age can produce outputs where privacy protection has a disproportionate impact on usability.
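A sketch of that budget accounting, under the conservative sequential-composition assumption described here, might look like the following; the class and its methods are illustrative, not the framework's API.

```java
// Hedged sketch of sequential-composition budget accounting; the class
// and method names are illustrative, not the framework's API.
public class PrivacyBudget {

    private final double total;   // total epsilon allowed for the report
    private double consumed = 0.0;

    public PrivacyBudget(double totalEpsilon) {
        this.total = totalEpsilon;
    }

    /** Reserves epsilon for one published check; under sequential
     *  composition the per-check epsilons simply add up. */
    public void spend(double epsilon) {
        if (consumed + epsilon > total) {
            throw new IllegalStateException("report privacy budget exhausted");
        }
        consumed += epsilon;
    }

    /** A stratified check divides its epsilon across strata, so each
     *  stratum's result carries proportionally more noise. */
    public static double perStratumEpsilon(double checkEpsilon, int strata) {
        return checkEpsilon / strata;
    }

    public static void main(String[] args) {
        PrivacyBudget report = new PrivacyBudget(2.0); // demo report cap
        report.spend(0.3);                 // one stratified check
        for (int i = 0; i < 7; i++) {
            report.spend(0.2);             // further checks at 0.2 each
        }
        // Consumed: 0.3 + 7 * 0.2 = 1.7, matching the demonstration report.
        // Splitting the stratified check over two strata, e.g. by gender:
        System.out.println(perStratumEpsilon(0.3, 2)); // 0.15 per stratum
    }
}
```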
The framework addresses this through controlled allocation of privacy budgets per check. A demonstration report used a total budget of ε = 2.00, of which ε = 1.70 was consumed, with one stratified check assigned ε = 0.30 and the others ε = 0.20 each. The interface then presents results in a configurable dashboard. Warning and error thresholds provide visual interpretation of quality levels, with green for acceptable values, yellow for warning and red for error. By default, CQL-based checks use a 10% warning threshold and a 30% error threshold, though these settings can be adjusted for individual checks.
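The traffic-light logic itself is simple to sketch. The example below applies the default 10% warning and 30% error thresholds; the enum and method names are illustrative, not the framework's API.

```java
// Small sketch of the dashboard's traffic-light classification, using
// the default thresholds named above; names are hypothetical.
public enum QualityStatus {
    OK,       // green: acceptable
    WARNING,  // yellow: above the warning threshold
    ERROR;    // red: above the error threshold

    public static QualityStatus classify(double errorRatio,
                                         double warn, double error) {
        if (errorRatio >= error) return ERROR;
        if (errorRatio >= warn)  return WARNING;
        return OK;
    }

    public static void main(String[] args) {
        // Obfuscated "missing gender" value of 4.90% -> green under defaults.
        System.out.println(classify(0.049, 0.10, 0.30)); // prints OK
    }
}
```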
Results, Architecture and Current Limits
The proof-of-concept produced privacy-preserving quality reports from a synthetic dataset generated with deliberate data quality errors. Several outputs remained close to the raw values, showing that meaningful signals can survive the addition of privacy noise. Missing gender information measured 3.70% in the raw data and 4.90% after obfuscation. Missing diagnoses changed from 20.00% to 20.10%, unsupported gender values from 9.60% to 10.80%, duplicate identifiers from 9.70% to 9.60%, and invalid ICD-10 codes from 96.90% to 96.80%. Very small values showed larger relative distortion, such as incompatible diagnoses shifting from 0.10% to 0.70%, though this still remained interpretable as affecting less than 1% of patients. Survival rate by gender changed from 27.60% to 26.30% for female patients and from 30.90% to 32.00% for male patients.
The operational model relies on two main components. The Data Quality Metrics Agent runs locally at each node, executes checks and interprets results without exposing sensitive information outside the institution. The Data Quality Metrics Server aggregates and visualises the obfuscated outputs centrally through a configurable dashboard. This server-side component remains separate from the core federated search or analysis interface, which supports integration into different architectures. A pilot deployment is underway within the federated search system of BBMRI-ERIC.
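Put together, the agent's publication step might look roughly like the sketch below, in which raw values never leave the node and only the obfuscated result is sent to the central server. The class names, endpoint and JSON shape are assumptions for illustration, not the actual interfaces.

```java
// Hedged sketch of the agent's publish step: raw results stay local,
// only the obfuscated value is sent. Class names, the endpoint and the
// JSON shape are assumptions for illustration.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;

public class MetricsAgent {

    private final HttpClient client = HttpClient.newHttpClient();

    public void publish(String checkId, double raw, double obfuscated)
            throws Exception {
        // Both values are stored locally for monitoring and troubleshooting.
        storeLocally(checkId, raw, obfuscated);

        // Only the privacy-protected value leaves the institution.
        String body = String.format(Locale.ROOT,
                "{\"check\":\"%s\",\"value\":%.4f}", checkId, obfuscated);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://dqm-server.example.org/reports")) // hypothetical
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        client.send(request, HttpResponse.BodyHandlers.discarding());
    }

    private void storeLocally(String checkId, double raw, double obfuscated) {
        // Placeholder for the node-local store of raw and obfuscated results.
    }
}
```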
Four key limitations remain. The current implementation depends on predefined checks linked to a specific data model, which limits portability. It does not yet support custom user-defined checks on demand. The privacy budget calculation uses conservative sequential composition, which can add more obfuscation than necessary when correlations between stratifications are known. Small datasets also remain problematic, because even modest noise can make ratios hard to interpret. For this reason, differential privacy is recommended only above a minimum cohort size such as 30, or after aggregation across sites or time periods.
The proof-of-concept shows that federated data quality assessment can be combined with differential privacy to deliver useful quality metrics without exposing raw health data. Local execution, configurable privacy budgets and central visualisation create a practical model for semi-public reporting across distributed health data networks. Results from the synthetic FHIR dataset indicate that many metrics remain interpretable after obfuscation, even though low counts and small cohorts still present challenges. Wider adoption will depend on more reusable checks, broader support for different data models and mechanisms for dynamic validation. Even in its current form, the framework offers a concrete path towards privacy-preserving quality evaluation in federated health data environments.
Source: BMC Medical Informatics and Decision Making