HealthManagement, Volume 22 - Issue 4, 2022


The low quality of health data is at the core of why we are not making the most of data science. However, new ways of acquiring those data can overcome this situation.

Key Points

  • High-quality health data are essential for healthcare delivery, health research and innovation.
  • However, two key problems must be addressed: the first concerns electronic health records access and interoperability; the second concerns the use of continuous private data generated by individuals.
  • The second problem can be addressed by creating a chain of trust for this data and ensuring its efficient transfer from the individual to data users with security and compliance with quality standards.
  • It is time to recognise the importance of personal health data and to explore new ways to guarantee secure and reliable access.

Not so long ago, someone told us data was the new gold. And they were damn right.

Just like gold, data has raised great expectations. Data analysts are becoming the new gold seekers. We expect them to find all those accurate data-driven decisions and predictions that will dramatically transform our enterprises and lives. We believe that the picture will be perfectly clear if we gather a sufficiently large amount of data. And well, maybe it is true, or at least partially true.

The possibilities that data analytics offers in healthcare are huge. Since health is one of our most valuable possessions, we are probably talking of diamonds instead of gold. But, just like diamonds, not all data collections have the same value. There is something every data analyst is conscious of: if you want to reach valuable conclusions, you need high-quality data. It’s not enough to have a great amount of data; it has to reach quality standards.

What do we mean by data quality? Data quality reflects how well data serve our purpose. Unfortunately, this means that collecting great volumes of data does not necessarily lead to great results. We have to ensure that our data show accuracy, completeness, consistency, validity, uniqueness and timeliness. Despite high expectations, health data currently fall short on virtually all of these dimensions. Of course, data can be cleaned, but we must be aware that cleaning methods often have a negative impact on our results and may distort our conclusions.
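To make some of these dimensions concrete, here is a minimal Python sketch of how completeness, uniqueness, validity and timeliness might be measured on a small set of records. The records, field names, plausible weight range and one-year freshness threshold are all assumptions made for illustration; they do not come from any real dataset or standard.

```python
from datetime import date

# Hypothetical records: field names and values are invented for illustration.
records = [
    {"id": "p1", "birth_date": date(1980, 5, 1), "weight_kg": 72.0, "updated": date(2022, 3, 1)},
    {"id": "p2", "birth_date": None, "weight_kg": 650.0, "updated": date(2019, 1, 15)},
    {"id": "p2", "birth_date": date(1975, 9, 9), "weight_kg": 81.5, "updated": date(2022, 2, 20)},
]


def completeness(rows, field):
    """Share of rows in which the field is present (not None)."""
    return sum(r.get(field) is not None for r in rows) / len(rows)


def uniqueness(rows, key):
    """Number of distinct key values divided by the number of rows."""
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, low, high):
    """Share of rows whose value is present and inside an assumed plausible range."""
    return sum(r.get(field) is not None and low <= r[field] <= high for r in rows) / len(rows)


def timeliness(rows, field, reference, max_age_days):
    """Share of rows updated no more than max_age_days before the reference date."""
    return sum((reference - r[field]).days <= max_age_days for r in rows) / len(rows)


print("completeness(birth_date):", completeness(records, "birth_date"))
print("uniqueness(id):", uniqueness(records, "id"))
print("validity(weight_kg):", validity(records, "weight_kg", 1.0, 400.0))
print("timeliness(updated):", timeliness(records, "updated", date(2022, 6, 1), 365))
```

Accuracy and consistency are left out on purpose: both require an external reference (the true value, or a second source to compare against), which is precisely what is hard to obtain for health data.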

Data quality is at the core of why health data are so difficult to deal with, and before we can sit and get game-changing solutions from them, we must face at least two problems.

The first problem is clinical records. Electronic Health Records (EHRs) are the main source of health data. But they provide a heterogeneous mix of structured and unstructured data from many different sources in many different formats, since customised EHR systems remain common despite the 2011 Cross-Border Healthcare (CBHC) Directive. Yet interoperability is critical if we want to reach next-level objectives. In pursuit of this interoperability, initiatives like the MyHealth@EU platform or Fast Healthcare Interoperability Resources (HL7 FHIR), an open source standards framework for healthcare data, aim to facilitate the transformation from one system to another.
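As an illustration of what such a transformation involves, the following Python sketch maps a record from a hypothetical customised EHR system onto the basic shape of an HL7 FHIR Patient resource. The local field names, the DD/MM/YYYY date format, the sex codes and the identifier system URI are assumptions invented for this example; only the FHIR element names used (`resourceType`, `identifier`, `name`, `gender`, `birthDate`) come from the standard, and a real mapping would cover far more elements and be validated against it.

```python
from datetime import datetime

# Assumed local sex codes; anything else falls back to FHIR's "unknown".
GENDER_MAP = {"F": "female", "M": "male"}

# Hypothetical URI identifying the local patient-numbering scheme.
LOCAL_ID_SYSTEM = "urn:example:hospital-a:patient-no"


def to_fhir_patient(local):
    """Convert a hypothetical local EHR record into a FHIR-style Patient dict."""
    # The local system is assumed to store dates as DD/MM/YYYY;
    # FHIR expects ISO 8601 (YYYY-MM-DD).
    birth = datetime.strptime(local["dob"], "%d/%m/%Y").date().isoformat()
    return {
        "resourceType": "Patient",
        "identifier": [{"system": LOCAL_ID_SYSTEM, "value": local["patient_no"]}],
        "name": [{"family": local["surname"], "given": local["first_names"].split()}],
        "gender": GENDER_MAP.get(local["sex"], "unknown"),
        "birthDate": birth,
    }


local_record = {
    "patient_no": "12345",
    "surname": "Garcia",
    "first_names": "Ana Maria",
    "sex": "F",
    "dob": "01/05/1980",
}
patient = to_fhir_patient(local_record)
print(patient)
```

Even this toy case shows why customised systems are costly to integrate: every local convention (date format, coding of sex, numbering scheme) needs its own explicit mapping rule.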

Medicare’s Blue Button 2.0 is HL7 FHIR’s best-known use case. The recently published Proposal for a Regulation of the European Parliament and the Council on the European Health Data Space (EHDS) aims to ensure a solid framework that supports health data interoperability throughout the EU. The European Council recognises the urgency of this task and has decided to prioritise the construction of this EHDS. Much work has been done, but data integration is undoubtedly a difficult task in this scenario.

The second and main problem is that EHRs can only account for a small part of our needs. High-value data originating in patients’ daily lives are very difficult to collect. Data that provide information about health behaviours, lifestyle, and socioeconomic and environmental factors are mostly out of reach. However, their impact on health outcomes seems to be beyond question, not only for the well-known Social Determinants of Health (SDOH), defined by the World Health Organization as “the conditions in which people are born, grow, live, work and age”, but also for a large amount of data generated by the individual. Without these data, the picture is painfully incomplete.

Several factors make reliable personal data difficult to obtain, and one of them is public mistrust about how the data will be used. The idea of indelible labels that may compromise future employment or a health insurance contract will surely discourage people from sharing that data. And this mistrust can only grow over time, limiting analysts’ access to these sources.

By contrast, if you have a logistics business, it is quite easy to collect data from your trucks: data concerning fuel levels, geolocation, truck speed, etc., are readily available. With health data, the scenario is quite different: you cannot think of placing geolocators on people with multiple sclerosis or somehow managing to collect their grocery bills. Instead, you have to ask people with multiple sclerosis to share their data: where they live and work, what pets they have, what brand of shampoo they use and how many beers they drink in a week. They have all the information you need - with a little help, they can give it to you with the right standards.

At this point, several paths open up before us. We can, of course, forget about these data, or try to reach them while accepting their poor quality. Or we can engage in a direct dialogue with the data providers, the only ones who can guarantee quality at the source: citizens.

One possible solution to our problem is to create a chain of trust for this information. This way, information would be transferred, fulfilling all required standards, to a trusted intermediary in charge of data custody. This entity can only be one of public trust. On the users’ side, it would guarantee data quality and availability; on the providers’ side, it would guarantee security and terms of use.

People would deposit their personal data just like they deposit money, with complete confidence that the day they want it back (or want to transfer it to another entity), it will be returned easily and unchanged. Users would acquire data packages from those entities; this way, data deposits would give a profit to providers, and users would have the high-quality data they need.

Because human behaviour only changes in appearance, the current data market for personal information recalls the trading of trinkets for gold by some distant conquerors. The story is not one to be proud of. Data providers must know they are not losing control over their data. This probably includes, among other things, the certainty that the transfer of information is not permanent and that it can be revoked at any time.
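The custody model described above (deposit, return unchanged, access only with consent, revocation at any time) can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the `DataCustodian` class, its method names and the use of a SHA-256 digest as a deposit receipt are all hypothetical choices made for this example, not a description of any existing system, and a real custodian would need authentication, encryption, auditing and legal safeguards.

```python
import hashlib
import json


class DataCustodian:
    """Hypothetical trusted intermediary holding personal data deposits."""

    def __init__(self):
        self._deposits = {}   # provider_id -> (data, digest receipt)
        self._grants = set()  # (provider_id, user_id) pairs with active consent

    @staticmethod
    def _digest(data):
        # Fingerprint of the deposit, used to check that data are returned unchanged.
        encoded = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def deposit(self, provider_id, data):
        """Store the provider's data and return a receipt (the digest)."""
        receipt = self._digest(data)
        self._deposits[provider_id] = (data, receipt)
        return receipt

    def grant(self, provider_id, user_id):
        """Provider consents to a given data user accessing the deposit."""
        self._grants.add((provider_id, user_id))

    def revoke(self, provider_id, user_id):
        """Provider withdraws consent; later access attempts are refused."""
        self._grants.discard((provider_id, user_id))

    def access(self, provider_id, user_id):
        """Return the data only if consent is active and integrity holds."""
        if (provider_id, user_id) not in self._grants:
            raise PermissionError("no active consent for this data user")
        data, receipt = self._deposits[provider_id]
        if self._digest(data) != receipt:
            raise ValueError("stored data no longer match the deposit receipt")
        return data

    def withdraw(self, provider_id):
        """Return the deposit to its provider and cancel every grant on it."""
        data, _ = self._deposits.pop(provider_id)
        self._grants = {g for g in self._grants if g[0] != provider_id}
        return data


custodian = DataCustodian()
receipt = custodian.deposit("citizen-1", {"beers_per_week": 3, "pets": ["cat"]})
custodian.grant("citizen-1", "research-lab")
shared = custodian.access("citizen-1", "research-lab")   # allowed while consent is active
custodian.revoke("citizen-1", "research-lab")            # consent can be withdrawn at any time
```

After the call to `revoke`, any further `access` by the same data user raises `PermissionError`, and `withdraw` hands the deposit back to its provider, mirroring the bank-deposit analogy.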

Let’s move on: if data is the new gold, are we ready to speak with its owners?


High-quality health data are essential if we want to support healthcare delivery and promote health research and innovation. With that objective in mind, we face two problems. The first, concerning electronic health records access and interoperability, falls within the scope of the European Health Data Space. The second, concerning the use of continuous private data generated by individuals (SDOH and others), is far from being solved. We propose a new approach to the second challenge: creating a chain of trust for these data so that they can be transferred efficiently and securely from the individual to data users while guaranteeing compliance with quality standards. The entity in charge of data custody and transfer can only be one of public trust; it would act as an intermediary between data providers and data users, perhaps even managing the revenues obtained from data sharing.

The time has come to recognise the importance of personal health data and explore new ways to guarantee secure and reliable access to them with high-quality standards.

Conflict of Interest