NLP ‘Barometer’ for Analysing Local Clinical Data


A team of U.K. researchers proposes a vendor-neutral, open-source natural language processing (NLP) approach to EHR data to help providers plan for patient surges, particularly in the context of COVID-19.




Natural language processing, applied mainly to social media content, has previously been used for epidemiological forecasting, but it proved susceptible to distortions and keyword spamming in the uncontrolled online environment. A new study (Teo et al. 2021) reports results from a different approach. Instead of analysing publicly available data, the researchers focussed on private health data lakes, i.e. unstructured, free-text data from the EHR systems of two large U.K. hospitals.


The data from the two systems were pooled in two separate data lakes using the CogStack platform and treated as “bags of words” to identify symptom keywords and phrases suggestive of COVID-19, such as ‘dry cough’, ‘pyrexia’, ‘fever’, ‘dyspnoea’ and ‘anosmia’. To avoid distortion due to the ‘hashtag effect’, the word ‘COVID’ was excluded from the signal index. The results showed that these signals closely tracked the gold-standard data on COVID-19 positivity (nasal swab PCR tests) in both hospitals, with a head start of up to four days.
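To make the bag-of-words idea concrete, here is a minimal Python sketch of how symptom mentions in free-text notes could be counted per day. It is an illustration only, not the authors' pipeline and not the CogStack API: the phrase list simply reuses the examples quoted above, and the name `daily_symptom_counts` and the `(date, text)` input format are assumptions made for this example.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical phrase list, taken from the examples quoted in the article.
# 'COVID' is deliberately left out, mirroring the exclusion described above.
SYMPTOM_PHRASES = ["dry cough", "pyrexia", "fever", "dyspnoea", "anosmia"]

# One case-insensitive pattern with word boundaries, so that 'fever'
# does not match inside a longer word such as 'feverish'.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in SYMPTOM_PHRASES) + r")\b",
    re.IGNORECASE,
)


def daily_symptom_counts(notes):
    """Count symptom-phrase mentions per day.

    notes: iterable of (date, free_text) pairs (an assumed input format).
    Returns a Counter mapping each date to its number of phrase matches.
    """
    counts = Counter()
    for day, text in notes:
        counts[day] += len(PATTERN.findall(text))
    return counts


if __name__ == "__main__":
    # Invented example notes, for illustration only.
    example_notes = [
        (date(2020, 3, 1), "Pt reports dry cough and fever."),
        (date(2020, 3, 1), "Anosmia for 2 days. COVID suspected."),
        (date(2020, 3, 2), "No complaints today."),
    ]
    for day, n in sorted(daily_symptom_counts(example_notes).items()):
        print(day.isoformat(), n)
```

A daily series of this kind could then be plotted alongside daily positive PCR results to see whether the text signal moves ahead of the test data.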


Another finding indicated that as the pandemic developed and providers, through general media, became more aware of specific COVID-19 symptoms, such as anosmia, the incidence of relevant phrases increased, affected by recall bias. The authors point out that this phenomenon should be accounted for when deploying their approach.


The CogStack platform is open-source on GitHub and available to any healthcare organisation. The researchers note that its implementation is low-cost, flexible and EHR-vendor-neutral, and that it does not interfere with clinical routines. Although the scope of the current study was limited to closed health data lakes, the authors emphasise that their approach can be scaled to the regional and even national level for short-term patient surge forecasting.


Image credit: Teo et al. (2021)



Teo JTH et al. (2021) Real-time clinician text feeds from electronic health records. npj Digital Medicine, 4(35).

Teo J (2021) What's trending in your electronic health record feed? Published 24 February. npj Digital Medicine. Available from

Published on : Sat, 27 Feb 2021


