Mental health is a significant public health concern, with considerable economic costs and loss of quality of life. Depression affects about 19.4 million Americans annually, while generalised anxiety disorder affects approximately 19.8 million. Globally, mental health conditions are a major cause of reduced quality of life, and they contribute to the rise in "deaths of despair" linked to suicide, obesity, and opioid overdoses. Public health researchers and policymakers aim to respond to these issues, but current mental health monitoring methods are limited: they rely on subjective surveys with poor temporal and regional resolution. Language-based assessments of social media posts, such as those on Twitter, have shown promise in predicting health trends more accurately than these surveys and at finer geospatial scales.

 

Validated measures of depression and anxiety, assessed regularly at the county level, could revolutionise population mental health research by identifying clusters and causes of mental health deterioration. Although they are not direct clinical measurements, such assessments offer valuable insight into population health trends. The study, published in Nature Digital Medicine, combined recent methodological advances, including domain adaptation and post-stratification weighting, into a pipeline for language-based mental health assessments (LBMHAs). It then evaluated their reliability and their validity against traditional surveys such as Gallup's COVID-19 Panel. The results showed that LBMHAs could reliably monitor mental health across time periods and regions. The study also released an open-source toolkit for deriving these mental health estimates, to support further research and public health interventions.

 

 

Dataset of Twitter Posts for Mental Health Analysis

The study introduces CTLB-19-20, an updated version of the County Tweet Lexical Bank containing county-mapped Twitter data from 2019 and 2020. The raw collection of 2.7 billion posts from 2.6 million users was filtered down to about 1 billion posts from 2.2 million users. Each post retains its date, user ID, text, and US county. Filtering kept only original English-language content and removed retweets, hyperlinks, and duplicates. The resulting dataset covers 1,418 counties, which together represent ~92% of the US population.
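The filtering step can be sketched in Python. This is a minimal illustration, not the study's code. The `Post` fields beyond date, user ID, text, and county (`lang`, `is_retweet`) are hypothetical. The sketch also assumes that "removing hyperlinks" means dropping any post that contains a URL, and that a duplicate is the same text posted again by the same user.

```python
from dataclasses import dataclass

# Substrings treated as evidence of a hyperlink (an assumption, not the study's rule).
URL_MARKERS = ("http://", "https://", "www.")


@dataclass(frozen=True)
class Post:
    date: str         # retained field: post date, e.g. "2020-03-15"
    user_id: str      # retained field
    text: str         # retained field
    county: str       # retained field: US county identifier
    lang: str         # hypothetical: language tag from an upstream detector
    is_retweet: bool  # hypothetical: retweet flag from the raw record


def keep_post(post: Post) -> bool:
    """True for original English posts that contain no hyperlink."""
    if post.is_retweet or post.lang != "en":
        return False
    lowered = post.text.lower()
    return not any(marker in lowered for marker in URL_MARKERS)


def filter_posts(posts):
    """Apply keep_post, then drop repeats of the same text by the same user."""
    seen = set()
    kept = []
    for post in posts:
        if not keep_post(post):
            continue
        key = (post.user_id, post.text.strip().lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(post)
    return kept


posts = [
    Post("2020-03-01", "u1", "Feeling tired today", "36061", "en", False),
    Post("2020-03-01", "u1", "Feeling tired today", "36061", "en", False),  # duplicate
    Post("2020-03-01", "u2", "RT @someone: hello", "36061", "en", True),    # retweet
    Post("2020-03-02", "u3", "Read this https://example.com", "06037", "en", False),  # link
    Post("2020-03-02", "u4", "Hola a todos", "06037", "es", False),         # not English
]
kept = filter_posts(posts)
print(len(kept))  # → 1
```

Stripping URLs from the text instead of discarding the whole post would be an equally plausible reading of the filtering rule. Only `keep_post` would need to change.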

 

To ensure reliable depression and anxiety measurements, only users who post at least three times in a given week are included, and each county must have at least 200 unique users that week. Counties that do not meet these criteria are aggregated into "super counties" so that their posts can still be analysed. Missing weeks in the final dataset are filled by linear interpolation, and 2019 means are subtracted so that 2020 values reflect change relative to the 2019 baseline.
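The weekly thresholds, super-county pooling, interpolation, and baseline adjustment can be sketched as follows. The two thresholds come from the article, and everything else is assumed for illustration. In particular, the hypothetical `fallback_group` mapping decides which super county a small county joins, because the article does not say how super counties are formed. The sketch also assumes that a single 2019 mean is subtracted per unit.

```python
from collections import defaultdict

import numpy as np

MIN_POSTS_PER_WEEK = 3    # per-user threshold described in the article
MIN_USERS_PER_WEEK = 200  # per-county threshold described in the article


def active_users(post_counts):
    """Users with at least MIN_POSTS_PER_WEEK posts in the week."""
    return {user for user, n in post_counts.items() if n >= MIN_POSTS_PER_WEEK}


def build_units(weekly_counts, fallback_group):
    """Assign one week's active users to reporting units.

    weekly_counts:  {county: {user_id: posts this week}}
    fallback_group: {county: label}, a hypothetical rule (e.g. by state) for
                    which super county an under-threshold county joins.
    A pooled unit may still be under the threshold; this sketch keeps it anyway.
    """
    units = {}
    pooled = defaultdict(set)
    for county, counts in weekly_counts.items():
        users = active_users(counts)
        if len(users) >= MIN_USERS_PER_WEEK:
            units[county] = users
        else:
            pooled["super:" + fallback_group[county]] |= users
    units.update(pooled)
    return units


def fill_missing_weeks(weeks, scores):
    """Linearly interpolate NaN weekly scores from the observed weeks.

    weeks must be sorted in ascending order. np.interp holds the first/last
    observed value constant outside the observed range, so leading or
    trailing gaps are padded rather than extrapolated.
    """
    weeks = np.asarray(weeks, dtype=float)
    scores = np.asarray(scores, dtype=float)
    observed = ~np.isnan(scores)
    return np.interp(weeks, weeks[observed], scores[observed])


def change_from_2019(scores_2019, scores_2020):
    """Express 2020 weekly scores as differences from the unit's 2019 mean."""
    return np.asarray(scores_2020, dtype=float) - np.nanmean(scores_2019)


# Usage: a large county reports alone; two small ones are pooled.
week = {
    "A": {f"a{i}": 3 for i in range(200)},
    "B": {f"b{i}": 4 for i in range(5)},
    "C": {f"c{i}": 1 for i in range(10)},  # nobody reaches 3 posts
}
units = build_units(week, {"A": "NY", "B": "NY", "C": "NY"})
filled = fill_missing_weeks([1, 2, 3, 4], [1.0, np.nan, 3.0, 4.0])  # week 2 becomes 2.0
delta = change_from_2019([2.0, 4.0], filled)  # 2019 mean is 3.0
```

The `"super:"` prefix keeps pooled labels from colliding with real county identifiers.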

 

Assessing Depression and Anxiety Levels Using Adapted Lexical Models and Word Frequencies

Depression and anxiety levels are calculated using adapted lexical models applied to Anscombe-transformed word frequencies. Post-stratification weighting corrects for Twitter users not being demographically representative of the population. Domain adaptation accounts for differences between the language of Facebook, on which the lexical models were originally built, and that of Twitter.
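The scoring and weighting steps can be illustrated with a simple linear lexical model. The lexicon, its weights, the demographic cells, and the population shares below are all hypothetical. The code assumes that the Anscombe transform (2·√(x + 3/8)) is applied to each word's relative frequency within a user's posts. Domain adaptation is not shown.

```python
import math
from collections import Counter


def anscombe(x: float) -> float:
    """Variance-stabilising Anscombe transform: 2 * sqrt(x + 3/8)."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def relative_frequencies(tokens):
    """Share of a user's tokens taken by each word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


def lexicon_score(tokens, weights, intercept=0.0):
    """Linear lexical model: intercept + sum of weight * anscombe(frequency).

    weights is a hypothetical {word: weight} lexicon. Every lexicon word is
    included, because anscombe(0) is not zero for words the user never wrote.
    """
    freqs = relative_frequencies(tokens)
    return intercept + sum(
        w * anscombe(freqs.get(word, 0.0)) for word, w in weights.items()
    )


def poststrat_weights(user_cells, population_shares):
    """Weight each user by population share / sample share of their demographic cell."""
    n = len(user_cells)
    sample_counts = Counter(user_cells.values())
    return {
        user: population_shares[cell] / (sample_counts[cell] / n)
        for user, cell in user_cells.items()
    }


def weighted_mean(scores, weights):
    """Post-stratified mean of per-user scores."""
    total = sum(weights[user] for user in scores)
    return sum(scores[user] * weights[user] for user in scores) / total


# Usage: young users are over-represented in the sample (2 of 3) but make up
# half of the (hypothetical) population, so they are down-weighted.
user_weights = poststrat_weights({"a": "young", "b": "young", "c": "old"},
                                 {"young": 0.5, "old": 0.5})
county_mean = weighted_mean({"a": 1.0, "b": 1.0, "c": 4.0}, user_weights)  # about 2.5, vs 2.0 unweighted
score = lexicon_score(["sad", "sad", "happy", "day"], {"sad": 1.0, "happy": -1.0})
```

Cell-based reweighting is only one simple form of post-stratification. The study's actual demographic variables and weighting scheme may differ.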

The reliability of the measurements was evaluated by requiring sufficient user data in each time period, with user-count thresholds determining the reliability scores. Convergent validity was tested by comparing the language-based assessments with Gallup COVID-19 Panel data, and the two measures aligned despite their methodological differences. External validity was assessed through correlations with County Health Rankings data and through analysis of changes in depression and anxiety scores during major US events in 2020.

 

Comparing LBMHAs with Survey Data and Addressing Limitations

The results showed that LBMHAs reflected national and county-level trends comparable to those in the Gallup surveys and captured mental health changes corresponding to significant events in 2020. This was particularly evident during the COVID-19 pandemic, when a notable increase in depression and anxiety was observed nationwide. LBMHAs overcome limitations of self-reported surveys by drawing on natural, unedited communication, which provides higher-resolution insight into mental health. Challenges remain, such as the inclusion of large geographic units and the presence of non-human users (bots), but the study's hierarchical aggregation methods minimised their effects.
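The article does not spell out the hierarchical aggregation, but its basic idea can be sketched as follows. The sketch assumes a score is already available for each post, whereas a word-based pipeline would instead pool each user's words before scoring. Averaging within users before averaging within counties limits how much a single very active account, such as a bot, can outweigh ordinary users.

```python
from collections import defaultdict
from statistics import fmean


def county_scores(post_scores):
    """Two-level aggregation of (county, user_id, score) records.

    Scores are averaged within each user first, then user means are averaged
    within each county, so every user counts once regardless of post volume.
    """
    per_user = defaultdict(list)
    for county, user, score in post_scores:
        per_user[(county, user)].append(score)

    per_county = defaultdict(list)
    for (county, _user), scores in per_user.items():
        per_county[county].append(fmean(scores))

    return {county: fmean(means) for county, means in per_county.items()}


# A prolific account (e.g. an automated one) posting 9 times vs one ordinary user.
records = [("A", "bot", 10.0)] * 9 + [("A", "person", 0.0)]
flat = fmean(score for _, _, score in records)  # 9.0: dominated by the bot
hierarchical = county_scores(records)["A"]      # 5.0: each account weighted equally
```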

 

Limitations include the study's focus on Twitter data from 2019–2020, which means the approach still needs to be validated for other years and platforms. Lexical models were used because of their track record of validation and their efficiency, but future work might benefit from transformer-based models, which may better handle semantic drift (changes in word meaning over time). The study highlights the potential of AI-based population assessments for real-time mental health monitoring and suggests applications beyond population health, for example within specific organisations. Objective, high-resolution mental health data could support better resource allocation and a better understanding of the risk factors for depression and anxiety.

 

Source: Nature Digital Medicine

Image Credit: iStock

 



