Depressive symptoms remain a major mental health concern, and earlier detection could support more timely intervention. A model using wearable device data and explainable artificial intelligence assessed depression risk from physiological, behavioural and health-related information collected in daily life. The approach used smartwatch and smartphone application data, including sleep, physical activity, heart rate, digital behaviour and questionnaire responses. Data from adults aged 55 to 85 years in Wonju, South Korea, were used to classify depressive symptoms according to PHQ-9 scores. Ensemble learning models identified influential predictors, while hybrid deep learning models combined neural network feature extraction with LightGBM classification. Models trained on selected important features performed better than those using the full feature set, with sleep-related variables among the strongest predictors.

 

Wearable Data Build a Broader Risk Profile

The model drew on a community-based cohort of middle-aged and older adults who provided wearable and smartphone data over a three-year period. Of 685 enrolled participants, 228 who continuously provided data were included in the analysis. Smartwatches captured step activity, heart rate and sleep states, while a synchronised smartphone application recorded text message counts, call logs, smartphone usage patterns and message content. Additional ecological momentary assessments included a daily mood questionnaire, weekly reports of significant stress events and monthly depression assessments. Structured survey data added information on health conditions, employment, income and family history of physical and mental illness.


Depressive symptoms were measured with PHQ-9. Scores from 0 to 4 defined the non-depressive group, while scores of 5 or higher defined the depressive group. On that basis, 129 participants were categorised as non-depressed and 99 as potentially depressed. The dataset included 61 features for training, and feature values were normalised to the range 0 to 1. Monthly averages of physiological indicators were paired with the corresponding monthly PHQ-9 scores, producing one sample per participant-month. Samples with missing values were excluded, and adjacent months were removed to reduce temporal dependency. This process yielded 1,582 monthly samples, comprising 1,105 control samples and 477 samples with elevated depressive symptoms. SMOTE was then applied to the training set to reduce class imbalance.
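The labelling and scaling steps, together with SMOTE, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`phq9_label`, `fit_min_max`, `apply_min_max`, `smote`) are hypothetical, the scaler is assumed to be fitted on training rows only, and `smote` is a bare-bones version of the original SMOTE interpolation rather than the implementation in a library such as imbalanced-learn. The monthly aggregation and removal of adjacent months are not reproduced, because the article does not say exactly how they were done.

```python
import math
import random

PHQ9_CUTOFF = 5  # 0-4 -> non-depressive (0); 5 or higher -> depressive (1)


def phq9_label(score: int) -> int:
    """Binarise a PHQ-9 total score (valid range 0-27) at the cut-off above."""
    if not 0 <= score <= 27:
        raise ValueError(f"PHQ-9 total must be in 0..27, got {score}")
    return 1 if score >= PHQ9_CUTOFF else 0


def fit_min_max(rows):
    """Per-feature minimum and maximum (assumption: computed on training rows only)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lo, hi):
    """Scale every feature to [0, 1] using the fitted bounds; constant features map to 0.
    Values outside the fitted range (e.g. in test data) can fall outside [0, 1]."""
    return [
        [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
        for row in rows
    ]


def smote(minority, n_new, k=5, seed=0):
    """Bare-bones SMOTE: each synthetic sample lies on the line segment between a
    random minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (Euclidean distance)
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(others)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

To balance a training set with more controls than depressive samples, one would call `smote(minority_rows, n_majority - n_minority)` and append the result, labelled 1, to the training rows only; the test set stays untouched.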

 

Explainable AI Identifies the Strongest Signals

Three ensemble learning models, Random Forest, XGBoost and LightGBM, were used to identify the most important features associated with depressive symptoms. SHAP and LIME were applied to improve interpretability and quantify how individual variables influenced prediction. SHAP analysis showed that night sleep time consistently ranked among the most influential features across models. Day sleep time, religion activity, age, income responsibility and diabetes also appeared repeatedly among the strongest predictors. In the LightGBM model, night sleep time was the leading feature, while in the Random Forest model day sleep time ranked highest.
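The ranking step can be illustrated with exact Shapley values computed by brute force. The sketch makes simplifying assumptions: features that are "absent" from a coalition are replaced by a single baseline vector (sometimes called baseline Shapley), whereas the SHAP library's explainers average over background data or, for tree ensembles, use tree-specific algorithms; and enumerating every subset is only feasible for a handful of features, not the 61 used in the study. `shapley_values` and `top_k_features` are hypothetical names.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at instance x.
    Features outside a coalition take their value from `baseline` (an assumption;
    SHAP proper averages over background data). Cost grows as 2**len(x)."""
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def top_k_features(f, X, baseline, names, k):
    """Rank features by mean |Shapley value| over the rows of X; return the top k names."""
    mean_abs = [0.0] * len(names)
    for x in X:
        for j, v in enumerate(shapley_values(f, x, baseline)):
            mean_abs[j] += abs(v) / len(X)
    ranked = sorted(range(len(names)), key=lambda j: mean_abs[j], reverse=True)
    return [names[j] for j in ranked[:k]]
```

For a linear toy model such as `lambda z: 3.0 * z[0] - z[1]` with a zero baseline, each value reduces to weight times feature value, and the values always sum to `f(x) - f(baseline)`. Keeping the `k` highest-ranked names and retraining on only those columns mirrors, in spirit, the study's comparison of selected features against the full feature set.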

 

The SHAP beeswarm analysis indicated that shorter night sleep time generally corresponded to a higher likelihood of depressive outcomes. Participation in social behaviours such as social club activity was linked to lower predicted risk. Waterfall plots added an individual-level view and showed that night sleep time consistently lowered predicted depression probability. Religion activity, income responsibility and night deep time also reduced prediction scores in some cases. By contrast, step activity, diabetes and stroke were associated with higher prediction scores.
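The waterfall plots rest on the additivity property of SHAP explanations: for one instance x with M features, the model output decomposes into a base value plus one contribution per feature. In the standard SHAP notation below (not taken from the article), φ₀ is the expected model output over the background data and φᵢ(x) is the SHAP value of feature i; a waterfall plot starts at φ₀ and adds the φᵢ(x) one bar at a time until it reaches f(x).

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x), \qquad \phi_0 = \mathbb{E}\left[ f(X) \right]
```

A negative φᵢ(x), as reported for night sleep time in the individual examples, moves the output towards the non-depressive class; a positive one, as for diabetes or stroke, moves it towards the depressive class. Whether the units are probabilities or log-odds depends on which model output is being explained.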

 

LIME analysis reinforced those patterns at the local level. For individuals classified as controls, night sleep time, total sleep time and minimum heart rate were consistently associated with predictions towards the non-depressive class. For individuals with higher predicted probabilities of depression, welfare basic living, stroke, diabetes and barrier no time contributed positively to the model output. Support for adult children and welfare disability also appeared as influential factors in one example. A correlation matrix of the selected top features showed mostly low pairwise correlations, with the highest absolute correlation at around 0.77.
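The local explanations can be illustrated with a simplified LIME-style surrogate. The sketch perturbs one instance with Gaussian noise, weights each perturbed point by an exponential kernel on its distance to the original, and fits a weighted linear regression to the black-box predictions; the coefficients then serve as local feature effects. It is an illustration under assumptions: `lime_explain`, `solve` and all default parameters are hypothetical, and the lime package's `LimeTabularExplainer` adds further steps, such as discretising continuous features and fitting a regularised (ridge) surrogate.

```python
import math
import random


def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_explain(predict, x, n_samples=500, scale=0.1, kernel_width=0.75, seed=0):
    """Simplified LIME-style local surrogate around instance x.
    Returns (intercept, coefficients) of a weighted linear fit to predict();
    the intercept approximates predict(x), the coefficients are local effects."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]  # perturb around x
        rows.append([1.0] + [zi - xi for zi, xi in zip(z, x)])  # centred design row
        targets.append(predict(z))
        # exponential kernel: nearby perturbations get more weight
        weights.append(math.exp(-math.dist(z, x) ** 2 / kernel_width ** 2))
    p = len(x) + 1
    # weighted least squares via the normal equations (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets)) for i in range(p)]
    beta = solve(xtwx, xtwy)
    return beta[0], beta[1:]
```

In the study's setting, `predict` would return the trained model's probability for the depressive class; features with large positive coefficients push that local prediction up, and those with large negative coefficients push it down.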

 

Hybrid Models Deliver Higher Accuracy with Fewer Features

Two hybrid architectures were developed for classification: 1D-CNN plus LightGBM and MLP plus LightGBM. In both, the neural network acted as a feature extractor and LightGBM performed the final classification. The MLP branch used fully connected dense layers with dropout regularisation, while the 1D-CNN branch used four convolutional layers followed by pooling and a fully connected layer. Hyperparameters of the machine learning models were tuned by grid search, while the deep learning models were trained with the Adam optimiser, a learning rate of 0.001, a batch size of 64 and 40 epochs.
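The hybrid design, in which a neural network produces features and a boosted tree ensemble classifies them, can be sketched without any deep learning or LightGBM dependency. Everything here is a stand-in chosen for illustration: `dense_relu` is a single fully connected ReLU layer with fixed, hypothetical weights rather than a network trained with Adam, and `fit_boosted_stumps` is plain first-order gradient boosting on the logistic loss with depth-one trees, much simpler than LightGBM's histogram-based, leaf-wise trees. The point is only the data flow: raw features in, extracted features out, boosted classifier on top.

```python
import math


def dense_relu(x, weights, biases):
    """Feature extractor: one fully connected layer with ReLU activation.
    In the study the extractor was a trained MLP or 1D-CNN; these weights are hypothetical."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, r):
    """Depth-one regression tree minimising squared error on residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no usable split: every feature is constant
        mean = sum(r) / len(r)
        return 0, math.inf, mean, mean
    return best[1:]


def fit_boosted_stumps(X, y, rounds=50, lr=0.3):
    """Gradient boosting on the logistic loss (a minimal stand-in for LightGBM)."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # start from the log-odds of the positive rate
    F = [f0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]  # negative gradient
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        F = [fi + lr * (lv if row[j] <= t else rv) for fi, row in zip(F, X)]
    return f0, lr, stumps


def predict_proba(model, z):
    """Probability of the positive (depressive) class from the boosted stumps."""
    f0, lr, stumps = model
    return sigmoid(f0 + sum(lr * (lv if z[j] <= t else rv) for j, t, lv, rv in stumps))


def hybrid_fit(X, y, weights, biases, **boost_kwargs):
    """Extract features with the network layer, then boost on the extracted features."""
    Z = [dense_relu(x, weights, biases) for x in X]
    return fit_boosted_stumps(Z, y, **boost_kwargs)


def hybrid_predict(model, x, weights, biases):
    """Apply the same extractor to a new row, then score it with the boosted model."""
    return predict_proba(model, dense_relu(x, weights, biases))
```

In a real pipeline the extractor's weights would come from training the network (the study used Adam with a learning rate of 0.001, a batch size of 64 and 40 epochs), and `fit_boosted_stumps` would be replaced by a LightGBM classifier fitted on the extracted features.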

 

Performance was evaluated with accuracy, precision, recall, F1-score and AUC. Models trained on selected important features consistently outperformed those trained on all available features. The 1D-CNN plus LightGBM model achieved an accuracy of 0.9180 with all features, rising to 0.9243 when trained on LightGBM-selected features. The MLP plus LightGBM model improved from 0.8991 with all features to 0.9306 when trained on XGBoost-selected features. That configuration also delivered the highest precision (0.9205) and the highest F1-score (0.8804). The highest AUC, however, was 0.9594, achieved by MLP plus LightGBM trained on LightGBM-selected features. According to the authors, targeted feature selection improved predictive performance, learning efficiency and interpretability.
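The reported metrics can be computed from a model's hard predictions (accuracy, precision, recall, F1-score) and from its scores (AUC). The sketch assumes the depressive class is labelled 1 and treated as the positive class; the article does not say whether the published precision and F1 values refer to that class or are averaged across classes. The function names are hypothetical.

```python
def confusion(y_true, y_pred):
    """Counts of true/false positives and negatives, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 (harmonic mean of precision and recall)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative
    (ties count one half). Quadratic in sample size, fine for a sketch."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because AUC depends only on how scores rank positives above negatives, one configuration can lead on AUC while another leads on accuracy, which is the pattern reported for the two MLP plus LightGBM configurations.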

 

The combination of wearable-derived behavioural and physiological data with explainable and hybrid AI models improved depression risk prediction in this cohort. Sleep-related variables had a central role across analyses, especially night sleep time, while socioeconomic and clinical variables also shaped prediction outcomes. Models built on selected important features performed better than models using the full feature set, with MLP plus LightGBM trained on XGBoost-selected features reaching the highest classification accuracy. Several limitations remained: a single regional cohort, dependence on participant compliance with wearable use, reliance on structured tabular data, and the use of self-reported PHQ-9 scores rather than clinician-administered diagnostic interviews.

 

Source: Journal of Medical Systems



References:

Ko J, Oh S, Enkhbayar D et al. (2026) Interpretable Feature Selection and Hybrid Deep Learning Models for Depressive Symptoms Prediction from Wearable Device Data. J Med Syst; 50, 26. 



