Underdiagnosis of Alzheimer’s disease remains a persistent challenge, and the gaps are especially visible when racial and ethnic groups are compared. Timely identification matters for access to support, planning and treatment, yet routine clinical data often miss many cases. Using electronic health records from a large academic health system, researchers developed a semi-supervised positive-unlabeled learning approach designed to surface undiagnosed cases while addressing group-level disparities. The framework combines probabilistic selection of reliable negatives, race-specific pseudo-labelling and fairness-aware thresholding to improve detection across non-Hispanic white, non-Hispanic African American, Hispanic Latino and East Asian populations. Trained on data from more than 100,000 eligible patients, the model aimed to improve sensitivity and calibration while maintaining fair performance across groups, and its predictions were further checked against genetic markers associated with disease risk.
Semi-Supervised Design and Diverse Cohort
Electronic health records were filtered by age and data completeness to form a cohort of more than 115,000 eligible patients, with a non-overlapping subset that included genotype information reserved for validation. In every racial and ethnic group, the recorded prevalence of diagnosed Alzheimer’s was lower than estimates from longitudinal projections, indicating likely underdiagnosis in routine care. Among labelled cases, women outnumbered men by about two to one, and labelled positives had longer records, more encounters and more diagnoses than unlabeled patients, reflecting differences in healthcare utilisation.
The semi-supervised framework proceeded in stages. First, a generalised linear model was used to identify reliable negatives among unlabeled patients through a probabilistic gap criterion. Second, pre-processing bias mitigation assigned additional positive and negative pseudo-labels within each racial and ethnic group using race-specific thresholds aligned to expected prevalence signals. Third, a non-linear classifier was trained on the combination of labelled and pseudo-labelled data to leverage a broader pool of informative cases. Finally, post-processing bias mitigation set group-specific decision cut-offs by optimising group benefit equality, aligning predicted positive rates with validated indicators in each population. This sequence substantially expanded the effective training set compared with supervised baselines that rely only on confirmed diagnoses.
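The first two stages can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the probabilistic gap is taken to be P(s=1|x) - P(s=0|x), where s marks labelled-versus-unlabelled status and the probability comes from a logistic GLM; reliable negatives are unlabelled patients whose gap falls below a cut-off; and race-specific pseudo-labelling is reduced to taking per-group top and bottom shares of candidate scores. The function names, the gap_threshold default and the per-group share dictionaries are all hypothetical.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logit_glm(X, y, lr=0.1, n_iter=2000):
    """Logistic-link GLM fitted by plain batch gradient descent (no regularisation)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        w -= lr * Xb.T @ (_sigmoid(Xb @ w) - y) / len(y)
    return w


def reliable_negatives(X, labelled, gap_threshold=-0.5):
    """Stage 1 (assumed form): model labelled-vs-unlabelled status s, then keep
    unlabelled rows whose gap P(s=1|x) - P(s=0|x) is below gap_threshold."""
    w = fit_logit_glm(X, labelled.astype(float))
    p = _sigmoid(np.hstack([np.ones((X.shape[0], 1)), X]) @ w)
    gap = 2.0 * p - 1.0  # equals P(s=1|x) - P(s=0|x) for a binary s
    return ~labelled & (gap < gap_threshold), p


def group_pseudo_labels(scores, group, candidate, pos_share, neg_share):
    """Stage 2 (assumed form): within each group, pseudo-label the highest-scoring
    pos_share[g] of candidates as 1 and the lowest-scoring neg_share[g] as 0.
    Shares are hypothetical inputs and should sum to at most 1 per group."""
    pseudo = np.full(scores.shape, -1)  # -1 means "no pseudo-label"
    for g in pos_share:
        idx = np.flatnonzero(candidate & (group == g))
        order = idx[np.argsort(scores[idx])]  # candidate indices, ascending score
        n_pos = int(round(pos_share[g] * idx.size))
        n_neg = int(round(neg_share[g] * idx.size))
        if n_pos > 0:  # guard: order[-0:] would select every candidate
            pseudo[order[-n_pos:]] = 1
        pseudo[order[:n_neg]] = 0
    return pseudo
```

In a real pipeline the per-group shares would be tied to each group's expected prevalence and the scores could come from any interim model; here both are left as free parameters so the mechanics stay visible.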
Performance, Calibration and Robustness
Across racial and ethnic groups, the semi-supervised approach achieved stronger discrimination than supervised baselines that predicted far fewer positives. Gains were most evident in sensitivity and precision-recall performance, addressing a core limitation of supervised models that missed many likely cases. Area under the receiver operating characteristic curve exceeded 0.9, and balanced accuracy was higher than in comparison models. While some supervised settings reached higher precision by labelling very few patients as positive, this came with markedly lower sensitivity and weaker alignment with validated prevalence signals.
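The discrimination metrics mentioned above can be computed directly. The sketch below uses the Mann-Whitney formulation of the area under the ROC curve (AUROC) and defines balanced accuracy as the mean of sensitivity and specificity. The function names are hypothetical, and the toy labels are invented purely to show how a rule that flags very few patients can reach perfect precision while losing sensitivity and balanced accuracy.

```python
import numpy as np


def auroc(y_true, scores):
    """Rank-based AUROC: probability that a random positive scores above a random
    negative, counting ties as one half (Mann-Whitney U formulation)."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    # Pairwise comparison uses O(n_pos * n_neg) memory: fine for illustration only.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def sens_spec_prec(y_true, y_pred):
    """Sensitivity, specificity and precision for boolean labels and predictions."""
    y = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp, tn, fp = (p & y).sum(), (~p & ~y).sum(), (p & ~y).sum()
    return tp / y.sum(), tn / (~y).sum(), tp / (tp + fp)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    sens, spec, _ = sens_spec_prec(y_true, y_pred)
    return float(0.5 * (sens + spec))


# Invented toy labels: four true positives followed by four true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
conservative = [1, 0, 0, 0, 0, 0, 0, 0]  # flags a single patient
broader = [1, 1, 1, 0, 1, 0, 0, 0]       # flags four patients
for name, pred in [("conservative", conservative), ("broader", broader)]:
    sens, spec, prec = sens_spec_prec(y_true, pred)
    print(name, sens, prec, balanced_accuracy(y_true, pred))
```

On these toy labels the conservative rule has precision 1.0 but sensitivity 0.25 and balanced accuracy 0.625, while the broader rule gives up some precision (0.75) to reach sensitivity and balanced accuracy of 0.75, the same trade-off described for the supervised baselines.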
Calibration analyses showed that predicted probabilities from the semi-supervised model were more reliable than those from baselines. Lower Brier scores and concentration of proxy-validated positives at higher predicted probabilities indicated improved probability estimates that better reflect clinical likelihood. The approach remained stable in sensitivity analyses that varied the proxy definitions used for validation. Performance was largely unchanged when specific proxy subsets were excluded, and only removal of proxies that uniquely validated a large share of cases produced notable declines. This pattern suggests the improvements did not hinge on narrow proxy choices but reflected broader gains from leveraging unlabeled data and fairness-aware thresholding.
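The calibration checks can be illustrated with two standard tools: the Brier score, which is the mean squared difference between predicted probability and the 0/1 outcome (lower is better), and an equal-width reliability table comparing mean predicted probability with the observed positive rate in each bin. This is a generic sketch; the bin count and function names are assumptions, and in a setting like this one the outcome column would plausibly be proxy-validated status rather than a confirmed diagnosis.

```python
import numpy as np


def brier_score(y_true, prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob, dtype=float)
    return float(np.mean((p - y) ** 2))


def reliability_table(y_true, prob, n_bins=10):
    """For each non-empty equal-width bin on [0, 1], return
    (mean predicted probability, observed positive rate, patient count)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob, dtype=float)
    # Bin index 0..n_bins-1; a probability of exactly 1.0 is folded into the top bin.
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            rows.append((float(p[in_bin].mean()), float(y[in_bin].mean()), int(in_bin.sum())))
    return rows
```

For outcomes [1, 0, 1, 0], probabilities [0.9, 0.1, 0.8, 0.3] give a Brier score of 0.0375, whereas predicting 0.5 for everyone scores 0.25. A well-calibrated model also shows observed rates in the reliability table that track the mean predicted probability bin by bin.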
Fairness, Feature Signals and Genetic Support
Fairness was evaluated by comparing discrimination metrics between unprivileged groups and the privileged group, then aggregating the differences into parity losses. The semi-supervised approach achieved the lowest cumulative parity loss of all models considered. Using group benefit equality to set cut-offs aligned predicted positive rates with validated signals in each population and further reduced disparities relative to strategies based on a single overall threshold. Sensitivity analyses that recoded race and ethnicity while holding other features constant produced minimal shifts in sensitivity, indicating reduced dependence on race and ethnicity as direct predictors and supporting the contribution of the bias mitigation steps.
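The fairness steps can be sketched under assumptions that go beyond the article. Here group benefit is taken as the number of predicted positives divided by the number of validated positives in a group, a per-group cut-off is chosen so that this ratio is about 1, and the parity loss for one metric is the sum over non-reference groups of |log(metric_g / metric_ref)|, a log-ratio form similar to the one used in the fairmodels R package. Summing such losses over several metrics would give a cumulative figure. All names and the toy data are hypothetical.

```python
import numpy as np


def gbe_cutoffs(scores, validated, group):
    """Per group, choose the cut-off that flags as many patients as there are
    validated positives, so group benefit is about 1 (ties can push it higher)."""
    cutoffs = {}
    for g in np.unique(group):
        in_g = group == g
        k = int(validated[in_g].sum())
        ranked = np.sort(scores[in_g])[::-1]  # highest score first
        cutoffs[g] = ranked[k - 1] if k > 0 else np.inf
    return cutoffs


def group_benefit(pred, validated):
    """Predicted positives divided by validated positives within one group."""
    return pred.sum() / validated.sum()


def parity_loss(metric_by_group, reference):
    """Sum over non-reference groups of |log(metric_g / metric_reference)|."""
    ref = metric_by_group[reference]
    return float(sum(abs(np.log(v / ref))
                     for g, v in metric_by_group.items() if g != reference))


# Hypothetical toy data: two groups with different numbers of validated positives.
scores = np.array([0.9, 0.7, 0.2, 0.6, 0.4, 0.1])
group = np.array(["A", "A", "A", "B", "B", "B"])
validated = np.array([1, 0, 0, 1, 1, 0], dtype=bool)
cutoffs = gbe_cutoffs(scores, validated, group)
pred = scores >= np.array([cutoffs[g] for g in group])
```

Group A ends up with a cut-off of 0.9 and group B with 0.4, so each flags exactly as many patients as it has validated positives; a single shared threshold could not achieve that for both groups at once. With ties at a cut-off, more patients than validated positives can be flagged, so a production version would need an explicit tie-breaking rule.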
Model interpretability highlighted clinically coherent signals. Among the most influential features were neurological or mental health indicators such as memory loss and altered mental status, alongside healthcare utilisation measures including record density, record length, age at last visit, number of encounters and number of diagnoses. Non-neurological features also appeared among the top predictors, including decubitus ulcer and screening for malignant neoplasms. Factor analysis showed labelled positives and predicted positives clustering apart from predicted negatives on these features, with a subset of proxy-validated predicted positives showing more influence from non-neurological patterns such as palpitations and immunological findings. Importantly, feature direction and magnitude were consistent across racial and ethnic groups.
Independent genetic validation supported the predictions. Polygenic risk scores for Alzheimer’s were higher in labelled positives and predicted positives than in predicted negatives, both overall and within several racial and ethnic groups. Apolipoprotein E ε4 allele counts were also higher for labelled and predicted positives than for predicted negatives in certain populations. These findings support the biological plausibility of labels inferred from electronic health record data without imaging inputs.
A semi-supervised positive unlabeled learning framework with pre- and post-processing bias mitigation improved the detection of undiagnosed Alzheimer’s in routine electronic health records. Compared with supervised approaches, it delivered higher sensitivity, stronger precision-recall performance and better calibration while reducing disparities between racial and ethnic groups. Interpretability analyses identified familiar neurological and utilisation features along with non-neurological signals, and genetic validation provided independent support for predicted labels. For clinicians, health system leaders and researchers, this approach illustrates how leveraging unlabeled data and fairness-aware thresholding can enhance equitable case finding at scale using existing records.
Source: npj Digital Medicine