Physician fatigue poses a serious risk to decision-making quality in emergency departments, where high stakes and intense workloads are routine. Traditionally, fatigue has been measured indirectly through shift length, overnight duty or work frequency, which do not fully capture the nuanced, real-time condition of physicians during patient encounters. A new approach proposes using clinical notes as a source of insight into physician fatigue, capitalising on the linguistic patterns that emerge under cognitive stress. By applying machine learning techniques to these notes, researchers aim to create a more precise and actionable measure of fatigue, potentially improving both clinician well-being and patient outcomes. 

 

Quantifying Fatigue Using Textual Signals 

Using data from over 129,000 emergency department visits at a single academic medical centre, researchers developed a machine learning model to classify physician notes according to the authoring physician’s recent workload. Specifically, physicians who had worked at least four of the previous seven days were considered “high-workload” and presumably fatigued. The model was trained on a balanced dataset using features such as note length, readability, word predictability, and the frequency of specific linguistic elements, including cognitive and affective word categories. 
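
A minimal sketch of this kind of classifier is given below, assuming a handful of hand-built features and a standard scikit-learn setup; the word lists, feature definitions and data conventions are illustrative placeholders rather than the authors’ actual pipeline.

```python
# A minimal sketch of a workload-based note classifier; the lexicons and
# features below are illustrative placeholders, not the authors' pipeline.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tiny placeholder lexicons standing in for LIWC-style word categories.
COGNITIVE_WORDS = {"think", "know", "consider", "because", "therefore"}
AFFECTIVE_WORDS = {"worried", "concerned", "relieved", "frustrated"}

def note_features(text):
    """Crude per-note features: length, readability proxies, category rates."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    avg_sentence_len = n_words / max(len(sentences), 1)   # readability proxy
    avg_word_len = sum(len(w) for w in words) / n_words   # readability proxy
    cog_rate = sum(w in COGNITIVE_WORDS for w in words) / n_words
    aff_rate = sum(w in AFFECTIVE_WORDS for w in words) / n_words
    return [n_words, avg_sentence_len, avg_word_len, cog_rate, aff_rate]

def fit_fatigue_classifier(notes, high_workload):
    """Fit a classifier of high-workload (proxy-fatigue) notes."""
    X = np.array([note_features(n) for n in notes])
    y = np.array(high_workload)  # 1 if the physician worked >= 4 of the last 7 days
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_tr, y_tr)
    print("held-out AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
    return clf
```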

 


Interestingly, one of the strongest indicators of fatigue was the predictability of language, measured through the perplexity score of a fine-tuned language model. Lower perplexity—more predictable text—correlated with greater fatigue, suggesting that fatigued physicians rely on formulaic expressions, potentially reflecting decreased cognitive engagement. Other markers included reduced use of first-person pronouns and insight words, and an increased use of certainty terms and anger-related words. These patterns point to a measurable shift in linguistic behaviour under fatigue, forming the basis for an interpretable model. 
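
The sketch below shows one common way to compute note-level perplexity with a causal language model from the Hugging Face transformers library; the study fine-tuned its own model on clinical text, so the off-the-shelf gpt2 checkpoint here is only a stand-in.

```python
# A hedged sketch of note-level perplexity scoring; "gpt2" is a stand-in for
# the study's fine-tuned clinical language model.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def note_perplexity(text):
    """Perplexity of a note; lower values indicate more predictable language."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())  # exp of mean token cross-entropy

# Under the reported pattern, notes written by presumably fatigued physicians
# would tend toward lower perplexity (more formulaic wording).
```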

 

Validation Across Contexts and Correlation with Decision Quality 

To validate the model’s capacity to detect fatigue, researchers examined whether its predictions correlated with settings known to be fatiguing. Notes written during overnight shifts and those associated with high patient volumes on a single shift had significantly higher predicted fatigue scores, despite these variables not being part of the model’s training data. Additionally, greater variability in a physician’s recent shift start times—a measure of circadian disruption—was associated with increased predicted fatigue. 
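
As an illustration of this kind of check, the sketch below computes the variability of recent shift start times per physician and correlates it with mean predicted fatigue; the records, column names and scores are hypothetical, not the study’s data.

```python
# Illustrative circadian-disruption check on hypothetical per-shift records.
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical data: physician id, shift start hour (0-23), and the mean
# predicted fatigue score of that shift's notes.
shifts = pd.DataFrame({
    "physician_id":      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "start_hour":        [7, 8, 7, 7, 15, 23, 8, 8, 9],
    "predicted_fatigue": [0.31, 0.28, 0.33, 0.52, 0.61, 0.66, 0.35, 0.30, 0.37],
})

# Standard deviation of recent start hours as a rough circadian-disruption
# proxy (ignoring wrap-around at midnight for simplicity).
per_physician = shifts.groupby("physician_id").agg(
    start_time_sd=("start_hour", "std"),
    mean_fatigue=("predicted_fatigue", "mean"),
)

rho, p = spearmanr(per_physician["start_time_sd"], per_physician["mean_fatigue"])
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```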

 

Crucially, predicted fatigue was not only correlated with conditions of fatigue but also linked to decision-making outcomes. In cases where physicians opted to test for acute coronary syndrome, higher predicted fatigue was associated with a significant decrease in test yield. While coarse measures like total days worked showed no meaningful relationship with test outcomes, the fine-grained, note-based fatigue measure revealed a 19% decrease in diagnostic yield per standard deviation increase in predicted fatigue. This link between linguistic cues and clinical decision quality provides compelling support for the model’s utility. 
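
One way to probe such a relationship, sketched below on simulated data, is a logistic regression of test positivity on the standardised predicted-fatigue score; the simulated effect size and the resulting odds ratio are illustrative, not the study’s estimates.

```python
# Illustrative yield-versus-fatigue regression on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
fatigue_z = rng.standard_normal(n)  # standardised predicted fatigue

# Simulated test results whose positivity rate declines as fatigue rises
# (the coefficient -0.2 is an arbitrary illustrative choice).
p_positive = 1 / (1 + np.exp(-(-1.8 - 0.2 * fatigue_z)))
positive = rng.binomial(1, p_positive)

X = sm.add_constant(fatigue_z)
fit = sm.Logit(positive, X).fit(disp=False)
odds_ratio = np.exp(fit.params[1])
print(f"odds ratio per SD of predicted fatigue: {odds_ratio:.2f}")
# An odds ratio below 1 corresponds to lower diagnostic yield at higher
# predicted fatigue, the direction of the reported ~19% decrease.
```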

 

Implications for Large Language Models in Clinical Practice 

The study further examined the characteristics of clinical notes generated by large language models (LLMs). Since LLMs rely on next-word prediction, their generated notes tend to be highly predictable, mimicking the language patterns found in fatigued human-authored notes. When real physician notes and LLM-generated continuations were compared, the latter exhibited 74% higher predicted fatigue scores. This raises critical concerns about the potential quality and clinical value of AI-generated documentation, particularly if such notes mirror those written under fatigue. 
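
A simple sketch of this comparison appears below, using hypothetical predicted-fatigue scores for matched pairs of physician-written notes and LLM-generated continuations; the numbers are placeholders, not the study’s results.

```python
# Comparing mean predicted fatigue of matched note pairs (hypothetical scores).
import numpy as np

human_scores = np.array([0.30, 0.42, 0.25, 0.38, 0.33])  # physician-written notes
llm_scores   = np.array([0.55, 0.70, 0.48, 0.66, 0.58])  # LLM continuations

relative_increase = (llm_scores.mean() - human_scores.mean()) / human_scores.mean()
print(f"LLM continuations score {relative_increase:.0%} higher predicted fatigue")
# The study reports roughly 74% higher predicted fatigue for LLM-generated
# continuations, consistent with their more predictable, formulaic wording.
```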

 

These findings highlight a dual challenge: ensuring that AI tools used to support documentation do not inadvertently introduce fatigue-like linguistic patterns, and recognising that automation may reduce the reflective thinking embedded in the act of note writing. As such, there is a need to consider more nuanced roles for LLMs in healthcare, such as assisting in information solicitation rather than full automation of documentation. Used appropriately, LLMs could enhance the note-writing process without undermining physician agency or the richness of clinical documentation. 

 

By analysing the text of clinical notes, a machine learning model can infer physician fatigue with greater precision than traditional metrics based on work schedules. This innovative method captures subtle but significant linguistic shifts associated with cognitive strain, offering a practical and interpretable way to monitor clinician well-being. The association between predicted fatigue and clinical decision outcomes underscores the model’s potential impact on both patient care and physician performance. At the same time, the study raises important questions about the role of LLMs in clinical documentation, particularly regarding their unintended resemblance to fatigued writing. This approach offers a promising direction for enhancing quality and safety through language-informed insights. 

 

Source: Nature Communications

Image Credit: iStock


References:

Hsu CC, Obermeyer Z & Tan C (2025) A machine learning model using clinical notes to identify physician fatigue. Nat Commun, 16:5791.


