Clinical decision-making often relies on round-number thresholds that simplify complex, continuous risks into discrete treatment triggers. Evidence shows these cut-offs can distort risk estimates and create unintended patterns in outcomes, including sudden jumps or paradoxical dips around memorable values. An interpretable machine learning approach examines population risk shapes to reveal where threshold-led practice may misalign with underlying biological risk. Analyses across simulated settings, a historical pneumonia dataset and three decades of intensive care data indicate both improvements and persistent artefacts, highlighting implications for bedside care, risk modelling and protocol design.  

 

Thresholds Shape Observable Risk 

Round-number thresholds are embedded in widely used tools such as APACHE II and SOFA, for example the serum creatinine cut-off at 3.5 mg/dL. While convenient, such points can shift treatment away from statistically optimal cut-offs and confer excess risk on patients just below or above the chosen boundary. Rather than attempting to fully deconfound observational data, the analysis treats threshold-induced behaviour as a signal: abrupt changes or counterintuitive curves in observed risk can indicate where treatment decisions are tied to specific numbers rather than to continuous need. In practice, this yields two recurring artefacts in population risk: discontinuities at thresholds, and counter-causal non-monotonicities where risk unexpectedly falls in high-biomarker ranges.  
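The mechanism behind both artefacts can be illustrated with a toy simulation. In this sketch (all numbers and shapes are assumptions for illustration, not values from the study), underlying biological risk rises smoothly with a biomarker, but treatment is triggered only at a round-number cut-off, so the observed risk curve jumps downward at the threshold:

```python
# Hypothetical illustration: biological mortality risk rises smoothly with a
# biomarker, but treatment is only given at a round-number threshold of 40.
# All values here are assumed for the sketch, not taken from the study.
THRESHOLD = 40.0
TREATMENT_BENEFIT = 0.15  # assumed absolute risk reduction from treatment


def biological_risk(x):
    # Smooth, monotone underlying risk (assumed shape).
    return min(0.05 + 0.006 * x, 0.9)


def observed_risk(x):
    # Patients at or above the cut-off receive treatment, lowering their
    # observed risk below that of untreated patients just under the cut-off.
    r = biological_risk(x)
    if x >= THRESHOLD:
        r = max(r - TREATMENT_BENEFIT, 0.0)
    return r


# Observed risk drops at the threshold even though biological risk is smooth,
# producing both a discontinuity and a counter-causal dip in population data.
```

Under these assumptions, a patient at 40.1 shows lower observed risk than one at 39.9 despite higher biological risk, which is exactly the paradoxical dip the analysis looks for.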

 


 

In a historical pneumonia cohort, generalised additive models (GAMs) estimated mortality risk as additive, high-resolution component functions. Discontinuities appeared for blood urea nitrogen (BUN), with a rapid risk rise from 30 to 40 mg/dL followed by a plateau up to 80 mg/dL, and for systolic blood pressure, with a sharp risk increase below 80 mmHg. Sex differences suggested stronger threshold adherence among male patients around BUN 40 mg/dL. Counter-causal patterns were also prominent: mortality risk was lower for patients with serum creatinine above 5 mg/dL, and for those with chronic comorbidities such as a history of chest pain, asthma or chronic lung disease, despite their higher intrinsic risk. The analysis reported mortality odds reductions of more than 30% for prior chest pain, 20% for asthma and 18% for chronic lung disease, underscoring how effective, routine treatments can depress observed risk in groups that would otherwise be high risk.  
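The additive structure is what makes such shapes readable. In a GAM, the log-odds of mortality is a sum of per-feature component functions, so each feature's contribution can be inspected on its own. The sketch below is schematic: the component shapes merely echo the reported patterns (BUN rise then plateau, sharp systolic pressure increase below 80 mmHg) and are not the study's fitted functions:

```python
import math

# Schematic of a GAM's additive structure. Component shapes are illustrative
# stand-ins echoing the reported patterns, not the study's fitted functions.


def f_bun(bun):
    # Assumed shape: rapid rise from 30-40 mg/dL, then a plateau.
    if bun < 30:
        return 0.0
    if bun < 40:
        return 0.15 * (bun - 30)  # steep rise
    return 1.5                    # plateau


def f_sbp(sbp):
    # Assumed shape: sharp risk increase below 80 mmHg.
    return 1.0 if sbp < 80 else 0.0


def mortality_risk(bun, sbp, intercept=-3.0):
    # Additive log-odds passed through the logistic link.
    log_odds = intercept + f_bun(bun) + f_sbp(sbp)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Because each component is a plain one-dimensional function, artefacts such as the BUN plateau or a blood-pressure step are directly visible in the fitted shapes rather than buried inside an opaque model.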

 

Glass-Box Modelling and Automated Tests 

To detect these shapes systematically, the approach combines tree-based GAMs with two statistical tests. First, a discontinuity test compares the likelihood of the observed component function against a locally linearised version around each threshold, flagging jump-like behaviour consistent with protocol cut-offs. Second, a changepoint test on the signs of non-zero slopes identifies non-monotonic regions, with emphasis on concave segments more likely to reflect treatment effects than healthy plateaus. Boosted trees provide accurate, high-resolution feature functions, natively capture step changes and support bootstrap intervals for uncertainty. This glass-box orientation favours transparency and enables clinicians and guideline developers to reason about where discrete rules may be misaligned with continuous risk.  
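The two tests can be sketched in simplified form. The paper's discontinuity test compares likelihoods against a locally linearised fit; the toy `jump_at` below merely contrasts local means around a threshold as a crude proxy for jump-like behaviour. Likewise, `nonmonotonic_regions` is a bare-bones version of the sign-changepoint idea, applied to a component function sampled as `(x, f(x))` pairs. Both functions and their parameters are illustrative assumptions, not the paper's statistics:

```python
# Toy versions of the two shape tests, applied to a component function
# sampled on a grid of (x, f(x)) points. Not the paper's exact statistics.


def jump_at(xs, fs, threshold, window=5.0):
    """Estimated jump: mean of f just above the threshold minus mean just below."""
    below = [f for x, f in zip(xs, fs) if threshold - window <= x < threshold]
    above = [f for x, f in zip(xs, fs) if threshold <= x < threshold + window]
    if not below or not above:
        return 0.0
    return sum(above) / len(above) - sum(below) / len(below)


def nonmonotonic_regions(xs, fs, tol=1e-9):
    """Indices where the sign of the (non-zero) slope changes.

    A positive-to-negative changepoint marks a segment where risk falls as
    the biomarker rises -- the counter-causal pattern of interest.
    """
    changepoints = []
    prev_sign = 0
    for i in range(1, len(xs)):
        slope = (fs[i] - fs[i - 1]) / (xs[i] - xs[i - 1])
        if abs(slope) <= tol:
            continue  # ignore flat (zero-slope) segments
        sign = 1 if slope > 0 else -1
        if prev_sign and sign != prev_sign:
            changepoints.append(i)
        prev_sign = sign
    return changepoints
```

Run over a fitted component function, a large `jump_at` value at a round number flags a candidate protocol cut-off, while a positive-to-negative changepoint flags a candidate treatment-induced dip.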

 

Applying the same lens to longitudinal intensive care datasets shows how practice has evolved. In earlier data, BUN was associated with a steep risk rise between 20 and 35 mg/dL followed by a long plateau; over time this effect diminished, consistent with more refined, multi-factor decision-making. Age-related jumps at round numbers such as 50, 55 and 60 years smoothed in later cohorts, although increased risk persisted at 80 years. These patterns suggest that moving from single-number triggers toward multifactor evaluation can flatten risk discontinuities without abandoning practical scoring systems.  

 

Persistent Paradoxes Around Treatment Thresholds 

Not all artefacts receded. Across three intensive care datasets, mortality risk rose outside the healthy sodium range of 135–145 mEq/L but then unexpectedly declined beyond 150 mEq/L, a level indicating moderate hypernatraemia. Recorded care patterns showed increased water and electrolyte replacement at very high sodium, which likely reduced risk for those patients relative to those with moderately elevated values. A similar paradox strengthened for creatinine: in recent data, mortality risk peaked at moderate levels around 3–5 mg/dL and decreased at very high levels. Treatment records indicated sharp increases in continuous renal replacement therapy (CRRT) at both 3 mg/dL and 6 mg/dL, aligning with the observed risk reduction among patients with severe elevation. Together, these findings illustrate how aggressive interventions for the sickest patients can invert risk gradients, inadvertently deprioritising patients with moderate abnormalities who may benefit from earlier intervention.  

 

These insights carry implications for AI. Models trained on observational data can conflate lower observed risk, achieved through routine effective treatment, with intrinsically low risk. If uncorrected, they may underestimate risk in comorbid groups and divert care away from those who need it most. Interpretable, component-wise models with explicit tests for discontinuities and non-monotonicities help reveal where thresholds and treatment patterns shape the data, informing safer model design and more nuanced protocol refinement. Limitations remain, including reliance on intensive care cohorts, simplified depictions of treatment decisions and sparse data in some predictor regions; uncertainty estimates help guard against over-interpretation of noisy segments.  

 

Round-number thresholds aid consistency and speed but can introduce hidden risk by breaking the continuity of clinical decision-making. Interpretable modelling that exposes discontinuities and counter-causal curves offers a pragmatic way to locate these pressure points in real-world data, relate them to treatment behaviour and highlight opportunities to re-tune protocols. For clinicians, this supports earlier or more proportionate intervention around moderate abnormalities. For those deploying AI, it argues for models that surface and adjust for threshold-driven artefacts rather than silently encoding them. The direction of travel is toward dynamic reassessment of cut-offs while retaining practicality, aligning observable risk more closely with biological risk to improve outcomes.  

 

Source: npj Digital Medicine 

Image Credit: iStock


References:

Lengerich BJ, Caruana R, Nunnally ME et al. (2025) The hidden risk of round numbers and sharp thresholds in clinical practice. npj Digit Med; 8, 711. 


