Patients with acute respiratory failure (ARF) who fail non-invasive ventilation (NIV) and subsequently require intubation face a significantly elevated risk of death. No formal clinical guidelines currently exist to help clinicians identify, in a timely manner, which patients are most likely to fail NIV. Existing clinical scoring tools, including the HACOR score, the Updated HACOR score, and the ROX index, have limitations: their optimal cut-off values are uncertain, their discriminative power varies across patient populations and disease types, and they have commonly excluded important patient subgroups, such as those with hypercapnic respiratory failure due to a chronic obstructive pulmonary disease (COPD) exacerbation or obesity hypoventilation syndrome (OHS). Previous machine learning models have also been restricted to patients with de novo acute hypoxaemic respiratory failure, and none has been evaluated for practical usability in a real-world hospital environment.
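To illustrate what a threshold-based index looks like, the sketch below computes the ROX index using its standard published definition, (SpO2 / FiO2) / respiratory rate. This definition comes from the general literature rather than from the article, and the cut-off passed to `flags_failure` is a caller-supplied, hypothetical value; the article's point is precisely that such cut-offs are uncertain.

```python
def rox_index(spo2_percent: float, fio2_fraction: float, resp_rate: float) -> float:
    """ROX index = (SpO2 / FiO2) / respiratory rate.

    SpO2 in percent (e.g. 92), FiO2 as a fraction (e.g. 0.5),
    respiratory rate in breaths per minute.
    """
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    if resp_rate <= 0:
        raise ValueError("respiratory rate must be positive")
    return (spo2_percent / fio2_fraction) / resp_rate


def flags_failure(rox: float, cutoff: float) -> bool:
    # A single fixed cut-off: values below it are read as "likely to fail".
    # The cut-off is a hypothetical input, not a validated threshold.
    return rox < cutoff


print(round(rox_index(92, 0.5, 25), 2))  # (92 / 0.5) / 25 = 7.36
```

Because the whole patient state is collapsed into one number compared against one cut-off, the result depends heavily on where that cut-off is placed.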

 

To address these gaps, researchers have developed NIVPredict, a web-based artificial intelligence tool designed to predict NIV outcome within two hours of treatment initiation across a broad and clinically diverse patient cohort. The tool is built on the Tabular Prior-Data Fitted Network (TabPFN) machine learning model. Unlike many machine learning algorithms, which require large datasets and repeated retraining, TabPFN generates predictions by drawing on knowledge acquired from thousands of synthetic tasks during pretraining, which reduces computational burden and the risk of overfitting. The tool was implemented as a browser-accessible application that can be used on smartphones, tablets, or laptops, and it produces an immediate prediction of NIV success or failure accompanied by a calibrated confidence score.
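The article does not say how NIVPredict derives its confidence score, so the following is only a minimal sketch under a stated assumption: the model outputs a probability of NIV failure, the predicted class is chosen with a 0.5 threshold, and "confidence" is the probability assigned to that predicted class. The function names are hypothetical. The `confident_cases` helper mirrors the idea of restricting predictions to cases above a confidence level such as 60%.

```python
from typing import List, Sequence, Tuple


def predict_with_confidence(p_failure: float) -> Tuple[str, float]:
    """Turn a predicted probability of NIV failure into a label plus confidence.

    Assumption: confidence is the probability of the predicted class, with a
    0.5 decision threshold. This is not NIVPredict's documented definition.
    """
    if not 0.0 <= p_failure <= 1.0:
        raise ValueError("p_failure must be a probability in [0, 1]")
    if p_failure >= 0.5:
        return "NIV failure", p_failure
    return "NIV success", 1.0 - p_failure


def confident_cases(probs: Sequence[float], min_confidence: float = 0.6) -> List[int]:
    """Indices of cases whose confidence exceeds min_confidence (e.g. 60%)."""
    return [i for i, p in enumerate(probs)
            if predict_with_confidence(p)[1] > min_confidence]
```

Under this definition confidence can never fall below 50%, so a 60% cut-off only sets aside predictions whose probability lies close to the decision boundary.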

 

The study collected data spanning 38 hospitals across four countries (United Kingdom, Italy, Spain, and Brazil), supplemented by data from the publicly available MIMIC-IV database from the United States. NIV failure was defined as the need for endotracheal intubation or death within seven days of NIV initiation. Physiological measurements were collected at two time points: T0 (baseline, within six hours prior to NIV initiation) and T1 (one to two hours after NIV initiation). The model was trained using data from 665 ARF patients drawn from the RENOVATE randomised clinical trial in Brazil, covering hypoxaemic ARF (including COVID-19 and immunocompromised patients) and hypercapnic ARF due to COPD exacerbation. External validation was conducted on a separate dataset of 422 patients from Italy, Spain, and the United States.
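The outcome definition and the two measurement windows described above can be written as simple rules. The sketch below encodes them directly. Whether each window boundary is inclusive or exclusive is an assumption made for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

FAILURE_WINDOW = timedelta(days=7)


def niv_failed(niv_start: datetime,
               intubation_time: Optional[datetime],
               death_time: Optional[datetime]) -> bool:
    """NIV failure: endotracheal intubation or death within 7 days of NIV start."""
    for event in (intubation_time, death_time):
        if event is not None and niv_start <= event <= niv_start + FAILURE_WINDOW:
            return True
    return False


def timepoint(niv_start: datetime, measured_at: datetime) -> Optional[str]:
    """Assign a measurement to T0 (within 6 h before NIV) or T1 (1-2 h after).

    Returns None for measurements outside both windows.
    Boundary inclusivity is an illustrative assumption.
    """
    delta = measured_at - niv_start
    if timedelta(hours=-6) <= delta < timedelta(0):
        return "T0"
    if timedelta(hours=1) <= delta <= timedelta(hours=2):
        return "T1"
    return None
```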

 

The practical usability of NIVPredict was evaluated by clinicians at the University Hospitals of North Midlands NHS Trust (UHNM) in the U.K. During this evaluation, the tool was applied to data from 57 eligible ARF patients receiving NIV in both ward and ICU settings (42 NIV successes versus 15 NIV failures), whose aetiologies were primarily COPD, community-acquired pneumonia, sepsis, and OHS.

 

NIVPredict consistently outperformed all conventional clinical indices across every validation setting. In internal repeated 5-fold cross-validation, the tool achieved a balanced accuracy of 78.9%, compared with an AUC of 0.717 and a balanced accuracy of 68.7% for the best-performing clinical index (the Updated HACOR score). In external multicentre validation, NIVPredict attained a balanced accuracy of 74.5%, versus an AUC of 0.709 and a balanced accuracy of 63.7% for the Updated HACOR score. Calibration was strong, and decision curve analysis confirmed a greater net clinical benefit than conventional strategies across a wide range of decision thresholds.
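Two of the metrics above have compact standard definitions. Balanced accuracy is the mean of sensitivity and specificity, and the net benefit used in decision curve analysis at a threshold probability pt is TP/n − FP/n × pt / (1 − pt), compared against "treat all" and "treat none" (net benefit 0) strategies. The sketch below implements these textbook formulas in plain Python on toy data. It is not the authors' evaluation code, and treating NIV failure as the positive class is an assumption.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity (1 = NIV failure, 0 = success)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case of each class")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit if every patient is treated as a predicted failure."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A model offers net clinical benefit at a given threshold when its curve lies above both the "treat all" curve and zero.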

 

Performance was highest during in-hospital testing at UHNM, where NIVPredict achieved an accuracy of 84.2%, sensitivity of 86.7%, specificity of 83.3%, and an AUC of 0.858, with an excellent Brier score of 0.093. When predictions were restricted to cases where the tool's confidence score exceeded 60% (51 of 57 patients), accuracy increased further to 90.2%. In marked contrast, the HACOR and Updated HACOR scores both performed poorly at UHNM, where patients predominantly had COPD or OHS, with balanced accuracies of only 67.6% and 65.0%, respectively.
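The UHNM figures are internally consistent, and the confusion-matrix counts can be recovered from them. With 15 failures and 42 successes, a sensitivity of 86.7% implies 13 correctly flagged failures, and a specificity of 83.3% implies 35 correctly cleared successes. The 90.2% accuracy on 51 patients implies 46 correct predictions. These integer counts are back-calculated here, not reported in the article. The Brier score helper uses the standard definition (mean squared difference between predicted probability and the 0/1 outcome; lower is better) and is illustrative only.

```python
from fractions import Fraction

# UHNM cohort reported in the article: 15 NIV failures, 42 successes.
failures, successes = 15, 42

# Back-calculated counts (not stated in the article).
tp = 13                      # 13/15 = 86.7% of failures correctly flagged
tn = 35                      # 35/42 = 83.3% of successes correctly cleared
fn, fp = failures - tp, successes - tn

sensitivity = Fraction(tp, failures)
specificity = Fraction(tn, successes)
accuracy = Fraction(tp + tn, failures + successes)
balanced = (sensitivity + specificity) / 2

print(f"sensitivity {float(sensitivity):.1%}")  # 86.7%
print(f"specificity {float(specificity):.1%}")  # 83.3%
print(f"accuracy    {float(accuracy):.1%}")     # 84.2%
print(f"balanced    {float(balanced):.1%}")     # 85.0%

# Confidence-filtered subset: 51 of 57 patients; 90.2% implies 46 correct.
print(f"filtered    {46 / 51:.1%}")             # 90.2%


def brier_score(y_true, probs):
    # Mean squared difference between predicted probability and outcome (0/1).
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)
```

The implied balanced accuracy of 85.0% at UHNM can be compared directly with the 67.6% and 65.0% reported for the HACOR and Updated HACOR scores.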

 

The authors attribute the superiority of NIVPredict over existing indices to its use of temporal physiological trajectories rather than static single time-point measurements. The tool is framed as a decision-support aid rather than a prescriptive instrument; clinicians retain full responsibility for treatment decisions, and no specific risk thresholds are proposed by the authors.
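As a minimal sketch of what "temporal trajectories" can mean in practice, the code below combines each variable's baseline (T0) value, its value after one to two hours of NIV (T1), and the change between them into one feature set. The article does not list NIVPredict's input variables, so the four vital signs here are hypothetical placeholders, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical inputs; NIVPredict's actual variable list is not given here.
VARIABLES = ("heart_rate", "resp_rate", "spo2", "fio2")


@dataclass
class Vitals:
    heart_rate: float   # beats per minute
    resp_rate: float    # breaths per minute
    spo2: float         # percent
    fio2: float         # fraction of inspired oxygen


def trajectory_features(t0: Vitals, t1: Vitals) -> Dict[str, float]:
    """Combine baseline (T0) and on-NIV (T1) values with their change (T1 - T0)."""
    features: Dict[str, float] = {}
    for name in VARIABLES:
        before, after = getattr(t0, name), getattr(t1, name)
        features[f"{name}_t0"] = before
        features[f"{name}_t1"] = after
        features[f"{name}_delta"] = after - before
    return features
```

A static score sees only one of these snapshots, whereas the delta features encode whether the patient is improving or deteriorating on NIV.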

 

Overall, NIVPredict represents a significant advance in the clinical prediction of NIV outcomes across a broad range of ARF aetiologies, including both hypoxaemic and hypercapnic respiratory failure. Using only routinely collected measurements taken before and within two hours of NIV initiation, the tool demonstrated robust and accurate predictions that substantially outperformed all currently available threshold-based clinical scores and indices in all settings tested. Its practical usability was confirmed through direct in-hospital testing by clinical staff. The authors conclude that these results provide a strong rationale for future prospective, multicentre studies to assess the capacity of NIVPredict to enhance clinical decision-making and ultimately improve patient outcomes.

 

Source: Critical Care

Image Credit: iStock

 



