An artificial intelligence system for predicting the deterioration of COVID-19 patients in the emergency department

A multi-modal AI system that predicts the risk of mortality, intubation, or admission to the intensive care unit among COVID-19 patients using chest X-ray images and clinical variables


Motivation

The spread of the coronavirus disease 2019 (COVID-19) has led to a surge in patients presenting to the emergency department with respiratory illness. This overload, on an already strained healthcare system, emphasizes the need for automated triage systems that can support decision-making by predicting the risk of patient deterioration.

Given the promise of digital health and the motivation to contribute to combating the global pandemic, we developed a prognostic system using artificial intelligence (AI), as presented in our recent paper, “An artificial intelligence system for predicting the deterioration of COVID-19 patients in the emergency department”. An overview of the system is shown in Figure 1.

Figure 1. Overview of the AI system that assesses the patient’s risk of deterioration every time a chest X-ray image is collected in the ED. The system produces three types of outputs: (i) overall risk of deterioration within 24, 48, 72, and 96 hours using the multimodal average prediction of a deep neural network (COVID-GMIC) and a gradient boosting model (COVID-GBM), (ii) saliency maps for interpretability using COVID-GMIC, and (iii) deterioration risk curves (DRC) using a modified version of the deep learning network (COVID-GMIC-DRC).

Multidisciplinary collaboration with clinical experts

To ensure that our proposed work is clinically meaningful, we collaborated closely with radiologists and front-line physicians at NYU Langone Health to define realistic tasks for our prognostic system, extract and curate the data from the hospital’s complex medical records, and specify meaningful inclusion and exclusion criteria for preprocessing the dataset. Based on the constraints of the available data and the day-to-day experience of the clinicians, we defined the system’s predicted outcome as the risk of intubation, admission to the intensive care unit, or mortality, assessed at the time the patient is evaluated.

In the emergency department, chest X-ray imaging is used as a first-line triage tool for patients who test positive for COVID-19. Compared to other imaging modalities, it is cheap and easy to obtain without incurring the risk of contaminating imaging suites. Other clinical variables, such as vital signs, laboratory test results, and patient demographics, are also recorded. To learn from these diverse types of data in a multimodal manner, we used chest X-ray images along with the clinical variables recorded closest to the time of image acquisition as input to our prognostic system. We developed the system rapidly as the data was being collected at NYU Langone Health between March 3, 2020 and May 13, 2020. To develop the system and perform hyperparameter tuning, we used a training set consisting of 5,617 chest X-ray images collected from 2,943 patients. To evaluate the performance of the system retrospectively, we used a test set consisting of 832 images collected from 718 patients.
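As a hypothetical sketch of this pairing step (the column names and toy records below are invented for illustration; the paper does not describe its extraction pipeline in code), a nearest-timestamp join can associate each X-ray with the clinical observation recorded closest to image acquisition:

```python
import pandas as pd

# Toy X-ray acquisition log and vital-sign observations (invented data).
xrays = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "image_time": pd.to_datetime(
        ["2020-03-10 08:00", "2020-03-11 09:30", "2020-03-12 14:00"]),
})
vitals = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "obs_time": pd.to_datetime(
        ["2020-03-10 07:45", "2020-03-11 10:00", "2020-03-12 13:00"]),
    "heart_rate": [88, 95, 110],
})

# merge_asof requires both frames sorted on the time key;
# direction="nearest" selects the observation closest in time,
# and by="patient_id" restricts matches to the same patient.
paired = pd.merge_asof(
    xrays.sort_values("image_time"),
    vitals.sort_values("obs_time"),
    left_on="image_time", right_on="obs_time",
    by="patient_id", direction="nearest",
)
print(paired[["patient_id", "heart_rate"]])
```

The same pattern extends to laboratory results and demographics; each modality is joined to the image timestamp before being fed to the clinical-variable model.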

Multi-modal AI system using chest X-ray images and clinical variables

We processed the chest X-ray images using the Globally Aware Multiple Instance Classifier (GMIC) neural network architecture [1]. COVID-GMIC predicts the overall risk of deterioration within 24, 48, 72, and 96 hours, and computes saliency maps that highlight the regions of the image that most informed its predictions. As shown in Figure 2, COVID-GMIC utilizes a global network to generate four saliency maps that highlight the regions of the X-ray image predictive of the onset of adverse events within the four time windows. COVID-GMIC then applies a local network to extract fine-grained visual details from these regions. Finally, it employs a fusion module that aggregates information from both the global context and the local details to make a holistic prediction. The predictions of COVID-GMIC are combined with those of a gradient boosting model [2], referred to as COVID-GBM, which learns from routinely collected clinical variables. The weight assigned to each model’s prediction in the ensemble was tuned by optimizing performance on the validation sets obtained from the Monte Carlo cross-validation folds.
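The multimodal averaging step can be sketched as a simple weighted combination of the two models’ predicted probabilities. The weight below is an illustrative placeholder, not the tuned value from the paper:

```python
import numpy as np

def ensemble_risk(p_gmic, p_gbm, w=0.5):
    """Weighted average of the image model (COVID-GMIC) and the
    clinical-variable model (COVID-GBM) predictions for one time window.

    w = 0.5 is a placeholder; in the paper the ensemble weight is tuned
    on validation folds from Monte Carlo cross-validation."""
    p_gmic = np.asarray(p_gmic, dtype=float)
    p_gbm = np.asarray(p_gbm, dtype=float)
    return w * p_gmic + (1.0 - w) * p_gbm

# Example: per-patient risks from each model for one time window.
print(ensemble_risk([0.8, 0.2], [0.6, 0.4], w=0.5))  # [0.7 0.3]
```

In practice one such combination is computed per time window (24, 48, 72, and 96 hours), each with its own tuned weight.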

Figure 2. Architecture of COVID-GMIC. 

Performance results on the test set and reader study 

Table 1 summarizes the key performance results. The multimodal ensemble of COVID-GMIC and COVID-GBM, denoted ‘COVID-GMIC + COVID-GBM’, achieved the best performance across all time windows in terms of the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (PR AUC), except for the PR AUC in the 96-hour task.

Table 1: Performance on the outcome classification task on the held-out test set, and on the subset of the test set used in the reader study. We include 95% confidence intervals estimated via bootstrapping with 1,000 iterations.
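The bootstrap confidence intervals reported in the table can be estimated along the following lines. This is a minimal sketch (function names are ours, and the AUC is computed via the Mann–Whitney formulation), not the paper’s exact evaluation code:

```python
import numpy as np

def auc(y_true, scores):
    """ROC AUC via the Mann–Whitney U statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen negative."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC, resampling
    examples with replacement n_boot times."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip degenerate resamples with a single class
        stats.append(auc(y_true[idx], scores[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

The same resampling loop yields intervals for PR AUC by swapping in a precision–recall summary statistic.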

In a reader study consisting of 200 images, our main finding is that COVID-GMIC outperforms radiologists A and B (with 3 and 17 years of experience, respectively) across time windows longer than 24 hours. Note that since the radiologists did not have access to clinical variables, their performance is not directly comparable to that of the COVID-GBM model; we include it only for reference.

Interpretability to establish trust with clinicians

We also qualitatively evaluated the saliency maps computed by COVID-GMIC. Two examples are shown in Figure 3. Both patients were admitted to the intensive care unit and were intubated within 48 hours. In the first example, there are diffuse airspace opacities, though the saliency maps primarily highlight the medial right basilar and peripheral left basilar opacities. Accordingly, the two region-of-interest (ROI) patches (1 and 2) in the basilar region have comparable attention values, 0.49 and 0.46 respectively. In the second example, the extensive left mid-to-upper-lung abnormality (ROI patch 1) is highlighted, which correlates with the most extensive area of parenchymal consolidation.

Figure 3: From left to right: the original X-ray image, saliency maps for clinical deterioration within 24, 48, 72, and 96 hours, locations of region-of-interest (ROI) patches, and ROI patches with their associated attention scores. 

Deterioration risk curves

We designed a second model, COVID-GMIC-DRC, inspired by survival analysis, which predicts how the patient’s risk of deterioration evolves over time in the form of deterioration risk curves (DRCs). The DRCs generated by COVID-GMIC-DRC on the test set and the corresponding reliability plot are shown in Figure 4. The mean DRC for patients with adverse events (red dashed line) lies above the mean DRC for patients without adverse events (blue dashed line) at all times. In the reliability plot, perfect calibration is indicated by the diagonal black dashed line; the figure shows that the model is well calibrated.
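A reliability plot of this kind can be computed by binning the predicted risks and comparing each bin’s mean prediction with the observed event frequency; points near the diagonal indicate good calibration. The sketch below is a generic illustration of the technique, not the paper’s exact implementation:

```python
import numpy as np

def reliability_curve(y_true, probs, n_bins=10):
    """Bin predicted risks into n_bins equal-width bins over [0, 1] and
    return (mean predicted risk, observed event frequency) per non-empty
    bin. Perfect calibration puts every point on the diagonal."""
    y_true = np.asarray(y_true, dtype=float)
    probs = np.asarray(probs, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index for each prediction; clip so probs == 1.0 fall in the last bin.
    bins = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    mean_pred, obs_freq = [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            mean_pred.append(probs[mask].mean())
            obs_freq.append(y_true[mask].mean())
    return np.array(mean_pred), np.array(obs_freq)
```

Plotting `obs_freq` against `mean_pred` reproduces the shape of panel (b) in Figure 4; scikit-learn’s `calibration_curve` offers an equivalent off-the-shelf routine.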

Figure 4: (a) Deterioration risk curves (DRCs) and (b) reliability plot for COVID-GMIC-DRC. 

Implications for clinical practice

Overall, we developed and evaluated an AI system that predicts the deterioration of COVID-19 patients presenting to the ED, where deterioration is defined as the composite outcome of mortality, intubation, or ICU admission. The system aims to provide clinicians with a quantitative estimate of the risk of deterioration, and of how that risk is expected to evolve over time, in order to enable efficient triage and prioritization of patients at high risk of deterioration. The tool may be of particular interest in pandemic hotspots, where triage at admission is critical for allocating limited resources such as hospital beds.

To allow for reproducibility and to share our work with the research community, we made our code and the parameters of the trained models publicly available at https://github.com/nyukat/COVID-19_prognosis.

References

[1] Shen, Yiqiu, et al. "An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization." Medical image analysis 68 (2021): 101908.

[2] Ke, Guolin, et al. "Lightgbm: A highly efficient gradient boosting decision tree." Advances in neural information processing systems 30 (2017): 3146-3154.

Farah Shamout

Assistant Professor Emerging Scholar, NYU Abu Dhabi