What is the performance of different machine learning algorithms in predicting disease outcome among COVID-19 patients?

During the height of the COVID-19 pandemic, clinicians were forced to make difficult treatment decisions given the large number of patients and limited resources. To help guide treatment strategies, we evaluated the performance of 18 machine learning models in predicting COVID-19 patient outcomes.

Machine learning models can aid physicians in their decision-making. Particularly in a pandemic such as COVID-19, such models could help allocate lifesaving resources to the patients who would benefit most. A key consideration in developing a machine learning model is the underlying algorithm. A few studies have evaluated a handful of machine learning algorithms and identified those with the best performance. However, there has not yet been a comprehensive evaluation of the available algorithms to determine which work best in a given application.

            To address this gap, we evaluated eighteen different machine learning algorithms for their ability to predict COVID-19 disease outcome. The eighteen algorithms fell into nine broad categories. To develop the models, we selected the electronic health records (EHRs) of 3,597 patients from the Mass General Brigham database who presented to the emergency department (ED) in March and April 2020. We developed distinct models to predict two primary outcomes: admission to the ICU within 5 days of presenting to the ED, and mortality within 28 days. To evaluate the models, we used EHRs from 1,711 patients who came to the ED between May and August 2020.
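The split described above is temporal rather than random: earlier visits are used for development and later visits for evaluation. A minimal sketch of that idea follows, assuming hypothetical record fields (`patient_id`, `ed_date`, `icu_within_5d`) and a made-up handful of records. It is not the study's actual pipeline.

```python
from datetime import date

# Assumed cutoff: visits before 1 May 2020 go to the development set.
DEV_END = date(2020, 5, 1)

def temporal_split(records, cutoff=DEV_END):
    """Split records into (development, temporal_validation) by ED visit date."""
    development = [r for r in records if r["ed_date"] < cutoff]
    validation = [r for r in records if r["ed_date"] >= cutoff]
    return development, validation

# Toy records with made-up values, for illustration only.
records = [
    {"patient_id": 1, "ed_date": date(2020, 3, 15), "icu_within_5d": 0},
    {"patient_id": 2, "ed_date": date(2020, 4, 2), "icu_within_5d": 1},
    {"patient_id": 3, "ed_date": date(2020, 5, 20), "icu_within_5d": 0},
    {"patient_id": 4, "ed_date": date(2020, 8, 1), "icu_within_5d": 1},
]

dev, val = temporal_split(records)
print(len(dev), "development records,", len(val), "temporal validation records")
```

Splitting by date instead of at random means the evaluation set reflects patients seen after the models were built, which is closer to how a deployed model would be used.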

            We found that ensemble-based models performed moderately better than other models in predicting ICU admission and mortality. We showed this both by cross-validation and by assessing predictive ability on a temporal validation dataset. Based on SHAP analysis, we also found that for predicting ICU admission, C-reactive protein, lactate dehydrogenase, procalcitonin, lymphocyte percentage, neutrophil percentage, oxygen saturation and respiratory rate were the key parameters driving our ensemble-based model predictions. Similarly, for mortality prediction, low eGFR, use of a ventilator, lymphocyte percentage, neutrophil percentage, respiratory rate, procalcitonin, serum anion gap and serum potassium were the leading predictors. Another interesting observation was that the performance of all models decreased on the temporal validation dataset.
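SHAP attributes a single prediction to individual features using Shapley values from cooperative game theory. As a rough illustration, the sketch below computes exact Shapley values by brute force for a toy linear risk score. The weights, feature values, and all-zero baseline patient are made up for this example and do not come from the study. Practical SHAP implementations rely on model-specific or approximate algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                contribution += weight * (value(coalition | {i}) - value(coalition))
        phi.append(contribution)
    return phi

# Toy risk score with made-up weights (illustration only, not from the study).
WEIGHTS = [0.5, 0.3, -0.2]  # hypothetical: CRP, respiratory rate, lymphocyte %

def toy_model(features):
    return sum(w * f for w, f in zip(WEIGHTS, features))

patient = [2.0, 1.0, 3.0]   # made-up standardized feature values
baseline = [0.0, 0.0, 0.0]  # hypothetical reference patient

phi = shapley_values(toy_model, patient, baseline)
print(phi)
```

For a linear model like this one, each Shapley value reduces to the weight times the feature's distance from the baseline, and the values sum to the difference between the patient's prediction and the baseline prediction.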

            Our study quantifies and systematically compares multiple machine learning models and demonstrates the improved performance of ensemble-based methods for predicting COVID-19 disease outcome. This might be attributed to the fact that ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model. The drop in model performance on the temporal validation dataset might be due to the changes in patient management that occurred between March and August 2020. While developing these models, we performed extensive hyperparameter tuning based on F1 scores. Our SHAP analysis also describes the impact of patient features on the outcome of COVID-19 disease.
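Tuning on F1 means each candidate setting is scored by the harmonic mean of precision and recall, and the best-scoring setting is kept. A minimal sketch is shown below. The `cutoff` on a risk score is a hypothetical stand-in for a real hyperparameter (such as tree depth), and the scores and labels are made up. The study's actual search space is not described here.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def tune_by_f1(risk_scores, y_true, candidates):
    """Return (best_candidate, best_f1) over a grid of candidate cutoffs."""
    best_candidate, best_f1 = None, -1.0
    for cutoff in candidates:
        y_pred = [1 if s >= cutoff else 0 for s in risk_scores]
        score = f1_score(y_true, y_pred)
        if score > best_f1:
            best_candidate, best_f1 = cutoff, score
    return best_candidate, best_f1

# Made-up validation data, for illustration only.
risk_scores = [0.2, 0.4, 0.6, 0.8, 0.35, 0.9]
y_true = [0, 1, 1, 1, 0, 1]

best_cutoff, best_f1 = tune_by_f1(risk_scores, y_true, [0.3, 0.5, 0.7])
print(best_cutoff, round(best_f1, 3))
```

F1 is a common choice when the outcome of interest is relatively rare, because it ignores true negatives and so is not inflated by a model that simply predicts the majority class.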

Figure. Overview of our model development strategy for all machine learning algorithms.

            There were a few limitations to our study. We used the Brier score to assess model calibration because predicted probabilities were not available for all models. We used the k-nearest neighbor algorithm for imputation, which carries a risk of data distortion. Although SHAP analysis is helpful in determining the variables of importance in a model, it has to be specially adapted to each particular model. From a clinical point of view, a key limitation was the model's dependence on certain laboratory parameters whose results are not available until after a patient has left the ED, which would delay the utility of the model.
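For reference, the Brier score is the mean squared difference between a predicted probability and the observed binary outcome, so lower is better and 0 is a perfect score. A minimal sketch with made-up probabilities and outcomes (not study data):

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Made-up outcomes and predictions, for illustration only.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]      # probabilities close to the outcomes
uninformative = [0.5, 0.5, 0.5, 0.5]  # always predicts 50%

print(brier_score(y_true, confident))
print(brier_score(y_true, uninformative))
```

A model that always predicts 50% scores 0.25, which gives a simple reference point when reading Brier scores for binary outcomes.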

            Based on our current study, we recommend using ensemble-based methods for developing clinical prediction models in COVID-19. Deploying such models would help augment clinical decision-making. Importantly, we demonstrate a method for evaluating multiple machine learning algorithms to generate the optimal model for predicting specific outcomes.

Sonu Subudhi

Postdoctoral research fellow, Massachusetts General Hospital and Harvard Medical School