Introduction
Coronavirus disease 2019 (COVID-19) is highly transmissible, with complex clinical features and many aspects that remain poorly understood. The pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease presents heterogeneous and changeable clinical manifestations, leading to rising mortality and other devastating outcomes [1-3]. Clinical presentations range from asymptomatic or mild illness to severe complications, including pneumonia, respiratory insufficiency, and multi-organ failure, which can lead to intensive care unit (ICU) admission and death. Many COVID-19 patients deteriorate rapidly after a period of relatively mild symptoms, highlighting the need for early risk stratification [4, 5]. Thus, the ability to identify patients at risk of deterioration or critical illness during hospitalization may help inform decisions by healthcare providers and policymakers [6, 7]. Moreover, in the absence of licensed therapeutic or antiviral drugs, the exponentially increasing number of COVID-19 cases has overwhelmed healthcare facilities through a severe scarcity of hospital resources and the physical and emotional exhaustion of clinical workers, which calls for effective triage and comprehensive control measures [8, 9]. In this situation, overwhelmed healthcare systems have attempted to contain the outbreak by leveraging machine learning (ML) models to support decision-making, including recognizing high-risk cases, triaging patients, and allocating resources [5, 10, 11].
ML, a branch of artificial intelligence (AI), enables the extraction of high-quality predictive models from large raw datasets [12]. ML has shown great potential to support decision-making in many areas of the COVID-19 pandemic, including, but not restricted to, epidemic detection, feature selection and classification based on multimedia data, rapid diagnosis, prognosis, assessment of disease severity, and prediction of clinical outcomes [13-15]. During this pandemic, it is crucial to support frontline healthcare authorities in triaging patients efficiently and optimizing the allocation of limited resources [16, 17]. Thus, the purpose of this study was to develop and validate an ML-based prediction model that identifies, from clinical parameters, hospitalized COVID-19 patients who will require transfer to the ICU.
Materials and Methods
This retrospective, single-center study used cumulative data on COVID-19 patients admitted from March 9, 2020, to December 20, 2020, to Mostafa Khomeini Hospital, which is affiliated with Ilam University of Medical Sciences (ILUMS) and is the focal center for COVID-19 care and treatment in Ilam, western Iran. During this period, a total of 10925 suspected COVID-19 cases were referred to the Mostafa Khomeini outpatient and emergency departments. Of those, 3017 cases were confirmed as COVID-19 by real-time polymerase chain reaction (RT-PCR) testing. Of these, only hospitalized patients with a positive RT-PCR test for SARS-CoV-2 were included in this study. Data on 1300 patients were obtained from the Ilam university registry database. Seventy-five incomplete case records with extensive missing data (more than 70%) were excluded from the analysis. Remaining missing values were imputed with the mean or mode of each variable. The final sample comprised 1225 hospitalized patients (Diagram 1).
The predictors were individual patient-level demographic and clinical features. The data collection table was designed in six categories comprising 53 risk factors (Table 1).
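The exclusion and imputation steps described above can be illustrated with a short sketch. It is not the authors' code: the field names, the example records, and the choice of the mean for numeric variables and the mode for categorical ones are assumptions made for illustration. Only the 70% threshold and the use of mean or mode come from the text.

```python
from statistics import mean, mode

def missing_fraction(record):
    """Share of fields in one record whose value is None (missing)."""
    return sum(value is None for value in record.values()) / len(record)

def drop_sparse_records(records, max_missing=0.70):
    """Exclude records with more than max_missing of their fields missing."""
    return [r for r in records if missing_fraction(r) <= max_missing]

def impute(records, numeric_fields):
    """Fill gaps with the column mean (numeric) or mode (categorical)."""
    fields = list(records[0].keys())
    fill = {}
    for field in fields:
        observed = [r[field] for r in records if r[field] is not None]
        if observed:  # a column with no observed values is left as-is
            fill[field] = mean(observed) if field in numeric_fields else mode(observed)
    return [
        {f: fill.get(f) if r[f] is None else r[f] for f in fields}
        for r in records
    ]

# Hypothetical records; None marks a missing value.
records = [
    {"age": 60, "sex": "M", "cough": "yes", "spo2": 90},
    {"age": None, "sex": "F", "cough": "yes", "spo2": 94},
    {"age": 40, "sex": None, "cough": None, "spo2": None},  # 75% missing
    {"age": 50, "sex": "F", "cough": None, "spo2": 92},
]
kept = drop_sparse_records(records)
filled = impute(kept, numeric_fields={"age", "spo2"})
```

In this toy example the third record is excluded because three of its four fields are missing, and the gaps in the remaining records are filled from the observed values of the same column.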
This study was approved by the ethical committee board of ILUMS. To protect patient privacy and confidentiality, all unique identifying information was concealed during data collection. Two Health Information Management experts checked all data, and a third researcher adjudicated any differences in interpretation between the two primary reviewers. The physicians who completed the forms, and the patients or their family members, were contacted to review and supplement discrepant or missing data. The outcome variable was ICU admission, coded as a binary variable: "yes" if the patient needed ICU admission and "no" otherwise. To construct the ICU admission prediction model, 13 ML algorithms from six different groups were applied: Bayes Net and Naive Bayes (from Bayes); Multilayer Perceptron (MLP) and Support Vector Machine (SVM) (from Functions); Kstar and Locally Weighted Learning (LWL) (from Lazy learners); OneR and PART (from Rules); J48, Random forest, and Random tree (from Trees); and Adaboost and bagging (from Meta). The algorithms were implemented in WEKA 3.8. The predictive models were evaluated using performance metrics derived from the confusion matrix (Table 2). To compare the performance of the algorithms in predicting ICU admission, we applied 10-fold cross-validation together with several evaluation measures, namely accuracy, sensitivity, specificity, precision, and F-measure (Table 3), as well as the area under the receiver operating characteristic curve (AUC ROC).
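The 10-fold cross-validation procedure can be sketched as follows. This is an illustrative Python outline rather than the WEKA implementation used in the study; the stratified, round-robin fold assignment, the function names, and the seed are assumptions.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=1):
    """Assign sample indices to k folds so class shares stay similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def cross_validation_splits(labels, k=10, seed=1):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    for i, test_indices in enumerate(folds):
        train_indices = [
            index for j, fold in enumerate(folds) if j != i for index in fold
        ]
        yield train_indices, test_indices

# Hypothetical outcome labels: 20 ICU admissions among 100 patients.
labels = ["yes"] * 20 + ["no"] * 80
splits = list(cross_validation_splits(labels))
```

Each model would be trained on the nine training folds and scored on the held-out fold, and the per-fold scores then aggregated. Stratifying by outcome keeps the minority "yes" class represented in every test fold.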
Diagram 1) Flowchart describing patient selection
Table 1) Baseline predictor variables and outcome measures
Table 2) Confusion matrix
Table 3) The performance evaluation measures
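As a worked illustration of the measures in Table 3, the sketch below derives them from the four cells of a binary confusion matrix, taking ICU admission as the positive class. It uses the standard textbook definitions, which are assumed to match Table 3; the function name and the example counts are hypothetical and are not study results.

```python
def confusion_matrix_metrics(tp, fp, tn, fn):
    """Compute the five measures from confusion-matrix counts.

    tp/fn: ICU patients predicted as ICU / as non-ICU.
    tn/fp: non-ICU patients predicted as non-ICU / as ICU.
    Assumes each denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)   # share of ICU patients that were caught
    specificity = tn / (tn + fp)   # share of non-ICU patients correctly cleared
    precision = tp / (tp + fp)     # share of ICU predictions that were right
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }

# Hypothetical counts for 200 patients.
metrics = confusion_matrix_metrics(tp=80, fp=10, tn=90, fn=20)
```

With an imbalanced outcome, accuracy alone can look high even when many ICU patients are missed, which is why sensitivity and specificity are reported alongside it.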
Findings
Of the 1225 hospitalized COVID-19 patients in this retrospective study, 664 (54.20%) were men and 561 (45.80%) were women, and the median age of the participants was 57.25±17.80 years. Descriptive statistics for the 1225 records in this dataset are shown in Tables 4 and 5.
The machine learning algorithms: In this step, after preparing the dataset, the ICU admission classifier was developed using 13 ML algorithms including LWL, Kstar, MLP, SVM, Naive Bayes, Bayesian network, OneR, PART, J48, Random forest, Random tree, Adaboost, and bagging. A 10-fold cross-validation method was applied along with some evaluation metrics, including accuracy, sensitivity, specificity, and F-measure, to compare the performance of different algorithms in predicting ICU admission.
Evaluating the performance of each subgroup (DM algorithms): The performance of the ML algorithms was calculated from confusion matrix measures, including accuracy, sensitivity, specificity, precision, and F-measure (Diagrams 2-4). Model performance evaluation is a fundamental part of building an effective ML model. According to Diagrams 2-4, the SVM achieved the best accuracy (91.551%) and sensitivity (91.6%). On the other three criteria, however, the Bayes Net performed better, with a specificity of 75.5%, a precision of 90.8%, and an F-measure of 90.4%. This means that the SVM algorithm performed better in identifying people who need to be admitted to the ICU.
Table 4) The descriptive statistics of qualitative variables of the study
Table 5) The descriptive statistics of quantitative variables of the study
Diagram 2) Accuracy of ML algorithms for predicting ICU admission
Diagram 3) F-measure of ML algorithms for predicting ICU admission
Diagram 4) Sensitivity, specificity, and precision of ML algorithms for predicting ICU admission
Evaluating the performance of each group: The performance measures of the algorithms in each of the six ML groups (Bayesian, Functions, Lazy, Rules, Trees, and Meta techniques) are shown in Diagram 5. According to Diagram 5, the Meta algorithms (Adaboost, bagging), with an accuracy of 90.37%, a sensitivity of 90.35%, a precision of 88.25%, and an F-measure of 88.35%, have the best capabilities in ICU risk prediction. However, the Bayesian algorithms (Naive Bayes, Bayesian network) had the best performance in terms of specificity (65.45%).
Finally, the AUC ROC for Adaboost, the best-performing algorithm, was about 91% (Diagram 6).
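For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen ICU patient receives a higher predicted risk than a randomly chosen non-ICU patient. The sketch below computes it by direct pairwise comparison; it is an illustration only, not the WEKA routine used in the study, and the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores, positive="yes"):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    positives = [s for label, s in zip(labels, scores) if label == positive]
    negatives = [s for label, s in zip(labels, scores) if label != positive]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted risks for four patients.
auc = roc_auc(["yes", "yes", "no", "no"], [0.9, 0.4, 0.5, 0.1])
```

Here three of the four ICU/non-ICU pairs are ranked correctly, so the AUC is 0.75. The pairwise loop is quadratic in the sample size and is meant for clarity rather than speed.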
Diagram 5) Comparison of ML algorithm performance for ICU risk prediction
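One plausible way to obtain group-level figures like those in Diagram 5 is to average each measure over the algorithms in a group. The sketch below assumes an unweighted mean, which the text does not state explicitly. The group membership follows the Methods section, while the function name and the accuracy values in the usage example are hypothetical placeholders, not study results.

```python
from collections import defaultdict
from statistics import mean

# Group membership as listed in the Methods section.
GROUPS = {
    "Bayes Net": "Bayes", "Naive Bayes": "Bayes",
    "MLP": "Functions", "SVM": "Functions",
    "Kstar": "Lazy", "LWL": "Lazy",
    "OneR": "Rules", "PART": "Rules",
    "J48": "Trees", "Random forest": "Trees", "Random tree": "Trees",
    "Adaboost": "Meta", "bagging": "Meta",
}

def group_means(per_algorithm, groups=GROUPS):
    """Average every measure over the algorithms that share a group."""
    grouped = defaultdict(list)
    for algorithm, measures in per_algorithm.items():
        grouped[groups[algorithm]].append(measures)
    return {
        group: {name: mean(row[name] for row in rows) for name in rows[0]}
        for group, rows in grouped.items()
    }

# Hypothetical per-algorithm accuracies (percent), for illustration only.
example = {
    "Adaboost": {"accuracy": 90.0},
    "bagging": {"accuracy": 92.0},
    "SVM": {"accuracy": 88.0},
}
summary = group_means(example)
```

A group average can hide a strong individual algorithm, so group-level and algorithm-level results are best read together.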
Discussion
We began by asking whether it is possible to predict a patient's future need for ICU admission from baseline clinical parameters early in the course of COVID-19 hospitalization. This study aimed to retrospectively develop models that estimate the likelihood of transferring COVID-19 patients to the ICU, in order to optimize the planning and allocation of scarce ICU resources during the COVID-19 outbreak. We then evaluated the performance of various ML algorithms to select the best model for predicting ICU admission. Based on our analysis of 13 ML algorithms, we found that the Meta algorithms (Adaboost, bagging) performed moderately better than the other selected ML algorithms.
Such prediction models reduce the current uncertainty and ambiguity in COVID-19 clinical practice by providing measurable, non-subjective, evidence-based information [18, 19]. Accurate ICU admission prediction can support the sharing of limited hospital resources and enhance the quality of health care [19]. Additionally, early ICU prediction can identify susceptible and critically ill patients and support timely action to reduce deaths. An accurate and valid prediction model may therefore improve the quality of care and give clinicians a better basis for reducing complications and improving patients' chances of survival.
So far, several studies have specifically evaluated the application of ML techniques to ICU prediction in COVID-19 patients. Zhao et al. analyzed the data of 1087 hospitalized COVID-19 patients and developed an ML-based risk score model to predict ICU admission and mortality; the AUCs were 0.74 and 0.83, respectively [4]. Zhou and colleagues analyzed the data of 1087 COVID-19-positive patients to predict the individual risk of ICU admission with a nomogram model. The model showed good discriminative ability for predicting the risk of ICU admission, with a C-index of 0.829 and 0.776 for the training and validation groups, respectively [20]. In a retrospective study, an XGBoost model was developed to predict COVID-19 pneumonia, mechanically ventilated ICU admission, and mortality; the AUC ROC values were 91%, 82%, and 87%, respectively [21]. Agieb et al. also proposed a prediction model using three algorithms (Naive Bayes, K-NN, and SVM) to predict the likelihood of ICU admission. The SVM model, with an accuracy of 92.58% and a sensitivity of 88.58%, showed the best capability in predicting ICU admission [22].
Pan and colleagues studied 123 patients with confirmed COVID-19 and constructed a risk prediction model using four ML algorithms to anticipate deterioration in patients with COVID-19 (AUC=0.92) [23]. In another study, Assaf analyzed data from 6995 confirmed patients with severe COVID-19 and developed three ML models to assess patients at risk of deterioration during hospitalization. The final model reached 88.0% sensitivity, 92.7% specificity, and 92.0% accuracy in predicting critical COVID-19 cases [24]. Foenini et al. also proposed a prediction model that precisely and quickly quantifies the risk of ICU hospitalization using 13 clinical variables; it achieved a sensitivity of 91% and a specificity of 91% (AUC of 0.93) [25].
The Meta algorithms developed in this study, like those reported in other studies [26], achieved the best results, with a mean accuracy of 90.37%. Because correctly identifying the patients who need admission matters most when predicting the need for ICU care, it is notable that the Meta and SVM algorithms (accuracy of 90.37% and 91.551%, respectively) performed best in this regard.
This study may help clinicians detect deterioration early, intervene effectively, and possibly reduce mortality in patients with COVID-19. Using such models in hospitals could help improve care by better aligning clinical decisions with prognosis in critically ill patients with COVID-19.
This study had several limitations. First, because this was a retrospective study, some of the recorded data were uneven or imbalanced; we therefore cleaned the dataset by removing noise and incomplete records as far as possible. To address the class imbalance, in which the number of records in the positive class is significantly lower than in the negative class, several criteria were used to assess the performance of each ML algorithm. In addition, 10-fold cross-validation was used to minimize bias in the results. Second, this was a retrospective study based on a single-center registry, which may limit the generalizability of the proposed model. However, the Ilam CoV registry is a database compiled at the central hospital of Ilam province, which delivers specialized healthcare services to COVID-19 patients during the pandemic.
Nevertheless, we will use multicenter data to validate the proposed model and improve the generalizability of its predictions. In the future, the performance of our computational model may be improved by testing more classification techniques on a larger, multicenter, higher-quality dataset. Larger cohorts, prospective settings, and clinical trials are also needed to establish its contribution to improving COVID-19 outcomes.
Conclusion
Given the limited capacity of ICU beds, predicting the number of patients who need to be admitted to the ICU can help in properly managing and scheduling resources. This model can be conveniently used to indicate the individual risk of ICU admission for patients with COVID-19 and to optimize the use of limited resources. The proposed model could automatically identify high-risk patients as early as the time of admission or during hospitalization. This group of patients needs intensive monitoring and immediate treatment when unfavorable prognostic indicators are observed, which may improve patient outcomes. In addition, the model is expected to provide objective evidence to aid decisions on supportive and therapeutic treatment and to quantify the risk of mortality from this infection, based on parameters recorded during hospitalization. In general, the results showed that the selected ML algorithms accurately predicted patients' need for critical care and outperformed conventional triage and early warning approaches.
Acknowledgments: We thank the Research Deputy of the North Khorasan University of Medical Sciences for financially supporting this project.
Ethical Permissions: The study was approved by the ethical committee board of North Khorasan University of Medical Sciences (ethic code: IR.NKUMS.REC.1400.080).
Conflict of Interests: This article is a joint research project between North Khorasan and Ilam Universities of Medical Sciences.
Authors' Contributions: Orooji A. (First author), Introduction author/Original researcher (20%); Kazemi-Arpanahi H. (Second author), Assistant researcher (20%); Kaffashian M. (Third author), Discussion author (20%); Kalvandi Gh. (Fourth author), Assistant researcher (20%); Shanbehzadeh M. (Fifth author), Assistant researcher (20%).
Funding/Sources: -