Peer-Reviewed

COVID-19 Prediction and Detection Using Machine Learning Algorithms: Catboost and Linear Regression

Received: 13 September 2021     Accepted: 4 October 2021     Published: 12 October 2021
Abstract

COVID-19, a global pandemic, has been spreading rapidly, and predictions of the infection rate show how case numbers will increase or decrease. Even though the number of people vaccinated against the coronavirus is increasing, COVID-19 remains a serious worldwide problem. Because machine learning and deep learning have recently been applied to COVID-19 prediction, this project used machine learning to predict the number of confirmed cases and deaths. The prediction graphs of the proposed model can play a crucial role in preventing more people from becoming infected. The project collected the number of daily infected cases in New York from March 21st, 2020 to March 6th, 2021. For precise results, six different machine learning methods were applied to the dataset: Decision Tree, Random Forest, Linear Regression, Gradient Boosting, XGBoost, and LightGBM (LGBM). RMSE values ranged from 9.95 to 68.85 and MAE values from 5.99 to 58.76. The most accurate model was Linear Regression, with an RMSE and MAE of 9.96 and 5.99 for death cases and 597.61 and 346.04 for confirmed cases. Consequently, its prediction graphs almost matched the graphs drawn from the actual data. A second dataset covered common COVID-19 symptoms, and the CatBoost model ranked the factors by influence, with breathing problems as the most influential. Collecting data from other areas and specifying patients’ features could have improved the quality of the research, though overall the result was successful.
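
The two error measures reported in the abstract, RMSE (root mean squared error) and MAE (mean absolute error), can be illustrated with a minimal Python sketch. The function names and the sample counts below are hypothetical and are not taken from the paper's dataset or code; the sketch only shows how each measure summarizes the gap between actual and predicted daily counts.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: large misses are penalized more heavily."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss, in the data's own units."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical daily counts for illustration only (not from the paper).
actual = [10, 12, 15, 11]
predicted = [11, 12, 13, 15]

print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"MAE  = {mae(actual, predicted):.2f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which is consistent with every RMSE/MAE pair quoted in the abstract.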

Published in American Journal of Theoretical and Applied Statistics (Volume 10, Issue 5)
DOI 10.11648/j.ajtas.20211005.11
Page(s) 208-215
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

COVID-19, Machine Learning, Linear Regression

Cite This Article
  • APA Style

    Justine Shinjae Kim. (2021). COVID-19 Prediction and Detection Using Machine Learning Algorithms: Catboost and Linear Regression. American Journal of Theoretical and Applied Statistics, 10(5), 208-215. https://doi.org/10.11648/j.ajtas.20211005.11


    ACS Style

    Justine Shinjae Kim. COVID-19 Prediction and Detection Using Machine Learning Algorithms: Catboost and Linear Regression. Am. J. Theor. Appl. Stat. 2021, 10(5), 208-215. doi: 10.11648/j.ajtas.20211005.11


    AMA Style

    Justine Shinjae Kim. COVID-19 Prediction and Detection Using Machine Learning Algorithms: Catboost and Linear Regression. Am J Theor Appl Stat. 2021;10(5):208-215. doi: 10.11648/j.ajtas.20211005.11


  • @article{10.11648/j.ajtas.20211005.11,
      author = {Justine Shinjae Kim},
      title = {COVID-19 Prediction and Detection Using Machine Learning Algorithms: Catboost and Linear Regression},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {10},
      number = {5},
      pages = {208-215},
      doi = {10.11648/j.ajtas.20211005.11},
      url = {https://doi.org/10.11648/j.ajtas.20211005.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20211005.11},
     year = {2021}
    }
    


  • TY  - JOUR
    T1  - COVID-19 Prediction and Detection Using Machine Learning Algorithms: Catboost and Linear Regression
    AU  - Justine Shinjae Kim
    Y1  - 2021/10/12
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ajtas.20211005.11
    DO  - 10.11648/j.ajtas.20211005.11
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 208
    EP  - 215
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20211005.11
    VL  - 10
    IS  - 5
    ER  - 


Author Information
  • Emma Willard School, Troy, USA