<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<?covid-19-tdm?>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Public Health</journal-id>
<journal-title>Frontiers in Public Health</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Public Health</abbrev-journal-title>
<issn pub-type="epub">2296-2565</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpubh.2023.1252357</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Public Health</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Detection of COVID-19 epidemic outbreak using machine learning</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" equal-contrib="yes">
<name><surname>Cho</surname>
<given-names>Giphil</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn0001"><sup>&#x2020;</sup></xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes">
<name><surname>Park</surname>
<given-names>Jeong Rye</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn0001"><sup>&#x2020;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Choi</surname>
<given-names>Yongin</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Ahn</surname>
<given-names>Hyeonjeong</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Lee</surname>
<given-names>Hyojung</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1801557/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Artificial Intelligence and Software, Kangwon National University</institution>, <addr-line>Samcheok-si</addr-line>, <country>Republic of Korea</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Mathematics, Kyungpook National University</institution>, <addr-line>Daegu</addr-line>, <country>Republic of Korea</country></aff>
<aff id="aff3"><sup>3</sup><institution>Busan Center for Medical Mathematics, National Institute for Mathematical Sciences</institution>, <addr-line>Daejeon</addr-line>, <country>Republic of Korea</country></aff>
<aff id="aff4"><sup>4</sup><institution>Department of Statistics, Kyungpook National University</institution>, <addr-line>Daegu</addr-line>, <country>Republic of Korea</country></aff>
<author-notes>
<fn fn-type="edited-by" id="fn0002"><p>Edited by: Fathiah Zakham, University of Helsinki, Finland</p>
</fn>
<fn fn-type="edited-by" id="fn0003"><p>Reviewed by: Junxiang Chen, Indiana University, United States; Ana Clara Gomes da Silva, Universidade de Pernambuco, Brazil; Dinh Tuan Phan Le, New York City Health and Hospitals Corporation, United States</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Hyojung Lee, <email>hjlee@knu.ac.kr</email></corresp>
<fn fn-type="equal" id="fn0001"><p><sup>&#x2020;</sup>These authors have contributed equally to this work</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>12</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>11</volume>
<elocation-id>1252357</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>07</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>01</day>
<month>12</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Cho, Park, Choi, Ahn and Lee.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Cho, Park, Choi, Ahn and Lee</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec id="sec1">
<title>Background</title>
<p>The coronavirus disease (COVID-19) pandemic has spread rapidly across the world. This has created an urgent need for predictive models that help healthcare providers prepare for and respond to outbreaks more quickly and effectively, and ultimately improve patient care. Early detection and warning systems are crucial for preventing and controlling epidemic spread.</p>
</sec>
<sec id="sec2">
<title>Objective</title>
<p>In this study, we aimed to develop a machine learning-based method to predict the transmission trend of COVID-19. We also aimed to develop a new approach that detects the start time of new outbreaks by analyzing epidemiological data.</p>
</sec>
<sec id="sec3">
<title>Methods</title>
<p>We developed a risk index to measure changes in the transmission trend. We applied machine learning (ML) techniques to predict COVID-19 transmission trends, which we categorized into three labels: decrease (L0), maintain (L1), and increase (L2). We used Support Vector Machine (SVM), Random Forest (RF), and XGBoost (XGB) as ML models, and we used grid search to determine the optimal hyperparameters for each model. We proposed a new method that detects the start time of new outbreaks. The method relies on label 2 (L2) being sustained for at least 14&#x2009;days (i.e., the duration of maintenance). We compared the performance of the ML models to identify the most accurate approach for outbreak detection. We also conducted a sensitivity analysis with durations of maintenance ranging from 7 to 28&#x2009;days.</p>
</sec>
<sec id="sec4">
<title>Results</title>
<p>The ML methods classified transmission trends with high accuracy (over 94%). Our proposed method successfully predicted the start times of new outbreaks. Between March 2020 and October 2022 in Korea, it detected a total of seven outbreaks, whereas five outbreaks were reported. This suggests that our method can also detect minor outbreaks. Among the ML models, the RF and XGB classifiers exhibited the highest accuracy in outbreak detection.</p>
</sec>
<sec id="sec5">
<title>Conclusion</title>
<p>This study highlights the strength of our method in accurately predicting the timing of an outbreak with an interpretable and explainable approach. The method could provide a standard for predicting the start time of new outbreaks and for detecting future transmission trends. It can also support the development of targeted prevention and control measures and improve resource management during a pandemic.</p>
</sec>
</abstract>
<kwd-group>
<kwd>COVID-19</kwd>
<kwd>prediction</kwd>
<kwd>machine learning</kwd>
<kwd>early detection</kwd>
<kwd>outbreak</kwd>
</kwd-group>
<contract-num rid="cn2">NRF-2022R1A2C3011711</contract-num>
<contract-num rid="cn2">NRF-2022R1A5A1033624</contract-num>
<contract-num rid="cn4">2020R1C1C1A01012557</contract-num>
<contract-num rid="cn6">NRF-2021R1I1A1A01057767</contract-num>
<contract-num rid="cn8">B23820000</contract-num>
<contract-sponsor id="cn1">National Research Foundation of Korea (NRF)<named-content content-type="fundref-id">10.13039/501100003725</named-content></contract-sponsor>
<contract-sponsor id="cn2">Korean government (MSIT)</contract-sponsor>
<contract-sponsor id="cn3">NRF</contract-sponsor>
<contract-sponsor id="cn4">Korean government</contract-sponsor>
<contract-sponsor id="cn5">NRF</contract-sponsor>
<contract-sponsor id="cn6">Korean government</contract-sponsor>
<contract-sponsor id="cn7">National Institute for Mathematical Sciences (NIMS)</contract-sponsor>
<contract-sponsor id="cn8">Korean government (MSIT)</contract-sponsor>
<counts>
<fig-count count="7"/>
<table-count count="3"/>
<equation-count count="2"/>
<ref-count count="36"/>
<page-count count="12"/>
<word-count count="8270"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Infectious Diseases: Epidemiology and Prevention</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec6">
<label>1</label>
<title>Introduction</title>
<p>The coronavirus disease (COVID-19) pandemic is caused by the novel coronavirus SARS-CoV-2, which has spread rapidly and affected human lives worldwide. Since the start of the pandemic, several interventions have been implemented to control the spread of the virus. These include non-pharmaceutical interventions (NPIs), such as mask wearing and social distancing, and pharmaceutical interventions such as vaccination. However, the emergence of new variants has raised concerns about increased transmissibility. The pandemic continues to affect human lives, so it is crucial to control it and reduce its transmission.</p>
<p>COVID-19 transmission can be predicted in several ways. One common approach uses mathematical models that consider factors such as the transmission rate, the number of cases, and the effectiveness of control interventions such as social distancing and vaccination. These models can predict future trends in COVID-19 transmission dynamics and estimate the numbers of cases and deaths (<xref ref-type="bibr" rid="ref1 ref2 ref3">1&#x2013;3</xref>). Mathematical models are widely used to predict infectious diseases. However, they can be difficult to adapt to external factors such as changes in social distancing or the emergence of new variants (<xref ref-type="bibr" rid="ref4">4</xref>, <xref ref-type="bibr" rid="ref5">5</xref>).</p>
<p>Another approach uses machine learning (ML) methods to detect changes in the transmission trend and potential outbreaks (<xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6&#x2013;9</xref>). Shahid et al. (<xref ref-type="bibr" rid="ref6">6</xref>) predicted the confirmed cases, deaths, and recoveries of COVID-19 in 10 major countries using ARIMA, SVR, LSTM, and Bi-LSTM. Chakraborty et al. (<xref ref-type="bibr" rid="ref10">10</xref>) produced short-term forecasts of COVID-19 cases in Canada, France, the Republic of Korea, the United Kingdom, and India. They used a hybrid forecasting approach that combines ARIMA and wavelet-based models. Katragadda et al. (<xref ref-type="bibr" rid="ref9">9</xref>) explored the growth of COVID-19 spread in America by comparing the mobility of local residents and visitors. They also forecast the number of cases using various ML models.</p>
<p>Investigating the start point of infectious disease outbreaks and analyzing the transmission dynamics of epidemics is critical for several reasons. First, understanding the source of an outbreak can help identify the underlying cause of the disease and prevent future outbreaks. Second, analyzing the transmission dynamics of epidemics can provide important information on how the disease spreads and who is at risk. This information can then be used to develop effective preventive and control measures. Third, investigating the start point of an outbreak and analyzing the transmission dynamics can help determine the scope and severity of the outbreak. This information is important for determining the level of response required to control an outbreak and to protect public health. Therefore, understanding the start point of infectious disease outbreaks and analyzing transmission dynamics is essential for the effective investigation, prevention, and control of outbreaks.</p>
<p>Early detection (ED) methods and warning systems for epidemics are important for preventing and controlling the spread of a virus. Several studies have used statistical methods for the ED of infectious disease outbreaks (<xref ref-type="bibr" rid="ref11 ref12 ref13">11&#x2013;13</xref>). For example, Shi et al. (<xref ref-type="bibr" rid="ref11">11</xref>) forecast the spread of dengue in Singapore using statistical models that combine the least absolute shrinkage and selection operator (LASSO) with the ARIMA model. ML has also been proposed as a useful tool for the ED of COVID-19 outbreaks (<xref ref-type="bibr" rid="ref14 ref15 ref16">14&#x2013;16</xref>). Martinez-Velazquez et al. (<xref ref-type="bibr" rid="ref14">14</xref>) detected COVID-19 outbreaks using self-reported symptom data. They evaluated 15 ML classifiers, including decision tree, neural network, Support Vector Machine (SVM), and Random Forest (RF).</p>
<p>Korea experienced five reported outbreaks from March 2020 to October 2022. The start times of these outbreaks have not been clearly determined, because different start dates have been reported (<xref ref-type="supplementary-material" rid="SM1">Supplementary Table S1</xref>). Here, we investigated national COVID-19 outbreaks without considering regional factors, because the country is relatively small (<xref ref-type="bibr" rid="ref17">17</xref>). In addition, COVID-19 policy decisions are managed at the national level by the Korea Disease Control and Prevention Agency (KDCA). No explainable standard has been established for determining the start time of a COVID-19 outbreak. In this study, we aimed to develop an ML-based method that detects COVID-19 outbreaks early, or identifies potential early outbreaks, by analyzing epidemiological data from the Republic of Korea.</p>
</sec>
<sec sec-type="methods" id="sec7">
<label>2</label>
<title>Methods</title>
<p>The method used to detect the emergence of COVID-19 outbreaks is illustrated in <xref ref-type="fig" rid="fig1">Figure 1</xref>. We propose a novel method that combines a risk index with existing machine learning techniques, so no new machine learning algorithms need to be developed. This approach enables us to interpret the transmission trend using the risk index function and various data sources.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Schematic of COVID-19 outbreak detection. <bold>(A)</bold> The reported dates of new COVID-19 outbreaks and the proportions of variants. <bold>(B)</bold> Estimation of the transmission trend using ML classification techniques. <bold>(C)</bold> Detection of new outbreaks using the risk index and ML techniques.</p>
</caption>
<graphic xlink:href="fpubh-11-1252357-g001.tif"/>
</fig>
<sec id="sec8">
<label>2.1</label>
<title>Epidemiological data</title>
<p>We analyzed epidemiological data on reported COVID-19 cases in the Republic of Korea from February 18, 2020, to October 31, 2022. These data were provided by KDCA (<xref ref-type="bibr" rid="ref18">18</xref>) and are shown in <xref ref-type="supplementary-material" rid="SM1">Supplementary Figures S1A,B</xref>. The proportions of delta and omicron variants were obtained from CoVariants data (<xref ref-type="bibr" rid="ref19">19</xref>, <xref ref-type="bibr" rid="ref20">20</xref>). We estimated the daily numbers of delta and omicron cases by multiplying the daily number of COVID-19 cases by the corresponding variant proportion (<xref ref-type="bibr" rid="ref18 ref19 ref20">18&#x2013;20</xref>).</p>
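<p>The variant-specific estimate is a day-by-day product of total cases and variant proportion. The following is a minimal Python sketch. It assumes both series are already aligned by date and stored as plain lists. The helper name <monospace>variant_cases</monospace> and the numbers are hypothetical, not study data.</p>
<preformat preformat-type="code">
```python
def variant_cases(daily_cases, proportions):
    """Estimate daily cases of one variant as total daily cases times its share.

    Hypothetical helper: both lists must cover the same days in the same order.
    """
    if len(daily_cases) != len(proportions):
        raise ValueError("series must be aligned day by day")
    return [cases * share for cases, share in zip(daily_cases, proportions)]


# Illustrative numbers only (not study data)
print(variant_cases([1000, 1200, 1500], [0.5, 0.75, 0.25]))  # [500.0, 900.0, 375.0]
```
</preformat>
<p>Keeping the estimate as a simple product means that any smoothing or interpolation of weekly variant proportions has to happen before this step. How the source proportions were resampled to daily resolution is not specified here.</p>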
<p>Previous studies reported that enhanced social distancing was a crucial intervention for preventing COVID-19 transmission in Korea (<xref ref-type="bibr" rid="ref21 ref22 ref23">21&#x2013;23</xref>).</p>
<p>Among NPIs, we used data on social distancing measures collected from a KDCA press release (<xref ref-type="bibr" rid="ref24">24</xref>). We divided social distancing into four levels according to intensity (distancing levels 1 to 4) (<xref ref-type="bibr" rid="ref25 ref26 ref27">25&#x2013;27</xref>). <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S2</xref> summarizes the dates on which the social distancing level changed. Higher levels correspond to more stringent control interventions. In addition, <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S1C</xref> and <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S3</xref> show, for each year, the proportion of COVID-19 cases reported on each day of the week.</p>
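<p>The day-of-week proportions can be computed by grouping daily counts by calendar year and weekday, then dividing by each year's total. The following is a sketch, assuming the input is a contiguous daily series that starts on a known date. The helper <monospace>weekday_share_by_year</monospace> is hypothetical and is not the authors' code.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_share_by_year(start, daily_cases):
    """Return {year: {weekday: share of that year's cases}} for a daily series."""
    totals = defaultdict(lambda: defaultdict(float))
    for offset, count in enumerate(daily_cases):
        day = start + timedelta(days=offset)
        totals[day.year][WEEKDAYS[day.weekday()]] += count
    shares = {}
    for year, by_day in totals.items():
        year_total = sum(by_day.values())
        if year_total > 0:  # skip years with no reported cases
            shares[year] = {d: by_day[d] / year_total for d in WEEKDAYS if d in by_day}
    return shares


# Illustrative week starting Monday, October 24, 2022 (made-up counts)
print(weekday_share_by_year(date(2022, 10, 24), [2, 1, 1, 1, 1, 1, 1])[2022]["Mon"])  # 0.25
```
</preformat>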
</sec>
<sec id="sec9">
<label>2.2</label>
<title>Ethical considerations</title>
<p>The data are presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S3</xref>. The datasets were fully anonymized and did not include any personally identifiable information. Thus, ethical approval was not required for this analysis.</p>
</sec>
<sec id="sec10">
<label>2.3</label>
<title>Overview of the estimation of transmission trend of COVID-19 epidemic</title>
<p><xref ref-type="fig" rid="fig1">Figure 1</xref> shows a schematic of early outbreak detection. <xref ref-type="fig" rid="fig1">Figure 1A</xref> shows newly reported COVID-19 cases and several outbreaks in Korea, along with the proportions of variants. <xref ref-type="fig" rid="fig1">Figures 1B</xref>,<xref ref-type="fig" rid="fig1">C</xref> show a new method for estimating the start time of a new outbreak.</p>
</sec>
<sec id="sec11">
<label>2.4</label>
<title>Sample data</title>
<sec id="sec12">
<label>2.4.1</label>
<title>Define calibration and prediction periods</title>
<p>The daily numbers of COVID-19 cases were grouped into samples, each spanning <italic>k</italic> consecutive days. Let <inline-formula>
<mml:math id="M1">
<mml:mi>I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
</mml:math>
</inline-formula> denote the number of COVID-19 cases on day <italic>t</italic>. The first sample data of the cases is defined as <inline-formula>
<mml:math id="M2">
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mn>1</mml:mn>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mn>2</mml:mn>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>k</mml:mi>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>, where <inline-formula>
<mml:math id="M3">
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
</mml:math>
</inline-formula> denotes <inline-formula>
<mml:math id="M4">
<mml:mi>I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
</mml:math>
</inline-formula> on the <inline-formula>
<mml:math id="M5">
<mml:mi>&#x03C9;</mml:mi>
</mml:math>
</inline-formula>-th sample data. Each sample is divided into two periods: a calibration period, which excludes the most recent <italic>x</italic> days, and a prediction period, which consists of the most recent <inline-formula>
<mml:math id="M6">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula> days, where the length of the calibration period is <inline-formula>
<mml:math id="M7">
<mml:mi>y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula> and the length of the prediction period is <italic>x</italic>, as shown in <xref ref-type="fig" rid="fig2">Figure 2A</xref>.</p>
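<p>The following minimal Python sketch makes the partition concrete by splitting one <italic>k</italic>-day sample into its calibration and prediction parts. The helper name <monospace>split_sample</monospace> is hypothetical. Python's 0-based list slicing stands in for the 1-based day indices used in the text.</p>
<preformat preformat-type="code">
```python
def split_sample(sample, x):
    """Split a k-day sample into its calibration part (first y = k - x days)
    and its prediction part (most recent x days)."""
    k = len(sample)
    if not (k > x > 0):
        raise ValueError("x must satisfy k > x > 0")
    y = k - x
    return sample[:y], sample[y:]


# Days 1..35 stand in for I_1(1), ..., I_1(35) with k = 35 and x = 14
calibration, prediction = split_sample(list(range(1, 36)), x=14)
print(len(calibration), len(prediction))  # 21 14
print(calibration[-1], prediction[0])  # 21 22
```
</preformat>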
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>Sample data and risk index. <bold>(A&#x2013;C)</bold> Outline of the methods. <bold>(A)</bold> Sample data are generated for the calibration and prediction periods from February 2020 to October 2022. <bold>(B)</bold> A risk index for the transmission trend is developed. <bold>(C)</bold> Transmission trends are grouped as decrease (L0), maintain (L1), and increase (L2) using the risk index.</p>
</caption>
<graphic xlink:href="fpubh-11-1252357-g002.tif"/>
</fig>
<p>In other words, the sample data <inline-formula>
<mml:math id="M8">
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:math>
</inline-formula> can be expressed as <inline-formula>
<mml:math id="M9">
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
<mml:mo>&#x222A;</mml:mo>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mtext>,</mml:mtext>
</mml:math>
</inline-formula> where <inline-formula>
<mml:math id="M10">
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mn>1</mml:mn>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mn>2</mml:mn>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>y</mml:mi>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> denotes the sample data for the calibration period and <inline-formula>
<mml:math id="M11">
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>k</mml:mi>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> denotes the sample data for the prediction period. In general, for the time window <inline-formula>
<mml:math id="M12">
<mml:mi>&#x03C9;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mfenced open="{" close="}" separators=",,">
<mml:mn>1</mml:mn>
<mml:mo>&#x2026;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mfenced>
</mml:math>
</inline-formula>, where <italic>n</italic> is the total number of samples, the <inline-formula>
<mml:math id="M13">
<mml:mi>&#x03C9;</mml:mi>
</mml:math>
</inline-formula>-th sample data of the cases are defined as <inline-formula>
<mml:math id="M14">
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>&#x03C9;</mml:mi>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>&#x03C9;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>&#x03C9;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>.</p>
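<p>The general definition amounts to a sliding window of length <italic>k</italic> over the daily series. The following Python sketch builds all samples, assuming consecutive windows start one day apart (the one-day time window used in this study). Under that assumption, a series of <italic>N</italic> days yields <italic>n</italic>&#x2009;=&#x2009;<italic>N</italic>&#x2009;&#x2212;&#x2009;<italic>k</italic>&#x2009;+&#x2009;1 samples. The helper <monospace>sliding_samples</monospace> is hypothetical.</p>
<preformat preformat-type="code">
```python
def sliding_samples(cases, k):
    """Build s_1, ..., s_n from a daily series I(1), ..., I(N).

    With 1-based day t, s_w = {I(w), ..., I(k - 1 + w)}; in 0-based Python
    indexing this is cases[w - 1 : w - 1 + k]. The window advances one day
    at a time, so n = N - k + 1.
    """
    if k > len(cases):
        raise ValueError("series is shorter than one window")
    n = len(cases) - k + 1
    return [cases[w - 1 : w - 1 + k] for w in range(1, n + 1)]


samples = sliding_samples([10, 20, 30, 40, 50], k=3)
print(samples)  # [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
```
</preformat>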
<p>The time interval for each <inline-formula>
<mml:math id="M15">
<mml:mi>&#x03C9;</mml:mi>
</mml:math>
</inline-formula>-th sample data is defined as <inline-formula>
<mml:math id="M16">
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mi>&#x03C9;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>&#x03C9;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>&#x03C9;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>. <inline-formula>
<mml:math id="M17">
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:math>
</inline-formula> comprises the time period for the calibration period (<inline-formula>
<mml:math id="M18">
<mml:msubsup>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>) and the time period for the prediction period (<inline-formula>
<mml:math id="M19">
<mml:msubsup>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>), expressed by <inline-formula>
<mml:math id="M20">
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
<mml:mo>&#x222A;</mml:mo>
<mml:msubsup>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>, where the time periods are defined as <inline-formula>
<mml:math id="M21">
<mml:msubsup>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mi>&#x03C9;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>&#x03C9;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x03C4;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M22">
<mml:msubsup>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03C4;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>&#x03C9;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula>, and <inline-formula>
<mml:math id="M23">
<mml:msub>
<mml:mi>&#x03C4;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>&#x03C9;</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>y</mml:mi>
</mml:math>
</inline-formula> is the final time of the calibration period.</p>
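<p>The following short Python sketch shows the index sets. It lists the 1-based days in the calibration and prediction periods of the <italic>&#x03C9;</italic>-th sample, using <italic>&#x03C4;</italic><sub><italic>&#x03C9;</italic></sub>&#x2009;=&#x2009;<italic>&#x03C9;</italic>&#x2009;&#x2212;&#x2009;1&#x2009;+&#x2009;<italic>y</italic> as defined above. The helper <monospace>time_sets</monospace> is hypothetical.</p>
<preformat preformat-type="code">
```python
def time_sets(w, k, x):
    """Return (T_w^C, T_w^P) as lists of 1-based day indices."""
    y = k - x
    tau = w - 1 + y  # tau_w: final day of the calibration period
    calibration = list(range(w, tau + 1))  # w, ..., tau_w
    prediction = list(range(tau + 1, k + w))  # tau_w + 1, ..., k - 1 + w
    return calibration, prediction


t_c, t_p = time_sets(w=10, k=35, x=14)
print(t_c[0], t_c[-1], t_p[0], t_p[-1])  # 10 30 31 44
```
</preformat>
<p>For every <italic>&#x03C9;</italic>, the calibration set has <italic>y</italic> days and the prediction set has <italic>x</italic> days, and together they cover exactly <italic>T</italic><sub><italic>&#x03C9;</italic></sub>.</p>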
<p>Moreover, for each <inline-formula>
<mml:math id="M24">
<mml:mi>&#x03C9;</mml:mi>
</mml:math>
</inline-formula>-th sample data, <inline-formula>
<mml:math id="M25">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M26">
<mml:msubsup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> denote the mean and standard deviation of <inline-formula>
<mml:math id="M27">
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> for the calibration period, respectively. Likewise, <inline-formula>
<mml:math id="M28">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mspace width="0.25em"/>
<mml:mtext>and</mml:mtext>
<mml:mspace width="0.25em"/>
<mml:msubsup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> denote the mean and standard deviation of <inline-formula>
<mml:math id="M29">
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> for the prediction period, respectively.</p>
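<p>The following Python sketch computes these summary statistics for a single sample. The text does not state whether the standard deviation uses the sample (<italic>n</italic>&#x2009;&#x2212;&#x2009;1) or population (<italic>n</italic>) denominator. The sketch assumes the sample version; <monospace>statistics.pstdev</monospace> gives the population version. The helper <monospace>period_stats</monospace> is hypothetical.</p>
<preformat preformat-type="code">
```python
from statistics import mean, stdev


def period_stats(sample, x):
    """Return ((mu_C, sigma_C), (mu_P, sigma_P)) for one k-day sample.

    Assumption: sigma is the sample standard deviation (n - 1 denominator);
    swap in statistics.pstdev for the population version.
    """
    y = len(sample) - x
    calibration, prediction = sample[:y], sample[y:]
    return (mean(calibration), stdev(calibration)), (mean(prediction), stdev(prediction))


(mu_c, sigma_c), (mu_p, sigma_p) = period_stats([1, 2, 3, 4, 5, 6], x=3)
print(mu_c, sigma_c, mu_p, sigma_p)  # means 2 and 5, standard deviations 1.0 and 1.0
```
</preformat>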
<p>In the present study, we set the calibration period to 21&#x2009;days and the prediction period to 14&#x2009;days (i.e., <inline-formula>
<mml:math id="M30">
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>35</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>14</mml:mn>
</mml:math>
</inline-formula>). We shifted the time window by 1&#x2009;day at a time from February 18, 2020, to October 31, 2022. This yielded 953 samples (i.e., <inline-formula>
<mml:math id="M31">
<mml:mi>n</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>953</mml:mn>
</mml:math>
</inline-formula>), comprising 667 training samples and 286 test samples (a train-to-test ratio of 7:3). The set of all samples is denoted <inline-formula>
<mml:math id="M32">
<mml:mi>S</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}" separators=",,,">
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2026;</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>953</mml:mn>
</mml:msub>
</mml:mfenced>
</mml:math>
</inline-formula>. We also considered other settings, with calibration periods ranging from 14 to 28&#x2009;days and prediction periods ranging from 7 to 21&#x2009;days. In all settings, the calibration period was assumed to be longer than the prediction period.</p>
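<p>As a consistency check on these counts, the following Python sketch derives <italic>n</italic> from the study period. February 18, 2020, to October 31, 2022, spans 987 days inclusive, so a 35-day window shifted one day at a time yields 987&#x2009;&#x2212;&#x2009;35&#x2009;+&#x2009;1&#x2009;=&#x2009;953 samples. The helper <monospace>window_count</monospace> is hypothetical, and rounding the 70% share to the nearest integer is an assumption. The sketch computes only the split sizes and does not assume whether the split is chronological or random.</p>
<preformat preformat-type="code">
```python
from datetime import date


def window_count(start, end, k):
    """Number of k-day windows, shifted one day at a time, within [start, end]."""
    n_days = (end - start).days + 1  # count both endpoints
    return n_days - k + 1


n = window_count(date(2020, 2, 18), date(2022, 10, 31), k=35)
n_train = round(0.7 * n)  # 7:3 split; the rounding rule is an assumption
n_test = n - n_train
print(n, n_train, n_test)  # 953 667 286
```
</preformat>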
</sec>
<sec id="sec13">
<label>2.4.2</label>
<title>Normalization and regression analysis</title>
<p>We normalized the sample data from <inline-formula>
<mml:math id="M33">
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:math>
</inline-formula> to <inline-formula>
<mml:math id="M34">
<mml:msub>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:math>
</inline-formula> using min-max normalization. We then fitted a linear regression model to the normalized sample data for the calibration period (<inline-formula>
<mml:math id="M35">
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>) and prediction period (<inline-formula>
<mml:math id="M36">
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>), where <inline-formula>
<mml:math id="M37">
<mml:msub>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi>C</mml:mi>
</mml:msubsup>
<mml:mo>&#x222A;</mml:mo>
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi>P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>. Here, <inline-formula>
<mml:math id="M38">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M39">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> denote the slopes obtained from the linear regression model for the samples <inline-formula>
<mml:math id="M40">
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M41">
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>, respectively; we refer to these slopes as the increment rates. <inline-formula>
<mml:math id="M42">
<mml:msup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:mfenced>
</mml:math>
</inline-formula> denotes the vector of the mean number of COVID-19 cases during the calibration period. <inline-formula>
<mml:math id="M43">
<mml:msup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:mfenced>
</mml:math>
</inline-formula> denotes the vector of the average number of COVID-19 cases during the prediction period. The regression model for each sample <inline-formula>
<mml:math id="M44">
<mml:mi>&#x03C9;</mml:mi>
</mml:math>
</inline-formula> is given by<disp-formula id="E1">
<mml:math id="M45">
<mml:mo stretchy="true">{</mml:mo>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B1;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msubsup>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msubsup>
<mml:mover accent="true">
<mml:mi>s</mml:mi>
<mml:mo stretchy="true">^</mml:mo>
</mml:mover>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B1;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msubsup>
<mml:mi>T</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>where <inline-formula>
<mml:math id="M46">
<mml:msubsup>
<mml:mi>&#x03B1;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B1;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> are the intercepts of the linear regression models for the calibration and prediction periods, respectively. <inline-formula>
<mml:math id="M47">
<mml:msup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:msubsup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:mfenced>
</mml:math>
</inline-formula> denotes the vector of the standard deviation of the COVID-19 cases for the calibration period. <inline-formula>
<mml:math id="M48">
<mml:msup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}">
<mml:msubsup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:mfenced>
</mml:math>
</inline-formula> denotes the vector of the standard deviation of COVID-19 cases for the prediction period. &#x201C;<italic>Week</italic>&#x201D; represents the day of the week corresponding to the final time of the calibration period (<inline-formula>
<mml:math id="M49">
<mml:msub>
<mml:mi>&#x03C4;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:math>
</inline-formula>). &#x201C;<italic>Delta</italic>&#x201D; and &#x201C;<italic>Omicron</italic>&#x201D; denote the numbers of Delta and Omicron variant cases, respectively, and &#x201C;<italic>Policy</italic>&#x201D; denotes the level of NPIs implemented in Korea.</p>
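<p>The normalization and regression steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: we assume that min-max normalization is applied over the whole sample window (calibration plus prediction), that the means and standard deviations are computed on the raw daily counts with the population standard deviation, and that the time index restarts at 0 in each period (this affects only the intercepts, not the slopes). The helper names <italic>min_max_normalize</italic>, <italic>ols_fit</italic>, and <italic>sample_features</italic> are hypothetical.</p>
<preformat preformat-type="python">
```python
import statistics


def min_max_normalize(values):
    """Scale a sequence of daily case counts to [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant series: map every value to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def ols_fit(y):
    """Least-squares intercept (alpha) and slope (beta) of y against t = 0, 1, 2, ..."""
    t = list(range(len(y)))
    t_mean, y_mean = statistics.fmean(t), statistics.fmean(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    beta = sxy / sxx
    alpha = y_mean - beta * t_mean
    return alpha, beta


def sample_features(cases, calib_days):
    """Summarize one sample: the first `calib_days` days form the calibration
    period (C) and the remaining days form the prediction period (P)."""
    scaled = min_max_normalize(cases)  # assumption: normalize over the whole window
    feats = {}
    for tag, raw, norm in (
        ("C", cases[:calib_days], scaled[:calib_days]),
        ("P", cases[calib_days:], scaled[calib_days:]),
    ):
        alpha, beta = ols_fit(norm)                     # regression on normalized data
        feats["mu_" + tag] = statistics.fmean(raw)      # mean daily cases
        feats["sigma_" + tag] = statistics.pstdev(raw)  # assumption: population SD
        feats["alpha_" + tag] = alpha                   # intercept
        feats["beta_" + tag] = beta                     # slope, i.e., increment rate
    return feats
```
</preformat>
<p>For a linearly rising 21-day window such as <italic>sample_features([2 * d for d in range(21)], calib_days=14)</italic>, both periods have the same normalized slope (0.05 per day), so the increment rate is unchanged between the calibration and prediction periods even though the mean rises from 13 to 34 cases.</p>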
</sec>
</sec>
<sec id="sec14">
<label>2.5</label>
<title>Development of risk index and labeling for transmission trend</title>
<p>In the present study, we developed a method for the early detection of potential infectious disease outbreaks by estimating their starting points. Previous studies have focused on detecting outbreaks early using statistical or machine learning techniques applied to data such as the number of COVID-19 cases, NPIs, and variant viruses (<xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15 ref16">11&#x2013;16</xref>). As an alternative approach, we aimed to quantify the risk potential, reflecting both increasing trends and changes in the transmission trend from the calibration period to the prediction period.</p>
<sec id="sec15">
<label>2.5.1</label>
<title>Definition of risk index</title>
<p>We proposed the risk index, a quantitative representation of these changes that can be used to classify the risk of potential outbreaks (<xref ref-type="fig" rid="fig2">Figure 2B</xref>). For each <inline-formula>
<mml:math id="M50">
<mml:mi>&#x03C9;</mml:mi>
</mml:math>
</inline-formula>-th sample, we defined two functions, <inline-formula>
<mml:math id="M51">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M52">
<mml:mi>g</mml:mi>
</mml:math>
</inline-formula>, that capture changes in the transmission trend; they are computed from the mean number of COVID-19 cases (<inline-formula>
<mml:math id="M53">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math id="M54">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>) and the increment rate (<inline-formula>
<mml:math id="M55">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math id="M56">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>) for the calibration and prediction periods, respectively. <inline-formula>
<mml:math id="M57">
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mspace width="0.25em"/>
<mml:mi mathvariant="normal">and</mml:mi>
<mml:mspace width="0.25em"/>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:math>
</inline-formula> represent the positive scaling parameters of the functions <inline-formula>
<mml:math id="M58">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M59">
<mml:mi>g</mml:mi>
</mml:math>
</inline-formula>. The risk index <inline-formula>
<mml:math id="M60">
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:msub>
<mml:mi>&#x03C4;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> is expressed as follows:<disp-formula id="E2">
<label>(1)</label>
<mml:math id="M61">
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:msub>
<mml:mi>&#x03C4;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>&#x03C9;</mml:mi>
</mml:mfenced>
<mml:mspace width="0.25em"/>
<mml:mi>g</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>&#x03C9;</mml:mi>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mo>sinh</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:mfrac>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
<mml:mtext>.</mml:mtext>
</mml:math>
</disp-formula><list list-type="roman-lower">
<list-item>
<p><bold>Change of the mean of COVID-19 cases</bold>: The function <inline-formula>
<mml:math id="M62">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> is a rate of change describing how much the number of COVID-19 cases has increased during the prediction period relative to the calibration period. The function <inline-formula>
<mml:math id="M63">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> is the hyperbolic sine (sinh) of the difference between <inline-formula>
<mml:math id="M64">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M65">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> divided by <inline-formula>
<mml:math id="M66">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>. If <inline-formula>
<mml:math id="M67">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>&#x003E;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>, the function <inline-formula>
<mml:math id="M68">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> exhibits positive exponential growth. Otherwise, the function <inline-formula>
<mml:math id="M69">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> becomes zero or negative, reflecting a decline in cases.</p>
</list-item>
<list-item>
<p><bold>Change of the increment rate of COVID-19 cases</bold>: The function <inline-formula>
<mml:math id="M70">
<mml:mi>g</mml:mi>
</mml:math>
</inline-formula> represents the change in the increment rate of the transmission trend, that is, how much the regression slope in the prediction period (<inline-formula>
<mml:math id="M71">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>) has increased from the slope in the calibration period (<inline-formula>
<mml:math id="M72">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>). The function <inline-formula>
<mml:math id="M73">
<mml:mi>g</mml:mi>
</mml:math>
</inline-formula> is defined as an exponential function of the difference between <inline-formula>
<mml:math id="M74">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M75">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>. If <inline-formula>
<mml:math id="M76">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>&#x003E;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>, the function <inline-formula>
<mml:math id="M77">
<mml:mi>g</mml:mi>
</mml:math>
</inline-formula> grows exponentially, with <inline-formula>
<mml:math id="M78">
<mml:mi>g</mml:mi>
<mml:mo>&#x003E;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>. Otherwise, the function <inline-formula>
<mml:math id="M79">
<mml:mi>g</mml:mi>
</mml:math>
</inline-formula> decays exponentially, with <inline-formula>
<mml:math id="M80">
<mml:mn>0</mml:mn>
<mml:mo>&#x003C;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>.</p>
</list-item>
</list></p>
<p>We defined the risk index as the product of these two functions. For example, consider a sample with <inline-formula>
<mml:math id="M81">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>&#x003E;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M82">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>&#x003E;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>. Then, the function <inline-formula>
<mml:math id="M83">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> exhibits positive exponential growth. The function <inline-formula>
<mml:math id="M84">
<mml:mi>g</mml:mi>
</mml:math>
</inline-formula> amplifies the function <inline-formula>
<mml:math id="M85">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> because <inline-formula>
<mml:math id="M86">
<mml:mi>g</mml:mi>
<mml:mo>&#x003E;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>. In contrast, consider another sample with <inline-formula>
<mml:math id="M87">
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>&#x003E;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M88">
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msubsup>
<mml:mo>&#x003C;</mml:mo>
<mml:msubsup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msubsup>
</mml:math>
</inline-formula>. Then, the function <inline-formula>
<mml:math id="M89">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> exhibits positive exponential growth. The function <inline-formula>
<mml:math id="M90">
<mml:mi>g</mml:mi>
</mml:math>
</inline-formula> dampens the function <inline-formula>
<mml:math id="M91">
<mml:mi>f</mml:mi>
</mml:math>
</inline-formula> because <inline-formula>
<mml:math id="M92">
<mml:mn>0</mml:mn>
<mml:mo>&#x003C;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>.</p>
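<p>Eq. (1) translates directly into code. The sketch below is a minimal Python illustration, not the authors' implementation; the function name <italic>risk_index</italic> is hypothetical, and the default scaling parameters of 0.01 are the values reported in the Results for both scaling parameters. Because Eq. (1) divides by the calibration-period mean, the sketch assumes that mean is nonzero and raises an error otherwise.</p>
<preformat preformat-type="python">
```python
import math


def risk_index(mu_c, mu_p, beta_c, beta_p, c1=0.01, c2=0.01):
    """Risk index of Eq. (1): RI = f * g, with
    f = sinh(c1 * (mu_P - mu_C) / mu_C) and g = exp(c2 * (beta_P - beta_C))."""
    if mu_c == 0:
        raise ValueError("the calibration-period mean must be nonzero")
    f = math.sinh(c1 * (mu_p - mu_c) / mu_c)  # sign follows the change in mean cases
    g = math.exp(c2 * (beta_p - beta_c))      # always positive; above 1 iff the slope increased
    return f * g
```
</preformat>
<p>Because f carries the sign and g is strictly positive, RI is positive exactly when mean cases rise, and g only scales its magnitude. For two samples that both have mean cases rising from 100 to 150, the one whose slope steepens (g above 1) receives a larger RI than the one whose slope flattens (g below 1), mirroring the two examples above.</p>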
</sec>
<sec id="sec16">
<label>2.5.2</label>
<title>Labeling for transmission dynamics using risk index</title>
<p>We calculated the risk index for each sample (<inline-formula>
<mml:math id="M93">
<mml:mi>S</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfenced open="{" close="}" separators=",,,">
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2026;</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>953</mml:mn>
</mml:msub>
</mml:mfenced>
</mml:math>
</inline-formula>). We then uniformly divided the risk index values <inline-formula>
<mml:math id="M94">
<mml:msub>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:msub>
<mml:mi>&#x03C4;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mrow>
<mml:mi>&#x03C9;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mfenced open="{" close="}" separators=",,">
<mml:mn>1</mml:mn>
<mml:mo>&#x2026;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mfenced>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> into three groups and assigned the labels decrease (L0), maintain (L1), and increase (L2) of the transmission trend. Following a previous study (<xref ref-type="bibr" rid="ref28">28</xref>), we kept the number of samples in each class (label) similar.</p>
<p>For instance, samples in the group with the smallest risk index values, <inline-formula>
<mml:math id="M95">
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">I</mml:mi>
<mml:mfenced open="(" close=")">
<mml:msub>
<mml:mi>&#x03C4;</mml:mi>
<mml:mi>&#x03C9;</mml:mi>
</mml:msub>
</mml:mfenced>
</mml:math>
</inline-formula>, were labeled L0, which we interpreted as a decreasing transmission trend in the prediction period compared with the calibration period. <xref ref-type="supplementary-material" rid="SM1">Supplementary Figures S2A&#x2013;C</xref> show examples of sample data labeled L0, L1, and L2, respectively.</p>
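<p>One possible reading of this labeling step is sketched below in Python. It assumes that &#x201C;uniformly divided&#x201D; means ranking the samples by risk index and cutting the ranking into three groups of nearly equal size: the lowest third is labeled L0, the middle third L1, and the highest third L2. The function name <italic>label_by_tertile</italic> is hypothetical, ties are broken by sample order, and this is not the authors' code.</p>
<preformat preformat-type="python">
```python
def label_by_tertile(ri_values):
    """Label samples 0 (L0, decrease), 1 (L1, maintain), or 2 (L2, increase)
    by ranking their risk-index values and cutting the ranking into thirds."""
    n = len(ri_values)
    # Sample indices ordered from the lowest to the highest risk index.
    order = sorted(range(n), key=lambda i: ri_values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = 3 * rank // n  # ranks 0..n-1 map to groups 0, 1, or 2
    return labels
```
</preformat>
<p>Under this reading, the 953 samples in this study would be split into groups of 318, 318, and 317 samples.</p>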
</sec>
</sec>
<sec id="sec17">
<label>2.6</label>
<title>Machine learning approaches to estimate the transmission trend</title>
<p>We used eight features to estimate the transmission trends using ML techniques. <xref ref-type="table" rid="tab1">Table 1</xref> summarizes the features of the training and testing sample data.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption>
<p>Description of the features used to train the ML models on the sample data.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Features</th>
<th align="left" valign="top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">
<inline-formula>
<mml:math id="M96">
<mml:msup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>
</td>
<td align="left" valign="middle">Average number of COVID-19 cases for calibration period</td>
</tr>
<tr>
<td align="left" valign="middle">
<inline-formula>
<mml:math id="M97">
<mml:msup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>
</td>
<td align="left" valign="middle">Standard deviation of COVID-19 cases for calibration period</td>
</tr>
<tr>
<td align="left" valign="middle">
<inline-formula>
<mml:math id="M98">
<mml:msup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>
</td>
<td align="left" valign="middle">Slope obtained from the linear regression model of COVID-19 cases for calibration period</td>
</tr>
<tr>
<td align="left" valign="middle">
<inline-formula>
<mml:math id="M99">
<mml:mi mathvariant="normal">Week</mml:mi>
</mml:math>
</inline-formula>
</td>
<td align="left" valign="middle">Start day of the week for calibration period</td>
</tr>
<tr>
<td align="left" valign="middle">
<inline-formula>
<mml:math id="M100">
<mml:msup>
<mml:mi mathvariant="italic">Delta</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>
</td>
<td align="left" valign="middle">Average number of Delta variant cases for the calibration period</td>
</tr>
<tr>
<td align="left" valign="middle">
<inline-formula>
<mml:math id="M101">
<mml:msup>
<mml:mi mathvariant="italic">Omicron</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>
</td>
<td align="left" valign="middle">Average number of Omicron variant cases for the calibration period</td>
</tr>
<tr>
<td align="left" valign="middle">
<inline-formula>
<mml:math id="M102">
<mml:msup>
<mml:mi mathvariant="italic">Policy</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>
</td>
<td align="left" valign="middle">Average level of NPIs for calibration period</td>
</tr>
<tr>
<td align="left" valign="middle">
<inline-formula>
<mml:math id="M103">
<mml:msup>
<mml:mi mathvariant="italic">Policy</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>
</td>
<td align="left" valign="middle">Average level of NPIs for prediction period</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We applied three ML techniques: SVM, RF, and XGB (<xref ref-type="bibr" rid="ref29 ref30 ref31">29&#x2013;31</xref>). SVM is a supervised learning model used for classification. It uses support vectors to define decision boundaries and classifies new points according to the side of the decision boundary on which they fall.</p>
<p>SVM can be considered a perceptron-based model with an added constraint that selects the most stable (maximum-margin) decision boundary. RF is an ensemble learning method used for classification and regression. It trains multiple decision trees in parallel and outputs the majority-vote class (classification) or the average prediction (regression). Because of their randomness, the individual trees have slightly different characteristics. This decorrelates the predictions of the trees, thereby improving generalization performance. In addition, randomization makes the forest robust to noisy data. XGB (Extreme Gradient Boosting) is an ensemble model that applies boosting to a number of decision trees. XGB implements gradient boosting with parallel learning, and its regularization terms give it strong resistance to overfitting.</p>
<p>Grid search was used to determine the best-performing hyperparameters for the three models, with 10-fold cross-validation on the training data. For SVM, the selected regularization parameter, gamma, and kernel were 50, 0.3, and the radial basis function, respectively. For RF and XGB, the selected number of trees and maximum depth were 85 and 14, and 110 and 7, respectively. <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S4</xref> summarizes the parameter ranges used in the grid search. We split the data into training and test sets with the same ratio for label 0, label 1, and label 2. To evaluate the three models, we present confusion matrices and receiver operating characteristic (ROC) curves for the test data and compare their accuracy, <italic>F</italic>1-scores, and AUCs for L0, L1, and L2. We used Python 3.10 and scikit-learn 1.1.3. Specifically, we used the <italic>SVC</italic> and <italic>RandomForestClassifier</italic> classes of scikit-learn and the <italic>XGBClassifier</italic> class of the xgboost library, which follows the scikit-learn interface, to implement the three classification algorithms.</p>
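<p>The label-preserving division into training and test sets can be illustrated with a small standard-library sketch; it is not the authors' code. It assumes a 7:3 split applied separately within each label, with a fixed random seed, and <italic>stratified_split</italic> is a hypothetical helper. In scikit-learn, <italic>train_test_split</italic> with its <italic>stratify</italic> argument serves the same purpose.</p>
<preformat preformat-type="python">
```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into training and test sets so that each label
    (0 = L0, 1 = L1, 2 = L2) keeps the same training fraction."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train, test = [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))  # e.g., 7 of every 10 samples
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)
```
</preformat>
<p>Because the cut is rounded within each label, the overall split can differ from an exact 7:3 ratio by a sample or two.</p>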
</sec>
<sec id="sec18">
<label>2.7</label>
<title>Outbreak detection method</title>
<p>Determining the start time of a new outbreak is important for controlling the spread of COVID-19. <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S5</xref> lists the start times of the reported outbreaks in Korea, together with the important characteristics of each outbreak. In this study, we propose a new approach for detecting a new outbreak, which we call the &#x201C;estimated outbreak&#x201D; (<xref ref-type="fig" rid="fig2">Figure 2C</xref>). We then compared the reported outbreaks with the estimated outbreaks.</p>
<p>We determined estimated outbreaks using two approaches. First, using the risk index, we defined the start time of a new outbreak as the first day on which L2, as designated by the risk index (RI), was maintained for at least 14&#x2009;days. This start time of the early outbreak is denoted by ED from RI. Second, using the machine learning methods, we defined the start time of a new outbreak as the first day on which label 2, as estimated by an ML method, was maintained for at least 14&#x2009;days, denoted by ED from ML. There are three ED from ML methods: (i) ED from SVM, (ii) ED from RF, and (iii) ED from XGB. Here, 14&#x2009;days is the maintenance duration; we chose a 2-week period because COVID-19 prevention policies in the Republic of Korea are established after more than 2&#x2009;weeks. We also varied the maintenance duration from 7 to 28&#x2009;days.</p>
<p>We also analyzed the performance of the proposed methods around ED from RI. To do so, we compared the start times of the estimated outbreaks within a 4-week window extending from 2&#x2009;weeks before to 2&#x2009;weeks after ED from RI. We set both the warning period and the interval for comparing the performance of the ML methods to 4&#x2009;weeks.</p>
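<p>The outbreak-start rule and the comparison window can be sketched as follows, assuming one label per day in chronological order (0, 1, and 2 for L0, L1, and L2). This is an illustrative Python sketch, not the authors' implementation, and <italic>first_sustained_start</italic> and <italic>within_window</italic> are hypothetical names. The same function would give ED from RI when applied to the risk-index labels and ED from SVM, RF, or XGB when applied to that model's predicted labels; <italic>min_days</italic> is the maintenance duration, and <italic>half_width</italic> of 14 days encodes the 4-week window of 2 weeks on each side of ED from RI.</p>
<preformat preformat-type="python">
```python
def first_sustained_start(daily_labels, target=2, min_days=14):
    """Return the index of the first day that begins a run of at least
    `min_days` consecutive `target` labels, or None if there is no such run."""
    run_start, run_len = None, 0
    for day, label in enumerate(daily_labels):
        if label == target:
            if run_len == 0:
                run_start = day  # a new run of L2 labels begins here
            run_len += 1
            if run_len >= min_days:
                return run_start
        else:
            run_len = 0  # the run is broken; wait for the next L2 day
    return None


def within_window(ed_ml, ed_ri, half_width=14):
    """True if an ML-estimated start lies within +/- half_width days of ED from RI."""
    return ed_ml is not None and half_width >= abs(ed_ml - ed_ri)
```
</preformat>
<p>Varying <italic>min_days</italic> between 7 and 28 reproduces the sensitivity analysis on the maintenance duration: a shorter duration flags an outbreak earlier but is more easily triggered by brief runs of L2.</p>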
</sec>
<sec id="sec19">
<label>2.8</label>
<title>Data availability</title>
<p>We implemented the proposed method in Python 3.10; the code, along with the source data, is freely available on GitHub at <ext-link xlink:href="https://github.com/modeling-computation/covid-19_outbreak/" ext-link-type="uri">https://github.com/modeling-computation/covid-19_outbreak/</ext-link>.</p>
</sec>
</sec>
<sec sec-type="results" id="sec20">
<label>3</label>
<title>Results</title>
<sec id="sec21">
<label>3.1</label>
<title>Estimation of the transmission trend</title>
<p><xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S2</xref> shows examples of the sample data for each of the three labels. We calculated the correlation between the labels and the scaling parameters in Eq. (1). The labels assigned using the risk index accurately reflected the increasing, maintained, and decreasing trends (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S3</xref>). We set both scaling parameters to 0.01 because the correlation was high (0.6) when <inline-formula>
<mml:math id="M104">
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math id="M105">
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:math>
</inline-formula> were 0.01, as <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S3A</xref> shows. <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S3B</xref> displays the correlations between the labels and all eight features described in <xref ref-type="table" rid="tab1">Table 1</xref>. The slope (<inline-formula>
<mml:math id="M106">
<mml:msup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>) and standard deviation of the COVID-19 cases (<inline-formula>
<mml:math id="M107">
<mml:msup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>) for the calibration period were strongly correlated with the labels. <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S3C</xref> shows the range of the risk index for each label as a box plot, which clearly indicates that high risk index values correspond to label 2.</p>
<p><xref ref-type="fig" rid="fig3">Figure 3</xref> evaluates the performance of the three ML methods: SVM, RF, and XGB. <xref ref-type="fig" rid="fig3">Figures 3A</xref>&#x2013;<xref ref-type="fig" rid="fig3">C</xref> present the confusion matrix for each method. The most critical errors are predicting L2 when the actual label is L0 and predicting L0 when the actual label is L2. RF and XGB made no such errors, whereas SVM made two. <xref ref-type="fig" rid="fig3">Figures 3D</xref>&#x2013;<xref ref-type="fig" rid="fig3">F</xref> depict the ROC curve for each class. The area under the ROC curve (AUC), which summarizes classification performance, was close to 1 for all three ML methods. <xref ref-type="table" rid="tab2">Table 2</xref> summarizes the accuracy of the ML methods. The accuracies of SVM, RF, and XGB all exceeded 0.94, at 0.9441, 0.9580, and 0.9545, respectively. The <italic>F</italic>1-scores for L0 (decrease) and L2 (increase) were particularly high, at 0.95 or above.</p>
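<p>The accuracy and per-label <italic>F</italic>1-scores in <xref ref-type="table" rid="tab2">Table 2</xref> follow the standard definitions. The sketch below computes them from true and predicted labels using only the Python standard library; it is illustrative rather than the authors' code, and <italic>accuracy_and_f1</italic> is a hypothetical name. In scikit-learn, <italic>accuracy_score</italic> and <italic>f1_score</italic> (with <italic>average=None</italic>) compute the same quantities.</p>
<preformat preformat-type="python">
```python
from collections import Counter


def accuracy_and_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Overall accuracy and one-vs-rest F1-score for each label (L0, L1, L2)."""
    counts = Counter(zip(y_true, y_pred))  # confusion-matrix cells keyed by (true, predicted)
    accuracy = sum(counts[(k, k)] for k in labels) / len(y_true)
    f1 = {}
    for k in labels:
        tp = counts[(k, k)]
        fp = sum(counts[(j, k)] for j in labels if j != k)  # predicted k, actually j
        fn = sum(counts[(k, j)] for j in labels if j != k)  # actually k, predicted j
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        f1[k] = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```
</preformat>
<p>For example, with true labels [0, 0, 1, 1, 2, 2] and predictions [0, 1, 1, 1, 2, 0], the accuracy is 4/6 and the <italic>F</italic>1-scores are 0.5, 0.8, and 2/3 for L0, L1, and L2, respectively.</p>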
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Confusion matrices and ROC curves for the test data labeled L0, L1, and L2. <bold>(A&#x2013;C)</bold> Confusion matrices for SVM, RF, and XGB, respectively. <bold>(D&#x2013;F)</bold> ROC curves for SVM, RF, and XGB, respectively.</p>
</caption>
<graphic xlink:href="fpubh-11-1252357-g003.tif"/>
</fig>
<table-wrap position="float" id="tab2">
<label>Table 2</label>
<caption>
<p>Accuracy and <italic>F</italic>1-scores of the three ML methods on the test data.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top" rowspan="2">Estimator</th>
<th align="center" valign="top" rowspan="2">Accuracy</th>
<th align="center" valign="top" colspan="3"><italic>F</italic>1-score</th>
</tr>
<tr>
<th align="center" valign="top">Label 0 (L0: decrease)</th>
<th align="center" valign="top">Label 1 (L1: maintain)</th>
<th align="center" valign="top">Label 2 (L2: increase)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">SVM</td>
<td align="char" valign="top" char=".">0.9441</td>
<td align="char" valign="middle" char=".">0.9570</td>
<td align="char" valign="middle" char=".">0.9231</td>
<td align="char" valign="middle" char=".">0.9529</td>
</tr>
<tr>
<td align="left" valign="middle">RF</td>
<td align="char" valign="top" char=".">0.9580</td>
<td align="char" valign="middle" char=".">0.9688</td>
<td align="char" valign="middle" char=".">0.9375</td>
<td align="char" valign="middle" char=".">0.9681</td>
</tr>
<tr>
<td align="left" valign="middle">XGB</td>
<td align="char" valign="top" char=".">0.9545</td>
<td align="char" valign="middle" char=".">0.9630</td>
<td align="char" valign="middle" char=".">0.9326</td>
<td align="char" valign="middle" char=".">0.9684</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="fig" rid="fig4">Figure 4</xref> shows the feature importance in RF and XGB. The features of standard deviation (<inline-formula>
<mml:math id="M108">
<mml:msup>
<mml:mi>&#x03C3;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>), the increment rate (<inline-formula>
<mml:math id="M109">
<mml:msup>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>), and mean (<inline-formula>
<mml:math id="M110">
<mml:msup>
<mml:mi>&#x03BC;</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>) of the COVID-19 cases for the calibration period were important for both methods. The control intervention (<inline-formula>
<mml:math id="M111">
<mml:msup>
<mml:mi mathvariant="italic">Policy</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>) also ranked high in importance in RF, and the delta variant (<inline-formula>
<mml:math id="M112">
<mml:msup>
<mml:mi mathvariant="italic">Delta</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:msup>
</mml:math>
</inline-formula>) was an important feature in XGB.</p>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Feature importance among all eight features. <bold>(A)</bold> Feature importance using Random Forest. <bold>(B)</bold> Feature importance using XGBoost.</p>
</caption>
<graphic xlink:href="fpubh-11-1252357-g004.tif"/>
</fig>
<p>We conducted a sensitivity analysis by varying the calibration period from 14 to 28&#x2009;days and the prediction period from 7 to 21&#x2009;days (<xref ref-type="supplementary-material" rid="SM1">Supplementary Table S6</xref>). The highest accuracy was achieved with a calibration period of 21&#x2009;days and a prediction period of 14&#x2009;days.</p>
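<p>The sensitivity analysis amounts to a small grid search over window lengths. The sketch below enumerates the candidate (calibration, prediction) pairs in which the calibration period is longer than the prediction period, as stated in the Discussion, and picks the pair with the highest accuracy. The 7-day step between candidate lengths is an assumption (only the ranges are reported), and the accuracies must come from retraining and scoring a classifier for each pair, which is not shown here.</p>

```python
from itertools import product

# Assumption: both periods are varied in 7-day steps; the text reports only the
# ranges (calibration 14-28 days, prediction 7-21 days), not the step size.
CALIBRATION_DAYS = [14, 21, 28]
PREDICTION_DAYS = [7, 14, 21]


def candidate_windows(calibration_days, prediction_days):
    """All (calibration, prediction) pairs whose calibration period is longer."""
    return [(c, p) for c, p in product(calibration_days, prediction_days) if c > p]


def best_window(accuracy_by_window):
    """Return the (calibration, prediction) pair with the highest test accuracy.

    accuracy_by_window: dict mapping (calibration, prediction) to an accuracy
    obtained by retraining and scoring a classifier with those windows.
    """
    return max(accuracy_by_window, key=accuracy_by_window.get)
```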
</sec>
<sec id="sec22">
<label>3.2</label>
<title>Estimation of the start time for outbreaks</title>
<p>Korea experienced several outbreaks between March 2020 and October 2022. <xref ref-type="fig" rid="fig5">Figure 5A</xref> shows the number of COVID-19 cases from 9 June 2021 to 7 July 2021 for one estimated outbreak; the black dashed line marks the reported outbreak. In <xref ref-type="fig" rid="fig5">Figure 5B</xref>, the asterisk (&#x2605;) marks the ED from RI, and the shaded areas indicate the labels L0 (green), L1 (yellow), and L2 (red) obtained from the risk index. We identified the start time of a new outbreak when the label remained at L2 for 2&#x2009;weeks, the duration of maintenance. By this criterion, the ED from RI for this outbreak was 23 June 2021. <xref ref-type="fig" rid="fig5">Figure 5C</xref> compares the ED from RI with the ED from ML: the ED from RF and the ED from XGB coincided with the ED from RI, whereas the ED from SVM was 1&#x2009;day later (<xref ref-type="table" rid="tab3">Table 3</xref>).</p>
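<p>The maintenance rule can be sketched as a scan for runs of consecutive L2 labels. The minimal sketch below assumes one label per day and assumes that the estimated start date is the first day of an L2 run lasting at least <monospace>min_days</monospace> days; the text does not state exactly which day of the run is reported, and the function name is hypothetical. Passing a different <monospace>min_days</monospace> reproduces the kind of duration sensitivity examined in Supplementary Figure S4.</p>

```python
from datetime import date, timedelta


def detect_outbreak_starts(labels, start_date: date, min_days: int = 14, outbreak_label: int = 2):
    """Return the first day of every run of outbreak_label lasting at least min_days.

    labels: one label per day (0 = decrease, 1 = maintain, 2 = increase),
    beginning at start_date. Assumption: the outbreak start is dated to the
    first day of the qualifying run.
    """
    starts = []
    run_start = 0
    run_length = 0
    for offset, label in enumerate(labels):
        if label == outbreak_label:
            if run_length == 0:
                run_start = offset  # a new L2 run begins on this day
            run_length += 1
            if run_length == min_days:  # the run has just reached the required length
                starts.append(start_date + timedelta(days=run_start))
        else:
            run_length = 0  # any other label ends the current run
    return starts
```

<p>With a longer <monospace>min_days</monospace>, shorter L2 runs no longer qualify, which is one way a 28-day duration can miss the start of some outbreaks.</p>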
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption>
<p>Estimation of the start time of COVID-19 outbreaks. <bold>(A)</bold> The bars show the COVID-19 cases from 9 June 2021 to 7 July 2021. The black dashed line marks the reported outbreak. <bold>(B)</bold> The labels are obtained from the risk index. The blue asterisk (&#x2605;) represents the ED from RI. <bold>(C)</bold> Comparison between the ED from RI and the ED from ML during the warning period around the ED from RI. The black solid line shows the number of COVID-19 cases (left <italic>y</italic>-axis). The blue dashed line shows the calculated risk index (RI) (right <italic>y</italic>-axis). The results of ED from ML are marked as SVM (&#x25CF;), RF (&#x25BC;), and XGB (+). The shaded areas indicate the labels as L0 (green), L1 (yellow), and L2 (red) according to the risk index.</p>
</caption>
<graphic xlink:href="fpubh-11-1252357-g005.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig6">Figure 6</xref> summarizes all estimated outbreaks. <xref ref-type="fig" rid="fig6">Figure 6A</xref> displays the number of COVID-19 cases together with the five reported outbreaks. Based on the ED from RI, we obtained seven estimated outbreaks, numbered (1)&#x2013;(7), in <xref ref-type="fig" rid="fig6">Figure 6B</xref>, where the black dashed lines indicate the reported outbreaks. Outbreaks (2), (3), (4), (6), and (7) correspond to the five reported outbreaks, whereas (1) and (5) have no reported counterpart. For the five matched outbreaks, the ED from RI coincided with or preceded the reported start time, by 0 to 19&#x2009;days (<xref ref-type="table" rid="tab3">Table 3</xref>).</p>
<fig position="float" id="fig6">
<label>Figure 6</label>
<caption>
<p>Comparison of estimated outbreaks. <bold>(A)</bold> The epidemic curve is shown from 18 February 2020 to 31 October 2022. The black dashed lines mark five reported outbreaks, described in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S5</xref>. <bold>(B)</bold> The labels are obtained from the risk index (black solid line). The blue asterisk (&#x2605;) represents the ED from RI. The magenta shaded region indicates the warning period around the ED from RI. <bold>(C)</bold> Comparison between the ED from RI and the ED from ML during the warning period around the ED from RI for estimated outbreaks (1)&#x2013;(7). The black solid line shows the number of COVID-19 cases (left <italic>y</italic>-axis). The blue dashed line shows the calculated RI (right <italic>y</italic>-axis). The ED from ML is marked as SVM (&#x25CF;), RF (&#x25BC;), and XGB (+). The shaded areas indicate the labels as L0 (green), L1 (yellow), and L2 (red) according to the risk index.</p>
</caption>
<graphic xlink:href="fpubh-11-1252357-g006.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig6">Figure 6C</xref> shows the detailed results for each outbreak using the ML methods, together with the COVID-19 cases (black solid line) and the risk index (blue dashed line). The ED from RI and the ED from ML gave the same start dates for outbreaks (2), (3), (6), and (7). For outbreaks (1), (4), and (5), the ED from at least one ML method differed from the ED from RI by 1&#x2009;day. Thus, the two approaches predicted nearly identical start dates.</p>
<p><xref ref-type="table" rid="tab3">Table 3</xref> compares the start times of the reported and estimated outbreaks and gives the accuracy of each ML method for outbreaks (1)&#x2013;(7). Accuracy was evaluated during the warning period, which spans from 2&#x2009;weeks before to 2&#x2009;weeks after the ED from RI. The overall accuracy was high, ranging from 83.3% to 100%. Over this 4-week warning period, RF and XGB both achieved 100% accuracy for all outbreaks except (1) and (5), whereas SVM reached 100% only for (5) and (7). This suggests that RF and XGB captured rapid increases in the transmission trend more reliably than SVM.</p>
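<p>One plausible reading of the per-outbreak accuracy in <xref ref-type="table" rid="tab3">Table 3</xref> is the share of test-set days inside the warning period on which the ML label agrees with the label derived from the risk index. The sketch below implements that reading; the interpretation itself, the dictionary-based inputs, and the function name are assumptions rather than the authors' published procedure.</p>

```python
from datetime import date, timedelta


def warning_period_accuracy(ri_labels, ml_labels, test_days, ed: date, half_width_days: int = 14) -> float:
    """Share of test-set days in the warning period where the ML label equals the RI label.

    Assumed interpretation of the Table 3 accuracy, not the authors' code.
    ri_labels, ml_labels: dicts mapping a date to a label (0, 1 or 2).
    test_days: set of dates assigned to the test split.
    ed: the ED from RI; the warning period runs from ed - half_width_days to
        ed + half_width_days inclusive (4 weeks in total by default).
    """
    window = {ed + timedelta(days=k) for k in range(-half_width_days, half_width_days + 1)}
    days = sorted(window.intersection(test_days))
    if not days:
        raise ValueError("no test-set days fall inside the warning period")
    matches = sum(1 for d in days if ml_labels[d] == ri_labels[d])
    return matches / len(days)
```

<p>Because only test-set days inside the window are scored, each outbreak is evaluated on a small number of days, so a single disagreement can move the accuracy by several percentage points.</p>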
<table-wrap position="float" id="tab3">
<label>Table 3</label>
<caption>
<p>Start times of the reported and estimated outbreaks (ED from RI and ED from ML), with the accuracy of each ML method on the test data during the warning period.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th rowspan="2"/>
<th align="center" valign="top" rowspan="2">Reported outbreak<xref ref-type="table-fn" rid="tfn1">
<sup>a</sup>
</xref></th>
<th align="center" valign="top" rowspan="2">ED from RI</th>
<th align="center" valign="top" colspan="6">ED from ML</th>
</tr>
<tr>
<th align="center" valign="top" colspan="2">ED from SVM</th>
<th align="center" valign="top" colspan="2">ED from RF</th>
<th align="center" valign="top" colspan="2">ED from XGB</th>
</tr>
<tr>
<th align="left" valign="middle">Estimated outbreak</th>
<th align="center" valign="middle">Date</th>
<th align="center" valign="middle">Date</th>
<th align="center" valign="middle">Date</th>
<th align="center" valign="middle">Accuracy</th>
<th align="center" valign="middle">Date</th>
<th align="center" valign="middle">Accuracy</th>
<th align="center" valign="middle">Date</th>
<th align="center" valign="middle">Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">(1)</td>
<td align="center" valign="middle">&#x2014;</td>
<td align="center" valign="middle">2020-04-30</td>
<td align="center" valign="middle">2020-04-29</td>
<td align="center" valign="middle">0.923</td>
<td align="center" valign="middle">2020-04-29</td>
<td align="center" valign="middle">0.923</td>
<td align="center" valign="middle">2020-04-29</td>
<td align="char" valign="middle" char=".">0.923</td>
</tr>
<tr>
<td align="left" valign="middle">(2)</td>
<td align="center" valign="middle">2020-08-12</td>
<td align="center" valign="middle">2020-07-31</td>
<td align="center" valign="middle">2020-07-31</td>
<td align="center" valign="middle">0.857</td>
<td align="center" valign="middle">2020-07-31</td>
<td align="center" valign="middle">1.000</td>
<td align="center" valign="middle">2020-07-31</td>
<td align="char" valign="middle" char=".">1.000</td>
</tr>
<tr>
<td align="left" valign="middle">(3)</td>
<td align="center" valign="middle">2020-11-13</td>
<td align="center" valign="middle">2020-10-25</td>
<td align="center" valign="middle">2020-10-25</td>
<td align="center" valign="middle">0.857</td>
<td align="center" valign="middle">2020-10-25</td>
<td align="center" valign="middle">1.000</td>
<td align="center" valign="middle">2020-10-25</td>
<td align="char" valign="middle" char=".">1.000</td>
</tr>
<tr>
<td align="left" valign="middle">(4)</td>
<td align="center" valign="middle">2021-06-23</td>
<td align="center" valign="middle">2021-06-23</td>
<td align="center" valign="middle">2021-06-24</td>
<td align="center" valign="middle">0.889</td>
<td align="center" valign="middle">2021-06-23</td>
<td align="center" valign="middle">1.000</td>
<td align="center" valign="middle">2021-06-23</td>
<td align="char" valign="middle" char=".">1.000</td>
</tr>
<tr>
<td align="left" valign="middle">(5)</td>
<td align="center" valign="middle">&#x2014;</td>
<td align="center" valign="middle">2021-10-27</td>
<td align="center" valign="middle">2021-10-27</td>
<td align="center" valign="middle">1.000</td>
<td align="center" valign="middle">2021-10-28</td>
<td align="center" valign="middle">0.833</td>
<td align="center" valign="middle">2021-10-28</td>
<td align="char" valign="middle" char=".">0.833</td>
</tr>
<tr>
<td align="left" valign="middle">(6)</td>
<td align="center" valign="middle">2022&#x2013;01&#x2013;30</td>
<td align="center" valign="middle">2022-01-12</td>
<td align="center" valign="middle">2022-01-12</td>
<td align="center" valign="middle">0.857</td>
<td align="center" valign="middle">2022-01-12</td>
<td align="center" valign="middle">1.000</td>
<td align="center" valign="middle">2022-01-12</td>
<td align="char" valign="middle" char=".">1.000</td>
</tr>
<tr>
<td align="left" valign="middle">(7)</td>
<td align="center" valign="middle">2022-07-01</td>
<td align="center" valign="middle">2022-06-24</td>
<td align="center" valign="middle">2022-06-24</td>
<td align="center" valign="middle">1.000</td>
<td align="center" valign="middle">2022-06-24</td>
<td align="center" valign="middle">1.000</td>
<td align="center" valign="middle">2022-06-24</td>
<td align="char" valign="middle" char=".">1.000</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
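<p>Seven outbreaks, denoted by (1)&#x2013;(7), were estimated from the ED from RI. The ED from RI and ED from ML columns give the estimated start time of each outbreak; accuracy is evaluated on the test data during the 4-week warning period around the ED from RI. A dash indicates that no reported outbreak corresponds to the estimated outbreak.</p>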
<fn id="tfn1"><label>a</label><p>Reported outbreak represents the start time of the outbreaks, summarized in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S5</xref>.</p></fn>
</table-wrap-foot>
</table-wrap>
<p><xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S4</xref> compares the ED from RI with the ED from ML for different durations of maintenance. Changing the duration to 7 or 21&#x2009;days made no substantial difference in the results. However, with a duration of 28&#x2009;days, the start times of a few outbreaks were no longer detected.</p>
<p>So far, we have used training and test datasets obtained with a random 7:3 split. Here, we conduct a simulation to assess whether our approach can predict future transmission trends. We divide the data into training data from February 2020 to April 2022, by which time the omicron variant had become prominent, and test data from May to October 2022. The test accuracy remains high, at 0.8647 for RF and 0.8529 for XGB, although these values are roughly 9 to 10 percentage points lower than those obtained with the randomly split data (0.9580 and 0.9545, respectively; <xref ref-type="table" rid="tab2">Table 2</xref>). Because the start time of outbreak (7) falls within the test period, we then examine whether our approach can detect it.</p>
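<p>The difference between the two evaluation schemes can be sketched as two splitting functions. The minimal sketch below assumes each sample is a (date, features, label) tuple, which is a hypothetical layout; the cutoff of 1 May 2022 follows the training and test periods stated above, and the 7:3 ratio follows the random split used earlier.</p>

```python
import random
from datetime import date

# Cutoff from the text: training data end in April 2022, test data start in May 2022.
CUTOFF = date(2022, 5, 1)


def chronological_split(samples, cutoff=CUTOFF):
    """Split (date, features, label) samples by time.

    Samples dated before the cutoff go to training and the rest to testing, so
    every test day comes after all training days.
    """
    train = [s for s in samples if cutoff > s[0]]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test


def random_split(samples, train_fraction=0.7, seed=0):
    """Shuffle and split the samples (7:3 by default), ignoring time order."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

<p>The chronological split is the stricter test of future prediction: with a random split, neighboring days from the same epidemic wave can fall on both sides, whereas here no test day precedes any training day.</p>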
<p><xref ref-type="fig" rid="fig7">Figure 7</xref> shows the result of the estimation using the training data (February 2020&#x2013;April 2022) and the test data (May 2022&#x2013;October 2022). The ED from RI placed the start time of outbreak (7) on 24 June 2022. The ED from SVM was 4&#x2009;days later and the ED from XGB was 2&#x2009;days earlier, whereas the ED from RF matched the ED from RI exactly. This result suggests that our approach can detect the start of an outbreak in data that were not used for training.</p>
<fig position="float" id="fig7">
<label>Figure 7</label>
<caption>
<p>Estimation of the outbreak using the training data (February 2020&#x2013;April 2022) and the test data (May 2022&#x2013;October 2022). Comparison between the ED from RI and the ED from ML during the warning period around the ED from RI. The black solid line shows the number of COVID-19 cases (left <italic>y</italic>-axis). The blue dashed line shows the calculated RI (right <italic>y</italic>-axis). The results of ED from ML are marked as SVM (&#x25CF;), RF (&#x25BC;), and XGB (+). The shaded areas indicate the labels as L0 (green), L1 (yellow), and L2 (red) according to the risk index.</p>
</caption>
<graphic xlink:href="fpubh-11-1252357-g007.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="sec23">
<label>4</label>
<title>Discussion</title>
<p>In the present study, we proposed a machine learning-based method to predict the transmission trend of COVID-19 and to detect the start time of new outbreaks by analyzing epidemiological data in the Republic of Korea. First, we evaluated the performance of the ML methods SVM, RF, and XGB in estimating the transmission trend. We developed a risk index to measure changes in the transmission trend, which we categorized into three groups: decrease (L0), maintain (L1), and increase (L2). The methods classified the transmission trend with high accuracy (over 94%): SVM, RF, and XGB achieved accuracies of 0.9441, 0.9580, and 0.9545, respectively, as shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> and <xref ref-type="table" rid="tab2">Table 2</xref>.</p>
<p>Second, we estimated new outbreaks in Korea from March 2020 to October 2022. We proposed a new method that identifies the start time of a new outbreak when label 2 (L2) is sustained for at least 14&#x2009;days, that is, with the duration of maintenance set to 14&#x2009;days. Using this criterion, we estimated outbreaks with two approaches: (i) ED from RI and (ii) ED from ML. We obtained seven estimated outbreaks, numbered (1)&#x2013;(7), based on the ED from RI, as shown in <xref ref-type="fig" rid="fig6">Figure 6</xref> and <xref ref-type="table" rid="tab3">Table 3</xref>, whereas only five outbreaks were reported. This suggests that the proposed method can also detect minor outbreaks such as (1) and (5). The ED from RI and the ED from ML gave the same start dates for outbreaks (2), (3), (6), and (7); for outbreaks (1), (4), and (5), they differed by at most 1&#x2009;day. Thus, both approaches predicted nearly identical start dates. Additionally, we evaluated the accuracy of the ED from ML for outbreaks (1)&#x2013;(7) during the warning period, which spans from 2&#x2009;weeks before to 2&#x2009;weeks after the ED from RI. The overall accuracy was high, ranging from 83.3% to 100%. RF and XGB achieved the highest accuracy, 100% for all outbreaks except (1) and (5).</p>
<p>Third, we conducted a sensitivity analysis with two components. (i) We evaluated the impact of different calibration periods (14 to 28&#x2009;days) and prediction periods (7 to 21&#x2009;days), with the calibration period longer than the prediction period. The highest accuracy was obtained with a calibration period of 21&#x2009;days and a prediction period of 14&#x2009;days, as presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S6</xref>. (ii) We varied the duration of maintenance for L2 between 7 and 28&#x2009;days, as shown in <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S4</xref>. Changing the duration to 7 or 21&#x2009;days made no substantial difference in the results, whereas extending it to 28&#x2009;days missed the detection points of a few outbreaks.</p>
<p>This study has several limitations. First, previous studies (<xref ref-type="bibr" rid="ref32">32</xref>, <xref ref-type="bibr" rid="ref33">33</xref>) have shown that vaccination reduces the number of severe cases, but this study did not consider the effect of vaccination. We assumed that vaccination had a greater impact on reducing the number of infected patients than on the occurrence of outbreaks. Because our aim was to predict the occurrence and trend of outbreaks using classification methods, we did not include vaccination.</p>
<p>Second, the available data are limited: COVID-19 has circulated for only a few years (our data cover February 2020 to October 2022), whereas diseases such as influenza and norovirus exhibit long-term epidemic patterns and have been studied with ML to predict the start time of outbreaks (<xref ref-type="bibr" rid="ref34">34</xref>, <xref ref-type="bibr" rid="ref35">35</xref>). To overcome this, we analyzed the pattern of COVID-19 transmission in Korea and extracted features strongly related to the labels, as listed in <xref ref-type="table" rid="tab1">Table 1</xref>. Consequently, we achieved high accuracy in predicting the epidemic trend in three categories: increase, maintain, and decrease.</p>
<p>Despite these limitations, our study proposes a novel approach for estimating the start time of new outbreaks using machine learning methods and a risk index function, which, to our knowledge, has not been studied previously. Our approach offers several advantages and potential applications. Previous studies (<xref ref-type="bibr" rid="ref14">14</xref>, <xref ref-type="bibr" rid="ref36">36</xref>) used only the number of infected patients to predict COVID-19 transmission. In contrast, we incorporated various data, including changes in the intensity of the NPIs implemented by the Korean government and the prevalence of variant viruses (especially delta and omicron). This allowed a more comprehensive analysis of the epidemiological data.</p>
<p>We also proposed a risk index that quantifies changes in the transmission trend and can be used to classify the risk of potential outbreaks. To our knowledge, this mathematically interpretable measure has not been used in previous research. Using it, we classified the sample data into three distinct patterns (Increase, Maintain, Decrease) and assigned labels accordingly.</p>
<p>Moreover, NPI intensity is set by policy decisions. By adjusting the NPI level during the prediction period, we can therefore anticipate shifts in future infection patterns, which could help identify effective policy steps. In essence, our proposed predictive method can serve as a scientific basis for setting policy levels.</p>
<p>Previous studies (<xref ref-type="bibr" rid="ref14">14</xref>, <xref ref-type="bibr" rid="ref36">36</xref>), although they used different methods, reported accuracies of around 60%&#x2013;80% for the early detection of outbreaks. In the current study, applying machine learning to categorize the test data yielded a substantially higher accuracy of approximately 94%, with particularly high <italic>F</italic>1-scores for the Increase category (L2). Incorporating various datasets and using the risk index to categorize infection patterns helped our method achieve robust predictive performance even with limited data.</p>
<p>Overall, our study highlights the strength of our approach in accurately predicting the timing of an outbreak with an interpretable and explainable method. The method could also be applied to other infectious diseases and contribute to targeted prevention and control measures, facilitating better management of resources during a pandemic and enabling healthcare providers to respond more effectively to COVID-19. Future studies can further improve the method by collecting more data and establishing appropriate criteria for the classes.</p>
</sec>
<sec sec-type="conclusions" id="sec24">
<label>5</label>
<title>Conclusion</title>
<p>In conclusion, this study proposed a novel method for detecting the start time of new outbreaks and predicting transmission trends using machine learning-based approaches and a risk index function. The method classified transmission trends with high accuracy and identified outbreaks with an interpretable and explainable procedure. The accuracy of SVM, RF, and XGB was higher than 0.94, with RF achieving the highest test accuracy (0.9580) and, together with XGB, the highest accuracy for outbreak detection. The method provides a criterion for predicting the start time of new outbreaks, which can help healthcare providers respond more effectively to COVID-19 transmission. Overall, the study demonstrates the potential of machine learning-based approaches for predicting the timing of outbreaks, which could ultimately help improve patient care and reduce the burden on healthcare systems.</p>
</sec>
<sec sec-type="data-availability" id="sec25">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec sec-type="ethics-statement" id="sec26">
<title>Ethics statement</title>
<p>Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements.</p>
</sec>
<sec sec-type="author-contributions" id="sec27">
<title>Author contributions</title>
<p>GC and JP: analyzed the data. GC, JP, YC, HA, and HL: drafted and revised the manuscript and interpreted the results. All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="sec28">
<title>Funding</title>
<p>HL was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2022R1C1C1006237, NRF-2022R1A5A1033624, RS-2023-00227944). GC was supported by an NRF grant funded by the Korean government (No. NRF-2020R1C1C1A01012557). JP was supported by an NRF grant funded by the Korean government (No. NRF-2021R1I1A1A01057767). YC was supported by a National Institute for Mathematical Sciences (NIMS) grant funded by the Korean government (MSIT) (No. B23820000).</p>
</sec>
<sec sec-type="COI-statement" id="sec29">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="sec30">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fpubh.2023.1252357/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fpubh.2023.1252357/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Viana</surname><given-names>J</given-names></name> <name><surname>van Dorp</surname><given-names>CH</given-names></name> <name><surname>Nunes</surname><given-names>A</given-names></name> <name><surname>Gomes</surname><given-names>MC</given-names></name> <name><surname>van Boven</surname><given-names>M</given-names></name> <name><surname>Kretzschmar</surname><given-names>ME</given-names></name> <etal/></person-group>. <article-title>Controlling the pandemic during the SARS-CoV-2 vaccination rollout</article-title>. <source>Nat Commun</source>. (<year>2021</year>) <volume>12</volume>:<fpage>3674</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41467-021-23938-8</pub-id>, PMID: <pub-id pub-id-type="pmid">34135335</pub-id></citation></ref>
<ref id="ref2"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname><given-names>S</given-names></name> <name><surname>Hill</surname><given-names>EM</given-names></name> <name><surname>Tildesley</surname><given-names>MJ</given-names></name> <name><surname>Dyson</surname><given-names>L</given-names></name> <name><surname>Keeling</surname><given-names>MJ</given-names></name></person-group>. <article-title>Vaccination and non-pharmaceutical interventions for COVID-19: a mathematical modelling study</article-title>. <source>Lancet Infect Dis</source>. (<year>2021</year>) <volume>21</volume>:<fpage>793</fpage>&#x2013;<lpage>802</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S1473-3099(21)00143-2</pub-id>, PMID: <pub-id pub-id-type="pmid">33743847</pub-id></citation></ref>
<ref id="ref3"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giordano</surname><given-names>G</given-names></name> <name><surname>Colaneri</surname><given-names>M</given-names></name> <name><surname>Di Filippo</surname><given-names>A</given-names></name> <name><surname>Blanchini</surname><given-names>F</given-names></name> <name><surname>Bolzern</surname><given-names>P</given-names></name> <name><surname>De Nicolao</surname><given-names>G</given-names></name> <etal/></person-group>. <article-title>Modeling vaccination rollouts, SARS-CoV-2 variants and the requirement for non-pharmaceutical interventions in Italy</article-title>. <source>Nat Med</source>. (<year>2021</year>) <volume>27</volume>:<fpage>993</fpage>&#x2013;<lpage>8</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41591-021-01334-5</pub-id>, PMID: <pub-id pub-id-type="pmid">33864052</pub-id></citation></ref>
<ref id="ref4"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>AlArjani</surname><given-names>A</given-names></name> <name><surname>Nasseef</surname><given-names>MT</given-names></name> <name><surname>Kamal</surname><given-names>SM</given-names></name> <name><surname>Rao</surname><given-names>BVS</given-names></name> <name><surname>Mahmud</surname><given-names>M</given-names></name> <name><surname>Uddin</surname><given-names>MS</given-names></name></person-group>. <article-title>Application of mathematical modeling in prediction of COVID-19 transmission dynamics</article-title>. <source>Arab J Sci Eng</source>. (<year>2022</year>) <volume>47</volume>:<fpage>10163</fpage>&#x2013;<lpage>86</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s13369-021-06419-4</pub-id>, PMID: <pub-id pub-id-type="pmid">35018276</pub-id></citation></ref>
<ref id="ref5"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pagel</surname><given-names>C</given-names></name> <name><surname>Yates</surname><given-names>CA</given-names></name></person-group>. <article-title>Role of mathematical modelling in future pandemic response policy</article-title>. <source>BMJ</source>. (<year>2022</year>) <volume>378</volume>:<fpage>e070615</fpage>. doi: <pub-id pub-id-type="doi">10.1136/bmj-2022-070615</pub-id>, PMID: <pub-id pub-id-type="pmid">36109042</pub-id></citation></ref>
<ref id="ref6"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shahid</surname><given-names>F</given-names></name> <name><surname>Zameer</surname><given-names>A</given-names></name> <name><surname>Muneeb</surname><given-names>M</given-names></name></person-group>. <article-title>Predictions for COVID-19 with deep learning models of LSTM, GRU and bi-LSTM</article-title>. <source>Chaos Solitons Fractals</source>. (<year>2020</year>) <volume>140</volume>:<fpage>110212</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.chaos.2020.110212</pub-id>, PMID: <pub-id pub-id-type="pmid">32839642</pub-id></citation></ref>
<ref id="ref7"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dairi</surname><given-names>A</given-names></name> <name><surname>Harrou</surname><given-names>F</given-names></name> <name><surname>Zeroual</surname><given-names>A</given-names></name> <name><surname>Hittawe</surname><given-names>MM</given-names></name> <name><surname>Sun</surname><given-names>Y</given-names></name></person-group>. <article-title>Comparative study of machine learning methods for COVID-19 transmission forecasting</article-title>. <source>J Biomed Inform</source>. (<year>2021</year>) <volume>118</volume>:<fpage>103791</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jbi.2021.103791</pub-id>, PMID: <pub-id pub-id-type="pmid">33915272</pub-id></citation></ref>
<ref id="ref8"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balli</surname><given-names>S</given-names></name></person-group>. <article-title>Data analysis of COVID-19 pandemic and short-term cumulative case forecasting using machine learning time series methods</article-title>. <source>Chaos Solitons Fractals</source>. (<year>2021</year>) <volume>142</volume>:<fpage>110512</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.chaos.2020.110512</pub-id>, PMID: <pub-id pub-id-type="pmid">33281306</pub-id></citation></ref>
<ref id="ref9"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katragadda</surname><given-names>S</given-names></name> <name><surname>Bhupatiraju</surname><given-names>RT</given-names></name> <name><surname>Raghavan</surname><given-names>V</given-names></name> <name><surname>Ashkar</surname><given-names>Z</given-names></name> <name><surname>Gottumukkala</surname><given-names>R</given-names></name></person-group>. <article-title>Examining the COVID-19 case growth rate due to visitor vs. local mobility in the United States using machine learning</article-title>. <source>Sci Rep</source>. (<year>2022</year>) <volume>12</volume>:<fpage>12337</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-022-16561-0</pub-id>, PMID: <pub-id pub-id-type="pmid">35853927</pub-id></citation></ref>
<ref id="ref10"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chakraborty</surname><given-names>T</given-names></name> <name><surname>Ghosh</surname><given-names>I</given-names></name></person-group>. <article-title>Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: a data-driven analysis</article-title>. <source>Chaos Solitons Fractals</source>. (<year>2020</year>) <volume>135</volume>:<fpage>109850</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.chaos.2020.109850</pub-id>, PMID: <pub-id pub-id-type="pmid">32355424</pub-id></citation></ref>
<ref id="ref11"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname><given-names>Y</given-names></name> <name><surname>Liu</surname><given-names>X</given-names></name> <name><surname>Kok</surname><given-names>SY</given-names></name> <name><surname>Rajarethinam</surname><given-names>J</given-names></name> <name><surname>Liang</surname><given-names>S</given-names></name> <name><surname>Yap</surname><given-names>G</given-names></name> <etal/></person-group>. <article-title>Three-month real-time dengue forecast models: an early warning system for outbreak alerts and policy decision support in Singapore</article-title>. <source>Environ Health Perspect</source>. (<year>2016</year>) <volume>124</volume>:<fpage>1369</fpage>&#x2013;<lpage>75</lpage>. doi: <pub-id pub-id-type="doi">10.1289/ehp.1509981</pub-id>, PMID: <pub-id pub-id-type="pmid">26662617</pub-id></citation></ref>
<ref id="ref12"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Son</surname><given-names>WS</given-names></name> <name><surname>Park</surname><given-names>JE</given-names></name> <name><surname>Kwon</surname><given-names>O</given-names></name></person-group>. <article-title>Early detection of influenza outbreak using time derivative of incidence</article-title>. <source>EPJ Data Sci</source>. (<year>2020</year>) <volume>9</volume>:<fpage>28</fpage>. doi: <pub-id pub-id-type="doi">10.1140/epjds/s13688-020-00246-7</pub-id>, PMID: <pub-id pub-id-type="pmid">32934899</pub-id></citation></ref>
<ref id="ref13"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vianello</surname><given-names>C</given-names></name> <name><surname>Strozzi</surname><given-names>F</given-names></name> <name><surname>Mocellin</surname><given-names>P</given-names></name> <name><surname>Cimetta</surname><given-names>E</given-names></name> <name><surname>Fabiano</surname><given-names>B</given-names></name> <name><surname>Manenti</surname><given-names>F</given-names></name> <etal/></person-group>. <article-title>A perspective on early detection systems models for COVID-19 spreading</article-title>. <source>Biochem Biophys Res Commun</source>. (<year>2021</year>) <volume>538</volume>:<fpage>244</fpage>&#x2013;<lpage>52</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bbrc.2020.12.010</pub-id>, PMID: <pub-id pub-id-type="pmid">33342518</pub-id></citation></ref>
<ref id="ref14"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martinez-Velazquez</surname><given-names>R</given-names></name> <name><surname>Tobon</surname><given-names>VD</given-names></name> <name><surname>Sanchez</surname><given-names>A</given-names></name> <name><surname>El Saddik</surname><given-names>A</given-names></name> <name><surname>Petriu</surname><given-names>E</given-names></name></person-group>. <article-title>A machine learning approach as an aid for early COVID-19 detection</article-title>. <source>Sensors</source>. (<year>2021</year>) <volume>21</volume>:<fpage>4202</fpage>. doi: <pub-id pub-id-type="doi">10.3390/s21124202</pub-id>, PMID: <pub-id pub-id-type="pmid">34207437</pub-id></citation></ref>
<ref id="ref15"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kogan</surname><given-names>NE</given-names></name> <name><surname>Clemente</surname><given-names>L</given-names></name> <name><surname>Liautaud</surname><given-names>P</given-names></name> <name><surname>Kaashoek</surname><given-names>J</given-names></name> <name><surname>Link</surname><given-names>NB</given-names></name> <name><surname>Nguyen</surname><given-names>AT</given-names></name> <etal/></person-group>. <article-title>An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time</article-title>. <source>Sci Adv</source>. (<year>2021</year>) <volume>7</volume>:<fpage>eabd6989</fpage>. doi: <pub-id pub-id-type="doi">10.1126/sciadv.abd6989</pub-id>, PMID: <pub-id pub-id-type="pmid">33674304</pub-id></citation></ref>
<ref id="ref16"><label>16.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Shi</surname><given-names>J</given-names></name> <name><surname>Jain</surname><given-names>M</given-names></name> <name><surname>Narasimhan</surname><given-names>G</given-names></name></person-group>. <article-title>Time series forecasting (TSF) using various deep learning models</article-title>. (<year>2022</year>) <italic>arXiv</italic>. Available at: <ext-link xlink:href="https://doi.org/10.48550/arXiv.2204.11115" ext-link-type="uri">https://doi.org/10.48550/arXiv.2204.11115</ext-link>. [Epub ahead of preprint]</citation></ref>
<ref id="ref17"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname><given-names>S</given-names></name> <name><surname>Kim</surname><given-names>M</given-names></name> <name><surname>Lee</surname><given-names>S</given-names></name> <name><surname>Lee</surname><given-names>YJ</given-names></name></person-group>. <article-title>Discovering spatiotemporal patterns of COVID-19 pandemic in South Korea</article-title>. <source>Sci Rep</source>. (<year>2021</year>) <volume>11</volume>. doi: <pub-id pub-id-type="doi">10.1038/s41598-021-03487-2</pub-id>, PMID: <pub-id pub-id-type="pmid">34963690</pub-id></citation></ref>
<ref id="ref18"><label>18.</label><citation citation-type="other"><person-group person-group-type="author"><collab id="coll1">Coronavirus (COVID-19), Republic of Korea. Central Disaster Management Headquarters</collab></person-group>. <comment>Available at: </comment><ext-link xlink:href="https://ncov.kdca.go.kr/" ext-link-type="uri">https://ncov.kdca.go.kr/</ext-link>. (Accessed August 20, 2023)</citation></ref>
<ref id="ref19"><label>19.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Hodcroft</surname><given-names>E</given-names></name></person-group>. <article-title>CoVariants</article-title>. <comment>Available at: </comment><ext-link xlink:href="https://covariants.org/" ext-link-type="uri">https://covariants.org/</ext-link>. (Accessed April 30, 2023)</citation></ref>
<ref id="ref20"><label>20.</label><citation citation-type="other"><person-group person-group-type="author"><collab id="coll2">Tracking SARS-CoV-2 variants. World Health Organization</collab></person-group>. <comment>Available at: </comment><ext-link xlink:href="https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/" ext-link-type="uri">https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/</ext-link>. (Accessed April 30, 2023)</citation></ref>
<ref id="ref21"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>H</given-names></name> <name><surname>Kim</surname><given-names>Y</given-names></name> <name><surname>Kim</surname><given-names>E</given-names></name> <name><surname>Lee</surname><given-names>S</given-names></name></person-group>. <article-title>Risk assessment of importation and local transmission of COVID-19 in South Korea: statistical modeling approach</article-title>. <source>JMIR Public Health Surveill</source>. (<year>2021</year>) <volume>7</volume>:<fpage>e26784</fpage>. doi: <pub-id pub-id-type="doi">10.2196/26784</pub-id>, PMID: <pub-id pub-id-type="pmid">33819165</pub-id></citation></ref>
<ref id="ref22"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Siraj</surname><given-names>A</given-names></name> <name><surname>Worku</surname><given-names>A</given-names></name> <name><surname>Berhane</surname><given-names>K</given-names></name> <name><surname>Aregawi</surname><given-names>M</given-names></name> <name><surname>Eshetu</surname><given-names>M</given-names></name> <name><surname>Mirkuzie</surname><given-names>A</given-names></name> <etal/></person-group>. <article-title>Early estimates of COVID-19 infections in small, medium and large population clusters</article-title>. <source>BMJ Glob Health</source>. (<year>2020</year>) <volume>5</volume>:<fpage>e003055</fpage>. doi: <pub-id pub-id-type="doi">10.1136/bmjgh-2020-003055</pub-id>, PMID: <pub-id pub-id-type="pmid">32948617</pub-id></citation></ref>
<ref id="ref23"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname><given-names>Y</given-names></name> <name><surname>Kim</surname><given-names>JS</given-names></name> <name><surname>Choi</surname><given-names>H</given-names></name> <name><surname>Lee</surname><given-names>H</given-names></name> <name><surname>Lee</surname><given-names>CH</given-names></name></person-group>. <article-title>Assessment of social distancing for controlling COVID-19 in Korea: an age-structured modeling approach</article-title>. <source>Int J Environ Res Public Health</source>. (<year>2020</year>) <volume>17</volume>:<fpage>7474</fpage>. doi: <pub-id pub-id-type="doi">10.3390/ijerph17207474</pub-id>, PMID: <pub-id pub-id-type="pmid">33066581</pub-id></citation></ref>
<ref id="ref24"><label>24.</label><citation citation-type="other"><person-group person-group-type="author"><collab id="coll3">Public Data Portal, Republic of Korea</collab></person-group>. <comment>Available at: </comment><ext-link xlink:href="https://www.data.go.kr/data/15106451/fileData.do" ext-link-type="uri">https://www.data.go.kr/data/15106451/fileData.do</ext-link>. (Accessed August 20, 2023)</citation></ref>
<ref id="ref25"><label>25.</label><citation citation-type="other"><person-group person-group-type="author"><collab id="coll4">Coronavirus (COVID-19), Republic of Korea. Central Disaster Management Headquarters</collab></person-group>. <comment>Available at: </comment><ext-link xlink:href="https://ncov.kdca.go.kr/en/tcmBoardList.do?brdId=12&#x0026;brdGubun=125&#x0026;dataGubun=&#x0026;ncvContSeq=&#x0026;contSeq=&#x0026;board_id=&#x0026;gubun" ext-link-type="uri">https://ncov.kdca.go.kr/en/tcmBoardList.do?brdId=12&#x0026;brdGubun=125&#x0026;dataGubun=&#x0026;ncvContSeq=&#x0026;contSeq=&#x0026;board_id=&#x0026;gubun</ext-link>. (Accessed April 30, 2023)</citation></ref>
<ref id="ref26"><label>26.</label><citation citation-type="other"><person-group person-group-type="author"><collab id="coll5">Social Distance Implementation Plan for COVID-19. Korea Disease Control and Prevention Agency</collab></person-group>. <comment>Available at: </comment><ext-link xlink:href="https://ncov.kdca.go.kr/socdisBoardList.do?brdId=6&#x0026;brdGubun=64&#x0026;dataGubun=641" ext-link-type="uri">https://ncov.kdca.go.kr/socdisBoardList.do?brdId=6&#x0026;brdGubun=64&#x0026;dataGubun=641</ext-link>. (Accessed April 30, 2023)</citation></ref>
<ref id="ref27"><label>27.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>H</given-names></name> <name><surname>Jang</surname><given-names>G</given-names></name> <name><surname>Cho</surname><given-names>G</given-names></name></person-group>. <article-title>Forecasting COVID-19 cases by assessing the effect of social distancing in Republic of Korea</article-title>. <source>Alex Eng J</source>. (<year>2022</year>) <volume>61</volume>:<fpage>9203</fpage>&#x2013;<lpage>17</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.aej.2022.02.037</pub-id></citation></ref>
<ref id="ref28"><label>28.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kc</surname><given-names>K</given-names></name> <name><surname>Yin</surname><given-names>Z</given-names></name> <name><surname>Wu</surname><given-names>M</given-names></name> <name><surname>Wu</surname><given-names>Z</given-names></name></person-group>. <article-title>Evaluation of deep learning-based approaches for COVID-19 classification based on chest X-ray images</article-title>. <source>Signal Image Video Process</source>. (<year>2021</year>) <volume>15</volume>:<fpage>959</fpage>&#x2013;<lpage>66</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s11760-020-01820-2</pub-id>, PMID: <pub-id pub-id-type="pmid">33432267</pub-id></citation></ref><ref id="ref29"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cortes</surname><given-names>C</given-names></name> <name><surname>Vapnik</surname><given-names>V</given-names></name></person-group>. <article-title>Support-vector networks</article-title>. <source>Mach Learn</source>. (<year>1995</year>) <volume>20</volume>:<fpage>273</fpage>&#x2013;<lpage>97</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF00994018</pub-id></citation></ref>
<ref id="ref30"><label>30.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Ho</surname><given-names>TK</given-names></name></person-group>. <article-title>Random decision forests</article-title>. <conf-name>Proceedings of 3rd International Conference on Document Analysis and Recognition</conf-name>. <publisher-loc>Montreal, Canada</publisher-loc>: <publisher-name>IEEE Computer Society Press</publisher-name> (<year>1995</year>). p. <fpage>278</fpage>&#x2013;<lpage>82</lpage>.</citation></ref>
<ref id="ref31"><label>31.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>T</given-names></name> <name><surname>Guestrin</surname><given-names>C</given-names></name></person-group>. <article-title>XGBoost: a scalable tree boosting system</article-title>. (<year>2016</year>) <italic>arXiv</italic>. Available at: <ext-link xlink:href="https://doi.org/10.48550/arXiv.1603.02754" ext-link-type="uri">https://doi.org/10.48550/arXiv.1603.02754</ext-link>. [Epub ahead of preprint]</citation></ref>
<ref id="ref32"><label>32.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mozaffer</surname><given-names>F</given-names></name> <name><surname>Cherian</surname><given-names>P</given-names></name> <name><surname>Krishna</surname><given-names>S</given-names></name> <name><surname>Wahl</surname><given-names>B</given-names></name> <name><surname>Menon</surname><given-names>GI</given-names></name></person-group>. <article-title>Effect of hybrid immunity, school reopening, and the omicron variant on the trajectory of the COVID-19 epidemic in India: a modelling study</article-title>. <source>Lancet Reg Health Southeast Asia</source>. (<year>2023</year>) <volume>8</volume>:<fpage>100095</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.lansea.2022.100095</pub-id>, PMID: <pub-id pub-id-type="pmid">36267800</pub-id></citation></ref>
<ref id="ref33"><label>33.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>X</given-names></name> <name><surname>Huang</surname><given-names>H</given-names></name> <name><surname>Ju</surname><given-names>J</given-names></name> <name><surname>Sun</surname><given-names>R</given-names></name> <name><surname>Zhang</surname><given-names>J</given-names></name></person-group>. <article-title>Impact of vaccination on the COVID-19 pandemic in U.S. states</article-title>. <source>Sci Rep</source>. (<year>2022</year>) <volume>12</volume>:<fpage>1554</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-022-05498-z</pub-id>, PMID: <pub-id pub-id-type="pmid">35091640</pub-id></citation></ref>
<ref id="ref34"><label>34.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>S</given-names></name> <name><surname>Cho</surname><given-names>E</given-names></name> <name><surname>Jang</surname><given-names>G</given-names></name> <name><surname>Kim</surname><given-names>S</given-names></name> <name><surname>Cho</surname><given-names>G</given-names></name></person-group>. <article-title>Early detection of norovirus outbreak using machine learning methods in South Korea</article-title>. <source>PLoS One</source>. (<year>2022</year>) <volume>17</volume>:<fpage>e0277671</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0277671</pub-id>, PMID: <pub-id pub-id-type="pmid">36383630</pub-id></citation></ref>
<ref id="ref35"><label>35.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amin</surname><given-names>S</given-names></name> <name><surname>Uddin</surname><given-names>MI</given-names></name> <name><surname>Alsaeed</surname><given-names>DH</given-names></name> <name><surname>Khan</surname><given-names>A</given-names></name> <name><surname>Adnan</surname><given-names>M</given-names></name> <name><surname>Aziz</surname><given-names>F</given-names></name></person-group>. <article-title>Early detection of seasonal outbreaks from Twitter data using machine learning approaches</article-title>. <source>Complexity</source>. (<year>2021</year>) <volume>2021</volume>:<fpage>5520366</fpage>. doi: <pub-id pub-id-type="doi">10.1155/2021/5520366</pub-id></citation></ref>
<ref id="ref36"><label>36.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jombart</surname><given-names>T</given-names></name> <name><surname>Ghozzi</surname><given-names>S</given-names></name> <name><surname>Schumacher</surname><given-names>D</given-names></name> <name><surname>Taylor</surname><given-names>TJ</given-names></name> <name><surname>Leclerc</surname><given-names>QJ</given-names></name> <name><surname>Jit</surname><given-names>M</given-names></name> <etal/></person-group>. <article-title>Real-time monitoring of COVID-19 dynamics using automated trend fitting and anomaly detection</article-title>. <source>Philos Trans R Soc B</source>. (<year>2021</year>) <volume>376</volume>:<fpage>20200266</fpage>. doi: <pub-id pub-id-type="doi">10.1098/rstb.2020.0266</pub-id>, PMID: <pub-id pub-id-type="pmid">34053271</pub-id></citation></ref>
</ref-list>
</back>
</article>