<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Public Health</journal-id>
<journal-title>Frontiers in Public Health</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Public Health</abbrev-journal-title>
<issn pub-type="epub">2296-2565</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpubh.2023.1203628</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Public Health</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Using the Baidu index to predict trends in the incidence of tuberculosis in Jiangsu Province, China</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Wang</surname> <given-names>Yue</given-names></name>
<uri xlink:href="https://loop.frontiersin.org/people/2257068/overview"/>
</contrib>
<contrib contrib-type="author"><name><surname>Zhou</surname> <given-names>Haitao</given-names></name>
</contrib>
<contrib contrib-type="author"><name><surname>Zheng</surname> <given-names>Li</given-names></name>
</contrib>
<contrib contrib-type="author"><name><surname>Li</surname> <given-names>Min</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Hu</surname> <given-names>Bin</given-names></name><xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1677901/overview"/>
</contrib>
</contrib-group>
<aff><institution>School of Public Health, Xuzhou Medical University</institution>, <addr-line>Xuzhou, Jiangsu</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by" id="fn0002">
<p>Edited by: Hai-Feng Pan, Anhui Medical University, China</p>
</fn>
<fn fn-type="edited-by" id="fn0003">
<p>Reviewed by: Qinyi Tan, Southwest University, China; Sasho Stoleski, Saints Cyril and Methodius University of Skopje, North Macedonia</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Bin Hu, <email>hubin@xzhmu.edu.cn</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>07</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>11</volume>
<elocation-id>1203628</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>04</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>05</day>
<month>07</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Wang, Zhou, Zheng, Li and Hu.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Wang, Zhou, Zheng, Li and Hu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec id="sec1">
<title>Objective</title>
<p>To analyze the time-series correlation between tuberculosis (TB)-related search terms and actual incidence data in Jiangsu Province, China; to screen out &#x201C;leading&#x201D; search terms; and to construct a timely and efficient TB prediction model that can forecast the next wave of the TB epidemic in advance.</p>
</sec>
<sec id="sec2">
<title>Methods</title>
<p>Monthly incidence data of tuberculosis in Jiangsu Province, China, were collected from January 2011 to December 2020. A scoping approach was used to identify TB search terms covering common TB terms, prevention, symptoms, and treatment. The Baidu index of these search terms for Jiangsu Province from January 2011 to December 2020 was collected from the Baidu index database.<xref rid="fn0001" ref-type="fn">
<sup>1</sup></xref> Correlation coefficients between the search terms and actual incidence were calculated using Python 3.6. A multiple linear regression model was constructed using SPSS 26.0, which was also used to calculate the goodness of fit and the prediction error of the model.</p>
</sec>
<sec id="sec3">
<title>Results</title>
<p>A total of 16 keywords with correlation coefficients greater than 0.6 were screened, of which 11 were the leading terms. The R<sup>2</sup> of the prediction model was 0.67 and the MAPE was 10.23%.</p>
</sec>
<sec id="sec4">
<title>Conclusion</title>
<p>The TB prediction model based on Baidu index data was able to predict the trend and intensity of the next wave of the TB epidemic 2&#x2009;months in advance. The model is currently applicable only to Jiangsu Province.</p>
</sec>
</abstract>
<kwd-group>
<kwd>tuberculosis</kwd>
<kwd>Baidu index</kwd>
<kwd>prediction</kwd>
<kwd>multiple linear regression</kwd>
<kwd>timely</kwd>
</kwd-group>
<contract-num rid="cn2">KC20200</contract-num>
<contract-num rid="cn2">SJCX22_1291</contract-num>
<contract-num rid="cn2">SJCX21_1157</contract-num>
<contract-sponsor id="cn1">Science and Technology Program Project of Xuzhou</contract-sponsor>
<contract-sponsor id="cn2">Postgraduate Research and Practice Innovation Program of Jiangsu Province</contract-sponsor>
<counts>
<fig-count count="4"/>
<table-count count="4"/>
<equation-count count="4"/>
<ref-count count="29"/>
<page-count count="8"/>
<word-count count="4774"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Infectious Diseases: Epidemiology and Prevention</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="sec5">
<title>Background</title>
<p>Tuberculosis (TB) is an infectious disease, mainly of the lungs, caused by <italic>Mycobacterium tuberculosis</italic>; it is transmitted mainly through the respiratory tract and poses a serious threat to human life and health. It is also the second leading cause of death from a single infectious agent, after COVID-19. According to the WHO Global Tuberculosis Report 2022, an estimated 10.6 million people fell ill with TB worldwide in 2021, and the TB incidence rate rose by 3.6% compared with 2020. TB deaths increased for the second year in a row (2020 and 2021), reversing the decline in TB deaths over the past decade. The global TB epidemic is therefore more severe than before.</p>
<p>The number of incident TB cases in China was the third highest among the countries with a high TB burden, and in China TB had the second highest mortality rate among notifiable infectious diseases. TB control in China remains far from the strategic goal of &#x201C;Ending TB by 2035.&#x201D; Early prevention, together with timely models for predicting new TB outbreaks, can provide early warning of outbreaks and support symptom monitoring, helping to control the spread of TB effectively and to minimize its impact on people&#x2019;s lives and health. Jiangsu Province is located on the east coast of mainland China (<xref rid="fig1" ref-type="fig">Figure 1</xref>), spanning 30&#x00B0;45&#x2032;&#x2013;35&#x00B0;08&#x2032;N latitude and 116&#x00B0;21&#x2032;&#x2013;121&#x00B0;56&#x2032;E longitude, with a total area of 107,200 square kilometers. The Internet penetration rate in Jiangsu Province is 61.5%, and in the southern regions of the province it exceeds 65%, above the national average. The large sample of Internet search data can effectively reduce the bias caused by insufficient data volume.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Geographical location of Jiangsu Province.</p>
</caption>
<graphic xlink:href="fpubh-11-1203628-g001.tif"/>
</fig>
<p>Internet query data have been widely used as a new data source for the early warning and prediction of infectious diseases. Ginsberg et al. (<xref ref-type="bibr" rid="ref1">1</xref>) used Google to build an influenza prediction model from automatically selected search terms; its predictions were available 1&#x2013;2&#x2009;weeks earlier than traditional CDC surveillance. Li et al. (<xref ref-type="bibr" rid="ref2">2</xref>) used Twitter data to predict influenza epidemic trends in near real time. Althouse et al. (<xref ref-type="bibr" rid="ref3">3</xref>) monitored dengue-related search terms on the Google search engine and built two linear regression models, confirming a good correlation between the predicted values and actual surveillance data. In China, Li et al. (<xref ref-type="bibr" rid="ref4">4</xref>) used Google search engine data and achieved good prediction results in cross-validation. Many studies in China and abroad have examined infectious disease prediction and early warning based on search data, but most of them have focused on diseases such as influenza, dengue fever and AIDS (<xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5&#x2013;8</xref>). Milinovich et al. (<xref ref-type="bibr" rid="ref9">9</xref>) showed that prediction models using Internet data performed better for infectious diseases transmitted through the respiratory tract. However, there are few studies in China on predicting TB from Internet data, and to our knowledge this is the first Internet search term-based early warning surveillance system developed for TB. Xue Gong (<xref ref-type="bibr" rid="ref10">10</xref>) showed that the Baidu index in China is higher in the eastern region than in other regions. Jiangsu Province, in eastern China, therefore has a relatively large volume of search data, which helps to reduce error and bias.</p>
<p>According to the 50th Statistical Report on the Development of the Internet in China released by the China Internet Network Information Center, as of June 2022 the Internet penetration rate in China had reached 74.4% and the number of Internet users was 1.051 billion, of whom 770 million were search engine users, accounting for 77.8% of all Internet users. Baidu has become the mainstream search engine in China, with a market share of 89.1%. Baidu Index, launched in 2006, is the Chinese counterpart of Google Trends (<xref ref-type="bibr" rid="ref10">10</xref>), and its functions are broadly similar. Since 2010, when Google Search ceased its services in mainland China, Baidu Index has become the most popular search analysis tool in China (<xref ref-type="bibr" rid="ref11">11</xref>). Web search data can directly or indirectly reflect the behavior and psychology of Internet users, and studies of socio-economic activities have attempted to explore the underlying relationship between search data and the predicted objects. With the rapid development of the Internet and information technology, susceptible people tend to &#x201C;seek medical consultation&#x201D; on the Internet (<xref ref-type="bibr" rid="ref12">12</xref>), so the search index captures a large amount of information on early (latent-period) symptoms and health behavior searched for by susceptible people. The existing infectious disease surveillance system has some shortcomings (<xref ref-type="bibr" rid="ref13">13</xref>, <xref ref-type="bibr" rid="ref14">14</xref>). Firstly, the traditional surveillance and early warning system relies on a single type of data source: clinical incidence and laboratory surveillance data provided by medical institutions, the CDC and sentinel hospitals. Secondly, the data are aggregated by departments at each level after reporting, which delays early warning and reduces timeliness (<xref ref-type="bibr" rid="ref15">15</xref>). Internet monitoring, in contrast, avoids the cascading design of the traditional surveillance model (<xref ref-type="bibr" rid="ref16">16</xref>). This paper explains the association between search data and case numbers in terms of individual health status, health information needs, and online health information-seeking behavior. Whether susceptible, latently infected or infected, people with TB symptoms will need health information. As a common search engine, Baidu has become the first choice for searching for information, so the Baidu index contains a large number of health information searches. In addition, web search data have the advantages of a large sample size, rapid response and easy access, allowing data to be obtained and predictions to be made during the early symptomatic period.</p>
</sec>
<sec sec-type="methods" id="sec6">
<title>Methods</title>
<sec id="sec7">
<title>Correlation analysis</title>
<p>Correlation analysis is a statistical method for studying the correlation between two or more random variables, none of which is treated as dependent on the others. In this study, Pearson&#x2019;s correlation coefficient was used to describe the correlation between TB incidence and the relevant search terms. In <xref ref-type="disp-formula" rid="EQ1">Eq. 1</xref>, <italic>X<sub>i</sub></italic> is the Baidu index of a search term in month <italic>i</italic>, <italic>Y<sub>i</sub></italic> is the TB incidence in month <italic>i</italic>, and <italic>n</italic> is the number of months. The value of <italic>r</italic> lies in the range [&#x2212;1,&#x2009;1], and a larger |<italic>r</italic>| indicates a stronger correlation between the Baidu index and actual incidence. The initial screening criterion of this study was |<italic>r</italic>|&#x2009;&#x2265;&#x2009;0.5, that is, a moderate or stronger correlation.</p>
<disp-formula id="EQ1">
<label>(1)</label>
<mml:math id="M1">
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>Y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>Y</mml:mi>
<mml:mo>&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>Y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>Y</mml:mi>
<mml:mo>&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
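<p>As an illustration only, Eq. 1 and the |<italic>r</italic>|&#x2009;&#x2265;&#x2009;0.5 screening rule can be sketched in plain Python as below. The text states that Python 3.6 was used but not which library; this sketch relies only on the standard library, and the function names pearson_r and passes_screen are hypothetical. scipy.stats.pearsonr would return the same coefficient together with a p-value.</p>

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of Eq. 1.

    x: monthly Baidu index of one search term (X_i)
    y: monthly TB incidence (Y_i), same months as x
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the two means
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return num / math.sqrt(sxx * syy)


def passes_screen(r, threshold=0.5):
    """Initial screening rule of this study: keep terms with |r| >= 0.5."""
    return abs(r) >= threshold
```

Because the screen uses |<italic>r</italic>|, strongly negative coefficients such as those in Table 1 pass it just as positive ones would.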
</sec>
<sec id="sec8">
<title>Correlation time series change characteristics</title>
<p>Time-difference correlation analysis calculates the correlation coefficient between the time series of a candidate indicator (here, the Baidu index of a search term) and a benchmark indicator (here, TB incidence) after shifting the candidate series by a number of time units. The calculation formula is given in <xref ref-type="disp-formula" rid="EQ2">Eq. 2</xref>.</p>
<disp-formula id="EQ2">
<label>(2)</label>
<mml:math id="M2">
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>Y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>Y</mml:mi>
<mml:mo>&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>Y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>Y</mml:mi>
<mml:mo>&#x00AF;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In <xref ref-type="disp-formula" rid="EQ2">Eq. 2</xref>, <italic>d</italic> is the lead time (in months), <italic>i</italic> is the reference time, and <italic>r<sub>d</sub></italic> is the time-difference correlation coefficient. A search term is characterized by the lead time at which |<italic>r<sub>d</sub></italic>| is largest: a maximum at a positive <italic>d</italic> indicates a &#x201C;leading&#x201D; feature, at <italic>d</italic>&#x2009;=&#x2009;0 a &#x201C;synchronous&#x201D; feature, and at a negative <italic>d</italic> a &#x201C;lagging&#x201D; feature. This study used these time-series change features to filter out the search terms with a &#x201C;leading&#x201D; feature.</p>
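<p>The time-difference correlation of Eq. 2 and the leading/synchronous/lagging classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors&#x2019; code: the means are computed over the overlapping months only, the scan range of &#x00B1;2 months is hypothetical (the study reports results for <italic>d</italic>&#x2009;=&#x2009;2), and the names lagged_r and classify are illustrative.</p>

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return num / math.sqrt(sxx * syy)


def lagged_r(x, y, d):
    """r_d of Eq. 2: correlate X[i - d] with Y[i] over every month i where both exist.

    d > 0 pairs the search index d months earlier with the incidence in month i.
    Means are taken over the overlapping window only (an assumption).
    """
    if d > 0:
        return pearson_r(x[:-d], y[d:])
    if d == 0:
        return pearson_r(x, y)
    return pearson_r(x[-d:], y[:d])  # d negative: search index |d| months later


def classify(x, y, max_shift=2):
    """Label a term by the shift d in [-max_shift, max_shift] that maximises |r_d|."""
    scores = {d: abs(lagged_r(x, y, d)) for d in range(-max_shift, max_shift + 1)}
    best_d = max(scores, key=scores.get)
    if best_d > 0:
        label = "leading"
    elif best_d == 0:
        label = "synchronous"
    else:
        label = "lagging"
    return label, best_d
```

For a search series that runs exactly two months ahead of incidence, |<italic>r</italic><sub>2</sub>| is 1 and classify returns the label "leading" with <italic>d</italic>&#x2009;=&#x2009;2.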
</sec>
<sec id="sec9">
<title>Multiple linear regression forecasting</title>
<p>Multiple linear regression is used to analyze the linear relationship between a single dependent variable and multiple independent variables. Tolerance and the variance inflation factor were used to assess multicollinearity among the independent variables. The model is expressed in <xref ref-type="disp-formula" rid="EQ3">Eq. 3</xref>.</p>
<disp-formula id="EQ3">
<label>(3)</label>
<mml:math id="M3">
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>&#x03B5;</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In <xref ref-type="disp-formula" rid="EQ3">Eq. 3</xref>, <italic>y</italic> is the monthly incidence of tuberculosis, <inline-formula>
<mml:math id="M4">
<mml:mrow>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x03B2;</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the model parameters, <inline-formula>
<mml:math id="M5">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the Baidu indices of the search terms, and <inline-formula>
<mml:math id="M6">
<mml:mi>&#x03B5;</mml:mi>
</mml:math>
</inline-formula> is the error term representing the effect of random factors. Because the study involved 11 candidate variables, stepwise regression was used to avoid overfitting the prediction model.</p>
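<p>A minimal NumPy sketch of fitting Eq. 3 by ordinary least squares and computing the tolerance and variance inflation factor (VIF) of each predictor is shown below, together with the overall <italic>F</italic> statistic. It is an illustration, not the SPSS 26.0 procedure used in the study: it fits all supplied predictors at once and does not reproduce SPSS&#x2019;s stepwise entry and removal criteria. The helper names are hypothetical; statsmodels also provides variance_inflation_factor for the VIF step.</p>

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares estimates [beta_0, beta_1, ..., beta_k] for Eq. 3."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])  # column of 1s for beta_0
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta


def r_squared(X, y, beta):
    """Coefficient of determination of a fitted model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = beta[0] + X @ beta[1:]
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def f_statistic(r2, n, k):
    """Overall F statistic for a model with k predictors fitted to n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def vif_and_tolerance(X):
    """(VIF_j, tolerance_j) for each predictor, with VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    results = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # regress x_j on the remaining predictors
        beta = fit_ols(others, X[:, j])
        vif = 1.0 / (1.0 - r_squared(others, X[:, j], beta))
        results.append((vif, 1.0 / vif))  # tolerance is the reciprocal of VIF
    return results
```

A VIF close to 1 (tolerance close to 1) means a predictor is nearly uncorrelated with the others; large VIFs signal the multicollinearity that tolerance and VIF are meant to detect.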
</sec>
</sec>
<sec sec-type="results" id="sec10">
<title>Results</title>
<sec id="sec11">
<title>Prevalence profile</title>
<p>The cumulative number of reported TB cases in Jiangsu Province from 2011 to 2020 was 399,508, an annual average of 39,950 cases. Decomposition of the monthly incidence data from January 2011 to December 2020 into trend, seasonal and random components (<xref rid="fig2" ref-type="fig">Figure 2</xref>) revealed clear seasonality in the number of monthly TB cases: the epidemic peaks from March to July each year, after which the number of cases declines, while the random errors fluctuate within a limited range.</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>Tuberculosis prevalence in Jiangsu Province (2011&#x2013;2020).</p>
</caption>
<graphic xlink:href="fpubh-11-1203628-g002.tif"/>
</fig>
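<p>The text does not state which decomposition method produced Figure 2. Assuming a classical additive decomposition (monthly cases = trend + seasonal + random), the steps can be sketched with NumPy as below; the function name is hypothetical, and the series is assumed to start in January. The seasonal_decompose function in statsmodels, with model="additive", implements the same classical approach.</p>

```python
import numpy as np


def additive_decompose(y, period=12):
    """Classical additive decomposition of a monthly series: y = trend + seasonal + residual.

    Trend: centred 2 x 12 moving average (NaN for the first and last 6 months).
    Seasonal: mean detrended value for each calendar month, centred to sum to 0.
    Assumes an even period and a series longer than one period.
    """
    y = np.asarray(y, dtype=float)
    half = period // 2
    # 2 x 12 MA weights: 1/24 at both ends, 1/12 for the 11 inner months
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full_like(y, np.nan)
    trend[half:len(y) - half] = np.convolve(y, weights, mode="valid")
    detrended = y - trend
    # Average the detrended values month by month, ignoring the NaN edges
    month_means = np.array([np.nanmean(detrended[m::period]) for m in range(period)])
    month_means -= month_means.mean()
    seasonal = np.resize(month_means, len(y))  # repeat the 12 monthly effects
    residual = y - trend - seasonal
    return trend, seasonal, residual
```

The random component is whatever remains after the trend and the repeating monthly effects are removed; in the study it fluctuated within a limited range.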
</sec>
<sec id="sec12">
<title>Correlation analysis and time-series change characteristics</title>
<p>Correlations between the search terms and actual incidence data were calculated, terms were initially screened using |<italic>r</italic>|&#x2009;&#x2265;&#x2009;0.5, and search terms with too low a search frequency were deleted. In the end, 11 highly correlated search terms were retained; all of their correlations were statistically significant, and their correlation coefficients are shown in <xref rid="tab1" ref-type="table">Table 1</xref>.</p>
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption>
<p>Correlation coefficients for initial screening search terms.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Search term</th>
<th align="center" valign="top">
<italic>r</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">Persistent low fever (&#x6301;&#x7EED;&#x4F4E;&#x70E7;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.53</td>
</tr>
<tr>
<td align="left" valign="middle">Night sweats (&#x76D7;&#x6C57;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.72</td>
</tr>
<tr>
<td align="left" valign="middle">Cough (&#x54B3;&#x55FD;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.70</td>
</tr>
<tr>
<td align="left" valign="middle">Sore throat (&#x54BD;&#x5589;&#x75DB;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.55</td>
</tr>
<tr>
<td align="left" valign="middle">Poor appetite (&#x98DF;&#x6B32;&#x4E0D;&#x632F;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.60</td>
</tr>
<tr>
<td align="left" valign="middle">Early signs of tuberculosis (&#x80BA;&#x7ED3;&#x6838;&#x7684;&#x65E9;&#x671F;&#x75C7;&#x72B6;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.50</td>
</tr>
<tr>
<td align="left" valign="middle">BCG vaccine (&#x5361;&#x4ECB;&#x82D7;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.61</td>
</tr>
<tr>
<td align="left" valign="middle">Difficulty in breathing (&#x547C;&#x5438;&#x56F0;&#x96BE;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.59</td>
</tr>
<tr>
<td align="left" valign="middle">Fatigue (&#x4E4F;&#x529B;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.59</td>
</tr>
<tr>
<td align="left" valign="middle">Tuberculosis (&#x80BA;&#x7ED3;&#x6838;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.70</td>
</tr>
<tr>
<td align="left" valign="middle">PPD (&#x7ED3;&#x6838;&#x83CC;&#x7D20;&#x8BD5;&#x9A8C;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.72</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The correlation coefficients of the 11 search terms at a lead of 2&#x2009;months (<italic>d</italic>&#x2009;=&#x2009;2) were then calculated and compared with the synchronous ones; the differences were statistically significant (<italic>p</italic>&#x2009;&#x003C;&#x2009;0.05). As shown in <xref rid="tab2" ref-type="table">Table 2</xref>, six search terms with &#x201C;leading&#x201D; characteristics were screened out. <xref rid="fig3" ref-type="fig">Figure 3</xref> shows the trends of the &#x201C;leading&#x201D; search terms and the actual incidence data. Before 2015, the Chinese Internet, and Internet healthcare in particular, was still in its infancy; medical treatment, medical information and popular science on disease were the main themes at this stage, and people were not yet accustomed to using Baidu to search for TB-related knowledge. As a result, the model missed a peak in 2015. With the onset of a new pandemic (COVID-19) at the beginning of 2020, the search frequency of terms for &#x201C;respiratory symptoms&#x201D; increased rapidly, while reported case numbers lagged; consequently, the search-term series exceed the incidence data from the beginning of the COVID-19 epidemic in 2020.</p>
<table-wrap position="float" id="tab2">
<label>Table 2</label>
<caption>
<p>Correlation coefficients for &#x201C;2&#x2009;months ahead&#x201D; search terms.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Search term</th>
<th align="center" valign="top">Synchronous (<italic>r</italic>)</th>
<th align="center" valign="top">Leading 2&#x2009;months (<italic>r</italic>)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">Persistent low fever (&#x6301;&#x7EED;&#x4F4E;&#x70E7;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.53</td>
<td align="char" valign="middle" char=".">&#x2212;0.58</td>
</tr>
<tr>
<td align="left" valign="middle">Night sweats (&#x76D7;&#x6C57;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.72</td>
<td align="char" valign="middle" char=".">&#x2212;0.79</td>
</tr>
<tr>
<td align="left" valign="middle">Cough (&#x54B3;&#x55FD;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.70</td>
<td align="char" valign="middle" char=".">&#x2212;0.71</td>
</tr>
<tr>
<td align="left" valign="middle">Sore throat (&#x54BD;&#x5589;&#x75DB;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.55</td>
<td align="char" valign="middle" char=".">&#x2212;0.63</td>
</tr>
<tr>
<td align="left" valign="middle">Poor appetite (&#x98DF;&#x6B32;&#x4E0D;&#x632F;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.60</td>
<td align="char" valign="middle" char=".">&#x2212;0.61</td>
</tr>
<tr>
<td align="left" valign="middle">Early signs of tuberculosis (&#x80BA;&#x7ED3;&#x6838;&#x7684;&#x65E9;&#x671F;&#x75C7;&#x72B6;)</td>
<td align="char" valign="middle" char=".">&#x2212;0.50</td>
<td align="char" valign="middle" char=".">&#x2212;0.56</td>
</tr>
</tbody>
</table>
</table-wrap>
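<p>The text does not spell out the rule used to call a term &#x201C;leading&#x201D; at <italic>d</italic>&#x2009;=&#x2009;2. The sketch below assumes a hypothetical rule, consistent with every row of Table 2: a term qualifies if it passes the |<italic>r</italic>|&#x2009;&#x2265;&#x2009;0.5 screen and its |<italic>r</italic>| at a 2-month lead exceeds its synchronous |<italic>r</italic>|. The coefficients are copied from Table 2; the significance test reported in the text is not reproduced.</p>

```python
# (synchronous r, r at a 2-month lead), copied from Table 2
TABLE2 = {
    "Persistent low fever": (-0.53, -0.58),
    "Night sweats": (-0.72, -0.79),
    "Cough": (-0.70, -0.71),
    "Sore throat": (-0.55, -0.63),
    "Poor appetite": (-0.60, -0.61),
    "Early signs of tuberculosis": (-0.50, -0.56),
}


def is_leading(r_sync, r_lead, threshold=0.5):
    """Hypothetical rule: passes the |r| >= 0.5 screen and |r| grows at d = 2."""
    return abs(r_sync) >= threshold and abs(r_lead) > abs(r_sync)


leading_terms = [term for term, (r0, r2) in TABLE2.items() if is_leading(r0, r2)]
print(leading_terms)
```

Under this assumed rule all six terms in Table 2 qualify, matching the six &#x201C;leading&#x201D; terms used as model inputs.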
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Trends in the Baidu index of search terms and actual TB incidence.</p>
</caption>
<graphic xlink:href="fpubh-11-1203628-g003.tif"/>
</fig>
</sec>
<sec id="sec13">
<title>Multiple linear regression model</title>
<sec id="sec14">
<title>Modeling</title>
<p>There was a 2-month lag between the input and output variables: the Baidu index of the &#x201C;leading&#x201D; search terms in January was used to predict the prevalence and intensity of TB in March. The input variables were the Baidu index of the &#x201C;leading&#x201D; search terms from January 2011 to October 2020, and each input variable was statistically different from the others (<italic>p</italic>&#x2009;&#x003C;&#x2009;0.05). The output variable was the monthly incidence from March 2011 to December 2020, of which 90% was used as the training set. The independent variables are the Baidu indices (x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>, &#x2026;, x<sub>6</sub>) of the leading search terms [&#x201C;persistent low fever (&#x6301;&#x7EED;&#x4F4E;&#x70E7;),&#x201D; &#x201C;night sweats (&#x76D7;&#x6C57;),&#x201D; &#x201C;cough (&#x54B3;&#x55FD;),&#x201D; &#x201C;sore throat (&#x54BD;&#x5589;&#x75DB;),&#x201D; &#x201C;poor appetite (&#x98DF;&#x6B32;&#x4E0D;&#x632F;),&#x201D; &#x201C;early signs of tuberculosis (&#x80BA;&#x7ED3;&#x6838;&#x7684;&#x65E9;&#x671F;&#x75C7;&#x72B6;)&#x201D;]. The dependent variable is the actual incidence of tuberculosis (<italic>y</italic>). The regression model was obtained by entering all independent variables with the &#x201C;Enter&#x201D; method. According to the SPSS 26.0 output, the following multiple linear regression model was obtained:</p>
<disp-formula id="EQ4">
<label>(4)</label>
<mml:math id="M9">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.134</mml:mn>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.163</mml:mn>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.016</mml:mn>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>0.156</mml:mn>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>4</mml:mn>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.103</mml:mn>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>5</mml:mn>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>0.015</mml:mn>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>6</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>4470.978</mml:mn>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>The overall regression was statistically significant (<italic>F</italic>&#x2009;=&#x2009;37.968, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.05), indicating a linear relationship between the independent and dependent variables.</p>
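<p>The lag alignment, chronological training split, and overall <italic>F</italic>-test described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' SPSS 26.0 workflow: the six Baidu index series and the incidence series are assumed to be monthly arrays over the same calendar months, the data are synthetic (generated from the coefficients of Equation 4 plus noise) because the raw series are not given here, the 90% split is assumed to be chronological, and the helper names <monospace>align_with_lag</monospace> and <monospace>fit_ols</monospace> are hypothetical. The overall statistic uses the standard form F = (R<sup>2</sup>/k)/((1 &#x2212; R<sup>2</sup>)/(n &#x2212; k &#x2212; 1)).</p>

```python
import numpy as np

LAG = 2            # months between the Baidu index (input) and incidence (output)
TRAIN_SHARE = 0.9  # share of aligned months used for fitting, as stated in the text


def align_with_lag(x, y, lag=LAG):
    """Pair the Baidu index of month t with the incidence of month t + lag.

    x: (n_months, n_terms) array of monthly Baidu index values.
    y: (n_months,) array of monthly incidence for the same calendar months.
    """
    return x[:-lag], y[lag:]


def fit_ols(x, y):
    """Ordinary least squares with an intercept, all predictors entered at once.

    Returns (coefficients, intercept, R^2, overall F statistic).
    """
    n, k = x.shape
    design = np.column_stack([x, np.ones(n)])       # last column = intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))  # df = (k, n - k - 1)
    return beta[:-1], beta[-1], r2, f_stat


# Synthetic stand-in data: Jan 2011 to Dec 2020 (120 months), six search terms.
rng = np.random.default_rng(0)
n_months, n_terms = 120, 6
x = rng.uniform(100, 1000, size=(n_months, n_terms))
true_coef = np.array([0.134, -0.163, -0.016, 0.156, -0.103, -0.015])  # Eq. (4)
y = np.full(n_months, 3000.0)  # Jan/Feb 2011 have no lagged input and are dropped
y[LAG:] = x[:-LAG] @ true_coef + 4470.978 + rng.normal(0, 50, n_months - LAG)

x_lag, y_lag = align_with_lag(x, y)             # Jan 2011-Oct 2020 -> Mar 2011-Dec 2020
n_train = int(round(TRAIN_SHARE * len(y_lag)))  # chronological split, no shuffling
coef, intercept, r2, f_stat = fit_ols(x_lag[:n_train], y_lag[:n_train])
print(f"R^2 = {r2:.3f}, F = {f_stat:.2f}")
```

<p>Splitting chronologically rather than at random keeps every test month later than every training month, which mirrors how the model would be used for forecasting; whether the authors split this way is not stated in this section.</p>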
</sec>
<sec id="sec15">
<title>Forecast results</title>
<p>According to the forecast results in <xref rid="tab3" ref-type="table">Table 3</xref>, the relative error was below 20% in every month except March 2020, and below 10% in five of the remaining nine months, indicating that the forecasts are relatively accurate. The relative error for March 2020 was considerably larger (38.61%), which we attribute to the offset caused by the &#x201C;Spring Festival effect.&#x201D;</p>
<table-wrap position="float" id="tab3">
<label>Table 3</label>
<caption>
<p>Multiple linear regression model prediction results.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Date</th>
<th align="center" valign="top">Predicted value</th>
<th align="center" valign="top">Actual value</th>
<th align="center" valign="top">Relative error</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">2020.03</td>
<td align="char" valign="top" char=",">3,123</td>
<td align="char" valign="top" char=",">2,253</td>
<td align="char" valign="top" char=".">38.61%</td>
</tr>
<tr>
<td align="left" valign="top">2020.04</td>
<td align="char" valign="top" char=",">3,155</td>
<td align="char" valign="top" char=",">2,674</td>
<td align="char" valign="top" char=".">17.98%</td>
</tr>
<tr>
<td align="left" valign="top">2020.05</td>
<td align="char" valign="top" char=",">3,054</td>
<td align="char" valign="top" char=",">2,719</td>
<td align="char" valign="top" char=".">12.32%</td>
</tr>
<tr>
<td align="left" valign="top">2020.06</td>
<td align="char" valign="top" char=",">3,099</td>
<td align="char" valign="top" char=",">2,744</td>
<td align="char" valign="top" char=".">12.93%</td>
</tr>
<tr>
<td align="left" valign="top">2020.07</td>
<td align="char" valign="top" char=",">3,129</td>
<td align="char" valign="top" char=",">2,898</td>
<td align="char" valign="top" char=".">7.97%</td>
</tr>
<tr>
<td align="left" valign="top">2020.08</td>
<td align="char" valign="top" char=",">2,939</td>
<td align="char" valign="top" char=",">2,679</td>
<td align="char" valign="top" char=".">9.70%</td>
</tr>
<tr>
<td align="left" valign="top">2020.09</td>
<td align="char" valign="top" char=",">2,585</td>
<td align="char" valign="top" char=",">2,676</td>
<td align="char" valign="top" char=".">3.40%</td>
</tr>
<tr>
<td align="left" valign="top">2020.10</td>
<td align="char" valign="top" char=",">2,272</td>
<td align="char" valign="top" char=",">2,379</td>
<td align="char" valign="top" char=".">4.49%</td>
</tr>
<tr>
<td align="left" valign="top">2020.11</td>
<td align="char" valign="top" char=",">2,237</td>
<td align="char" valign="top" char=",">2,493</td>
<td align="char" valign="top" char=".">10.27%</td>
</tr>
<tr>
<td align="left" valign="top">2020.12</td>
<td align="char" valign="top" char=",">2,312</td>
<td align="char" valign="top" char=",">2,146</td>
<td align="char" valign="top" char=".">7.73%</td>
</tr>
</tbody>
</table>
</table-wrap>
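<p>The relative errors in Table 3 are consistent with the usual definition, |predicted &#x2212; actual|/actual. The short Python sketch below (the function name <monospace>relative_error_pct</monospace> is hypothetical) recomputes them from the predicted and actual values listed in the table. Because the table lists rounded values, the printed figures agree with the published column only to within 0.01 percentage points (for example, 17.99% rather than 17.98% for April 2020).</p>

```python
# (month, predicted, actual) transcribed from Table 3.
TABLE_3 = [
    ("2020.03", 3123, 2253),
    ("2020.04", 3155, 2674),
    ("2020.05", 3054, 2719),
    ("2020.06", 3099, 2744),
    ("2020.07", 3129, 2898),
    ("2020.08", 2939, 2679),
    ("2020.09", 2585, 2676),
    ("2020.10", 2272, 2379),
    ("2020.11", 2237, 2493),
    ("2020.12", 2312, 2146),
]


def relative_error_pct(predicted, actual):
    """Relative error in percent: |predicted - actual| / actual * 100."""
    return abs(predicted - actual) / actual * 100.0


for month, predicted, actual in TABLE_3:
    print(f"{month}: {relative_error_pct(predicted, actual):.2f}%")
```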
</sec>
<sec id="sec16">
<title>Evaluation of the results</title>
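<p>The prediction error of the model was evaluated using the mean absolute error (<italic>MAE</italic>) and the mean absolute percentage error (<italic>MAPE</italic>), and its goodness of fit using the coefficient of determination (<italic>R<sup>2</sup></italic>) (<xref rid="tab4" ref-type="table">Table 4</xref>). <italic>MAE</italic> is the mean of the absolute differences between the predicted and actual values, and <italic>MAPE</italic> is the mean of those differences expressed as a percentage of the actual values; for both, smaller values indicate better predictions.</p>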
<table-wrap position="float" id="tab4">
<label>Table 4</label>
<caption>
<p>Prediction errors and goodness of fit of the multiple linear regression model.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Evaluation indicators</th>
<th align="center" valign="top">Multiple linear regression</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">
<italic>MAE</italic>
</td>
<td align="char" valign="top" char=".">318.06</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>MAPE</italic>
</td>
<td align="char" valign="top" char=".">10.23%</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>R<sup>2</sup></italic>
</td>
<td align="char" valign="top" char=".">0.67</td>
</tr>
</tbody>
</table>
</table-wrap>
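<p>The two error measures can be written as small Python functions; the names <monospace>mae</monospace> and <monospace>mape</monospace> are illustrative, not taken from the study. Applied to the ten rounded test-month values of Table 3, this sketch returns <italic>MAE</italic>&#x2009;=&#x2009;315.20 and <italic>MAPE</italic>&#x2009;=&#x2009;12.54%, not the 318.06 and 10.23% reported in Table 4. This section does not state which months or which unrounded predictions Table 4 was computed from, so the sketch illustrates the definitions rather than checking those figures.</p>

```python
def mae(predicted, actual):
    """Mean absolute error: mean of |predicted - actual|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mape(predicted, actual):
    """Mean absolute percentage error, in percent of the actual values."""
    return 100.0 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


# Rounded predicted and actual values for March to December 2020, from Table 3.
predicted = [3123, 3155, 3054, 3099, 3129, 2939, 2585, 2272, 2237, 2312]
actual = [2253, 2674, 2719, 2744, 2898, 2679, 2676, 2379, 2493, 2146]

print(f"MAE  = {mae(predicted, actual):.2f}")
print(f"MAPE = {mape(predicted, actual):.2f}%")
```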
<p><xref rid="fig4" ref-type="fig">Figure 4</xref> visualizes the values predicted by the &#x201C;leading 2-month&#x201D; model. Overall, the fit and predictive performance of the multiple linear regression model are satisfactory, with predicted values close to the actual values. The coefficient of determination (<italic>R<sup>2</sup></italic>) is 0.672, meaning that the independent variables explain 67.2% of the variation in the dependent variable, which suggests that the model has some ability to extrapolate. The predicted results broadly follow the actual epidemic trend, and the predicted epidemic waves largely coincide in time with the actual incidence. The multiple linear regression model therefore shows good predictive ability and can predict the epidemic trend of tuberculosis in a timely and effective manner.</p>
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Visualization of predicted results.</p>
</caption>
<graphic xlink:href="fpubh-11-1203628-g004.tif"/>
</fig>
</sec>
</sec>
</sec>
<sec sec-type="conclusions" id="sec17">
<title>Conclusion</title>
<sec id="sec18">
<title>There is a linear relationship between the Baidu index of search terms and actual incidence data</title>
<p>First, the search terms chosen for this study are consistent with the logic of users&#x2019; search behavior. They cover four main categories (prevention, treatment, symptoms, and common terms for tuberculosis), which makes full use of the health information sought by suspected cases and susceptible people before they visit a clinic and brings the prediction point forward to the incubation period or early onset. Second, the correlation analysis confirmed a linear correlation between the Baidu index data and the actual incidence data; 11 search terms were highly correlated with the actual incidence, indicating the potential of the Baidu index for predicting the prevalence of tuberculosis. Finally, by calculating how the correlation changes over time, search terms with &#x201C;synchronous&#x201D; and &#x201C;lagging&#x201D; characteristics were eliminated from the initially screened terms; retaining only those with &#x201C;leading&#x201D; characteristics advances the prediction point further, to the pre-epidemic period.</p>
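<p>One way to screen terms as &#x201C;leading,&#x201D; &#x201C;synchronous,&#x201D; or &#x201C;lagging&#x201D; from the time-series change of correlation is a lagged (cross-)correlation, sketched below in Python. This is a hedged illustration, not the authors' documented procedure: the &#xB1;3-month lag window, the rule of taking the lag with the largest absolute Pearson correlation, and the function names are all assumptions.</p>

```python
import numpy as np


def lagged_correlation(index, incidence, lag):
    """Pearson correlation between index[t] and incidence[t + lag].

    A positive lag compares the search index with later incidence.
    """
    if lag > 0:
        a, b = index[:-lag], incidence[lag:]
    elif lag == 0:
        a, b = index, incidence
    else:  # negative lag: compare with earlier incidence
        a, b = index[-lag:], incidence[:lag]
    return float(np.corrcoef(a, b)[0, 1])


def classify_term(index, incidence, max_lag=3):
    """Label a term by the lag with the strongest |correlation| (hypothetical rule)."""
    lags = range(-max_lag, max_lag + 1)
    best = max(lags, key=lambda k: abs(lagged_correlation(index, incidence, k)))
    if best > 0:
        return "leading", best
    if best == 0:
        return "synchronous", best
    return "lagging", best
```

<p>Under this rule, a term whose index best matches incidence two months later is labeled (&#x201C;leading&#x201D;, 2), which corresponds to the 2-month lead used in the regression model.</p>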
</sec>
<sec id="sec19">
<title>The forecasts are timely</title>
<p>Because of the 2-month lag between the input and output variables, the TB prediction model developed in this study can predict the trend and intensity of the next wave of the TB epidemic 2&#x2009;months in advance, which distinguishes it from traditional prediction models. Traditional models are based on previous incidence data; their principle is to predict outcomes by analyzing patterns in historical data. The data source of this study, the Baidu index, is derived from internet search data, which are real-time, rapidly available, and large in volume. People in the incubation period of tuberculosis and susceptible people search for health information according to their own symptoms, and the Baidu index is generated from this search behavior in real time. It can therefore capture dynamic changes in the actual prevalence and support timely monitoring of tuberculosis infection and prevalence. The prediction model is thus highly timely: it captures the health information of latent and susceptible people and can predict the epidemic trend of TB 2&#x2009;months in advance.</p>
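<p>To make the 2-month lead concrete, the fitted model in Equation (4) can be wrapped in a small forecasting function, sketched below in Python. This is a sketch under assumptions: the six inputs must be on the same scale as the Baidu index values used to fit the model (not given in this section), the coefficients are taken in the order x<sub>1</sub> to x<sub>6</sub> listed in the text, and <monospace>forecast</monospace> and <monospace>add_months</monospace> are hypothetical names.</p>

```python
# Coefficients of Equation (4), in the order x1..x6 listed in the text.
COEFFICIENTS = (0.134, -0.163, -0.016, 0.156, -0.103, -0.015)
INTERCEPT = 4470.978
LEAD_MONTHS = 2


def add_months(year, month, n):
    """Return (year, month) shifted forward by n months."""
    total = year * 12 + (month - 1) + n
    return total // 12, total % 12 + 1


def forecast(year, month, baidu_index):
    """Forecast incidence LEAD_MONTHS ahead from one month of Baidu index values.

    baidu_index: six values (x1..x6) observed in (year, month).
    Returns ((target_year, target_month), predicted incidence).
    """
    if len(baidu_index) != len(COEFFICIENTS):
        raise ValueError("expected six Baidu index values (x1..x6)")
    y = INTERCEPT + sum(b * x for b, x in zip(COEFFICIENTS, baidu_index))
    return add_months(year, month, LEAD_MONTHS), y
```

<p>For example, index values observed in November 2020 would yield a forecast labeled January 2021, so the estimate is available two months before the corresponding incidence is reported.</p>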
</sec>
</sec>
<sec sec-type="discussions" id="sec20">
<title>Discussion</title>
<p>In this paper, we constructed a prediction model for an infectious disease using web search data. Consistent with the conclusions of other researchers, search data can be a useful complement to traditional surveillance data (<xref ref-type="bibr" rid="ref17 ref18 ref19 ref20 ref21 ref22 ref23">17&#x2013;23</xref>).</p>
<p>The main innovation of this paper is its analysis of the temporal correlation of search terms, which allows the trend and intensity of the next wave of the TB epidemic to be predicted 2&#x2009;months in advance. This differs from other studies, in which search terms were analyzed only in terms of the magnitude of their correlation, without further exploration (<xref ref-type="bibr" rid="ref24 ref25 ref26 ref27 ref28 ref29">24&#x2013;29</xref>). In contrast, this paper provides an in-depth analysis of the time-series characteristics of search terms.</p>
</sec>
<sec id="sec21">
<title>Limitations</title>
<sec id="sec22">
<title>Further screening of search terms with high specificity</title>
<p>The next step in this research is to identify search terms with high specificity. In this paper, the search terms &#x201C;how to treat tuberculosis&#x201D; and &#x201C;tuberculosis treatment drugs&#x201D; were selected mainly because, given their search habits and the progression of the disease, people tend to search the Internet for practical and cost-effective treatments. The search terms selected in this study were classified into only four categories (&#x201C;prevention,&#x201D; &#x201C;treatment,&#x201D; &#x201C;symptoms,&#x201D; and &#x201C;commonly used words&#x201D;), and other search terms were not considered. The search terms mainly reflect information sought before a clinic visit, when people look for solutions online after the onset of early symptoms. Specialized terms such as &#x201C;BCG&#x201D; and &#x201C;chest X-ray&#x201D; are usually learned only during a consultation, after which patients follow medical advice rather than searching online. Clinical diagnostic terminology was therefore not included in this study. As a result, the set of search terms is not comprehensive and may omit some terms with high specificity. Future studies should also take non-linear relationships between the search terms and the actual data into account, eliminate redundant data, and retain data with high specificity.</p>
</sec>
<sec id="sec23">
<title>Extrapolation of the Baidu index-based prediction model to predictions related to respiratory diseases</title>
<p>The search terms selected for this study included common symptoms of respiratory infectious diseases. However, this study only explored the relationship between the search terms and the incidence of tuberculosis, without extending the analysis to other respiratory diseases. Future studies should consider not only the specificity of the search terms but also their generality.</p>
<p>The results and findings of this study could be assessed for other respiratory diseases, with the aim of detecting trends in the prevalence of infectious diseases in a timely manner and predicting outbreak peaks in advance, so as to minimize the impact of disease transmission on patients&#x2019; lives and property.</p>
</sec>
<sec id="sec24">
<title>Baidu index predictions should be extrapolated to other parts of China</title>
<p>Because Baidu indices differ between provinces in China, the findings of this paper demonstrate the feasibility of the forecasting method only for Jiangsu Province. Further studies should extend the model from this study to other areas of China.</p>
</sec>
</sec>
<sec sec-type="data-availability" id="sec25">
<title>Data availability statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="sec26">
<title>Author contributions</title>
<p>YW: research topic selection, design, data processing and analysis, and writing the manuscript. HZ: data checking and analysis and revising the manuscript. ML: research supervision, statistical analysis of data, and participation in data analysis and interpretation. LZ: research supervision, statistical analysis of data, and participation in data analysis and interpretation. BH: research idea development and research process coordination. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="funding-information" id="sec27">
<title>Funding</title>
<p>This work was supported by the Science and Technology Program Project of Xuzhou (No. KC20200) and Postgraduate Research and Practice Innovation Program of Jiangsu Province (No. SJCX23_1407).</p>
</sec>
<sec sec-type="COI-statement" id="sec28">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="ref1">
<label>1.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ginsberg</surname> <given-names>J</given-names></name> <name><surname>Mohebbi</surname> <given-names>M</given-names></name> <name><surname>Patel</surname> <given-names>R</given-names></name> <name><surname>Brammer</surname> <given-names>L</given-names></name> <name><surname>Smolinski</surname> <given-names>MS</given-names></name> <name><surname>Brilliant</surname> <given-names>L</given-names></name></person-group>. <article-title>Detecting influenza epidemics using search engine query data</article-title>. <source>Nature</source>. (<year>2009</year>) <volume>457</volume>:<fpage>1012</fpage>&#x2013;<lpage>4</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nature07634</pub-id>, PMID: <pub-id pub-id-type="pmid">19020500</pub-id></citation>
</ref>
<ref id="ref2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Z</given-names></name> <name><surname>Lai</surname> <given-names>SJ</given-names></name> <name><surname>Zhang</surname> <given-names>HL</given-names></name> <name><surname>Wang</surname> <given-names>L</given-names></name> <name><surname>Zhou</surname> <given-names>D</given-names></name> <name><surname>Liu</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>Hand, foot and mouth disease in China: evaluating an automated system for the detection of outbreaks</article-title>. <source>Bull World Health Organ</source>. (<year>2014</year>) <volume>92</volume>:<fpage>656</fpage>&#x2013;<lpage>63</lpage>. doi: <pub-id pub-id-type="doi">10.2471/BLT.13.130666</pub-id>, PMID: <pub-id pub-id-type="pmid">25378756</pub-id></citation>
</ref>
<ref id="ref3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Althouse</surname> <given-names>BM</given-names></name> <name><surname>Ng</surname> <given-names>Y</given-names></name> <name><surname>Cummings</surname> <given-names>D</given-names></name></person-group>. <article-title>Prediction of dengue incidence using search query surveillance</article-title>. <source>PLoS Negl Trop Dis</source>. (<year>2011</year>) <volume>5</volume>:<fpage>e1258</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pntd.0001258</pub-id></citation>
</ref>
<ref id="ref4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>X</given-names></name> <name><surname>Liu</surname> <given-names>F</given-names></name> <name><surname>Dong</surname> <given-names>J</given-names></name> <name><surname>Lv</surname> <given-names>B</given-names></name></person-group>. <article-title>Influenza surveillance in China based on internet search data</article-title>. <source>Syst Eng Theory Prac.</source> (<year>2013</year>) <volume>33</volume>:<fpage>3028</fpage>&#x2013;<lpage>34</lpage>.</citation>
</ref>
<ref id="ref5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y</given-names></name> <name><surname>Yakob</surname> <given-names>L</given-names></name> <name><surname>Bonsall</surname> <given-names>MB</given-names></name> <name><surname>Hu</surname> <given-names>W</given-names></name></person-group>. <article-title>Predicting seasonal influenza epidemics using cross-hemisphere influenza surveillance data and local internet query data</article-title>. <source>Sci Rep</source>. (<year>2019</year>) <volume>9</volume>:<fpage>3262</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-019-39871-2</pub-id>, PMID: <pub-id pub-id-type="pmid">30824756</pub-id></citation>
</ref>
<ref id="ref6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>G</given-names></name> <name><surname>Chen</surname> <given-names>Y</given-names></name> <name><surname>Chen</surname> <given-names>B</given-names></name> <name><surname>Wang</surname> <given-names>H</given-names></name> <name><surname>Shen</surname> <given-names>L</given-names></name> <name><surname>Liu</surname> <given-names>L</given-names></name> <etal/></person-group>. <article-title>Using the Baidu search index to predict the incidence of HIV/AIDS in China</article-title>. <source>Sci Rep</source>. (<year>2018</year>) <volume>8</volume>:<fpage>9038</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-018-27413-1</pub-id>, PMID: <pub-id pub-id-type="pmid">29899360</pub-id></citation>
</ref>
<ref id="ref7">
<label>7.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>C</given-names></name> <name><surname>Yang</surname> <given-names>Y</given-names></name> <name><surname>Wu</surname> <given-names>S</given-names></name> <name><surname>Wu</surname> <given-names>W</given-names></name> <name><surname>Xue</surname> <given-names>H</given-names></name> <name><surname>An</surname> <given-names>K</given-names></name> <etal/></person-group>. <article-title>Search trends and prediction of human brucellosis using Baidu index data from 2011 to 2018 in China</article-title>. <source>Sci Rep</source>. (<year>2020</year>) <volume>10</volume>:<fpage>5896</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-020-62517-7</pub-id>, PMID: <pub-id pub-id-type="pmid">32246053</pub-id></citation>
</ref>
<ref id="ref8">
<label>8.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>J</given-names></name> <name><surname>Zou</surname> <given-names>Y</given-names></name> <name><surname>Peng</surname> <given-names>Y</given-names></name> <name><surname>Li</surname> <given-names>K</given-names></name> <name><surname>Jiang</surname> <given-names>T</given-names></name></person-group>. <article-title>Research on the prediction of dengue fever epidemic based on Baidu index</article-title>. <source>Comput Applic Soft</source>. (<year>2016</year>) <volume>33</volume>:<fpage>42-46+78</fpage>. doi: <pub-id pub-id-type="doi">10.3969/j.issn.1000-386x.2016.07.010</pub-id></citation>
</ref>
<ref id="ref9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Milinovich</surname> <given-names>J</given-names></name> <name><surname>Avril</surname> <given-names>S</given-names></name> <name><surname>Clements</surname> <given-names>A</given-names></name> <name><surname>Brownstein</surname> <given-names>JS</given-names></name> <name><surname>Tong</surname> <given-names>S</given-names></name> <name><surname>Hu</surname> <given-names>W</given-names></name></person-group>. <article-title>Using internet search queries for infectious disease surveillance: screening diseases for suitability</article-title>. <source>BMC Infect Dis</source>. (<year>2014</year>) <volume>14</volume>:<fpage>690</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12879-014-0690-1</pub-id>, PMID: <pub-id pub-id-type="pmid">25551277</pub-id></citation>
</ref>
<ref id="ref10">
<label>10.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gong</surname> <given-names>X</given-names></name> <name><surname>Han</surname> <given-names>Y</given-names></name> <name><surname>Hou</surname> <given-names>M</given-names></name> <name><surname>Guo</surname> <given-names>R</given-names></name></person-group>. <article-title>Online public attention during the early days of the COVID-19 pandemic: Infoveillance study based on Baidu index</article-title>. <source>JMIR Public Health Surveill</source>. (<year>2020</year>) <volume>6</volume>:<fpage>e23098</fpage>. doi: <pub-id pub-id-type="doi">10.2196/23098</pub-id>, PMID: <pub-id pub-id-type="pmid">32960177</pub-id></citation>
</ref>
<ref id="ref11">
<label>11.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>M</given-names></name> <name><surname>Chen</surname> <given-names>H</given-names></name> <name><surname>Song</surname> <given-names>H</given-names></name> <name><surname>Wang</surname> <given-names>L</given-names></name> <name><surname>Zheng</surname> <given-names>T</given-names></name></person-group>. <article-title>Research progress of infectious disease prediction and early warning based on internet big data</article-title>. <source>China Public Health</source>. (<year>2021</year>) <volume>37</volume>:<fpage>1478</fpage>&#x2013;<lpage>82</lpage>. doi: <pub-id pub-id-type="doi">10.11847/zgggws1136289</pub-id></citation>
</ref>
<ref id="ref12">
<label>12.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>R</given-names></name>
</person-group>. <article-title>Influenza prediction mechanism and empirical research by incorporating Baidu index</article-title>. <source>J Intelligence</source>. (<year>2018</year>) <volume>37</volume>:<fpage>206</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.3772/j.issn.1000-0135.2018.02.009</pub-id></citation>
</ref>
<ref id="ref13">
<label>13.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>X</given-names></name> <name><surname>Li</surname> <given-names>L</given-names></name> <name><surname>Xu</surname> <given-names>W</given-names></name> <name><surname>Zhang</surname> <given-names>Y</given-names></name> <name><surname>Zhao</surname> <given-names>Z</given-names></name></person-group>. <article-title>Correlation analysis of specific keywords and Baidu index with influenza virus activity</article-title>. <source>China Public Health</source>. (<year>2016</year>) <volume>32</volume>:<fpage>1543</fpage>&#x2013;<lpage>6</lpage>. doi: <pub-id pub-id-type="doi">10.11847/zgggws2016-32-11-25</pub-id></citation>
</ref>
<ref id="ref14">
<label>14.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>L</given-names></name> <name><surname>Zhou</surname> <given-names>Z</given-names></name> <name><surname>Wu</surname> <given-names>Q</given-names></name> <name><surname>Meng</surname> <given-names>X</given-names></name> <name><surname>Qi</surname> <given-names>X</given-names></name> <name><surname>Wang</surname> <given-names>X</given-names></name> <etal/></person-group>. <article-title>Correlation analysis and prediction of influenza data with specific keywords</article-title>. <source>China Public Health</source>. (<year>2021</year>) <volume>37</volume>:<fpage>1813</fpage>&#x2013;<lpage>8</lpage>. doi: <pub-id pub-id-type="doi">10.11847/zgggws1132684</pub-id></citation>
</ref>
<ref id="ref15">
<label>15.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>W</given-names></name> <name><surname>Lan</surname> <given-names>Y</given-names></name> <name><surname>Lyu</surname> <given-names>W</given-names></name> <name><surname>Leng</surname> <given-names>ZW</given-names></name> <name><surname>Feng</surname> <given-names>LZ</given-names></name> <name><surname>Lai</surname> <given-names>SJ</given-names></name> <etal/></person-group>. <article-title>Establishment of multi-point trigger and multi-channel surveillance mechanism for intelligent early warning of infectious diseases in China</article-title>. <source>Liu xing bing xue za zhi</source>. (<year>2020</year>) <volume>41</volume>:<fpage>1753</fpage>&#x2013;<lpage>7</lpage>. doi: <pub-id pub-id-type="doi">10.3760/cma.j.cn112338-20200722-00972</pub-id>, PMID: <pub-id pub-id-type="pmid">32746606</pub-id></citation>
</ref>
<ref id="ref16">
<label>16.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>B</given-names></name> <name><surname>Wang</surname> <given-names>L</given-names></name> <name><surname>Sun</surname> <given-names>Y</given-names></name> <name><surname>Song</surname> <given-names>H</given-names></name></person-group>. <article-title>Research progress on early warning of infectious disease surveillance based on big data</article-title>. <source>China Public Health</source>. (<year>2016</year>) <volume>32</volume>:<fpage>1276</fpage>&#x2013;<lpage>9</lpage>. doi: <pub-id pub-id-type="doi">10.11847/zgggws2016-32-09-38</pub-id></citation>
</ref>
<ref id="ref17">
<label>17.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>Q</given-names></name> <name><surname>Nsoesie</surname> <given-names>E</given-names></name> <name><surname>Lu</surname> <given-names>B</given-names></name> <name><surname>Peng</surname> <given-names>G</given-names></name> <name><surname>Chunara</surname> <given-names>R</given-names></name> <name><surname>Brownstein</surname> <given-names>JS</given-names></name></person-group>. <article-title>Monitoring influenza epidemics in China with search query from Baidu</article-title>. <source>PLoS One</source>. (<year>2013</year>) <volume>8</volume>:<fpage>e64323</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0064323</pub-id></citation>
</ref>
<ref id="ref18">
<label>18.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McIver</surname> <given-names>DJ</given-names></name> <name><surname>Brownstein</surname> <given-names>JS</given-names></name></person-group>. <article-title>Wikipedia usage estimates prevalence of influenza-like illness in the United States in near real-time</article-title>. <source>PLoS Comput Biol</source>. (<year>2014</year>) <volume>10</volume>:<fpage>e1003581</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003581</pub-id>, PMID: <pub-id pub-id-type="pmid">24743682</pub-id></citation>
</ref>
<ref id="ref19">
<label>19.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Q</given-names></name> <name><surname>Gel</surname> <given-names>YR</given-names></name> <name><surname>Ramirez Ramirez</surname> <given-names>LL</given-names></name> <name><surname>Nezafati</surname> <given-names>K</given-names></name> <name><surname>Zhang</surname> <given-names>Q</given-names></name> <name><surname>Tsui</surname> <given-names>KL</given-names></name></person-group>. <article-title>Forecasting influenza in Hong Kong with Google search queries and statistical model fusion</article-title>. <source>PLoS One</source>. (<year>2017</year>) <volume>12</volume>:<fpage>e0176690</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0176690</pub-id>, PMID: <pub-id pub-id-type="pmid">28464015</pub-id></citation>
</ref>
<ref id="ref20">
<label>20.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>S</given-names></name> <name><surname>Santillana</surname> <given-names>M</given-names></name> <name><surname>Kou</surname> <given-names>SC</given-names></name></person-group>. <article-title>Accurate estimation of influenza epidemics using Google search data via ARGO</article-title>. <source>Proc Natl Acad Sci</source>. (<year>2015</year>) <volume>112</volume>:<fpage>14473</fpage>&#x2013;<lpage>8</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1515373112</pub-id>, PMID: <pub-id pub-id-type="pmid">26553980</pub-id></citation>
</ref>
<ref id="ref21">
<label>21.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lampos</surname> <given-names>V</given-names></name> <name><surname>Miller</surname> <given-names>AC</given-names></name> <name><surname>Crossan</surname> <given-names>S</given-names></name> <name><surname>Stefansen</surname> <given-names>C</given-names></name></person-group>. <article-title>Advances in nowcasting influenza-like illness rates using search query logs</article-title>. <source>Sci Rep</source>. (<year>2015</year>) <volume>5</volume>:<fpage>12760</fpage>. doi: <pub-id pub-id-type="doi">10.1038/srep12760</pub-id></citation>
</ref>
<ref id="ref22">
<label>22.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>S</given-names></name> <name><surname>Santillana</surname> <given-names>M</given-names></name> <name><surname>Brownstein</surname> <given-names>JS</given-names></name> <name><surname>Gray</surname> <given-names>J</given-names></name> <name><surname>Richardson</surname> <given-names>S</given-names></name> <name><surname>Kou</surname> <given-names>SC</given-names></name></person-group>. <article-title>Using electronic health records and internet search information for accurate influenza forecasting</article-title>. <source>BMC Infect Dis</source>. (<year>2017</year>) <volume>17</volume>:<fpage>332</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12879-017-2424-7</pub-id></citation>
</ref>
<ref id="ref23">
<label>23.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xiao</surname> <given-names>Q</given-names></name> <name><surname>Liu</surname> <given-names>H</given-names></name> <name><surname>Feldman</surname> <given-names>M</given-names></name></person-group>. <article-title>Tracking and predicting hand, foot, and mouth disease (HFMD) epidemics in China by Baidu queries</article-title>. <source>Epidemiol Infect</source>. (<year>2017</year>) <volume>145</volume>:<fpage>1699</fpage>&#x2013;<lpage>707</lpage>. doi: <pub-id pub-id-type="doi">10.1017/S0950268817000231</pub-id>, PMID: <pub-id pub-id-type="pmid">28222831</pub-id></citation>
</ref>
<ref id="ref24">
<label>24.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davidson</surname> <given-names>M</given-names></name> <name><surname>Haim</surname> <given-names>DA</given-names></name> <name><surname>Radin</surname> <given-names>J</given-names></name></person-group>. <article-title>Using networks to combine &#x201C;big data&#x201D; and traditional surveillance to improve influenza predictions</article-title>. <source>Sci Rep</source>. (<year>2015</year>) <volume>5</volume>:<fpage>8154</fpage>. doi: <pub-id pub-id-type="doi">10.1038/srep08154</pub-id>, PMID: <pub-id pub-id-type="pmid">25634021</pub-id></citation>
</ref>
<ref id="ref25">
<label>25.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lai</surname> <given-names>S</given-names></name> <etal/></person-group>. <article-title>The changing epidemiology of dengue in China, 1990&#x2013;2014: a descriptive analysis of 25 years of nationwide surveillance data</article-title>. <source>BMC Med</source>. (<year>2015</year>) <volume>13</volume>:<fpage>100</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12916-015-0336-1</pub-id>, PMID: <pub-id pub-id-type="pmid">25925417</pub-id></citation>
</ref>
<ref id="ref26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zimmer</surname> <given-names>C</given-names></name></person-group>. <article-title>Reconstructing the hidden states in time course data of stochastic models</article-title>. <source>Math Biosci</source>. (<year>2015</year>) <volume>269</volume>:<fpage>117</fpage>&#x2013;<lpage>29</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.mbs.2015.08.015</pub-id></citation>
</ref>
<ref id="ref27">
<label>27.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hickmann</surname> <given-names>KS</given-names></name> <name><surname>Fairchild</surname> <given-names>G</given-names></name> <name><surname>Priedhorsky</surname> <given-names>R</given-names></name> <name><surname>Generous</surname> <given-names>N</given-names></name> <name><surname>Hyman</surname> <given-names>JM</given-names></name> <name><surname>Deshpande</surname> <given-names>A</given-names></name> <etal/></person-group>. <article-title>Forecasting the 2013&#x2013;2014 influenza season using Wikipedia</article-title>. <source>PLoS Comput Biol</source>. (<year>2015</year>) <volume>11</volume>:<fpage>e1004239</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004239</pub-id>, PMID: <pub-id pub-id-type="pmid">25974758</pub-id></citation>
</ref>
<ref id="ref28">
<label>28.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ortiz</surname> <given-names>JR</given-names></name> <name><surname>Zhou</surname> <given-names>H</given-names></name> <name><surname>Shay</surname> <given-names>DK</given-names></name> <name><surname>Neuzil</surname> <given-names>KM</given-names></name> <name><surname>Fowlkes</surname> <given-names>AL</given-names></name> <name><surname>Goss</surname> <given-names>CH</given-names></name></person-group>. <article-title>Monitoring influenza activity in the United States: a comparison of traditional surveillance systems with Google flu trends</article-title>. <source>PLoS One</source>. (<year>2011</year>) <volume>6</volume>:<fpage>e18687</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0018687</pub-id>, PMID: <pub-id pub-id-type="pmid">21556151</pub-id></citation>
</ref>
<ref id="ref29">
<label>29.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Broniatowski</surname> <given-names>DA</given-names></name> <name><surname>Paul</surname> <given-names>MJ</given-names></name> <name><surname>Dredze</surname> <given-names>M</given-names></name></person-group>. <article-title>National and local influenza surveillance through Twitter: an analysis of the 2012&#x2013;2013 influenza epidemic</article-title>. <source>PLoS One</source>. (<year>2013</year>) <volume>8</volume>:<fpage>e83672</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0083672</pub-id></citation>
</ref>
</ref-list>
<fn-group>
<fn id="fn0001">
<p>
<sup>1</sup>
<ext-link xlink:href="http://index.Baidu.com/" ext-link-type="uri">http://index.Baidu.com/</ext-link>
</p>
</fn>
</fn-group>
</back>
</article>