<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2022.734347</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Location and Language Independent Fake Rumor Detection Through Epidemiological and Structural Graph Analysis of Social Connections</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Serpanos</surname> <given-names>Dimitrios</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1391926/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Xenos</surname> <given-names>Georgios</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1475086/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Tsouvalas</surname> <given-names>Billy</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1477537/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Computer Technology Institute and Press Diophantus</institution>, <addr-line>Patras</addr-line>, <country>Greece</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Electrical and Computer Engineering, University of Patras</institution>, <addr-line>Patras</addr-line>, <country>Greece</country></aff>
<aff id="aff3"><sup>3</sup><institution>Industrial Systems Institute/ATHENA</institution>, <addr-line>Patras</addr-line>, <country>Greece</country></aff>
<aff id="aff4"><sup>4</sup><institution>Computer Science Department, Virginia Commonwealth University</institution>, <addr-line>Richmond, VA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Jean-Gabriel Ganascia, Universit&#x000E9; Pierre et Marie Curie, France</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Abelino Jimenez, University of Chile, Chile; Sumi Helal, University of Florida, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Dimitrios Serpanos <email>serpanos&#x00040;cti.gr</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to AI for Human Learning and Behavior Change, a section of the journal Frontiers in Artificial Intelligence</p></fn>
<fn fn-type="equal" id="fn002"><p>&#x02020;These authors have contributed equally to this work</p></fn></author-notes>
<pub-date pub-type="epub">
<day>27</day>
<month>04</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>5</volume>
<elocation-id>734347</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>07</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>03</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Serpanos, Xenos and Tsouvalas.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Serpanos, Xenos and Tsouvalas</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>Detection and identification of misinformation and fake news is a complex problem that intersects several disciplines, ranging from sociology to computer science and mathematics. In this work, we focus on social media and analyze characteristics that are independent of the text language (language-independent) and of the social context (location-independent) and that are common to most social media, not only Twitter, which is the platform mostly analyzed in the literature. Specifically, we analyze temporal and structural characteristics of information flow in social networks and we evaluate the importance and effect of two different types of features in the detection of fake rumors. First, we extract epidemiological features by fitting epidemiological models to the spread of rumors; second, we extract graph-based features from the graph structure of the information cascade of the social graph. We evaluate these features for fake rumor detection in three configurations: (i) using only epidemiological features, (ii) using only graph-based features, and (iii) using the combination of epidemiological and graph-based features. Evaluation is performed with a Gradient Boosting classifier on two benchmark fake rumor detection datasets. Our results demonstrate that epidemiological models fit rumor propagation well, while graph-based features lead to more effective classification of rumors; the combination of epidemiological and graph-based features leads to improved performance.</p></abstract>
<kwd-group>
<kwd>misinformation</kwd>
<kwd>rumor propagation</kwd>
<kwd>rumor classification</kwd>
<kwd>epidemiological models</kwd>
<kwd>graph-based detection</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="8"/>
<equation-count count="10"/>
<ref-count count="39"/>
<page-count count="15"/>
<word-count count="8837"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Information diffusion describes the way in which information is disseminated through a network (Rogers et al., <xref ref-type="bibr" rid="B26">2014</xref>). Although the original meaning of the process encompasses interpersonal communication channels (Katz and Lazarsfeld, <xref ref-type="bibr" rid="B11">2017</xref>), given the increasing popularity of social media in the last two decades, information diffusion, and especially rumor propagation nowadays, tends to be linked with the dissemination of information over specific digital social networking platforms, such as Twitter. The Diffusion of Innovations (DOI) theory (Rogers et al., <xref ref-type="bibr" rid="B26">2014</xref>) states that information diffusion is not solely determined by attributes of the statement or the novelty introduced, but it is also integrally related to the communication channel properties, i.e., the individuals involved or other network characteristics. Considering the accelerated pace of information diffusion in the digital age, it becomes readily apparent that rumor propagation and fake news dissemination are developing into a complex issue with many interrelated parameters.</p>
<p>Three techniques have been mostly employed in tackling this issue: structural analysis, temporal analysis, and content-based techniques. Structural analysis refers to the analysis of information cascades and the communication channel properties, whereas temporal analysis takes into account the transitional characteristics of information diffusion and aims to arrive at conclusions by evaluating the diffusion activity based on time. Content analysis is the most commonly used technique; by capitalizing on Natural Language Processing (NLP) advances in the last decade, it manages to offer contextual characteristics to the information diffusion problem. An essential aspect of these techniques is their ability to generalize, i.e., whether the extracted results are appropriate for making a generalized claim. Content analysis provides some clear comparative advantages over the other two techniques, but it is location-specific and dependent on the <italic>ad hoc</italic> regional and social context (Siwakoti et al., <xref ref-type="bibr" rid="B27">2021</xref>); thus, it does not generalize easily across social media in various countries. On the other hand, a structural or temporal analysis of information diffusion offers universal results, which are location-independent and language-independent.</p>
<p>Significant research effort has been spent studying and modeling information diffusion on social media (Jin et al., <xref ref-type="bibr" rid="B9">2013</xref>, <xref ref-type="bibr" rid="B10">2014</xref>; Cannarella and Spechler, <xref ref-type="bibr" rid="B3">2014</xref>; Goel et al., <xref ref-type="bibr" rid="B6">2016</xref>). Fake news propagation and classification have received significant attention in the past few years and have been analyzed with a variety of techniques that include propagation-based, content-based and social context-based characteristics. However, existing work takes into account content information, which makes the analyses dependent on the language and the social context of users. This is also due to the scarcity of relevant datasets, which leads to results based on the few available datasets, mainly from Twitter (Ma et al., <xref ref-type="bibr" rid="B21">2017</xref>). In contrast, our work focuses on analysis that does not consider language and social context characteristics, enabling location-independent and language-independent fake rumor detection.</p>
<p>In this paper, we focus on the problem of rumor modeling and extraction. Although most of the literature adopts the term &#x0201C;fake news,&#x0201D; we adopt the term &#x0201C;rumor&#x0201D; and differentiate the two as follows: rumors are doubtful statements that cannot be easily verified, while fake news is intentionally fabricated information presented as true. The difference, although subtle, distinguishes the data samples to which our methodology applies from the generic samples of misinformation data that appear in the literature.</p>
<p>In our work, we perform temporal analysis through epidemiological models over the information diffusion on Twitter and we extract specific temporal features; the choice of Twitter is due to the availability of suitable datasets. Additionally, we analyze the graph structure of the information cascade of the communication channel and we extract graph-based features. Having extracted the features using these two techniques, we proceed to evaluate them in three classification schemes: (i) using only epidemiological features, (ii) using only graph-based features, and (iii) using a combination of epidemiological and graph-based features. More specifically, the contributions of our work are the following:
<list list-type="order">
<list-item><p>We show that epidemiological models, especially the SEIZ model, fit rumor propagation data, and we evaluate their performance on fake rumor classification tasks on Twitter datasets.</p></list-item>
<list-item><p>We construct a graph model of the information diffusion and we extract graph-based features of the propagation.</p></list-item>
<list-item><p>We perform binary and multi-class classification using the graph-based features and a combination of graph and epidemiological features, achieving higher performance.</p></list-item>
<list-item><p>We present a location-independent and language-independent fake rumor detection method.</p></list-item>
</list></p>
<p>The paper is organized as follows. Section Related Work presents prior work. In section A Location-Independent and Language-Independent Approach, we present the epidemiological models and the graph modeling method as well as the employed classification schemes; for the epidemiological models we describe how they are employed in fitting the rumor propagation data and for the graph models how feature extraction is accomplished. Section Experiments and Evaluation presents our evaluation datasets and the results of rumor classification performance using epidemiological features, graph-based features and the combination of both. Section Conclusions presents our conclusions and directions for future work.</p></sec>
<sec id="s2">
<title>Related Work</title>
<p>Epidemiological models have been used extensively to model information diffusion in complex networks. Such models classify the human population into different compartments and define different transitions between them to simulate the spread of information. The SI (Susceptible-Infected) model was originally proposed in 2001 (Pastor-Satorras and Vespignani, <xref ref-type="bibr" rid="B25">2001</xref>) indicating that epidemiological models can help to describe the propagation of information on scale-free networks. Later another variant, the SIS (Susceptible-Infected-Susceptible) model was introduced (Newman, <xref ref-type="bibr" rid="B24">2003</xref>) and used multiple times (Gross et al., <xref ref-type="bibr" rid="B8">2006</xref>; Jin et al., <xref ref-type="bibr" rid="B9">2013</xref>, <xref ref-type="bibr" rid="B10">2014</xref>); the model allows infected users to return to the Susceptible compartment. Other more complex models have been proposed and used, such as the SEIR (Susceptible-Exposed-Infected-Removed) model (Wang et al., <xref ref-type="bibr" rid="B31">2014</xref>), the S-SEIR model (Xu et al., <xref ref-type="bibr" rid="B34">2013</xref>), where the spread of information is dependent on its value to the user, and the SCIR model (Xuejun, <xref ref-type="bibr" rid="B35">2015</xref>) where a Contacted compartment is added, modeling how followers of a certain user react after that user posts an online message. Another approach (Cannarella and Spechler, <xref ref-type="bibr" rid="B3">2014</xref>) considers a modification of the SIR model where the adoption of an idea is considered an infection and its abandonment is considered a recovery. A more widely used model is the SEIZ (Susceptible-Exposed-Infected-Skeptics) model (Bettencourt et al., <xref ref-type="bibr" rid="B2">2006</xref>); this model can fit the long-term adoption of ideas.
Recently, the SEIZ model was employed to model the spread of fake and real news on Twitter (Jin et al., <xref ref-type="bibr" rid="B9">2013</xref>, <xref ref-type="bibr" rid="B10">2014</xref>). The authors proposed a simple method to classify news as either true or fake using the results of the SEIZ fitting; while the model performs well, the authors used only a small number of viral news stories to test their hypothesis.</p>
<p>Another promising and popular research direction related to the propagation of rumors and fake news is the exploitation of the diffusion graphs. Typically, graphs are constructed, representing information diffusion paths, and then features are extracted, which are used either to interpret the propagation, or to perform classification based on a scheme. In the following sections, we discuss how graphs and graph characteristics are employed to address the information diffusion problem. There have been several approaches aiming to provide a qualitative context in the way that information is disseminated. By studying the diffusion patterns in a social network along with the significance of individual nodes (users), one may recognize specific individuals as critically important to the information diffusion (Valente, <xref ref-type="bibr" rid="B29">1996</xref>). Such users, who may be characterized as &#x0201C;opinion leaders,&#x0201D; tend to be the connection between mass media and people in the community (Katz and Lazarsfeld, <xref ref-type="bibr" rid="B11">2017</xref>). Furthermore, they are more important than average individuals in the diffusion of information (Watts and Dodds, <xref ref-type="bibr" rid="B32">2007</xref>) and have been linked with the size and the structural virality of the diffusion (Goel et al., <xref ref-type="bibr" rid="B6">2016</xref>; Meng et al., <xref ref-type="bibr" rid="B23">2018</xref>). Along the same lines of research, efforts have also been directed to understand the clustering patterns of news propagation (Gonz&#x000E1;lez-Bail&#x000F3;n and Wang, <xref ref-type="bibr" rid="B7">2013</xref>). Analysis of such clusters has yielded a deeper comprehension of the importance of &#x0201C;brokers,&#x0201D; or users that connect otherwise separated clusters. 
Such qualitative interpretations of the information diffusion graph structure (nodes and edges) are crucial to understand how diffusion efficiency is related to the network users (Valente and Fujimoto, <xref ref-type="bibr" rid="B30">2010</xref>). Other research efforts, usually directed toward detection or classification schemes, focus more on extracting structural graph features relevant to diffusion. While there are attempts that concentrate on graph characteristics, such as the size of the cascade (graph), the root outgoing degrees, the followers count, or the geo-coordinates (Taxidou and Fischer, <xref ref-type="bibr" rid="B28">2014</xref>), most directions employ a combination of temporal, graph, and/or content analysis (Abulaish et al., <xref ref-type="bibr" rid="B1">2019</xref>; Lu and Li, <xref ref-type="bibr" rid="B18">2020</xref>; Wu et al., <xref ref-type="bibr" rid="B33">2020</xref>; Lotfi et al., <xref ref-type="bibr" rid="B17">2021</xref>). Although most research on information diffusion and rumor propagation contains text and content analysis, such features are location-specific and do not generalize universally (Siwakoti et al., <xref ref-type="bibr" rid="B27">2021</xref>). Based on this, in our work, we employ a framework that consists of a temporal aspect in the form of an epidemiological model, and a graph-based structure in the form of the information cascade of each rumor.</p>
<p>Baseline rumor detection approaches are categorized based on their different feature engineering methods. The categories are (i) user characteristics, (ii) social context and content, and (iii) propagation structure characteristics. Handcrafted features based on user characteristics include such features as the number of followers, number of posts, and relevant information referring to the user of the social media platform (Kwon et al., <xref ref-type="bibr" rid="B14">2013</xref>; Liu et al., <xref ref-type="bibr" rid="B15">2015</xref>). Feature engineering employing social context, content, and linguistic attributes has been proven to yield important results. Such features include word sequences, phrase inquiries, and the relation of specific language with sentiment (Castillo et al., <xref ref-type="bibr" rid="B4">2011</xref>; Yang et al., <xref ref-type="bibr" rid="B36">2012</xref>; Kwon et al., <xref ref-type="bibr" rid="B14">2013</xref>, <xref ref-type="bibr" rid="B13">2017</xref>; Ma et al., <xref ref-type="bibr" rid="B20">2015</xref>, <xref ref-type="bibr" rid="B19">2016</xref>, <xref ref-type="bibr" rid="B22">2018</xref>; Zhao et al., <xref ref-type="bibr" rid="B39">2015</xref>; Liu and Wu, <xref ref-type="bibr" rid="B16">2018</xref>; Yuan et al., <xref ref-type="bibr" rid="B38">2020</xref>; Choi et al., <xref ref-type="bibr" rid="B5">2021</xref>). Features based on the structural characteristics of the propagation have also been an integral part of several rumor detection approaches (Ma et al., <xref ref-type="bibr" rid="B21">2017</xref>, <xref ref-type="bibr" rid="B22">2018</xref>; Liu and Wu, <xref ref-type="bibr" rid="B16">2018</xref>). These graph-based features include information referring to the diffusion of a rumor in a social media network.</p>
<p>Several existing methods have explicitly proposed feature engineering, in order to improve the achieved accuracy (Castillo et al., <xref ref-type="bibr" rid="B4">2011</xref>; Yang et al., <xref ref-type="bibr" rid="B36">2012</xref>; Kwon et al., <xref ref-type="bibr" rid="B14">2013</xref>, <xref ref-type="bibr" rid="B13">2017</xref>; Liu and Wu, <xref ref-type="bibr" rid="B16">2018</xref>; Ma et al., <xref ref-type="bibr" rid="B22">2018</xref>). Below, we elaborate further on rumor detection approaches that include feature engineering and we discuss the baseline approaches of the field.</p>
<p>A decision tree classification (DTC) approach using word frequency, message, user, topic, and propagation characteristics to extract features has shown that such combinations allow for promising results (Castillo et al., <xref ref-type="bibr" rid="B4">2011</xref>). Another content-oriented approach includes inquiring for specific phrases in the message (Zhao et al., <xref ref-type="bibr" rid="B39">2015</xref>). The combination of content and user characteristics employing an RBF kernel Support Vector Machine (SVM-RBF) for classification (Yang et al., <xref ref-type="bibr" rid="B36">2012</xref>) has demonstrated that the combination of such features is valid and achieves good results, while Support Vector Machine algorithms have also been used to classify features combining the temporal evolution of the content, along with the relevant social context and the underlying linguistically expressed sentiment with great success (Ma et al., <xref ref-type="bibr" rid="B20">2015</xref>). A content-based approach using language pattern features from user comments and employing a Recurrent Neural Network (RNN) has also delivered important results (Ma et al., <xref ref-type="bibr" rid="B19">2016</xref>), while others have attempted to combine the structure and content, in the form of semantic analysis, to perform the classification tasks (Ma et al., <xref ref-type="bibr" rid="B22">2018</xref>). Along the same lines, content semantics and structural characteristics are the basis for feature engineering for similar attempts (Liu and Wu, <xref ref-type="bibr" rid="B16">2018</xref>). Approaches that rely more heavily on propagation have examined the temporal evolution of the propagation tree, with classification performed by an SVM classifier (Ma et al., <xref ref-type="bibr" rid="B21">2017</xref>).
A more holistic approach combines user characteristics, linguistic, and network features in a Random Forest classification scheme (Kwon et al., <xref ref-type="bibr" rid="B13">2017</xref>).</p></sec>
<sec id="s3">
<title>A Location-Independent and Language-Independent Approach</title>
<p>In this section, we present the specific epidemiological models employed to model rumor propagation in online social media. We define and describe the function of the fitting parameters of these models, and we present the governing formulae for each model. Furthermore, we elaborate on feature engineering, regarding the graph modeling component of our approach, and we present the method that combines the two modeling practices in a rumor classification scheme.</p>
<sec>
<title>Epidemiological Models</title>
<p>We use epidemiological models to model the diffusion of rumors and detect whether they are false or not, focusing on the SI and SEIZ models. The SI model is employed for its adaptability to scale-free networks (Pastor-Satorras and Vespignani, <xref ref-type="bibr" rid="B25">2001</xref>), which allows its extension to rumor propagation problems, while the SEIZ model has already been shown to model fake news diffusion effectively (Jin et al., <xref ref-type="bibr" rid="B9">2013</xref>, <xref ref-type="bibr" rid="B10">2014</xref>). Below, we present the model definitions and how they can specifically be used to model rumor diffusion on social networks.</p>
<sec>
<title>SI Model</title>
<p>The SI model classifies the total population of users (N) into two groups, namely the Susceptible (S) and Infected (I) compartments. A user is considered <italic>Infected</italic> if they have retweeted the original rumor tweet and <italic>Susceptible</italic> if they have not. Once Infected, a user stays indefinitely in the Infected state and cannot move back to the Susceptible state. This means that, in the beginning of the spread, the majority of users are in the Susceptible state and, after a sufficient time period, all users will end up in the Infected state. The rate of contact (state change) between the susceptible and infected populations per given unit of time dt is &#x003B2;.</p>
<p>The SI model is described formally by the following ordinary differential equations system:
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mi>S</mml:mi><mml:mi>I</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003B2;</mml:mi><mml:mi>S</mml:mi><mml:mi>I</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>N</italic>(<italic>t</italic>) &#x0003D; <italic>S</italic>(<italic>t</italic>) &#x0002B; <italic>I</italic>(<italic>t</italic>) is the total population.</p></sec>
<sec>
<title>SEIZ Model</title>
<p>The SEIZ model, as adopted by Jin et al. (<xref ref-type="bibr" rid="B9">2013</xref>), is composed of four different compartments, <italic>Susceptible</italic> (S), <italic>Exposed</italic> (E), <italic>Infected</italic> (I) and <italic>Skeptics</italic> (Z) and is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. The Susceptible state represents users who have not seen the original tweet, while the Exposed state represents the users who have seen the original rumor tweet but take some time before retweeting it. Infected users are those who have retweeted the rumor, and Skeptics are the users who have seen the tweet but have chosen not to retweet it. A Susceptible user in contact with a Skeptic (contact rate b) can transfer to the Skeptics state with probability l or to the Exposed state with probability (1&#x02013;l). Similarly, a Susceptible user in contact with an Infected user (contact rate &#x003B2;) can immediately believe the rumor and move to the Infected state with probability p or move to the Exposed state with probability (1&#x02013;p). Finally, an Exposed user can transfer to the Infected state in two different ways: (i) by coming in further contact with an Infected user, with contact rate &#x003C1;, or (ii) by adopting the rumor independently with rate &#x003B5;.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The SEIZ model.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-734347-g0001.tif"/>
</fig>
<p>The SEIZ model is defined by a set of different parameters (Jin et al., <xref ref-type="bibr" rid="B9">2013</xref>), as depicted in <xref ref-type="fig" rid="F1">Figure 1</xref>. The contact rates between the different user states quantify how often a user gets in contact with a user of another state (&#x003B2;, b, &#x003C1; for the S-I, S-Z, E-I transitions, respectively). These rates multiplied by the probability of a user changing state when in contact with another user (l, 1-l, p and 1-p for the S-Z, S-E, S-I and S-E transitions, respectively) give the effective rate at which users change states [bl, &#x003B2;p, b(1-l), &#x003B2;(1-p) for the S-Z, S-I, S-E via Z, S-E via I transitions, respectively]. Finally, an incubation rate (&#x003B5; for the E-I transition) defines how often a user changes state without getting in contact with any other user.</p>
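<p>As an illustration only, the compartments, contact rates, and transition probabilities described above can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in this work: the function names <monospace>seiz_rhs</monospace> and <monospace>simulate_seiz</monospace>, the forward-Euler integration step, and all numerical values are assumptions chosen only to exercise the code.</p>
<preformat preformat-type="code"><![CDATA[
```python
def seiz_rhs(state, params):
    """Time derivatives (dS, dE, dI, dZ) of the four SEIZ compartments."""
    s, e, i, z = state
    beta, b, rho, eps, p, l = params
    n = s + e + i + z                    # total population N
    s_meets_i = beta * s * i / n         # S users in contact with I users
    s_meets_z = b * s * z / n            # S users in contact with Z users
    e_meets_i = rho * e * i / n          # E users in contact with I users
    ds = -s_meets_i - s_meets_z
    de = (1 - p) * s_meets_i + (1 - l) * s_meets_z - e_meets_i - eps * e
    di = p * s_meets_i + e_meets_i + eps * e
    dz = l * s_meets_z
    return ds, de, di, dz


def simulate_seiz(state0, params, dt, steps):
    """Forward-Euler integration (an assumed, simple scheme); returns all states."""
    states = [tuple(state0)]
    for _ in range(steps):
        derivs = seiz_rhs(states[-1], params)
        states.append(tuple(x + dt * dx for x, dx in zip(states[-1], derivs)))
    return states


# Hypothetical parameter values and initial populations, for illustration only.
params = (0.5, 0.3, 0.2, 0.05, 0.4, 0.6)  # beta, b, rho, eps, p, l
trajectory = simulate_seiz((990.0, 0.0, 5.0, 5.0), params, dt=0.01, steps=1000)
final_s, final_e, final_i, final_z = trajectory[-1]
```
]]></preformat>
<p>Because the four derivatives sum to zero, the total population N stays constant along the simulated trajectory, and the Infected compartment can only grow, which matches the cumulative retweet counts that the model is meant to describe.</p>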
<p>The SEIZ model is represented by the following ordinary differential equations system (Jin et al., <xref ref-type="bibr" rid="B9">2013</xref>):
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mi>S</mml:mi><mml:mfrac><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>b</mml:mi><mml:mi>S</mml:mi><mml:mfrac><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>p</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x003B2;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>S</mml:mi><mml:mfrac><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:mi>b</mml:mi><mml:mi>S</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003C1;</mml:mi><mml:mi>E</mml:mi><mml:mfrac><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003F5;</mml:mi><mml:mi>E</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>p</mml:mi><mml:mi>&#x003B2;</mml:mi><mml:mi>S</mml:mi><mml:mfrac><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003C1;</mml:mi><mml:mi>E</mml:mi><mml:mfrac><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003F5;</mml:mi><mml:mi>E</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>l</mml:mi><mml:mi>b</mml:mi><mml:mi>S</mml:mi><mml:mfrac><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p></sec>
<sec>
<title>Fitting&#x02014;Parameter Identification</title>
<p>For each rumor we quantify the model parameters in order to use them as features for classifying the rumors. To achieve this, we calculate, for each model and each rumor, the optimal parameter values as well as the optimal initial populations of the compartments and the total population (N). For both the SI and the SEIZ models we perform a least squares fitting on the Twitter data. As described earlier, we consider everyone who has retweeted or replied to an original tweet as Infected (I).</p>
<p>To fit the epidemiological models on the datasets, we first pre-process the raw data to construct sequences that give the cumulative volume of retweets per time interval. In our approach we use 1 min time intervals. We also observe that most of the retweets happen early in the diffusion tree, and thus we restrict the fitting of the models to the first 240 h (10 days) of diffusion. More specifically, we perform several fittings using limits of 48, 72, 120, and 240 h. Finally, we fit the models using a time limit of just 4 h to evaluate whether the models are able to fit the early diffusion of rumors. <xref ref-type="fig" rid="F2">Figure 2</xref> presents an example of this fitting for both the SI and SEIZ models.</p>
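<p>The pre-processing step can be illustrated with a minimal Python sketch that builds the cumulative retweet count per 1 min interval up to a chosen time limit. The function name, the input format (retweet delays in minutes since the root tweet) and the example values are illustrative assumptions, not a description of the exact implementation.</p>
<preformat><![CDATA[
```python
def cumulative_retweets(delays_min, limit_hours, step_min=1):
    """Cumulative number of retweets at the end of each time interval.

    delays_min: retweet delays in minutes since the root tweet (assumed format).
    limit_hours: only the first `limit_hours` hours of diffusion are kept.
    """
    n_bins = int(limit_hours * 60 / step_min)
    per_bin = [0] * n_bins
    for delay in delays_min:
        index = int(delay // step_min)
        if index in range(n_bins):  # drop retweets outside the time limit
            per_bin[index] += 1
    cumulative, running = [], 0
    for count in per_bin:
        running += count
        cumulative.append(running)
    return cumulative


# Hypothetical delays (minutes) for one rumor; the last one exceeds the 4 h limit.
example = cumulative_retweets([0.5, 1.2, 1.8, 30.0, 500.0], limit_hours=4)
```
]]></preformat>
<p>With a 4 h limit and 1 min intervals the resulting sequence has 240 entries, and retweets that occur after the limit are ignored.</p>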
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>SI and SEIZ fitting examples for the first 72 h of diffusion of a particular rumor.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-734347-g0002.tif"/>
</fig>
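<p>The objective minimized by the least squares fitting can be sketched as follows. The sketch integrates Eqs. 4 to 6 with a forward Euler scheme and returns the sum of squared differences between the simulated Infected curve and an observed cumulative sequence; a numerical optimizer would then minimize this quantity over the parameters and the initial populations. The equation used for S is an assumption obtained from conservation of the total population given Eqs. 4 to 6, and the function names, the integration scheme, the step size and all numeric values are illustrative assumptions as well.</p>
<preformat><![CDATA[
```python
def seiz_step(state, params, dt):
    """One forward Euler step of the SEIZ equations."""
    s, e, i, z = state
    p, beta, l, b, rho, eps = params
    n = s + e + i + z        # total population, constant under these equations
    si = beta * s * i / n    # contacts between S and I
    sz = b * s * z / n       # contacts between S and Z
    ei = rho * e * i / n     # contacts between E and I
    ds = -si - sz                                     # assumed from conservation
    de = (1 - p) * si + (1 - l) * sz - ei - eps * e   # Eq. 4
    di = p * si + ei + eps * e                        # Eq. 5
    dz = l * sz                                       # Eq. 6
    return (s + dt * ds, e + dt * de, i + dt * di, z + dt * dz)


def simulate_infected(state, params, n_points, dt=1.0):
    """Infected population at n_points successive time steps, starting at `state`."""
    infected = []
    for _ in range(n_points):
        infected.append(state[2])
        state = seiz_step(state, params, dt)
    return infected


def squared_error(observed, state, params):
    """Least squares objective: sum of squared residuals of the Infected curve."""
    fitted = simulate_infected(state, params, len(observed))
    return sum((f - o) ** 2 for f, o in zip(fitted, observed))


# Illustrative values only: (S, E, I, Z) populations and (p, beta, l, b, rho, eps).
state0 = (900.0, 50.0, 10.0, 40.0)
params0 = (0.3, 0.05, 0.2, 0.02, 0.01, 0.005)
curve = simulate_infected(state0, params0, n_points=240)
```
]]></preformat>
<p>Evaluating the objective on a curve generated with the same parameters gives zero, whereas any other parameter choice gives a positive value, which is the property the optimizer exploits.</p>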
<p>For each fitting we measure the root mean square error (RMSE) between the fitted and the observed cumulative retweet time sequences. We then average the RMSE over each dataset to evaluate and compare the fitting quality for every dataset and time limit.</p></sec></sec>
<sec>
<title>Graph Models</title>
<p>Graph modeling constitutes our second approach; here we describe the construction of the graphs and the extraction of graph-based features. Since we use Twitter datasets for the evaluation of our methods, the description takes into account the features of these specific datasets, although the method can also be applied to graphs extracted from datasets that originate from other social platforms. In the case of the Twitter datasets, it is important to note that the datasets are labeled. Each rumor cascade is labeled as True, False, Unverified, or Non-Rumor.</p>
<sec>
<title>Graph Modeling</title>
<p>Considering the dataset characteristics, the directed graph structure is composed of nodes that represent the Twitter users involved in the cascade, and edges that represent the retweeting action performed by the destination user on the message appearing on the source user&#x00027;s timeline. The edges&#x00027; weights are equal to the absolute time of diffusion. The root user is considered to send the message at time <italic>t</italic> = 0.</p>
<p>For the graph modeling of every cascade in the datasets, we note that the graph edge weights, which represent the time of the retweet, are absolute in value. In case of a negative weight appearing in the data (due to dataset inconsistencies), we re-calibrate the whole cascade by adding an offset value to all diffusion times, following Eq. 7. Furthermore, we consider that diffusion takes place only when a user has retweeted the message. Thus, each destination user appears only once, since a user can retweet a message at most once. We also note that, since we only consider the retweeting action as the edge connection of the graph, a source user cannot be a destination user (retweeting is only one-way). Following these considerations, the resulting rumor propagation graph is acyclic.
<disp-formula id="E7"><label>(7)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>f</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mi>f</mml:mi><mml:mi>u</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mtext>_</mml:mtext><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mrow><mml:msub><mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mi>f</mml:mi><mml:mi>u</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mtext>_</mml:mtext><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mrow><mml:msub><mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0003C;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
In <xref ref-type="fig" rid="F3">Figure 3</xref>, we present the graphical representation of a false story cascade from Twitter16, which demonstrates the above characteristics. In this example, the root is clearly the user node with the most outgoing edges.</p>
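<p>The re-calibration of Eq. 7 amounts to a uniform shift of the diffusion times of a cascade, as in the following minimal sketch. The function name and the example values are hypothetical.</p>
<preformat><![CDATA[
```python
def recalibrate(diffusion_times):
    """Shift all diffusion times of one cascade so that none is negative (Eq. 7)."""
    earliest = min(diffusion_times)
    if earliest >= 0:
        return list(diffusion_times)  # no negative weight, nothing to change
    offset = abs(earliest)
    return [t + offset for t in diffusion_times]


# Hypothetical cascade with an inconsistent (negative) timestamp, in minutes.
shifted = recalibrate([-2.5, 0.0, 1.0, 7.5])
unchanged = recalibrate([0.0, 3.0])
```
]]></preformat>
<p>Because the same offset is added to every diffusion time, the differences between any two diffusion times in the cascade are preserved.</p>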
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Graph of false story cascade from the Twitter16 dataset.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-734347-g0003.tif"/>
</fig></sec>
<sec>
<title>Feature Extraction</title>
<p>Given the constructed graph, we extract the relevant features. For each cascade, the graph-based feature set contains the following:
<list list-type="bullet">
<list-item><p>Average degree: The average number of outgoing edges for every user.</p></list-item>
<list-item><p>Root degree: The total outgoing edges of the root user.</p></list-item>
<list-item><p>Structural virality: A graph-theoretic metric for measuring the structural diversity of information diffusion. It describes the manner of diffusion: a single broadcast or spread through several cascading individual nodes.</p></list-item>
<list-item><p>Closeness centrality: A measure of the nodes&#x00027; capability to spread information efficiently through the graph. It measures the inverse distance of a node to all other nodes.</p></list-item>
<list-item><p>Number of users: The number of unique Twitter users involved in a cascade.</p></list-item>
<list-item><p>Max hops: The maximum number of hops (node to node via edge) possible in a cascade.</p></list-item>
<list-item><p>Nodes of levels 0 and 1: The number of users at one and two degrees of separation from the root user respectively.</p></list-item>
<list-item><p>Baseline average diffusion time for levels 0 and 1: The average diffusion time for users at one and two degrees of separation away from the root user respectively (average calculated by sum of edge weights divided by population involved on level).</p></list-item>
<list-item><p>Average diffusion time (averaged over all cascade users) for levels 0 and 1: The average diffusion time for users at one and two degrees of separation away from the root user respectively (average calculated by sum of edge weights divided by population involved in whole cascade).</p></list-item>
</list></p>
<p>The last two features are both closely related to the levels of the diffusion. We note that for these features, the calculations are performed with no upper time threshold. However, we also calculate both of them using an upper time threshold, so as to simulate early detection and to be comparable with the epidemiological model results. The calculations involve upper limits of 4, 24, 48, and 72 h.</p>
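<p>A subset of the listed features can be computed from an edge list as in the following sketch. It assumes that each cascade is available as a list of (source user, destination user, diffusion time) tuples forming a tree, and that the root counts as a cascade user; the function name, the dictionary keys and the example cascade are hypothetical. Structural virality and closeness centrality are omitted for brevity.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def cascade_features(root, edges):
    """A subset of the graph-based features for one cascade.

    edges: list of (source_user, destination_user, diffusion_time) tuples,
    assumed to form a tree rooted at `root`.
    """
    children = defaultdict(list)
    for source, destination, time in edges:
        children[source].append((destination, time))

    # Breadth-first traversal; hop 1 corresponds to level 0 in the text.
    users_per_hop = defaultdict(int)
    time_per_hop = defaultdict(float)
    frontier, hop, n_users = [root], 0, 1
    while frontier:
        next_frontier = []
        for user in frontier:
            for child, time in children[user]:
                users_per_hop[hop + 1] += 1
                time_per_hop[hop + 1] += time
                next_frontier.append(child)
        n_users += len(next_frontier)
        frontier = next_frontier
        if frontier:
            hop += 1

    features = {
        "number_of_users": n_users,
        "root_degree": len(children[root]),
        "average_degree": len(edges) / n_users,
        "max_hops": hop,
    }
    for level in (0, 1):
        count = users_per_hop[level + 1]
        total = time_per_hop[level + 1]
        features[f"nodes_level_{level}"] = count
        features[f"baseline_avg_time_level_{level}"] = total / count if count else 0.0
        features[f"avg_time_all_users_level_{level}"] = total / n_users
    return features


# Hypothetical cascade: root "r" is retweeted by "a" and "b"; "c" retweets from "a".
example_features = cascade_features(
    "r", [("r", "a", 1.0), ("r", "b", 3.0), ("a", "c", 5.0)]
)
```
]]></preformat>
<p>An upper time threshold, as used for early detection, could be applied by filtering the edge list on the diffusion time before calling the function.</p>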
<p>We use the collected features for the calculation of the average diffusion times per level of diffusion for each type of cascade (True, False, Unverified, Non-Rumor). We calculate the average diffusion time per level as follows:
<disp-formula id="E8"><label>(8)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>a</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mo>&#x02211;</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mi>f</mml:mi><mml:mi>u</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mtext>_</mml:mtext><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mtext>_</mml:mtext><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mtext>_</mml:mtext><mml:mi>u</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02208;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mtext>_</mml:mtext><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mtext>_</mml:mtext><mml:mi>u</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02208;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>&#x000B7;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:mi>n</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mtext>_</mml:mtext><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mtext>_</mml:mtext><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mtext>_</mml:mtext><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mtext>_</mml:mtext><mml:mi>n</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mtext>_</mml:mtext><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mtext>_</mml:mtext><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mtext>_</mml:mtext><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mtext>_</mml:mtext><mml:mi>s</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mtext>_</mml:mtext><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where the number of level instances refers to the number of cascades whose max hops value is greater than or equal to the specific level.</p></sec></sec>
<sec>
<title>Graph and Epidemiological Models Combination</title>
<p>After identifying the appropriate epidemiological models and extracting the relevant features, we perform tests to find a potential correlation between the epidemiological and the graph-based features. We did not identify any such correlation. Thus, we merge the two feature sets in order to achieve higher classification accuracy. The motivation for merging the two feature sets is that the epidemiological features represent temporal characteristics of the rumor diffusion, whereas the graph-based features represent structural characteristics of the diffusion cascade of each rumor. Combining them therefore enables a holistic approach to the analysis of the rumor diffusion process in space and time.</p></sec>
<sec>
<title>Classification</title>
<p>We test our models for rumor detection and classification. First, we use the fitted epidemiological model parameters to classify the rumors using the labels provided in the dataset. We consider two distinct classification tasks: a multi-class classification that predicts one of the four labels (True rumor, False rumor, Unverified rumor, and Non-rumor), and a binary classification in which we consider only the True and False rumors. To train our classifiers we use a Gradient Boosting Trees algorithm with the fitted model parameters as features and very light hyperparameter tuning. For both the graph-based features and the combined feature set containing graph and epidemiological features, we again employ Gradient Boosting Trees, because both graph and epidemiological features are easily tabulated and the algorithm works well for numerical and categorical values, as in our case. The Gradient Boosting Trees algorithm combines individual decision trees sequentially and requires very little data pre-processing; thus, it is well suited to the particular data of the rumor diffusion problem.</p>
<p>We split each dataset 75/25 into a training and a test set; this is typical in prior work that uses these datasets. To evaluate the performance of the classifiers, we measure their accuracy, F1 score, precision, and recall on both classification tasks.</p>
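<p>The evaluation metrics can be sketched in a few lines of Python. The sketch assumes support-weighted averaging of the per-class precision and recall, which is not specified in the text; under that assumption the recall coincides with the accuracy, which would be consistent with the identical accuracy and recall columns in Tables 3 and 4. The function name and the toy labels are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision and recall over all classes."""
    support = Counter(y_true)      # number of true instances per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, q in zip(y_true, y_pred) if t == q)
    total = len(y_true)
    accuracy = sum(correct.values()) / total
    precision = sum(
        support[c] / total * (correct[c] / predicted[c] if predicted[c] else 0.0)
        for c in support
    )
    recall = sum(support[c] / total * correct[c] / support[c] for c in support)
    return accuracy, precision, recall


# Toy labels for illustration only.
truth = ["True", "True", "False", "False", "Unverified", "Non-Rumor"]
guess = ["True", "False", "False", "False", "Unverified", "True"]
scores = weighted_scores(truth, guess)
```
]]></preformat>
<p>With other averaging schemes, such as an unweighted mean over classes, the recall would in general differ from the accuracy.</p>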
<p>Next, we test whether the SEIZ model can detect false rumors by using a single ratio, RSI, of the fitted parameters, as suggested by Jin et al. (<xref ref-type="bibr" rid="B9">2013</xref>). Their work suggests that larger ratios correspond to real news propagation, while smaller values correspond to fake news spreading. The RSI value is a combination of the SEIZ fitted parameters for each rumor and is defined in Eq. 9, where p, &#x003B2;, l, b, &#x003C1;, and &#x003F5; are the SEIZ fitted parameters:
<disp-formula id="E10"><label>(9)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mi>I</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>p</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>&#x003B2;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003C1;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003F5;</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
In <xref ref-type="fig" rid="F4">Figure 4</xref>, we present the RSI values of different rumors; True rumors are shown in blue and False rumors in red.</p>
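<p>Eq. 9 translates directly into code, as in the following minimal sketch. The function name is hypothetical and the parameter values are illustrative, not fitted to any rumor.</p>
<preformat><![CDATA[
```python
def r_si(p, beta, l, b, rho, eps):
    """Ratio R_SI of Eq. 9 computed from the fitted SEIZ parameters."""
    return ((1 - p) * beta + (1 - l) * b) / (rho + eps)


# Illustrative parameter values, not fitted to any dataset.
value = r_si(p=0.5, beta=0.5, l=0.5, b=0.5, rho=0.25, eps=0.25)
```
]]></preformat>
<p>The numerator collects the rates of the transitions into the Exposed compartment in Eq. 4, and the denominator collects the rates of the transitions out of it.</p>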
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>RSI values of different rumors. Red = False rumors, Blue = True rumors.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-734347-g0004.tif"/>
</fig>
<p>From the figure it is apparent that it is difficult to find a specific threshold value of RSI that would adequately classify the rumors. Based on these results, we consider a more complex detection technique in which a K-nearest neighbors algorithm classifies a rumor based on its &#x0201C;closest neighbors.&#x0201D; We test two different values for the number of neighbors considered for each rumor: 5 and 20.</p></sec></sec>
<sec id="s4">
<title>Experiments and Evaluation</title>
<p>In this section we present our experiments, including the necessary pre-processing, and our evaluation results. We present the results of rumor modeling and classification using epidemiological features, graph-based features as well as using their combination. Finally, we compare our method with other similar existing approaches to misinformation diffusion.</p>
<sec>
<title>Dataset and Data Pre-Processing</title>
<p>We work with two well-known publicly available datasets, Twitter15 and Twitter16 (Ma et al., <xref ref-type="bibr" rid="B21">2017</xref>). The datasets describe the diffusion trees of various rumors on Twitter. More specifically, they provide temporal information about every retweet of an original tweet containing a specific rumor posted by a single root user at <italic>t</italic> = 0.</p>
<p>An important limitation we should consider is that every rumor included in the datasets has a single root, meaning that we only have information about how the rumor propagated after a single user shared it on Twitter. In reality, we would expect multiple users to share a rumor independently, thus creating multiple independent parallel propagation trees.</p>
<p>We use Twitter15 and Twitter16 because, although other Twitter datasets are publicly available, most of them do not publish the actual propagation information and instead provide unique identifiers (IDs) of tweets and users that can be used to reconstruct the propagation tree through the Twitter API. This makes it very difficult to reproduce the original propagation sequences, as significant numbers of users delete their tweets and some even delete their accounts.</p>
<p>Twitter15 and Twitter16 consist of 1,490 and 818 rumor propagation trees, respectively. The rumors themselves are labeled as True rumors, False rumors, Unverified rumors or Non-rumor events (Ma et al., <xref ref-type="bibr" rid="B21">2017</xref>). True, False and Unverified labels refer to tweets containing unsubstantiated claims that could not be verified at the time of posting. Tweets that eventually get verified are labeled as True, tweets proved to contain fake claims are labeled as False, while tweets that contain information that can be neither confirmed nor disproved are labeled as Unverified. Finally, tweets that contain legitimate, fact-based information are considered Non-rumors. <xref ref-type="table" rid="T1">Table 1</xref> summarizes key statistics of the datasets (Ma et al., <xref ref-type="bibr" rid="B21">2017</xref>).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Statistics of Twitter15 and Twitter16 datasets (Ma et al., <xref ref-type="bibr" rid="B21">2017</xref>).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Statistic</bold></th>
<th valign="top" align="center"><bold>Twitter15</bold></th>
<th valign="top" align="center"><bold>Twitter16</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Number of users</td>
<td valign="top" align="center">276,663</td>
<td valign="top" align="center">173,487</td>
</tr>
<tr>
<td valign="top" align="left">Number of source tweets</td>
<td valign="top" align="center">1,490</td>
<td valign="top" align="center">818</td>
</tr>
<tr>
<td valign="top" align="left">Average time length/tree</td>
<td valign="top" align="center">1,337 h</td>
<td valign="top" align="center">848 h</td>
</tr>
<tr>
<td valign="top" align="left">Average posts/tree</td>
<td valign="top" align="center">223</td>
<td valign="top" align="center">251</td>
</tr>
<tr>
<td valign="top" align="left">Max posts/tree</td>
<td valign="top" align="center">1,768</td>
<td valign="top" align="center">2,765</td>
</tr>
<tr>
<td valign="top" align="left">Min posts/tree</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">81</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We also perform minimal data cleaning because we identified some minor inconsistencies in the datasets. Specifically, we noticed that some propagation trees contained negative timestamps and did not uniformly designate the root user who first tweeted the rumor. To correct these inconsistencies, we shifted all timestamps in the affected propagation trees so that none is negative and rearranged the trees so that the root user is designated consistently in all rumors.</p></sec>
<sec>
<title>Epidemiological Model Results</title>
<p><xref ref-type="table" rid="T2">Table 2</xref> presents the average root mean square error (RMSE) for the different time limits on the Twitter15 and Twitter16 datasets (lower values denote better fitting). The SI model, despite its simplicity, fits the data well and achieves a lower error than the more complex SEIZ model for the 240 h time limit on both datasets, and also for the 120 h limit on Twitter16. The SEIZ model achieves a lower error in all other experimental setups, most notably in the 4 h early detection setting, where the diffusion is more rapid.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Twitter15 and Twitter16 fitting error (RMSE).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Twitter15</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Twitter16</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>SI</bold></th>
<th valign="top" align="center"><bold>SEIZ</bold></th>
<th valign="top" align="center"><bold>SI</bold></th>
<th valign="top" align="center"><bold>SEIZ</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">240 h</td>
<td valign="top" align="center">19.55</td>
<td valign="top" align="center">21.592</td>
<td valign="top" align="center">21.38</td>
<td valign="top" align="center">25.578</td>
</tr>
<tr>
<td valign="top" align="left">120 h</td>
<td valign="top" align="center">24.69</td>
<td valign="top" align="center">24.41</td>
<td valign="top" align="center">26.08</td>
<td valign="top" align="center">29.661</td>
</tr>
<tr>
<td valign="top" align="left">72 h</td>
<td valign="top" align="center">29.29</td>
<td valign="top" align="center">26.83</td>
<td valign="top" align="center">30.98</td>
<td valign="top" align="center">28.075</td>
</tr>
<tr>
<td valign="top" align="left">48 h</td>
<td valign="top" align="center">32.88</td>
<td valign="top" align="center">24.126</td>
<td valign="top" align="center">34.86</td>
<td valign="top" align="center">29.232</td>
</tr>
<tr>
<td valign="top" align="left">4 h</td>
<td valign="top" align="center">35.24</td>
<td valign="top" align="center">6.855</td>
<td valign="top" align="center">34.66</td>
<td valign="top" align="center">6.205</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="table" rid="T3">Table 3</xref> presents the multiclass classification results for both models, where rumors are classified as True, False, Unverified or Non-Rumor in both Twitter15 and Twitter16 datasets. <xref ref-type="table" rid="T4">Table 4</xref> presents the binary classification results, where rumors are classified as either True or False. The baseline accuracy is approximately 25% for the multiclass classification, as four classes are present, and approximately 50% for the binary classification task, as the dataset is almost perfectly balanced. The results indicate that both models perform similarly at classifying the rumors, and that the different time limits have a limited impact on performance. Interestingly, the models generally perform better on the Twitter16 dataset, indicating that Twitter16 is an easier dataset for our classifiers.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Twitter15 and Twitter16 SI and SEIZ four-class classification results (%).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>Twitter15</bold></th>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>Twitter16</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1 score</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1 score</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="9"><bold>SI</bold></td>
</tr>
<tr>
<td valign="top" align="left">240 h</td>
<td valign="top" align="center">38.98</td>
<td valign="top" align="center">37.80</td>
<td valign="top" align="center">37.56</td>
<td valign="top" align="center">38.98</td>
<td valign="top" align="center">46.83</td>
<td valign="top" align="center">45.68</td>
<td valign="top" align="center">45.90</td>
<td valign="top" align="center">46.83</td>
</tr>
<tr>
<td valign="top" align="left">120 h</td>
<td valign="top" align="center"><bold>42.47</bold></td>
<td valign="top" align="center"><bold>42.05</bold></td>
<td valign="top" align="center"><bold>41.86</bold></td>
<td valign="top" align="center"><bold>42.47</bold></td>
<td valign="top" align="center"><bold>49.76</bold></td>
<td valign="top" align="center"><bold>49.47</bold></td>
<td valign="top" align="center"><bold>49.91</bold></td>
<td valign="top" align="center"><bold>49.76</bold></td>
</tr>
<tr>
<td valign="top" align="left">72 h</td>
<td valign="top" align="center">41.13</td>
<td valign="top" align="center">40.12</td>
<td valign="top" align="center">39.78</td>
<td valign="top" align="center">41.13</td>
<td valign="top" align="center">48.29</td>
<td valign="top" align="center">47.38</td>
<td valign="top" align="center">47.83</td>
<td valign="top" align="center">48.29</td>
</tr>
<tr>
<td valign="top" align="left">48 h</td>
<td valign="top" align="center">37.90</td>
<td valign="top" align="center">36.63</td>
<td valign="top" align="center">36.41</td>
<td valign="top" align="center">37.90</td>
<td valign="top" align="center">42.44</td>
<td valign="top" align="center">42.33</td>
<td valign="top" align="center">42.35</td>
<td valign="top" align="center">42.44</td>
</tr>
<tr>
<td valign="top" align="left">4 h</td>
<td valign="top" align="center">33.24</td>
<td valign="top" align="center">32.33</td>
<td valign="top" align="center">32.03</td>
<td valign="top" align="center">33.24</td>
<td valign="top" align="center">38.73</td>
<td valign="top" align="center">38.90</td>
<td valign="top" align="center">39.49</td>
<td valign="top" align="center">38.73</td>
</tr>
<tr>
<td valign="top" align="left" colspan="9"><bold>SEIZ</bold></td>
</tr>
<tr>
<td valign="top" align="left">240 h</td>
<td valign="top" align="center"><bold>39.52</bold></td>
<td valign="top" align="center"><bold>38.31</bold></td>
<td valign="top" align="center"><bold>38.15</bold></td>
<td valign="top" align="center"><bold>39.52</bold></td>
<td valign="top" align="center">46.34</td>
<td valign="top" align="center">45.49</td>
<td valign="top" align="center">46.25</td>
<td valign="top" align="center">46.34</td>
</tr>
<tr>
<td valign="top" align="left">120 h</td>
<td valign="top" align="center">37.10</td>
<td valign="top" align="center">36.67</td>
<td valign="top" align="center">36.50</td>
<td valign="top" align="center">37.10</td>
<td valign="top" align="center"><bold>47.80</bold></td>
<td valign="top" align="center"><bold>47.98</bold></td>
<td valign="top" align="center"><bold>49.49</bold></td>
<td valign="top" align="center"><bold>47.80</bold></td>
</tr>
<tr>
<td valign="top" align="left">72 h</td>
<td valign="top" align="center">36.56</td>
<td valign="top" align="center">35.64</td>
<td valign="top" align="center">35.51</td>
<td valign="top" align="center">36.56</td>
<td valign="top" align="center">45.37</td>
<td valign="top" align="center">45.61</td>
<td valign="top" align="center">46.16</td>
<td valign="top" align="center">45.37</td>
</tr>
<tr>
<td valign="top" align="left">48 h</td>
<td valign="top" align="center">37.90</td>
<td valign="top" align="center">36.98</td>
<td valign="top" align="center">36.85</td>
<td valign="top" align="center">37.90</td>
<td valign="top" align="center">40.49</td>
<td valign="top" align="center">40.26</td>
<td valign="top" align="center">40.47</td>
<td valign="top" align="center">40.49</td>
</tr>
<tr>
<td valign="top" align="left">4 h</td>
<td valign="top" align="center">36.76</td>
<td valign="top" align="center">35.62</td>
<td valign="top" align="center">35.46</td>
<td valign="top" align="center">36.76</td>
<td valign="top" align="center">36.76</td>
<td valign="top" align="center">36.61</td>
<td valign="top" align="center">36.84</td>
<td valign="top" align="center">36.76</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The best performing experimental setups are provided in bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Twitter15 and Twitter16 SI and SEIZ binary classification results (%).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>Twitter15</bold></th>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>Twitter16</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1 score</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1 score</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="9"><bold>SI</bold></td>
</tr>
<tr>
<td valign="top" align="left">240 h</td>
<td valign="top" align="center">54.59</td>
<td valign="top" align="center">54.57</td>
<td valign="top" align="center">54.49</td>
<td valign="top" align="center">54.59</td>
<td valign="top" align="center">63.11</td>
<td valign="top" align="center">63.07</td>
<td valign="top" align="center">63.21</td>
<td valign="top" align="center">63.11</td>
</tr>
<tr>
<td valign="top" align="left">120 h</td>
<td valign="top" align="center">50.27</td>
<td valign="top" align="center">50.25</td>
<td valign="top" align="center">50.28</td>
<td valign="top" align="center">50.27</td>
<td valign="top" align="center">64.08</td>
<td valign="top" align="center">64.07</td>
<td valign="top" align="center">64.08</td>
<td valign="top" align="center">64.08</td>
</tr>
<tr>
<td valign="top" align="left">72 h</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">66.02</td>
<td valign="top" align="center">66.00</td>
<td valign="top" align="center">66.09</td>
<td valign="top" align="center">66.02</td>
</tr>
<tr>
<td valign="top" align="left">48 h</td>
<td valign="top" align="center"><bold>54.59</bold></td>
<td valign="top" align="center"><bold>54.57</bold></td>
<td valign="top" align="center"><bold>54.59</bold></td>
<td valign="top" align="center"><bold>54.59</bold></td>
<td valign="top" align="center">66.02</td>
<td valign="top" align="center">66.02</td>
<td valign="top" align="center">66.03</td>
<td valign="top" align="center">66.02</td>
</tr>
<tr>
<td valign="top" align="left">4 h</td>
<td valign="top" align="center">50.00</td>
<td valign="top" align="center">49.98</td>
<td valign="top" align="center">50.14</td>
<td valign="top" align="center">50.00</td>
<td valign="top" align="center"><bold>67.96</bold></td>
<td valign="top" align="center"><bold>67.69</bold></td>
<td valign="top" align="center"><bold>68.47</bold></td>
<td valign="top" align="center"><bold>67.96</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="9"><bold>SEIZ</bold></td>
</tr>
<tr>
<td valign="top" align="left">240 h</td>
<td valign="top" align="center">51.89</td>
<td valign="top" align="center">51.86</td>
<td valign="top" align="center">51.91</td>
<td valign="top" align="center">51.89</td>
<td valign="top" align="center">60.19</td>
<td valign="top" align="center">60.15</td>
<td valign="top" align="center">60.21</td>
<td valign="top" align="center">60.19</td>
</tr>
<tr>
<td valign="top" align="left">120 h</td>
<td valign="top" align="center">49.19</td>
<td valign="top" align="center">49.11</td>
<td valign="top" align="center">49.21</td>
<td valign="top" align="center">49.19</td>
<td valign="top" align="center">63.11</td>
<td valign="top" align="center">63.09</td>
<td valign="top" align="center">63.11</td>
<td valign="top" align="center">63.11</td>
</tr>
<tr>
<td valign="top" align="left">72 h</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">52.42</td>
<td valign="top" align="center">52.45</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">65.05</td>
<td valign="top" align="center">65.01</td>
<td valign="top" align="center">65.17</td>
<td valign="top" align="center">65.05</td>
</tr>
<tr>
<td valign="top" align="left">48 h</td>
<td valign="top" align="center">54.59</td>
<td valign="top" align="center">54.45</td>
<td valign="top" align="center">54.62</td>
<td valign="top" align="center">54.89</td>
<td valign="top" align="center"><bold>68.93</bold></td>
<td valign="top" align="center"><bold>68.91</bold></td>
<td valign="top" align="center"><bold>68.95</bold></td>
<td valign="top" align="center"><bold>68.93</bold></td>
</tr>
<tr>
<td valign="top" align="left">4 h</td>
<td valign="top" align="center"><bold>54.89</bold></td>
<td valign="top" align="center"><bold>54.88</bold></td>
<td valign="top" align="center"><bold>55.03</bold></td>
<td valign="top" align="center"><bold>54.89</bold></td>
<td valign="top" align="center">66.02</td>
<td valign="top" align="center">65.98</td>
<td valign="top" align="center">66.05</td>
<td valign="top" align="center">66.02</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The best performing experimental setups are provided in bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p><xref ref-type="table" rid="T5">Table 5</xref> presents binary classification results (True or False) obtained with a nearest neighbor algorithm that considers the 5 and the 20 nearest neighbors. For this classification, we combine the fitted SEIZ parameters into a single RSI value for every rumor. Despite the simplicity of the technique, the results are very similar to those of the more complex Gradient Boosting algorithm. Thus, we confirm that the RSI values can be used (to a certain extent) to detect fake rumors, as suggested by Jin et al. (<xref ref-type="bibr" rid="B9">2013</xref>). However, our experiments did not identify a single threshold value that splits the data efficiently.</p>
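<p>The nearest neighbor step can be illustrated with a minimal sketch. It assumes that every rumor has already been reduced to one RSI value. The function name knn_predict, the toy RSI values, and the majority-vote rule are illustrative assumptions and not the implementation behind Table 5.</p>
<preformat>
```python
from collections import Counter


def knn_predict(train_rsi, train_labels, query_rsi, k=5):
    """Label one rumor by majority vote among its k nearest RSI values."""
    # Rank the training rumors by absolute distance to the query RSI value.
    ranked = sorted(zip(train_rsi, train_labels),
                    key=lambda pair: abs(pair[0] - query_rsi))
    votes = Counter(label for _, label in ranked[:k])
    # most_common(1) returns the label with the highest vote count.
    return votes.most_common(1)[0][0]


# Toy data: hypothetical RSI values, not taken from Twitter15 or Twitter16.
# Which class receives the larger values here is arbitrary.
train_rsi = [0.2, 0.3, 0.4, 0.5, 2.0, 2.2, 2.5, 3.0]
train_labels = ["True", "True", "True", "True",
                "False", "False", "False", "False"]

prediction = knn_predict(train_rsi, train_labels, query_rsi=2.1, k=5)
print(prediction)  # prints: False
```
</preformat>
<p>With a one-dimensional feature, a larger k smooths the decision over a wider RSI interval, which is one possible reason why the 5 and 20 neighbor settings in Table 5 differ only slightly.</p>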
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>RSI classification results for Twitter15 and Twitter16 datasets.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>5 Nearest Neighbors</bold></th>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>20 Nearest Neighbors</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1 score</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1 score</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="9"><bold>RSI for Twitter15</bold></td>
</tr>
<tr>
<td valign="top" align="left">240 hours</td>
<td valign="top" align="center"><bold>51.35</bold></td>
<td valign="top" align="center"><bold>51.01</bold></td>
<td valign="top" align="center"><bold>51.44</bold></td>
<td valign="top" align="center"><bold>51.35</bold></td>
<td valign="top" align="center"><bold>54.05</bold></td>
<td valign="top" align="center"><bold>53.91</bold></td>
<td valign="top" align="center"><bold>54.14</bold></td>
<td valign="top" align="center"><bold>54.05</bold></td>
</tr>
<tr>
<td valign="top" align="left">120 hours</td>
<td valign="top" align="center">48.62</td>
<td valign="top" align="center">48.62</td>
<td valign="top" align="center">48.62</td>
<td valign="top" align="center">48.62</td>
<td valign="top" align="center">49.19</td>
<td valign="top" align="center">49.11</td>
<td valign="top" align="center">49.21</td>
<td valign="top" align="center">49.19</td>
</tr>
<tr>
<td valign="top" align="left">72 hours</td>
<td valign="top" align="center">49.24</td>
<td valign="top" align="center">49.24</td>
<td valign="top" align="center">49.24</td>
<td valign="top" align="center">49.24</td>
<td valign="top" align="center">52.97</td>
<td valign="top" align="center">52.92</td>
<td valign="top" align="center">53.01</td>
<td valign="top" align="center">52.97</td>
</tr>
<tr>
<td valign="top" align="left">48 hours</td>
<td valign="top" align="center">48.61</td>
<td valign="top" align="center">48.61</td>
<td valign="top" align="center">48.61</td>
<td valign="top" align="center">48.61</td>
<td valign="top" align="center">49.73</td>
<td valign="top" align="center">49.62</td>
<td valign="top" align="center">49.75</td>
<td valign="top" align="center">49.73</td>
</tr>
<tr>
<td valign="top" align="left">4 hours</td>
<td valign="top" align="center">50.57</td>
<td valign="top" align="center">50.57</td>
<td valign="top" align="center">50.57</td>
<td valign="top" align="center">50.57</td>
<td valign="top" align="center">50.00</td>
<td valign="top" align="center">48.36</td>
<td valign="top" align="center">50.73</td>
<td valign="top" align="center">50.00</td>
</tr>
<tr>
<td valign="top" align="left" colspan="9"><bold>RSI for Twitter16</bold></td>
</tr>
<tr>
<td valign="top" align="left">240 hours</td>
<td valign="top" align="center"><bold>64.08</bold></td>
<td valign="top" align="center"><bold>64.08</bold></td>
<td valign="top" align="center"><bold>64.09</bold></td>
<td valign="top" align="center"><bold>64.08</bold></td>
<td valign="top" align="center">58.25</td>
<td valign="top" align="center">57.16</td>
<td valign="top" align="center">59.40</td>
<td valign="top" align="center">58.25</td>
</tr>
<tr>
<td valign="top" align="left">120 hours</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">52.50</td>
<td valign="top" align="center">52.43</td>
<td valign="top" align="center">55.34</td>
<td valign="top" align="center">54.31</td>
<td valign="top" align="center">56.05</td>
<td valign="top" align="center">55.34</td>
</tr>
<tr>
<td valign="top" align="left">72 hours</td>
<td valign="top" align="center">54.37</td>
<td valign="top" align="center">54.32</td>
<td valign="top" align="center">54.36</td>
<td valign="top" align="center">54.37</td>
<td valign="top" align="center">54.37</td>
<td valign="top" align="center">54.06</td>
<td valign="top" align="center">54.59</td>
<td valign="top" align="center">54.37</td>
</tr>
<tr>
<td valign="top" align="left">48 hours</td>
<td valign="top" align="center">61.17</td>
<td valign="top" align="center">61.05</td>
<td valign="top" align="center">61.36</td>
<td valign="top" align="center">61.17</td>
<td valign="top" align="center"><bold>63.11</bold></td>
<td valign="top" align="center"><bold>62.64</bold></td>
<td valign="top" align="center"><bold>63.94</bold></td>
<td valign="top" align="center"><bold>63.11</bold></td>
</tr>
<tr>
<td valign="top" align="left">4 hours</td>
<td valign="top" align="center">51.46</td>
<td valign="top" align="center">51.43</td>
<td valign="top" align="center">51.44</td>
<td valign="top" align="center">51.46</td>
<td valign="top" align="center">50.49</td>
<td valign="top" align="center">48.64</td>
<td valign="top" align="center">50.36</td>
<td valign="top" align="center">50.49</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The best performing experimental setups are provided in bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>Although epidemiological models yield reasonable results, all models fail to adequately classify the rumors, especially in the Twitter15 dataset. There are two main limitations: (i) epidemiological models fail to account for the structural components of the diffusion tree, and (ii) the machine learning models are trained on fitted parameters (instead of the real data), so they ultimately produce underperforming detection results despite the robust fitting.</p></sec>
<sec>
<title>Graph Model Results</title>
<p>Graph-based modeling is exploited to capture the structural components of the diffusion mechanism, in order to overcome the limitations of the epidemiological models in rumor classification.</p>
<p><xref ref-type="fig" rid="F5">Figures 5</xref>, <xref ref-type="fig" rid="F6">6</xref> show the average diffusion time per level (Eq. 9) for the Twitter15 and Twitter16 datasets, respectively. The calculation of the average diffusion time incorporates information about the population of users of each cascade in the form of maximum hops from the root user. We identify one similarity between the two datasets: in the two initial levels of diffusion, the cascades with the False label propagate at a much slower rate than the cascades with any other label (True, Non-Rumor, Unverified). We exploit this observation to establish an early detection mechanism.</p>
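<p>One plausible way to compute an average diffusion time per level is sketched below. The edge layout (parent, child, parent time, child time), the function names, and the weighting by pooling all delays of a level across cascades are assumptions made for illustration; the exact weighting of Eq. 9 may differ.</p>
<preformat>
```python
from collections import defaultdict


def level_delays(edges, root):
    """Map each level (hops from the root) to the parent-to-child delays."""
    children = defaultdict(list)
    for parent, child, parent_time, child_time in edges:
        children[parent].append((child, child_time - parent_time))
    delays = defaultdict(list)
    frontier = [(root, 0)]
    # Walk the tree from the root; the cascade is assumed to be acyclic.
    while frontier:
        node, level = frontier.pop()
        for child, delay in children[node]:
            delays[level + 1].append(delay)
            frontier.append((child, level + 1))
    return delays


def weighted_adt_per_level(cascades):
    """Pool delays over cascades so that each cascade is weighted by the
    number of users it has at a given level."""
    pooled = defaultdict(list)
    for edges, root in cascades:
        for level, values in level_delays(edges, root).items():
            pooled[level].extend(values)
    return {level: sum(v) / len(v) for level, v in sorted(pooled.items())}


# Two hypothetical cascades; times are minutes since the root post.
cascade_a = ([("r", "a", 0, 10), ("r", "b", 0, 20), ("a", "c", 10, 55)], "r")
cascade_b = ([("s", "x", 0, 60)], "s")

adt = weighted_adt_per_level([cascade_a, cascade_b])
print(adt)  # prints: {1: 30.0, 2: 45.0}
```
</preformat>
<p>Pooling the delays is equivalent to averaging the per-cascade means with weights equal to the number of users at that level, so large cascades influence a level more than small ones.</p>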
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Weighted average diffusion time per diffusion level&#x02014;Twitter15.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-734347-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Weighted average diffusion time per diffusion level&#x02014;Twitter16.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-734347-g0006.tif"/>
</fig>
<p>Using the graph-based features, we perform binary and 4-class classification. For both classification schemes, we employ a Gradient Boosting algorithm (as described in section Classification) and present the results for the binary and 4-class classifications in <xref ref-type="table" rid="T6">Tables 6</xref>, <xref ref-type="table" rid="T7">7</xref>, respectively. The graph-based feature set is denoted as &#x0201C;Graph &#x0002B; &#x0003C; X Hours ADT,&#x0201D; where X denotes the upper time threshold for the Average Diffusion Time (ADT) calculation for the particular feature set. By producing results based on an upper time threshold for the ADT, we can evaluate whether graph-based features may be used efficiently for early misinformation detection. For the Twitter15 dataset, we observe that for both binary and 4-class classifications, the best accuracy is achieved with the under-24-h time threshold (60.22% and 40.48%, respectively). However, for the Twitter16 dataset, the best accuracy is achieved using the total diffusion time for the binary classification (82.52%) and the under-48-h time threshold for the 4-class classification (60.00%). The Twitter15 dataset is much larger and contains more users and tweets than Twitter16, and Twitter16 seems to be an easier dataset for our classifiers. Given the observed discrepancy in the results between the two datasets, we cannot safely conclude whether graph-based features alone are sufficient for an early detection mechanism.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Twitter15 and 16 Graph-based and combined feature sets&#x02014;binary classification results with labels = {true, false}.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="8" style="border-bottom: thin solid #000000;"><bold>Graph-based feature set binary classification</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>Twitter15</bold></th>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>Twitter16</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Feature Set</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;4 h ADT</td>
<td valign="top" align="center">58.60</td>
<td valign="top" align="center">59.69</td>
<td valign="top" align="center">58.16</td>
<td valign="top" align="center">61.29</td>
<td valign="top" align="center">75.73</td>
<td valign="top" align="center">76.19</td>
<td valign="top" align="center">72.73</td>
<td valign="top" align="center">80.00</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;24 h ADT</td>
<td valign="top" align="center">60.22</td>
<td valign="top" align="center">59.34</td>
<td valign="top" align="center">60.67</td>
<td valign="top" align="center">58.06</td>
<td valign="top" align="center">77.67</td>
<td valign="top" align="center">77.67</td>
<td valign="top" align="center">75.47</td>
<td valign="top" align="center">80.00</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;48 h ADT</td>
<td valign="top" align="center">57.53</td>
<td valign="top" align="center">56.83</td>
<td valign="top" align="center">57.78</td>
<td valign="top" align="center">55.91</td>
<td valign="top" align="center">75.73</td>
<td valign="top" align="center">77.06</td>
<td valign="top" align="center">71.19</td>
<td valign="top" align="center">84.00</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;72 h ADT</td>
<td valign="top" align="center"><bold>60.22</bold></td>
<td valign="top" align="center"><bold>61.46</bold></td>
<td valign="top" align="center"><bold>59.60</bold></td>
<td valign="top" align="center"><bold>63.44</bold></td>
<td valign="top" align="center">77.50</td>
<td valign="top" align="center">78.50</td>
<td valign="top" align="center">73.68</td>
<td valign="top" align="center">84.00</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; Total ADT</td>
<td valign="top" align="center">57.53</td>
<td valign="top" align="center">58.64</td>
<td valign="top" align="center">57.14</td>
<td valign="top" align="center">60.22</td>
<td valign="top" align="center"><bold>82.52</bold></td>
<td valign="top" align="center"><bold>82.69</bold></td>
<td valign="top" align="center"><bold>79.63</bold></td>
<td valign="top" align="center"><bold>86.00</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="9"><bold>Graph &#x0002B; SEIZ feature set binary classification</bold></td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;4 h ADT &#x0002B; SEIZ</td>
<td valign="top" align="center">62.24</td>
<td valign="top" align="center">56.98</td>
<td valign="top" align="center">59.76</td>
<td valign="top" align="center">54.44</td>
<td valign="top" align="center">73.79</td>
<td valign="top" align="center">74.29</td>
<td valign="top" align="center">70.91</td>
<td valign="top" align="center">78.00</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;48 h ADT &#x0002B; SEIZ</td>
<td valign="top" align="center"><bold>63.13</bold></td>
<td valign="top" align="center"><bold>61.38</bold></td>
<td valign="top" align="center"><bold>58.00</bold></td>
<td valign="top" align="center"><bold>65.17</bold></td>
<td valign="top" align="center">75.73</td>
<td valign="top" align="center">76.64</td>
<td valign="top" align="center">71.93</td>
<td valign="top" align="center">82.00</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;72 h ADT &#x0002B; SEIZ</td>
<td valign="top" align="center">58.08</td>
<td valign="top" align="center">56.08</td>
<td valign="top" align="center">53.00</td>
<td valign="top" align="center">59.55</td>
<td valign="top" align="center"><bold>77.67</bold></td>
<td valign="top" align="center"><bold>78.10</bold></td>
<td valign="top" align="center"><bold>74.55</bold></td>
<td valign="top" align="center"><bold>82.00</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The best performing experimental setups are provided in bold</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>Twitter15 and 16 Graph-based and combined feature sets: 4-class classification results.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="8" style="border-bottom: thin solid #000000;"><bold>Graph-based feature set 4-class classification</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>Twitter15</bold></th>
<th valign="top" align="center" colspan="4" style="border-bottom: thin solid #000000;"><bold>Twitter16</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>Feature Set</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;4 h ADT</td>
<td valign="top" align="center">37.27</td>
<td valign="top" align="center">37.66</td>
<td valign="top" align="center">38.54</td>
<td valign="top" align="center">37.27</td>
<td valign="top" align="center">51.71</td>
<td valign="top" align="center">51.80</td>
<td valign="top" align="center">52.60</td>
<td valign="top" align="center">51.71</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;24 h ADT</td>
<td valign="top" align="center"><bold>40.48</bold></td>
<td valign="top" align="center"><bold>40.19</bold></td>
<td valign="top" align="center"><bold>40.19</bold></td>
<td valign="top" align="center"><bold>40.48</bold></td>
<td valign="top" align="center">56.10</td>
<td valign="top" align="center">56.01</td>
<td valign="top" align="center">57.34</td>
<td valign="top" align="center">56.10</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;48 h ADT</td>
<td valign="top" align="center">39.68</td>
<td valign="top" align="center">39.20</td>
<td valign="top" align="center">38.95</td>
<td valign="top" align="center">39.68</td>
<td valign="top" align="center"><bold>60.00</bold></td>
<td valign="top" align="center"><bold>59.42</bold></td>
<td valign="top" align="center"><bold>59.95</bold></td>
<td valign="top" align="center"><bold>60.00</bold></td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;72 h ADT</td>
<td valign="top" align="center">37.53</td>
<td valign="top" align="center">37.09</td>
<td valign="top" align="center">36.99</td>
<td valign="top" align="center">37.53</td>
<td valign="top" align="center">59.02</td>
<td valign="top" align="center">58.75</td>
<td valign="top" align="center">59.10</td>
<td valign="top" align="center">59.02</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; Total ADT</td>
<td valign="top" align="center">38.34</td>
<td valign="top" align="center">38.41</td>
<td valign="top" align="center">39.02</td>
<td valign="top" align="center">38.34</td>
<td valign="top" align="center">52.20</td>
<td valign="top" align="center">51.86</td>
<td valign="top" align="center">51.75</td>
<td valign="top" align="center">52.20</td>
</tr>
<tr>
<td valign="top" align="left" colspan="9"><bold>Graph &#x0002B; SEIZ feature set 4-class classification</bold></td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;4 h ADT &#x0002B; SEIZ</td>
<td valign="top" align="center"><bold>46.54</bold></td>
<td valign="top" align="center"><bold>44.74</bold></td>
<td valign="top" align="center"><bold>44.72</bold></td>
<td valign="top" align="center"><bold>46.54</bold></td>
<td valign="top" align="center">56.91</td>
<td valign="top" align="center">55.88</td>
<td valign="top" align="center">56.00</td>
<td valign="top" align="center">56.91</td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;48 h ADT &#x0002B; SEIZ</td>
<td valign="top" align="center">39.34</td>
<td valign="top" align="center">37.47</td>
<td valign="top" align="center">36.51</td>
<td valign="top" align="center">39.34</td>
<td valign="top" align="center"><bold>62.35</bold></td>
<td valign="top" align="center"><bold>61.42</bold></td>
<td valign="top" align="center"><bold>61.88</bold></td>
<td valign="top" align="center"><bold>62.35</bold></td>
</tr>
<tr>
<td valign="top" align="left">Graph &#x0002B; &#x0003C;72 h ADT &#x0002B; SEIZ</td>
<td valign="top" align="center">42.15</td>
<td valign="top" align="center">40.50</td>
<td valign="top" align="center">39.93</td>
<td valign="top" align="center">42.15</td>
<td valign="top" align="center">52.23</td>
<td valign="top" align="center">52.48</td>
<td valign="top" align="center">53.51</td>
<td valign="top" align="center">52.23</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The best performing experimental setups are provided in bold</italic>.</p>
</table-wrap-foot>
</table-wrap></sec>
<sec>
<title>Combining Graph and Epidemiological Models</title>
<p>Since no correlation was found between the epidemiological and the graph-based features (as described in section Graph and Epidemiological Models Combination), we merge the two feature sets and present the classification results for this combined feature set. Merging the two feature sets enables us to capture both the temporal (epidemiological) and the structural (graph) components of the rumor diffusion process. We employ a Gradient Boosting algorithm and present the results for the binary and 4-class classification in <xref ref-type="table" rid="T6">Tables 6</xref>, <xref ref-type="table" rid="T7">7</xref>, respectively. In both tables, we provide the results for the graph-based feature set as well as for the merged graph and SEIZ feature set, in order to enable a clear comparison. Following the same notation as before, the merged feature set is denoted as &#x0201C;Graph &#x0002B; &#x0003C; X Hours ADT &#x0002B; SEIZ,&#x0201D; where SEIZ refers to the epidemiological feature set and X denotes the upper time threshold for the Average Diffusion Time.</p>
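<p>The merging step can be sketched as a correlation check followed by a concatenation of the per-rumor feature vectors. The use of the Pearson coefficient, the function names, and the toy feature values are assumptions made for illustration; they are not the feature definitions or the correlation measure used in our experiments.</p>
<preformat>
```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def merge_feature_sets(graph_features, seiz_features):
    """Concatenate feature vectors for rumors present in both sets."""
    shared_ids = sorted(set(graph_features).intersection(seiz_features))
    return {rid: graph_features[rid] + seiz_features[rid]
            for rid in shared_ids}


# Hypothetical per-rumor features keyed by rumor id.
graph_features = {"r1": [3, 12.5], "r2": [5, 40.0], "r3": [2, 7.0]}
seiz_features = {"r1": [0.4, 0.1], "r2": [0.9, 0.3]}

merged = merge_feature_sets(graph_features, seiz_features)
print(merged)  # prints: {'r1': [3, 12.5, 0.4, 0.1], 'r2': [5, 40.0, 0.9, 0.3]}

# Correlation between one graph feature and one epidemiological feature.
r = pearson([1, 2, 3, 4], [2, 1, 4, 3])
```
</preformat>
<p>A coefficient close to zero for every pair of graph and epidemiological features would indicate that the two sets carry complementary information, which is the motivation for concatenating them.</p>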
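<p>In most settings, the merged feature set yields better results in absolute values than the epidemiological or the graph-based feature set alone: the best accuracy rises from 60.22 to 63.13% for the Twitter15 binary classification, from 40.48 to 46.54% for the Twitter15 4-class classification, and from 60.00 to 62.35% for the Twitter16 4-class classification. The exception is the Twitter16 binary classification, where the graph-based features with the total ADT reach 82.52%, compared with 77.67% for the merged set. Furthermore, for the Twitter15 4-class and the Twitter16 binary classifications, the best accuracy of the merged features is achieved at a lower time threshold (under 4 h instead of under 24 h, and under 72 h instead of the total ADT, respectively). Given these results, we conclude that the combination of the epidemiological (temporal) and the graph (spatial) features enables higher accuracy for early detection as well as, in most settings, higher detection accuracy in absolute value.</p></sec>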
<sec>
<title>Comparison With Existing Methods</title>
<p><xref ref-type="table" rid="T8">Table 8</xref> compares the 4-class classification results of our solution with those of alternative state-of-the-art approaches to the misinformation diffusion problem.</p>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption><p>Baseline models classification results for Twitter15 and Twitter16 datasets.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Accuracy</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>Twitter15</bold></th>
<th valign="top" align="center"><bold>Twitter16</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">DTC (Castillo et al., <xref ref-type="bibr" rid="B4">2011</xref>)</td>
<td valign="top" align="center">45.4</td>
<td valign="top" align="center">46.5</td>
</tr>
<tr>
<td valign="top" align="left">SVM-RBF (Yang et al., <xref ref-type="bibr" rid="B36">2012</xref>)</td>
<td valign="top" align="center">31.8</td>
<td valign="top" align="center">32.1</td>
</tr>
<tr>
<td valign="top" align="left">SVM-TS (Ma et al., <xref ref-type="bibr" rid="B20">2015</xref>)</td>
<td valign="top" align="center">54.4</td>
<td valign="top" align="center">57.4</td>
</tr>
<tr>
<td valign="top" align="left">DTR (Zhao et al., <xref ref-type="bibr" rid="B39">2015</xref>)</td>
<td valign="top" align="center">40.9</td>
<td valign="top" align="center">41.4</td>
</tr>
<tr>
<td valign="top" align="left">GRU (Ma et al., <xref ref-type="bibr" rid="B19">2016</xref>)</td>
<td valign="top" align="center">64.6</td>
<td valign="top" align="center">63.3</td>
</tr>
<tr>
<td valign="top" align="left">RFC (Kwon et al., <xref ref-type="bibr" rid="B13">2017</xref>)</td>
<td valign="top" align="center">56.5</td>
<td valign="top" align="center">58.5</td>
</tr>
<tr>
<td valign="top" align="left">PTK (Ma et al., <xref ref-type="bibr" rid="B21">2017</xref>)</td>
<td valign="top" align="center">75.0</td>
<td valign="top" align="center">73.2</td>
</tr>
<tr>
<td valign="top" align="left">RvNN (Ma et al., <xref ref-type="bibr" rid="B22">2018</xref>)</td>
<td valign="top" align="center">72.3</td>
<td valign="top" align="center">73.7</td>
</tr>
<tr>
<td valign="top" align="left">PPC (Liu and Wu, <xref ref-type="bibr" rid="B16">2018</xref>)</td>
<td valign="top" align="center">84.2</td>
<td valign="top" align="center">86.3</td>
</tr>
<tr>
<td valign="top" align="left">Our approach</td>
<td valign="top" align="center">46.5</td>
<td valign="top" align="center">62.4</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>It is important to note that our approach is the only one that is location-independent and language-independent, since it is based on spreading (epidemiological) and network (graph) characteristics and does not take into account any characteristics of the content. In contrast, all existing methods consider content characteristics to some extent. Thus, the effectiveness of our approach cannot be directly compared with the results in the literature; the comparison is biased because all the methods we compare to consider content characteristics. All the baseline approaches from the literature that are included in <xref ref-type="table" rid="T8">Table 8</xref> combine content, user characteristics, and/or propagation-based features, which enables them to reach higher accuracy than our approach.</p>
<p>The experimental results of the baseline models that appear in <xref ref-type="table" rid="T8">Table 8</xref> are drawn from the literature (Yuan, <xref ref-type="bibr" rid="B37">2019</xref>; Ke et al., <xref ref-type="bibr" rid="B12">2020</xref>). As <xref ref-type="table" rid="T8">Table 8</xref> shows, our approach ranks seventh for Twitter15 and fifth for Twitter16 in terms of accuracy among the 10 approaches listed (the nine baselines and ours).</p>
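<p>The ranking of our approach can be checked directly from the accuracy values of Table 8. The short sketch below copies those values; the helper name rank_of is an illustrative assumption.</p>
<preformat>
```python
# 4-class accuracy (%) copied from Table 8: (Twitter15, Twitter16).
accuracy = {
    "DTC": (45.4, 46.5),
    "SVM-RBF": (31.8, 32.1),
    "SVM-TS": (54.4, 57.4),
    "DTR": (40.9, 41.4),
    "GRU": (64.6, 63.3),
    "RFC": (56.5, 58.5),
    "PTK": (75.0, 73.2),
    "RvNN": (72.3, 73.7),
    "PPC": (84.2, 86.3),
    "Our approach": (46.5, 62.4),
}


def rank_of(name, column):
    """1-based rank of `name` when sorting by accuracy, highest first."""
    ordered = sorted(accuracy, key=lambda m: accuracy[m][column],
                     reverse=True)
    return ordered.index(name) + 1


rank_twitter15 = rank_of("Our approach", 0)
rank_twitter16 = rank_of("Our approach", 1)
print(rank_twitter15, rank_twitter16)  # prints: 7 5
```
</preformat>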
<p>Importantly, another common characteristic of existing approaches is the employment of user characteristics extracted from the Twitter API, which allows for a plethora of features. However, such an approach renders the solution platform-specific. The use of the Twitter API also introduces issues, such as the deletion of accounts and the loss of the associated features over time. This can be defined as dataset aging, where the data collected at a given point in time become gradually irrelevant and/or non-reproducible. In the case of Twitter15 and Twitter16, several users and tweets are no longer available on Twitter itself, due to account suspensions and/or thread deletions. This, in turn, makes the reproduction of results and transferability studies considerably more difficult. Clearly, reliance on the particular characteristics of Twitter users or specific tweets is severely impeded by dataset aging. Our approach is unaffected by dataset aging, because it examines only the behavioral aspects of diffusion in space and time and does not consider any other external parameters.</p></sec></sec>
<sec sec-type="conclusions" id="s5">
<title>Conclusions</title>
<p>We propose a readily deployable solution for rumor detection on social media. The proposed framework is based on the diffusion cascade of an input rumor and does not require any additional user characteristics. This enables the use of the framework on any online social media platform, in contrast to existing literature that focuses only on Twitter. Furthermore, the framework does not require complex pre-trained language models or other high-complexity content-based solutions and, thus, is applicable to real-life systems with no architectural adjustments or additional overhead.</p>
<p>Our method extracts temporal and structural features and classifies rumors based on those features. The temporal features are extracted by fitting an epidemiological model to the original information diffusion data, and the structural features are gathered from a graph model of the same propagation data. We used the publicly available rumor propagation datasets Twitter15 and Twitter16 to evaluate the method. Experiments for binary (True, False) and 4-class (True, Non-Rumor, False, and Unverified) classification show that graph-based features perform better than epidemiological ones, while the most accurate classification is, in most settings, achieved through the combination of both feature sets.</p>
<p>Our method is location-independent and language-independent, leading to an approach that can be applied without consideration of the rumor content, in contrast to existing approaches that take into account rumor content. Our method is scalable and easily adaptable to existing real life systems with no modifications or additional overhead. However, it requires further investigation, due to dataset limitations. For example, our evaluations are for rumor cascades that begin from individual users, whereas in real life, multiple users may post the same rumor independently. Our method can accommodate multiple user sources, but no real dataset contains such information. Importantly, our method may provide improved results in that case, since machine learning techniques provide higher precision for large datasets.</p>
<p>The limitations of existing datasets are leading our future work, which focuses on collection of additional rumor propagation datasets from different online social networks and collection of data about rumors spreading from multiple root users, creating parallel diffusion trees. These datasets will enable us to evaluate the effectiveness of epidemiological models further and to evaluate graph-based features in rumor modeling and detection as well as overcoming some of the limitations of the Twitter15 and Twitter16 datasets.</p></sec>
<sec sec-type="data-availability" id="s6">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found here: <ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0">https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0</ext-link>.</p></sec>
<sec id="s7">
<title>Author Contributions</title>
<p>All authors contributed to manuscript revision, read, and approved the submitted version.</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Abulaish</surname> <given-names>M.</given-names></name> <name><surname>Kumari</surname> <given-names>N.</given-names></name> <name><surname>Fazil</surname> <given-names>M.</given-names></name> <name><surname>Singh</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>A graph-theoretic embedding-based approach for rumor detection in twitter</article-title>, in <source>Proceedings IEEE/WIC/ACM International Conference on Web Intelligence</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>466</fpage>&#x02013;<lpage>470</lpage>. <pub-id pub-id-type="doi">10.1145/3350546.3352569</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bettencourt</surname> <given-names>L. M. A.</given-names></name> <name><surname>Cintr&#x000F3;n-Arias</surname> <given-names>A.</given-names></name> <name><surname>Kaiser</surname> <given-names>D. I.</given-names></name> <name><surname>Castillo-Ch&#x000E1;vez</surname> <given-names>C.</given-names></name></person-group> (<year>2006</year>). <article-title>The power of a good idea: quantitative modeling of the spread of ideas from epidemiological models</article-title>. <source>Phys. A</source> <volume>364</volume>, <fpage>513</fpage>&#x02013;<lpage>536</lpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2005.08.083</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cannarella</surname> <given-names>J.</given-names></name> <name><surname>Spechler</surname> <given-names>J. A.</given-names></name></person-group> (<year>2014</year>). <article-title>Epidemiological modeling of online social network dynamics</article-title>. arXiv [preprint]. arXiv:1401.4208.</citation></ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Castillo</surname> <given-names>C.</given-names></name> <name><surname>Mendoza</surname> <given-names>M.</given-names></name> <name><surname>Poblete</surname> <given-names>B.</given-names></name></person-group> (<year>2011</year>). <article-title>Information credibility on twitter</article-title>, in <source>Proceedings of the 20th International Conference on World Wide Web</source> (<publisher-loc>Hyderabad</publisher-loc>), <fpage>675</fpage>&#x02013;<lpage>684</lpage>. <pub-id pub-id-type="doi">10.1145/1963405.1963500</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>J.</given-names></name> <name><surname>Ko</surname> <given-names>T.</given-names></name> <name><surname>Choi</surname> <given-names>Y.</given-names></name> <name><surname>Byun</surname> <given-names>H.</given-names></name> <name><surname>Kim</surname> <given-names>C. K.</given-names></name></person-group> (<year>2021</year>). <article-title>Dynamic graph convolutional networks with attention mechanism for rumor detection on social media</article-title>. <source>PLoS ONE</source> <volume>16</volume>, <fpage>e0256039</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0256039</pub-id><pub-id pub-id-type="pmid">34407111</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goel</surname> <given-names>S.</given-names></name> <name><surname>Anderson</surname> <given-names>A.</given-names></name> <name><surname>Hofman</surname> <given-names>J.</given-names></name> <name><surname>Watts</surname> <given-names>D. J.</given-names></name></person-group> (<year>2016</year>). <article-title>The structural virality of online diffusion</article-title>. <source>Manag. Sci</source>. <volume>62</volume>, <fpage>180</fpage>&#x02013;<lpage>196</lpage>. <pub-id pub-id-type="pmid">32208431</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gonz&#x000E1;lez-Bail&#x000F3;n</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>N.</given-names></name></person-group> (<year>2013</year>). <source>Networked Discontent: The Anatomy of Protest Campaigns in Social Media</source>. <publisher-loc>Amsterdam</publisher-loc>: <publisher-name>Elsevier</publisher-name>.</citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gross</surname> <given-names>T.</given-names></name> <name><surname>D&#x00027;Lima</surname> <given-names>C. J. D.</given-names></name> <name><surname>Blasius</surname> <given-names>B.</given-names></name></person-group> (<year>2006</year>). <article-title>Epidemic dynamics on an adaptive network</article-title>. <source>Phys. Rev. Lett</source>. <volume>96</volume>, <fpage>208701</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevLett.96.208701</pub-id><pub-id pub-id-type="pmid">27639702</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>F.</given-names></name> <name><surname>Dougherty</surname> <given-names>E.</given-names></name> <name><surname>Saraf</surname> <given-names>P.</given-names></name> <name><surname>Cao</surname> <given-names>Y.</given-names></name> <name><surname>Ramakrishnan</surname> <given-names>N.</given-names></name></person-group> (<year>2013</year>). <article-title>Epidemiological modeling of news and rumors on twitter</article-title>, in <source>Proceedings of the 7th Workshop on Social Network Mining and Analysis</source> (<publisher-loc>Chicago, IL</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1145/2501025.2501027</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Zhao</surname> <given-names>L.</given-names></name> <name><surname>Dougherty</surname> <given-names>E.</given-names></name> <name><surname>Cao</surname> <given-names>Y.</given-names></name> <name><surname>Lu</surname> <given-names>C. T.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Misinformation propagation in the age of twitter</article-title>. <source>Computer</source> <volume>47</volume>, <fpage>90</fpage>&#x02013;<lpage>94</lpage>. <pub-id pub-id-type="doi">10.1109/MC.2014.361</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Katz</surname> <given-names>E.</given-names></name> <name><surname>Lazarsfeld</surname> <given-names>P. F.</given-names></name></person-group> (<year>2017</year>). <source>Personal Influence: The Part Played by People in the Flow of Mass Communications</source>. <publisher-loc>Decatur, IL</publisher-loc>: <publisher-name>Routledge</publisher-name>. <pub-id pub-id-type="doi">10.4324/9781315126234</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ke</surname> <given-names>Z.</given-names></name> <name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Zhou</surname> <given-names>C.</given-names></name> <name><surname>Sheng</surname> <given-names>J.</given-names></name> <name><surname>Silamu</surname> <given-names>W.</given-names></name> <name><surname>Guo</surname> <given-names>Q.</given-names></name></person-group> (<year>2020</year>). <article-title>Rumor detection on social media via fused semantic information and a propagation heterogeneous graph</article-title>. <source>Symmetry</source> <volume>12</volume>, <fpage>1806</fpage>. <pub-id pub-id-type="doi">10.3390/sym12111806</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kwon</surname> <given-names>S.</given-names></name> <name><surname>Cha</surname> <given-names>M.</given-names></name> <name><surname>Jung</surname> <given-names>K.</given-names></name></person-group> (<year>2017</year>). <article-title>Rumor detection over varying time windows</article-title>. <source>PLoS ONE</source> <volume>12</volume>, <fpage>e0168344</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0168344</pub-id><pub-id pub-id-type="pmid">28081135</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kwon</surname> <given-names>S.</given-names></name> <name><surname>Cha</surname> <given-names>M.</given-names></name> <name><surname>Jung</surname> <given-names>K.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name></person-group> (<year>2013</year>). <article-title>Prominent features of rumor propagation in online social media</article-title>, in <source>IEEE 13th International Conference on Data Mining</source> (<publisher-loc>Dallas, TX</publisher-loc>), <fpage>1103</fpage>&#x02013;<lpage>1108</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2013.61</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Nourbakhsh</surname> <given-names>A.</given-names></name> <name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Fang</surname> <given-names>R.</given-names></name> <name><surname>Shah</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>Real-time rumor debunking on twitter</article-title>, in <source>Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>1867</fpage>&#x02013;<lpage>1870</lpage>. <pub-id pub-id-type="doi">10.1145/2806416.2806651</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>Y. F. B.</given-names></name></person-group> (<year>2018</year>). <article-title>Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks</article-title>, in <source>Thirty-Second AAAI Conference on Artificial Intelligence</source> (<publisher-loc>New Orleans, LA</publisher-loc>).</citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lotfi</surname> <given-names>S.</given-names></name> <name><surname>Mirzarezaee</surname> <given-names>M.</given-names></name> <name><surname>Hosseinzadeh</surname> <given-names>M.</given-names></name> <name><surname>Seydi</surname> <given-names>V.</given-names></name></person-group> (<year>2021</year>). <article-title>Detection of rumor conversations in Twitter using graph convolutional networks</article-title>. <source>Appl. Intell</source>. <volume>51</volume>, <fpage>4774</fpage>&#x02013;<lpage>4787</lpage>. <pub-id pub-id-type="doi">10.1007/s10489-020-02036-0</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>Y. J.</given-names></name> <name><surname>Li</surname> <given-names>C. T.</given-names></name></person-group> (<year>2020</year>). <article-title>GCAN: Graph-aware co-attention networks for explainable fake news detection on social media</article-title>. arXiv [preprint]. arXiv:2004.11648. <pub-id pub-id-type="doi">10.18653/v1/2020.acl-main.48</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>W.</given-names></name> <name><surname>Mitra</surname> <given-names>P.</given-names></name> <name><surname>Kwon</surname> <given-names>S.</given-names></name> <name><surname>Jansen</surname> <given-names>B. J.</given-names></name> <name><surname>Wong</surname> <given-names>K. F.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Detecting rumors from microblogs with recurrent neural networks</article-title>, in <source>Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>3818</fpage>&#x02013;<lpage>3824</lpage>.</citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>W.</given-names></name> <name><surname>Wei</surname> <given-names>Z.</given-names></name> <name><surname>Lu</surname> <given-names>Y.</given-names></name> <name><surname>Wong</surname> <given-names>K. F.</given-names></name></person-group> (<year>2015</year>). <article-title>Detect rumors using time series of social context information on microblogging websites</article-title>, in <source>Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source> (<publisher-loc>Melbourne, VIC</publisher-loc>), <fpage>1751</fpage>&#x02013;<lpage>1754</lpage>. <pub-id pub-id-type="doi">10.1145/2806416.2806607</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>W.</given-names></name> <name><surname>Wong</surname> <given-names>K. F.</given-names></name></person-group> (<year>2017</year>). <article-title>Detect rumors in microblog posts using propagation structure via kernel learning</article-title>, in <source>Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vol. 1</source>, <fpage>708</fpage>&#x02013;<lpage>717</lpage>. <pub-id pub-id-type="doi">10.18653/v1/P17-1066</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>W.</given-names></name> <name><surname>Wong</surname> <given-names>K. F.</given-names></name></person-group> (<year>2018</year>). <article-title>Rumor detection on twitter with tree-structured recursive neural networks</article-title>, in <source>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers)</source> (<publisher-loc>Melbourne, VIC</publisher-loc>), <fpage>1980</fpage>&#x02013;<lpage>1989</lpage>. <pub-id pub-id-type="doi">10.18653/v1/P18-1184</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meng</surname> <given-names>J.</given-names></name> <name><surname>Peng</surname> <given-names>W.</given-names></name> <name><surname>Tan</surname> <given-names>P. N.</given-names></name> <name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Bae</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Diffusion size and structural virality: the effects of message and network features on spreading health information on twitter</article-title>. <source>Comput. Hum. Behav</source>. <volume>89</volume>, <fpage>111</fpage>&#x02013;<lpage>120</lpage>. <pub-id pub-id-type="doi">10.1016/j.chb.2018.07.039</pub-id><pub-id pub-id-type="pmid">32288177</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Newman</surname> <given-names>M. E. J.</given-names></name></person-group> (<year>2003</year>). <article-title>The structure and function of complex networks</article-title>. <source>SIAM Rev</source>. <volume>45</volume>, <fpage>167</fpage>&#x02013;<lpage>256</lpage>. <pub-id pub-id-type="doi">10.1137/S003614450342480</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pastor-Satorras</surname> <given-names>R.</given-names></name> <name><surname>Vespignani</surname> <given-names>A.</given-names></name></person-group> (<year>2001</year>). <article-title>Epidemic spreading in scale-free networks</article-title>. <source>Phys. Rev. Lett</source>. <volume>86</volume>, <fpage>3200</fpage>&#x02013;<lpage>3203</lpage>. <pub-id pub-id-type="doi">10.1103/PhysRevLett.86.3200</pub-id><pub-id pub-id-type="pmid">11290142</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rogers</surname> <given-names>E. M.</given-names></name> <name><surname>Singhal</surname> <given-names>A.</given-names></name> <name><surname>Quinlan</surname> <given-names>M. M.</given-names></name></person-group> (<year>2014</year>). <source>Diffusion of Innovations</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Routledge</publisher-name>.</citation></ref>
<ref id="B27">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Siwakoti</surname> <given-names>S.</given-names></name> <name><surname>Yadav</surname> <given-names>K.</given-names></name> <name><surname>Thange</surname> <given-names>I.</given-names></name> <name><surname>Bariletto</surname> <given-names>N.</given-names></name> <name><surname>Zanotti</surname> <given-names>L.</given-names></name> <name><surname>Ghoneim</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2021</year>). <source>Localized Misinformation in a Global Pandemic: Report on COVID-19 Narratives around the World</source>. <publisher-loc>Princeton, NJ</publisher-loc>: <publisher-name>Princeton University</publisher-name>. p. 1&#x02013;68. Available online at: <ext-link ext-link-type="uri" xlink:href="https://esoc.princeton.edu/publications/localized-misinformation-global-pandemic-report-covid-19-narratives-around-world">https://esoc.princeton.edu/publications/localized-misinformation-global-pandemic-report-covid-19-narratives-around-world</ext-link></citation></ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Taxidou</surname> <given-names>I.</given-names></name> <name><surname>Fischer</surname> <given-names>P. M.</given-names></name></person-group> (<year>2014</year>). <article-title>Online analysis of information diffusion in twitter</article-title>, in <source>Proceedings of the 23rd International Conference on World Wide Web</source> (<publisher-loc>Seoul</publisher-loc>), <fpage>1313</fpage>&#x02013;<lpage>1318</lpage>. <pub-id pub-id-type="doi">10.1145/2567948.2580050</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valente</surname> <given-names>T. W.</given-names></name></person-group> (<year>1996</year>). <article-title>Social network thresholds in the diffusion of innovations</article-title>. <source>Soc. Netw</source>. <volume>18</volume>, <fpage>69</fpage>&#x02013;<lpage>89</lpage>. <pub-id pub-id-type="doi">10.1016/0378-8733(95)00256-1</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valente</surname> <given-names>T. W.</given-names></name> <name><surname>Fujimoto</surname> <given-names>K.</given-names></name></person-group> (<year>2010</year>). <article-title>Bridging: locating critical connectors in a network</article-title>. <source>Soc. Netw</source>. <volume>32</volume>, <fpage>212</fpage>&#x02013;<lpage>220</lpage>. <pub-id pub-id-type="doi">10.1016/j.socnet.2010.03.003</pub-id><pub-id pub-id-type="pmid">20582157</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Yang</surname> <given-names>X. Y.</given-names></name> <name><surname>Xu</surname> <given-names>K.</given-names></name> <name><surname>Ma</surname> <given-names>J. F.</given-names></name></person-group> (<year>2014</year>). <article-title>SEIR-based model for the information spreading over SNS</article-title>. <source>Acta Electronica Sinica</source> <volume>42</volume>(<issue>11</issue>), <fpage>2325</fpage>. <pub-id pub-id-type="doi">10.3969/j.issn.0372-2112.2014.11.031</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Watts</surname> <given-names>D. J.</given-names></name> <name><surname>Dodds</surname> <given-names>P. S.</given-names></name></person-group> (<year>2007</year>). <article-title>Influentials, networks, and public opinion formation</article-title>. <source>J. Consum. Res</source>. <volume>34</volume>, <fpage>441</fpage>&#x02013;<lpage>458</lpage>. <pub-id pub-id-type="doi">10.1086/518527</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Pi</surname> <given-names>D.</given-names></name> <name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Xie</surname> <given-names>M.</given-names></name> <name><surname>Cao</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Rumor detection based on propagation graph neural network with attention mechanism</article-title>. <source>Expert Syst. Appl</source>. <volume>158</volume>, <fpage>113595</fpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2020.113595</pub-id><pub-id pub-id-type="pmid">32565619</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Xing</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>Research on information dissemination model for social networking services</article-title>. <source>Int. J. Comput. Sci. Appl</source>. <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>6</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.destechpub.com/wp-content/uploads/2019/01/IJCSA-Volume-2-Issue-1-February-2013.pdf">https://www.destechpub.com/wp-content/uploads/2019/01/IJCSA-Volume-2-Issue-1-February-2013.pdf</ext-link></citation></ref>
<ref id="B35">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xuejun</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>Research on propagation model of public opinion topics based on SCIR in microblogging</article-title>. <source>Comput. Eng. Appl</source>. <volume>8</volume>, <fpage>20</fpage>&#x02013;<lpage>26</lpage>.</citation></ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>F.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Yu</surname> <given-names>X.</given-names></name> <name><surname>Yang</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>Automatic detection of rumor on sina weibo</article-title>, in <source>Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics</source> (<publisher-loc>Beijing</publisher-loc>). <pub-id pub-id-type="doi">10.1145/2350190.2350203</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>C.</given-names></name> <name><surname>Ma</surname> <given-names>Q.</given-names></name> <name><surname>Zhou</surname> <given-names>W.</given-names></name> <name><surname>Han</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Jointly embedding the local and global relations of heterogeneous graph for rumor detection</article-title>, in <source>Proceedings of 2019 IEEE International Conference on Data Mining (ICDM)</source> (<publisher-loc>Beijing</publisher-loc>), <fpage>796</fpage>&#x02013;<lpage>805</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2019.00090</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>C.</given-names></name> <name><surname>Ma</surname> <given-names>Q.</given-names></name> <name><surname>Zhou</surname> <given-names>W.</given-names></name> <name><surname>Han</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Early detection of fake news by utilizing the credibility of news, publishers, and users based on weakly supervised learning</article-title>. arXiv [preprint]. arXiv:2012.04233. <pub-id pub-id-type="doi">10.18653/v1/2020.coling-main.475</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>Z.</given-names></name> <name><surname>Resnick</surname> <given-names>P.</given-names></name> <name><surname>Mei</surname> <given-names>Q.</given-names></name></person-group> (<year>2015</year>). <article-title>Enquiring minds: early detection of rumors in social media from enquiry posts</article-title>, in <source>Proceedings of the 24th International Conference on World Wide Web</source> (<publisher-loc>Florence</publisher-loc>), <fpage>1395</fpage>&#x02013;<lpage>1405</lpage>. <pub-id pub-id-type="doi">10.1145/2736277.2741637</pub-id></citation></ref>
</ref-list>
</back>
</article>