<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">746181</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2021.746181</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>SS-RNN: A Strengthened Skip Algorithm for Data Classification Based on Recurrent Neural Networks</article-title>
<alt-title alt-title-type="left-running-head">Cao et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">SS-RNN Algorithm for Data Classification</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Cao</surname>
<given-names>Wenjie</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1322194/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Shi</surname>
<given-names>Ya-Zhou</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1222600/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Qiu</surname>
<given-names>Huahai</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Zhang</surname>
<given-names>Bengong</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1170265/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, <addr-line>Wuhan</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>School of Computer Science and Artificial Intelligence, Wuhan Textile University, <addr-line>Wuhan</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/117988/overview">Robert Friedman</ext-link>, Retired, Columbia, SC, United&#x20;States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/413567/overview">Huang Yu-an</ext-link>, Shenzhen University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/704184/overview">Hong Peng</ext-link>, South China University of Technology, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Bengong Zhang, <email>bgzhang@wtu.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>10</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>746181</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>07</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>09</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Cao, Shi, Qiu and Zhang.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Cao, Shi, Qiu and Zhang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Recurrent neural networks (RNNs) are widely used in time series prediction and classification. However, they suffer from insufficient memory ability and difficulty in gradient back-propagation. To solve these problems, this paper proposes a new algorithm called SS-RNN, which directly uses multiple pieces of historical information to predict the current time information. This enhances long-term memory and, along the time direction, improves the correlation between states at different moments. To include historical information, we design two processing methods for the SS-RNN, one continuous and one discontinuous. For each method, historical information can be added in two ways: 1) direct addition and 2) addition through weight weighting and function mapping applied to the activation function. This yields six pathways for fully exploring the effect and influence of historical information on RNNs. By comparing the average accuracy on real datasets against long short-term memory (LSTM), Bi-LSTM, gated recurrent units, and MCNN, and by calculating the main indexes (Accuracy, Precision, Recall, and F1-score), we observe that our method improves average accuracy, optimizes the structure of the recurrent neural network, and effectively alleviates the problems of exploding and vanishing&#x20;gradients.</p>
</abstract>
<kwd-group>
<kwd>RNN</kwd>
<kwd>LSTM</kwd>
<kwd>SS-RNN</kwd>
<kwd>data classification</kwd>
<kwd>deep learning</kwd>
</kwd-group>
<contract-sponsor id="cn001">Foundation for Innovative Research Groups of the National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100012659</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Data classification is one of the most important tasks in many applications, such as text categorization, tone recognition, image classification, microarray gene expression, and protein structure prediction (<xref ref-type="bibr" rid="B7">Choi et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B19">Johnson and Zhang, 2017</xref>; <xref ref-type="bibr" rid="B28">Malhotra et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B2">Aggarwal et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B13">Fang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B30">Miko&#x142;ajczyk and Grochowski, 2018</xref>; <xref ref-type="bibr" rid="B22">Kerkeni et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B35">Saritas and Yasar, 2019</xref>; <xref ref-type="bibr" rid="B47">Yildirim et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B5">Chandrasekar et&#x20;al., 2020</xref>). Many types of information (e.g., language, music, and genes) can be represented as sequential data, which often contains related information separated by many time steps. These long-term dependencies are difficult to model, because retaining information from the whole sequence greatly increases the complexity of the model (<xref ref-type="bibr" rid="B42">Trinh et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B25">Liu et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B37">Shewalkar, 2019</xref>; <xref ref-type="bibr" rid="B48">Yu et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B50">Zhao et&#x20;al., 2020</xref>).</p>
<p>With the rapid development of artificial intelligence and machine learning, recurrent neural network (RNN) models have been gaining interest as a statistical tool for dealing with the complexities of sequential data (<xref ref-type="bibr" rid="B8">Chung et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B21">Keren and Schuller, 2016</xref>; <xref ref-type="bibr" rid="B33">Sadeghian et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B46">Yang et&#x20;al., 2019</xref>). In RNNs, the recurrent (hidden) layers consist of recurrent cells whose states are affected by both past states and the current input through feedback connections (<xref ref-type="bibr" rid="B48">Yu et&#x20;al., 2019</xref>). However, the error signals back-propagated through time often suffer from exponential growth or decay, a dilemma commonly referred to as exploding or vanishing gradients. To alleviate this issue, variants of RNNs with gating mechanisms, such as long short-term memory (LSTM) networks and gated recurrent units (GRU), have been proposed. LSTMs have been shown to learn many difficult sequential tasks effectively, including speech recognition, machine translation, trajectory prediction, and correlation analysis (<xref ref-type="bibr" rid="B11">Elman, 1990</xref>; <xref ref-type="bibr" rid="B20">Jordan, 1990</xref>; <xref ref-type="bibr" rid="B18">Hochreiter and Schmidhuber, 1997</xref>; <xref ref-type="bibr" rid="B36">Schuster and Paliwal, 1997</xref>; <xref ref-type="bibr" rid="B6">Cho et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B3">Alahi et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B51">Zhou et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B40">Su et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B16">Gupta et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B17">Hasan et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B24">Li and Cao, 2018</xref>; <xref ref-type="bibr" rid="B34">Salman et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B43">Vemula et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B45">Xu et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B46">Yang et&#x20;al., 2019</xref>). In LSTMs, information from the past can be stored within a hidden state that is combined with the latest input at each time step, allowing long-term dependencies to be captured. In spite of this, LSTMs are unable to capture history information far from the current time step, since the hidden state tends to focus on the more recent past, as proven from a statistical perspective by <xref ref-type="bibr" rid="B50">Zhao et&#x20;al. (2020)</xref>.</p>
<p>To address this problem, several improved RNNs have been proposed (<xref ref-type="bibr" rid="B4">Arpit et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B12">ElSaid et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B1">Abbasvandi and Nasrabadi, 2019</xref>; <xref ref-type="bibr" rid="B31">Ororbia et&#x20;al., 2019</xref>). For example, <xref ref-type="bibr" rid="B15">Gui et&#x20;al. (2019)</xref> introduced a novel reinforcement learning-based method that models the dependency relationship between words by computing recurrent transition functions based on skip connections. Inspired by the attention mechanism, <xref ref-type="bibr" rid="B32">Ostmeyer and Cowell (2019)</xref> developed a new kind of RNN model that calculates a recurrent weighted average (RWA) over every past processing step (not just the preceding one) to capture long-term dependencies, and it performs far better than an LSTM on several challenging tasks. Based on the RWA, <xref ref-type="bibr" rid="B27">Maginnis and Richemond (2017)</xref> further presented a recurrent discounted attention (RDA) model that discounts the attention applied to previous time steps, in order to handle tasks requiring equal weighting over all information seen as well as tasks in which new information is more important than old. Later, <xref ref-type="bibr" rid="B10">DiPietro et&#x20;al. (2017)</xref> introduced the mixed history RNN (MIST RNN), a NARX (nonlinear auto-regressive with exogenous inputs) RNN architecture that allows direct connections from the very distant past, and showed that MIST RNNs can improve performance substantially over LSTMs on tasks requiring very long-term dependencies. In addition, <xref ref-type="bibr" rid="B50">Zhao et&#x20;al. (2020)</xref> proposed the long memory filter, which can be viewed as a soft attention mechanism, and proved that long-term memory can be acquired by using it. Very recently, <xref ref-type="bibr" rid="B26">Ma et&#x20;al. (2021)</xref> proposed an end-to-end time series classification architecture called Echo Memory-Augmented Network (EMAN), which uses a learnable sparse attention mechanism to capture important historical information and incorporate it into the feature representation of the current time step. However, balancing accuracy and efficiency when adding past time information remains difficult to&#x20;solve.</p>
<p>In this work, we propose a new algorithm called Strengthened Skip RNN (SS-RNN), which enhances long-term memory by using multiple pieces of historical information to predict the information at the next time step. To find an effective method for adding historical information, we design six models for the SS-RNN that incorporate past information into the current moment in continuous and discontinuous ways, respectively. For each way, the additional historical information can either be added directly or be added through weight weighting and function mapping. To test the SS-RNN with the different models, five groups of datasets (Arrhythmia dataset, Epilepsy dataset 1, Epilepsy dataset 2, Breast cancer dataset, and Diabetes dataset) were used, and we calculated the following indexes to show the classification efficiency of our model: accuracy, precision, recall, and F1-score. From the results in <italic>Results</italic>, it is observed that Model A with <italic>skip</italic> &#x3d; 3 has the greatest influence on the network. Importantly, our SS-RNN method can effectively alleviate the problems of exploding and vanishing gradients (<xref ref-type="bibr" rid="B14">Gers et&#x20;al., 2000</xref>; <xref ref-type="bibr" rid="B39">Song et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B41">Tao et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B9">Das et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B29">Mayet et&#x20;al., 2020</xref>).</p>
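The evaluation indexes named above can be computed directly from the confusion counts. Below is a standard binary-classification sketch; the function name `classification_indexes` is illustrative and not taken from the paper:

```python
import numpy as np

def classification_indexes(y_true, y_pred):
    """Compute Accuracy, Precision, Recall, and F1-score for binary labels (0/1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
    accuracy = (tp + tn) / y_true.size
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# Example: 3 of 5 labels correct -> accuracy 0.6
acc, p, r, f1 = classification_indexes([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Multi-class variants would average these per-class scores (macro- or micro-averaging), but the binary case suffices for the two-class medical datasets listed above.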
</sec>
<sec id="s2">
<title>Theoretical Model Analysis and Data Collection</title>
<sec id="s2-1">
<title>SS-RNN Model Analysis</title>
<p>As for RNNs, the classical LSTM cell was proposed to deal with the problem of &#x201c;long-term dependencies&#x201d; by introducing a &#x201c;gate&#x201d; into the cell to improve the memory capacity of the standard recurrent cell.<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <inline-formula id="inf1">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf2">
<mml:math id="m3">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf3">
<mml:math id="m4">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf4">
<mml:math id="m5">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf5">
<mml:math id="m6">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf6">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf7">
<mml:math id="m8">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf8">
<mml:math id="m9">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are weight matrices and <inline-formula id="inf9">
<mml:math id="m10">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf10">
<mml:math id="m11">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf11">
<mml:math id="m12">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf12">
<mml:math id="m13">
<mml:mrow>
<mml:mtext>and&#xa0;</mml:mtext>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are biases of LSTM to be learned during training. The above variables can parameterize the transformations of the input gate <inline-formula id="inf13">
<mml:math id="m14">
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, forget gate <inline-formula id="inf14">
<mml:math id="m15">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and output gate <inline-formula id="inf15">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> respectively.<inline-formula id="inf16">
<mml:math id="m17">
<mml:mi>&#x3c3;</mml:mi>
</mml:math>
</inline-formula> in <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> is the sigmoid function and <inline-formula id="inf17">
<mml:math id="m18">
<mml:mo>&#x22c5;</mml:mo>
</mml:math>
</inline-formula> stands for element-wise multiplication. <inline-formula id="inf18">
<mml:math id="m19">
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> denotes the cell state of LSTM. <inline-formula id="inf19">
<mml:math id="m20">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> includes the inputs of LSTM cell unit, and <inline-formula id="inf20">
<mml:math id="m21">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the hidden layer (<xref ref-type="bibr" rid="B44">Wang et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B23">Kong et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B48">Yu et&#x20;al., 2019</xref>). One can find the mathematical models of the RNN and GRU in the <xref ref-type="sec" rid="s11">Supplementary Material</xref>.</p>
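The gate equations of Eq. 1 can be sketched in NumPy; the function name `lstm_step` and the dictionary layout for the weights `W` and biases `b` below are illustrative conventions of this sketch, not notation from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Eq. 1.

    W maps gate name -> (W_gh, W_gx); b maps gate name -> bias vector.
    Gate names: "f" forget, "i" input, "c" candidate, "o" output.
    """
    f_t = sigmoid(W["f"][0] @ h_prev + W["f"][1] @ x_t + b["f"])      # forget gate
    i_t = sigmoid(W["i"][0] @ h_prev + W["i"][1] @ x_t + b["i"])      # input gate
    c_tilde = np.tanh(W["c"][0] @ h_prev + W["c"][1] @ x_t + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                                # cell state update
    o_t = sigmoid(W["o"][0] @ h_prev + W["o"][1] @ x_t + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                          # hidden state
    return h_t, c_t
```

Because the output gate lies in (0, 1) and tanh lies in (&#x2212;1, 1), every component of the hidden state is bounded in magnitude by 1, which matches the element-wise multiplication in Eq. 1.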
<p>Based on the LSTM model, we propose our SS-RNN model, which makes better use of historical information and can enhance the long-term memory of the model. The architecture of the SS-RNN model is shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>. It consists of a feature extractor and a three-layer strengthened skip LSTM (SS-LSTM) network (<xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>). The feature extractor is added to process datasets with multiple features (not time series data), such as the Diabetes data and Breast cancer data used in this paper. Its output is then reshaped into a 32&#x2a;4 matrix for further input into the SS-LSTM network (refer to <xref ref-type="sec" rid="s11">Supplementary Figure S55</xref> and the <xref ref-type="sec" rid="s11">Supplementary Material</xref>). Standard time series datasets, such as the Arrhythmia dataset, Epilepsy dataset 1, and Epilepsy dataset 2 used in this paper, are input to the SS-LSTM directly for training. <xref ref-type="fig" rid="F1">Figure&#x20;1B</xref> shows the structure of a neuron in the second SS-LSTM layer, in which the information at moment <italic>t</italic>-<italic>skip</italic> (<italic>skip</italic> is a positive integer) is used to strengthen the memory at the moment&#x20;<italic>t</italic>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>
<bold>(A)</bold> The architecture of the SS-RNN model for data classification. <bold>(B)</bold> The structure of a neuron in the second SS-LSTM layer with the information of moment <italic>t-skip</italic> used to strengthen the long memory at the moment <italic>t</italic>. <bold>(C)</bold> The internal schematic diagram of an LSTM cell. <bold>(D)</bold> The structure of the second layer and the third layer of the SS-LSTM network.</p>
</caption>
<graphic xlink:href="fgene-12-746181-g001.tif"/>
</fig>
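Following Figure 1B, the skip-strengthened cell can be sketched as an LSTM step in which each gate additionally receives the hidden state from <italic>skip</italic> steps back through its own weight matrix. This is a hedged sketch under our own conventions: the names `ss_lstm_step` and `run_ss_lstm`, the weight-dictionary layout, and the zero-padding of early steps (when fewer than <italic>skip</italic> states exist) are assumptions, not details fixed by the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ss_lstm_step(x_t, h_prev, h_skip, c_prev, W, b):
    """One SS-LSTM step: each gate also sees h_{t-skip} via its own skip weight.

    W maps gate name -> (W_gskip, W_gh, W_gx); b maps gate name -> bias.
    """
    def gate(g, act):
        return act(W[g][0] @ h_skip + W[g][1] @ h_prev + W[g][2] @ x_t + b[g])
    f_t = gate("f", sigmoid)      # forget gate with skip term
    i_t = gate("i", sigmoid)      # input gate with skip term
    c_tilde = gate("c", np.tanh)  # candidate state with skip term
    c_t = f_t * c_prev + i_t * c_tilde
    o_t = gate("o", sigmoid)      # output gate with skip term
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def run_ss_lstm(xs, skip, n_h, W, b):
    """Unroll over a sequence, buffering past hidden states to supply h_{t-skip}."""
    h = np.zeros(n_h)
    c = np.zeros(n_h)
    history = [np.zeros(n_h)]  # zero state stands in for h_{t-skip} at early steps
    for x_t in xs:
        h_skip = history[-skip] if len(history) >= skip else history[0]
        h, c = ss_lstm_step(x_t, h, h_skip, c, W, b)
        history.append(h)
    return h
```

The extra `W[g][0] @ h_skip` term gives the gradient a direct path to the state <italic>skip</italic> steps in the past, which is the mechanism by which the skip connection strengthens long-term memory.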
<p>In comparison with the LSTM model, in addition to the&#x20;information from time <italic>t &#x2212;</italic> 1, the information from time <italic>t-skip</italic> is also involved in the input at the current time <italic>t</italic> (i.e.,<inline-formula id="inf21">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>). So, the SS-RNN mathematical model can be written as follows:<disp-formula id="e2">
<mml:math id="m23">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf22">
<mml:math id="m24">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf23">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf24">
<mml:math id="m26">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf25">
<mml:math id="m27">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are weight matrices for the corresponding inputs of the network activation functions, and <inline-formula id="inf26">
<mml:math id="m28">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>k</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the output at moment <italic>t-skip</italic>.</p>
<p>From the above model, two important issues clearly need to be addressed: 1) the information of which historical moments should be incorporated into the current moment? 2) how should the past information be incorporated into the current moment? To answer these two questions, we enumerated the ways of adding historical information to the current recurrent unit. These methods can be divided into continuous addition and discontinuous addition, and the historical information can be input either by direct addition or by weight weighting with function mapping. In total, there are six models (Models A&#x2013;F used in this work) for the addition of historical information, shown in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>; the detailed descriptions can be seen below (also, refer to the <xref ref-type="sec" rid="s11">Supplementary Material</xref>).</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Structures of the six models (e.g., <italic>skip</italic> &#x3d; 3) used in the SS-RNN. <bold>(A)</bold> Model A, the method is discontinuous addition without weight weighting and function mapping. <bold>(B)</bold> Model B, the method is discontinuous addition with weight weighting and function mapping. <bold>(C)</bold> Model C, the method is continuous addition without weight weighting and function mapping. <bold>(D)</bold> Model D, the method is continuous addition with weight weighting and function mapping. <bold>(E)</bold> Model E, which adds the information of all preceding moments at the corresponding skip interval; the method is discontinuous addition with weight weighting and function mapping. <bold>(F)</bold> Model F, which adds the information of all preceding moments at the corresponding skip interval; the method is discontinuous addition without weight weighting and function mapping.</p>
</caption>
<graphic xlink:href="fgene-12-746181-g002.tif"/>
</fig>
<p>Model A The information of the historical moment (<italic>t-skip</italic>) is directly added to the current moment (<italic>t</italic>), and the addition is discontinuous (<xref ref-type="fig" rid="F2">Figure&#x20;2A</xref>). The mathematical expressions of the LSTM cell can be written as follows:<disp-formula id="e3">
<mml:math id="m29">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">N</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">o</mml:mi>
<mml:mi mathvariant="bold-italic">t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold">tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mi mathvariant="bold-italic">t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mi mathvariant="bold-italic">t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">N</mml:mi>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where <italic>skip</italic> is the order and <italic>i</italic>&#x2208;<bold>N&#x2b;</bold> (<bold>N&#x2b;</bold> is the set of positive integers); the parts marked in bold indicate where the original formula has been changed. The order of Model A in <xref ref-type="fig" rid="F2">Figure&#x20;2A</xref> is 3: for example, when <italic>t</italic>&#x20;&#x3d; 4 with <italic>skip</italic> &#x3d; 3, the input of recurrent unit <italic>h</italic>
<sub>
<italic>4</italic>
</sub> comes from <italic>h</italic>
<sub>
<italic>1</italic>
</sub>, <italic>h</italic>
<sub>
<italic>3</italic>
</sub>, and <italic>x</italic>
<sub>
<italic>4</italic>
</sub>, and <italic>h</italic>
<sub>
<italic>1</italic>
</sub> is directly added to the original output of <italic>h</italic>
<sub>
<italic>4</italic>
</sub> to form a new output of <italic>h</italic>
<sub>
<italic>4</italic>
</sub>. Every <italic>skip</italic> (here, three) moments, additional historical information is added to the current moment.</p>
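Model A's update rule can be illustrated with a short NumPy sketch (a minimal illustration, not the authors' released implementation; the function names `lstm_step` and `model_a` and the weight-dictionary keys are our own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # Standard LSTM gates, as in the unchanged part of Eq. 3.
    f = sigmoid(W["fh"] @ h_prev + W["fx"] @ x_t + b["f"])
    i = sigmoid(W["ih"] @ h_prev + W["ix"] @ x_t + b["i"])
    c_tilde = np.tanh(W["ch"] @ h_prev + W["cx"] @ x_t + b["c"])
    c = f * c_prev + i * c_tilde
    o = sigmoid(W["oh"] @ h_prev + W["ox"] @ x_t + b["o"])
    N = o * np.tanh(c)  # the ordinary output N of Eq. 3
    return N, c

def model_a(xs, W, b, skip=3, d=4):
    # Model A: only at moments t = 1 + i*skip (i a positive integer) is
    # h_{t-skip} added directly to N; at all other moments h_t = N.
    hs, h, c = [], np.zeros(d), np.zeros(d)
    for t, x_t in enumerate(xs, start=1):  # t is 1-indexed as in Eq. 3
        N, c = lstm_step(x_t, h, c, W, b)
        if t > skip and (t - 1) % skip == 0:  # t = 1 + i*skip
            h = N + hs[t - skip - 1]          # discontinuous direct addition
        else:
            h = N
        hs.append(h)
    return hs
```

For <italic>skip</italic> = 3, the extra addition fires at t = 4, 7, 10, …, matching the example above where h<sub>1</sub> is added to the ordinary output at t = 4.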
<p>Model B Similar to Model A, but the past information is first weighted by <inline-formula id="inf27">
<mml:math id="m30">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and then added to the current moment through the transformation of the activation function (<xref ref-type="fig" rid="F2">Figure&#x20;2B</xref>). The corresponding mathematical expressions can be rewritten as:<disp-formula id="e4">
<mml:math id="m31">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">fskip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t-skip</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>x</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">iskip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t-skip</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t-skip</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo stretchy="true">&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="italic">tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="italic">tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo stretchy="true">&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">oskip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t-skip</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="italic">tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
<p>When <italic>t</italic>&#x20;&#x3d; 4, the input of recurrent unit <italic>h</italic>
<sub>
<italic>4</italic>
</sub> comes from <italic>h</italic>
<sub>
<italic>1</italic>
</sub>, <italic>h</italic>
<sub>
<italic>3</italic>
</sub>, and <italic>x</italic>
<sub>
<italic>4</italic>
</sub>. After <italic>h</italic>
<sub>
<italic>1</italic>
</sub> is weighted and transformed by the activation function, it is added to the current moment to form the new output of&#x20;<italic>h</italic>
<sub>
<italic>4</italic>
</sub>.</p>
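A single Model B step can be sketched as follows (an illustrative NumPy sketch under our own naming assumptions: the keys "fs", "is", "cs", "os" stand for the skip-weight matrices W<sub>fskip</sub>, W<sub>iskip</sub>, etc. of Eq. 4):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model_b_step(x_t, h_prev, h_skip, c_prev, W, b, use_skip):
    # Model B (Eq. 4): when t = 1 + i*skip (use_skip=True), each gate's
    # pre-activation gains an extra weighted term W_skip @ h_{t-skip}
    # (the M, N, Q, R quantities of Eq. 4); otherwise the standard LSTM
    # gates apply unchanged.
    def gate(act, wh, wx, ws, bias):
        z = W[wh] @ h_prev + W[wx] @ x_t + b[bias]
        if use_skip:
            z = z + W[ws] @ h_skip  # weighted skip information
        return act(z)
    f = gate(sigmoid, "fh", "fx", "fs", "f")
    i = gate(sigmoid, "ih", "ix", "is", "i")
    c_tilde = gate(np.tanh, "ch", "cx", "cs", "c")
    c = f * c_prev + i * c_tilde
    o = gate(sigmoid, "oh", "ox", "os", "o")
    return o * np.tanh(c), c
```

Unlike Model A, the skip information here enters inside the activation functions, so the gates themselves, not just the output, are modified.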
<p>Model C It continuously adds additional historical information to the current moment by direct addition (<xref ref-type="fig" rid="F2">Figure&#x20;2C</xref>). The corresponding mathematical expressions can be rewritten as:<disp-formula id="e5">
<mml:math id="m32">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="italic">tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
<p>The parts in bold represent changes to the original formula; the rest is the basic formula of the LSTM. For example, when <italic>t</italic>&#x20;&#x3d; 4, the input of recurrent unit <italic>h</italic>
<sub>
<italic>4</italic>
</sub> comes from <italic>h</italic>
<sub>
<italic>1</italic>
</sub>, <italic>h</italic>
<sub>
<italic>3</italic>
</sub>, and <italic>x</italic>
<sub>
<italic>4</italic>
</sub>, and <italic>h</italic>
<sub>
<italic>1</italic>
</sub> is directly added to the current moment to form the new output of <italic>h</italic>
<sub>
<italic>4</italic>
</sub>. Model C can be regarded as the general form of Model A; in both models, the additional historical information is calculated in the same way. Model A adds historical information intermittently, whereas Model C adds it continuously: every current moment adds the historical information of moment <italic>t-skip</italic>, which leads to greater computational complexity for the&#x20;model.</p>
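The contrast with Model A can be sketched in a few lines (again an illustrative sketch with hypothetical names; `toy_cell` is a stand-in for a full LSTM cell, only to make the loop runnable):

```python
import numpy as np

def model_c(xs, lstm_step, skip=3, d=4):
    # Model C (Eq. 5): at *every* moment with t > skip, h_{t-skip} is added
    # directly to the ordinary LSTM output (continuous direct addition),
    # unlike Model A, which only does so when t = 1 + i*skip.
    hs, h, c = [], np.zeros(d), np.zeros(d)
    for t, x_t in enumerate(xs, start=1):  # t is 1-indexed as in Eq. 5
        N, c = lstm_step(x_t, h, c)        # ordinary output o_t * tanh(c_t)
        h = N + hs[t - skip - 1] if t > skip else N
        hs.append(h)
    return hs

def toy_cell(x_t, h_prev, c_prev):
    # Minimal recurrent cell standing in for the LSTM step of Eq. 5.
    c = 0.5 * c_prev + 0.5 * np.tanh(x_t + h_prev)
    return np.tanh(c), c
```

Because the addition fires at every step beyond <italic>skip</italic>, Model C performs one extra vector addition per step, which is the source of its higher computational cost relative to Model A.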
<p>Model D It continuously adds historical information to the current moment in the form of weight weighting and function mapping (<xref ref-type="fig" rid="F2">Figure&#x20;2D</xref>). The corresponding mathematical expressions can be rewritten as:<disp-formula id="e6">
<mml:math id="m33">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">fskip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">iskip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">oskip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
</p>
<p>When <italic>t</italic>&#x20;&#x3d; 4, the input of loop unit <italic>h</italic><sub><italic>4</italic></sub> comes from <italic>h</italic><sub><italic>1</italic></sub>, <italic>h</italic><sub><italic>3</italic></sub>, and <italic>x</italic><sub><italic>4</italic></sub>, and <italic>h</italic><sub><italic>1</italic></sub> is directly added to the current moment to form the&#x20;output of the new <italic>h</italic><sub><italic>4</italic></sub>. Model D can be regarded as the general&#x20;form of Model B: the additional historical information is calculated in the same way in both models, but Model B adds it intermittently, whereas Model D adds it continuously.</p>
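<p>A minimal NumPy sketch of one step of the skip-augmented LSTM cell of Eq. 6 follows; it is an illustration only (the function name and the parameter dictionary <monospace>p</monospace> are ours, not from the released code). Every gate receives an extra term built from the hidden state <italic>skip</italic> steps in the past:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ss_lstm_step(x_t, h_prev, h_skip, c_prev, p):
    # Eq. 6: every gate gets an extra W_*skip @ h_{t-skip} term.
    f = sigmoid(p["W_fh"] @ h_prev + p["W_fx"] @ x_t
                + p["W_fskip"] @ h_skip + p["b_f"])
    i = sigmoid(p["W_ih"] @ h_prev + p["W_ix"] @ x_t
                + p["W_iskip"] @ h_skip + p["b_i"])
    c_tilde = np.tanh(p["W_ch"] @ h_prev + p["W_cx"] @ x_t
                      + p["W_cskip"] @ h_skip + p["b_c"])
    c = f * c_prev + i * c_tilde          # cell state update
    o = sigmoid(p["W_oh"] @ h_prev + p["W_ox"] @ x_t
                + p["W_oskip"] @ h_skip + p["b_o"])
    h = o * np.tanh(c)                    # hidden state output
    return h, c
```

<p>With <italic>skip</italic>&#x20;&#x3d; 1 the extra term collapses onto <italic>h</italic><sub><italic>t</italic>&#x2212;1</sub> and the cell reduces to a re-parameterized ordinary LSTM.</p>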
<p>Model E It intermittently adds additional historical information to the current moment through weighted summation and function mapping (<xref ref-type="fig" rid="F2">Figure&#x20;2E</xref>). The corresponding mathematical expressions can be rewritten as:<disp-formula id="e7">
<mml:math id="m34">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">M</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">M</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">N</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">N</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">Q</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">Q</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>c</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">o</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">o</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">o</mml:mi>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>When <italic>t</italic>&#x20;&#x3d; 4, the input of loop unit <italic>h</italic>
<sub>
<italic>4</italic>
</sub> comes from <italic>h</italic>
<sub>
<italic>1</italic>
</sub>, <italic>h</italic>
<sub>
<italic>2</italic>
</sub>, <italic>h</italic>
<sub>
<italic>3</italic>
</sub>, and <italic>x</italic>
<sub>
<italic>4</italic>
</sub>, and <italic>h</italic>
<sub>
<italic>1</italic>
</sub> and <italic>h</italic>
<sub>
<italic>2</italic>
</sub> are added to the current moment through weighted summation and function mapping and constitute the output of the new&#x20;<italic>h</italic>
<sub>
<italic>4</italic>
</sub>.</p>
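<p>In Eq. 7, the extra history terms (<italic>M</italic>, <italic>N</italic>, <italic>Q</italic>, <italic>R</italic>) share one pattern: a weighted sum of the three most recent hidden states, injected inside the gate activation. A minimal sketch, with illustrative names of our own:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_history(weights, hist):
    # M = W_1 @ h_{t-1} + W_2 @ h_{t-2} + W_3 @ h_{t-3}  (Eq. 7)
    return sum(W @ h for W, h in zip(weights, hist))

def forget_gate(x_t, hist, W_fx, b_f, W_f123):
    # f_t = sigma(W_fx @ x_t + b_f + M)
    M = weighted_history(W_f123, hist)
    return sigmoid(W_fx @ x_t + b_f + M)
```

<p>The input gate, candidate cell state, and output gate of Eq. 7 reuse the same pattern with their own weight triples.</p>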
<p>Model F It intermittently adds historical information to the current moment, and the historical information is added directly, without weighting (<xref ref-type="fig" rid="F2">Figure&#x20;2F</xref>). The corresponding mathematical expressions after the improvement of LSTM can be rewritten as:<disp-formula id="e8">
<mml:math id="m35">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">N</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">o</mml:mi>
<mml:mi mathvariant="bold-italic">t</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold-italic">tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mi mathvariant="bold-italic">t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mi mathvariant="bold-italic">t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">s</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">if</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold-italic">N</mml:mi>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="bold-italic">t</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="bold">1</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="bold-italic">skip</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
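<p>Eq. 8 can be paraphrased as: compute the ordinary LSTM output <italic>N</italic> and, only at moments <italic>t</italic>&#x20;&#x3d; 1&#x20;&#x2b;&#x20;<italic>i</italic>&#x20;&#xd7;&#x20;<italic>skip</italic>, add the unweighted sum of the hidden states from 2 to <italic>skip</italic> steps back. A minimal sketch (the function name and the <monospace>history</monospace> mapping are ours):</p>

```python
import numpy as np

def model_f_hidden(N, history, t, skip):
    """Eq. 8: h_t = N + sum_{s=2}^{skip} h_{t-s} when t = 1 + i*skip,
    otherwise h_t = N.  `history[k]` holds h_k for k < t."""
    if t > 1 and (t - 1) % skip == 0:
        return N + sum(history[t - s] for s in range(2, skip + 1))
    return N
```
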
</sec>
<sec id="s2-2">
<title>Data Collection</title>
<p>To test the effect of the long-term memory introduced in this work on data classification, we first conduct experiments on three time series datasets (i.e.,&#x20;the Arrhythmia dataset, Epilepsy dataset 1, and Epilepsy dataset 2). In addition, because of the potential correlations among the features in some non-time-series biomedical data, we also perform experiments on two disease datasets, the Diabetes dataset and the Breast cancer dataset, to validate the ability of the model on non-time-series data classification. Each dataset was split into training and testing sets using the standard split. <xref ref-type="table" rid="T1">Table&#x20;1</xref> summarizes the details of the five datasets.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Description of five datasets used in this work for data classification.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Datasets</th>
<th align="center">Source</th>
<th align="center">Size</th>
<th align="center">Train<xref ref-type="table-fn" rid="Tfn2">
<sup>a</sup>
</xref>
</th>
<th align="center">Test<xref ref-type="table-fn" rid="Tfn2">
<sup>a</sup>
</xref>
</th>
<th align="center">Classes<xref ref-type="table-fn" rid="Tfn1">
<sup>b</sup>
</xref>
</th>
<th align="center">Sources</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Arrhythmia dataset</td>
<td align="left">MIT-BIH Arrhythmia Database</td>
<td align="center">109,338</td>
<td align="center">87,470</td>
<td align="center">21,868</td>
<td align="center">5</td>
<td align="left">
<ext-link ext-link-type="uri" xlink:href="https://www.physionet.org/content/mitdb/1.0.0/">https://www.physionet.org/content/mitdb/1.0.0/</ext-link>
</td>
</tr>
<tr>
<td align="left">Epilepsy dataset 1</td>
<td align="left">Epileptologie Bonn</td>
<td align="center">11,500</td>
<td align="center">9,200</td>
<td align="center">2,300</td>
<td align="center">5</td>
<td align="left">
<ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/datasets/Epileptic+Seizure+Recognition">https://archive.ics.uci.edu/ml/datasets/Epileptic&#x2b;Seizure&#x2b;Recognition</ext-link>
</td>
</tr>
<tr>
<td align="left">Epilepsy dataset 2</td>
<td align="left">CHB-MIT Scalp EEG Database</td>
<td align="center">361,377</td>
<td align="center">289,102</td>
<td align="center">72,275</td>
<td align="center">2</td>
<td align="left">
<ext-link ext-link-type="uri" xlink:href="https://physionet.org/content/chbmit/1.0.0/">https://physionet.org/content/chbmit/1.0.0/</ext-link>
</td>
</tr>
<tr>
<td align="left">Diabetes dataset</td>
<td align="left">UC Irvine Machine Learning Repository</td>
<td align="center">520</td>
<td align="center">416</td>
<td align="center">104</td>
<td align="center">2</td>
<td align="left">
<ext-link ext-link-type="uri" xlink:href="http://archive.ics.uci.edu/ml/datasets/Early+stage+diabetes+risk+prediction+dataset">http://archive.ics.uci.edu/ml/datasets/Early&#x2b;stage&#x2b;diabetes&#x2b;risk&#x2b;prediction&#x2b;dataset</ext-link>
</td>
</tr>
<tr>
<td align="left">Breast cancer dataset</td>
<td align="left">UC Irvine Machine Learning Repository</td>
<td align="center">116</td>
<td align="center">93</td>
<td align="center">23</td>
<td align="center">2</td>
<td align="left">
<ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra">https://archive.ics.uci.edu/ml/datasets/Breast&#x2b;Cancer&#x2b;Coimbra</ext-link>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn2">
<label>a</label>
<p>Sizes of the training and testing sets for the five datasets, respectively.</p>
</fn>
<fn id="Tfn1">
<label>b</label>
<p>Number of classes of five datasets.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Arrhythmia dataset It contains 109,338 recordings from 48&#x20;half-hour excerpts of two-channel ambulatory ECG, which have been divided into five classes based on heart rate: one normal and four abnormal.</p>
<p>Epilepsy datasets Two well-known epilepsy datasets are used in this work. One is from the Department of Epilepsy at the University of Bonn, Germany, and contains five categories (A&#x2013;E) of 100&#x20;single-channel 23.6-s segments of electroencephalogram (EEG) signals (11,500 samples in total). The other is from Children&#x2019;s Hospital Boston and includes 361,377 EEG recordings from 22 epileptic patients, which have been grouped into two classes.</p>
<p>Diabetes dataset It contains 16 features, such as age, sex, and polyuria, and comes from the UC Irvine Machine Learning Repository. The data were collected using direct questionnaires from the patients of Sylhet Diabetes Hospital in Sylhet, Bangladesh, and approved by a medical doctor.</p>
<p>Breast cancer dataset It contains nine features from UC Irvine Machine Learning Repository (see <xref ref-type="sec" rid="s11">Supplementary Material</xref>).</p>
<p>The original five datasets are available through the websites listed in <xref ref-type="table" rid="T1">Table&#x20;1</xref>; we also rearranged them for convenience of use, and the rearranged data can be found in the <xref ref-type="sec" rid="s11">Supplementary Material</xref>.</p>
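<p>The train/test sizes in Table&#x20;1 all correspond to a roughly 80/20 split (e.g., 87,470/21,868 for the Arrhythmia dataset). Such a split can be sketched as follows; the exact splitting procedure and the use of shuffling are our assumptions, not taken from the original work:</p>

```python
import numpy as np

def split_80_20(X, y, seed=0):
    # Shuffle, then take the first 80% for training, the rest for testing.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```
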
</sec>
<sec id="s2-3">
<title>Evaluation Index</title>
<p>For the classification task, the models are evaluated by classification accuracy, precision, recall, and F1-score, which are defined from the confusion matrix. The confusion matrix is one of the most intuitive tools for evaluating the performance of a model in machine learning, especially for classification problems. Its terms can be defined as follows: true positives (TP) are the cases where the actual class of the data point is 1 and the predicted outcome is also 1. True negatives (TN) are the cases where the actual class is 0 and the predicted result is also 0. False positives (FP) are the cases where the actual class is 0 but the predicted outcome is 1, i.e., the model incorrectly predicts a negative sample as positive. False negatives (FN) are the cases where the actual class is 1 but the predicted outcome is 0, i.e., the model incorrectly predicts a positive sample as negative. The metrics are expressed as follows:<disp-formula id="e9">
<mml:math id="m36">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
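<p>The four metrics in Eq. 9 follow directly from the confusion-matrix counts; a minimal implementation (the helper name is ours) for the binary case:</p>

```python
def classification_metrics(tp, tn, fp, fn):
    # Eq. 9: accuracy, precision, recall, and F1-score from the
    # confusion-matrix counts of a binary classifier.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

<p>For the multi-class datasets (Arrhythmia, Epilepsy dataset 1), the same quantities are computed per class and averaged.</p>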
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec id="s3-1">
<title>The Workflow of the SS-RNN</title>
<p>In SS-RNN, the information of historical moments (e.g., <italic>t</italic>-<italic>skip</italic>) can be added to the current moment (i.e.,&#x20;<italic>t</italic>) to accurately classify sequential data with long-term dependences. To determine the best method of adding past information and to verify the effectiveness of the SS-RNN model, we performed six groups of comparison experiments on each of the five datasets. The six models (Models A&#x2013;F) and five datasets are described in <italic>Theoretical Model Analysis and Data Collection</italic>. Each experiment consists of three steps: data preprocessing, training, and&#x20;testing.</p>
<p>Data preprocessing Outliers and missing values often appear in the datasets, and the network model cannot process such samples. We first fill the missing values with the mean of the corresponding variable and delete the samples with outliers, which are identified using an anomaly detection method. The pre-processed time series datasets (e.g., the Arrhythmia and Epilepsy datasets in <xref ref-type="table" rid="T1">Table&#x20;1</xref>) can be directly input into the SS-LSTM model. The non-time-series data with multiple features and different dimensions (e.g., the Diabetes and Breast cancer datasets used in this work), however, need to be fed after the above preprocessing into the feature extractor to obtain a new set of data and features, which is further transformed into a 32&#x20;&#xd7;&#x20;4 matrix as input to the SS-LSTM. Taking the Diabetes dataset as an example, we give detailed descriptions in the <xref ref-type="sec" rid="s11">Supplementary Material</xref>.</p>
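<p>The mean-filling step described above can be sketched as follows; this is an illustration only, since the paper does not give its imputation code:</p>

```python
import numpy as np

def impute_mean(X):
    # Replace each NaN with the mean of its column, computed over
    # the non-missing entries (missing-value filling).
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X
```
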
<p>Training For each dataset (<xref ref-type="table" rid="T1">Table&#x20;1</xref>), the training set is used to train the model. The optimized parameters of the network are as follows: dimensions of the network are 128, 64, 32, and 16, respectively (<xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>). For each dataset, the configuration of the SS-LSTM model is implemented in Pytorch using <xref ref-type="disp-formula" rid="e3">Eqs 3</xref>&#x2013;<xref ref-type="disp-formula" rid="e8">8</xref>, and the dimensions for the three layers of the SS-LSTM model are&#x20;18, 8, and 5, respectively. The activation function is <italic>tanh</italic>, and the training algorithm is stochastic gradient descent with a learning rate of 0.01 and a training epoch of 50. Here, we used the cross-entropy loss as the objective function for training the network:<disp-formula id="e10">
<mml:math id="m37">
<mml:mrow>
<mml:mtext>Loss</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>log</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="true">&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>where <inline-formula id="inf28">
<mml:math id="m38">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the true value, and <inline-formula id="inf29">
<mml:math id="m39">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="true">&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is the corresponding predicted value. The batch size for each dataset after fine-tuning is given in the <xref ref-type="sec" rid="s11">Supplementary Material</xref>.</p>
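<p>For a single one-hot labeled sample, Eq. 10 can be computed as follows; the clipping is a numerical-safety detail of ours, not of the original work:</p>

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Eq. 10: Loss = -sum_i y_i * log(y_hat_i), with y_pred clipped
    # away from zero to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))
```

<p>During training this per-sample loss is averaged over each mini-batch and minimized by stochastic gradient descent, as described above.</p>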
<p>Test For each dataset, 25 comparative experiments were performed using different structures of LSTM. One experiment adopted the ordinary LSTM, while the others used the SS-LSTM with the different models (i.e.,&#x20;Models A&#x2013;F in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>). For each model, the value of <italic>skip</italic> was set to 2, 3, 4, and 5, respectively. Furthermore, we also used other classical models (LSTM, GRU, and Bi-LSTM) to classify three of the datasets (i.e.,&#x20;Arrhythmia, Epilepsy 1, and Diabetes) and compared the results with our SS-RNN&#x20;model.</p>
</sec>
<sec id="s3-2">
<title>Testing the Models With Data</title>
<p>To test the effect of the addition of past information on the data classification, we used our network with six different SS-LSTM models (Models A&#x2013;F; <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>) to classify the data for five datasets (<xref ref-type="table" rid="T1">Table&#x20;1</xref>), respectively. For each SS-LSTM model, different values of <italic>skip</italic> (e.g., <italic>skip</italic> &#x3d; 2, 3, 4, 5) were used. As shown in <xref ref-type="sec" rid="s11">Supplementary Figures S1&#x2013;S10, S12&#x2013;S17, S19&#x2013;S24, S26&#x2013;S31</xref> in the <xref ref-type="sec" rid="s11">Supplementary Material</xref>, the loss functions calculated by <xref ref-type="disp-formula" rid="e10">Eq. 10</xref> for the experiments in this work always converged before 50 steps, indicating that 50 steps are sufficient for the training and test processes.</p>
<sec id="s3-2-1">
<title>Epilepsy Dataset 1</title>
<p>For Epilepsy dataset 1, 9,200 samples were used to train our SS-LSTM, and the remaining samples were used for testing. The loss functions show that Models A and C are more stable than the others, and the loss value of the training set is consistent with that of the test set, indicating that no overfitting has occurred (<xref ref-type="fig" rid="F3">Figure&#x20;3</xref>, <xref ref-type="sec" rid="s11">Supplementary Figures S1&#x2013;S4</xref>). As shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, the value of the loss function of Model A is also the lowest among all models (<xref ref-type="fig" rid="F4">Figure&#x20;4A</xref>), and the predicted accuracy of Model A is &#x223c;47%, which is not only higher than that (&#x223c;37%) of the original LSTM, but also significantly better than those predicted by the SS-LSTM with the other models (e.g., &#x223c;40% for Model C with <italic>skip</italic> &#x3d; 4, i.e.,&#x20;Model C-4). The results indicate that past information (<italic>t-skip</italic>) directly added to the current moment (<italic>t</italic>) can effectively improve the classification accuracy on Epilepsy dataset 1. However, Model C with <italic>skip</italic> &#x3d; 2 has the lowest predicted accuracy (&#x223c;24%), which suggests that Model C is not suitable for this dataset.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Loss-function curves of the training and test sets for different <italic>skip</italic> values on Epilepsy dataset 1. For example, Model A-2 is Model A <bold>(A)</bold> with <italic>skip</italic>&#x20;&#x3d; 2, and Model C-4 is Model C <bold>(B)</bold> with <italic>skip</italic> &#x3d; 4.</p>
</caption>
<graphic xlink:href="fgene-12-746181-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>The loss functions <bold>(A)</bold> and predicted accuracy <bold>(B)</bold> of each model for classification of Epilepsy dataset 1. Original represents the results from the original LSTM, and the others represent the results from the SS-LSTM with different models; e.g., SkipA-2 is Model A with <italic>skip</italic> &#x3d; 2 (<xref ref-type="fig" rid="F2">Figure&#x20;2</xref>). For each violin in <bold>(A, B)</bold>, the top of the black rectangle marks the third quartile, the bottom marks the first quartile, the white dot is the mean, and the width of the orange area shows the density distribution.</p>
</caption>
<graphic xlink:href="fgene-12-746181-g004.tif"/>
</fig>
</sec>
<sec id="s3-2-2">
<title>Diabetes Dataset</title>
<p>As shown in <xref ref-type="fig" rid="F5">Figure&#x20;5A</xref>, the predicted accuracy of most SS-LSTM models is much higher than that (&#x223c;61%) of the original LSTM for the Diabetes dataset, and Model A with <italic>skip</italic> &#x3d; 3 has the highest accuracy (&#x223c;98%). The accuracy of Model B is significantly and positively correlated with the order. The change curves of the loss function for each model during training are also shown in the <xref ref-type="sec" rid="s11">Supplementary Material</xref>.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>The predicted accuracy of each model for classification on the Diabetes dataset <bold>(A)</bold>, the Arrhythmia dataset <bold>(B)</bold>, Epilepsy dataset 2&#x20;<bold>(C)</bold>, and the Breast cancer dataset <bold>(D)</bold>.</p>
</caption>
<graphic xlink:href="fgene-12-746181-g005.tif"/>
</fig>
</sec>
<sec id="s3-2-3">
<title>Arrhythmia Dataset and Other Datasets</title>
<p>For the Arrhythmia dataset, the long-term memory in Model A markedly improves the classification accuracy; e.g., the ACC increases from &#x223c;82 to 94% as <italic>skip</italic> increases from 2 to 5 (<xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>). Surprisingly, although large values of <italic>skip</italic> are also helpful for Model F, the ACC of Model F is clearly lower than that of the original LSTM for every <italic>skip</italic> value. Furthermore, in the other models (i.e.,&#x20;Models B, D, and E), the accuracy stays at the same level (&#x223c;82%) no matter how <italic>skip</italic> changes, which suggests that the added past information could be a burden for the RNN and has no positive effect on data classification (<xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>). Models B and D share a common structural characteristic: both use weight weighting and function mapping to add the historical information to the current time step, which damages the dynamic performance of the RNN. This is therefore not an ideal approach for the Arrhythmia dataset.</p>
<p>We also conducted experiments on Epilepsy dataset 2 and the Breast cancer dataset; the relevant results and analyses are given in the <xref ref-type="sec" rid="s11">Supplementary Material</xref>.</p>
</sec>
</sec>
<sec id="s3-3">
<title>Comparison Results With Other Models</title>
<p>Furthermore, we also classified three of the datasets (the Arrhythmia dataset, Epilepsy dataset 1, and the Diabetes dataset) using the classical networks LSTM, GRU, and Bi-LSTM with the default parameters of the torch.nn module, and compared the results with those from the SS-RNN with Model A and <italic>skip</italic> &#x3d; 3 (<xref ref-type="fig" rid="F6">Figure&#x20;6</xref>; <xref ref-type="table" rid="T2">Tables 2</xref>&#x2013;<xref ref-type="table" rid="T4">4</xref>). We also report the simulation results of other indexes for Model A with <italic>skip</italic> &#x3d; 3 and Model C with <italic>skip</italic> &#x3d; 5 in the <xref ref-type="sec" rid="s11">Supplementary Material</xref>.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Comparisons between LSTM, GRU, Bi-LSTM, and our SS-RNN (SkipA-3). <bold>(A)</bold> Accuracy of the Arrhythmia dataset. <bold>(B)</bold> Accuracy of the Epilepsy dataset 1. <bold>(C)</bold> Accuracy of the Diabetes dataset.</p>
</caption>
<graphic xlink:href="fgene-12-746181-g006.tif"/>
</fig>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Arrhythmia dataset classification comparison results with LSTM, GRU, and Bi-LSTM.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Accuracy</th>
<th align="center">Precision</th>
<th align="center">Recall</th>
<th align="center">F1-score</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">LSTM</td>
<td align="char" char=".">0.9181</td>
<td align="char" char=".">0.9564</td>
<td align="char" char=".">0.9380</td>
<td align="char" char=".">0.9316</td>
</tr>
<tr>
<td align="left">GRU</td>
<td align="char" char=".">0.9380</td>
<td align="char" char=".">0.9660</td>
<td align="char" char=".">0.9380</td>
<td align="char" char=".">0.9479</td>
</tr>
<tr>
<td align="left">Bi-LSTM</td>
<td align="char" char=".">0.9274</td>
<td align="char" char=".">0.9596</td>
<td align="char" char=".">0.9274</td>
<td align="char" char=".">0.9384</td>
</tr>
<tr>
<td align="left">SS-RNN(SkipA-3)</td>
<td align="char" char=".">0.9524</td>
<td align="char" char=".">0.9670</td>
<td align="char" char=".">0.9524</td>
<td align="char" char=".">0.9573</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Epilepsy dataset 1 classification comparison results with LSTM, GRU, and Bi-LSTM.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Accuracy</th>
<th align="center">Precision</th>
<th align="center">Recall</th>
<th align="center">F1-score</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">LSTM</td>
<td align="char" char=".">0.7178</td>
<td align="char" char=".">0.7190</td>
<td align="char" char=".">0.7178</td>
<td align="char" char=".">0.2506</td>
</tr>
<tr>
<td align="left">GRU</td>
<td align="char" char=".">0.7226</td>
<td align="char" char=".">0.7240</td>
<td align="char" char=".">0.7226</td>
<td align="char" char=".">0.2540</td>
</tr>
<tr>
<td align="left">Bi-LSTM</td>
<td align="char" char=".">0.1926</td>
<td align="char" char=".">0.0371</td>
<td align="char" char=".">0.1926</td>
<td align="char" char=".">0.3276</td>
</tr>
<tr>
<td align="left">SS-RNN(SkipA-3)</td>
<td align="char" char=".">0.7126</td>
<td align="char" char=".">0.7115</td>
<td align="char" char=".">0.7126</td>
<td align="char" char=".">0.3834</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Diabetes dataset classification comparison results with LSTM, GRU, and Bi-LSTM.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Accuracy</th>
<th align="center">Precision</th>
<th align="center">Recall</th>
<th align="center">F1-score</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">LSTM</td>
<td align="char" char=".">0.6154</td>
<td align="char" char=".">0.3787</td>
<td align="char" char=".">0.6154</td>
<td align="char" char=".">0.4689</td>
</tr>
<tr>
<td align="left">GRU</td>
<td align="char" char=".">0.8556</td>
<td align="char" char=".">0.8832</td>
<td align="char" char=".">0.8558</td>
<td align="char" char=".">0.8467</td>
</tr>
<tr>
<td align="left">Bi-LSTM</td>
<td align="char" char=".">0.6154</td>
<td align="char" char=".">0.3787</td>
<td align="char" char=".">0.6154</td>
<td align="char" char=".">0.4689</td>
</tr>
<tr>
<td align="left">SS-RNN(SkipA-3)</td>
<td align="char" char=".">0.9808</td>
<td align="char" char=".">0.9817</td>
<td align="char" char=".">0.9808</td>
<td align="char" char=".">0.9809</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="fig" rid="F6">Figure&#x20;6</xref> shows that our SS-RNN method improves the classification accuracy compared with the classical methods. Moreover, <xref ref-type="table" rid="T2">Tables 2</xref>&#x2013;<xref ref-type="table" rid="T4">4</xref> show that most of the other main indexes are also improved. In addition, we compared our method with the recent methods RNN, RNN&#x2b;GRU, RNN&#x2b;LSTM, and MCNN (<xref ref-type="bibr" rid="B49">Zhang et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B38">Singh et&#x20;al., 2018</xref>) on the same Arrhythmia dataset. The results, shown in <xref ref-type="table" rid="T5">Table&#x20;5</xref>, also indicate that our SS-RNN method improves the classification accuracy.</p>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Arrhythmia dataset classification comparison results with RNN, RNN&#x2b;GRU, RNN&#x2b;LSTM, and MCNN.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Accuracy</th>
<th align="center">Recall (Sensitivity)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">RNN</td>
<td align="char" char=".">0.8540</td>
<td align="char" char=".">0.8060</td>
</tr>
<tr>
<td align="left">RNN GRU</td>
<td align="char" char=".">0.8250</td>
<td align="char" char=".">0.7890</td>
</tr>
<tr>
<td align="left">RNN LSTM</td>
<td align="char" char=".">0.8810</td>
<td align="char" char=".">0.9240</td>
</tr>
<tr>
<td align="left">MCNN</td>
<td align="char" char=".">0.9110</td>
<td align="char" char=".">NA<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
</tr>
<tr>
<td align="left">SS-RNN(SkipA-3)</td>
<td align="char" char=".">0.9524</td>
<td align="char" char=".">0.9524</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn2">
<label>a</label>
<p>NA means that it is not available in the original&#x20;paper.</p>
</fn>
</table-wrap-foot>
</table-wrap>
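The accuracy, precision, recall, and F1-score reported in Tables 2&#x2013;5 can be computed as support-weighted averages of per-class scores. The paper does not state its averaging scheme, so weighted averaging is an assumption here, and the helper name `weighted_prf` is hypothetical; a minimal pure-Python sketch:

```python
from collections import Counter

def weighted_prf(y_true, y_pred):
    """Accuracy plus per-class precision/recall/F1 averaged with
    class-support weights (one assumed multi-class averaging scheme)."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    n = len(y_true)
    P = R = F = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        pred_c = sum(1 for p in y_pred if p == c)
        prec = tp / pred_c if pred_c else 0.0
        rec = tp / support[c]
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = support[c] / n          # weight each class by its support
        P += w * prec; R += w * rec; F += w * f1
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return acc, P, R, F

acc, p, r, f = weighted_prf([0, 0, 1, 1], [0, 1, 1, 1])
print(acc, p, r, f)  # 0.75 0.8333... 0.75 0.7333...
```

Note that support-weighted recall always equals accuracy, which is consistent with most rows of Tables 2&#x2013;4, where the Recall column matches the Accuracy column.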
<p>In fact, as a variant of LSTM, GRU merges the forget gate and input gate into a single update gate. GRU thus has a simpler internal structure and fewer parameters than LSTM, which reduces the risk of overfitting. Although LSTM and GRU partially solve the vanishing-gradient problem of the RNN, information loss is still severe when propagating over very long distances. Bi-LSTM, namely bidirectional LSTM, does not change the internal structure of LSTM itself: LSTM is applied twice, in opposite directions, and the two outputs are concatenated as the final output. For datasets with both forward and backward dependencies, this can strengthen the correlations between data points and improve model efficiency; it is often used in natural language processing to capture specific preceding or following features of language and syntax. However, in biological datasets such as ECG and EEG, the progression and onset of diseases are irreversible, so relationships in the reverse time direction have no practical significance for disease classification. In addition, an excessive number of parameters may lead to overfitting during network training, so the Bi-LSTM model is not suitable here.</p>
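The gate structure described above can be made concrete with a single-unit, scalar GRU update. This is a didactic sketch (real implementations such as torch.nn.GRU use weight matrices over vectors; the parameter names here are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One scalar GRU update: two gates (update z, reset r) instead of
    LSTM's three gates plus a separate cell state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])   # reset gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1 - z) * h + z * h_tilde                   # interpolate old/new state

params = {k: 0.5 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
h = 0.0
for x in (1.0, -1.0, 0.5):   # run a short input sequence
    h = gru_step(x, h, params)
print(h)
```

The final interpolation `(1 - z) * h + z * h_tilde` is the single update gate doing the work of LSTM's separate forget and input gates, which is why GRU needs fewer parameters.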
<p>The long-term memory ability of LSTM and GRU is weak: as the time step increases, the more distant a memory is, the more of its information the model forgets. Our model strengthens the information from distant moments, which compensates for the long-term dependency weakness of RNNs. Therefore, compared with the other models, our SS-RNN method can improve the precision, recall, and F1-score, and ultimately the classification accuracy of sequential data.</p>
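The core mechanism, directly re-injecting the hidden state from <italic>skip</italic> steps earlier into the current step, can be sketched on a scalar Elman-style RNN. This is an illustrative simplification of the Model A idea, not the authors' exact SS-LSTM formulation, and the function name and weights are assumptions:

```python
import math

def skip_rnn_states(xs, skip=3, w=0.8, u=0.5):
    """Scalar RNN where the state from `skip` steps back is added
    directly (unweighted, Model A-style) to the current pre-activation."""
    hs = [0.0]  # h_0
    for t, x in enumerate(xs, start=1):
        pre = w * x + u * hs[-1]
        if t > skip:
            pre += hs[t - skip]   # direct addition of h_{t-skip}
        hs.append(math.tanh(pre))
    return hs[1:]

xs = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]        # a single early impulse
with_skip = skip_rnn_states(xs, skip=3)
no_skip = skip_rnn_states(xs, skip=len(xs) + 1)  # skip branch never fires
# With the skip connection, the early input's influence on the final
# state decays far more slowly.
print(abs(with_skip[-1]), abs(no_skip[-1]))
```

In this toy run the final state retains roughly twenty times more of the initial impulse with the skip connection than without, which is the sense in which direct re-injection "strengthens" distant information.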
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>The behavior of the loss function differs across the five datasets and six models, and Model A performs best overall. On Epilepsy dataset 1, Model A has the lowest loss function and the highest accuracy of all models. On the Diabetes dataset, the loss function of Model A-3 is the lowest and its accuracy the highest. On the Arrhythmia dataset, the loss behavior of each model differs, and Model A performs best: its loss function is negatively correlated with the order while its accuracy is positively correlated with the order. On Epilepsy dataset 2, overfitting occurred during the convergence of each model; Model A therefore did not perform well, Model C had the lowest loss function, and Model D-2 had the highest accuracy. For the Breast cancer dataset, the network could not be trained optimally because the dataset is too small; there, the average loss of Model D-5 is the lowest and its accuracy the highest. In each model there is a certain relationship between order and accuracy.</p>
<p>Furthermore, we calculated the average accuracy of the six models with our method on the Arrhythmia dataset, Epilepsy dataset 1, and the Diabetes dataset. As shown in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>, the average accuracy is improved compared with the original LSTM, GRU, and Bi-LSTM models, which indicates that our SS-RNN method is generally useful. We also compared the average accuracy of the six models with that of the original LSTM on all five datasets (<xref ref-type="sec" rid="s11">Supplementary Figure S54</xref>). The results in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref> show that, among the six ways of adding historical information, Model A is the best, with the highest average accuracy. Comparing the various addition methodologies, discontinuous addition is better than continuous addition, and direct addition is better than weight weighting and function mapping. This does not mean that more historical information is always better: adding more historical information did not further improve the memory ability of the RNN. Different data have different dependence intensities, so the same model performs differently on different datasets.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Average accuracy on the Arrhythmia dataset, Epilepsy dataset 1, and the Diabetes dataset for the original LSTM, GRU, and Bi-LSTM models and our six models, without batch-size tuning.</p>
</caption>
<graphic xlink:href="fgene-12-746181-g007.tif"/>
</fig>
</sec>
<sec sec-type="conclusion" id="s5">
<title>Conclusion</title>
<p>To effectively capture the long-term dependencies in sequential data, we propose the SS-RNN, which adds historical information to the current moment through different methods. We designed six models with different <italic>skip</italic> values to simulate the possible patterns of adding past information, and tested them on five disease-related datasets of different sizes and data types. Comparisons of our method with the original LSTM, GRU, and Bi-LSTM and with the recent methods RNN&#x2b;GRU, RNN&#x2b;LSTM, and MCNN suggest that our method can significantly improve the accuracy of sequential data classification. Furthermore, the best way to add past information could be discontinuous, direct addition without weight weighting or function mapping, which can effectively alleviate the exploding- and vanishing-gradient problems. There is also a certain correlation between model performance and the order.</p>
<p>The SS-RNN provides a new way to improve the classification accuracy of sequential data by optimizing the LSTM model. Users can therefore optimize their own network models by adding the SS-RNN module, which is of great significance for the classification, diagnosis, and precision treatment of diseases. Although the SS-RNN generally classifies large datasets well, its performance on small-sample datasets needs further improvement. In the future, few-shot learning could be introduced to train the SS-RNN network to improve the classification efficiency for small-sample datasets. The code of the SS-RNN model is available on GitHub (<ext-link ext-link-type="uri" xlink:href="https://github.com/WTU-RCNS-Bioinformatics-Lab/SS-RNN">https://github.com/WTU-RCNS-Bioinformatics-Lab/SS-RNN</ext-link>).</p>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s11">Supplementary Material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>BZ, WC, and YS designed the research. WC, HQ, and YS performed the experiments. WC, HQ, and BZ analyzed the data. WC, YS, and BZ wrote the manuscript. All authors discussed the results and reviewed the manuscript.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This work was supported by grants from the National Natural Science Foundation of China (No. 11971367).</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>We are grateful to Professors Jian Zhang (Nanjing University) and Wenbing Zhang (Wuhan University) for valuable discussions.</p>
</ack>
<sec id="s11">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2021.746181/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2021.746181/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abbasvandi</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Nasrabadi</surname>
<given-names>A. M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Self-Organized Recurrent Neural Network for Estimating the Effective Connectivity and its Application to EEG Data</article-title>. <source>Comput. Biol. Med.</source> <volume>110</volume>, <fpage>93</fpage>&#x2013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1016/j.compbiomed.2019.05.012</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aggarwal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gupta</surname>
<given-names>D. K.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A Review of Different Text Categorization Techniques</article-title>. <source>Int. J.&#x20;Eng. Technol. (Ijet)</source> <volume>7</volume> (<issue>3.8</issue>), <fpage>11</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.14419/ijet.v7i3.8.15210</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alahi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Goel</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Ramanathan</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Robicquet</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fei-Fei</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Savarese</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Social Lstm: Human Trajectory Prediction in Crowded Spaces</article-title>,&#x201d; in <conf-name>2016 Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)</conf-name>, (<publisher-loc>Las Vegas, NV, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>961</fpage>&#x2013;<lpage>971</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2016.110</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arpit</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kanuparthi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kerg</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Ke</surname>
<given-names>N. R.</given-names>
</name>
<name>
<surname>Mitliagkas</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>H-Detach: Modifying the LSTM Gradient towards Better Optimization</article-title>. <comment>arXiv [Preprint]. Available at: arXiv:1810.03023</comment>. </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chandrasekar</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Sureshkumar</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>T. S.</given-names>
</name>
<name>
<surname>Shanmugapriya</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Disease Prediction Based on Micro Array Classification Using Deep Learning Techniques</article-title>. <source>Microprocessors and Microsystems</source> <volume>77</volume>, <fpage>103189</fpage>. <pub-id pub-id-type="doi">10.1016/j.micpro.2020.103189</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Cho</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Van Merri&#xeb;nboer</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Gulcehre</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bahdanau</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bougares</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Schwenk</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation</article-title>. <comment>arXiv [Preprint]. Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1406.1078">https://arxiv.org/abs/1406.1078</ext-link>
</comment>. <pub-id pub-id-type="doi">10.3115/v1/d14-1179</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Fazekas</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Sandler</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cho</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Convolutional Recurrent Neural Networks for Music Classification</article-title>,&#x201d; in <conf-name>2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</conf-name> (<publisher-loc>New Orleans, LA, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2392</fpage>&#x2013;<lpage>2396</lpage>. <pub-id pub-id-type="doi">10.1109/ICASSP.2017.7952585</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chung</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kastner</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Dinh</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Goel</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Courville</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>A Recurrent Latent Variable Model for Sequential Data</article-title>. <source>Adv. Neural Inf. Process. Syst.</source> <volume>28</volume>, <fpage>2980</fpage>&#x2013;<lpage>2988</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://proceedings.neurips.cc/paper/2015/file/b618c3210e934362ac261db280128c22-Paper.pdf">https://proceedings.neurips.cc/paper/2015/file/b618c3210e934362ac261db280128c22-Paper.pdf</ext-link>
</comment>. </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Das</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pratama</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ong</surname>
<given-names>Y. S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Skip-Connected Evolving Recurrent Neural Network for Data Stream Classification under Label Latency Scenario</article-title>. <source>Assoc. Adv. Artif. Intelligence</source> <volume>34</volume> (<issue>04</issue>), <fpage>3717</fpage>&#x2013;<lpage>3724</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v34i04.5781</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>DiPietro</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Rupprecht</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Navab</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hager</surname>
<given-names>G. D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Analyzing and Exploiting NARX Recurrent Neural Networks for Long-Term Dependencies</article-title>. <comment>arXiv [Preprint]. Available at: arXiv:1702.07805</comment>. </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elman</surname>
<given-names>J.&#x20;L.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Finding Structure in Time</article-title>. <source>Cogn. Sci.</source> <volume>14</volume> (<issue>2</issue>), <fpage>179</fpage>&#x2013;<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1207/s15516709cog1402_1</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>ElSaid</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>El Jamiy</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Higgins</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wild</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Desell</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Optimizing Long Short-Term Memory Recurrent Neural Networks Using Ant colony Optimization to Predict Turbine Engine Vibration</article-title>. <source>Appl. Soft Comput.</source> <volume>73</volume>, <fpage>969</fpage>&#x2013;<lpage>991</lpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2018.09.013</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Shang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>MUFOLD-SS: New Deep Inception-Inside-Inception Networks for Protein Secondary Structure Prediction</article-title>. <source>Proteins</source> <volume>86</volume> (<issue>5</issue>), <fpage>592</fpage>&#x2013;<lpage>598</lpage>. <pub-id pub-id-type="doi">10.1002/prot.25487</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gers</surname>
<given-names>F. A.</given-names>
</name>
<name>
<surname>Schmidhuber</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Cummins</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Learning to Forget: Continual Prediction with LSTM</article-title>. <source>Neural Comput.</source> <volume>12</volume> (<issue>10</issue>), <fpage>2451</fpage>&#x2013;<lpage>2471</lpage>. <pub-id pub-id-type="doi">10.1162/089976600300015015</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gui</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Long Short-Term Memory with Dynamic Skip Connections</article-title>. <source>Assoc. Adv. Artif. Intelligence</source> <volume>33</volume> (<issue>01</issue>), <fpage>6481</fpage>&#x2013;<lpage>6488</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33016481</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gupta</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fei-Fei</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Savarese</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Alahi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Social gan: Socially Acceptable Trajectories with Generative Adversarial Networks</article-title>,&#x201d; in <conf-name>2018 Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>, (<publisher-loc>Salt Lake City, UT, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2255</fpage>&#x2013;<lpage>2264</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00240</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hasan</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Setti</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Tsesmelis</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Del Bue</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Galasso</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Cristani</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Mx-lstm: Mixing Tracklets and Vislets to Jointly Forecast Trajectories and Head Poses</article-title>,&#x201d; in <conf-name>2018 Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>, (<publisher-loc>Salt Lake City, UT, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>6067</fpage>&#x2013;<lpage>6076</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00635</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hochreiter</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schmidhuber</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Long Short-Term Memory</article-title>. <source>Neural Comput.</source> <volume>9</volume> (<issue>8</issue>), <fpage>1735</fpage>&#x2013;<lpage>1780</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnson</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Deep Pyramid Convolutional Neural Networks for Text Categorization</article-title>. <source>Proc. 55th Annu. Meet. Assoc. Comput. Linguistics</source> <volume>1</volume>, <fpage>562</fpage>&#x2013;<lpage>570</lpage>. <pub-id pub-id-type="doi">10.18653/v1/P17-1052</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jordan</surname>
<given-names>M. I.</given-names>
</name>
</person-group> (<year>1990</year>). &#x201c;<article-title>Attractor Dynamics and Parallelism in a Connectionist Sequential Machine</article-title>,&#x201d; in <source>Artificial Neural Networks: Concept Learning</source>. <fpage>112</fpage>&#x2013;<lpage>127</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://dl.acm.org/doi/abs/10.5555/104134.104148">https://dl.acm.org/doi/abs/10.5555/104134.104148</ext-link>
</comment>. </citation>
</ref>
<ref id="B21">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Keren</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Schuller</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data</article-title>,&#x201d; in <conf-name>2016 International Joint Conference on Neural Networks (IJCNN)</conf-name> (<publisher-loc>Vancouver, BC, Canada</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>3412</fpage>&#x2013;<lpage>3419</lpage>. <pub-id pub-id-type="doi">10.1109/IJCNN.2016.7727636</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kerkeni</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Serrestou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Mbarki</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Raoof</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Ali Mahjoub</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cleder</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Automatic Speech Emotion Recognition Using Machine Learning</article-title>,&#x201d; in <source>Social Media and Machine Learning</source>. Editor <person-group person-group-type="editor">
<name>
<surname>Cano</surname>
<given-names>A.</given-names>
</name>
</person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>IntechOpen</publisher-name>). <pub-id pub-id-type="doi">10.5772/intechopen.84856</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kong</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>Z. Y.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hill</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Short-term Residential Load Forecasting Based on LSTM Recurrent Neural Network</article-title>. <source>IEEE Trans. Smart Grid</source> <volume>10</volume> (<issue>1</issue>), <fpage>841</fpage>&#x2013;<lpage>851</lpage>. <pub-id pub-id-type="doi">10.1109/TSG.2017.2753802</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Prediction for Tourism Flow Based on LSTM Neural Network</article-title>. <source>Proced. Comput. Sci.</source> <volume>129</volume>, <fpage>277</fpage>&#x2013;<lpage>283</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2018.03.076</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>A LSTM and CNN Based Assemble Neural Network Framework for Arrhythmias Classification</article-title>,&#x201d; in <conf-name>ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</conf-name> (<publisher-loc>Brighton, UK</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1303</fpage>&#x2013;<lpage>1307</lpage>. <pub-id pub-id-type="doi">10.1109/ICASSP.2019.8682299</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ma</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhuang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Echo Memory-Augmented Network for Time Series Classification</article-title>. <source>Neural Networks</source> <volume>133</volume>, <fpage>177</fpage>&#x2013;<lpage>192</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2020.10.015</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maginnis</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Richemond</surname>
<given-names>P. H.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Efficiently Applying Attention to Sequential Data with the Recurrent Discounted Attention Unit</article-title>. <comment>arXiv [Preprint]. Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1705.08480">https://arxiv.org/abs/1705.08480</ext-link>
</comment>. </citation>
</ref>
<ref id="B28">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Malhotra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>TV</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Vig</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Agarwal</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Shroff</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>TimeNet: Pre-trained Deep Recurrent Neural Network for Time Series Classification</article-title>. <comment>arXiv [Preprint]. Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1706.08838">https://arxiv.org/abs/1706.08838</ext-link>
</comment>. </citation>
</ref>
<ref id="B29">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Mayet</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lambert</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Leguyadec</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Le Bolzer</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Schnitzler</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>SkipW: Resource Adaptable RNN with Strict Upper Computational Limit</article-title>,&#x201d; in <conf-name>International Conference on Learning Representations (ICLR)</conf-name>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://openreview.net/forum?id=2CjEVW-RGOJ">https://openreview.net/forum?id&#x3d;2CjEVW-RGOJ</ext-link>
</comment>. </citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mikolajczyk</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Grochowski</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Data Augmentation for Improving Deep Learning in Image Classification Problem</article-title>,&#x201d; in <conf-name>2018 international interdisciplinary PhD workshop (IIPhDW)</conf-name> (<publisher-loc>Poland</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>117</fpage>&#x2013;<lpage>122</lpage>. <pub-id pub-id-type="doi">10.1109/IIPHDW.2018.8388338</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ororbia</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>ElSaid</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Desell</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Investigating Recurrent Neural Network Memory Structures Using Neuro-Evolution</article-title>,&#x201d; in <conf-name>2019 Proceedings of the Genetic and Evolutionary Computation Conference</conf-name>, (<publisher-loc>Prague, Czech Republic</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>446</fpage>&#x2013;<lpage>455</lpage>. <pub-id pub-id-type="doi">10.1145/3321707.3321795</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ostmeyer</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Cowell</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Machine Learning on Sequential Data Using a Recurrent Weighted Average</article-title>. <source>Neurocomputing</source> <volume>331</volume>, <fpage>281</fpage>&#x2013;<lpage>288</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2018.11.066</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sadeghian</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kosaraju</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Sadeghian</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hirose</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Rezatofighi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Savarese</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints</article-title>,&#x201d; in <conf-name>2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>, (<publisher-loc>Long Beach, CA, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1349</fpage>&#x2013;<lpage>1358</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2019.00144</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Salman</surname>
<given-names>A. G.</given-names>
</name>
<name>
<surname>Heryadi</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Abdurahman</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Suparta</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Single Layer &#x26; Multi-Layer Long Short-Term Memory (LSTM) Model with Intermediate Variables for Weather Forecasting</article-title>. <source>Proced. Comput. Sci.</source> <volume>135</volume>, <fpage>89</fpage>&#x2013;<lpage>98</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2018.08.153</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saritas</surname>
<given-names>M. M.</given-names>
</name>
<name>
<surname>Yasar</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Performance Analysis of ANN and Naive Bayes Classification Algorithm for Data Classification</article-title>. <source>Int. J.&#x20;Intell. Syst. Appl.</source> <volume>7</volume> (<issue>2</issue>), <fpage>88</fpage>&#x2013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.18201/ijisae.2019252786</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schuster</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Paliwal</surname>
<given-names>K. K.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Bidirectional Recurrent Neural Networks</article-title>. <source>IEEE Trans. Signal. Process.</source> <volume>45</volume> (<issue>11</issue>), <fpage>2673</fpage>&#x2013;<lpage>2681</lpage>. <pub-id pub-id-type="doi">10.1109/78.650093</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shewalkar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Nyavanandi</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ludwig</surname>
<given-names>S. A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU</article-title>. <source>J.&#x20;Artif. Intelligence Soft Comput. Res.</source> <volume>9</volume> (<issue>4</issue>), <fpage>235</fpage>&#x2013;<lpage>245</lpage>. <pub-id pub-id-type="doi">10.2478/jaiscr-2019-0006</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Singh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pandey</surname>
<given-names>S. K.</given-names>
</name>
<name>
<surname>Pawar</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Janghel</surname>
<given-names>R. R.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Classification of ECG Arrhythmia Using Recurrent Neural Networks</article-title>. <source>Proced. Comput. Sci.</source> <volume>132</volume>, <fpage>1290</fpage>&#x2013;<lpage>1297</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2018.05.045</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Song</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Chung</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Dynamic Frame Skipping for Fast Speech Recognition in Recurrent Neural Network Based Acoustic Models</article-title>,&#x201d; in <conf-name>2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</conf-name> (<publisher-loc>Calgary, AB, Canada</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>4984</fpage>&#x2013;<lpage>4988</lpage>. <pub-id pub-id-type="doi">10.1109/ICASSP.2018.8462615</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Su</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Forecast the Plausible Paths in Crowd Scenes</article-title>,&#x201d; in <conf-name>2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)</conf-name>, (<publisher-loc>Melbourne, Australia</publisher-loc>: <publisher-name>International Joint Conferences on Artificial Intelligence</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>2</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2017/386</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Thakker</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Dasika</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Beu</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Skipping RNN State Updates without Retraining the Original Model</article-title>,&#x201d; in <conf-name>2019 Proceedings of the 1st Workshop on Machine Learning on Edge in Sensor Systems</conf-name>, (<publisher-loc>Coimbra, Portugal</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>31</fpage>&#x2013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1145/3362743.3362965</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Trinh</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Dai</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Luong</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Learning Longer-Term Dependencies in RNNs with Auxiliary Losses</article-title>,&#x201d; in <conf-name>2018 Proceedings of the 35th International Conference on Machine Learning</conf-name> (<publisher-loc>Stockholmsm&#xe4;ssan, Stockholm, Sweden</publisher-loc>: <publisher-name>Proceedings of Machine Learning Research (PMLR)</publisher-name>), <fpage>4965</fpage>&#x2013;<lpage>4974</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://proceedings.mlr.press/v80/trinh18a.html">http://proceedings.mlr.press/v80/trinh18a.html</ext-link>
</comment>. </citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Vemula</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Muelling</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Oh</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Social Attention: Modeling Attention in Human Crowds</article-title>,&#x201d; in <conf-name>2018 IEEE international Conference on Robotics and Automation (ICRA)</conf-name> (<publisher-loc>Brisbane, Australia</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>4601</fpage>&#x2013;<lpage>4607</lpage>. <pub-id pub-id-type="doi">10.1109/ICRA.2018.8460504</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Attention-based LSTM for Aspect-Level Sentiment Classification</article-title>,&#x201d; in <conf-name>2016 Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</conf-name>, (<publisher-loc>Austin, TX, USA</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <fpage>606</fpage>&#x2013;<lpage>615</lpage>. <pub-id pub-id-type="doi">10.18653/v1/d16-1058</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Piao</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction</article-title>,&#x201d; in <conf-name>2018 Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>, (<publisher-loc>Salt Lake City, UT, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>5275</fpage>&#x2013;<lpage>5284</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00553</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Traffic Flow Prediction Using LSTM with Feature Enhancement</article-title>. <source>Neurocomputing</source> <volume>332</volume>, <fpage>320</fpage>&#x2013;<lpage>327</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2018.12.016</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yildirim</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Baloglu</surname>
<given-names>U. B.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>R.-S.</given-names>
</name>
<name>
<surname>Ciaccio</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Acharya</surname>
<given-names>U. R.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A New Approach for Arrhythmia Classification Using Deep Coded Features and LSTM Networks</article-title>. <source>Comput. Methods Programs Biomed.</source> <volume>176</volume>, <fpage>121</fpage>&#x2013;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.1016/j.cmpb.2019.05.004</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Si</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures</article-title>. <source>Neural Comput.</source> <volume>31</volume> (<issue>7</issue>), <fpage>1235</fpage>&#x2013;<lpage>1270</lpage>. <pub-id pub-id-type="doi">10.1162/neco_a_01199</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>HeartID: A Multiresolution Convolutional Neural Network for ECG-Based Biometric Human Identification in Smart Health Applications</article-title>. <source>IEEE Access</source> <volume>5</volume>, <fpage>11805</fpage>&#x2013;<lpage>11816</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2017.2707460</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lv</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Duan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Do RNN and LSTM Have Long Memory?</article-title>,&#x201d; in <conf-name>2020 International Conference on Machine Learning</conf-name> (<publisher-loc>Vienna, Austria</publisher-loc>: <publisher-name>Proceedings of Machine Learning Research (PMLR)</publisher-name>), <fpage>11365</fpage>&#x2013;<lpage>11375</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://proceedings.mlr.press/v119/zhao20c.html">http://proceedings.mlr.press/v119/zhao20c.html</ext-link>
</comment>. </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>G.-B.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C.-L.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Z.-H.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Minimal Gated Unit for Recurrent Neural Networks</article-title>. <source>Int. J.&#x20;Autom. Comput.</source> <volume>13</volume> (<issue>3</issue>), <fpage>226</fpage>&#x2013;<lpage>234</lpage>. <pub-id pub-id-type="doi">10.1007/s11633-016-1006-2</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>