<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Neurosci.</journal-id>
<journal-title>Frontiers in Computational Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-5188</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fncom.2021.678158</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Gated Recurrent Units Viewed Through the Lens of Continuous Time Dynamical Systems</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Jordan</surname> <given-names>Ian D.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1260686/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Sok&#x000F3;&#x00142;</surname> <given-names>Piotr Aleksander</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1315334/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Park</surname> <given-names>Il Memming</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/519359/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Applied Mathematics and Statistics, Stony Brook University</institution>, <addr-line>Stony Brook, NY</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Institute of Advanced Computational Science, Stony Brook University</institution>, <addr-line>Stony Brook, NY</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Neurobiology and Behavior, Stony Brook University</institution>, <addr-line>Stony Brook, NY</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Martin A. Giese, University of T&#x000FC;bingen, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Lee DeVille, University of Illinois at Urbana-Champaign, United States; Mario Negrello, Erasmus Medical Center, Netherlands; J. Michael Herrmann, University of Edinburgh, United Kingdom</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Il Memming Park <email>memming.park&#x00040;stonybrook.edu</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>22</day>
<month>07</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>15</volume>
<elocation-id>678158</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>03</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>06</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Jordan, Sok&#x000F3;&#x00142; and Park.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Jordan, Sok&#x000F3;&#x00142; and Park</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract><p>Gated recurrent units (GRUs) are specialized memory elements for building recurrent neural networks. Despite their incredible success on various tasks, including extracting dynamics underlying neural data, little is understood about the specific dynamics representable in a GRU network. As a result, it is difficult to know a priori both how well a GRU network will perform on a given task and how faithfully it can mimic the underlying behavior of its biological counterparts. Using a continuous time analysis, we gain intuition on the inner workings of GRU networks. We restrict our presentation to low dimensions, allowing for a comprehensive visualization. We found a surprisingly rich repertoire of dynamical features that includes stable limit cycles (nonlinear oscillations), multi-stable dynamics with various topologies, and homoclinic bifurcations. At the same time, we were unable to train GRU networks to produce continuous attractors, which are hypothesized to exist in biological neural networks. We contextualize the usefulness of the different kinds of observed dynamics and support our claims experimentally.</p></abstract>
<kwd-group>
<kwd>recurrent neural network</kwd>
<kwd>dynamical systems</kwd>
<kwd>continuous time</kwd>
<kwd>bifurcations</kwd>
<kwd>time-series</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Institutes of Health<named-content content-type="fundref-id">10.13039/100000002</named-content></contract-sponsor>
<counts>
<fig-count count="12"/>
<table-count count="0"/>
<equation-count count="10"/>
<ref-count count="49"/>
<page-count count="14"/>
<word-count count="8353"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Recurrent neural networks (RNNs) can capture and utilize sequential structure in natural and artificial languages, speech, video, and various other forms of time series. The recurrent information flow within an RNN implies that data seen in the past influence the current state of the RNN, forming a memory mechanism through (nonlinear) temporal traces that encode both <italic>what</italic> and <italic>when</italic>. Past works have used RNNs to study neural population dynamics (Costa et al., <xref ref-type="bibr" rid="B10">2017</xref>), and have demonstrated qualitatively similar dynamics between biological neural networks and artificial networks trained under analogous conditions (Mante et al., <xref ref-type="bibr" rid="B35">2013</xref>; Sussillo et al., <xref ref-type="bibr" rid="B43">2015</xref>; Cueva et al., <xref ref-type="bibr" rid="B11">2020</xref>). In turn, this raises the question of the efficacy of using such networks as a means to study brain function. However, training standard vanilla RNNs to capture long-range dependencies within a sequence is challenging due to the vanishing gradient problem (Hochreiter, <xref ref-type="bibr" rid="B22">1991</xref>; Bengio et al., <xref ref-type="bibr" rid="B4">1994</xref>). Several special RNN architectures have been proposed to mitigate this issue, notably the long short-term memory (LSTM) unit (Hochreiter and Schmidhuber, <xref ref-type="bibr" rid="B23">1997</xref>), which explicitly guards the information stored in the hidden state against unwanted corruption until it is meant to be overwritten. 
Recently, a simplification of the LSTM called the <italic>gated recurrent unit</italic> (GRU) (Cho et al., <xref ref-type="bibr" rid="B7">2014</xref>) has become popular in the computational neuroscience and machine learning communities thanks to its performance in speech (Prabhavalkar et al., <xref ref-type="bibr" rid="B40">2017</xref>), music (Choi et al., <xref ref-type="bibr" rid="B8">2017</xref>), and video (Dwibedi et al., <xref ref-type="bibr" rid="B13">2018</xref>) processing, as well as in extracting nonlinear dynamics underlying neural data (Pandarinath et al., <xref ref-type="bibr" rid="B38">2018</xref>). However, certain mechanistic tasks, notably unbounded counting, come easily to LSTM networks but not to GRU networks (Weiss et al., <xref ref-type="bibr" rid="B45">2018</xref>).</p>
<p>Despite these empirical findings, we lack a systematic understanding of the internal time evolution of the GRU&#x00027;s memory structure and of its capability to represent nonlinear temporal dynamics. Such an understanding would make clear which specific tasks (natural and artificial) can or cannot be performed (Bengio et al., <xref ref-type="bibr" rid="B4">1994</xref>), how computation is implemented (Beer, <xref ref-type="bibr" rid="B2">2006</xref>; Sussillo and Barak, <xref ref-type="bibr" rid="B42">2012</xref>), and help to predict qualitative behavior (Beer, <xref ref-type="bibr" rid="B1">1995</xref>; Zhao and Park, <xref ref-type="bibr" rid="B48">2016</xref>). In addition, a great deal of the literature discusses the local dynamics (equilibrium points) of RNNs (Bengio et al., <xref ref-type="bibr" rid="B4">1994</xref>; Sussillo and Barak, <xref ref-type="bibr" rid="B42">2012</xref>), but a complete theory requires an understanding of the global properties as well (Beer, <xref ref-type="bibr" rid="B1">1995</xref>). Furthermore, a deterministic understanding of a GRU network&#x00027;s topological structure would provide fundamental insight into a trained network&#x00027;s generalization ability, and therefore help in understanding how to seed RNNs for specific tasks (Doya, <xref ref-type="bibr" rid="B12">1993</xref>; Sok&#x000F3;&#x00142; et al., <xref ref-type="bibr" rid="B41">2019</xref>).</p>
<p>In general, the hidden state dynamics of an RNN can be written as <bold>h</bold><sub><italic>t</italic>&#x0002B;1</sub> &#x0003D; <italic>f</italic>(<bold>h</bold><sub><italic>t</italic></sub>, <bold>x</bold><sub><italic>t</italic></sub>) where <bold>x</bold><sub><italic>t</italic></sub> is the current input in a sequence indexed by <italic>t</italic>, <italic>f</italic> is a nonlinear function, and <bold>h</bold><sub><italic>t</italic></sub> represents the hidden memory state that carries all information responsible for future output. In the absence of input, <bold>h</bold><sub><italic>t</italic></sub> evolves over time on its own:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>f</italic>(&#x000B7;) :&#x0003D; <italic>f</italic>(&#x000B7;, <bold>0</bold>) for notational simplicity. In other words, we can consider the temporal evolution of the memory stored within an RNN as a trajectory of the autonomous dynamical system defined by Equation (1), and use dynamical systems theory to investigate and classify the temporal features obtainable in an RNN. In this paper, we intend to provide a deep intuition of the inner workings of the GRU through a continuous time analysis. While RNNs are traditionally implemented in discrete time, we show in the next section that this form of the GRU can be interpreted as a numerical approximation of an underlying system of ordinary differential equations. Historically, discrete time systems have often been more challenging to analyze than their continuous time counterparts, primarily due to their more <italic>jumpy</italic> nature, which allows for more complex dynamics in low dimensions (Pasemann, <xref ref-type="bibr" rid="B39">1997</xref>; Laurent and von Brecht, <xref ref-type="bibr" rid="B30">2017</xref>). Given the relatively continuous nature of many abstract and physical systems, it may therefore be of great use in some contexts to analyze the underlying continuous time system of a trained RNN directly, interpreting the added dynamical complexity from the discretization as artifacts of the numerical scheme (LeVeque and Leveque, <xref ref-type="bibr" rid="B31">1992</xref>; Thomas, <xref ref-type="bibr" rid="B44">1995</xref>; He et al., <xref ref-type="bibr" rid="B19">2016</xref>; Heath, <xref ref-type="bibr" rid="B20">2018</xref>). 
Furthermore, the recent development of <italic>Neural Ordinary Differential Equations</italic> has catalyzed the computational neuroscience and machine learning communities to turn much of their attention to continuous-time implementations of neural networks (Chen et al., <xref ref-type="bibr" rid="B6">2018</xref>; Morrill et al., <xref ref-type="bibr" rid="B37">2021</xref>).</p>
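<p>To ground the autonomous viewpoint of Equation (1), the following minimal sketch iterates a hidden-state map with the input held at zero and records the resulting trajectory. The particular map used here (a small contracting tanh network) is an illustrative assumption, not the GRU itself, which is introduced in the next section.</p>

```python
import numpy as np

def trajectory(f, h0, n_steps):
    """Iterate the autonomous map h_{t+1} = f(h_t) of Equation (1)
    and return the visited states, including the initial one."""
    hs = [np.asarray(h0, dtype=float)]
    for _ in range(n_steps):
        hs.append(f(hs[-1]))
    return np.stack(hs)

# Stand-in map for illustration: a contracting vanilla RNN with zero input.
# Its recurrent matrix has spectral norm < 1, so every trajectory decays
# toward the single stable fixed point at the origin.
U = np.array([[0.5, -0.3],
              [0.3,  0.5]])
traj = trajectory(lambda h: np.tanh(U @ h), [2.0, -2.0], 100)
```

<p>Plotting such trajectories (or, in higher dimensions, their projections) is the basic tool used throughout this paper to visualize the memory dynamics of a trained network.</p>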
<p>We discuss a vast array of observed local and global dynamical structures, and validate the theory by training GRUs to predict time series with prescribed dynamics. As to not compromise the presentation, we restrict our analysis to low dimensions for easy visualization (Beer, <xref ref-type="bibr" rid="B1">1995</xref>; Zhao and Park, <xref ref-type="bibr" rid="B48">2016</xref>). However, given a trained GRU of any finite dimension, the findings here still apply, and can be applied with further analysis on a case by case basis (more information on this in the discussion). Furthermore, to ensure our work is accessible we will assume a pedagogical approach in our delivery. We recommend Meiss (Meiss, <xref ref-type="bibr" rid="B36">2007</xref>) for more background on the subject.</p>
</sec>
<sec id="s2">
<title>2. Underlying Continuous Time System of Gated Recurrent Units</title>
<p>The GRU uses two internal gating variables: the <italic>update gate</italic> <bold>z</bold><sub><italic>t</italic></sub> which protects the <italic>d</italic>-dimensional hidden state <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and the <italic>reset gate</italic> <bold>r</bold><sub><italic>t</italic></sub> which allows overwriting of the hidden state and controls the interaction with the input <inline-formula><mml:math id="M3"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>W</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x02003;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>W</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x02003;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02299;</mml:mo><mml:mo class="qopname">tanh</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>W</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02299;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02299;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mtext>&#x02003;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>W</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>W</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>W</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M8"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are the parameter matrices, <inline-formula><mml:math id="M9"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are bias vectors, &#x02299; represents element-wise multiplication, and &#x003C3;(<bold>z</bold>) &#x0003D; 1/(1&#x0002B;<italic>e</italic><sup>&#x02212;<bold>z</bold></sup>) is the element-wise logistic sigmoid function. Note that the hidden state is asymptotically contained within [&#x02212;1, 1]<sup><italic>d</italic></sup> due to the saturating nonlinearities, implying that if the state is initialized outside of this trapping region, it must eventually enter it in finite time and remain in it for all later time.</p>
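<p>Equations (2)&#x02013;(4) translate directly into code. The sketch below is a plain NumPy rendering of a single GRU update; variable names follow the equations, and the parameter values of any real network would of course come from training.</p>

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x, Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh):
    """One discrete GRU update, following Equations (2)-(4)."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)              # update gate, Eq. (2)
    r = sigmoid(Wr @ x + Ur @ h_prev + br)              # reset gate,  Eq. (3)
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1.0 - z) * h_cand + z * h_prev              # convex mix,  Eq. (4)
```

<p>Iterating <monospace>gru_step</monospace> with <monospace>x = 0</monospace> realizes the autonomous system of Equation (1); consistent with the trapping-region remark above, a state initialized outside [&#x02212;1, 1]<sup><italic>d</italic></sup> is drawn into it.</p>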
<p>Note that the update gate <bold>z</bold><sub><italic>t</italic></sub> controls how fast each dimension of the hidden state decays, providing an adaptive time constant for memory. Specifically, as <inline-formula><mml:math id="M10"><mml:msub><mml:mrow><mml:mo class="qopname">lim</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02192;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, GRUs can implement perfect memory of the past and ignore <bold>x</bold><sub><italic>t</italic></sub>. Hence, a <italic>d</italic>-dimensional GRU is capable of keeping a near constant memory through the update gate&#x02014;near constant since 0 &#x0003C; [<bold>z</bold><sub><italic>t</italic></sub>]<sub><italic>j</italic></sub> &#x0003C; 1, where [&#x000B7;]<sub><italic>j</italic></sub> denotes the <italic>j</italic>-th component of a vector. Moreover, the autoregressive weights (mainly <bold>U</bold><sub><italic>h</italic></sub> and <bold>U</bold><sub><italic>r</italic></sub>) can support time evolving memory (Laurent and von Brecht, <xref ref-type="bibr" rid="B30">2017</xref> considered this a hindrance and proposed removing all complex dynamical behavior in a simplified GRU).</p>
<p>To investigate the memory structure further, let us consider the dynamics of the hidden state in the absence of input, i.e., <bold>x</bold><sub><italic>t</italic></sub> &#x0003D; 0, &#x02200;<italic>t</italic>, which is of the form of Equation (1). From a dynamical system&#x00027;s point of view, all inputs to the system can be understood as perturbations to the autonomous system, and therefore have no effect on the set of achievable dynamics. To utilize the rich descriptive language of continuous time dynamical systems theory, we recognize the autonomous GRU-RNN as a weighted forward Euler discretization of the following continuous time dynamical system:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x02003;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(6)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x02003;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E7"><label>(7)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mo>&#x02219;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02299;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo class="qopname">tanh</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02299;</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x02003;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M14"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mo>&#x02219;</mml:mo></mml:mover><mml:mo>&#x02261;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>d</mml:mtext><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mtext>d</mml:mtext><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula>. Since both &#x003C3;(&#x000B7;) and tanh(&#x000B7;) are smooth, this continuous limit is well-defined and serves as the basis for further analysis; every discrete GRU network can be viewed as a numerical approximation of it. In the following, GRU will refer to the continuous time version (Equation 7). Note that the update gate <bold>z</bold>(<italic>t</italic>) again plays the role of a state-dependent time constant for memory decay. We note, however, that <bold>z</bold>(<italic>t</italic>) adjusts the flow speed point-wise, resulting in a non-constant, nonlinear slowing of all trajectories, since <bold>z</bold>(<italic>t</italic>) &#x02208; (0, 1). Because 1 &#x02212; <bold>z</bold>(<italic>t</italic>) &#x0003E; 0 and thus cannot change sign, removing this leading multiplicative term yields a system whose flow is topologically equivalent to that of Equation (7). It therefore does not change the topological structure of the dynamics (Kuznetsov, <xref ref-type="bibr" rid="B29">1998</xref>), and we can safely ignore the effects of <bold>z</bold>(<italic>t</italic>) in the theoretical analysis that follows (sections 3, 4). In these sections we set <bold>U</bold><sub><italic>z</italic></sub> &#x0003D; 0 and <bold>b</bold><sub><italic>z</italic></sub> &#x0003D; 0. A derivation of the continuous time GRU can be found in section 1 of the <xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref>. 
Further detail on the effects of <bold>z</bold>(<italic>t</italic>) is given in the final section of this paper.</p>
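<p>As a sanity check on this continuous time view, the sketch below integrates Equations (5)&#x02013;(7) with a plain forward Euler scheme. With step size dt &#x0003D; 1, a single Euler step reproduces the discrete update of Equation (4) exactly (for <bold>x</bold> &#x0003D; 0), which is the sense in which the discrete GRU is a weighted forward Euler discretization. Parameter values here are random placeholders, not those of a trained network.</p>

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_vector_field(h, Uz, Ur, Uh, bz, br, bh):
    """Right-hand side of Equations (5)-(7) for the autonomous case x = 0."""
    z = sigmoid(Uz @ h + bz)   # update gate, Eq. (5)
    r = sigmoid(Ur @ h + br)   # reset gate,  Eq. (6)
    return (1.0 - z) * (np.tanh(Uh @ (r * h) + bh) - h)   # Eq. (7)

def euler_integrate(h0, params, dt=0.01, n_steps=5000):
    """Forward Euler integration; dt = 1 recovers the discrete GRU (x = 0)."""
    h = np.asarray(h0, dtype=float)
    for _ in range(n_steps):
        h = h + dt * gru_vector_field(h, *params)
    return h
```

<p>Shrinking dt suppresses the extra dynamical complexity that the unit step size can introduce, so trajectories of <monospace>euler_integrate</monospace> with small dt approximate the continuous flow analyzed in the following sections.</p>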
</sec>
<sec id="s3">
<title>3. Stability Analysis of a One Dimensional GRU</title>
<p>For a 1D GRU<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> (<italic>d</italic> &#x0003D; 1), Equation (7) reduces to a one dimensional dynamical system in which every variable is a scalar. The expressive power of a 1D GRU is quite limited, as only three stability structures (topologies) exist (see section 2 in the <xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref>): (<xref ref-type="fig" rid="F1">Figure 1A</xref>) a single stable node, (<xref ref-type="fig" rid="F1">Figure 1B</xref>) a stable node and a half-stable node, and (<xref ref-type="fig" rid="F1">Figure 1C</xref>) two stable nodes separated by an unstable node (see <xref ref-type="fig" rid="F1">Figure 1</xref>). The corresponding time evolutions of the hidden state are to (A) decay to a fixed value; (B) decay to a fixed value but, when approaching from one direction, halt at an intermediate value until perturbed; or (C) decay to one of two fixed values (bistability). The bistability can be used to model a switch, such as in the context of simple decision making, where inputs perturb the system back and forth between the two states.</p>
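<p>The three topologies above can be recovered numerically. The sketch below locates the fixed points of the scalar flow of Equation (7) (with the positive (1 &#x02212; <italic>z</italic>) prefactor dropped, as justified in the previous section) by scanning for sign changes of &#x01E23; and classifying each zero by the local slope. The specific parameter values are illustrative choices, not taken from a trained network.</p>

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def h_dot(h, u_h, u_r, b_r, b_h):
    """Scalar flow of Equation (7); the positive (1 - z) factor is dropped,
    since it moves neither the fixed points nor their stability."""
    r = sigmoid(u_r * h + b_r)
    return np.tanh(u_h * r * h + b_h) - h

def fixed_points(params, lo=-1.5, hi=1.5, n=3000):
    """Find zeros of h_dot via grid sign changes, refine by bisection,
    and classify stability from the sign of the local slope."""
    hs = np.linspace(lo, hi, n)
    vals = h_dot(hs, *params)
    found = []
    for i in np.flatnonzero(np.sign(vals[:-1]) * np.sign(vals[1:]) < 0):
        a, b = hs[i], hs[i + 1]
        for _ in range(60):                      # bisection refinement
            m = 0.5 * (a + b)
            if h_dot(a, *params) * h_dot(m, *params) <= 0:
                b = m
            else:
                a = m
        h_star = 0.5 * (a + b)
        eps = 1e-4                               # central-difference slope
        slope = (h_dot(h_star + eps, *params) - h_dot(h_star - eps, *params)) / (2 * eps)
        found.append((h_star, "stable" if slope < 0 else "unstable"))
    return found
```

<p>With the reset gate held open (<italic>u</italic><sub><italic>r</italic></sub> &#x0003D; 0, <italic>b</italic><sub><italic>r</italic></sub> &#x0003D; 10, so <italic>r</italic> &#x02248; 1) and <italic>u</italic><sub><italic>h</italic></sub> &#x0003D; 2, the flow is bistable as in Figure 1C: two stable nodes flank an unstable node at the origin. Reducing <italic>u</italic><sub><italic>h</italic></sub> below 1 collapses this to the monostable case of Figure 1A.</p>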
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Three possible types of one dimensional flow for a 1D GRU; <bold>(A)</bold> monostability, <bold>(B)</bold> half-stability, <bold>(C)</bold> bistability. When &#x01E23; &#x0003E; 0, <italic>h</italic>(<italic>t</italic>) increases. This flow is indicated by a rightward arrow. Nodes ({<italic>h</italic>&#x02223;&#x01E23;(<italic>h</italic>) &#x0003D; 0}) are represented as circles and classified by their stability (Meiss, <xref ref-type="bibr" rid="B36">2007</xref>).</p></caption>
<graphic xlink:href="fncom-15-678158-g0001.tif"/>
</fig>
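<p>The three 1D topologies can be recovered numerically by locating sign changes of the scalar flow. A minimal sketch, with the update gate neutralized (<italic>U</italic><sub><italic>z</italic></sub> &#x0003D; <italic>b</italic><sub><italic>z</italic></sub> &#x0003D; 0, so 1 &#x02212; <italic>z</italic> &#x0003D; 1/2) and illustrative parameter values, not ones taken from the paper:</p>

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def hdot(h, uh, ur, br, bh):
    # Scalar GRU flow with the update gate neutralized (u_z = b_z = 0 => 1 - z = 1/2)
    r = sigmoid(ur * h + br)
    return 0.5 * (np.tanh(uh * r * h + bh) - h)

def find_nodes(params, lo=-1.5, hi=1.5, n=20000):
    """Locate nodes {h | hdot(h) = 0} by sign changes on a grid and
    classify each one by the sign of the local slope."""
    hs = np.linspace(lo, hi, n)
    f = hdot(hs, *params)
    nodes = []
    for i in np.nonzero(np.sign(f[:-1]) * np.sign(f[1:]) < 0)[0]:
        h0 = 0.5 * (hs[i] + hs[i + 1])
        slope = (hdot(h0 + 1e-5, *params) - hdot(h0 - 1e-5, *params)) / 2e-5
        nodes.append((h0, 'stable' if slope < 0 else 'unstable'))
    return nodes

# Illustrative bistable regime (Figure 1C): u_h = 8 with u_r = b_r = b_h = 0
# gives two stable nodes near h = +/-1 separated by an unstable node at 0.
nodes = find_nodes((8.0, 0.0, 0.0, 0.0))
```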
<p>The topology the GRU takes on is determined by its parameters. If the GRU begins in a region of the parameter space corresponding to (A), we can smoothly vary the parameters to traverse (B) in the parameter space and end up at (C). This is commonly known as a saddle-node bifurcation. Generally speaking, a bifurcation is a change in the topology of a dynamical system resulting from a smooth change in its parameters. The point in parameter space at which the bifurcation occurs is called the bifurcation point (e.g., <xref ref-type="fig" rid="F1">Figure 1B</xref>), and we will refer to the fixed point that changes its stability at the bifurcation point as the <italic>bifurcation fixed point</italic> (e.g., the half-stable fixed point in <xref ref-type="fig" rid="F1">Figure 1B</xref>). The codimension of a bifurcation is the number of parameters which must vary in order to remain on the bifurcation manifold. In the case of our example, a saddle-node bifurcation is codimension-1 (Kuznetsov, <xref ref-type="bibr" rid="B29">1998</xref>). Just before the transition from (A) to (B), the flow near the location where the half-stable node would appear can be arbitrarily slow. We will refer to such regions as <italic>slow points</italic> (Sussillo and Barak, <xref ref-type="bibr" rid="B42">2012</xref>). In this context, slow points allow for metastable states, in which a trajectory flows toward the slow point, lingers there for a period of time, and then moves on to the stable fixed point.</p>
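<p>The saddle-node bifurcation can be traced by sweeping a single parameter. In the sketch below (illustrative values, update gate neutralized and reset gate fixed at 1/2), increasing the bias b_h carries the scalar GRU from bistability to monostability; for this illustrative gain, our own back-of-envelope tangency condition puts the bifurcation near b_h &#x02248; 2.15.</p>

```python
import numpy as np

def n_fixed_points(bh, uh=8.0):
    """Count zeros of the 1D GRU flow 0.5 * (tanh(0.5 * uh * h + bh) - h)
    by sign changes on a fine grid (reset gate fixed at r = 1/2)."""
    h = np.linspace(-1.5, 1.5, 20000)
    f = 0.5 * (np.tanh(0.5 * uh * h + bh) - h)
    return int(np.sum(np.sign(f[:-1]) * np.sign(f[1:]) < 0))

# Smoothly varying b_h carries the system through a saddle-node bifurcation:
# three fixed points (bistable) collapse to a single stable node (monostable).
counts = [n_fixed_points(bh) for bh in (0.0, 1.0, 3.0)]
```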
</sec>
<sec id="s4">
<title>4. Analysis of a Two Dimensional GRU</title>
<p>We will see that the addition of a second GRU opens up a substantial variety of possible topological structures. For notational simplicity, we denote the two dimensions of <bold>h</bold> as <italic>x</italic> and <italic>y</italic>. We visualize the flow fields defined by Equation (7) in two dimensions as <italic>phase portraits</italic>, which reveal the topological structures of interest (Meiss, <xref ref-type="bibr" rid="B36">2007</xref>). As a starting point, the phase portrait of two independent bistable GRUs can be visualized as in <xref ref-type="fig" rid="F2">Figure 2A</xref>. It clearly shows 4 stable states as expected, with a total of 9 fixed points. This could be thought of as a continuous-time, continuous-space implementation of a finite state machine with 4 states (<xref ref-type="fig" rid="F2">Figure 2B</xref>). The 3 types of observed fixed points&#x02014;stable (sinks), unstable (sources), and saddle points&#x02014;exhibit locally linear dynamics; however, the global geometry is nonlinear, and their topological structures can vary depending on their arrangement.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Illustrative example of two independent bistable GRUs. <bold>(A)</bold> Phase portrait. The flow field <inline-formula><mml:math id="M15"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mo>&#x02219;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>&#x01E8B;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x01E8F;</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> is decomposed into direction (black arrows) and speed (color). Purple lines represent trajectories of the hidden state which converge to one of the four stable fixed points. Note the four quadrants coincide with the basin of attraction for each of the stable nodes. The fixed points appear when the x- and y-nullclines intersect. <bold>(B)</bold> The four stable nodes of this system can be interpreted as a continuous analog of 4-discrete states with input-driven transitions.</p></caption>
<graphic xlink:href="fncom-15-678158-g0002.tif"/>
</fig>
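<p>The four-state structure of Figure 2 can be reproduced numerically. A sketch, assuming two uncoupled bistable units (diagonal U_h with an illustrative gain, zero biases, both gates fixed at 1/2): a batch of random initial states partitions into the four basins of attraction.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Two uncoupled bistable units: diagonal U_h; z = r = 1/2 with zero gate parameters
Uh = np.diag([8.0, 8.0])

H = rng.uniform(-1, 1, size=(100, 2))  # batch of random initial hidden states
for _ in range(2000):
    H = H + 0.05 * 0.5 * (np.tanh((0.5 * H) @ Uh.T) - H)

# Each trajectory settles into one of the four corners (+/-h*, +/-h*)
attractors = {tuple(p) for p in np.round(H, 1)}
```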
<p>We explored stability structures attainable by 2D GRUs. Due to the relatively large number of observed topologies, this section&#x00027;s main focus will be on demonstrating all observed local and global dynamical features obtainable by 2D GRUs. A catalog of all known topologies can be found in section 3 of the <xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref>, along with the parameters of every phase portrait depicted in this paper. We cannot say whether or not this catalog is exhaustive, but the sheer number of structures found is a testament to the expressive power of the GRU network, even in low dimensions.</p>
<p>Before proceeding, let us describe all the local dynamical features observed. In addition to the previously mentioned three types of fixed points, 2D GRUs can exhibit a variety of bifurcation fixed points, resulting from regions of parameter space that separate all topologies restricted to simple fixed points (i.e., stable, unstable, and saddle points). Behaviorally speaking, these fixed points act as hybrids of the previous three, resulting in a much richer set of obtainable dynamics. In <xref ref-type="fig" rid="F3">Figure 3</xref>, we show all observed types of fixed points<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>. While no codimension-2 bifurcation fixed points were observed in the 2D GRU system, a sort of <italic>pseudo-codimension-2</italic> bifurcation fixed point was seen by placing a sink, a source, and two saddle points sufficiently close together such that, when implemented, the separation between all four points remains below machine precision, thereby acting as a single fixed point. <xref ref-type="fig" rid="F4">Figure 4</xref> further demonstrates this concept, and <xref ref-type="fig" rid="F3">Figure 3B</xref> depicts an example. We will discuss later that this sort of pseudo-bifurcation point allows the system to exhibit <italic>homoclinic-like</italic> behavior on a two dimensional compact set. In <xref ref-type="fig" rid="F3">Figure 3A</xref>, we see 11 fixed points, the maximum number of fixed points observed in a 2D GRU system. A closer look at this system reveals one interpretation as a continuous analog of 5-discrete states with input-driven transitions, similar to that depicted in <xref ref-type="fig" rid="F2">Figure 2</xref>. This imposes a possible upper bound on the network&#x00027;s capacity to encode a finite set of states in this manner.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Existence of all observed simple fixed points and bifurcation fixed points with 2D GRUs, depicted in phase space. Orange and pink lines represent the x and y nullclines, respectively. Purple lines indicate various trajectories of the hidden state. Direction of the flow is determined by the black arrows, where the colormap underlaying the figure depicts the magnitude of the velocity of the flow in log scale. <bold>(A)</bold> maximum number of stable states achieved using a 2D GRU, <bold>(B)</bold> demonstration of a pseudo co-dimension 2 bifurcation fixed point and stable saddle-node bifurcation fixed points, <bold>(C)</bold> demonstration of an unstable saddle-node bifurcation fixed point.</p></caption>
<graphic xlink:href="fncom-15-678158-g0003.tif"/>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>A cartoon representation of the observed <italic>pseudo-codimension-2</italic> bifurcation fixed point. This structure occurs in implementation when placing a sink (top right), a source (bottom left), and two saddle points (top left and bottom right) close enough together, such that the distance between the two points furthest away from one another <italic>d</italic> is below machine precision &#x003F5;. Under such conditions, the local dynamics behave as a hybridization of all four points. Since at least two parameters need to be adjusted in order to achieve this behavior, we give it the label of <italic>pseudo-codimension-2</italic>; <italic>pseudo</italic> because <italic>d</italic> can never equal 0 in this system.</p></caption>
<graphic xlink:href="fncom-15-678158-g0004.tif"/>
</fig>
<p>The addition of bifurcation fixed points opens the door to dynamically realize more sophisticated models. Take for example the four state system depicted in <xref ref-type="fig" rid="F3">Figure 3B</xref>. If the hidden state is set to initialize in the first quadrant of phase space [i.e., (0, &#x0221E;)<sup>2</sup>], the trajectory will flow toward the pseudo-codimension-2 bifurcation fixed point at the origin. Introducing noise through the input will stochastically cause the trajectory to approach the stable fixed point at (&#x02212;1, &#x02212;1) either directly, or by first flowing into one of the two saddle-node bifurcation fixed points of the first kind. Models of this sort can be used in a variety of applications, such as perceptual decision making (Wong and Wang, <xref ref-type="bibr" rid="B47">2006</xref>; Churchland and Cunningham, <xref ref-type="bibr" rid="B9">2014</xref>).</p>
<p>We will begin our investigation into the non-local dynamics observed in 2D GRUs by showing the existence of an Andronov-Hopf bifurcation, where a stable fixed point bifurcates into an unstable fixed point surrounded by a limit cycle. A limit cycle is an attracting set with a well defined basin of attraction. However, unlike a stable fixed point, where trajectories initialized in the basin of attraction flow toward a single point, a limit cycle pulls trajectories into a stable periodic orbit. If the periodic orbit surrounds an unstable fixed point the attractor is <italic>self-exciting</italic>, otherwise it is a <italic>hidden attractor</italic> (Meiss, <xref ref-type="bibr" rid="B36">2007</xref>). While hidden attractors have been observed in various 2D systems, they have not been found in the 2D GRU system, and we conjecture that they do not exist. If all parameters are set to zero except for the hidden state weights, which are parameterized as a rotation matrix with an associated gain, we can introduce rotation into the vector field as a function of the gain and rotation angle. Properly tuning these parameters gives rise to a limit cycle: the saturating nonlinearity impedes the rotating flow sufficiently far from the origin, thereby pulling trajectories toward a closed orbit.</p>
<p>For &#x003B1;, &#x003B2; &#x02208; &#x0211D;<sup>&#x0002B;</sup> and <italic>s</italic> &#x02208; &#x0211D;,</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mo class="qopname">cos</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo>-</mml:mo><mml:mo class="qopname">sin</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo class="qopname">sin</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo class="qopname">cos</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>s</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Let &#x003B2; &#x0003D; 3 and <italic>s</italic> &#x0003D; 0. If <inline-formula><mml:math id="M17"><mml:mi>&#x003B1;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula>, the system has a single stable fixed point (stable spiral), as depicted in <xref ref-type="fig" rid="F5">Figure 5A</xref>. If we continuously decrease &#x003B1;, the system undergoes an Andronov-Hopf bifurcation at approximately <inline-formula><mml:math id="M18"><mml:mi>&#x003B1;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>.</mml:mo><mml:mn>8</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula>. As &#x003B1; continues to decrease, the orbital period increases, and since the nullclines can be made arbitrarily close together, the orbital period can be made arbitrarily long. <xref ref-type="fig" rid="F5">Figure 5B</xref> shows an example of a relatively short orbital period, and <xref ref-type="fig" rid="F5">Figure 5C</xref> depicts the behavior seen for slower orbits. If we allow &#x003B1; to decrease further, the system undergoes four simultaneous saddle-node bifurcations and ends up in a state topologically equivalent to that depicted in <xref ref-type="fig" rid="F2">Figure 2A</xref>. <xref ref-type="fig" rid="F6">Figure 6A</xref> depicts regions of the parameter space of Equation (7) parameterized by Equation (8), where the Andronov-Hopf bifurcation manifolds can be clearly seen. <xref ref-type="fig" rid="F6">Figure 6B</xref> demonstrates one effect the reset gate can have on the frequency of the oscillations. If we alter the bias vector <italic>b</italic><sub><italic>r</italic></sub>, the expected oscillation period changes for regions of the &#x003B1; &#x02212; &#x003B2; parameter space which exhibit a limit cycle. 
Computationally speaking, limit cycles are a common dynamical structure for modeling neuron bursting (Izhikevich, <xref ref-type="bibr" rid="B25">2007</xref>), taking place in many foundational works including the Hodgkin-Huxley model (Hodgkin and Huxley, <xref ref-type="bibr" rid="B24">1952</xref>) and the FitzHugh-Nagumo Model (FitzHugh, <xref ref-type="bibr" rid="B14">1961</xref>). Such dynamics also arise in various population level dynamics in artificial tasks, such as sine wave generation (Sussillo and Barak, <xref ref-type="bibr" rid="B42">2012</xref>). Furthermore, initializing the hidden state matrix <italic>U</italic><sub><italic>h</italic></sub> of an even dimensional continuous-time RNN (tanh or GRU) with 2 &#x000D7; 2 blocks along the diagonal and zeros everywhere else is theoretically shown to aid in learning long-term dependencies, when all the blocks act as decoupled oscillators (Sok&#x000F3;&#x00142; et al., <xref ref-type="bibr" rid="B41">2019</xref>).</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Two GRUs exhibit an Andronov-Hopf bifurcation, where the parameters are defined by Equation (8). When <inline-formula><mml:math id="M19"><mml:mi>&#x003B1;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula> the system exhibits a single stable fixed point at the origin <bold>(A)</bold>. If &#x003B1; decreases continuously, a limit cycle emerges around the fixed point, and the fixed point changes stability <bold>(B)</bold>. Allowing &#x003B1; to decrease further increases the size and orbital period of the limit cycle <bold>(C)</bold>. The bottom row represents the hidden state as a function of time, for a single trajectory (denoted by black trajectories in each corresponding phase portrait).</p></caption>
<graphic xlink:href="fncom-15-678158-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>(A)</bold> parameter sweep of Equation (8) over &#x003B1; &#x02208; (0, &#x003C0;) (rotation matrix angle) and &#x003B2; &#x02208; (1, 3) (gain term), for <italic>s</italic> &#x0003D; 0. Color map indicates oscillation frequency in Hertz, where white space shows parameter combinations where no limit cycle exists. <bold>(B)</bold> average oscillation frequency across regions of the displayed &#x003B1; &#x02212; &#x003B2; parameter space where a limit cycle exists. The purple shaded region depicts variance of oscillation frequency. Increasing <italic>s</italic> slows down the average frequency of the limit cycles, while simultaneously reducing variance.</p></caption>
<graphic xlink:href="fncom-15-678158-g0006.tif"/>
</fig>
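<p>The rotation parameterization of Equation (8) is straightforward to probe numerically. In the sketch below (forward Euler with an illustrative step size; s = 0 and b_h = 0 as in the text, so both gates sit at 1/2), the origin attracts when &#x003B1; &#x0003D; &#x003C0;/3, while a smaller angle past the Hopf point yields a trajectory that settles onto an orbit of finite radius:</p>

```python
import numpy as np

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def radius_after(alpha, beta=3.0, t=400.0, dt=0.01):
    """Integrate Equation (7) with U_h = beta * R(alpha) and all other
    parameters zero (so z = r = 1/2); return the final radius."""
    Uh = beta * rot(alpha)
    h = np.array([0.01, 0.0])  # small perturbation off the origin
    for _ in range(int(t / dt)):
        h = h + dt * 0.5 * (np.tanh(Uh @ (0.5 * h)) - h)
    return float(np.linalg.norm(h))

r_stable = radius_after(np.pi / 3)  # stable spiral: decays back to the origin
r_cycle = radius_after(0.6)         # past the Hopf point: a limit cycle emerges
```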
<p>Regarding the second non-local dynamical feature, it can be shown that a 2D GRU can undergo a homoclinic bifurcation, where a periodic orbit (in this case a limit cycle) expands and collides with a saddle at the bifurcation point. At this bifurcation point the system exhibits a homoclinic orbit, where trajectories initialized on the orbit fall into the same fixed point in both forward and backward time. In order to demonstrate this behavior, let the parameters of the network be defined as follows:</p>
<p>For &#x003B3; &#x02208; &#x0211D;,</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M20"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>U</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>3</mml:mn><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mo class="qopname">cos</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>20</mml:mn></mml:mrow></mml:mfrac><mml:mo class="qopname">sin</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>20</mml:mn></mml:mrow></mml:mfrac></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>-</mml:mo><mml:mo class="qopname">sin</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>20</mml:mn></mml:mrow></mml:mfrac><mml:mo class="qopname">cos</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>20</mml:mn></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>32</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>&#x003B3;</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Under this parameterization the 2D GRU system exhibits a homoclinic orbit when &#x003B3; &#x0003D; 0.054085 (<xref ref-type="fig" rid="F7">Figure 7</xref>). In order to showcase this bifurcation as well as the previous Andronov-Hopf bifurcation sequentially in action we turn to <xref ref-type="fig" rid="F8">Figure 8</xref>, where the parameters are defined by Equation (9) and &#x003B3; is initialized at 0.051 in <xref ref-type="fig" rid="F8">Figure 8A</xref>.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>A 2D GRU parameterized by Equation (9) expresses a homoclinic orbit when &#x003B3; &#x0003D; 0.054085 (denoted by a black trajectory). Trajectories initialized on the homoclinic orbit will approach the same fixed point in both forward and backward time.</p></caption>
<graphic xlink:href="fncom-15-678158-g0007.tif"/>
</fig>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Two GRUs exhibit an Andronov-Hopf bifurcation followed by a homoclinic bifurcation under the same parameterization. The plots directly under each phase portrait depict the time evolution of the black trajectory for the corresponding system. <bold>(A)</bold> (&#x003B3; &#x0003D; 0.051): the system exhibits a stable fixed point. <bold>(B)</bold> (&#x003B3; &#x0003D; 0.0535): the system has undergone an Andronov-Hopf bifurcation and exhibits a stable limit cycle. <bold>(C)</bold> (&#x003B3; &#x0003D; 0.054085): the limit cycle collides with the saddle point, creating a homoclinic orbit. <bold>(D)</bold> (&#x003B3; &#x0003D; 0.0542): the system has undergone a homoclinic bifurcation and exhibits neither a homoclinic orbit nor a limit cycle.</p></caption>
<graphic xlink:href="fncom-15-678158-g0008.tif"/>
</fig>
<p>In addition to proper homoclinic orbits, we observe that 2D GRUs can exhibit one or two bounded planar regions of homoclinic-like orbits for a given set of parameters, as shown in <xref ref-type="fig" rid="F9">Figures 9A,B</xref>, respectively. Any trajectory initialized in one of these regions will flow into the pseudo-codimension-2 bifurcation fixed point at the origin, regardless of which direction time flows in. Since the pseudo-codimension-2 bifurcation fixed point is technically a cluster of four fixed points, including one source and one sink, as demonstrated in <xref ref-type="fig" rid="F4">Figure 4</xref>, there is strictly speaking no homoclinic loop. However, because these fixed points lie so close together, trajectories repelled away from the source, but within the basin of attraction of the sink, appear homoclinic under finite precision. This behavior enables the accurate depiction of various models, including neuron spiking (Izhikevich, <xref ref-type="bibr" rid="B25">2007</xref>).</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p><bold>(A,B)</bold> Two GRUs exhibit 2D bounded regions of homoclinic-like behavior. <bold>(C,D)</bold> represent the hidden state as a function of time for a single initial condition within the homoclinic-like region(s) of the single and double homoclinic-like region cases, respectively, (denoted by solid black trajectories in each corresponding phase portrait).</p></caption>
<graphic xlink:href="fncom-15-678158-g0009.tif"/>
</fig>
<p>With finite fixed-point topologies and global structures covered, the next logical question to ask is <italic>can 2D GRUs exhibit an infinite number of fixed points?</italic> Such behavior is often desirable in models that require stationary attraction to non-point structures, such as line attractors and ring attractors. Computationally, movement along a line attractor may be interpreted as integration (Mante et al., <xref ref-type="bibr" rid="B35">2013</xref>), and has been shown to be a crucial population level mechanism in various tasks, including sentiment analysis (Maheswaranathan et al., <xref ref-type="bibr" rid="B34">2019b</xref>) and decision making (Mante et al., <xref ref-type="bibr" rid="B35">2013</xref>). In a similar light, movement around a ring attractor may computationally represent either modular integration or arithmetic. One known application of ring attractor dynamics in neuroscience is the representation of heading direction (Kim et al., <xref ref-type="bibr" rid="B27">2017</xref>). While such behavior has yet to be seen in the continuous GRU system, an approximation of a line attractor can be made, as depicted in <xref ref-type="fig" rid="F10">Figure 10</xref>. We will refer to this phenomenon as a <italic>pseudo-line attractor</italic>: the nullclines remain sufficiently close on a small finite interval, allowing for arbitrarily slow flow by means of slow points.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Two GRUs exhibit a pseudo-line attractor. Nullclines intersect at one point, but are close enough on a finite region to mimic an analytic line attractor in practice. <bold>(A,B)</bold> depict the same phase portrait on [&#x02212;1.5, 1.5]<sup>2</sup> and [&#x02212;0.2, 0.2]<sup>2</sup>, respectively.</p></caption>
<graphic xlink:href="fncom-15-678158-g0010.tif"/>
</fig>
</sec>
<sec id="s5">
<title>5. Experiments: Time-Series Prediction</title>
<p>As a means to put our theory into practice, in this section we explore several examples of time series prediction of continuous time planar dynamical systems using 2D GRUs. Results from the previous section indicate what dynamical features can be learned by this RNN, and suggest cases in which training will fail. All of the following computer experiments consist of an RNN in which the hidden layer is made up of a 2D GRU, followed by a linear output layer. The network is trained to make a 29-step prediction from a given initial observation, receiving no further input over the prediction horizon. As such, to produce accurate predictions, the RNN must rely solely on the hidden layer dynamics.</p>
<p>We train the network to minimize the following multi-step loss function:</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M21"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">traj</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo stretchy="false">&#x02225;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>w</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>w</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>w</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msubsup><mml:mrow><mml:mo stretchy="false">&#x02225;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003B8; are the parameters of the GRU and linear readout, <italic>T</italic> &#x0003D; 29 is the prediction horizon, <bold>w</bold><sub><italic>i</italic></sub>(<italic>t</italic>) is the <italic>i</italic>-th time series generated by the true system, and <inline-formula><mml:math id="M22"><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>w</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>w</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is the <italic>k</italic>-step prediction given <bold>w</bold><sub>0</sub>.</p>
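<p>As a sketch, the multi-step loss of Equation (10) amounts to the following, with the predictions and targets stacked into arrays of shape (N_traj, T, dim); the function name is illustrative:</p>

```python
import numpy as np

def multistep_loss(pred, target):
    """Equation (10): squared prediction error summed over trajectories
    and prediction steps, averaged over the horizon T.
    pred, target: arrays of shape (N_traj, T, dim)."""
    T = target.shape[1]
    return float(np.sum((pred - target) ** 2) / T)
```

<p>Minimizing this objective over all 29 prediction steps forces the hidden layer dynamics, rather than the linear readout alone, to track the target system.</p>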
<p>The hidden states are initialized at zero for each trajectory. The RNN is then trained for 4000 epochs, using ADAM (Kingma and Ba, <xref ref-type="bibr" rid="B28">2014</xref>) in whole batch mode to minimize the loss function, i.e., the mean square error between the predicted trajectory and the data. <italic>N</italic><sub>traj</sub> &#x0003D; 667 time series were used for training. <xref ref-type="fig" rid="F11">Figure 11</xref> depicts the experimental results of the RNN&#x00027;s attempt at learning each dynamical system we describe below.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Training 2D GRUs. (top row) Phase portraits of target dynamical systems. Red solid lines represent 1-dimensional attractors. See main text for each system. (middle row) GRU dynamics learned from the corresponding 29-step forecasting tasks. The prediction is an affine transformation of the hidden state. (bottom row) An example time series generated through closed-loop prediction by the trained GRU (denoted by a black trajectory). The GRU fails to learn the ring attractor.</p></caption>
<graphic xlink:href="fncom-15-678158-g0011.tif"/>
</fig>
<sec>
<title>5.1. Limit Cycle</title>
<p>To test if 2D GRUs can learn a limit cycle, we use a simple nonlinear oscillator called the FitzHugh-Nagumo model (FitzHugh, <xref ref-type="bibr" rid="B14">1961</xref>). The FitzHugh-Nagumo model is defined by: <inline-formula><mml:math id="M23"><mml:mi>&#x01E8B;</mml:mi><mml:mo>=</mml:mo><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:mfrac><mml:mo>-</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">ext</mml:mtext></mml:mstyle></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x003C4;</mml:mi><mml:mi>&#x01E8F;</mml:mi><mml:mo>=</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>a</mml:mi><mml:mo>-</mml:mo><mml:mi>b</mml:mi><mml:mi>y</mml:mi></mml:math></inline-formula>, where in this experiment we choose &#x003C4; &#x0003D; 12.5, <italic>a</italic> &#x0003D; 0.7, <italic>b</italic> &#x0003D; 0.8, and <inline-formula><mml:math id="M24"><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">ext</mml:mtext></mml:mstyle></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>7</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>04</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. Under this choice of model parameters, the system will exhibit an unstable fixed point (unstable spiral) surrounded by a limit cycle (<xref ref-type="fig" rid="F11">Figure 11</xref>). As shown in section 4, 2D GRUs are capable of representing this topology. 
The results of this experiment verify this claim (<xref ref-type="fig" rid="F11">Figure 11</xref>), as 2D GRUs can capture topologically equivalent dynamics.</p>
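<p>A minimal simulation of the target FitzHugh-Nagumo system illustrates the oscillation the GRU is trained to reproduce. The sketch below uses forward Euler integration and, for simplicity, replaces the noisy input with its mean, <italic>I</italic><sub>ext</sub> = 0.7; the step size and horizon are our own illustrative choices, not those used for generating training data.</p>

```python
import numpy as np

# Forward Euler simulation of the FitzHugh-Nagumo oscillator.
# Parameters follow the text: tau = 12.5, a = 0.7, b = 0.8; the noisy input
# I_ext ~ N(0.7, 0.04) is replaced by its mean for a deterministic sketch.
tau, a, b, I_ext = 12.5, 0.7, 0.8, 0.7
dt, T = 0.01, 60000  # step size and number of steps (hypothetical choices)

x, y = 0.0, 0.0
xs = np.empty(T)
for t in range(T):
    dx = x - x**3 / 3.0 - y + I_ext      # fast (voltage-like) variable
    dy = (x + a - b * y) / tau           # slow (recovery) variable
    x, y = x + dt * dx, y + dt * dy
    xs[t] = x

# On a limit cycle, the late-time trajectory keeps oscillating with a large,
# bounded amplitude rather than settling onto the unstable fixed point.
late = xs[T // 2:]
```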
</sec>
<sec>
<title>5.2. Line Attractor</title>
<p>As discussed in section 4, 2D GRUs can exhibit a pseudo-line attractor, by which the system mimics an analytic line attractor on a small finite domain. We use the simplest representation of a planar line attractor: &#x01E8B; &#x0003D; &#x02212;<italic>x</italic>, &#x01E8F; &#x0003D; 0. This system exhibits a line attractor along the <italic>y</italic>-axis, at <italic>x</italic> &#x0003D; 0 (<xref ref-type="fig" rid="F11">Figure 11</xref>), and trajectories flow perpendicularly onto the attractor. We added white Gaussian noise <inline-formula><mml:math id="M25"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>1</mml:mn><mml:mi>I</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> to the training data. While the hidden state dynamics of the trained network do not perfectly match those of an analytic line attractor, there exists a small subinterval near each of the fixed points that acts as a pseudo-line attractor (<xref ref-type="fig" rid="F11">Figure 11</xref>). As such, the added affine transformation (linear readout) can scale and reorient this subinterval on a finite domain. Since all attractors in a <italic>d</italic>-dimensional GRU are bound to [&#x02212;1, 1]<sup><italic>d</italic></sup>, no line attractor can extend infinitely in any given direction. This matches well with the GRU&#x00027;s inability to perform unbounded counting, as the continuous analog of such a task would require a trajectory to move along such an attractor.</p>
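<p>The memory property of this target system can be checked in a few lines: trajectories collapse horizontally onto the <italic>y</italic>-axis while their <italic>y</italic>-coordinate is preserved exactly. The noiseless Euler sketch below uses hypothetical step-size and initial-condition choices.</p>

```python
import numpy as np

# The planar line attractor from the text: xdot = -x, ydot = 0.
# Every trajectory flows perpendicularly onto the line x = 0, and its
# y-coordinate is preserved -- this is what makes the line a memory.
def simulate(x0, y0, dt=0.01, steps=2000):
    x, y = x0, y0
    for _ in range(steps):
        x, y = x + dt * (-x), y  # ydot = 0: y never changes
    return x, y

xf, yf = simulate(3.0, -1.25)  # x decays to 0, y stays at -1.25
```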
</sec>
<sec>
<title>5.3. Ring Attractor</title>
<p>For this experiment, we use a dynamical system representing a standard ring attractor of radius one: &#x01E8B; &#x0003D; &#x02212;(<italic>x</italic><sup>2</sup> &#x0002B; <italic>y</italic><sup>2</sup> &#x02212; 1)<italic>x</italic>; &#x01E8F; &#x0003D; &#x02212;(<italic>x</italic><sup>2</sup> &#x0002B; <italic>y</italic><sup>2</sup> &#x02212; 1)<italic>y</italic>. This system exhibits an attracting ring centered around an unstable fixed point. We added white Gaussian noise <inline-formula><mml:math id="M26"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>1</mml:mn><mml:mi>I</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> to the training data.</p>
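<p>In this target system the radial dynamics decouple as &#x1E59; &#x0003D; &#x02212;(<italic>r</italic><sup>2</sup> &#x02212; 1)<italic>r</italic>, so any trajectory off the origin is drawn to the unit circle while its angle stays fixed (there is no rotational component). A noiseless Euler sketch, with hypothetical integration settings, confirms this:</p>

```python
import numpy as np

# Ring attractor target system in Cartesian form:
# xdot = -(x^2 + y^2 - 1) x,  ydot = -(x^2 + y^2 - 1) y.
def simulate(x0, y0, dt=0.005, steps=4000):
    x, y = x0, y0
    for _ in range(steps):
        s = x * x + y * y - 1.0          # signed distance from r^2 = 1
        x, y = x + dt * (-s * x), y + dt * (-s * y)
    return x, y

# From inside and from outside the ring, the radius converges to 1.
r_in = np.hypot(*simulate(0.1, 0.1))
r_out = np.hypot(*simulate(2.0, -1.0))
```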
<p>In our analysis we did not observe 2D GRUs exhibiting this set of dynamics, and the results of this experiment, demonstrated in <xref ref-type="fig" rid="F13">Figure 13</xref>, suggest they cannot. Rather, the hidden state dynamics fall into an observed finite fixed point topology (see case xxix in section 3 of the <xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref>). We robustly see this over multiple initializations, and the quality of approximation improves as the dimensionality of the GRU increases (<xref ref-type="fig" rid="F12">Figure 12</xref>), suggesting that many GRUs are required to obtain a sufficient approximation of this set of dynamics for a practical task (Funahashi and Nakamura, <xref ref-type="bibr" rid="B17">1993</xref>).</p>
<fig id="F12" position="float">
<label>Figure 12</label>
<caption><p>Average learning curves (training loss) for the ring attractor <bold>(A)</bold> and the FitzHugh-Nagumo <bold>(B)</bold> dynamics. Note that performance on the ring attractor improves as the dimensionality of the GRU increases, unlike for the FHN dynamics. Four network sizes (2, 4, 8, and 16 dimensional GRUs) were each trained three times with different initializations, depicted by the lighter colored curves.</p></caption>
<graphic xlink:href="fncom-15-678158-g0012.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="s6">
<title>6. Discussion</title>
<p>Through example and experiment, we have indicated classes of dynamics that are crucial for expressing various known neural computations and that are obtainable with the 2D GRU network. We demonstrated the system&#x00027;s inability to learn continuous attractors, seemingly in any finite dimension, a structure hypothesized to exist in various neural representations. While the GRU network was not originally designed as a neuroscientific model, considerable work has shown high qualitative similarity between the underlying dynamics of neural recordings and artificial RNNs at the population level (Mante et al., <xref ref-type="bibr" rid="B35">2013</xref>; Sussillo et al., <xref ref-type="bibr" rid="B43">2015</xref>). Furthermore, recent research has modified such artificial models to simulate various neurobiological phenomena (Heeger and Mackey, <xref ref-type="bibr" rid="B21">2019</xref>). One recent study demonstrated that trained RNNs of different architectures and nonlinearities express very similar fixed point topologies when successfully trained on the same tasks (Maheswaranathan et al., <xref ref-type="bibr" rid="B33">2019a</xref>), suggesting a possible connection between the dynamics of artificial networks and neural population dynamics. As such, an understanding of the obtainable dynamical features in a GRU network allows one to comment on the efficacy of using such an architecture as an analog of brain dynamics at the population level.</p>
<p>Although this manuscript simplified the problem by considering the 2D GRU, a substantial body of research has interpreted cortical dynamics as low dimensional continuous time dynamical systems (Harvey et al., <xref ref-type="bibr" rid="B18">2012</xref>; Mante et al., <xref ref-type="bibr" rid="B35">2013</xref>; Cueva et al., <xref ref-type="bibr" rid="B11">2020</xref>; MacDowell and Buschman, <xref ref-type="bibr" rid="B32">2020</xref>; Zhao and Park, <xref ref-type="bibr" rid="B49">2020</xref>; Flesch et al., <xref ref-type="bibr" rid="B15">2021</xref>). This is not to say that most standard neuroscience-inspired tasks can be solved with such a low dimensional network. However, demonstrating that common dynamical features in neuroscience can arise in low dimensions aids one&#x00027;s ability to comment on attributes of large networks, such as the sparsity of synaptic connections. For example, spiking models exhibiting sparse connectivity have been shown to perform comparably to fully connected RNNs (Bellec et al., <xref ref-type="bibr" rid="B3">2018</xref>). Additionally, pruning (i.e., removing) a substantial percentage of synaptic connections in a trained RNN often results in little to no drop in the network&#x00027;s performance on the task it was trained on (Frankle and Carbin, <xref ref-type="bibr" rid="B16">2019</xref>). This suggests two more examinable properties of large networks. The first is redundancy: multiple realizations of the dynamical mechanisms needed to enact a computation may exist within the same network. For example, if only one limit cycle is sufficient to accurately perform a desired task, a trained network may exhibit multiple limit cycles, each qualitatively contributing identically to the overall computation. The second is the robustness of each topological structure to synaptic perturbation/pruning: given some dynamical structure, say a limit cycle, how far can we move in parameter space while still maintaining the existence of that structure?</p>
<p>In a related light, the GRU architecture has been used within more complex machine learning setups to interpret the real-time dynamics of neural recordings (Pandarinath et al., <xref ref-type="bibr" rid="B38">2018</xref>; Willett et al., <xref ref-type="bibr" rid="B46">2021</xref>). These tools allow researchers to better understand and study the differences between neural responses, trial to trial. Knowledge of the inner workings and expressive power of GRU networks can only further our understanding of the limitations and optimization of such setups by the same line of reasoning previously stated, thereby helping to advance this class of technologies, aiding the field of neuroscience as a whole.</p>
<p>The RNN architecture most often compared to the GRU is the LSTM, as the GRU was designed as both a modeling and computational simplification of this preexisting discrete time design. For a significant period of time, the LSTM was arguably the most popular discrete time RNN architecture, outperforming contemporary models on many benchmark tasks. However, there is one caveat when comparing the continuous time implementations of the LSTM and the GRU. A one dimensional LSTM (i.e., a single LSTM unit) is a two dimensional dynamical system, as information is stored in both the system&#x00027;s hidden state and cell state (Hochreiter and Schmidhuber, <xref ref-type="bibr" rid="B23">1997</xref>). Under the style of analysis we use to dissect the GRU in this paper, the LSTM is therefore a vastly different class of system. We would expect to see a different and more limited array of dynamics for a single LSTM unit when compared with the 2D GRU; however, we would not consider this a fair comparison.</p>
<p>One attribute of the GRU architecture we chose to disregard in this manuscript is the influence of the update gate <bold>z</bold>(<italic>t</italic>). As stated in section 2, this gate is bound to (0, 1)<sup><italic>d</italic></sup>. Since Equation (7) has only one term containing the update gate, [1 &#x02212; <bold>z</bold>(<italic>t</italic>)], which can be factored out and is always strictly positive, the fixed point topology does not depend on <bold>z</bold>(<italic>t</italic>). The role this gate plays is to adjust the point-wise speed of the flow, and it can therefore give rise to slow manifolds. Because each element of <bold>z</bold>(<italic>t</italic>) can become arbitrarily close to one, regions of phase space where an element of the update gate is sufficiently close to one will experience seemingly no motion in the directions associated with that element. For example, in the 2D GRU system, if the first element of <bold>z</bold>(<italic>t</italic>) is sufficiently close to one, the trajectory will maintain a near fixed value in <italic>x</italic>. These slow points are not actual fixed points; in the autonomous system, trajectories traversing them will eventually overcome this stoppage given sufficient time. However, this adds a complicating factor when analyzing continuous time GRUs implemented in practice. With finite precision arithmetic, the flow speed can dip below machine precision, essentially creating <italic>pseudo-attractors</italic> in these regions. The areas of phase space containing these points will qualitatively behave as attracting sets, though not in traditional dynamical systems terms, making them more difficult to analyze. If needed, we recommend analyzing <bold>z</bold>(<italic>t</italic>) separately, because this term acts independently of the remaining terms in the continuous time system; any slow points found can then be superimposed with the traditional fixed points in phase space. To avoid the effects of finite precision altogether, the system can be realized through a hardware implementation (Jordan and Park, <xref ref-type="bibr" rid="B26">2020</xref>), though proper care must be taken to mitigate analog imperfections.</p>
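<p>The factoring argument above can be made concrete in a small numerical sketch. Assuming an autonomous continuous-time update of the form &#x1E23; &#x0003D; [1 &#x02212; <bold>z</bold>] &#x02299; (tanh-term &#x02212; <bold>h</bold>), the update gate multiplies a gate-independent direction element-wise; the weights, gate values, and sample state below are hypothetical illustration values, not trained parameters.</p>

```python
import numpy as np

# Sketch: the update gate rescales flow speed but cannot move fixed points,
# because (1 - z) > 0 multiplies a gate-independent term element-wise.
U_h = np.array([[1.0, -0.5],
                [0.3,  0.8]])   # hypothetical candidate-state weights
b_h = np.array([0.2, -0.1])     # hypothetical bias
r = np.array([0.9, 0.8])        # fixed reset-gate values for illustration
h = np.array([0.3, -0.5])       # a sample hidden state

# Gate-independent part of the flow; its zeros are the fixed points.
target = np.tanh(U_h @ (r * h) + b_h) - h

# First update-gate element near one: the x-direction is nearly frozen
# (a slow manifold), while the y-direction moves at ordinary speed.
z = np.array([0.999, 0.2])
hdot = (1.0 - z) * target
```

With finite precision, the near-zero first component is exactly the mechanism by which such slow regions can masquerade as pseudo-attractors.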
<p>Unlike the update gate, we demonstrated that the reset gate <bold>r</bold>(<italic>t</italic>) affects the network&#x00027;s fixed point topology, allowing for more complicated classes of dynamics, including homoclinic-like orbits. These effects are best described through the shape of the nullclines; we keep the description qualitative here to help build intuition. In 2D, if every element of the reset gate weight matrix <bold>U</bold><sub><italic>r</italic></sub> and bias <bold>b</bold><sub><italic>r</italic></sub> is zero, the nullclines can form two shapes. The first is a <italic>sigmoid-like</italic> shape (<xref ref-type="fig" rid="F5">Figures 5A</xref>, <xref ref-type="fig" rid="F10">10</xref>, <xref ref-type="fig" rid="F11">11</xref>; inferred limit cycle and line attractor), allowing a nullcline to intersect a line (or hyperplane in higher dimensions) orthogonal to its associated dimension a single time. The second is an <italic>s-like</italic> shape (<xref ref-type="fig" rid="F5">Figures 5B,C</xref>, <xref ref-type="fig" rid="F7">7</xref>, <xref ref-type="fig" rid="F11">11</xref>; limit cycle), allowing it to intersect such a line up to three times. The peak and trough of the s-like shape can also be stretched infinitely (<xref ref-type="fig" rid="F2">Figure 2A</xref>); in this case, two of the three resulting, seemingly disconnected nullcline segments associated with a given dimension can be placed arbitrarily close together (<xref ref-type="fig" rid="F3">Figure 3B</xref>). Varying <bold>r</bold>(<italic>t</italic>) allows the geometry of the nullclines to take on several additional shapes. The first of these additional structures is a <italic>pitchfork-like</italic> shape (<xref ref-type="fig" rid="F3">Figures 3A,C</xref>, <xref ref-type="fig" rid="F9">9</xref>). By disconnecting two of the <italic>prongs</italic> from the pitchfork we get our second structure, simultaneously exhibiting a sigmoid-like shape and a <italic>U-like</italic> shape (<xref ref-type="fig" rid="F3">Figure 3C</xref>). Bending the ends of the &#x0201C;U&#x0201D; at infinity down into &#x0211D;<sup>2</sup> connects them, forming our third structure, an <italic>O-like</italic> shape (<xref ref-type="fig" rid="F13">Figure 13</xref>; inferred ring attractor, orange nullcline). This O-like shape can then also intersect the additional segment of the nullcline, creating one continuous curve (<xref ref-type="fig" rid="F13">Figure 13</xref>; inferred ring attractor, pink nullcline). One consequence of the reset gate is the additional capacity to encode information in the form of stable fixed points. If we neglect <bold>r</bold>(<italic>t</italic>), we can obtain up to four sinks (<xref ref-type="fig" rid="F2">Figure 2A</xref>), as we are limited to the intersections of the nullclines: two sets of three parallel lines. Incorporating <bold>r</bold>(<italic>t</italic>) increases the number of obtainable fixed points (<xref ref-type="fig" rid="F3">Figure 3A</xref>). Refer to section 3 of the <xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref> to see how these nullcline structures lead to a vast array of different fixed point topologies.</p>
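<p>The sigmoid-like versus s-like distinction can be checked numerically in a reduced setting. With the reset gate neglected, and assuming the restriction of an x-nullcline condition to a vertical line takes the hypothetical scalar form tanh(<italic>ux</italic> + <italic>c</italic>) = <italic>x</italic> (where <italic>c</italic> collects the <italic>y</italic>- and bias contributions), counting sign changes of the residual recovers one intersection in the sigmoid-like regime and three in the s-like regime:</p>

```python
import numpy as np

# Count how often f(x) = tanh(u*x + c) - x changes sign on a fine grid,
# i.e., how many times the nullcline crosses a vertical line y = const.
def crossings(u, c):
    grid = np.linspace(-3.0, 3.0, 2000)  # grid spacing avoids exact roots
    f = np.tanh(u * grid + c) - grid
    return int(np.sum(np.sign(f[:-1]) != np.sign(f[1:])))

sigmoid_like = crossings(0.8, 0.0)  # slope <= 1: a single intersection
s_like = crossings(3.0, 0.0)        # slope > 1: up to three intersections
```

The qualitative switch happens where the slope of the tanh term at the crossing exceeds one, which is the same mechanism that lets the s-like nullclines support additional sinks.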
<p>Several interesting extensions to this work immediately come to mind. For one, the extension to a 3D continuous time GRU network opens up the door for the possibility of more complex dynamical features. Three spatial dimensions are the minimum required to experience chaotic dynamics in nonlinear systems (Meiss, <xref ref-type="bibr" rid="B36">2007</xref>), and due to the vast size of the GRU parameter space, even in low dimensions, such behavior is probable. Similarly, additional types of bifurcations may be present, including bifurcations of limit cycles, allowing for more complex oscillatory behavior (Kuznetsov, <xref ref-type="bibr" rid="B29">1998</xref>). Furthermore, higher dimensional GRUs may bring rise to complex center manifolds, requiring center manifold reduction to better analyze and interpret the phase space dynamics (Carr, <xref ref-type="bibr" rid="B5">1981</xref>). While we considered the underlying GRU topology separate from training, considering how the attractor structure influences learning can bring insight into successfully implementing RNN models (Sok&#x000F3;&#x00142; et al., <xref ref-type="bibr" rid="B41">2019</xref>). As of yet, this topic of research is mostly uncharted. We believe such findings, along with the work presented in this manuscript, will unlock new avenues of research on the trainability of recurrent neural networks and help to further understand their mathematical parallels with biological neural networks.</p>
</sec>
<sec sec-type="data-availability-statement" id="s7">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref>; further inquiries can be directed to the corresponding author/s.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>IJ performed the analysis. IJ, PS, and IP wrote the manuscript. PS performed the numerical experiments. IP conceived the idea, advised, and edited the manuscript. All authors have read and approved the final manuscript.</p>

</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack><p>We thank Josue Nassar, Brian O&#x00027;Donnell, David Sussillo, Aminur Rahman, Denis Blackmore, Braden Brinkman, Yuan Zhao, and D.S for helpful feedback and conversations regarding the analysis and writing of this manuscript.</p>
</ack>
<sec sec-type="supplementary-material" id="s9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fncom.2021.678158/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fncom.2021.678158/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beer</surname> <given-names>R. D.</given-names></name></person-group> (<year>1995</year>). <article-title>On the dynamics of small continuous-time recurrent neural networks</article-title>. <source>Adapt. Behav</source>. <volume>3</volume>, <fpage>469</fpage>&#x02013;<lpage>509</lpage>. <pub-id pub-id-type="doi">10.1177/105971239500300405</pub-id><pub-id pub-id-type="pmid">17052157</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beer</surname> <given-names>R. D.</given-names></name></person-group> (<year>2006</year>). <article-title>Parameter space structure of continuous-time recurrent neural networks</article-title>. <source>Neural Comput</source>. <volume>18</volume>, <fpage>3009</fpage>&#x02013;<lpage>3051</lpage>. <pub-id pub-id-type="doi">10.1162/neco.2006.18.12.3009</pub-id><pub-id pub-id-type="pmid">17052157</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bellec</surname> <given-names>G.</given-names></name> <name><surname>Salaj</surname> <given-names>D.</given-names></name> <name><surname>Subramoney</surname> <given-names>A.</given-names></name> <name><surname>Legenstein</surname> <given-names>R.</given-names></name> <name><surname>Maass</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). <article-title>Long short-term memory and learning-to-learn in networks of spiking neurons</article-title>. <source>arXiv:1803.09574 [cs, q-bio]</source>. <italic>arXiv: 1803.09574</italic>.</citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Simard</surname> <given-names>P.</given-names></name> <name><surname>Frasconi</surname> <given-names>P.</given-names></name></person-group> (<year>1994</year>). <article-title>Learning long-term dependencies with gradient descent is difficult</article-title>. <source>IEEE Trans. Neural Netw</source>. <volume>5</volume>, <fpage>157</fpage>&#x02013;<lpage>166</lpage>. <pub-id pub-id-type="doi">10.1109/72.279181</pub-id><pub-id pub-id-type="pmid">18267787</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Carr</surname> <given-names>J.</given-names></name></person-group> (<year>1981</year>). <source>Applications of Centre Manifold Theory, 1982nd Edn</source>. <publisher-loc>New York, NY; Heidelberg; Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation></ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>R. T. Q.</given-names></name> <name><surname>Rubanova</surname> <given-names>Y.</given-names></name> <name><surname>Bettencourt</surname> <given-names>J.</given-names></name> <name><surname>Duvenaud</surname> <given-names>D. K.</given-names></name></person-group> (<year>2018</year>). <article-title>Neural ordinary differential equations</article-title>, in <source>Advances in Neural Information Processing Systems, Vol. 31</source>, eds S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (<publisher-loc>Montreal, QC: Curran Associates, Inc.</publisher-loc>).</citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cho</surname> <given-names>K.</given-names></name> <name><surname>van Merrienboer</surname> <given-names>B.</given-names></name> <name><surname>Gulcehre</surname> <given-names>C.</given-names></name> <name><surname>Bahdanau</surname> <given-names>D.</given-names></name> <name><surname>Bougares</surname> <given-names>F.</given-names></name> <name><surname>Schwenk</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>. <source>arXiv:1406.1078 [cs, stat]</source>. <italic>arXiv: 1406.1078</italic>. <pub-id pub-id-type="doi">10.3115/v1/D14-1179</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>K.</given-names></name> <name><surname>Fazekas</surname> <given-names>G.</given-names></name> <name><surname>Sandler</surname> <given-names>M.</given-names></name> <name><surname>Cho</surname> <given-names>K.</given-names></name></person-group> (<year>2017</year>). <article-title>Convolutional recurrent neural networks for music classification</article-title>, in <source>2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source> (<publisher-loc>New Orleans, LA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2392</fpage>&#x02013;<lpage>2396</lpage>.</citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Churchland</surname> <given-names>M. M.</given-names></name> <name><surname>Cunningham</surname> <given-names>J. P.</given-names></name></person-group> (<year>2014</year>). <article-title>A dynamical basis set for generating reaches</article-title>. <source>Cold Spring Harb. Symp. Quant. Biol</source>. <volume>79</volume>, <fpage>67</fpage>&#x02013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1101/sqb.2014.79.024703</pub-id><pub-id pub-id-type="pmid">25851506</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Costa</surname> <given-names>R.</given-names></name> <name><surname>Assael</surname> <given-names>I. A.</given-names></name> <name><surname>Shillingford</surname> <given-names>B.</given-names></name> <name><surname>de Freitas</surname> <given-names>N.</given-names></name> <name><surname>Vogels</surname> <given-names>T.</given-names></name></person-group> (<year>2017</year>). <article-title>Cortical microcircuits as gated-recurrent neural networks</article-title>, in <source>Advances in Neural Information Processing Systems 30</source>, eds I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. and Garnett (<publisher-loc>Long Beach, CA: Curran Associates, Inc.</publisher-loc>), <fpage>272</fpage>&#x02013;<lpage>283</lpage>.</citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cueva</surname> <given-names>C. J.</given-names></name> <name><surname>Saez</surname> <given-names>A.</given-names></name> <name><surname>Marcos</surname> <given-names>E.</given-names></name> <name><surname>Genovesio</surname> <given-names>A.</given-names></name> <name><surname>Jazayeri</surname> <given-names>M.</given-names></name> <name><surname>Romo</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Low-dimensional dynamics for working memory and time encoding</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>117</volume>, <fpage>23021</fpage>&#x02013;<lpage>23032</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1915984117</pub-id><pub-id pub-id-type="pmid">32859756</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doya</surname> <given-names>K.</given-names></name></person-group> (<year>1993</year>). <article-title>Bifurcations of recurrent neural networks in gradient descent learning</article-title>. <source>IEEE Trans. Neural Netw</source>. <volume>1</volume>, <fpage>75</fpage>&#x02013;<lpage>80</lpage>.</citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dwibedi</surname> <given-names>D.</given-names></name> <name><surname>Sermanet</surname> <given-names>P.</given-names></name> <name><surname>Tompson</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Temporal reasoning in videos using convolutional gated recurrent units</article-title>, in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>).</citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>FitzHugh</surname> <given-names>R.</given-names></name></person-group> (<year>1961</year>). <article-title>Impulses and physiological states in theoretical models of nerve membrane</article-title>. <source>Biophys. J</source>. <volume>1</volume>, <fpage>445</fpage>&#x02013;<lpage>466</lpage>. <pub-id pub-id-type="doi">10.1016/S0006-3495(61)86902-6</pub-id><pub-id pub-id-type="pmid">19431309</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Flesch</surname> <given-names>T.</given-names></name> <name><surname>Juechems</surname> <given-names>K.</given-names></name> <name><surname>Dumbalska</surname> <given-names>T.</given-names></name> <name><surname>Saxe</surname> <given-names>A.</given-names></name> <name><surname>Summerfield</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <source>Rich and Lazy Learning of Task Representations in Brains and Neural Networks. bioRxiv, 2021.04.23.441128</source>. <publisher-loc>Cold Spring Harbor Laboratory Section</publisher-loc>: <publisher-name>New Results</publisher-name>.</citation></ref>
<ref id="B16">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Frankle</surname> <given-names>J.</given-names></name> <name><surname>Carbin</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>The lottery ticket hypothesis: finding sparse, trainable neural networks</article-title>, in <source>International Conference on Learning Representations</source> (New Orleans, LA). Available online at: <ext-link ext-link-type="uri" xlink:href="https://openreview.net/forum?id=rJl-b3RcF7">https://openreview.net/forum?id=rJl-b3RcF7</ext-link></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Funahashi</surname> <given-names>K. -I.</given-names></name> <name><surname>Nakamura</surname> <given-names>Y.</given-names></name></person-group> (<year>1993</year>). <article-title>Approximation of dynamical systems by continuous time recurrent neural networks</article-title>. <source>Neural Netw</source>. <volume>6</volume>, <fpage>801</fpage>&#x02013;<lpage>806</lpage>. <pub-id pub-id-type="doi">10.1016/S0893-6080(05)80125-X</pub-id><pub-id pub-id-type="pmid">12487801</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harvey</surname> <given-names>C. D.</given-names></name> <name><surname>Coen</surname> <given-names>P.</given-names></name> <name><surname>Tank</surname> <given-names>D. W.</given-names></name></person-group> (<year>2012</year>). <article-title>Choice-specific sequences in parietal cortex during a virtual-navigation decision task</article-title>. <source>Nature</source> <volume>484</volume>, <fpage>62</fpage>&#x02013;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1038/nature10918</pub-id><pub-id pub-id-type="pmid">22419153</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep residual learning for image recognition</article-title>, in <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>770</fpage>&#x02013;<lpage>778</lpage>.<pub-id pub-id-type="pmid">32166560</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Heath</surname> <given-names>M. T.</given-names></name></person-group> (<year>2018</year>). <article-title>Scientific computing: an introductory survey, revised second edition</article-title>, in <source>SIAM-Society for Industrial and Applied Mathematics, Philadelphia, 2nd Edn</source> (<publisher-loc>New York, NY</publisher-loc>).</citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heeger</surname> <given-names>D. J.</given-names></name> <name><surname>Mackey</surname> <given-names>W. E.</given-names></name></person-group> (<year>2019</year>). <article-title>Oscillatory recurrent gated neural integrator circuits (ORGaNICs), a unifying theoretical framework for neural dynamics</article-title>. <source>Proc. Natl. Acad. Sci</source>. <volume>116</volume>, <fpage>22783</fpage>&#x02013;<lpage>22794</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1911633116</pub-id><pub-id pub-id-type="pmid">31907306</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Hochreiter</surname> <given-names>S.</given-names></name></person-group> (<year>1991</year>). <source>Untersuchungen zu Dynamischen Neuronalen Netzen</source> (<publisher-loc>Ph.D. thesis</publisher-loc>), TU Munich. Advisor J. Schmidhuber.</citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hochreiter</surname> <given-names>S.</given-names></name> <name><surname>Schmidhuber</surname> <given-names>J.</given-names></name></person-group> (<year>1997</year>). <article-title>Long short-term memory</article-title>. <source>Neural Comput</source>. <volume>9</volume>, <fpage>1735</fpage>&#x02013;<lpage>1780</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hodgkin</surname> <given-names>A. L.</given-names></name> <name><surname>Huxley</surname> <given-names>A. F.</given-names></name></person-group> (<year>1952</year>). <article-title>A quantitative description of membrane current and its application to conduction and excitation in nerve</article-title>. <source>J. Physiol</source>. <volume>117</volume>, <fpage>500</fpage>&#x02013;<lpage>544</lpage>. <pub-id pub-id-type="doi">10.1113/jphysiol.1952.sp004764</pub-id><pub-id pub-id-type="pmid">12991237</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Izhikevich</surname> <given-names>E. M.</given-names></name></person-group> (<year>2007</year>). <source>Dynamical Systems in Neuroscience</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jordan</surname> <given-names>I. D.</given-names></name> <name><surname>Park</surname> <given-names>I. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Birhythmic analog circuit maze: A nonlinear neurostimulation testbed</article-title>. <source>Entropy</source> <volume>22</volume>:<fpage>537</fpage>. <pub-id pub-id-type="doi">10.3390/e22050537</pub-id><pub-id pub-id-type="pmid">33286310</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>S. S.</given-names></name> <name><surname>Rouault</surname> <given-names>H.</given-names></name> <name><surname>Druckmann</surname> <given-names>S.</given-names></name> <name><surname>Jayaraman</surname> <given-names>V.</given-names></name></person-group> (<year>2017</year>). <article-title>Ring attractor dynamics in the <italic>Drosophila</italic> central brain</article-title>. <source>Science</source> <volume>356</volume>, <fpage>849</fpage>&#x02013;<lpage>853</lpage>.<pub-id pub-id-type="doi">10.1126/science.aal4835</pub-id><pub-id pub-id-type="pmid">28473639</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization</article-title>. <source>arXiv:1412.6980 [cs]</source>. <italic>arXiv: 1412.6980</italic>.</citation></ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kuznetsov</surname> <given-names>Y. A.</given-names></name></person-group> (<year>1998</year>). <source>Elements of Applied Bifurcation Theory 2nd Edn</source>. <publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>.</citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Laurent</surname> <given-names>T.</given-names></name> <name><surname>von Brecht</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>A recurrent neural network without chaos</article-title>, in <source>5th International Conference on Learning Representations, ICLR 2017</source> (<publisher-loc>Toulon</publisher-loc>).</citation></ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>LeVeque</surname> <given-names>R. J.</given-names></name></person-group> (<year>1992</year>). <source>Numerical Methods for Conservation Laws, 2nd Edn</source>. <publisher-loc>Basel; Boston, MA</publisher-loc>: <publisher-name>Birkh&#x000E4;user</publisher-name>.</citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacDowell</surname> <given-names>C. J.</given-names></name> <name><surname>Buschman</surname> <given-names>T. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Low-dimensional spatiotemporal dynamics underlie cortex-wide neural activity</article-title>. <source>Curr. Biol</source>. <volume>30</volume>, <fpage>2665</fpage>&#x02013;<lpage>2680.e8</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2020.04.090</pub-id><pub-id pub-id-type="pmid">32470366</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Maheswaranathan</surname> <given-names>N.</given-names></name> <name><surname>Williams</surname> <given-names>A.</given-names></name> <name><surname>Golub</surname> <given-names>M.</given-names></name> <name><surname>Ganguli</surname> <given-names>S.</given-names></name> <name><surname>Sussillo</surname> <given-names>D.</given-names></name></person-group> (<year>2019a</year>). <article-title>Universality and individuality in neural dynamics across large populations of recurrent networks</article-title>, in <source>Advances in Neural Information Processing Systems, Vol. 32</source>, eds H. Wallach, H. Larochelle, A. Beygelzimer, F. d&#x00027;Alch&#x000E9;-Buc, E. Fox, and R. Garnett (<publisher-loc>Vancouver, BC</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>).<pub-id pub-id-type="pmid">32782422</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Maheswaranathan</surname> <given-names>N.</given-names></name> <name><surname>Williams</surname> <given-names>A.</given-names></name> <name><surname>Golub</surname> <given-names>M. D.</given-names></name> <name><surname>Ganguli</surname> <given-names>S.</given-names></name> <name><surname>Sussillo</surname> <given-names>D.</given-names></name></person-group> (<year>2019b</year>). <article-title>Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics</article-title>. <source>arXiv:1906.10720 [cs, stat]</source>. <italic>arXiv: 1906.10720</italic>.<pub-id pub-id-type="pmid">32782423</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mante</surname> <given-names>V.</given-names></name> <name><surname>Sussillo</surname> <given-names>D.</given-names></name> <name><surname>Shenoy</surname> <given-names>K.</given-names></name> <name><surname>Newsome</surname> <given-names>W.</given-names></name></person-group> (<year>2013</year>). <article-title>Context-dependent computation by recurrent dynamics in prefrontal cortex</article-title>. <source>Nature</source> <volume>503</volume>, <fpage>78</fpage>&#x02013;<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1038/nature12742</pub-id><pub-id pub-id-type="pmid">24201281</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Meiss</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <source>Differential Dynamical Systems</source>. Mathematical Modeling and Computation. <publisher-loc>Philadelphia, PA</publisher-loc>: <publisher-name>Society for Industrial and Applied Mathematics</publisher-name>.</citation></ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Morrill</surname> <given-names>J.</given-names></name> <name><surname>Salvi</surname> <given-names>C.</given-names></name> <name><surname>Kidger</surname> <given-names>P.</given-names></name> <name><surname>Foster</surname> <given-names>J.</given-names></name> <name><surname>Lyons</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Neural rough differential equations for long time series</article-title>. <source>arXiv:2009.08295 [cs, math, stat]</source>. <italic>arXiv: 2009.08295</italic>.</citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pandarinath</surname> <given-names>C.</given-names></name> <name><surname>O&#x00027;Shea</surname> <given-names>D. J.</given-names></name> <name><surname>Collins</surname> <given-names>J.</given-names></name> <name><surname>Jozefowicz</surname> <given-names>R.</given-names></name> <name><surname>Stavisky</surname> <given-names>S. D.</given-names></name> <name><surname>Kao</surname> <given-names>J. C.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Inferring single-trial neural population dynamics using sequential auto-encoders</article-title>. <source>Nat. Methods</source> <volume>15</volume>, <fpage>805</fpage>&#x02013;<lpage>815</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-018-0109-9</pub-id><pub-id pub-id-type="pmid">30224673</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pasemann</surname> <given-names>F.</given-names></name></person-group> (<year>1997</year>). <article-title>A simple chaotic neuron</article-title>. <source>Phys. D Nonlinear Phenomena</source> <volume>104</volume>, <fpage>205</fpage>&#x02013;<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1016/S0167-2789(96)00239-4</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Prabhavalkar</surname> <given-names>R.</given-names></name> <name><surname>Rao</surname> <given-names>K.</given-names></name> <name><surname>Sainath</surname> <given-names>T. N.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Johnson</surname> <given-names>L.</given-names></name> <name><surname>Jaitly</surname> <given-names>N.</given-names></name></person-group> (<year>2017</year>). <article-title>A comparison of sequence-to-sequence models for speech recognition</article-title>, in <source>Interspeech 2017</source> (<publisher-loc>Stockholm</publisher-loc>: <publisher-name>ISCA</publisher-name>), <fpage>939</fpage>&#x02013;<lpage>943</lpage>.</citation></ref>
<ref id="B41">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sok&#x000F3;&#x00142;</surname> <given-names>P. A.</given-names></name> <name><surname>Jordan</surname> <given-names>I.</given-names></name> <name><surname>Kadile</surname> <given-names>E.</given-names></name> <name><surname>Park</surname> <given-names>I. M.</given-names></name></person-group> (<year>2019</year>). <article-title>Adjoint dynamics of stable limit cycle neural networks</article-title>, in <source>2019 53rd Asilomar Conference on Signals, Systems, and Computers</source> (<publisher-loc>Pacific Grove, CA</publisher-loc>), <fpage>884</fpage>&#x02013;<lpage>887</lpage>.</citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sussillo</surname> <given-names>D.</given-names></name> <name><surname>Barak</surname> <given-names>O.</given-names></name></person-group> (<year>2012</year>). <article-title>Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks</article-title>. <source>Neural Comput</source>. <volume>25</volume>, <fpage>626</fpage>&#x02013;<lpage>649</lpage>. <pub-id pub-id-type="doi">10.1162/NECO_a_00409</pub-id><pub-id pub-id-type="pmid">23272922</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sussillo</surname> <given-names>D.</given-names></name> <name><surname>Churchland</surname> <given-names>M.</given-names></name> <name><surname>Kaufman</surname> <given-names>M. T.</given-names></name> <name><surname>Shenoy</surname> <given-names>K.</given-names></name></person-group> (<year>2015</year>). <article-title>A neural network that finds a naturalistic solution for the production of muscle activity</article-title>. <source>Nat. Neurosci</source>. <volume>18</volume>, <fpage>1025</fpage>&#x02013;<lpage>1033</lpage>. <pub-id pub-id-type="doi">10.1038/nn.4042</pub-id><pub-id pub-id-type="pmid">26075643</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Thomas</surname> <given-names>J. W.</given-names></name></person-group> (<year>1995</year>). <source>Numerical Partial Differential Equations: Finite Difference Methods, 1st Edn</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation></ref>
<ref id="B45">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Weiss</surname> <given-names>G.</given-names></name> <name><surname>Goldberg</surname> <given-names>Y.</given-names></name> <name><surname>Yahav</surname> <given-names>E.</given-names></name></person-group> (<year>2018</year>). <article-title>On the practical computational power of finite precision RNNs for language recognition</article-title>. <source>arXiv:1805.04908 [cs, stat]</source>. <italic>arXiv: 1805.04908</italic>. <pub-id pub-id-type="doi">10.18653/v1/P18-2117</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Willett</surname> <given-names>F. R.</given-names></name> <name><surname>Avansino</surname> <given-names>D. T.</given-names></name> <name><surname>Hochberg</surname> <given-names>L. R.</given-names></name> <name><surname>Henderson</surname> <given-names>J. M.</given-names></name> <name><surname>Shenoy</surname> <given-names>K. V.</given-names></name></person-group> (<year>2021</year>). <article-title>High-performance brain-to-text communication via handwriting</article-title>. <source>Nature</source> <volume>593</volume>, <fpage>249</fpage>&#x02013;<lpage>254</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-021-03506-2</pub-id><pub-id pub-id-type="pmid">33981047</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wong</surname> <given-names>K.-F.</given-names></name> <name><surname>Wang</surname> <given-names>X.-J.</given-names></name></person-group> (<year>2006</year>). <article-title>A recurrent network mechanism of time integration in perceptual decisions</article-title>. <source>J. Neurosci</source>. <volume>26</volume>, <fpage>1314</fpage>&#x02013;<lpage>1328</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3733-05.2006</pub-id><pub-id pub-id-type="pmid">16436619</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>Y.</given-names></name> <name><surname>Park</surname> <given-names>I. M.</given-names></name></person-group> (<year>2016</year>). <article-title>Interpretable nonlinear dynamic modeling of neural trajectories</article-title>, in <source>Advances in Neural Information Processing Systems (NIPS)</source> (<publisher-loc>Barcelona</publisher-loc>).</citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>Y.</given-names></name> <name><surname>Park</surname> <given-names>I. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Variational online learning of neural dynamics</article-title>. <source>Front. Comput. Neurosci</source>. <volume>14</volume>:<fpage>71</fpage>. <pub-id pub-id-type="doi">10.3389/fncom.2020.00071</pub-id><pub-id pub-id-type="pmid">33154718</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>The number/dimension of GRUs refers to the dimension of the hidden state dynamics.</p></fn>
<fn id="fn0002"><p><sup>2</sup>2D GRUs feature both codimension-1 and pseudo-codimension-2 bifurcation fixed points. In codimension-1, we have the saddle-node bifurcation fixed point, as expected from its existence in the 1D GRU case. These can arise either as the fusion of a stable fixed point with a saddle point, or as the fusion of an unstable fixed point with a saddle point. We refer to these as saddle-node bifurcation fixed points of the first kind and second kind, respectively.</p></fn>
</fn-group>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This work was supported by NIH EB-026946 and NSF IIS-1845836. IJ was partially supported by the Institute of Advanced Computational Science Jr. Researcher Fellowship at Stony Brook University.</p>
</fn>
</fn-group>
</back>
</article> 