<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Astron. Space Sci.</journal-id>
<journal-title>Frontiers in Astronomy and Space Sciences</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Astron. Space Sci.</abbrev-journal-title>
<issn pub-type="epub">2296-987X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">664483</article-id>
<article-id pub-id-type="doi">10.3389/fspas.2021.664483</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Astronomy and Space Sciences</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Exploring Three Recurrent Neural Network Architectures for Geomagnetic Predictions</article-title>
<alt-title alt-title-type="left-running-head">Wintoft and Wik</alt-title>
<alt-title alt-title-type="right-running-head">RNNs for Geomagnetic Predictions</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Wintoft</surname>
<given-names>Peter</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/840047/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wik</surname>
<given-names>Magnus</given-names>
</name>
</contrib>
</contrib-group>
<aff>Swedish Institute of Space Physics, <addr-line>Lund</addr-line>, <country>Sweden</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/727550/overview">Enrico Camporeale</ext-link>, University of Colorado Boulder, United&#x20;States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1226841/overview">Jannis Teunissen</ext-link>, Centrum Wiskunde and Informatica, Netherlands</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1227747/overview">Shiyong Huang</ext-link>, Wuhan University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Peter Wintoft, <email>peter@lund.irf.se</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Space Physics, a section of the journal Frontiers in Astronomy and Space&#x20;Sciences</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>12</day>
<month>05</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>8</volume>
<elocation-id>664483</elocation-id>
<history>
<date date-type="received">
<day>05</day>
<month>02</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>04</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Wintoft and Wik.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Wintoft and Wik</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Three different recurrent neural network (RNN) architectures are studied for the prediction of geomagnetic activity. The RNNs studied are the Elman, gated recurrent unit (GRU), and long short-term memory (LSTM). The RNNs take solar wind data as inputs to predict the Dst index. The Dst index summarizes complex geomagnetic processes into a single time series. The models are trained and tested using five-fold cross-validation based on the hourly resolution OMNI dataset using data from the years 1995&#x2013;2015. The inputs are solar wind plasma (particle density and speed), vector magnetic fields, time of year, and time of day. The RNNs are regularized using early stopping and dropout. We find that both the gated recurrent unit and long short-term memory models perform better than the Elman model; however, we see no significant difference in performance between GRU and LSTM. RNNs with dropout require more weights to reach the same validation error as networks without dropout. However, the gap between training error and validation error becomes smaller when dropout is applied, reducing over-fitting and improving generalization. Another advantage in using dropout is that it can be applied during prediction to provide confidence limits on the predictions. The confidence limits increase with increasing Dst magnitude: a consequence of the less populated input-target space for events with large Dst values, thereby increasing the uncertainty in the estimates. The best RNNs have test set RMSE of 8.8&#xa0;nT, bias close to zero, and linear correlation of&#x20;0.90.</p>
</abstract>
<kwd-group>
<kwd>space weather</kwd>
<kwd>recurrent neural net</kwd>
<kwd>cross-validation</kwd>
<kwd>solar wind&#x2013;magnetosphere&#x2013;ionosphere coupling</kwd>
<kwd>prediction</kwd>
<kwd>dropout</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>In this work we explore recurrent neural networks (RNNs) for the prediction of geomagnetic activity using solar wind data. An RNN can learn input&#x2013;output mappings that are temporally correlated, and many solar&#x2013;terrestrial relations exhibit such behavior, containing both directly driven processes and dynamic processes with an internal time dependence. The geomagnetic <italic>Dst</italic> index has been addressed in numerous studies and serves as a parameter in general space weather summaries and space weather models. The <italic>Dst</italic> index is derived from magnetic field measurements at four near-equatorial stations and primarily indicates the strength of the equatorial ring current and the magnetopause current (<xref ref-type="bibr" rid="B1">Mayaud, 1980</xref>). It has attracted considerable attention over the years, both for understanding solar&#x2013;terrestrial relations and for use in space weather applications.</p>
<p>An early attempt to predict the <italic>Dst</italic> index from the solar wind made use of a linear filter (<xref ref-type="bibr" rid="B2">Burton et&#x20;al., 1975</xref>) derived from the differential equation containing a source term (the solar wind driver) and a decay term. After removing the variation in <italic>Dst</italic> that is controlled by the solar wind dynamic pressure, one arrives at the pressure-corrected <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> index (<xref ref-type="bibr" rid="B3">O&#x2019;Brien and McPherron, 2000</xref>) which is modeled as<disp-formula id="e1">
<mml:math id="m2">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <italic>Q</italic> is the source term that depends on the solar wind and &#x3c4; is the decay time of the ring current. The decay time &#x3c4; may be a constant, but it may also vary with the solar wind (see, e.g., the AK1 (constant &#x3c4;) and AK2 (variable &#x3c4;) models in <xref ref-type="bibr" rid="B3">O&#x2019;Brien and McPherron (2000)</xref>). As the functional form of <italic>Q</italic> is not known, the equation is numerically solved by<disp-formula id="e2">
<mml:math id="m3">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x394;</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>&#x394;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>For the hourly sampled observed solar wind data used here, the time step is <inline-formula id="inf2">
<mml:math id="m4">
<mml:mrow>
<mml:mi>&#x394;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hour. The source term <italic>Q</italic> is a nonlinear function of the solar wind parameters, and different forms have been suggested. The AK1 model defines the source term as<disp-formula id="e3">
<mml:math id="m5">
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>V</mml:mi>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mi>T</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where <inline-formula id="inf3">
<mml:math id="m6">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2.47</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> is a constant, <italic>V</italic> is the solar wind speed (km/s), and <inline-formula id="inf4">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is<disp-formula id="e4">
<mml:math id="m8">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
<mml:mtext>&#x2009;&#x2009;&#x2009;&#x2009;nT</mml:mtext>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
<p>
<inline-formula id="inf5">
<mml:math id="m9">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the vertical (north&#x2013;south) component of the solar wind magnetic field. Thus, as long as <inline-formula id="inf6">
<mml:math id="m10">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, the <italic>Dst</italic> index will be driven to increasingly negative values; for example, if &#x3c4; is constant and the solar wind conditions are constant with negative <inline-formula id="inf7">
<mml:math id="m11">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, then <inline-formula id="inf8">
<mml:math id="m12">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> will asymptotically approach <inline-formula id="inf9">
<mml:math id="m13">
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. With <inline-formula id="inf10">
<mml:math id="m14">
<mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>17</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hours (AK1), <inline-formula id="inf11">
<mml:math id="m15">
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>600</mml:mn>
<mml:mtext>&#x2009;km/s</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf12">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>20</mml:mn>
<mml:mtext>&#x2009;nT</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> give <inline-formula id="inf13">
<mml:math id="m17">
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>12</mml:mn>
<mml:mtext>&#x2009;mV/m</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf14">
<mml:math id="m18">
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>500</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>&#xa0;nT. The AK1 model has been further extended by letting &#x3c4; be a function of <italic>Dst</italic> and adding components for the diurnal and seasonal variation that are present in <italic>Dst</italic> (<xref ref-type="bibr" rid="B4">O&#x2019;Brien and McPherron, 2002</xref>).</p>
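<p>As a concrete illustration, the recursion in <xref ref-type="disp-formula" rid="e2">Eq. 2</xref> with the AK1 source term (<xref ref-type="disp-formula" rid="e3">Eq. 3</xref>, <xref ref-type="disp-formula" rid="e4">Eq. 4</xref>) can be implemented in a few lines of Python. The sketch below is ours and only restates the numbers quoted above; the factor 10<sup>&#x2212;3</sup> converts <italic>VB<sub>s</sub></italic> from (km/s)&#xb7;nT to mV/m, consistent with the worked example.</p>
<preformat preformat-type="code">
import numpy as np

def ak1_dst(v, bz, a=-2.47, tau=17.0, dt=1.0):
    """Integrate Eq. 2 with the AK1 source term (Eqs. 3 and 4).

    v  -- hourly solar wind speed (km/s)
    bz -- hourly solar wind Bz (nT)
    Returns the modeled pressure-corrected Dst* (nT).
    """
    bs = np.maximum(0.0, -bz)     # Eq. 4: Bs = -Bz for southward Bz
    q = a * v * bs * 1e-3         # Eq. 3: V*Bs rescaled to mV/m
    dst = np.zeros(len(v))
    for t in range(len(v) - 1):   # Eq. 2, explicit Euler step
        dst[t + 1] = dst[t] + (q[t] - dst[t] / tau) * dt
    return dst

# Constant driving, V = 600 km/s and Bz = -20 nT (V*Bs = 12 mV/m):
v = np.full(200, 600.0)
bz = np.full(200, -20.0)
print(ak1_dst(v, bz)[-1])  # approaches Q*tau = -2.47*12*17 = -503.9 nT
</preformat>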
<p>The machine learning (ML) approach can be viewed as a set of more general algorithms that can model complex functions, although the development of an ML model is more involved and time-consuming than that of the empirical models above. For the prediction of the <italic>Dst</italic> index, many ML methods have been applied; we list here some examples using different approaches: neural networks with input time delays (<xref ref-type="bibr" rid="B5">Lundstedt and Wintoft, 1994</xref>; <xref ref-type="bibr" rid="B6">Gleisner et&#x20;al., 1996</xref>; <xref ref-type="bibr" rid="B7">Watanabe et&#x20;al., 2002</xref>), recurrent neural networks (<xref ref-type="bibr" rid="B8">Wu and Lundstedt, 1997</xref>; <xref ref-type="bibr" rid="B9">Lundstedt et&#x20;al., 2002</xref>; <xref ref-type="bibr" rid="B10">Pallocchia et&#x20;al., 2006</xref>; <xref ref-type="bibr" rid="B11">Gruet et&#x20;al., 2018</xref>), ARMA (<xref ref-type="bibr" rid="B12">Vassiliadis et&#x20;al., 1999</xref>), and NARMAX (<xref ref-type="bibr" rid="B13">Boaghe et&#x20;al., 2001</xref>; <xref ref-type="bibr" rid="B14">Boynton et&#x20;al., 2011</xref>).</p>
<p>An RNN models dynamical behavior through internal states so that the output depends on both the inputs and the internal state (see, e.g., <xref ref-type="bibr" rid="B15">Goodfellow et&#x20;al. (2016)</xref> for an overview). Thus, structures that are temporally correlated can be modeled without explicitly parameterizing the temporal dependence; instead, the weights of the hidden layer that connect to the internal state units are adjusted during the training phase. An early RNN was the Elman network (<xref ref-type="bibr" rid="B16">Elman, 1990</xref>), which was applied to geomagnetic predictions (<xref ref-type="bibr" rid="B8">Wu and Lundstedt, 1997</xref>) and later implemented for real-time operation (<xref ref-type="bibr" rid="B9">Lundstedt et&#x20;al., 2002</xref>). The Elman RNN can model complex dynamical behavior; however, it was realized that it could be difficult to learn dynamics for systems with long-range memory (<xref ref-type="bibr" rid="B17">Bengio et&#x20;al., 1994</xref>). To overcome that limitation, other RNN architectures were suggested, such as the GRU (<xref ref-type="bibr" rid="B18">Cho et&#x20;al., 2014</xref>) and the LSTM (<xref ref-type="bibr" rid="B19">Hochreiter and Schmidhuber, 1997</xref>). The LSTM has been applied to geomagnetic predictions of the <italic>Kp</italic> (<xref ref-type="bibr" rid="B20">Tan et&#x20;al., 2018</xref>) and <italic>Dst</italic> indices (<xref ref-type="bibr" rid="B11">Gruet et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B21">Xu et&#x20;al., 2020</xref>). It should be noted that the Elman RNN is the least complex of the three architectures and has the shortest training time, and may therefore be suited for certain problems, and that it is not clear whether there is a general advantage in using the GRU or the LSTM (<xref ref-type="bibr" rid="B22">Chung et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B15">Goodfellow et&#x20;al., 2016</xref>).</p>
<p>In this work, the main goal is to compare the three RNNs: Elman, GRU, and LSTM. The geomagnetic <italic>Dst</italic> index is chosen as the target, as it captures several interesting features of geomagnetic storms with different temporal dynamics. The initial phase is marked by an increase in <italic>Dst</italic> caused by a directly driven pressure increase in the solar wind; the main phase is marked by a sudden decrease in <italic>Dst</italic> when solar wind energy enters the magnetosphere, mainly through reconnection during southward <inline-formula id="inf15">
<mml:math id="m19">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>; later, the storm enters the recovery phase, when energy is dissipated by internal processes not related to the solar wind conditions.</p>
<p>The inputs to the RNNs are solar wind data, local time, and time of year. Specifically, past values of <italic>Dst</italic> are not used as inputs, although the autocorrelation is very strong (0.98). Clearly, all statistical measures of performance will improve for short lead-time predictions when past values of <italic>Dst</italic> are used. However, as the solar wind controls the initial and main phases of the storm, the strong autocorrelation is mainly a result of quiet-time variation and the relatively slow increase in <italic>Dst</italic> during the recovery phase. Another aspect is that, for real-time predictions, the variable lead time given by the solar wind must be matched against the available real-time <italic>Dst</italic> if it is used as input. Also, any errors in real-time <italic>Dst</italic> will affect the predictions; as an example, during the period June&#x2013;September 2020, the real-time <italic>Dst</italic> was offset by about <inline-formula id="inf16">
<mml:math id="m20">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>30</mml:mn>
<mml:mtext>&#x2009;nT</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>. It is also interesting to note that in a recent <italic>Dst</italic> prediction competition<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref> hosted by NOAA, it was stated that the models &#x201c;may not take Dst as an input.&#x201d;</p>
<p>As the idea here is to compare three RNN architectures that map from solar wind to <italic>Dst</italic>, the prediction lead time is not explored. The solar wind data used have been propagated to a location close to Earth, and no further lead time is added; thus, propagated solar wind at time <italic>t</italic> is mapped to <inline-formula id="inf17">
<mml:math id="m21">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Clearly, possibilities to increase the lead time are of great interest, and many attempts have been made with models driven by measured solar wind (e.g., <xref ref-type="bibr" rid="B11">Gruet et&#x20;al. (2018)</xref>; <xref ref-type="bibr" rid="B21">Xu et&#x20;al. (2020)</xref>). However, without any information other than L1 solar wind measurements, the initial phase cannot be predicted with any additional lead time, except that given by the L1-Earth solar wind propagation time, while the main phase may be predicted with possibly up to an additional hour due to magnetospheric processes. The effect of forcing models driven by measured solar wind to predict <italic>Kp</italic> and <italic>Dst</italic> with different lead times was studied in <xref ref-type="bibr" rid="B23">Wintoft and Wik (2018</xref>).</p>
</sec>
<sec id="s2">
<title>2 Models and Analysis</title>
<sec id="s2-1">
<title>2.1 Models</title>
<p>A neural network performs a sequence of transforms by multiplying its inputs with a set of coefficients (weights) and applying nonlinear functions to produce the output. It has been shown that such a network can approximate any continuous function (<xref ref-type="bibr" rid="B24">Cybenko, 1989</xref>). For a supervised network, the weights are adjusted to produce a desired output given the inputs; this is known as the training phase. The training phase requires known input and target values, a cost function, and an optimization algorithm that minimizes the cost.</p>
<p>The Elman RNN was first applied to <italic>Dst</italic> predictions by <xref ref-type="bibr" rid="B8">Wu and Lundstedt (1997)</xref> and later implemented for real-time operation (<xref ref-type="bibr" rid="B9">Lundstedt et&#x20;al., 2002</xref>). In this work, we use the term Elman network, but it is the same as the simple RNN in the TensorFlow package that we use (<xref ref-type="bibr" rid="B34">Abadi et&#x20;al., 2015</xref>). Using linear units at the output layer, the Elman network at time <italic>t</italic> is described by<disp-formula id="e5">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>J</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">V</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mstyle>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
<disp-formula id="e6">
<mml:math id="m23">
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>I</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mstyle>
<mml:mo>&#x2b;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>J</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>f</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">W</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold">U</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>with the output layer bias <italic>b</italic>, <italic>J</italic> hidden weights <inline-formula id="inf18">
<mml:math id="m24">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, nonlinear activation function <italic>f</italic>, <italic>J</italic> hidden layer biases <inline-formula id="inf19">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf20">
<mml:math id="m26">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> weights <inline-formula id="inf21">
<mml:math id="m27">
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf22">
<mml:math id="m28">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>J</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> recurrent weights <inline-formula id="inf23">
<mml:math id="m29">
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Note that we use superscripts <inline-formula id="inf24">
<mml:math id="m30">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>&#xa0;and&#xa0;</mml:mtext>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> as indices, not powers. The equations can be written more compactly using weight matrices <inline-formula id="inf25">
<mml:math id="m31">
<mml:mi mathvariant="bold">W</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf26">
<mml:math id="m32">
<mml:mi mathvariant="bold">U</mml:mi>
</mml:math>
</inline-formula>, where the bias terms <inline-formula id="inf27">
<mml:math id="m33">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> have been absorbed into the matrices by extending the lengths of <inline-formula id="inf28">
<mml:math id="m34">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf29">
<mml:math id="m35">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> with a constant element equal to one. For example, in <xref ref-type="bibr" rid="B9">Lundstedt et&#x20;al. (2002</xref>), there are <inline-formula id="inf30">
<mml:math id="m36">
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> inputs <inline-formula id="inf31">
<mml:math id="m37">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf32">
<mml:math id="m38">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hidden&#x20;units.</p>
<p>A minimalistic Elman network can be constructed using only one input unit and a single linear unit in the hidden layer, with <inline-formula id="inf33">
<mml:math id="m39">
<mml:mrow>
<mml:mi>b</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf34">
<mml:math id="m40">
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, leading to<disp-formula id="e7">
<mml:math id="m41">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>which after some rearranging can be written as<disp-formula id="e8">
<mml:math id="m42">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>which is identical to <xref ref-type="disp-formula" rid="e2">Eq. 2</xref> for <inline-formula id="inf35">
<mml:math id="m43">
<mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>const</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf36">
<mml:math id="m44">
<mml:mrow>
<mml:mi>&#x394;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, and by letting <inline-formula id="inf37">
<mml:math id="m45">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf38">
<mml:math id="m46">
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf39">
<mml:math id="m47">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. This simple network is trained using the pressure-corrected <italic>Dst</italic> index as the target. As the weights in the network are initialized with random values before training begins, there will be some variation in the final weight values if the training is repeated. We find typical values of <inline-formula id="inf40">
<mml:math id="m48">
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf41">
<mml:math id="m49">
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> corresponding to <inline-formula id="inf42">
<mml:math id="m50">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2.4</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2.7</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="disp-formula" rid="e3">Eq. 3</xref>) and <inline-formula id="inf43">
<mml:math id="m51">
<mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>14,16</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> hours, which are close to the values used by <xref ref-type="bibr" rid="B3">O&#x2019;Brien and McPherron (2000)</xref>. However, the algorithm can get stuck in local minima that result in quite different values. We provide code on GitHub<xref ref-type="fn" rid="fn2">
<sup>2</sup>
</xref> for the minimalistic Elman network (see Model005.py).</p>
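<p>A minimal sketch of this one-unit network, assuming the TensorFlow/Keras API (the listing is illustrative and in the spirit of Model005.py; the variable names are ours):</p>
<preformat preformat-type="code">
import tensorflow as tf

# One input unit and one linear hidden unit: the SimpleRNN layer then
# computes y_t = a1 + w11*x_t + u11*y_{t-1}, i.e., Eq. 7.
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(1, activation="linear",
                              input_shape=(None, 1))
])
model.compile(optimizer="adam", loss="mse")

# x: N x T x 1 tensor with the solar wind driver, y: N x 1 targets of
# pressure-corrected Dst* (assumed prepared elsewhere):
# model.fit(x, y, epochs=50)

# After training, the parameters of Eq. 2 can be read off the weights
# (kernel, recurrent kernel, bias):
# w11, u11, a1 = [w.numpy().ravel()[0] for w in model.weights]
# tau = 1.0 / (1.0 - u11)   # from 1 - u11 = 1/tau (Eq. 8)
</preformat>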
<p>The gated recurrent unit (GRU) neural network (<xref ref-type="bibr" rid="B18">Cho et&#x20;al., 2014</xref>) has a more complex architecture than the Elman network. We implement a single GRU layer, and the output from the network is, as before, given by <inline-formula id="inf44">
<mml:math id="m52">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">V</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="disp-formula" rid="e5">Eq. 5</xref>). The GRU layer output at unit <italic>j</italic> is<disp-formula id="e9">
<mml:math id="m53">
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>h</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>where <inline-formula id="inf45">
<mml:math id="m54">
<mml:mrow>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the update gate and <inline-formula id="inf46">
<mml:math id="m55">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>h</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the candidate activation. The update gate is defined as<disp-formula id="e10">
<mml:math id="m56">
<mml:mrow>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">W</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold">U</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>where &#x3c3; is the sigmoid function with output range 0&#x2013;1. The weight matrix <inline-formula id="inf47">
<mml:math id="m57">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">W</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> operates on the input vector <inline-formula id="inf48">
<mml:math id="m58">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and the matrix <inline-formula id="inf49">
<mml:math id="m59">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">U</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> operates on the past activation <inline-formula id="inf50">
<mml:math id="m60">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The candidate activation is defined as<disp-formula id="e11">
<mml:math id="m61">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>h</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>f</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">W</mml:mi>
<mml:msub>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold">U</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2299;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(11)</label>
</disp-formula>where <italic>f</italic> is a nonlinear function with two additional weight matrices <inline-formula id="inf51">
<mml:math id="m62">
<mml:mi mathvariant="bold">W</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf52">
<mml:math id="m63">
<mml:mi mathvariant="bold">U</mml:mi>
</mml:math>
</inline-formula>. The <inline-formula id="inf53">
<mml:math id="m64">
<mml:mi mathvariant="bold">U</mml:mi>
</mml:math>
</inline-formula> matrix operates on the past activation weighted by the reset gate<disp-formula id="e12">
<mml:math id="m65">
<mml:mrow>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">W</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold">U</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(12)</label>
</disp-formula>which has a further set of weight matrices <inline-formula id="inf54">
<mml:math id="m66">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">W</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf55">
<mml:math id="m67">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">U</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Clearly, the GRU network is more complex than the Elman network, with approximately three times as many weights for the same number of units. As the update and reset gates have outputs between 0 and 1, we see that when both gates equal one <inline-formula id="inf56">
<mml:math id="m68">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, the GRU network simplifies to the Elman network. On the other hand, when <inline-formula id="inf57">
<mml:math id="m69">
<mml:mrow>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, no information about the current input <inline-formula id="inf58">
<mml:math id="m70">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is used; only the past state <inline-formula id="inf59">
<mml:math id="m71">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Finally, when <inline-formula id="inf60">
<mml:math id="m72">
<mml:mrow>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, no information about past states goes through the candidate activation (<xref ref-type="disp-formula" rid="e11">Eq. 11</xref>); past-state information then enters only through <xref ref-type="disp-formula" rid="e9">Eq. 9</xref>, weighted by <inline-formula id="inf61">
<mml:math id="m73">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>The long short-term memory (LSTM) neural network (<xref ref-type="bibr" rid="B19">Hochreiter and Schmidhuber, 1997</xref>) was introduced before the GRU and is more complex still, with approximately four times as many weights as the Elman network for the same number of units. We will not repeat the equations here but instead refer to, for example, <xref ref-type="bibr" rid="B22">Chung et&#x20;al. (2014)</xref>. The LSTM has three gating functions, instead of the GRU&#x2019;s two, that control the flow of information: the output gate, the forget gate, and the input gate. When they have values 1, 0, and 1, respectively, the LSTM simplifies to the Elman network.</p>
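<p>For reference, the three architectures can be instantiated in a uniform way. The sketch below (assuming TensorFlow/Keras; the number of hidden units is illustrative) also makes the approximate 1:3:4 ratio of weight counts explicit:</p>
<preformat preformat-type="code">
import tensorflow as tf

def build_rnn(layer_cls, n_hidden=16, n_inputs=9, t_len=120):
    """A single recurrent layer followed by one linear output unit."""
    return tf.keras.Sequential([
        layer_cls(n_hidden, input_shape=(t_len, n_inputs)),
        tf.keras.layers.Dense(1, activation="linear"),
    ])

for cls in (tf.keras.layers.SimpleRNN, tf.keras.layers.GRU,
            tf.keras.layers.LSTM):
    # GRU has roughly 3x and LSTM roughly 4x the weights of SimpleRNN.
    print(cls.__name__, build_rnn(cls).count_params())
</preformat>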
<p>Given a sufficient number of weights, a network can be trained to reach zero mean squared error (MSE); however, such a network will have poor generalizing capability; that is, it will make large prediction errors on samples not included in the training data. Different strategies exist to prevent over-fitting (<xref ref-type="bibr" rid="B15">Goodfellow et&#x20;al., 2016</xref>). We apply early stopping and dropout (<xref ref-type="bibr" rid="B26">Srivastava et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B27">Gal and Ghahramani, 2016</xref>).</p>
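<p>Both regularizers are available directly in the training setup. The sketch below (assuming TensorFlow/Keras; the dropout rate and patience are illustrative, not the tuned values) also shows how dropout can be kept active at prediction time to obtain a Monte Carlo spread that serves as a confidence estimate on the predictions:</p>
<preformat preformat-type="code">
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.GRU(32, dropout=0.2, recurrent_dropout=0.2,
                        input_shape=(120, 9)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Early stopping: stop when the validation MSE has not improved for
# `patience` epochs and restore the weights with the lowest value.
stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                        patience=20,
                                        restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=500, callbacks=[stop])

# Monte Carlo dropout: repeat the stochastic forward pass and use the
# spread of the predictions as a confidence estimate.
# preds = np.stack([model(x_test, training=True).numpy()
#                   for _ in range(100)])
# mean, std = preds.mean(axis=0), preds.std(axis=0)
</preformat>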
<p>To make a robust estimate of the performance of the networks, we apply <italic>k</italic>-fold cross-validation (<xref ref-type="bibr" rid="B15">Goodfellow et&#x20;al., 2016</xref>). During a training session, one subset is held out for testing and the remaining <inline-formula id="inf62">
<mml:math id="m74">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> subsets are used for training and validation, and out of the <inline-formula id="inf63">
<mml:math id="m75">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> subsets, one is used for validation and the remaining <inline-formula id="inf64">
<mml:math id="m76">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> subsets for training. During training, the validation MSE is monitored, and the network with the lowest validation MSE is chosen (early stopping). In practice, to confirm that the minimum validation MSE has been reached, training is continued for a number of epochs after the lowest MSE is observed. The final evaluation of the models is performed on the <italic>k</italic> different test sets (see <xref ref-type="sec" rid="s2-2">Section 2.2</xref> for how the sets are selected).</p>
</sec>
<sec id="s2-2">
<title>2.2 Data Sets</title>
<p>The hourly solar wind data and <italic>Dst</italic> index are obtained from the OMNI dataset (<xref ref-type="bibr" rid="B28">King and Papitashvili, 2005</xref>). The inputs are the solar wind magnetic field magnitude <italic>B</italic>, the <italic>y</italic>- and <italic>z</italic>-components <inline-formula id="inf65">
<mml:math id="m77">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>y</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in the geocentric solar magnetospheric (GSM) coordinate system, the particle density <italic>n</italic>, and speed <italic>V</italic>. To provide information on diurnal and seasonal variations (<xref ref-type="bibr" rid="B4">O&#x2019;Brien and McPherron, 2002</xref>), four additional variables are added: the day of year parameterized as sine and cosine of <inline-formula id="inf66">
<mml:math id="m78">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
<mml:mtext>DOY</mml:mtext>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mn>365</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and local time as sine and cosine of <inline-formula id="inf67">
<mml:math id="m79">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
<mml:mtext>UT</mml:mtext>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mn>24</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Thus, there are nine input variables in total. Several previous geomagnetic prediction models also use diurnal and seasonal inputs (e.g., <xref ref-type="bibr" rid="B29">Temerin and Li (2006)</xref>; <xref ref-type="bibr" rid="B30">Wintoft et&#x20;al. (2017)</xref>; <xref ref-type="bibr" rid="B23">Wintoft and Wik (2018)</xref>). Many different coupling functions (<italic>Q</italic>) for the dayside reconnection rate have been suggested and investigated (<xref ref-type="bibr" rid="B31">Borovsky and Birn, 2014</xref>), but as a neural network can approximate any function, the exact form does not have to be specified as long as the relevant inputs are available.</p>
<p>The target variable (<italic>Dst</italic>) depends on both the solar wind and past states of the system, where the past states can be described either by past values of <italic>Dst</italic> itself or by past values of the solar wind. We choose to include only the solar wind, thereby not relying on past observed or predicted values of <italic>Dst</italic>. For the RNN training algorithm, the data are organized so that the past <italic>T</italic> solar wind observations are presented at each time step. The input data are thus collected into a <inline-formula id="inf68">
<mml:math id="m80">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>9</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> tensor, and the target data have dimension <inline-formula id="inf69">
<mml:math id="m81">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, where <italic>N</italic> is the number of samples in the set. The input history should be long enough to capture typical storm dynamics, and we found that the validation errors leveled out at <inline-formula id="inf70">
<mml:math id="m82">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>120</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hours (see also the results regarding T in <xref ref-type="sec" rid="s2-3">Sections 2.3</xref> and <xref ref-type="sec" rid="s2-4">2.4</xref>).</p>
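<p>A sketch of this windowing (assuming NumPy; the names are ours), in which each sample pairs the past <italic>T</italic> hours of the nine inputs with the <italic>Dst</italic> value at the end of the window:</p>
<preformat preformat-type="code">
import numpy as np

def make_windows(inputs, target, t_len=120):
    """Stack sliding windows into an N x T x 9 input tensor.

    inputs -- array of shape (hours, 9): solar wind plus the time-of-
              year and time-of-day variables
    target -- array of shape (hours,): the Dst index
    """
    n = len(target) - t_len + 1
    x = np.stack([inputs[i:i + t_len] for i in range(n)])
    y = target[t_len - 1:].reshape(-1, 1)  # Dst at the window end
    return x, y                            # (N, T, 9) and (N, 1)
</preformat>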
<p>To implement the <italic>k</italic>-fold cross-validation (CV), the dataset must be partitioned into subsets; we perform five-fold CV. We choose the five subsets to each have similar target (<italic>Dst</italic>) mean and standard deviation so that training, validation, and testing are based on comparable data. If a more blind approach were taken, there would be a high risk that training is performed on data dominated by storms while testing is performed on more quiet conditions. Further, the samples in a subset cannot be randomly selected, because there would be considerable temporal overlap between samples of different subsets due to the <inline-formula id="inf71">
<mml:math id="m83">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>120</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hour window. Instead, we build the subsets from data covering complete years. The data we use cover the years 1995 to 2015, extending over almost two solar cycles with few solar wind data gaps. We define five subsets based on the data for the years shown in <xref ref-type="table" rid="T1">Table&#x20;1</xref>. The datasets used for training, validation, and testing are selected by cycling through the subsets. For the first CV set (CV-1), subset one is selected as the test set, subsets two, four, and five for training, and subset three for validation. The process is repeated according to <xref ref-type="table" rid="T2">Table&#x20;2</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Summary of the five subsets showing the years, number of samples, the mean (nT), standard deviation (nT), and minimum <italic>Dst</italic> (nT).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Years</th>
<th align="center">Count</th>
<th align="center">Mean</th>
<th align="center">Std</th>
<th align="center">Min</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1</td>
<td align="center">1995, 2003, 2006, 2010</td>
<td align="char" char=".">34,890</td>
<td align="char" char=".">&#x2212;15.0</td>
<td align="char" char=".">19.8</td>
<td align="char" char=".">&#x2212;422</td>
</tr>
<tr>
<td align="left">2</td>
<td align="center">2001, 2002, 2009, 2011</td>
<td align="char" char=".">35,021</td>
<td align="char" char=".">&#x2212;13.2</td>
<td align="char" char=".">23.4</td>
<td align="char" char=".">&#x2212;387</td>
</tr>
<tr>
<td align="left">3</td>
<td align="center">1998, 2004, 2008, 2012</td>
<td align="char" char=".">35,089</td>
<td align="char" char=".">&#x2212;11.3</td>
<td align="char" char=".">20.9</td>
<td align="char" char=".">&#x2212;374</td>
</tr>
<tr>
<td align="left">4</td>
<td align="center">1996, 2000, 2013, 2015</td>
<td align="char" char=".">35,059</td>
<td align="char" char=".">&#x2212;13.0</td>
<td align="char" char=".">21.0</td>
<td align="char" char=".">&#x2212;301</td>
</tr>
<tr>
<td align="left">5</td>
<td align="center">1997, 1999, 2005, 2014</td>
<td align="char" char=".">34,754</td>
<td align="char" char=".">&#x2212;12.7</td>
<td align="char" char=".">19.2</td>
<td align="char" char=".">&#x2212;247</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Selection of subsets for the different cross-validation (CV)&#x20;sets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">CV</th>
<th align="center">Training</th>
<th align="center">Validation</th>
<th align="center">Test</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1</td>
<td align="center">2, 4, 5</td>
<td align="char" char=".">3</td>
<td align="char" char=".">1</td>
</tr>
<tr>
<td align="left">2</td>
<td align="center">1, 4, 5</td>
<td align="char" char=".">3</td>
<td align="char" char=".">2</td>
</tr>
<tr>
<td align="left">3</td>
<td align="center">1, 4, 5</td>
<td align="char" char=".">2</td>
<td align="char" char=".">3</td>
</tr>
<tr>
<td align="left">4</td>
<td align="center">1, 3, 5</td>
<td align="char" char=".">2</td>
<td align="char" char=".">4</td>
</tr>
<tr>
<td align="left">5</td>
<td align="center">1, 3, 4</td>
<td align="char" char=".">2</td>
<td align="char" char=".">5</td>
</tr>
</tbody>
</table>
</table-wrap>
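<p>The cycling in <xref ref-type="table" rid="T2">Table&#x20;2</xref> can be expressed as a simple loop. In the sketch below, <monospace>subsets</monospace> is assumed to map each subset number to an input&#x2013;target pair built from the year groups of <xref ref-type="table" rid="T1">Table&#x20;1</xref>:</p>
<preformat preformat-type="code">
import numpy as np

# Cross-validation sets from Table 2: (training, validation, test).
CV_SETS = [
    ((2, 4, 5), 3, 1),
    ((1, 4, 5), 3, 2),
    ((1, 4, 5), 2, 3),
    ((1, 3, 5), 2, 4),
    ((1, 3, 4), 2, 5),
]

def cv_split(subsets, train_ids, val_id, test_id):
    """Return (train, validation, test) data for one CV set."""
    x_train = np.concatenate([subsets[i][0] for i in train_ids])
    y_train = np.concatenate([subsets[i][1] for i in train_ids])
    return (x_train, y_train), subsets[val_id], subsets[test_id]

# for train_ids, val_id, test_id in CV_SETS:
#     train, val, test = cv_split(subsets, train_ids, val_id, test_id)
#     ...train with early stopping, evaluate on the test set...
</preformat>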
<p>The input and target values span very different numerical ranges, while the training algorithm works best when the input and target data have similar numerical ranges. Therefore, the input and target data are normalized, with the normalization coefficients determined from the training set. By subtracting the mean and dividing by the standard deviation for each variable separately, the training set will have zero mean and unit standard deviation in all its input and target variables. However, as the distributions of the variables are highly skewed, this results in several normalized values with magnitudes much larger than one. Another way to normalize is to instead rescale the minimum and maximum values to the range [&#x2212;1, 1], which guarantees that no training-set values fall outside this range. We found that the min&#x2013;max normalization gave slightly better results, especially for large values.</p>
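<p>A sketch of the two normalization alternatives (assuming NumPy; the names are ours), with the coefficients determined from the training set only and then applied to the other sets:</p>
<preformat preformat-type="code">
import numpy as np

def standardize(train, other):
    """Zero mean and unit standard deviation per variable."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sd, (other - mu) / sd

def min_max(train, other):
    """Rescale the training-set range of each variable to [-1, 1];
    values outside the training range may still fall outside."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = 2.0 / (hi - lo)
    return scale * (train - lo) - 1.0, scale * (other - lo) - 1.0
</preformat>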
</sec>
<sec id="s2-3">
<title>2.3 Hyperparameters</title>
<p>A number of hyperparameters (HPs) that control the model complexity and the training algorithm need to be tuned, but an exhaustive search is not feasible. Initially, a number of different combinations of HP values were tested manually to provide basic insight into reasonable choices and into how the training and validation MSEs vary with epochs. In this initial exploration, we found the TensorBoard (<xref ref-type="bibr" rid="B34">Abadi et&#x20;al., 2015</xref>) software valuable for monitoring the MSE.</p>
<p>The Adam learning algorithm (<xref ref-type="bibr" rid="B32">Kingma and Ba, 2015</xref>), which is a stochastic gradient descent method, has three parameters: the learning rate &#x3f5; and two decay rates for the moment estimates <inline-formula id="inf72">
<mml:math id="m84">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. We fix the latter to the suggested values <inline-formula id="inf73">
<mml:math id="m85">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.9</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.999</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and vary <inline-formula id="inf74">
<mml:math id="m86">
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>&#x22c5;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x22c5;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>&#x22c5;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
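<p>In Keras, this corresponds to a configuration along the following lines (an illustrative sketch rather than the exact training script):</p>
<preformat>
from tensorflow import keras

# Decay rates fixed at the suggested (beta_1, beta_2) = (0.9, 0.999);
# only the learning rate is varied over the grid.
for lr in (5e-4, 1e-3, 2e-3):
    optimizer = keras.optimizers.Adam(learning_rate=lr,
                                      beta_1=0.9, beta_2=0.999)
    # ... build, compile, and train one model per setting
</preformat>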
<p>The learning algorithm updates the weights in batches of samples from the training set, where the number of samples in each batch <inline-formula id="inf75">
<mml:math id="m87">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is much smaller than the total number of training samples <inline-formula id="inf76">
<mml:math id="m88">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
<mml:mo>&#x226a;</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. We test batch sizes of <inline-formula id="inf77">
<mml:math id="m89">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>32,64,128</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. One training epoch includes approximately <inline-formula id="inf78">
<mml:math id="m90">
<mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> training iterations in which the weights are updated at each iteration.</p>
<p>The model capacity is determined by the number of weights and the network architecture. In this work, we have one input layer, a recurrent layer (hidden layer), and a single output. Thus, the capacity is determined by the network type <inline-formula id="inf79">
<mml:math id="m91">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtext>Elman</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>GRU</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>LSTM</mml:mtext>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and the number of hidden units&#x20;<inline-formula id="inf80">
<mml:math id="m92">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
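<p>In Keras terms, the three architectures differ only in the choice of recurrent layer (<monospace>SimpleRNN</monospace> is the Keras implementation of the Elman network); a minimal sketch for illustration, with our own helper names:</p>
<preformat>
from tensorflow import keras

RNN_LAYER = {"Elman": keras.layers.SimpleRNN,
             "GRU": keras.layers.GRU,
             "LSTM": keras.layers.LSTM}

def build_model(net_type, n_hidden, n_inputs, t_steps):
    # One input layer, one recurrent (hidden) layer, one linear output.
    return keras.Sequential([
        keras.layers.Input(shape=(t_steps, n_inputs)),
        RNN_LAYER[net_type](n_hidden),
        keras.layers.Dense(1),
    ])
</preformat>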
<p>The current state <inline-formula id="inf81">
<mml:math id="m93">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in the RNN depends on both its inputs <inline-formula id="inf82">
<mml:math id="m94">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and the past state <inline-formula id="inf83">
<mml:math id="m95">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. For computational performance reasons, past states are not kept indefinitely; instead, there is a limit <italic>T</italic> on the length of the memory. We explored <inline-formula id="inf84">
<mml:math id="m96">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>48,72,96,120</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> hours and found that the validation MSE decreased with increasing <italic>T</italic>, but that it leveled out for large <italic>T</italic>. We therefore set <inline-formula id="inf85">
<mml:math id="m97">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>120</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hours. This also means that any dynamical processes extending past 120&#xa0;h cannot be modeled internally by the RNN. The choice of <italic>T</italic> for the Elman and GRU networks is studied on simulated <italic>Dst</italic> data in the next section.</p>
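<p>For illustration, a NumPy sketch of how the memory limit <italic>T</italic> translates into fixed-length input windows, pairing each hourly target with its preceding <italic>T</italic> hours of input (variable names are placeholders):</p>
<preformat>
import numpy as np

def make_windows(x, y, t=120):
    # x has shape (n_samples, n_inputs); each target y[i] is paired
    # with the t-hour input window ending at time i.
    xs = np.stack([x[i - t + 1:i + 1] for i in range(t - 1, len(x))])
    return xs, y[t - 1:]
</preformat>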
<p>The dropout is controlled by parameters that specify the fraction of network units in a layer that are randomly selected and temporarily disregarded at each training step. The dropout can be applied to all layers: the input layer <inline-formula id="inf86">
<mml:math id="m98">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, the recurrent layer <inline-formula id="inf87">
<mml:math id="m99">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and the hidden layer <inline-formula id="inf88">
<mml:math id="m100">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The dropout rate is a number between 0 and 1, where 0 means that all units are kept and 1 that all units are dropped.</p>
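<p>One possible mapping of the three rates onto Keras arguments, sketched for the GRU network (this reading of the layer structure is illustrative: the <monospace>dropout</monospace> argument acts on the layer inputs and <monospace>recurrent_dropout</monospace> on the recurrent connections):</p>
<preformat>
from tensorflow import keras

def build_gru(n_hidden, n_inputs, t_steps, d_i=0.0, d_r=0.5, d_h=0.5):
    return keras.Sequential([
        keras.layers.Input(shape=(t_steps, n_inputs)),
        # dropout acts on the layer inputs, recurrent_dropout on the
        # recurrent connections (Gal and Ghahramani, 2016a).
        keras.layers.GRU(n_hidden, dropout=d_i, recurrent_dropout=d_r),
        keras.layers.Dropout(d_h),  # dropout on the hidden-layer outputs
        keras.layers.Dense(1),
    ])
</preformat>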
<p>For each HP combination that we explore, we train three networks initiated with different random weight values, as there is no guarantee that the training algorithm will find a good local minimum. The network with the lowest validation error is selected; a sketch of this loop is given below. Note that validation here refers to the split into training and validation sets used during training, which is different from the cross-validation sets that make up the independent test&#x20;set.</p>
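<p>An illustrative sketch of the restart-and-select loop (it assumes the <monospace>build_model</monospace> helper above and pre-split training and validation arrays):</p>
<preformat>
from tensorflow import keras

results = []
for seed in range(3):
    keras.utils.set_random_seed(seed)  # different random initial weights
    model = build_model("GRU", n_hidden=50, n_inputs=9, t_steps=120)
    model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")
    hist = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     batch_size=128, epochs=100, verbose=0)
    results.append((min(hist.history["val_loss"]), model))
# Keep the network with the lowest validation error.
best_val, best_model = min(results, key=lambda r: r[0])
</preformat>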
</sec>
<sec id="s2-4">
<title>2.4 Training Network on Simulated Data</title>
<p>It is interesting to study the RNNs on data generated from a known function relating the solar wind to <italic>Dst</italic>, and for this purpose, we use the AK1 model (<xref ref-type="bibr" rid="B3">O&#x2019;Brien and McPherron, 2000</xref>). Using the datasets defined in the previous section, we apply the AK1 model to the solar wind inputs and create the target data. Thus, there exists an exact relation between input and output, and the learning process of the RNN will only be limited by the amount of data, the network structure (type of RNN), and the network capacity (size of RNN). We showed that the minimalistic Elman network (<xref ref-type="disp-formula" rid="e7">Eq. 7</xref>) can model the pressure-corrected <italic>Dst</italic>. The AK1 model also includes the pressure term, and its inputs are <inline-formula id="inf89">
<mml:math id="m101">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <italic>n</italic>, and <italic>V</italic>. The five-fold CV is applied to the Elman and GRU networks, and we vary the time window <italic>T</italic> and the network size&#x20;<inline-formula id="inf90">
<mml:math id="m102">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>In <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>, the validation errors as function of <italic>T</italic> are shown for the Elman and GRU networks. At each <italic>T</italic>, the optimal networks with respect to <inline-formula id="inf91">
<mml:math id="m103">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are used. We see that for small <italic>T</italic>, the RMSE is large but similar for the two network types; at small <italic>T</italic>, only part of the storm recovery phase can be modeled. As <italic>T</italic> is increased, however, the RMSE becomes much smaller for the GRU network than for the Elman network. It is likely that the Elman network suffers from the vanishing gradient problem (<xref ref-type="bibr" rid="B17">Bengio et&#x20;al., 1994</xref>), which was the reason for introducing the GRU and LSTM networks. We also see that the GRU network reaches an RMSE lower than 0.6&#xa0;nT, which could be decreased further by increasing <italic>T</italic>. Thus, the GRU network can learn the AK1 model using the observed solar wind&#x20;data.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>The average validation RMSE as a function of memory size (<italic>T</italic> in hours) for the Elman and GRU networks.</p>
</caption>
<graphic xlink:href="fspas-08-664483-g001.tif"/>
</fig>
</sec>
<sec id="s2-5">
<title>2.5 Results for the <italic>Dst</italic> Index</title>
<p>As described in <xref ref-type="sec" rid="s2-2">Section 2.2</xref>, the inputs to the <italic>Dst</italic> model are solar wind magnetic field <inline-formula id="inf92">
<mml:math id="m104">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>y</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, density (<italic>n</italic>), and speed (<italic>V</italic>); the day of year parameterized as sine and cosine of <inline-formula id="inf93">
<mml:math id="m105">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
<mml:mtext>DOY</mml:mtext>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mn>365</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>; and local time as sine and cosine of <inline-formula id="inf94">
<mml:math id="m106">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>&#x3c0;</mml:mi>
<mml:mtext>UT</mml:mtext>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mn>24</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The DOY and UT are added to model the seasonal and diurnal variations in <italic>Dst</italic> (<xref ref-type="bibr" rid="B4">O&#x2019;Brien and McPherron, 2002</xref>).</p>
<p>We perform a search in the hyperparameter space as described above and conclude that training is not very sensitive to the learning rate (&#x3f5;) or the batch size <inline-formula id="inf95">
<mml:math id="m107">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and therefore fix them at <inline-formula id="inf96">
<mml:math id="m108">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mn>128</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>For each of the five splits, we select the corresponding training set (<xref ref-type="table" rid="T1">Tables 1</xref>,<xref ref-type="table" rid="T2">2</xref>), and RNNs are trained with different numbers of hidden units <inline-formula id="inf97">
<mml:math id="m109">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and different dropout rates <inline-formula id="inf98">
<mml:math id="m110">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. For each combination of <inline-formula id="inf99">
<mml:math id="m111">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, three networks are trained starting from different random initial weights. During training, the validation error is monitored, and the network with the lowest validation error is selected. The training is stopped 20 epochs after the minimum validation error has been reached, and the network at the minimum validation error is saved. Typically, the minimum validation RMSE is found after 40 to 80 epochs. This results in five different networks for each HP combination that can be tested using the CV approach. <xref ref-type="sec" rid="app1">See the Appendix for the software used and typical training times</xref>.</p>
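<p>The stopping rule maps directly onto a standard early-stopping callback in Keras; a sketch under the same assumptions as above:</p>
<preformat>
from tensorflow import keras

# Stop 20 epochs after the minimum validation error and restore the
# weights from the epoch with the lowest validation loss.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                           patience=20,
                                           restore_best_weights=True)

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          batch_size=128, epochs=500, callbacks=[early_stop], verbose=0)
</preformat>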
<p>The coupling function from solar wind to observed <italic>Dst</italic> is subject to a number of uncertainties. To give a few examples: the solar wind data have been measured at different locations upstream of Earth, mostly from orbit around the L1 point, and then shifted to a common location closer to Earth (<xref ref-type="bibr" rid="B28">King and Papitashvili, 2005</xref>); we rely on a point measurement; and there may be both systematic and random errors in the derived <italic>Dst</italic> index. These uncertainties introduce errors in the input&#x2013;output mapping, and to reduce their effect and improve generalization, we apply dropout. From a search of different combinations of <inline-formula id="inf100">
<mml:math id="m112">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, it was found that dropout on the inputs <inline-formula id="inf101">
<mml:math id="m113">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> always led to poor performance, which can be understood since several inputs are individually critical, for example, <inline-formula id="inf102">
<mml:math id="m114">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Therefore, we set <inline-formula id="inf103">
<mml:math id="m115">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. The performance improved when dropout was applied to the recurrent and hidden layers. <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows the training and validation RMSE as a function of <inline-formula id="inf104">
<mml:math id="m116">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> for different dropout rates. In the case with no dropout (left panel), the GRU and LSTM validation errors are similar and significantly below the Elman validation errors. There is also a large gap between the training and validation errors, indicating over-fitting on the training set. When dropout is introduced (middle and right panels), the network size must be increased to reach a validation RMSE similar to that obtained without dropout, which is expected as only a fraction of the units are active at any one time. However, we also see that the gap between the training and validation errors decreases. We also applied dropout to the Elman network, but the validation errors became large when <inline-formula id="inf105">
<mml:math id="m117">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>; therefore, the results are not included in the middle and right panels. When <inline-formula id="inf106">
<mml:math id="m118">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf107">
<mml:math id="m119">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> (right panel), the optimal GRU and LSTM networks have <inline-formula id="inf108">
<mml:math id="m120">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>50</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf109">
<mml:math id="m121">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>40</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, respectively. In terms of the number of weights, they are of similar sizes, 9,051 and 8,041, respectively. When dropout is applied, the number of active weights drops to 2,651 and&#x20;2,421, respectively.</p>
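<p>These weight counts follow from the standard parameter formulas for the gated networks, assuming nine inputs (the five solar wind variables plus the four time-of-year and time-of-day terms), a single bias vector per gate, and one linear output unit:</p>
<preformat>
def gru_params(n_in, n_h):
    # Three gates, each with input weights, recurrent weights, and a
    # bias, plus a linear output unit with n_h weights and one bias.
    return 3 * (n_in * n_h + n_h * n_h + n_h) + (n_h + 1)

def lstm_params(n_in, n_h):
    # As above, but with four gates.
    return 4 * (n_in * n_h + n_h * n_h + n_h) + (n_h + 1)

assert gru_params(9, 50) == 9051 and lstm_params(9, 40) == 8041
# With half of the recurrent and hidden units dropped:
assert gru_params(9, 25) == 2651 and lstm_params(9, 20) == 2421
</preformat>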
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>The average training (dashed lines) and validation (solid lines) RMSE as a function of the number of hidden units (<inline-formula id="inf110">
<mml:math id="m122">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) for the Elman, GRU, and LSTM networks. The panels show errors, from left to right, when no dropout is applied (<inline-formula id="inf111">
<mml:math id="m123">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>), dropout <inline-formula id="inf112">
<mml:math id="m124">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, and dropout <inline-formula id="inf113">
<mml:math id="m125">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. Note that the dropout on the inputs is <inline-formula id="inf114">
<mml:math id="m126">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. Gray horizontal line marks the minimum validation RMSE.</p>
</caption>
<graphic xlink:href="fspas-08-664483-g002.tif"/>
</fig>
<p>For each CV split, we select the GRU and LSTM networks with the minimum validation RMSE, with and without dropout, run them on the corresponding test sets, and collect the five CV test sets into one set. We thereby get an estimate of the generalization performance for the whole 1995 to 2015 period. <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> shows scatterplots of predicted <italic>Dst</italic> vs. observed <italic>Dst</italic> on the test sets for different networks. <xref ref-type="table" rid="T3">Table&#x20;3</xref> summarizes the performance on the training, validation, and test sets. The 95% confidence intervals have been estimated both by assuming independent data points and by taking the autocorrelation into account (<xref ref-type="bibr" rid="B33">Zwiers and von Storch, 1995</xref>). It is clear that using dropout significantly improves the generalization capability. We also see that there is no significant difference between the GRU and LSTM networks. The bias (mean of errors) and the linear correlation coefficient are computed on the test set and shown in <xref ref-type="table" rid="T4">Table&#x20;4</xref>.</p>
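<p>One common way to account for autocorrelation in such interval estimates, following the AR(1) approximation of <xref ref-type="bibr" rid="B33">Zwiers and von Storch, 1995</xref>, is to replace the sample size by an effective sample size; an illustrative sketch of this idea:</p>
<preformat>
import numpy as np

def effective_sample_size(e):
    # AR(1) approximation: n_eff = n (1 - r1) / (1 + r1), where r1 is
    # the lag-1 autocorrelation of the error series e.
    e = e - e.mean()
    r1 = np.corrcoef(e[:-1], e[1:])[0, 1]
    return len(e) * (1.0 - r1) / (1.0 + r1)

# The standard error of the statistic then uses n_eff instead of n,
# widening the 95% interval when the errors are strongly autocorrelated.
</preformat>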
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Scatterplots of predicted vs. observed <italic>Dst</italic> based on the five CV test sets. The left panels show predictions without dropout using GRU and LSTM networks (gru-10 and lstm-10), while the right panels are predictions based on networks trained using dropout of <inline-formula id="inf115">
<mml:math id="m127">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0.5</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (gru-50 and lstm-40).</p>
</caption>
<graphic xlink:href="fspas-08-664483-g003.tif"/>
</fig>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Training, validation, and test RMSE (nT) for networks with and without dropout. The training and validation RMSE are averages over the five CV splits, while the test RMSE is computed from the combined five CV test sets. Networks with <inline-formula id="inf116">
<mml:math id="m128">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> have no dropout and the larger networks have dropout <inline-formula id="inf117">
<mml:math id="m129">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0.5</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The 95% RMSE confidence interval is approximately <inline-formula id="inf118">
<mml:math id="m130">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.03</mml:mn>
<mml:mtext>&#x2009;nT</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> assuming independent errors but increases to <inline-formula id="inf119">
<mml:math id="m131">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.17</mml:mn>
<mml:mtext>&#x2009;nT</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> if the autocorrelation is taken into account.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Net</th>
<th align="center">
<inline-formula id="inf120">
<mml:math id="m132">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">Train</th>
<th align="center">Val</th>
<th align="center">Test</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">GRU</td>
<td align="char" char=".">10</td>
<td align="char" char=".">7.21</td>
<td align="char" char=".">8.79</td>
<td align="char" char=".">9.24</td>
</tr>
<tr>
<td align="left">GRU</td>
<td align="char" char=".">50</td>
<td align="char" char=".">8.43</td>
<td align="char" char=".">8.67</td>
<td align="char" char=".">8.85</td>
</tr>
<tr>
<td align="left">LSTM</td>
<td align="char" char=".">10</td>
<td align="char" char=".">7.06</td>
<td align="char" char=".">8.84</td>
<td align="char" char=".">9.37</td>
</tr>
<tr>
<td align="left">LSTM</td>
<td align="char" char=".">40</td>
<td align="char" char=".">8.34</td>
<td align="char" char=".">8.77</td>
<td align="char" char=".">8.81</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>BIAS, RMSE, and CORR for the GRU and LSTM models on the test set. BIAS and RMSE are in units of nT. The 95% confidence intervals are <inline-formula id="inf121">
<mml:math id="m133">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.04</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula id="inf122">
<mml:math id="m134">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.2</mml:mn>
<mml:mtext>&#x2009;nT</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> for BIAS, <inline-formula id="inf123">
<mml:math id="m135">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.03</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula id="inf124">
<mml:math id="m136">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.17</mml:mn>
<mml:mtext>&#x2009;nT</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> for RMSE (same as <xref ref-type="table" rid="T3">Table&#xa0;3</xref>), and <inline-formula id="inf125">
<mml:math id="m137">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.001</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula id="inf126">
<mml:math id="m138">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.01</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> for CORR. The lower limits assume independence and the higher limits take into account the autocorrelations.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Gru-10</th>
<th align="center">Gru-50</th>
<th align="center">Lstm-10</th>
<th align="center">Lstm-40</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">BIAS</td>
<td align="char" char=".">&#x2212;0.41</td>
<td align="char" char=".">&#x2212;0.10</td>
<td align="char" char=".">&#x2212;0.59</td>
<td align="char" char=".">0.16</td>
</tr>
<tr>
<td align="left">RMSE</td>
<td align="char" char=".">9.24</td>
<td align="char" char=".">8.85</td>
<td align="char" char=".">9.37</td>
<td align="char" char=".">8.81</td>
</tr>
<tr>
<td align="left">CORR</td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">0.90</td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">0.90</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The performance of the networks varies with the level of <italic>Dst</italic>; the errors tend to increase with the magnitude of <italic>Dst</italic>. <xref ref-type="fig" rid="F4">Figure&#x20;4</xref> shows the RMSE binned by observed <italic>Dst</italic>. Down to <inline-formula id="inf127">
<mml:math id="m139">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>300</mml:mn>
<mml:mtext>&#x2009;nT</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>, the networks with dropout have the lowest RMSE. The bin at <inline-formula id="inf128">
<mml:math id="m140">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>400</mml:mn>
<mml:mtext>&#x2009;nT</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> has too few samples to be interpretable. The main reason the errors increase with the magnitude of <italic>Dst</italic> is that there are very few samples around the extremes; thus, the uncertainty of the function estimate will be large. In <xref ref-type="bibr" rid="B30">Wintoft et&#x20;al. (2017</xref>), this problem was addressed by using an ensemble of networks: the predictions from several networks with different weights were averaged. In this work, we study the use of dropout not only in the training phase but also in the prediction phase. The mechanism that temporarily cancels units at random during training can also be applied during prediction. This means that a practically unlimited number of weight combinations can be used to produce an arbitrary number of predictions at each time step. For the GRU network with <inline-formula id="inf129">
<mml:math id="m141">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>50,0.5</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, there are more than <inline-formula id="inf130">
<mml:math id="m142">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>28</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> possible combinations. There is a Bayesian interpretation of dropout as an estimation of model uncertainty (<xref ref-type="bibr" rid="B34">Gal and Ghahramani, 2016b</xref>). The idea is that the weights are random variables, leading to a distribution of predictions for fixed inputs. For each sample, a large number of predictions can be generated, each using a different random combination of network units; we generate 100 predictions per sample and compute their mean and standard deviation. <xref ref-type="fig" rid="F5">Figure&#x20;5</xref> shows two examples, the first a severe geomagnetic storm and the second a major storm. The mean predictions with dropout come close to the predictions without dropout. The prediction uncertainty is small during quiet times (<italic>Dst</italic> close to zero) and increases with storm magnitude. Again, this is a result of the greater uncertainty in the parameter estimates in regions that are poorly sampled.</p>
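<p>With Keras, dropout can be kept active during prediction by calling the model with <monospace>training=True</monospace>; a sketch of the 100-member ensemble described above:</p>
<preformat>
import numpy as np

def mc_dropout_predict(model, x, n_samples=100):
    # Each forward pass with training=True draws new dropout masks,
    # i.e., a different random subnetwork, giving a distribution of
    # predictions whose spread estimates the model uncertainty.
    preds = np.stack([model(x, training=True).numpy().squeeze()
                      for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)
</preformat>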
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>The test RMSE binned by observed <italic>Dst</italic>. The RMSE is computed on the five CV test sets. Bins are 100&#xa0;nT wide, and the numbers show the number of samples in each bin. Legend: GRU (gru-10) and LSTM (lstm-10) networks without dropout, and GRU (gru-50) and LSTM (lstm-40) networks with dropout <inline-formula id="inf131">
<mml:math id="m143">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</caption>
<graphic xlink:href="fspas-08-664483-g004.tif"/>
</fig>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Two geomagnetic storms predicted with the GRU network using the test set. Panels show observed <italic>Dst</italic> (blue solid), predicted <italic>Dst</italic> without dropout during prediction phase (dashed green), and mean prediction with dropout (orange solid). The dark orange regions show the predicted <inline-formula id="inf132">
<mml:math id="m144">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and the light orange regions the <inline-formula id="inf133">
<mml:math id="m145">
<mml:mrow>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. The dash-dotted curve is the quiet time variation. Note that the intensity of the storms is different and that the y-scales span different ranges.</p>
</caption>
<graphic xlink:href="fspas-08-664483-g005.tif"/>
</fig>
<p>As time of day and season are included in the inputs, the network can model diurnal and seasonal variations in <italic>Dst</italic>. These variations are not strong, and the left panel in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> shows <italic>Dst</italic> for all years averaged over month and UT hour. Running the GRU networks on the test data from the five CV sets reveals a very similar pattern (second panel from left). Thus, the network reproduces similar long-term statistics even though it is driven only by solar wind and time information. The two left panels contain contributions from all levels of <italic>Dst</italic>, from quiet to storm conditions. We may, however, simulate solar wind conditions that we define as quiet. The two right panels show predicted <italic>Dst</italic>, assuming solar wind flowing out from the Sun (GSEQ system) along the Parker spiral with a <inline-formula id="inf134">
<mml:math id="m146">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>45</mml:mn>
</mml:mrow>
<mml:mo>&#x2218;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> angle at L1, at two different speeds, 350&#xa0;km/s and 400&#xa0;km/s. In this configuration, <inline-formula id="inf135">
<mml:math id="m147">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> in the solar coordinate system, but <italic>via</italic> geometric effects (Sun&#x2019;s and Earth&#x2019;s tilts with respect to the ecliptic and Earth&#x2019;s dipole tilt), <inline-formula id="inf136">
<mml:math id="m148">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> will be nonzero in the GSM system, showing diurnal and seasonal variations (<xref ref-type="bibr" rid="B35">Lockwood et&#x20;al., 2020</xref>).</p>
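<p>A sketch of how such a quiet-time input series could be constructed (the field magnitude and density values below are illustrative placeholders, and the GSEQ-to-GSM rotation, which produces the diurnal and seasonal modulation, is not shown):</p>
<preformat>
import numpy as np

n_hours = 24 * 365                      # one year of hourly samples
psi = np.deg2rad(45.0)                  # Parker spiral angle at L1
b_mag = 5.0                             # assumed quiet-time |B| (nT)

# Field along the spiral in the solar equatorial plane, Bz = 0 in GSEQ.
bx = -b_mag * np.cos(psi) * np.ones(n_hours)
by = b_mag * np.sin(psi) * np.ones(n_hours)
bz = np.zeros(n_hours)

v = 350.0 * np.ones(n_hours)            # or 400 km/s
n = 5.0 * np.ones(n_hours)              # assumed density (cm^-3)

# A GSEQ-to-GSM rotation (Sun's and Earth's tilts plus the dipole
# tilt) would be applied here before feeding the network.
</preformat>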
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>The panels show average <italic>Dst</italic> binned by month and UT hour. From left to right, the averages are based on all observed <italic>Dst</italic> (Dst), predicted <italic>Dst</italic> from the CV test sets (Predicted), predicted <italic>Dst</italic> from quiet solar wind at 350&#xa0;km/s (Quiet 350), and predicted <italic>Dst</italic> from quiet solar wind at 400&#xa0;km/s (Quiet 400) (see text for definition of quiet solar wind).</p>
</caption>
<graphic xlink:href="fspas-08-664483-g006.tif"/>
</fig>
</sec>
</sec>
<sec id="s3">
<title>3 Discussion and Conclusion</title>
<p>There is a close correspondence between Elman networks and models expressed in terms of the differential equation for the <italic>Dst</italic> index. A minimalistic Elman network trained on simulated data from the pressure-corrected <italic>Dst</italic> index (<xref ref-type="disp-formula" rid="e1">Eq. 1</xref>) results in weights that translate to values around <inline-formula id="inf137">
<mml:math id="m149">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2.45</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf138">
<mml:math id="m150">
<mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>15</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, close to those used in <xref ref-type="disp-formula" rid="e2">Eqs 2</xref>,<xref ref-type="disp-formula" rid="e3">3</xref>. However, using solar wind data from the years 1995&#x2013;2015 and targeting simulated <italic>Dst</italic> from the AK1 model, we find that the RMSE for the Elman network basically levels out for a temporal history of <inline-formula id="inf139">
<mml:math id="m151">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi mathvariant="normal">&#x2273;</mml:mi>
<mml:mn>40</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hours. This is not the case for the GRU network, which has similar RMSE up to <inline-formula id="inf140">
<mml:math id="m152">
<mml:mrow>
<mml:mi mathvariant="normal">&#x2272;</mml:mi>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hours but continues to improve for <inline-formula id="inf141">
<mml:math id="m153">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> hours. We interpret this as an effect of the vanishing gradient problem (<xref ref-type="bibr" rid="B17">Bengio et&#x20;al., 1994</xref>) that is solved in the GRU and LSTM networks. It should be noted that the Elman network takes less time to train, and if the dynamics of the system can be captured in less than about 20&#x20;time steps, then the Elman network could be sufficient. In the future, it would be interesting to perform similar experiments for other solar-terrestrial variables, for example, other geomagnetic indices with different temporal dynamics. Another line of experimentation could be to separate processes with different dynamics in the construction of the&#x20;RNN.</p>
<p>The GRU (<xref ref-type="bibr" rid="B18">Cho et&#x20;al., 2014</xref>) and LSTM (<xref ref-type="bibr" rid="B19">Hochreiter and Schmidhuber, 1997</xref>) networks include gating units that control the information flow through time. However, it is not clear whether one architecture is better than the other (<xref ref-type="bibr" rid="B22">Chung et&#x20;al., 2014</xref>). In order to reliably study the differences between the two RNNs, we applied five-fold cross-validation. Further, it was also essential to apply dropout (<xref ref-type="bibr" rid="B27">Gal and Ghahramani, 2016a</xref>) to reduce over-fitting and achieve consistent results. Using solar wind data and observed <italic>Dst</italic> from 1995 to 2015, we see no significant difference between the two architectures. However, the GRU network is slightly less complex than the LSTM and will therefore have shorter training&#x20;times.</p>
<p>An interesting effect of using dropout is that it can also be applied during the prediction phase as a way of capturing model uncertainty (<xref ref-type="bibr" rid="B34">Gal and Ghahramani, 2016b</xref>). Using dropout during prediction is similar to ensemble prediction based on a collection of networks with identical architectures but different specific weights (<xref ref-type="bibr" rid="B30">Wintoft et&#x20;al., 2017</xref>), but with the great advantage that the predictions can be based on, in principle, an unlimited number of models. However, it is different from using an ensemble of different types of models, as in <xref ref-type="bibr" rid="B21">Xu et&#x20;al. (2020</xref>). We illustrated the prediction uncertainty using dropout for a couple of storms from the test set. Estimating the prediction uncertainty is important and was addressed by <xref ref-type="bibr" rid="B11">Gruet et&#x20;al. (2018</xref>) using a combination of an LSTM network and a Gaussian process (GP) model. In that case, the LSTM network provides the mean function to the GP model, from which a distribution of predictions can be made. For future work, it will be interesting to further study the use of dropout for estimating model uncertainty.</p>
<p>Predictions based on the test sets using the GRU networks show very good agreement with observed <italic>Dst</italic> when averaged over month and UT (<xref ref-type="fig" rid="F6">Figure&#x20;6</xref>). The semiannual variation (<xref ref-type="bibr" rid="B35">Lockwood et&#x20;al., 2020</xref>) is clear, with a deeper minimum in autumn than in spring and a weak UT variation. A combination of geometrical effects causes the asymmetric semiannual variation by modulating the <inline-formula id="inf142">
<mml:math id="m154">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> component in the GSM system, which, together with the nonlinear solar wind&#x2013;magnetosphere coupling, gives rise to the variation in <italic>Dst</italic>. The two rightmost panels in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> show predictions based on simulated data with <inline-formula id="inf143">
<mml:math id="m155">
<mml:mrow>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>z</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> in the GSEQ system using two different speeds. In these cases, the semiannual variation is caused only by geometrical effects, while the two panels to the left also contain storms caused by solar wind disturbances such as coronal mass ejections. We also see that the difference between the spring and autumn minima is about 6&#xa0;nT for both observed and predicted <italic>Dst</italic>, while the difference is about 14&#x2013;18&#xa0;nT for the quiet-time simulated <italic>Dst</italic>. In this work, we only showed that the semiannual variation is reproduced by the simulations, but in the future, other types of simulations that contain CME structures could be performed to provide further insight into the semiannual variations.</p>
</sec>
</body>
<back>
<sec id="s4">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found here: <ext-link ext-link-type="uri" xlink:href="https://omniweb.gsfc.nasa.gov/ow.html">https://omniweb.gsfc.nasa.gov/ow.html</ext-link>.</p>
</sec>
<sec id="s5">
<title>Author Contributions</title>
<p>PW and MW have carried out this work with main contribution from&#x20;PW.</p>
</sec>
<sec sec-type="COI-statement" id="s6">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>We acknowledge the use of NASA/GSFC&#x2019;s Space Physics Data Facility&#x2019;s OMNIWeb (or CDAWeb or ftp) service and OMNI&#x20;data.</p>
</ack>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.drivendata.org/competitions/73/noaa-magnetic-forecasting/page/278/">https://www.drivendata.org/competitions/73/noaa-magnetic-forecasting/page/278/</ext-link>
</p>
</fn>
<fn id="fn2">
<label>2</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/spacedr/dst_rnn">https://github.com/spacedr/dst_rnn</ext-link>
</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Abadi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Agarwal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Barham</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Brevdo</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Citro</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <source>TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems</source>. <comment>Software available from tensorflow.org</comment>.</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Simard</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Frasconi</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Learning Long-Term Dependencies with Gradient Descent Is Difficult</article-title>. <source>IEEE Trans. Neural Netw.</source> <volume>5</volume> <fpage>157</fpage>&#x2013;<lpage>166</lpage>. <pub-id pub-id-type="doi">10.1109/72.279181</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boaghe</surname>
<given-names>O. M.</given-names>
</name>
<name>
<surname>Balikhin</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Billings</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Alleyne</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Identification of Nonlinear Processes in the Magnetospheric Dynamics and Forecasting of Dst Index</article-title>. <source>J.&#x20;Geophys. Res.</source> <volume>106</volume>, <fpage>30047</fpage>&#x2013;<lpage>30066</lpage>. <pub-id pub-id-type="doi">10.1029/2000ja900162</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Borovsky</surname>
<given-names>J.&#x20;E.</given-names>
</name>
<name>
<surname>Birn</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>The Solar Wind Electric Field Does Not Control the Dayside Reconnection Rate</article-title>. <source>J.&#x20;Geophys. Res. Space Phys.</source> <volume>119</volume> <fpage>751</fpage>&#x2013;<lpage>760</lpage>. <pub-id pub-id-type="doi">10.1002/2013JA019193</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boynton</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Balikhin</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Billings</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>A. S.</given-names>
</name>
<name>
<surname>Amariutei</surname>
<given-names>O. A.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Data Derived Narmax Dst Model</article-title>. <source>Ann. Geophys.</source> <volume>29</volume> <fpage>965</fpage>&#x2013;<lpage>971</lpage>. <pub-id pub-id-type="doi">10.5194/angeo-29-965-2011</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burton</surname>
<given-names>R. K.</given-names>
</name>
<name>
<surname>McPherron</surname>
<given-names>R. L.</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>C. T.</given-names>
</name>
</person-group> (<year>1975</year>). <article-title>An Empirical Relationship between Interplanetary Conditions and Dst</article-title>. <source>J.&#x20;Geophys. Res.</source> <volume>80</volume> <fpage>4204</fpage>&#x2013;<lpage>4214</lpage>. <pub-id pub-id-type="doi">10.1029/ja080i031p04204</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Cho</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>van Merri&#xeb;nboer</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Gulcehre</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bahdanau</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bougares</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Schwenk</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). &#x201c;<article-title>Learning Phrase Representations Using RNN Encoder&#x2013;Decoder for Statistical Machine Translation</article-title>,&#x201d;in <conf-name>Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)</conf-name> (<publisher-loc>Doha, Qatar</publisher-loc>: <publisher-name>Association for Computational Linguistics)</publisher-name> <fpage>1724</fpage>&#x2013;<lpage>1734</lpage>. <pub-id pub-id-type="doi">10.3115/v1/D14-1179</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chung</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gulcehre</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Cho</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling</article-title>. <source>NIPS 2014 Workshop on Deep Learning</source>, <comment>December 2014</comment>. </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cybenko</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>1989</year>). <article-title>Approximation by Superposition of a Sigmoidal Function</article-title>. <source>Math. Control Signals, Syst.</source> <volume>2</volume> <fpage>303</fpage>&#x2013;<lpage>314</lpage>. <pub-id pub-id-type="doi">10.1007/bf02551274</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elman</surname>
<given-names>J.&#x20;L.</given-names>
</name>
</person-group> (<year>1990</year>). <article-title>Finding Structure in Time</article-title>. <source>Cogn. Sci.</source> <volume>14</volume> <fpage>179</fpage>&#x2013;<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1207/s15516709cog1402_1</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gal</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ghahramani</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2016a</year>). &#x201c;<article-title>A Theoretically Grounded Application of Dropout in Recurrent Neural Networks</article-title>,&#x201d; in <conf-name>30th Conference on Neural Information Processing Systems</conf-name> (<publisher-loc>Barcelona, Spain</publisher-loc>: <publisher-name>NIPS</publisher-name>). </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gal</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ghahramani</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2016b</year>). <article-title>Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning</article-title>. <conf-name>Proceedings of the 33rd International Conference on Machine Learning</conf-name>, <conf-loc>New York, NY, United States</conf-loc>, <conf-date>2016</conf-date>, (<publisher-name>JMLR: W&#x26;CP</publisher-name>), <volume>48</volume>. </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gleisner</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Lundstedt</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wintoft</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1996</year>). <article-title>Predicting Geomagnetic Storms from Solar-Wind Data Using Time-Delay Neural Networks</article-title>. <source>Ann. Geophys.</source> <volume>14</volume> <fpage>679</fpage>&#x2013;<lpage>686</lpage>. <pub-id pub-id-type="doi">10.1007/s00585-996-0679-1</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Courville</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Deep Learning</source> (<publisher-name>MIT Press</publisher-name>).</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gruet</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Chandorkar</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sicard</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Camporeale</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Multiple-hour-ahead Forecast of the Dst Index Using a Combination of Long Short-Term Memory Neural Network and Gaussian Process</article-title>. <source>Space Weather</source> <volume>16</volume>, <fpage>1882</fpage>, <lpage>1896</lpage>. <pub-id pub-id-type="doi">10.1029/2018SW001898</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hochreiter</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schmidhuber</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Long Short-Term Memory</article-title>. <source>Neural Comput.</source> <volume>9</volume> <fpage>1735</fpage>&#x2013;<lpage>1780</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hunter</surname>
<given-names>J.&#x20;D.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Matplotlib: A 2d Graphics Environment</article-title>. <source>Comput. Sci. Eng.</source> <volume>9</volume> <fpage>90</fpage>&#x2013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1109/MCSE.2007.55</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>King</surname>
<given-names>J.&#x20;H.</given-names>
</name>
<name>
<surname>Papitashvili</surname>
<given-names>N. E.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Solar Wind Spatial Scales in and Comparisons of Hourly Wind and Ace Plasma and Magnetic Field Data</article-title>. <source>J.&#x20;Geophys. Res.</source> <volume>110</volume>, <fpage>A02104</fpage>. <pub-id pub-id-type="doi">10.1029/2004JA010649</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kingma</surname>
<given-names>D. P.</given-names>
</name>
<name>
<surname>Lei</surname>
<given-names>Ba. J.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Adam: A Method for Stochastic Optimization</article-title>,&#x201d; in <conf-name>The 3rd International Conference on Learning Representations (ICLR)</conf-name>, <comment>arXiv:1412.6980</comment>. </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lockwood</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Owens</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Barnard</surname>
<given-names>L. A.</given-names>
</name>
<name>
<surname>Haines</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Scott</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>McWilliams</surname>
<given-names>K. A.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Semi-annual, Annual and Universal Time Variations in the Magnetosphere and in Geomagnetic Activity: 1. Geomagnetic Data</article-title>. <source>J.&#x20;Space Weather Space Clim.</source> <volume>10</volume>. <pub-id pub-id-type="doi">10.1051/swsc/2020023</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lundstedt</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Gleisner</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wintoft</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Operational Forecasts of the Geomagnetic Dst Index</article-title>. <source>Geophys. Res. Lett.</source> <volume>29</volume>, <fpage>34</fpage>. <pub-id pub-id-type="doi">10.1029/2002GL016151</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lundstedt</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wintoft</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Prediction of Geomagnetic Storms from Solar Wind Data with the Use of a Neural Network</article-title>. <source>Ann. Geophys.</source> <volume>12</volume>, <fpage>19</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1007/s00585-994-0019-2</pub-id> </citation>
</ref>
<ref id="B1">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mayaud</surname>
<given-names>P. N.</given-names>
</name>
</person-group> (<year>1980</year>). <article-title>Derivation, Meaning, and Use of Geomagnetic Indices</article-title>, <source>Geophysical Monograph</source>. <volume>22</volume> (<publisher-name>American Geophysical Union</publisher-name>). </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>O&#x2019;Brien</surname>
<given-names>T. P.</given-names>
</name>
<name>
<surname>McPherron</surname>
<given-names>R. L.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Forecasting the Ring Current Dst in Real Time</article-title>. <source>J.&#x20;Atmos. Solar-Terrestrial Phys.</source> <volume>62</volume> <fpage>1295</fpage>&#x2013;<lpage>1299</lpage>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>O&#x2019;Brien</surname>
<given-names>T. P.</given-names>
</name>
<name>
<surname>McPherron</surname>
<given-names>R. L.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Seasonal and Diurnal Variation of Dst Dynamics</article-title>. <source>J.&#x20;Geophys. Res.</source> <volume>107</volume>, <fpage>1341</fpage>. <pub-id pub-id-type="doi">10.1029/2002JA009435</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pallocchia</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Amata</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Consolini</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Marcucci</surname>
<given-names>M. F.</given-names>
</name>
<name>
<surname>Bertello</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Geomagnetic Dst Index Forecast Based on IMF Data Only</article-title>. <source>Ann. Geophys.</source> <volume>24</volume>, <fpage>989</fpage>&#x2013;<lpage>999</lpage>. <pub-id pub-id-type="doi">10.5194/angeo-24-989-2006</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Srivastava</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Krizhevsky</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Salakhutdinov</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Dropout: a Simple Way to Prevent Neural Networks from Overfitting</article-title>. <source>J.&#x20;Machine Learn. Res.</source> <volume>15</volume>, <fpage>1929</fpage>&#x2013;<lpage>1958</lpage>. </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Geomagnetic Index Kp Forecasting with LSTM</article-title>. <source>Space Weather</source> <volume>16</volume>, <fpage>406</fpage>&#x2013;<lpage>416</lpage>. <pub-id pub-id-type="doi">10.1002/2017SW001764</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Temerin</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Dst Model for 1995&#x2013;2002</article-title>. <source>J.&#x20;Geophys. Res.</source> <volume>111</volume>, <fpage>A04221</fpage>. <pub-id pub-id-type="doi">10.1029/2005JA011257</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<collab>The pandas development team</collab> (<year>2020</year>). <article-title>pandas-dev/pandas: Pandas</article-title>. <pub-id pub-id-type="doi">10.5281/zenodo.3509134</pub-id>
<comment>[Dataset]</comment> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vassiliadis</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Klimas</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Valdivia</surname>
<given-names>J.&#x20;A.</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>D. N.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>The Dst Geomagnetic Response as Function of Storm Phase and Amplitude and the Solar Wind Electric Field</article-title>. <source>J.&#x20;Geophys. Res.</source> <volume>104</volume>, <fpage>957</fpage>&#x2013;<lpage>976</lpage>. <pub-id pub-id-type="doi">10.1029/1999ja900185</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Watanabe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sagawa</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Ohtaka</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Shimazu</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Prediction of the Dst Index from Solar Wind Parameters by a Neural Network Method</article-title>. <source>Earth Planets Space</source> <volume>54</volume>, <fpage>1263</fpage>&#x2013;<lpage>1275</lpage>. <pub-id pub-id-type="doi">10.1186/bf03352454</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wintoft</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Wik</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Evaluation of Kp and Dst Predictions Using ACE and DSCOVR Solar Wind Data</article-title>. <source>Space Weather</source> <volume>16</volume>, <fpage>1972</fpage>&#x2013;<lpage>1983</lpage>. <pub-id pub-id-type="doi">10.1029/2018SW001994</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wintoft</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Wik</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Matzka</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shprits</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Forecasting Kp from Solar Wind Data: Input Parameter Study Using 3-hour Averages and 3-hour Range Values</article-title>. <source>J.&#x20;Space Weather Space Clim.</source> <volume>7</volume>, <fpage>A29</fpage>. <pub-id pub-id-type="doi">10.1051/swsc/2017027</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>J.&#x20;G.</given-names>
</name>
<name>
<surname>Lundstedt</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Neural Network Modeling of Solar Wind-Magnetosphere Interaction</article-title>. <source>J.&#x20;Geophys. Res.</source> <volume>102</volume>, <fpage>14457</fpage>&#x2013;<lpage>14466</lpage>. <pub-id pub-id-type="doi">10.1029/97ja01081</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>S. B.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>S. Y.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>Z. G.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>X. H.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Prediction of the Dst Index with Bagging Ensemble-Learning Algorithm</article-title>. <source>Astrophysical J.&#x20;Suppl. Ser.</source> <volume>248</volume>. <pub-id pub-id-type="doi">10.3847/1538-4365/ab880e</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zwiers</surname>
<given-names>F. W.</given-names>
</name>
<name>
<surname>von Storch</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Taking Serial Correlation into Account in Tests of the Mean</article-title>. <source>J.&#x20;Clim.</source> <volume>8</volume>, <fpage>336</fpage>&#x2013;<lpage>351</lpage>. <pub-id pub-id-type="doi">10.1175/1520-0442(1995)008&#x3c;0336:tsciai&#x3e;2.0.co;2</pub-id> </citation>
</ref>
</ref-list>
<app-group>
<app id="app1">
<title>Appendix: Software and hardware</title>
<p>The code has been written in Python, where we rely on several software packages: Pandas for data analysis (<xref ref-type="bibr" rid="B36">The pandas development team, 2020</xref>); Matplotlib for plotting (<xref ref-type="bibr" rid="B37">Hunter, 2007</xref>); and TensorFlow and TensorBoard for RNN training (<xref ref-type="bibr" rid="B25">Abadi et&#x20;al., 2015</xref>).</p>
<p>The simulations have been run on an Intel Core i9-7960X CPU at 4.2&#xa0;GHz with 64&#xa0;GB of memory, on which up to 32 threads can run in parallel. Typical training time for one Elman network with 30 hidden units over 50 epochs ranged between 5 and 15&#xa0;min, where the shorter times occurred when the process could be distributed over multiple threads; we noted that one training process could spread over four threads when the overall load was low. A GRU network with 10 hidden units took between 30&#xa0;min and slightly more than 1&#xa0;h for 50&#xa0;epochs, and a 10-hidden-unit LSTM network ranged between 50&#xa0;min and 1.5&#xa0;h.</p>
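<p>For illustration, a minimal sketch of how the three architectures can be set up with the TensorFlow Keras API is given below, using the hidden-unit and epoch counts quoted above. This is not the code used in the study: the number of input features, the dropout rate (<xref ref-type="bibr" rid="B26">Srivastava et&#x20;al., 2014</xref>), the default settings of the Adam optimizer (<xref ref-type="bibr" rid="B32">Kingma and Ba, 2015</xref>), and the placeholder training arrays are illustrative assumptions only.</p>
<preformat>
import tensorflow as tf

def build_model(cell, units, n_features=5):
    """Single-layer RNN regressor; cell is 'elman', 'gru', or 'lstm'."""
    layer = {"elman": tf.keras.layers.SimpleRNN,  # the Elman network
             "gru": tf.keras.layers.GRU,
             "lstm": tf.keras.layers.LSTM}[cell]
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, n_features)),  # sequences of hourly inputs
        layer(units, dropout=0.2),   # illustrative dropout rate
        tf.keras.layers.Dense(1),    # scalar geomagnetic-index output
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
    return model

# Hidden-unit counts as quoted above; TensorBoard can log the training runs.
for name, units in [("elman", 30), ("gru", 10), ("lstm", 10)]:
    model = build_model(name, units)
    # model.fit(x_train, y_train, epochs=50,
    #           callbacks=[tf.keras.callbacks.TensorBoard(log_dir="logs/" + name)])
</preformat>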
</app>
</app-group>
</back>
</article>