<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Robot. AI</journal-id>
<journal-title>Frontiers in Robotics and AI</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Robot. AI</abbrev-journal-title>
<issn pub-type="epub">2296-9144</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">878246</article-id>
<article-id pub-id-type="doi">10.3389/frobt.2022.878246</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Robotics and AI</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Estimating spatio-temporal fields through reinforcement learning</article-title>
<alt-title alt-title-type="left-running-head">Padrao et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/frobt.2022.878246">10.3389/frobt.2022.878246</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Padrao</surname>
<given-names>Paulo</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fuentes</surname>
<given-names>Jose</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Bobadilla</surname>
<given-names>Leonardo</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1274238/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Smith</surname>
<given-names>Ryan N.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Knight Foundation School of Computing and Information Sciences</institution>, <institution>Florida International University</institution>, <addr-line>Miami</addr-line>, <addr-line>FL</addr-line>, <country>United States</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Institute for Environment</institution>, <institution>Florida International University</institution>, <addr-line>Miami</addr-line>, <addr-line>FL</addr-line>, <country>United States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/236114/overview">Kostas Alexis</ext-link>, Norwegian University of Science and Technology, Norway</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/256318/overview">Elias B. Kosmatopoulos</ext-link>, Democritus University of Thrace, Greece</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1689075/overview">Caoyang Yu</ext-link>, Shanghai Jiao Tong University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Leonardo Bobadilla, <email>bobadilla@cs.fiu.edu</email>
</corresp>
<fn fn-type="equal" id="fn1">
<label>
<sup>&#x2020;</sup>
</label>
<p>These authors have contributed equally to this work and share first authorship</p>
</fn>
<fn fn-type="other">
<p>This article was submitted to Field Robotics, a section of the journal Frontiers in Robotics and AI</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>05</day>
<month>09</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>9</volume>
<elocation-id>878246</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>02</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>01</day>
<month>08</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Padrao, Fuentes, Bobadilla and Smith.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Padrao, Fuentes, Bobadilla and Smith</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Prediction and estimation of phenomena of interest in aquatic environments are challenging since they present complex spatio-temporal dynamics. Over the past few decades, advances in machine learning and data processing have contributed to ocean exploration and sampling using autonomous robots. In this work, we formulate a reinforcement learning framework to estimate spatio-temporal fields modeled by partial differential equations. The proposed framework addresses problems of classic methods regarding the sampling process, namely determining the path that the agent follows to collect samples. Simulation results demonstrate the applicability of our approach and show that the error at the end of the learning process is close to the expected error given by the fitting process due to added noise.</p>
</abstract>
<kwd-group>
<kwd>spatio-temporal fields</kwd>
<kwd>reinforcement learning</kwd>
<kwd>partial differential equations</kwd>
<kwd>autonomous navigation</kwd>
<kwd>environmental monitoring</kwd>
</kwd-group>
<contract-num rid="cn001">IIS-2034123 IIS-2024733</contract-num>
<contract-num rid="cn002">2017-ST-062000002</contract-num>
<contract-sponsor id="cn001">National Science Foundation<named-content content-type="fundref-id">10.13039/100000001</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">U.S. Department of Homeland Security<named-content content-type="fundref-id">10.13039/100000180</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1 Introduction</title>
<p>The use of autonomous underwater and surface vehicles (AUVs and ASVs) for persistent surveillance in coastal and estuarine environments has been a topic of increasing interest. Examples of studies enabled by these vehicles include the dynamics of physical phenomena, such as ocean fronts, temperature, the onset of harmful algae blooms, salinity profiles, monitoring of seagrass and coral reefs, and fish ecology.</p>
<p>Due to the stochastic nature of these vital environments and the large spatial and temporal scales of significant processes and phenomena, sampling with traditional modalities (e.g., manned boats, buoys) is sparse, and predictive models are necessary to augment decision-making to ensure that robotic assets are at the right place at the right time for sampling. However, no single model provides an informed view or representation of these or any other ocean feature that enables intelligent sampling in a principled manner. Therefore, it is critical to forecast where a robot should sample in the immediate future so that sufficient information is provided on getting to the desired location within a dynamic environment.</p>
<p>Our ideas are inspired by underwater vehicles commonly used in environmental and infrastructure monitoring problems, such as the AUV Ecomapper shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. This vehicle can measure water quality parameters, currents, and bathymetric information. However, its mission endurance is limited to a few hours due to battery constraints; therefore, efficient sampling strategies are needed.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>YSI-Ecomapper autonomous underwater vehicle.</p>
</caption>
<graphic xlink:href="frobt-09-878246-g001.tif"/>
</fig>
<p>The contributions of this paper are the following:<list list-type="simple">
<list-item>
<p>1) A novel framework combining classic methods with reinforcement learning to estimate ocean features, which are modeled as spatio-temporal fields.</p>
</list-item>
<list-item>
<p>2) A technique to get a set of informative samples to estimate spatio-temporal fields, which an agent can collect and process.</p>
</list-item>
<list-item>
<p>3) An extension to the classical partial differential equations fitting methods to estimate models incorporating reinforcement learning.</p>
</list-item>
</list>
</p>
<p>This paper is an expansion of our preliminary work in <xref ref-type="bibr" rid="B32">Padrao et al. (2022)</xref> and extends it to include the estimation of ocean features using partial differential equations. The rest of the paper is organized as follows. In <xref ref-type="sec" rid="s2">Section 2</xref>, we review work related to our approach. <xref ref-type="sec" rid="s3">Section 3</xref> gives the preliminaries needed to build our method and formulates our problem. <xref ref-type="sec" rid="s4">Section 4</xref> presents the reinforcement learning methods used to solve our problem, and the results are presented in <xref ref-type="sec" rid="s5">Section 5</xref>. Finally, <xref ref-type="sec" rid="s6">Section 6</xref> concludes our paper and gives directions for future work.</p>
</sec>
<sec id="s2">
<title>2 Related work</title>
<sec id="s2-1">
<title>2.1 Oceanic monitoring sampling</title>
<p>Over the last decade, it has become clear that autonomous marine vehicles will revolutionize ocean sampling. Several researchers have investigated approaches for ASVs and AUVs for adaptive ocean sampling <xref ref-type="bibr" rid="B59">Yuh (2000)</xref>-<xref ref-type="bibr" rid="B48">Smith et al. (2010c)</xref> and fundamental marine sampling techniques for ASVs and AUVs are discussed in <xref ref-type="bibr" rid="B44">Singh et al. (1997)</xref>. Besides control algorithms for Oceanic Sampling, an alternative approach is to use static sensor placements to maximize information gathering <xref ref-type="bibr" rid="B60">Zhang and Sukhatme (2008)</xref>.</p>
</sec>
<sec id="s2-2">
<title>2.2 Adaptive sampling with marine vehicles</title>
<p>Our work also connects with research on control design for AUVs for adaptive ocean sampling <xref ref-type="bibr" rid="B57">Yoerger and Slotine (1985)</xref>; <xref ref-type="bibr" rid="B16">Frazzoli et al. (2002)</xref>; <xref ref-type="bibr" rid="B25">Low et al. (2009)</xref>; <xref ref-type="bibr" rid="B38">Rudnick and Perry (2003)</xref>; <xref ref-type="bibr" rid="B59">Yuh (2000)</xref>; <xref ref-type="bibr" rid="B15">Frank and J&#xf3;nsson (2003)</xref>; <xref ref-type="bibr" rid="B17">Graver (2005)</xref>; <xref ref-type="bibr" rid="B3">Barnett et al. (1996)</xref>; <xref ref-type="bibr" rid="B7">Carreras et al. (2000)</xref>; <xref ref-type="bibr" rid="B36">Ridao et al. (2000)</xref>; <xref ref-type="bibr" rid="B37">Rosenblatt et al. (2002)</xref>; <xref ref-type="bibr" rid="B52">Turner and Stevenson (1991)</xref>; <xref ref-type="bibr" rid="B53">Whitcomb et al. (1999</xref>, <xref ref-type="bibr" rid="B54">1998)</xref>; <xref ref-type="bibr" rid="B28">McGann et al. (2008b)</xref>, <xref ref-type="bibr" rid="B27">McGann et al. (2008a)</xref>, <xref ref-type="bibr" rid="B29">McGann et al. (2008c)</xref>, <xref ref-type="bibr" rid="B57">Yoerger and Slotine (1985)</xref>-<xref ref-type="bibr" rid="B29">McGann et al. (2008c)</xref>. Applications of ocean sampling techniques for autonomous vehicles are discussed in <xref ref-type="bibr" rid="B44">Singh et al. (1997)</xref>-<xref ref-type="bibr" rid="B12">Eriksen et al. (2001)</xref>. This body of research differs from the proposed research in that we plan to utilize predictive models in the form of partial differential equations (PDEs) to enable effective sampling, navigation, and localization within dynamic features.</p>
</sec>
<sec id="s2-3">
<title>2.3 Reinforcement learning in marine robotics</title>
<p>Reinforcement learning (RL) in marine robotics, especially model-free methods, is an attractive alternative for finding plans for several reasons. First, executing marine robotics experiments and deployments is expensive, time-consuming, and often risky; controllers learned through RL can represent significant time and cost savings and shorten the time to deployment. Second, system identification can be challenging in marine environments due to factors such as unmodeled dynamics and environmental unknowns; model-free RL approaches can therefore be an alternative in these scenarios. Examples of approaches that have used RL for ASVs or AUVs include path planning <xref ref-type="bibr" rid="B58">Yoo and Kim (2016)</xref>, control <xref ref-type="bibr" rid="B10">Cui et al. (2017)</xref> and tracking <xref ref-type="bibr" rid="B26">Martinsen et al. (2020)</xref>.</p>
</sec>
<sec id="s2-4">
<title>2.4 Machine learning for partial differential equations</title>
<p>Our ideas are also connected to the use of Machine Learning models in the context of Partial Differential Equations. Due to their usefulness and impact in several domains, there have been efforts to use modern machine learning techniques to solve high dimensional PDEs <xref ref-type="bibr" rid="B19">Han et al. (2018)</xref>, find appropriate discretizations <xref ref-type="bibr" rid="B19">Han et al. (2018)</xref>, and control them <xref ref-type="bibr" rid="B14">Farahmand et al. (2017)</xref>.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Preliminaries and problem formulation</title>
<sec id="s3-1">
<title>3.1 Partial differential equations</title>
<p>Partial differential equations (PDEs) have been used to model water features of interest such as pH, temperature, turbidity, salinity, and chlorophyll-A. Depending on the nature of their motion, they can be modeled through diffusion, advection or a combination of both. It is important to evaluate how they behave given certain initial conditions to understand their evolution in time. We model the ocean features of interest as a scalar field <inline-formula id="inf1">
<mml:math id="m1">
<mml:mi>f</mml:mi>
<mml:mo>:</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2192;</mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:math>
</inline-formula>.</p>
<sec id="s3-1-1">
<title>3.1.1 Advection equation</title>
<p>The advection equation models how a given ocean feature (e.g., algae bloom, oil spill, chemical contaminants, etc.) is transported by a given flow which goes in the direction of <inline-formula id="inf2">
<mml:math id="m2">
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>; it is also called the <italic>transport equation</italic>. The initial-value problem<disp-formula id="e1">
<mml:math id="m3">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>&#x2207;</mml:mo>
<mml:mi>f</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.3333em"/>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#xd7;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>h</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.3333em"/>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(1)</label>
</disp-formula>has the solution shown in <xref ref-type="bibr" rid="B13">Evans (1998)</xref>.<disp-formula id="e2">
<mml:math id="m4">
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>h</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo>&#x222b;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>g</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi>d</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo>&#xfe38;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mtext>By&#x2009;the&#x2009;Duhamel</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mtext>s&#x2009;principle</mml:mtext>
</mml:mrow>
</mml:munder>
<mml:mo>.</mml:mo>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p><xref ref-type="disp-formula" rid="e2">Eq. 2</xref> holds provided that <inline-formula id="inf3">
<mml:math id="m5">
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and has compact support for each <inline-formula id="inf41">
<mml:math id="m70">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>&#x221E;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The function <italic>g</italic> models whether there are sources or sinks of the ocean feature in the domain. If <italic>g</italic> (<bold>x</bold>, <italic>t</italic>) is positive, we consider that point an ocean feature source; if it is negative, we consider it an ocean feature sink. On the other hand, <italic>h</italic>(<bold>x</bold>) is the initial distribution of the ocean feature.</p>
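<p>As an illustration only, the following minimal Python sketch evaluates <xref ref-type="disp-formula" rid="e2">Eq. 2</xref> at a single point. The names <monospace>advection_solution</monospace>, <monospace>gaussian_blob</monospace>, <monospace>no_source</monospace>, and <monospace>unit_source</monospace>, the Gaussian initial distribution, and the trapezoidal approximation of the integral term are assumptions made for this example and are not part of the method described here.</p>
<preformat preformat-type="code">
```python
import math

def advection_solution(h, g, b, x, t, steps=200):
    """Evaluate Eq. 2: f(x, t) = h(x - t b) + integral_0^t g(x + (s - t) b, s) ds.

    The integral is approximated with the composite trapezoidal rule.
    """
    bx, by = b
    px, py = x
    transported = h((px - t * bx, py - t * by))
    if t == 0:
        return transported
    ds = t / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * ds
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * g((px + (s - t) * bx, py + (s - t) * by), s)
    return transported + total * ds

def gaussian_blob(p):
    """Hypothetical initial distribution h: a unit Gaussian bump at the origin."""
    return math.exp(-(p[0] ** 2 + p[1] ** 2))

def no_source(p, s):
    """Homogeneous case: g is identically zero."""
    return 0.0

def unit_source(p, s):
    """Hypothetical constant source g = 1 (not compactly supported; used only as a check)."""
    return 1.0

b = (1.0, 0.5)
# With g = 0 the blob is translated along b, so its peak sits at x = t b.
peak_value = advection_solution(gaussian_blob, no_source, b, (2.0, 1.0), 2.0)
# With g = 1 the integral term contributes t on top of the transported value.
with_source = advection_solution(gaussian_blob, unit_source, b, (2.0, 1.0), 2.0)
```
</preformat>
<p>In the homogeneous case the initial distribution is simply shifted by <italic>t</italic><bold>b</bold>, which is the property exploited later when the fitting error is defined.</p>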
</sec>
</sec>
<sec id="s3-2">
<title>3.2 Estimation of the parameters of a PDE</title>
<p>Once a PDE has been chosen as a model, it is crucial to estimate its parameters to get a reliable model. This problem belongs to the family of inverse problems since those parameters are sensitive to the observations and the given initial conditions <xref ref-type="bibr" rid="B35">Richard et al. (2019)</xref>; <xref ref-type="bibr" rid="B1">Antman et al. (2006)</xref>. Because of this sensitivity, it is computationally expensive to find the PDE parameters. There are optimization-based techniques to solve this problem. These techniques balance the fit of the parameters to the observations against the sensitivity of the model to those parameters. One of the most used methods is the Tikhonov regularization technique <xref ref-type="bibr" rid="B30">Nair and Roy (2020)</xref>; <xref ref-type="bibr" rid="B6">Bourgeois and Recoquillay (2018)</xref>. It consists of solving a regularized optimization problem to get a regularized solution. It can be highly efficient depending on the regularization norm (especially if the <italic>L</italic>
<sup>2</sup> norm is used). However, it depends on the regularization constant to achieve good results.</p>
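<p>To make the role of the regularization constant concrete, the sketch below applies <italic>L</italic><sup>2</sup> Tikhonov regularization to a small linear least-squares problem. This is a toy linear example under our own assumptions (the function <monospace>tikhonov_fit</monospace>, the two-parameter model, and the synthetic data are hypothetical); PDE parameter estimation is in general nonlinear and is not solved this way here.</p>
<preformat preformat-type="code">
```python
def tikhonov_fit(rows, targets, lam):
    """Solve min_x ||A x - y||^2 + lam * ||x||^2 for a two-parameter model.

    rows    -- list of (a1, a2) rows of the design matrix A
    targets -- list of observations y
    lam     -- regularization constant (lam = 0 gives ordinary least squares)
    """
    # Accumulate the regularized normal equations (A^T A + lam I) x = A^T y.
    m11 = sum(a1 * a1 for a1, _ in rows) + lam
    m12 = sum(a1 * a2 for a1, a2 in rows)
    m22 = sum(a2 * a2 for _, a2 in rows) + lam
    r1 = sum(a1 * y for (a1, _), y in zip(rows, targets))
    r2 = sum(a2 * y for (_, a2), y in zip(rows, targets))
    det = m11 * m22 - m12 * m12
    # Cramer's rule for the symmetric 2x2 system.
    return ((r1 * m22 - m12 * r2) / det, (m11 * r2 - m12 * r1) / det)

# Hypothetical data generated from y = 2 * u + 1 without noise.
rows = [(u, 1.0) for u in (0.0, 1.0, 2.0, 3.0)]
targets = [2.0 * u + 1.0 for u, _ in rows]

unregularized = tikhonov_fit(rows, targets, 0.0)
regularized = tikhonov_fit(rows, targets, 10.0)
```
</preformat>
<p>With a zero constant the generating coefficients are recovered, whereas a large constant shrinks the estimate toward zero; this is the dependence on the regularization constant mentioned above.</p>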
<p>Other approaches to solving the PDE estimation problem take advantage of Bayesian theory <xref ref-type="bibr" rid="B56">Xun et al. (2013)</xref>. In this case, Bayesian learning is connected to regularization since the regularization problem coincides with the maximization of the likelihood of the parameters given the observations <xref ref-type="bibr" rid="B5">Bishop (1995)</xref>. Therefore, machine learning techniques have been proposed to take advantage of the capability of the models to discover hidden relationships between the input data and the final estimation <xref ref-type="bibr" rid="B21">Jamili and Dua (2021)</xref>. Most of those models assume that the samples are given in advance. This work proposes a learning mechanism to select samples that can reasonably estimate the model without exploring the complete domain. This principle has been used in numerical integration problems, resulting in several quadrature rules, such as Gauss&#x2013;Kronrod, Gauss&#x2013;Legendre, or Newton&#x2013;Cotes <xref ref-type="bibr" rid="B24">Kincaid et al. (2009)</xref>. Those methods have proven to be more efficient since they can give reliable estimations using few points. In this work, we employ an intelligent agent capable of sampling the environment, searching for reliable samples, and using them to compute the parameters of a PDE. This also allows estimating the ocean feature behavior in the domain according to <xref ref-type="disp-formula" rid="e2">Eq. 2</xref>.</p>
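<p>The quadrature analogy can be checked numerically. The following sketch, given only as an illustration, compares the three-point Gauss&#x2013;Legendre rule with a three-point composite midpoint rule on a hypothetical quintic integrand; the function names and the integrand are our own choices for this example.</p>
<preformat preformat-type="code">
```python
import math

# Nodes and weights of the three-point Gauss-Legendre rule on [-1, 1].
GL3_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
GL3_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)

def gauss_legendre_3(func, a, b):
    """Approximate the integral of func over [a, b] with three evaluations."""
    half = 0.5 * (b - a)
    mid = 0.5 * (a + b)
    return half * sum(w * func(mid + half * z) for z, w in zip(GL3_NODES, GL3_WEIGHTS))

def midpoint_rule(func, a, b, n):
    """Composite midpoint rule with n equally spaced evaluations, for comparison."""
    width = (b - a) / n
    return width * sum(func(a + (k + 0.5) * width) for k in range(n))

def quintic(u):
    """Hypothetical integrand: a polynomial of degree five."""
    return u ** 5 + u ** 2

exact = 2.0 ** 6 / 6.0 + 2.0 ** 3 / 3.0  # integral of u^5 + u^2 over [0, 2]
gauss_error = abs(gauss_legendre_3(quintic, 0.0, 2.0) - exact)
midpoint_error = abs(midpoint_rule(quintic, 0.0, 2.0, 3) - exact)
```
</preformat>
<p>Both rules use three function evaluations, but the well-placed Gauss&#x2013;Legendre nodes integrate polynomials up to degree five exactly, while the equally spaced nodes leave a visible error. Choosing where to sample matters as much as how many samples are taken.</p>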
</sec>
<sec id="s3-3">
<title>3.3 Model definition</title>
<p>We model the marine environment as a 2-D water layer (representing, for example, the surface) denoted as <inline-formula id="inf4">
<mml:math id="m6">
<mml:mi mathvariant="script">W</mml:mi>
<mml:mo>&#x2282;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> where <inline-formula id="inf5">
<mml:math id="m7">
<mml:mi mathvariant="script">W</mml:mi>
</mml:math>
</inline-formula> is an open and bounded set. The obstacle-free state space for our robot is represented by <inline-formula id="inf6">
<mml:math id="m8">
<mml:mi mathvariant="script">S</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">W</mml:mi>
<mml:mo>&#x5c;</mml:mo>
<mml:mi mathvariant="script">O</mml:mi>
</mml:math>
</inline-formula>, where <inline-formula id="inf7">
<mml:math id="m9">
<mml:mi mathvariant="script">O</mml:mi>
</mml:math>
</inline-formula> represents the set of locations that are not accessible to the robot.</p>
<p>To estimate the flow field, we define a scheme of fitting problems based on the known initial condition <italic>h</italic>(<bold>x</bold>) of <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> and the current samples acquired by the agent. First, we expect to collect samples <italic>y</italic>
<sub>
<italic>i</italic>
</sub> at the location <bold>x</bold>
<sub>
<italic>i</italic>
</sub> and time <italic>t</italic>
<sub>
<italic>i</italic>
</sub> for <italic>i</italic> &#x3d; 1, <italic>&#x2026;</italic> , <italic>n</italic> such that the fitted field minimizes the mean squared error over the collected samples. Taking advantage of the closed-form solution of the homogeneous version of <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>, the fitting error function <italic>e</italic>
<sub>
<italic>f</italic>
</sub>(<italic>
<bold>b</bold>
</italic>) is defined as<disp-formula id="e3">
<mml:math id="m10">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
<label>(3)</label>
</disp-formula>where <italic>n</italic> is the number of collected samples. Next, we define the fitting error <italic>e</italic>
<sub>
<italic>f</italic>
</sub> (<bold>x</bold>
<sub>1</sub>, <italic>&#x2026;</italic> , <bold>x</bold>
<sub>
<italic>n</italic>
</sub>) associated with the locations<disp-formula id="e4">
<mml:math id="m11">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
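<p>A minimal sketch of <xref ref-type="disp-formula" rid="e3">Eq. 3</xref> and <xref ref-type="disp-formula" rid="e4">Eq. 4</xref> is given below for illustration. The minimization over <bold>b</bold> is replaced by a search over a finite grid of candidate velocities, which is an assumption of this example rather than the optimizer used in this work; the Gaussian initial condition, the synthetic samples, and the names <monospace>fitting_error</monospace> and <monospace>best_fit</monospace> are likewise hypothetical.</p>
<preformat preformat-type="code">
```python
import math

def initial_condition(p):
    """Hypothetical h: Gaussian bump centred at the origin."""
    return math.exp(-(p[0] ** 2 + p[1] ** 2))

def fitting_error(b, samples, h):
    """Eq. 3: mean squared error of h(x_i - t_i b) against the samples y_i."""
    total = 0.0
    for (x1, x2), t, y in samples:
        total += (h((x1 - t * b[0], x2 - t * b[1])) - y) ** 2
    return total / len(samples)

def best_fit(samples, h, candidates):
    """Eq. 4 restricted to a finite candidate set: returns (minimum error, minimizer)."""
    return min((fitting_error(b, samples, h), b) for b in candidates)

# Synthetic, noise-free samples generated with a hypothetical true velocity.
true_b = (0.5, -0.25)
locations_and_times = [((0.5, 0.0), 1.0), ((1.0, -0.5), 2.0), ((0.0, 0.5), 1.0), ((1.5, 0.0), 2.0)]
samples = [
    (x, t, initial_condition((x[0] - t * true_b[0], x[1] - t * true_b[1])))
    for x, t in locations_and_times
]

# Candidate velocities on a 0.25-spaced grid over [-1, 1] x [-1, 1].
grid = [k * 0.25 for k in range(-4, 5)]
candidates = [(u, v) for u in grid for v in grid]
min_error, b_hat = best_fit(samples, initial_condition, candidates)
```
</preformat>
<p>Because the synthetic samples are noise-free and the true velocity lies on the grid, the search recovers it with zero fitting error. With noisy samples the minimum error would instead be of the order of the noise variance.</p>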
<p>The fitting error expressed in <xref ref-type="disp-formula" rid="e4">Eq. 4</xref> measures how well the best-fitted model predicts the given samples (i.e., predicts <italic>y</italic>
<sub>
<italic>i</italic>
</sub> given <bold>x</bold>
<sub>
<italic>i</italic>
</sub> and a parameter vector <italic>
<bold>b</bold>
</italic>). It is the &#x201c;best&#x201d; in the sense that it is the minimum error achievable by the model given the samples <bold>x</bold>
<sub>1</sub>, &#x2026; , <bold>x</bold>
<sub>
<italic>n</italic>
</sub>. Nevertheless, if the locations <bold>x</bold>
<sub>1</sub>, <italic>&#x2026;</italic> , <bold>x</bold>
<sub>
<italic>n</italic>
</sub> are poorly chosen, the fitting error can be low while the fitted model still estimates the rest of the field poorly, that is, it over-fits. To handle this issue, we add a new error term based on how well one sample can be predicted using the remaining ones. This is known as cross-validation. In this case, we propose the following cross-validation scheme. For each 1 &#x2264; <italic>i</italic> &#x2264; <italic>n</italic>, let <inline-formula id="inf8">
<mml:math id="m12">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> be defined as<disp-formula id="e5">
<mml:math id="m13">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mspace width="0.17em"/>
<mml:mi mathvariant="normal">min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
<p>We define the cross-validation error <italic>e</italic>
<sub>
<italic>cv</italic>
</sub> (<bold>x</bold>
<sub>1</sub>, <italic>&#x2026;</italic> , <bold>x</bold>
<sub>
<italic>n</italic>
</sub>) as<disp-formula id="e6">
<mml:math id="m14">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:math>
<label>(6)</label>
</disp-formula>which measures, on average, how well a model fitted to all but one sample predicts the held-out sample. This term penalizes over-fitting and indicates how reliable the collected samples are. Lastly, we define the total error <italic>e</italic>
<sub>
<italic>total</italic>
</sub> (<bold>x</bold>
<sub>1</sub>, <italic>&#x2026;</italic> , <bold>x</bold>
<sub>
<italic>n</italic>
</sub>) or just <italic>e</italic>
<sub>
<italic>total</italic>
</sub> as<disp-formula id="e7">
<mml:math id="m15">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">total</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>This combined error gives equal weight to the accuracy of the fit to the samples, measured by <italic>e</italic>
<sub>
<italic>f</italic>
</sub> (<bold>x</bold>
<sub>1</sub>, <italic>&#x2026;</italic> , <bold>x</bold>
<sub>
<italic>n</italic>
</sub>) and the reliability of the samples measured by <italic>e</italic>
<sub>
<italic>cv</italic>
</sub> (<bold>x</bold>
<sub>1</sub>, <italic>&#x2026;</italic> , <bold>x</bold>
<sub>
<italic>n</italic>
</sub>).</p>
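<p>The error terms in Eqs 5&#x2013;7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fitting error is taken to be the mean squared residual of <italic>h</italic>(<bold>x</bold> &#x2212; <italic>t</italic><bold>b</bold>), the arg min over <bold>b</bold> is replaced by an exhaustive search over a finite candidate list, and the names <monospace>fit_error</monospace>, <monospace>best_b</monospace>, <monospace>errors</monospace>, the Gaussian field <monospace>h</monospace>, and all numerical values are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

def fit_error(b, samples, h):
    """Mean squared residual of the model h(x - t*b) over the given samples."""
    total = 0.0
    for (x, y), t, value in samples:
        total += (h((x - t * b[0], y - t * b[1])) - value) ** 2
    return total / len(samples)

def best_b(samples, h, candidates):
    """Stand-in for the arg min over b: exhaustive search on a finite list."""
    return min(candidates, key=lambda b: fit_error(b, samples, h))

def errors(samples, h, candidates):
    """Return (e_f, e_cv, e_total) for one set of samples."""
    n = len(samples)
    e_f = fit_error(best_b(samples, h, candidates), samples, h)
    e_cv = 0.0
    for i in range(n):
        (x, y), t, value = samples[i]
        rest = samples[:i] + samples[i + 1:]
        b_i = best_b(rest, h, candidates)  # Eq. 5: fit without sample i
        e_cv += (h((x - t * b_i[0], y - t * b_i[1])) - value) ** 2  # Eq. 6 summand
    e_cv /= n
    return e_f, e_cv, e_f + e_cv  # Eq. 7

# Synthetic, noise-free example (all values are made up for illustration).
h = lambda p: math.exp(-(p[0] ** 2 + p[1] ** 2))
true_b = (0.5, -0.5)
visits = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0),
          ((1.0, 1.0), 3.0), ((-1.0, 0.5), 2.0)]
samples = [(p, t, h((p[0] - t * true_b[0], p[1] - t * true_b[1])))
           for p, t in visits]
candidates = [(0.5 * i, 0.5 * j) for i in range(-2, 3) for j in range(-2, 3)]
e_f, e_cv, e_total = errors(samples, h, candidates)
```
]]></preformat>
<p>Because the synthetic samples are noise-free and the true parameter lies on the candidate grid, both error terms vanish in this example; with noisy or poorly placed samples the cross-validation term would grow even when the fitting term stays small.</p>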
<p>The agent is modeled as a rigid body that moves in <inline-formula id="inf9">
<mml:math id="m16">
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> and is described by the nonlinear system<disp-formula id="e8">
<mml:math id="m17">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mo>&#x307;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">u</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi mathvariant="bold">z</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">r</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
<p>Here, <italic>f</italic> (<bold>x</bold>, <bold>u</bold>) is the motion model of the vehicle, <italic>o</italic> (<bold>x</bold>, <bold>r</bold>) is its observation model, and <bold>r</bold> is additive, zero-mean noise that accounts for modeling errors and sensor imperfections.</p>
<p>Let <inline-formula id="inf10">
<mml:math id="m18">
<mml:mi mathvariant="script">S</mml:mi>
</mml:math>
</inline-formula> be the state space, i.e., the set of all possible states <inline-formula id="inf11">
<mml:math id="m19">
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">S</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf12">
<mml:math id="m20">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula> be the action space, i.e., the set of all possible actions. The state and the action of the vehicle are then given by<disp-formula id="e9">
<mml:math id="m21">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi mathvariant="bold">u</mml:mi>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
<p>Here, (<italic>x</italic>, <italic>y</italic>) is the position of the vehicle and <italic>&#x3d5;</italic> &#x2208; (&#x2212;<italic>&#x3c0;</italic>/4, <italic>&#x3c0;</italic>/4) is the vehicle&#x2019;s heading; the forward speed <italic>v</italic> and the angular velocity <italic>&#x3c9;</italic> of the agent are set directly by the action variables <italic>u</italic>
<sub>
<italic>v</italic>
</sub> and <italic>u</italic>
<sub>
<italic>&#x3c9;</italic>
</sub>, respectively. The kinematic model of the agent <inline-formula id="inf13">
<mml:math id="m22">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mo>&#x307;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">u</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is described by <xref ref-type="disp-formula" rid="e10">Eq. 10</xref>.<disp-formula id="e10">
<mml:math id="m23">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>&#x307;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>cos</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>&#x307;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>sin</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
<mml:mo>&#x307;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(10)</label>
</disp-formula>where <italic>v</italic>
<sub>
<italic>x</italic>
</sub> and <italic>v</italic>
<sub>
<italic>y</italic>
</sub> are the velocity components of the environmental flow field in the <italic>x</italic> and <italic>y</italic> directions.</p>
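<p>As a minimal sketch of how the kinematic model in Eq. 10 could be simulated, the following Python function applies one forward-Euler step. The integration scheme, the step size <monospace>dt</monospace>, and the function name <monospace>euler_step</monospace> are assumptions made for illustration and are not taken from the original implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

def euler_step(state, action, flow, dt):
    """Advance the state (x, y, phi) by one forward-Euler step of Eq. 10.

    state  = (x, y, phi)    position and heading
    action = (u_v, u_omega) commanded forward speed and angular velocity
    flow   = (v_x, v_y)     flow-field velocity at the current position
    """
    x, y, phi = state
    u_v, u_omega = action
    v_x, v_y = flow
    return (x + dt * (u_v * math.cos(phi) + v_x),
            y + dt * (u_v * math.sin(phi) + v_y),
            phi + dt * u_omega)

# Heading along +x, unit forward speed, current of 0.5 along +x.
next_state = euler_step((0.0, 0.0, 0.0), (1.0, 0.0), (0.5, 0.0), 1.0)
```
]]></preformat>
<p>In this example the vehicle speed and the current add along the <italic>x</italic> axis, so the agent advances faster than its own forward speed alone would allow.</p>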
<p>Let <inline-formula id="inf14">
<mml:math id="m24">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">S</mml:mi>
</mml:math>
</inline-formula> be the initial location of the agent. We assume that the agent exploits the ocean currents: it can drift, move forward with or against the currents, and rotate clockwise or counterclockwise. The action space is therefore defined as<disp-formula id="e11">
<mml:math id="m25">
<mml:mi mathvariant="script">U</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#xd7;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
<p>We discretize the action space to obtain a finite subset of <inline-formula id="inf15">
<mml:math id="m26">
<mml:mi mathvariant="script">U</mml:mi>
</mml:math>
</inline-formula> defined as<disp-formula id="e12">
<mml:math id="m27">
<mml:mi mathvariant="script">A</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
<p>The actions of the agent are summarized in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Description of the actions of the agent.</p>
</caption>
<table>
<tbody valign="top">
<tr>
<td align="left">(<italic>v</italic>
<sub>max</sub>, 0)</td>
<td align="left">moving forward with maximum speed <italic>v</italic>
<sub>max</sub>
</td>
</tr>
<tr>
<td align="left">(<italic>v</italic>
<sub>max</sub>/2, 0)</td>
<td align="left">moving forward at half the maximum speed, <italic>v</italic>
<sub>max</sub>/2</td>
</tr>
<tr>
<td align="left">(<italic>v</italic>
<sub>max</sub>, &#x2212; <italic>&#x3d5;</italic>)</td>
<td align="left">turning clockwise by <italic>&#x3d5;</italic> and moving with maximum speed</td>
</tr>
<tr>
<td align="left">(<italic>v</italic>
<sub>max</sub>, &#x2b; <italic>&#x3d5;</italic>)</td>
<td align="left">turning counterclockwise by <italic>&#x3d5;</italic> and moving with maximum speed</td>
</tr>
<tr>
<td align="left">(<italic>v</italic>
<sub>max</sub>/2, &#x2212; <italic>&#x3d5;</italic>)</td>
<td align="left">turning clockwise by <italic>&#x3d5;</italic> and moving at half the maximum speed</td>
</tr>
<tr>
<td align="left">(<italic>v</italic>
<sub>max</sub>/2, &#x2b; <italic>&#x3d5;</italic>)</td>
<td align="left">turning counterclockwise by <italic>&#x3d5;</italic> and moving at half the maximum speed</td>
</tr>
</tbody>
</table>
</table-wrap>
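<p>For illustration, the discrete action set of Eq. 12 can be enumerated as follows. The function name <monospace>action_set</monospace> and the numerical values of <italic>v</italic><sub>max</sub> and <italic>&#x3d5;</italic> are hypothetical; the last entry, (0, 0), appears in Eq. 12 but is not listed in Table 1.</p>
<preformat preformat-type="code"><![CDATA[
```python
def action_set(v_max, phi):
    """Return the seven (speed, turn) pairs of Eq. 12 in the order given there."""
    half = v_max / 2
    return [
        (v_max, 0.0),   # forward at maximum speed
        (half, 0.0),    # forward at half the maximum speed
        (v_max, -phi),  # clockwise turn, maximum speed
        (v_max, phi),   # counterclockwise turn, maximum speed
        (half, -phi),   # clockwise turn, half the maximum speed
        (half, phi),    # counterclockwise turn, half the maximum speed
        (0.0, 0.0),     # no commanded motion
    ]

actions = action_set(v_max=1.0, phi=0.25)
```
]]></preformat>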
<p>For the observation model, we assume that the vehicle uses an IMU to measure its heading angle <italic>&#x3d5;</italic> and has access to GPS at surface level. The vehicle observes its state with uncertainty due to sensor imperfections and the dynamic nature of the underwater environment. The observation space <inline-formula id="inf16">
<mml:math id="m28">
<mml:mi mathvariant="script">Z</mml:mi>
</mml:math>
</inline-formula>, the set of all possible sensor observations <inline-formula id="inf17">
<mml:math id="m29">
<mml:mi mathvariant="bold">z</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">Z</mml:mi>
</mml:math>
</inline-formula>, is given by<disp-formula id="e13">
<mml:math id="m30">
<mml:mi mathvariant="script">Z</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#xd7;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#xd7;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(13)</label>
</disp-formula>
</p>
<p>The observation model <italic>o</italic>(<bold>x</bold>, <bold>r</bold>) is given by<disp-formula id="e14">
<mml:math id="m31">
<mml:mi mathvariant="bold">z</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">r</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="bold">r</mml:mi>
</mml:math>
<label>(14)</label>
</disp-formula>
</p>
<p>Where <inline-formula id="inf18">
<mml:math id="m32">
<mml:mi mathvariant="bold">r</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is noise distributed as <inline-formula id="inf19">
<mml:math id="m33">
<mml:mi mathvariant="bold">r</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn mathvariant="bold">0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, where &#x3a3; is a diagonal covariance matrix that accounts for modeling errors and sensor imperfections and <italic>I</italic> is the identity matrix. We also assume that the measurement noise of the individual sensors is uncorrelated and has constant covariance.</p>
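<p>A minimal sketch of the observation model in Eq. 14 is given below. Because &#x3a3; is diagonal, each state component can be perturbed independently; the function name <monospace>observe</monospace>, the standard deviations, and the random seed are illustrative assumptions.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random

def observe(state, sigmas, rng):
    """Return z = I x + r, with r drawn from N(0, diag(sigmas squared)).

    state  = (x, y, phi)
    sigmas = per-component standard deviations (square roots of the
             diagonal entries of the covariance matrix)
    """
    return tuple(s + rng.gauss(0.0, sd) for s, sd in zip(state, sigmas))

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
z = observe((10.0, -4.0, 0.3), (0.5, 0.5, 0.02), rng)
```
]]></preformat>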
<p>These elements allow us to formulate the following problem.</p>
<p>
<bold>Problem</bold>: <italic>Given an aquatic environment</italic> <inline-formula id="inf20">
<mml:math id="m34">
<mml:mi mathvariant="script">W</mml:mi>
</mml:math>
</inline-formula>
<italic>, the action set of the agent</italic> <italic>A</italic>
<italic>, the state space</italic> <inline-formula id="inf21">
<mml:math id="m35">
<mml:mi mathvariant="script">S</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">W</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>, the vehicle&#x2019;s motion model, and observations of a given ocean feature at several locations, estimate the flow field (and therefore the ocean feature distribution) by minimizing the fitting and cross-validation errors within a fixed number of steps.</italic>
</p>
</sec>
</sec>
<sec id="s4">
<title>4 Methods</title>
<p>Because of the computational effort required to tackle problems with large state spaces, tabular learning methods may be infeasible <xref ref-type="bibr" rid="B51">Sutton and Barto (2018)</xref>. Combining approximate reinforcement learning solution methods with generalization techniques instead yields a computationally viable approach for real-world problems.</p>
<p>To update the agent policy based on the actions taken, we use the SARSA(<italic>&#x3bb;</italic>) algorithm in conjunction with a linear function approximation technique based on stochastic semi-gradient descent. The agent is in state <inline-formula id="inf22">
<mml:math id="m36">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">S</mml:mi>
</mml:math>
</inline-formula>, takes action <italic>a</italic>
<sub>
<italic>t</italic>
</sub> &#x2208; <italic>A</italic>, and receives reward <italic>r</italic>
<sub>
<italic>t</italic>
</sub> at each time step <italic>t</italic>. In this method, we can estimate the action-value function <inline-formula id="inf23">
<mml:math id="m37">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> for the behavior policy <italic>&#x3c0;</italic> in a systematic way. The SARSA(<italic>&#x3bb;</italic>) algorithm selects actions with the <italic>&#x25b;</italic>-greedy approach: the action with the highest estimated value is chosen with probability 1 &#x2212; <italic>&#x25b;</italic>, and a random action is chosen with a small probability <italic>&#x25b;</italic>, independently of the estimated values.</p>
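<p>The action selection rule described above can be sketched as follows. The function name <monospace>epsilon_greedy</monospace> and the example action values are hypothetical, and ties between equally valued actions are broken in favor of the lowest index, which is one possible choice among several.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index from a list of estimated action values.

    With probability epsilon a uniformly random index is returned;
    otherwise the index of the largest estimated value is returned.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(0)
q_values = [0.1, 0.7, 0.3]  # made-up estimates for three actions
greedy_choice = epsilon_greedy(q_values, 0.0, rng)
```
]]></preformat>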
<p>The action-value function approximation is defined as<disp-formula id="e15">
<mml:math id="m38">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2248;</mml:mo>
<mml:mi>q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(15)</label>
</disp-formula>
</p>
<p>Where <inline-formula id="inf24">
<mml:math id="m39">
<mml:mi mathvariant="bold">w</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is the weight vector of the semi-gradient descent method. The weight vector is updated according to <xref ref-type="disp-formula" rid="e16">Eq. 16</xref>:
<disp-formula id="e16">
<mml:math id="m40">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2207;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(16)</label>
</disp-formula>where <italic>&#x3b1;</italic> is the step size, and <italic>G</italic>
<sub>
<italic>t</italic>
</sub> is the return. With linear function approximation, <xref ref-type="disp-formula" rid="e15">Eq. 15</xref> becomes<disp-formula id="e17">
<mml:math id="m41">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x22a4;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(17)</label>
</disp-formula>
</p>
<p>Where <inline-formula id="inf25">
<mml:math id="m42">
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> is the feature vector. Each component <italic>x</italic>
<sub>
<italic>i</italic>
</sub> (<bold>s</bold>, <italic>a</italic>) of the feature vector is a feature that maps the state-action pair (<bold>s</bold>, <italic>a</italic>) to a real value. As a result, the gradient of the approximate action-value function with respect to the weights is simply <inline-formula id="inf26">
<mml:math id="m43">
<mml:mo>&#x2207;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <xref ref-type="disp-formula" rid="e16">Eq. 16</xref> reduces to<disp-formula id="e18">
<mml:math id="m44">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(18)</label>
</disp-formula>
</p>
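<p>The linear approximation of Eq. 17 and the weight update of Eq. 18 can be illustrated with the short sketch below. It assumes that the update target <italic>G</italic><sub><italic>t</italic></sub> is supplied by the caller; the complete SARSA(<italic>&#x3bb;</italic>) algorithm additionally maintains eligibility traces, which are not shown here. The names <monospace>q_hat</monospace> and <monospace>update_weights</monospace> and the numerical values are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def q_hat(w, x):
    """Eq. 17: inner product of the weight vector and the feature vector."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

def update_weights(w, x, target, alpha):
    """Eq. 18: move the weights along x(s, a), scaled by the prediction error."""
    delta = target - q_hat(w, x)
    return [w_i + alpha * delta * x_i for w_i, x_i in zip(w, x)]

w0 = [0.0, 0.0]
x_sa = [1.0, 0.0]  # made-up feature vector for one state-action pair
w1 = update_weights(w0, x_sa, target=1.0, alpha=0.5)
w2 = update_weights(w1, x_sa, target=1.0, alpha=0.5)
```
]]></preformat>
<p>Each update moves the estimate for the visited state-action pair a fraction <italic>&#x3b1;</italic> of the way toward the target, while weights whose features are zero remain unchanged.</p>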
<sec id="s4-1">
<title>4.1 Reward function design</title>
<p>In reinforcement learning problems, designing a reward function is not a trivial task <xref ref-type="bibr" rid="B31">Ng et al. (1999)</xref>. To avoid spurious exploration, we define a terminal condition based on a fixed number of observations, which determines when the environment is reset for a new episode. To encourage the agent to minimize the fitting and cross-validation errors within a given number of steps, we provide a reward that is inversely proportional to the sum of the errors at the terminal state. For each episode, the agent collects 20 observations, and the reward function is defined as<disp-formula id="e19">
<mml:math id="m45">
<mml:mi>r</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mfrac>
<mml:mrow>
<mml:mn>100</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">total</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;the&#x2009;number&#x2009;of&#x2009;observations</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>20</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;the&#x2009;number&#x2009;of&#x2009;observations</mml:mtext>
<mml:mo>&#x3c;</mml:mo>
<mml:mn>5</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>1.5</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>otherwise</mml:mtext>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(19)</label>
</disp-formula>where <italic>c</italic>
<sub>1</sub> is the ratio between the total errors at the previous and current steps and <inline-formula id="inf27">
<mml:math id="m46">
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">total</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mn>20</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>0.4</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>.</p>
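<p>As a minimal sketch of how the piecewise reward in <xref ref-type="disp-formula" rid="e19">Eq. 19</xref> could be evaluated, the Python function below takes the current and previous total errors and the number of observations collected so far. The function name, its arguments, the orientation of <italic>c</italic><sub>1</sub> (previous error divided by current error), and the clamping of <italic>c</italic><sub>2</sub> at zero are assumptions made for illustration and are not specified in the text.</p>
<preformat><![CDATA[
```python
def reward(e_total, e_prev, n_obs, n_max=20, warmup=5):
    """Illustrative evaluation of the piecewise reward of Eq. 19.

    e_total: total error at the current step (assumed positive)
    e_prev:  total error at the previous step (assumed positive)
    n_obs:   number of observations collected so far in the episode
    """
    if e_total <= 0.0 or e_prev <= 0.0:
        raise ValueError("total errors must be positive")

    # c2 = 1 - (e_total / 20)^0.4, with the constant 20 taken from Eq. 19
    c2 = 1.0 - (e_total / 20.0) ** 0.4

    if n_obs == n_max:
        # terminal state: reward inversely proportional to the total error
        return 100.0 / e_total
    if n_obs < warmup:
        # too few observations to compare consecutive errors
        return c2

    # Assumed orientation: previous error over current error,
    # so c1 > 1 when the error decreased.
    c1 = e_prev / e_total
    # Assumption: clamp c2 at zero so the fractional power stays real
    # when e_total exceeds 20.
    return c1 * max(c2, 0.0) ** 1.5
```
]]></preformat>
<p>Under these assumptions, a terminal state with a total error of 2 yields a reward of 50, and intermediate rewards grow when the error shrinks from one step to the next.</p>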
</sec>
<sec id="s4-2">
<title>4.2 Linear methods and feature construction: tile coding</title>
<p>In reinforcement learning systems, feature construction is critical because the features determine how each state of the agent is valued. The main feature construction techniques for linear methods are polynomial bases, the Fourier basis, and tile coding <xref ref-type="bibr" rid="B43">Sherstov and Stone (2005)</xref>. Tile coding is a computationally efficient feature design technique that covers the state space with several overlapping partitions called tilings. Each element of a tiling is referred to as a tile. Different tilings are offset from one another by a fixed fraction of the tile width <xref ref-type="bibr" rid="B51">Sutton and Barto (2018)</xref>. If there are <italic>n</italic> tilings and each tiling has <italic>m</italic> &#xd7; <italic>m</italic> tiles, the feature vector is <inline-formula id="inf28">
<mml:math id="m47">
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>. One of the main advantages of tile coding with binary feature vectors is that the weighted sum in the approximate value function (<xref ref-type="disp-formula" rid="e17">Eq. 17</xref>) is easy to compute, since only the weights of the active tiles contribute. <xref ref-type="fig" rid="F2">Figure 2</xref> shows an example of tile coding for a two-dimensional continuous state space. In this case, <bold>x</bold>(<bold>s</bold>) is a feature vector with twelve components, one for each tile in each tiling. Every component of <bold>x</bold>(<bold>s</bold>) is inactive (zero-valued) except the active components <italic>x</italic>
<sub>0</sub>(<bold>s</bold>), <italic>x</italic>
<sub>4</sub>(<bold>s</bold>) and <italic>x</italic>
<sub>8</sub>(<bold>s</bold>), which correspond to the current location of the agent. In general, there are exactly <italic>n</italic> active features in <bold>x</bold>(<bold>s</bold>) because every point in state space falls into precisely one tile in each of the <italic>n</italic> tilings. Let the weight vector be <inline-formula id="inf29">
<mml:math id="m48">
<mml:mi mathvariant="bold">w</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x22a4;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> and the action space be <italic>A</italic> &#x3d; {<italic>a</italic>
<sub>0</sub>, <italic>a</italic>
<sub>1</sub>, <italic>a</italic>
<sub>2</sub>}. The feature vector for actions <italic>a</italic>
<sub>0</sub>, <italic>a</italic>
<sub>1</sub> and <italic>a</italic>
<sub>2</sub> is <bold>x</bold> (<bold>s</bold>, <italic>a</italic>
<sub>0</sub>) &#x3d; <bold>x</bold> (<bold>s</bold>, <italic>a</italic>
<sub>1</sub>) &#x3d; <bold>x</bold> (<bold>s</bold>, <italic>a</italic>
<sub>2</sub>) &#x3d; [1,0,0,0,1,0,0,0,1,0,0,0]<sup>
<italic>&#x22a4;</italic>
</sup>. Thus, the action-value function approximation <inline-formula id="inf30">
<mml:math id="m49">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> described in <xref ref-type="disp-formula" rid="e17">Eq. 17</xref> is computed as<disp-formula id="e20">
<mml:math id="m50">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(20)</label>
</disp-formula>for each action in action space.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>An example of the tile coding representation of a continuous 2D state space. The agent is a point in the state space, represented by the active tiles of the three tilings. Active tiles are drawn with solid lines and have a value of 1. Inactive tiles are drawn with dashed lines and have a value of 0. Therefore, the feature vector is <bold>x</bold>(<bold>s</bold>) &#x3d; [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]<sup><italic>&#x22a4;</italic></sup>.</p>
</caption>
<graphic xlink:href="frobt-09-878246-g002.tif"/>
</fig>
<p>With tile coding, design issues for discrimination and generalization should be taken into account. The number and size of tiles, for example, affect the granularity of state discrimination, or how far the agent must move in state space to change at least one component of the feature vector. Aside from that, the shape of the tilings and the offset distance between them have an impact on generalization. As an example, if tiles are stretched along one dimension in state space, generalization will extend to states along that dimension as well <xref ref-type="bibr" rid="B51">Sutton and Barto (2018)</xref>.</p>
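<p>The construction described in this section can be sketched in a few lines of Python. The example below assumes a square state space [0, 1] &#xd7; [0, 1], uniformly offset tilings, and clamping of points that a shifted tiling would push past the last tile; the function names and these choices are illustrative assumptions, not the implementation used in the experiments. With three tilings of 2 &#xd7; 2 tiles, a point near the lower-left corner activates components 0, 4 and 8, as in the example above.</p>
<preformat><![CDATA[
```python
def active_tiles(s, n_tilings=3, m=2, low=0.0, high=1.0):
    """Return one active feature index per tiling for a 2D state s = (x, y)."""
    width = (high - low) / m  # tile width along each dimension
    indices = []
    for k in range(n_tilings):
        # each tiling is shifted by a fixed fraction of the tile width
        offset = k * width / n_tilings
        col = int((s[0] - low + offset) // width)
        row = int((s[1] - low + offset) // width)
        # Assumption: clamp to the last tile instead of adding an extra
        # row/column of tiles for the shifted tilings.
        col = min(max(col, 0), m - 1)
        row = min(max(row, 0), m - 1)
        indices.append(k * m * m + row * m + col)
    return indices


def feature_vector(s, n_tilings=3, m=2, low=0.0, high=1.0):
    """Binary feature vector x(s) with n_tilings * m * m components."""
    x = [0] * (n_tilings * m * m)
    for i in active_tiles(s, n_tilings, m, low, high):
        x[i] = 1
    return x


def q_hat(w, s, **kwargs):
    """Linear value estimate: the sum of the weights of the active tiles."""
    return sum(w[i] for i in active_tiles(s, **kwargs))
```
]]></preformat>
<p>Because the features are binary, the value estimate needs only <italic>n</italic> additions per evaluation, regardless of the total number of tiles.</p>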
</sec>
<sec id="s4-3">
<title>4.3 Eligibility traces in reinforcement learning</title>
<p>In problems with large state spaces, eligibility traces are a technique for improving the computational efficiency of reinforcement learning methods. The eligibility trace is a vector <inline-formula id="inf31">
<mml:math id="m51">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> whose components keep track of which components of the weight vector <bold>w</bold>
<sub>
<italic>t</italic>
</sub> have contributed to recent state valuations, serving as a short-term record of recent events. Therefore, components of <bold>w</bold>
<sub>
<italic>t</italic>
</sub> that most frequently contribute to valuations of previous states are considered <italic>eligible</italic> for an update <xref ref-type="bibr" rid="B45">Singh et al. (1995)</xref>. Eligibility trace components are updated based on the trace-decay parameter <italic>&#x3bb;</italic> &#x2208; [0, 1], which sets the rate at which the trace decays exponentially. In contrast with <italic>n</italic>-step methods, which perform action-value updates only after a given number of steps, eligibility traces provide updates continually throughout learning. As a result, the agent's behavior can change immediately after a new state is encountered rather than being delayed by <italic>n</italic> steps.</p>
<p>With function approximation, the <italic>n</italic>-step return for action values, <italic>G</italic>
<sub>
<italic>t</italic>:<italic>t</italic>&#x2b;<italic>n</italic>
</sub>, is defined as<disp-formula id="e21">
<mml:math id="m52">
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mspace width="5.69046pt"/>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x3c;</mml:mo>
<mml:mi>T</mml:mi>
</mml:math>
<label>(21)</label>
</disp-formula>where <italic>&#x3b3;</italic> is the discount rate, which sets the relative weight of immediate and future rewards. The <italic>&#x3bb;</italic>-return <inline-formula id="inf32">
<mml:math id="m53">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is written as<disp-formula id="e22">
<mml:math id="m54">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
<label>(22)</label>
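</disp-formula>where <italic>G</italic><sub><italic>t</italic></sub> is the full return from step <italic>t</italic> to the terminal step <italic>T</italic>.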
</p>
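<p>To make the weighting in <xref ref-type="disp-formula" rid="e22">Eq. 22</xref> concrete, the following Python sketch computes the <italic>&#x3bb;</italic>-return for a short episode. It follows the <italic>n</italic>-step return of <xref ref-type="bibr" rid="B51">Sutton and Barto (2018)</xref>, in which the bootstrapped value is discounted by <italic>&#x3b3;</italic><sup><italic>n</italic></sup>. The list-based data layout and the function names are assumptions chosen for illustration.</p>
<preformat><![CDATA[
```python
def n_step_return(rewards, q_values, t, n, gamma):
    """n-step return G_{t:t+n}, valid for t + n < T.

    rewards[k]  holds r_{k+1}, the reward received after step k
    q_values[k] holds the current estimate q_hat(s_k, a_k, w)
    """
    g = sum(gamma ** k * rewards[t + k] for k in range(n))
    return g + gamma ** n * q_values[t + n]


def full_return(rewards, t, gamma):
    """Discounted return G_t from step t to the end of the episode."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))


def lambda_return(rewards, q_values, t, gamma, lam):
    """Lambda-return: weighted average of all n-step returns (Eq. 22)."""
    T = len(rewards)  # episode length
    weighted = sum(
        lam ** (n - 1) * n_step_return(rewards, q_values, t, n, gamma)
        for n in range(1, T - t)
    )
    return (1.0 - lam) * weighted + lam ** (T - t - 1) * full_return(rewards, t, gamma)
```
]]></preformat>
<p>With <italic>&#x3bb;</italic> &#x3d; 0 the function returns the one-step return, and with <italic>&#x3bb;</italic> &#x3d; 1 it returns the full discounted return, which are the two limiting cases of the <italic>&#x3bb;</italic>-return.</p>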
<p>In this way, the update rule for the weight vector in <xref ref-type="disp-formula" rid="e16">Eq. 16</xref> is modified as follows<disp-formula id="e23">
<mml:math id="m55">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2207;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(23)</label>
</disp-formula>where the action-value estimation error <italic>&#x3b4;</italic>
<sub>
<italic>t</italic>
</sub> is defined as<disp-formula id="e24">
<mml:math id="m56">
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(24)</label>
</disp-formula>
</p>
<p>The action-value representation of the eligibility trace is defined as<disp-formula id="e25">
<mml:math id="m57">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mn mathvariant="bold">0</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi>&#x3bb;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x2207;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
<mml:mn>0</mml:mn>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(25)</label>
</disp-formula>
</p>
<p>The complete SARSA(<italic>&#x3bb;</italic>) algorithm is presented in <xref ref-type="table" rid="T1">Algorithm 1</xref> <xref ref-type="bibr" rid="B51">Sutton and Barto (2018)</xref>.</p>
<p>
<statement content-type="algorithm" id="Enun_algorithm_1">
<label>Algorithm 1</label>
<p>SARSA(<italic>&#x3bb;</italic>) with linear function approximation.</p>
<p>
<inline-graphic xlink:href="frobt-09-878246-fx1.tif"/>
</p>
</statement>
</p>
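<p>A single update of this scheme, combining <xref ref-type="disp-formula" rid="e23">Eqs 23</xref>&#x2013;<xref ref-type="disp-formula" rid="e25">25</xref>, can be sketched in Python as follows. The sketch assumes binary tile-coded features, so that the gradient of the linear estimate is the feature vector itself, accumulating traces, and a separate block of weights for each action; these choices, together with all function and parameter names, are illustrative assumptions rather than a transcription of Algorithm 1.</p>
<preformat><![CDATA[
```python
import random


def q_hat(w, active, action, d_state):
    """Linear estimate: sum of the weights of the active (tile, action) pairs."""
    return sum(w[action * d_state + i] for i in active)


def epsilon_greedy(w, active, n_actions, d_state, eps, rng=random):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(n_actions)
    values = [q_hat(w, active, a, d_state) for a in range(n_actions)]
    return values.index(max(values))


def sarsa_lambda_step(w, z, active, action, reward, next_active, next_action,
                      d_state, alpha, gamma, lam, terminal=False):
    """Update w and z in place for one transition and return the TD error."""
    # TD error (Eq. 24); the value of a terminal state is taken as zero
    q_next = 0.0 if terminal else q_hat(w, next_active, next_action, d_state)
    delta = reward + gamma * q_next - q_hat(w, active, action, d_state)

    # Trace update (Eq. 25): decay every component, then add the gradient,
    # which for binary features is 1 on the active components
    for j in range(len(z)):
        z[j] *= gamma * lam
    for i in active:
        z[action * d_state + i] += 1.0

    # Weight update (Eq. 23, second line)
    for j in range(len(w)):
        w[j] += alpha * delta * z[j]
    return delta
```
]]></preformat>
<p>In an episode loop, the trace vector would be reset to zero at the start of each episode, and <monospace>epsilon_greedy</monospace> would supply the next action before each call to <monospace>sarsa_lambda_step</monospace>.</p>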
</sec>
</sec>
<sec sec-type="results|discussion" id="s5">
<title>5 Results and discussion</title>
<p>Simulation results are presented in <xref ref-type="fig" rid="F3">Figure 3</xref>. Each simulation consisted of 400 episodes of 20 steps each and was used to investigate how the agent behaves under the effect of the flow field as the step size <italic>&#x3b1;</italic> and the trace decay rate <italic>&#x3bb;</italic> vary. For tile coding, we used eight tilings, each containing 8 &#xd7; 8 tiles. Thus, the feature vector is <inline-formula id="inf33">
<mml:math id="m58">
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>8</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>8</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>8</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>. Throughout the simulation, the <italic>&#x25b;</italic>-greedy parameter was fixed at 0.15, so the action with the highest estimated return is selected about 85% of the time. Higher values of the <italic>&#x25b;</italic>-greedy parameter increase the exploratory behavior of the agent. To perform the contaminant estimation, we selected the functions in <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> as <italic>g</italic>(<bold>x</bold>, <italic>t</italic>) &#x3d; 0, meaning that there are no additional sources of the ocean feature in the domain, and<disp-formula id="e26">
<mml:math id="m59">
<mml:mi>h</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="bold">c</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(26)</label>
</disp-formula>
</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Simulation results of the proposed learning framework with the variation of step size <italic>&#x3b1;</italic> <bold>(A</bold>,<bold>B)</bold>, trace decay rate <italic>&#x3bb;</italic> <bold>(C</bold>,<bold>D)</bold>, and <italic>&#x3f5;</italic>-greedy parameter <bold>(E</bold>,<bold>F)</bold> with respect to the total number of steps per episode and returns per episode.</p>
</caption>
<graphic xlink:href="frobt-09-878246-g003.tif"/>
</fig>
<p>
Here, <italic>h</italic>(<bold>x</bold>) models the initial distribution of the ocean feature, where <italic>a</italic> &#x3d; 100 controls the scale, <italic>q</italic> &#x3d; 2 controls the decay rate, &#x2016;&#x22c5;&#x2016;<sub>
<italic>p</italic>
</sub> is the <italic>L</italic>
<sup>
<italic>p</italic>
</sup> norm defined in <inline-formula id="inf34">
<mml:math id="m60">
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> for 1 &#x2264; <italic>p</italic> &#x2264; <italic>&#x221e;</italic>, <italic>&#x3c3;</italic> &#x3d; 40 combined with <italic>q</italic> can be interpreted as the standard deviation of <italic>h</italic>(<bold>x</bold>), and <bold>c</bold> is the point where the ocean feature reaches its maximum. Lastly, each observation was corrupted with Gaussian noise <inline-formula id="inf35">
<mml:math id="m61">
<mml:mi>&#x3f5;</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
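<p>For illustration, the initial distribution in <xref ref-type="disp-formula" rid="e26">Eq. 26</xref> and the noise model can be written as a short Python sketch. The parameter values <italic>a</italic> &#x3d; 100, <italic>q</italic> &#x3d; 2 and <italic>&#x3c3;</italic> &#x3d; 40 are those stated above; the default center at the origin, the function names, and the use of a seeded random generator are assumptions. The sketch samples only the initial field <italic>h</italic>(<bold>x</bold>), not the time-evolved field.</p>
<preformat><![CDATA[
```python
import math
import random


def h(x, c=(0.0, 0.0), a=100.0, q=2.0, sigma=40.0):
    """Initial distribution of Eq. 26: a * exp(-||x - c||_q^q / sigma^q).

    The default center c is a hypothetical choice for illustration.
    """
    dist_q = sum(abs(xi - ci) ** q for xi, ci in zip(x, c))
    return a * math.exp(-dist_q / sigma ** q)


def observe(x, rng, **kwargs):
    """Noisy observation of the initial field with unit-variance Gaussian noise."""
    return h(x, **kwargs) + rng.gauss(0.0, 1.0)
```
]]></preformat>
<p>At the center the noiseless value equals the scale <italic>a</italic>, and at a Euclidean distance of <italic>&#x3c3;</italic> from the center it has decayed by a factor of e.</p>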
<p>
<xref ref-type="fig" rid="F3">Figures 3A,B</xref> show the total estimation error and agent reward for different values of the step size <italic>&#x3b1;</italic>. The step size is interpreted as the fraction of the way the estimate moves towards the update target at each step. Smaller values of the step size <italic>&#x3b1;</italic> provided an increase in rewards over the episodes and a slight decrease in the estimation error. In these runs, the trace decay rate <italic>&#x3bb;</italic> was fixed at 0.9. <xref ref-type="fig" rid="F3">Figures 3C,D</xref> show the total estimation error and agent reward for different values of the trace decay rate <italic>&#x3bb;</italic> of the eligibility trace <bold>z</bold>
<sub>
<italic>t</italic>
</sub> in <xref ref-type="disp-formula" rid="e25">Eq. 25</xref>. Larger values of <italic>&#x3bb;</italic> resulted in a significant decrease in the estimation error and an increase in rewards. <xref ref-type="fig" rid="F3">Figures 3E,F</xref> show the total estimation error and agent reward for different values of the <italic>&#x25b;</italic>-greedy parameter. Although higher values of the <italic>&#x25b;</italic>-greedy parameter can lead to more exploratory agent behavior, the simulations show similar results across the values of <italic>&#x25b;</italic> tested.</p>
<p>
<xref ref-type="fig" rid="F4">Figure 4</xref> shows the paths taken by the agent in different simulation scenarios. Circles represent level sets of the ocean feature distribution at the end of the simulation. The highest feature concentration is at the center, and the outer circles represent lower ocean feature levels, assuming radial diffusion. Optimal paths tend to follow the ocean feature and cross its level sets, gathering information at different levels to estimate the entire field.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Agent path with the variation of step size <italic>&#x3b1;</italic> <bold>(A)</bold>, trace decay rate <italic>&#x3bb;</italic> <bold>(B)</bold>, and <italic>&#x25b;</italic>-greedy parameter <bold>(C)</bold>. <bold>(D)</bold> Difference between the true and the estimated ocean features.</p>
</caption>
<graphic xlink:href="frobt-09-878246-g004.tif"/>
</fig>
<p>Finally, <xref ref-type="fig" rid="F4">Figure 4D</xref> shows the difference between the estimated and the true ocean feature distributions at the final time. The two distributions are similar: the true flow field parameter is <bold>b</bold> &#x3d; (5,5)<sup>
<italic>&#x22a4;</italic>
</sup> and the estimate is <inline-formula id="inf36">
<mml:math id="m62">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>4.928</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>5.037</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x22a4;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>. We notice that <inline-formula id="inf37">
<mml:math id="m63">
<mml:mo stretchy="false">&#x2016;</mml:mo>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">&#x2016;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2248;</mml:mo>
<mml:mn>0.0809</mml:mn>
</mml:math>
</inline-formula>, but <italic>e</italic>
<sub>
<italic>total</italic>
</sub> is close to 2 at the end of the reinforcement learning process. This is because the fitting error <italic>e</italic>
<sub>
<italic>f</italic>
</sub> is, on average, the squared difference between the true observation and the corrupted one. If we assume that the true observations <italic>f</italic> (<bold>x</bold>
<sub>
<italic>i</italic>
</sub>, <italic>t</italic>
<sub>
<italic>i</italic>
</sub>) and the corrupted ones <italic>y</italic>
<sub>
<italic>i</italic>
</sub> are related by <italic>y</italic>
<sub>
<italic>i</italic>
</sub> &#x3d; <italic>f</italic> (<bold>x</bold>
<sub>
<italic>i</italic>
</sub>, <italic>t</italic>
<sub>
<italic>i</italic>
</sub>) &#x2b; <italic>&#x3f5;</italic>
<sub>
<italic>i</italic>
</sub> for each <italic>i</italic>, where <italic>&#x3f5;</italic>
<sub>
<italic>i</italic>
</sub> are i.i.d. random variables such that <inline-formula id="inf38">
<mml:math id="m64">
<mml:msub>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x223c;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, then, following <xref ref-type="bibr" rid="B5">Bishop (1995)</xref>, both the fitting error <italic>e</italic>
<sub>
<italic>f</italic>
</sub> and the cross-validation error <italic>e</italic>
<sub>
<italic>cv</italic>
</sub> approximate the variance <italic>&#x3c3;</italic>
<sup>2</sup>. Since <inline-formula id="inf39">
<mml:math id="m65">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>Var</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> and<disp-formula id="e27">
<mml:math id="m66">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:munder>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>&#xfe38;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mtext>as&#x2009;</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>&#xfe38;</mml:mo>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:mtext>as&#x2009;</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(27)</label>
</disp-formula>
</p>
<p>By the law of large numbers. Therefore, <italic>e</italic>
<sub>
<italic>total</italic>
</sub> &#x3d; <italic>e</italic>
<sub>
<italic>cv</italic>
</sub> &#x2b; <italic>e</italic>
<sub>
<italic>f</italic>
</sub> &#x2248; 2<italic>&#x3c3;</italic>
<sup>2</sup>.</p>
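<p>The limit used above can be checked numerically. The following is a minimal sketch, assuming independent Gaussian residuals and an illustrative noise level <italic>&#x3c3;</italic> &#x3d; 0.5 that is not taken from the experiments; the helper name <monospace>mean_squared_noise</monospace> is hypothetical. It shows the sample mean of the squared residuals approaching <italic>&#x3c3;</italic><sup>2</sup>, so that the sum of the two errors approaches 2<italic>&#x3c3;</italic><sup>2</sup>.</p>
<preformat>
```python
import random


def mean_squared_noise(n, sigma, seed=0):
    """Sample mean of n squared draws from a zero-mean Gaussian with std sigma."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n)) / n


if __name__ == "__main__":
    sigma = 0.5  # illustrative noise level (assumption, not from the paper)
    for n in (100, 10_000, 200_000):
        # The estimate should drift toward sigma**2 = 0.25 as n grows.
        print(n, mean_squared_noise(n, sigma), "target:", sigma ** 2)
```
</preformat>
<p>The sketch only illustrates the law-of-large-numbers step; it does not reproduce the minimization over <bold>b</bold> in the fitting error.</p>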
<p>To increase the complexity of our simulations, we chose a double-gyre system, a commonly occurring oceanic feature that is relatively easy to model and analyze <xref ref-type="bibr" rid="B34">Provost and Verron (1987)</xref>, <xref ref-type="bibr" rid="B55">Wolligandt et al. (2020)</xref>, <xref ref-type="bibr" rid="B49">Smith et al. (2015)</xref>, <xref ref-type="bibr" rid="B39">Shadden et al. (2005)</xref>, <xref ref-type="bibr" rid="B41">Shen et al. (1999)</xref>. The flow is described by the stream function<disp-formula id="e28">
<mml:math id="m67">
<mml:mi>&#x3c8;</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>sin</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mi>sin</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(28)</label>
</disp-formula>
</p>
<p>Where <italic>f</italic>
<sub>
<italic>dg</italic>
</sub> (<italic>x</italic>, <italic>t</italic>) &#x3d; <italic>a</italic>(<italic>t</italic>)<italic>x</italic>
<sup>2</sup> &#x2b; <italic>b</italic>(<italic>t</italic>)<italic>x</italic>, <italic>a</italic>(<italic>t</italic>) &#x3d; <italic>&#x3bc;</italic>&#x2009;sin (<italic>&#x3c9;</italic>
<sub>
<italic>dg</italic>
</sub>
<italic>t</italic>), <italic>b</italic>(<italic>t</italic>) &#x3d; 100 &#x2212; 200<italic>&#x3bc;</italic>&#x2009;sin (<italic>&#x3c9;</italic>
<sub>
<italic>dg</italic>
</sub>
<italic>t</italic>) over the domain (0, 200) &#xd7; (0, 100). In <xref ref-type="disp-formula" rid="e28">Eq. 28</xref>, <italic>A</italic> describes the magnitude of the velocity vectors, <italic>&#x3c9;</italic>
<sub>
<italic>dg</italic>
</sub> is the frequency of gyre oscillation, and <italic>&#x3bc;</italic> is the amplitude of motion of the line separating the gyres <xref ref-type="bibr" rid="B39">Shadden et al. (2005)</xref>. The flow field produced by the double gyre is then the vector field <bold>v</bold>(<italic>x</italic>, <italic>y</italic>, <italic>t</italic>) &#x3d; &#x2207;<italic>&#x3c8;</italic>(<italic>x</italic>, <italic>y</italic>, <italic>t</italic>).</p>
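<p>The stream function and its gradient can be evaluated directly. The following is a minimal sketch, assuming the expressions as printed in <xref ref-type="disp-formula" rid="e28">Eq. 28</xref> with <italic>b</italic>(<italic>t</italic>) read as 100 &#x2212; 200<italic>&#x3bc;</italic>&#x2009;sin (<italic>&#x3c9;</italic><sub><italic>dg</italic></sub><italic>t</italic>), and using the parameter values of <xref ref-type="fig" rid="F5">Figure 5</xref> as defaults. The function names are hypothetical and this is not the authors&#x2019; implementation.</p>
<preformat>
```python
import math


def f_dg(x, t, mu, omega):
    """Quadratic map f_dg(x, t) = a(t) x^2 + b(t) x from the text."""
    a = mu * math.sin(omega * t)
    b = 100.0 - 200.0 * a  # assumed reading of b(t)
    return a * x * x + b * x


def stream_function(x, y, t, A=10.0, mu=0.25, omega=math.pi / 5):
    """psi(x, y, t) = A sin(pi f_dg(x, t)) sin(pi y)."""
    return A * math.sin(math.pi * f_dg(x, t, mu, omega)) * math.sin(math.pi * y)


def grad_psi(x, y, t, A=10.0, mu=0.25, omega=math.pi / 5):
    """Analytic gradient (d psi / dx, d psi / dy) of the stream function."""
    a = mu * math.sin(omega * t)
    b = 100.0 - 200.0 * a
    f = a * x * x + b * x
    dfdx = 2.0 * a * x + b  # chain-rule factor for the x derivative
    dpsi_dx = A * math.pi * dfdx * math.cos(math.pi * f) * math.sin(math.pi * y)
    dpsi_dy = A * math.pi * math.sin(math.pi * f) * math.cos(math.pi * y)
    return dpsi_dx, dpsi_dy


if __name__ == "__main__":
    # Evaluate the field at one illustrative point and time.
    print(stream_function(0.3, 0.2, 1.0))
    print(grad_psi(0.3, 0.2, 1.0))
```
</preformat>
<p>The sketch returns the gradient as stated in the text. In <xref ref-type="bibr" rid="B39">Shadden et al. (2005)</xref> the velocity is the rotated gradient (&#x2212;&#x2202;<italic>&#x3c8;</italic>/&#x2202;<italic>y</italic>, &#x2202;<italic>&#x3c8;</italic>/&#x2202;<italic>x</italic>), which can be obtained from the same two partial derivatives.</p>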
<p>The PDE (1) assumes a constant flow field given by the vector <bold>b</bold>. To handle the time-varying double-gyre flow, we therefore consider an extension of this equation on the bounded domain <inline-formula id="inf40">
<mml:math id="m68">
<mml:mi mathvariant="script">W</mml:mi>
</mml:math>
</inline-formula>, called the <italic>advection-diffusion</italic> equation,<disp-formula id="e29">
<mml:math id="m69">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>&#x2207;</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi mathvariant="bold">v</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.3333em"/>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">W</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi>f</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>h</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="bold">x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.3333em"/>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x2202;</mml:mi>
<mml:mi mathvariant="bold">n</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.3333em"/>
<mml:mi mathvariant="bold">x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>&#x2202;</mml:mi>
<mml:mi mathvariant="script">W</mml:mi>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(29)</label>
</disp-formula>
</p>
<p>This equation admits non-constant flow fields. The diffusion term <italic>&#x3c1;</italic>&#x394;<italic>f</italic>, with a small diffusivity coefficient <italic>&#x3c1;</italic>, and the homogeneous Neumann boundary conditions, with outer normal vector <bold>n</bold>, are added because of the numerical difficulties reported when the pure advection equation is solved numerically <xref ref-type="bibr" rid="B13">Evans (1998)</xref>.</p>
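<p>One time step of <xref ref-type="disp-formula" rid="e29">Eq. 29</xref> can be sketched with an explicit finite-difference scheme. This is an illustration only and not the solver used in the experiments: it assumes a uniform rectangular grid, forward Euler in time, central differences in space, and mirrored ghost cells for the homogeneous Neumann condition. The function name <monospace>advection_diffusion_step</monospace> and all grid sizes are hypothetical.</p>
<preformat>
```python
def advection_diffusion_step(f, vx, vy, rho, dx, dy, dt, g=None):
    """Advance f by one explicit Euler step of df/dt = rho*Lap(f) - div(f*v) + g.

    f, vx, vy (and optionally g) are nested lists indexed as [i][j],
    with i along x and j along y.
    """
    nx, ny = len(f), len(f[0])
    new = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        # Clamped neighbours replicate the edge value (zero normal derivative).
        im, ip = max(i - 1, 0), min(i + 1, nx - 1)
        for j in range(ny):
            jm, jp = max(j - 1, 0), min(j + 1, ny - 1)
            lap = ((f[ip][j] - 2.0 * f[i][j] + f[im][j]) / dx ** 2
                   + (f[i][jp] - 2.0 * f[i][j] + f[i][jm]) / dy ** 2)
            # Central difference of the flux f*v (conservative form).
            div = ((f[ip][j] * vx[ip][j] - f[im][j] * vx[im][j]) / (2.0 * dx)
                   + (f[i][jp] * vy[i][jp] - f[i][jm] * vy[i][jm]) / (2.0 * dy))
            src = g[i][j] if g is not None else 0.0
            new[i][j] = f[i][j] + dt * (rho * lap - div + src)
    return new


if __name__ == "__main__":
    nx, ny = 20, 10  # illustrative grid (assumption)
    field = [[0.0] * ny for _ in range(nx)]
    field[5][5] = 1.0  # initial blob standing in for h(x)
    vx = [[0.5] * ny for _ in range(nx)]  # uniform flow, for illustration only
    vy = [[0.0] * ny for _ in range(nx)]
    for _ in range(50):
        field = advection_diffusion_step(field, vx, vy, rho=0.2, dx=1.0, dy=1.0, dt=0.05)
    print(max(max(row) for row in field))
```
</preformat>
<p>An explicit scheme of this kind is only stable for sufficiently small time steps, and it becomes more fragile as <italic>&#x3c1;</italic> decreases, which is consistent with the role of the diffusion term described above.</p>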
<p>
<xref ref-type="fig" rid="F5">Figure 5</xref> illustrates the spread of a given ocean feature through time under the influence of a double-gyre flow field.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Spread of a given ocean feature through time under the influence of a double-gyre flow field with <italic>A</italic> &#x3d; 10, <italic>&#x3bc;</italic> &#x3d; 0.25, and <italic>&#x3c9;</italic>
<sub>
<italic>dg</italic>
</sub> &#x3d; <italic>&#x3c0;</italic>/5 at <bold>(A)</bold> <italic>t</italic> &#x3d; 0&#xa0;s, <bold>(B)</bold> <italic>t</italic> &#x3d; 5&#xa0;s, and <bold>(C)</bold> <italic>t</italic> &#x3d; 10&#xa0;s.</p>
</caption>
<graphic xlink:href="frobt-09-878246-g005.tif"/>
</fig>
<p>For the simulation of the reinforcement learning framework and the double-gyre system, we ran a total of 10 episodes with 10 steps each. Although we used tile coding as a computationally efficient feature representation in our reinforcement learning framework, the partial differential equation given in <xref ref-type="disp-formula" rid="e29">Eq. 29</xref> must still be solved at each step of each episode. Moreover, finding the fitting and cross-validation errors requires solving an optimization problem that calls the PDE solver as a subroutine several times. To run this computationally intensive optimization, we used Florida International University&#x2019;s Phosphorus server (a 20-core Intel(R) Xeon(R) Silver 4114 CPU at 2.20&#xa0;GHz) together with a Bayesian optimization algorithm designed for black-box functions that are costly to evaluate. The true values for the frequency of gyre oscillation <italic>&#x3c9;</italic>
<sub>
<italic>dg</italic>
</sub> and the amplitude of gyre motion <italic>&#x3bc;</italic> are set to <italic>&#x3c0;</italic>/5 &#x2248; 0.6283 and 0.25, respectively. After only 10 episodes, the learned values for <italic>&#x3c9;</italic>
<sub>
<italic>dg</italic>
</sub> and <italic>&#x3bc;</italic> were 0.6344 and 0.2481, respectively, with the smallest estimation error in episode 7. <xref ref-type="table" rid="T2">Table 2</xref> summarizes the learned parameters, and <xref ref-type="fig" rid="F6">Figure 6</xref> shows the paths taken by the agent in different episodes while estimating the flow field. The agent follows the contaminant, but its behavior near the gyre separation line requires careful examination, because an undesired action there can result in feature mistracking. This behavior is illustrated by comparing the paths in <xref ref-type="fig" rid="F6">Figures 6A,B</xref>.</p>
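<p>The accuracy of the learned parameters can be expressed as relative errors. The short sketch below uses only the true and learned values listed in <xref ref-type="table" rid="T2">Table 2</xref>; the helper name <monospace>relative_error</monospace> is hypothetical.</p>
<preformat>
```python
import math

# True and learned values as listed in Table 2.
true_mu, learned_mu = 0.25, 0.2481
true_omega, learned_omega = math.pi / 5, 0.6344


def relative_error(true_value, estimate):
    """Absolute deviation of the estimate divided by the true magnitude."""
    return abs(estimate - true_value) / abs(true_value)


rel_mu = relative_error(true_mu, learned_mu)
rel_omega = relative_error(true_omega, learned_omega)

if __name__ == "__main__":
    print("relative error in mu:", rel_mu)
    print("relative error in omega_dg:", rel_omega)
```
</preformat>
<p>With these values, both relative errors are below 1%.</p>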
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Learning simulation parameters and results. True and learned double-gyre model parameters over 10 learning episodes.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th colspan="2" align="left">Learning simulation parameters</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">&#xa0;Number of episodes</td>
<td align="left">10 with 10 steps each</td>
</tr>
<tr>
<td align="left">&#xa0;Tile coding</td>
<td align="left">8 tilings with 8 &#xd7; 8 tiles each</td>
</tr>
<tr>
<td align="left">&#xa0;Step size <italic>&#x3b1;</italic>
</td>
<td align="left">0.9</td>
</tr>
<tr>
<td align="left">&#xa0;Trace decay rate <italic>&#x3bb;</italic>
</td>
<td align="left">0.9</td>
</tr>
<tr>
<td align="left">&#xa0;<italic>&#x3b5;</italic>-greedy parameter</td>
<td align="left">0.15</td>
</tr>
<tr>
<td colspan="2" align="left">Double-gyre model parameters and results</td>
</tr>
<tr>
<td align="left">&#xa0;True <italic>&#x3bc;</italic>
</td>
<td align="left">0.25</td>
</tr>
<tr>
<td align="left">&#xa0;Learned <italic>&#x3bc;</italic>
</td>
<td align="left">0.2481</td>
</tr>
<tr>
<td align="left">&#xa0;True <italic>&#x3c9;</italic>
<sub>
<italic>dg</italic>
</sub>
</td>
<td align="left">
<italic>&#x3c0;</italic>/5 &#x2248; 0.6283</td>
</tr>
<tr>
<td align="left">&#xa0;Learned <italic>&#x3c9;</italic>
<sub>
<italic>dg</italic>
</sub>
</td>
<td align="left">0.6344</td>
</tr>
</tbody>
</table>
</table-wrap>
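<p>The tile-coding configuration in <xref ref-type="table" rid="T2">Table 2</xref> (8 tilings with 8 &#xd7; 8 tiles each) can be sketched as follows. The sketch assumes that the tiled state is the agent&#x2019;s planar position in the domain (0, 200) &#xd7; (0, 100), that tilings are shifted uniformly by fractions of a tile, and that out-of-range indices are clamped to the last tile; none of these choices is stated in the text, and the function names are hypothetical.</p>
<preformat>
```python
def active_tiles(x, y, n_tilings=8, tiles_per_dim=8,
                 x_range=(0.0, 200.0), y_range=(0.0, 100.0)):
    """Return one active feature index per tiling for the point (x, y)."""
    tile_w = (x_range[1] - x_range[0]) / tiles_per_dim
    tile_h = (y_range[1] - y_range[0]) / tiles_per_dim
    per_tiling = tiles_per_dim * tiles_per_dim
    indices = []
    for k in range(n_tilings):
        # Tiling k is shifted by k / n_tilings of one tile in each direction.
        ox = k * tile_w / n_tilings
        oy = k * tile_h / n_tilings
        col = int((x - x_range[0] + ox) // tile_w)
        row = int((y - y_range[0] + oy) // tile_h)
        # Clamp so every tiling keeps exactly tiles_per_dim tiles per axis.
        col = min(max(col, 0), tiles_per_dim - 1)
        row = min(max(row, 0), tiles_per_dim - 1)
        indices.append(k * per_tiling + row * tiles_per_dim + col)
    return indices


def q_value(weights, tiles):
    """Linear value estimate: sum of the weights of the active tiles."""
    return sum(weights[i] for i in tiles)


if __name__ == "__main__":
    n_features = 8 * 8 * 8  # 8 tilings of 8 x 8 tiles
    weights = [0.0] * n_features
    tiles = active_tiles(50.0, 50.0)
    print(tiles, q_value(weights, tiles))
```
</preformat>
<p>Each state activates exactly one tile per tiling, so evaluating or updating the linear value estimate touches only 8 of the 512 weights.</p>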
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Paths followed by the agent in the double-gyre field while estimating the parameters described in <xref ref-type="table" rid="T2">Table 2</xref> at different episodes: <bold>(A)</bold> episode 3, <bold>(B)</bold> episode 7.</p>
</caption>
<graphic xlink:href="frobt-09-878246-g006.tif"/>
</fig>
</sec>
<sec id="s6">
<title>6 Conclusion</title>
<p>In this work, we presented a novel method for estimating a spatio-temporal field using informative samples taken by a trained agent. The method estimates the distribution of the ocean feature and keeps track of its location over time. It also addresses the problem of selecting meaningful samples that help estimate the field. This offers a different perspective on estimation procedures, which have previously been addressed with techniques that rely on pre-defined models to determine <italic>a priori</italic> which samples should be taken.</p>
<p>Moreover, we proposed combining the classical regularization methods used to estimate parameters in partial differential equations with the optimization processes used to carry out those estimates. We merged machine learning techniques, which are more flexible and capable of learning complex patterns from different sources, to choose the sample locations used to track and estimate the ocean feature field.</p>
<sec id="s6-1">
<title>Future work</title>
<p>For future work, we consider extending the proposed method to 3D environments. This can be accomplished by augmenting the vehicle model (state space, action space, observation space) and validating the proposed framework with deployments in aquatic environments such as the Biscayne Bay area, Florida, United States. It is also possible to refine our estimation strategies with cooperative agents: a primary direction for future work is to incorporate a combination of heterogeneous agents in order to provide better estimates of the locations of the ocean feature. In this work, we assumed known initial conditions for a given linear, constant flow field. A second direction for future work is to investigate how effective the proposed estimation framework is for time-varying flow fields and actual oceanic data from the Regional Ocean Modeling System (ROMS) <xref ref-type="bibr" rid="B40">Shchepetkin and McWilliams (2005)</xref>. ROMS provides current velocity prediction data over three spatial dimensions (longitude, latitude, and depth) and time. Finally, tracking oceanic features such as Lagrangian coherent structures (LCS) contributes to a wide range of applications in ocean exploration <xref ref-type="bibr" rid="B20">Hsieh et al. (2012)</xref>. Therefore, an additional direction is to extend the current work towards an efficient method for LCS tracking using machine learning techniques.</p>
</sec>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s8">
<title>Author contributions</title>
<p>LB and RS contributed to the conception and design of the study. PP and JF performed simulation experiments and wrote the first draft of the manuscript. LB also wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>This work is supported in part by NSF grants IIS-2034123 and IIS-2024733, and by the U.S. Dept. of Homeland Security grant 2017-ST-062000002.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Antman</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>Marsden</surname>
<given-names>J. E.</given-names>
</name>
<name>
<surname>Sirovich</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hale</surname>
<given-names>J. K.</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Keener</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2006</year>). <source>Inverse problems for partial differential equations</source>. <edition>Second edn</edition>. <publisher-loc>Germany</publisher-loc>: <publisher-name>Springer</publisher-name>. </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bachmayer</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Humphris</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fornari</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Oceanographic research using remotely operated underwater robotic vehicles: Exploration</article-title>. <source>Mar. Technol. Soc. J.</source> <volume>32</volume>, <fpage>37</fpage>. </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barnett</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>McClaran</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>McDermott</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>1996</year>). &#x201c;<article-title>Architecture of the Texas A&#x26;M autonomous underwater vehicle controller</article-title>,&#x201d; in <source>Proceedings of Symposium on Autonomous Underwater Vehicle Technology</source> <fpage>231</fpage>&#x2013;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1109/AUV.1996.532420</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Bird</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sherman</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ryan</surname>
<given-names>J. P.</given-names>
</name>
</person-group> (<year>2007</year>). &#x201c;<article-title>Development of an active, large volume, discrete seawater sampler for autonomous underwater vehicles</article-title>,&#x201d; in <conf-name>Proc Oceans MTS/IEEE Conference (Vancouver, Canada)</conf-name>, <conf-loc>Vancouver, Canada</conf-loc>, <conf-date>04 October 2007</conf-date>. <pub-id pub-id-type="doi">10.1109/OCEANS.2007.4449303</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bishop</surname>
<given-names>C. M.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Training with noise is equivalent to Tikhonov regularization</article-title>. <source>Neural Comput.</source> <volume>7</volume>, <fpage>108</fpage>&#x2013;<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1995.7.1.108</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bourgeois</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Recoquillay</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A mixed formulation of the Tikhonov regularization and its application to inverse PDE problems</article-title>. <source>ESAIM Math. Model. Numer. Analysis</source> <volume>52</volume>, <fpage>123</fpage>&#x2013;<lpage>145</lpage>. <pub-id pub-id-type="doi">10.1051/m2an/2018008</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Carreras</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Batlle</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ridao</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Roberts</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2000</year>). &#x201c;<article-title>An overview on behaviour-based methods for AUV control</article-title>,&#x201d; in <source>MCMC2000, 5th IFAC Conference</source>. </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Creed</surname>
<given-names>E. L.</given-names>
</name>
<name>
<surname>Kerfoot</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mudgal</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Barrier</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Transition of Slocum electric gliders to a sustained operational system</article-title>. <source>OCEANS &#x2019;04: MTTS/IEEE TECHNO-OCEAN &#x2019;04</source> <volume>2</volume>, <fpage>828</fpage>&#x2013;<lpage>833</lpage>. <pub-id pub-id-type="doi">10.1109/OCEANS.2004.1405565</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Creed</surname>
<given-names>E. L.</given-names>
</name>
<name>
<surname>Mudgal</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Glenn</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schofield</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Webb</surname>
<given-names>D. C.</given-names>
</name>
</person-group> (<year>2002</year>). <source>Using a fleet of Slocum battery gliders in a regional scale coastal ocean observatory</source>. <publisher-loc>Biloxi, MS, USA</publisher-loc>: <publisher-name>Oceans &#x2019;02 MTS/IEEE</publisher-name>. </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cui</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Adaptive neural network control of AUVs with control input nonlinearities using reinforcement learning</article-title>. <source>IEEE Trans. Syst. Man. Cybern. Syst.</source> <volume>47</volume>, <fpage>1019</fpage>&#x2013;<lpage>1029</lpage>. <pub-id pub-id-type="doi">10.1109/tsmc.2016.2645699</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davis</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Ohman</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rudnick</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Sherman</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hodges</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Glider surveillance of physics and biology in the southern California current system</article-title>. <source>Limnol. Oceanogr.</source> <volume>53</volume>, <fpage>2151</fpage>&#x2013;<lpage>2168</lpage>. <pub-id pub-id-type="doi">10.4319/lo.2008.53.5_part_2.2151</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eriksen</surname>
<given-names>C. C.</given-names>
</name>
<name>
<surname>Osse</surname>
<given-names>T. J.</given-names>
</name>
<name>
<surname>Light</surname>
<given-names>R. D.</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lehman</surname>
<given-names>T. W.</given-names>
</name>
<name>
<surname>Sabin</surname>
<given-names>P. L.</given-names>
</name>
<etal/>
</person-group> (<year>2001</year>). <article-title>Seaglider: A long-range autonomous underwater vehicle for oceanographic research</article-title>. <source>IEEE J. Ocean. Eng.</source> <volume>26</volume>, <fpage>424</fpage>&#x2013;<lpage>436</lpage>. <pub-id pub-id-type="doi">10.1109/48.972073</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Evans</surname>
<given-names>L. C.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Partial differential equations</article-title>. <source>Graduate Stud. Math.</source> <volume>19</volume>, <fpage>7</fpage>. </citation>
</ref>
<ref id="B14">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Farahmand</surname>
<given-names>A.-m.</given-names>
</name>
<name>
<surname>Nabi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nikovski</surname>
<given-names>D. N.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Deep reinforcement learning for partial differential equation control</article-title>,&#x201d; in <conf-name>American Control Conference (ACC)</conf-name>, <conf-loc>Seattle, WA, USA</conf-loc>, <conf-date>24-26 May 2017</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>3120</fpage>&#x2013;<lpage>3127</lpage>. </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frank</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>J&#xf3;nsson</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Constraint-based attribute and interval planning</article-title>. <source>Constraints</source> <volume>8</volume>, <fpage>339</fpage>&#x2013;<lpage>364</lpage>. <pub-id pub-id-type="doi">10.1023/a:1025842019552</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frazzoli</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Dahleh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Feron</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Real-time motion planning for agile autonomous vehicles</article-title>. <source>J. Guid. Control Dyn.</source> <volume>25</volume> (<issue>1</issue>), <fpage>116</fpage>&#x2013;<lpage>129</lpage>. <pub-id pub-id-type="doi">10.2514/2.4856</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="thesis">
<person-group person-group-type="author">
<name>
<surname>Graver</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2005</year>). <source>Underwater gliders: Dynamics, control and design</source> (<publisher-loc>Princeton, NJ</publisher-loc>: <publisher-name>Princeton University</publisher-name>). <comment>Ph.D. thesis</comment>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Griffiths</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ferguson</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bose</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Undersea gliders</article-title>. <source>J. Ocean Technol.</source> <volume>2</volume>, <fpage>64</fpage>&#x2013;<lpage>75</lpage>. </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jentzen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>E</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Solving high-dimensional partial differential equations using deep learning</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>115</volume>, <fpage>8505</fpage>&#x2013;<lpage>8510</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1718942115</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Hsieh</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Forgoston</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Mather</surname>
<given-names>T. W.</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>I. B.</given-names>
</name>
</person-group> (<year>2012</year>). &#x201c;<article-title>Robotic manifold tracking of coherent structures in flows</article-title>,&#x201d; in <conf-name>IEEE International Conference on Robotics and Automation</conf-name>, <conf-loc>Saint Paul, MN, USA</conf-loc>, <conf-date>14-18 May 2012</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>4242</fpage>&#x2013;<lpage>4247</lpage>. <pub-id pub-id-type="doi">10.1109/ICRA.2012.6224769</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jamili</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Dua</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Parameter estimation of partial differential equations using artificial neural network</article-title>. <source>Comput. Chem. Eng.</source> <volume>147</volume>, <fpage>107221</fpage>. <pub-id pub-id-type="doi">10.1016/j.compchemeng.2020.107221</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnson</surname>
<given-names>K. S.</given-names>
</name>
<name>
<surname>Needoba</surname>
<given-names>J. A.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Mapping the spatial variability of plankton metabolism using nitrate and oxygen sensors on an autonomous underwater vehicle</article-title>. <source>Limnol. Oceanogr.</source> <volume>53</volume>, <fpage>2237</fpage>&#x2013;<lpage>2250</lpage>. <pub-id pub-id-type="doi">10.4319/lo.2008.53.5_part_2.2237</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jones</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Creed</surname>
<given-names>E. L.</given-names>
</name>
<name>
<surname>Glenn</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kerfoot</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kohut</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mudgal</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2005</year>). &#x201c;<article-title>Slocum gliders - a component of operational oceanography</article-title>,&#x201d; in <source>Autonomous undersea systems institute symposium proceedings</source>. </citation>
</ref>
<ref id="B24">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kincaid</surname>
<given-names>D. R.</given-names>
</name>
<name>
<surname>Cheney</surname>
<given-names>E. W.</given-names>
</name>
</person-group> (<year>2009</year>). <source>Numerical analysis: Mathematics of scientific computing, vol. 2</source>. <publisher-loc>Providence, Rhode Island</publisher-loc>: <publisher-name>American Mathematical Society</publisher-name>. </citation>
</ref>
<ref id="B25">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Low</surname>
<given-names>K. H.</given-names>
</name>
<name>
<surname>Dolan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Khosla</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2009</year>). &#x201c;<article-title>Information-theoretic approach to efficient adaptive path planning for mobile robotic environmental sensing</article-title>,&#x201d; in <conf-name>Proceedings of the 19th international conference on automated planning and scheduling (ICAPS-09)</conf-name>. <comment>arXiv:1305.6129v1</comment>. </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martinsen</surname>
<given-names>A. B.</given-names>
</name>
<name>
<surname>Lekkas</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Gros</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Glomsrud</surname>
<given-names>J. A.</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>T. A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Reinforcement learning-based tracking control of USVs in varying operational conditions</article-title>. <source>Front. Robot. AI</source> <volume>7</volume>, <fpage>32</fpage>. <pub-id pub-id-type="doi">10.3389/frobt.2020.00032</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>McGann</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Py</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rajan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Ryan</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Henthorn</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2008a</year>). &#x201c;<article-title>Adaptive control for autonomous underwater vehicles</article-title>,&#x201d; in <conf-name>AAAI</conf-name> (<publisher-loc>Chicago, IL</publisher-loc>: <publisher-name>AAAI</publisher-name>). </citation>
</ref>
<ref id="B28">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>McGann</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Py</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rajan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Henthorn</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>McEwen</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2008b</year>). &#x201c;<article-title>A deliberative architecture for AUV control</article-title>,&#x201d; in <conf-name>ICRA</conf-name> (<publisher-loc>Pasadena, CA</publisher-loc>: <publisher-name>ICRA</publisher-name>). <pub-id pub-id-type="doi">10.1109/ROBOT.2008.4543343</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>McGann</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Py</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rajan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Henthorn</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>McEwen</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2008c</year>). &#x201c;<article-title>Preliminary results for model-based adaptive control of an autonomous underwater vehicle</article-title>,&#x201d; in <conf-name>Int. Symp. on Experimental Robotics</conf-name>, <conf-loc>Athens</conf-loc>, <conf-date>July 13-16, 2008</conf-date> (<publisher-loc>Athens, Greece</publisher-loc>: <publisher-name>DBLP</publisher-name>). </citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Nair</surname>
<given-names>M. T.</given-names>
</name>
<name>
<surname>Roy</surname>
<given-names>S. D.</given-names>
</name>
</person-group> (<year>2020</year>). <source>A new regularization method for a parameter identification problem in a non-linear partial differential equation</source>, <pub-id pub-id-type="doi">10.22541/au.159138733.37659934</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ng</surname>
<given-names>A. Y.</given-names>
</name>
<name>
<surname>Harada</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>1999</year>). &#x201c;<article-title>Policy invariance under reward transformations: Theory and application to reward shaping</article-title>,&#x201d; in <conf-name>Proceedings of the Sixteenth International Conference on Machine Learning</conf-name>, <conf-date>June 1999</conf-date> (<publisher-loc>Burlington, MA, USA</publisher-loc>: <publisher-name>Morgan Kaufmann</publisher-name>), <fpage>278</fpage>&#x2013;<lpage>287</lpage>. </citation>
</ref>
<ref id="B32">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Padrao</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Dominguez</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bobadilla</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>R. N.</given-names>
</name>
</person-group> (<year>2022</year>). <source>Towards learning ocean models for long-term navigation in dynamic environments</source>. <publisher-loc>Chennai</publisher-loc>: <publisher-name>OCEANS 2022</publisher-name>, <fpage>1</fpage>&#x2013;<lpage>8</lpage>. </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paley</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Leonard</surname>
<given-names>N. E.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Cooperative control for ocean sampling: The glider coordinated control system</article-title>. <source>IEEE Trans. Control Syst. Technol.</source> <volume>16</volume>, <fpage>735</fpage>&#x2013;<lpage>744</lpage>. <pub-id pub-id-type="doi">10.1109/TCST.2007.912238</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Provost</surname>
<given-names>C. L.</given-names>
</name>
<name>
<surname>Verron</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1987</year>). <article-title>Wind-driven ocean circulation transition to barotropic instability</article-title>. <source>Dyn. Atmos. Oceans</source> <volume>11</volume>, <fpage>175</fpage>&#x2013;<lpage>201</lpage>. <pub-id pub-id-type="doi">10.1016/0377-0265(87)90005-4</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Aster</surname>
<given-names>R. C.</given-names>
</name>
<name>
<surname>Borchers</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Thurber</surname>
<given-names>C. H.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Parameter estimation and inverse problems</source>. <edition>3rd Edn</edition>. </citation>
</ref>
<ref id="B36">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ridao</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Yuh</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Batlle</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sugihara</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2000</year>). <source>On AUV control architecture</source>. <publisher-loc>Kyoto, Japan</publisher-loc>: <publisher-name>IEEE IROS</publisher-name>. </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosenblatt</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Durrant-Whyte</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>A behavior-based architecture for autonomous underwater exploration</article-title>. <source>Inf. Sci. (N. Y).</source> <volume>145</volume>, <fpage>69</fpage>&#x2013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1016/s0020-0255(02)00224-4</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="editor">
<name>
<surname>Rudnick</surname>
<given-names>D. L.</given-names>
</name>
<name>
<surname>Perry</surname>
<given-names>M.</given-names>
</name>
</person-group> (Editors) (<year>2003</year>). <source>ALPS: Autonomous and Lagrangian platforms and sensors</source>. <comment>Workshop Report</comment>, <fpage>64</fpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.geo-prose.com/ALPS">www.geo-prose.com/ALPS</ext-link></comment>. </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shadden</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lekien</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Marsden</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Definition and properties of Lagrangian coherent structures from finite-time Lyapunov exponents in two-dimensional aperiodic flows</article-title>. <source>Phys. D. Nonlinear Phenom.</source> <volume>212</volume>, <fpage>271</fpage>&#x2013;<lpage>304</lpage>. <pub-id pub-id-type="doi">10.1016/j.physd.2005.10.007</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shchepetkin</surname>
<given-names>A. F.</given-names>
</name>
<name>
<surname>McWilliams</surname>
<given-names>J. C.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>The regional oceanic modeling system (ROMS): A split-explicit, free-surface, topography-following-coordinate oceanic model</article-title>. <source>Ocean Model.</source> <volume>9</volume>, <fpage>347</fpage>&#x2013;<lpage>404</lpage>. <pub-id pub-id-type="doi">10.1016/j.ocemod.2004.08.002</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Medjo</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>On a wind-driven, double-gyre, quasi-geostrophic ocean model: Numerical simulations and structural analysis</article-title>. <source>J. Comput. Phys.</source> <volume>155</volume>, <fpage>387</fpage>&#x2013;<lpage>409</lpage>. <pub-id pub-id-type="doi">10.1006/jcph.1999.6344</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sherman</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Owens</surname>
<given-names>W. B.</given-names>
</name>
<name>
<surname>Valdes</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>The autonomous underwater glider &#x201c;Spray&#x201d;</article-title>. <source>IEEE J. Ocean. Eng.</source> <volume>26</volume>, <fpage>437</fpage>&#x2013;<lpage>446</lpage>. <pub-id pub-id-type="doi">10.1109/48.972076</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sherstov</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Stone</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2005</year>). &#x201c;<article-title>Function approximation via tile coding: Automating parameter choice</article-title>,&#x201d; in <source>Abstraction, reformulation and approximation</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Zucker</surname>
<given-names>J.-D.</given-names>
</name>
<name>
<surname>Saitta</surname>
<given-names>L.</given-names>
</name>
</person-group> (<publisher-loc>New York</publisher-loc>: <publisher-name>Springer Berlin Heidelberg</publisher-name>), <fpage>194</fpage>&#x2013;<lpage>205</lpage>. </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Singh</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yoerger</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bradley</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Issues in AUV design and deployment for oceanographic research</article-title>. <source>Proc. 1997 IEEE Int. Conf. Robotics Automation</source> <volume>3</volume>, <fpage>1857</fpage>&#x2013;<lpage>1862</lpage>. <pub-id pub-id-type="doi">10.1109/ROBOT.1997.619058</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Singh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Kaelbling</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Reinforcement learning with replacing eligibility traces</article-title>. <source>Mach. Learn.</source> <volume>22</volume>, <fpage>123</fpage>&#x2013;<lpage>158</lpage>. <pub-id pub-id-type="doi">10.1023/A:1018012322525</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>R. N.</given-names>
</name>
<name>
<surname>Chao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P. P.</given-names>
</name>
<name>
<surname>Caron</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>B. H.</given-names>
</name>
<name>
<surname>Sukhatme</surname>
<given-names>G. S.</given-names>
</name>
</person-group> (<year>2010a</year>). <article-title>Planning and implementing trajectories for autonomous underwater vehicles to track evolving ocean processes based on predictions from a regional ocean model</article-title>. <source>Int. J. Rob. Res.</source> <volume>29</volume>, <fpage>1475</fpage>&#x2013;<lpage>1497</lpage>. <pub-id pub-id-type="doi">10.1177/0278364910377243</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>R. N.</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Heidarsson</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Pereira</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Cetini&#x107;</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Darjany</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2010b</year>). <article-title>USC CINAPS builds bridges: Observing and monitoring the Southern California Bight</article-title>. <source>IEEE Robot. Autom. Mag.</source> <volume>17</volume>, <fpage>20</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1109/mra.2010.935795</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>R. N.</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yi</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Caron</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>B. H.</given-names>
</name>
<name>
<surname>Sukhatme</surname>
<given-names>G. S.</given-names>
</name>
</person-group> (<year>2010c</year>). &#x201c;<article-title>Cooperative multi-AUV tracking of phytoplankton blooms based on ocean model predictions</article-title>,&#x201d; in <source>MTS/IEEE OCEANS 2010</source> (<publisher-loc>Sydney, Australia</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>10</lpage>. </citation>
</ref>
<ref id="B49">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>R. N.</given-names>
</name>
<name>
<surname>Heckman</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sibley</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Hsieh</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>A representative modeling approach to sampling dynamic ocean structures</article-title>,&#x201d; in <source>Symposium on marine robotics - broadening horizons with inter-disciplinary science &#x26; engineering</source> (<conf-loc>Horta, Faial Island, Azores, Portugal</conf-loc>). Editors <person-group person-group-type="editor">
<name>
<surname>Pascoal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Rajan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Sousa</surname>
<given-names>J.</given-names>
</name>
</person-group>. </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>R. N.</given-names>
</name>
<name>
<surname>Schwager</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>S. L.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>B. H.</given-names>
</name>
<name>
<surname>Rus</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Sukhatme</surname>
<given-names>G. S.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Persistent ocean monitoring with underwater gliders: Adapting sampling resolution</article-title>. <source>J. Field Robot.</source> <volume>28</volume>, <fpage>714</fpage>&#x2013;<lpage>741</lpage>. <pub-id pub-id-type="doi">10.1002/rob.20405</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sutton</surname>
<given-names>R. S.</given-names>
</name>
<name>
<surname>Barto</surname>
<given-names>A. G.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Reinforcement learning: An introduction</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>MIT Press</publisher-name>. </citation>
</ref>
<ref id="B52">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Turner</surname>
<given-names>R. M.</given-names>
</name>
<name>
<surname>Stevenson</surname>
<given-names>R. A. G.</given-names>
</name>
</person-group> (<year>1991</year>). &#x201c;<article-title>Orca: An adaptive, context-sensitive reasoner for controlling AUVs</article-title>,&#x201d; in <source>Proceedings of the 7th international symposium on unmanned untethered submersible technology</source> (<publisher-loc>Umhlanga, South Africa</publisher-loc>: <publisher-name>UUST</publisher-name>). </citation>
</ref>
<ref id="B53">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Whitcomb</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yoerger</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Howland</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1999</year>). &#x201c;<article-title>Advances in underwater robot vehicles for deep ocean exploration: Navigation, control, and survey operations</article-title>,&#x201d; in <source>Proceedings of the ninth international symposium of robotics research</source> (<publisher-loc>London</publisher-loc>: <publisher-name>Springer-Verlag Publications</publisher-name>). </citation>
</ref>
<ref id="B54">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Whitcomb</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yoerger</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Mindell</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>1998</year>). &#x201c;<article-title>Towards precision robotic maneuvering, survey, and manipulation in unstructured undersea environments</article-title>,&#x201d; in <source>Robotics research - the eighth international symposium</source> (<publisher-loc>London</publisher-loc>: <publisher-name>Springer-Verlag Publications</publisher-name>), <fpage>45</fpage>&#x2013;<lpage>54</lpage>. </citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wolligandt</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wilde</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Roessl</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Theisel</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A modified double gyre with ground truth hyperbolic trajectories for flow visualization</article-title>. <source>Comput. Graph. Forum</source> <volume>40</volume>, <fpage>209</fpage>&#x2013;<lpage>221</lpage>. <pub-id pub-id-type="doi">10.1111/cgf.14183</pub-id> </citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xun</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mallick</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Carroll</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Maity</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Parameter estimation of partial differential equation models</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>108</volume>, <fpage>1009</fpage>&#x2013;<lpage>1020</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.2013.794730</pub-id> </citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yoerger</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Slotine</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1985</year>). <article-title>Robust trajectory control of underwater vehicles</article-title>. <source>IEEE J. Ocean. Eng.</source> <volume>10</volume> (<issue>4</issue>), <fpage>462</fpage>&#x2013;<lpage>470</lpage>. <pub-id pub-id-type="doi">10.1109/joe.1985.1145131</pub-id> </citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yoo</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Path optimization for marine vehicles in ocean currents using reinforcement learning</article-title>. <source>J. Mar. Sci. Technol.</source> <volume>21</volume>, <fpage>334</fpage>&#x2013;<lpage>343</lpage>. <pub-id pub-id-type="doi">10.1007/s00773-015-0355-9</pub-id> </citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuh</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Design and control of autonomous underwater robots: A survey</article-title>. <source>Auton. Robots</source> <volume>8</volume>, <fpage>7</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1023/a:1008984701078</pub-id> </citation>
</ref>
<ref id="B60">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Sukhatme</surname>
<given-names>G. S.</given-names>
</name>
</person-group> (<year>2008</year>). &#x201c;<article-title>Adaptive sampling with multiple mobile robots</article-title>,&#x201d; in <source>IEEE international conference on robotics and automation</source>. </citation>
</ref>
</ref-list>
</back>
</article>