<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurorobot.</journal-id>
<journal-title>Frontiers in Neurorobotics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurorobot.</abbrev-journal-title>
<issn pub-type="epub">1662-5218</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnbot.2023.1244417</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Simulating human walking: a model-based reinforcement learning approach with musculoskeletal modeling</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Su</surname> <given-names>Binbin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1486671/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Gutierrez-Farewik</surname> <given-names>Elena M.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/494779/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>KTH MoveAbility Lab, Department of Engineering Mechanics, KTH Royal Institute of Technology</institution>, <addr-line>Stockholm</addr-line>, <country>Sweden</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Women&#x00027;s and Children&#x00027;s Health, Karolinska Institutet</institution>, <addr-line>Stockholm</addr-line>, <country>Sweden</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Massimo Sartori, University of Twente, Netherlands</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Tetsuro Funato, The University of Electro-Communications, Japan; Josep M. Font-Llagunes, Universitat Politecnica de Catalunya, Spain; Chongben Tao, Suzhou University of Science and Technology, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Elena M. Gutierrez-Farewik <email>lanie&#x00040;kth.se</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>12</day>
<month>10</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>17</volume>
<elocation-id>1244417</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>06</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>09</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Su and Gutierrez-Farewik.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Su and Gutierrez-Farewik</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>Recent advancements in reinforcement learning algorithms have accelerated the development of control models with high-dimensional inputs and outputs that can reproduce human movement. However, the produced motion tends to be less human-like if algorithms do not involve a biomechanical human model that accounts for skeletal and muscle-tendon properties and geometry. In this study, we have integrated a reinforcement learning algorithm and a musculoskeletal model including trunk, pelvis, and leg segments to develop control modes that drive the model to walk.</p></sec>
<sec>
<title>Methods</title>
<p>We first simulated human walking without imposing a target walking speed, allowing the model to settle on a stable walking speed of its own, which was 1.45 <italic>m</italic>/<italic>s</italic>. A range of other target speeds, based on this self-selected walking speed, was then imposed in the simulations. All simulations were generated by solving the Markov decision process problem with the covariance matrix adaptation evolution strategy, without any reference motion data.</p></sec>
<sec>
<title>Results</title>
<p>Simulated hip and knee kinematics agreed well with those in experimental observations, but ankle kinematics were less well-predicted.</p></sec>
<sec>
<title>Discussion</title>
<p>We finally demonstrated that our reinforcement learning framework also has the potential to model and predict pathological gait that can result from muscle weakness.</p></sec></abstract>
<kwd-group>
<kwd>human and humanoid motion analysis</kwd>
<kwd>motion synthesis</kwd>
<kwd>optimization</kwd>
<kwd>optimal control</kwd>
<kwd>kinematics</kwd>
<kwd>CMA-ES</kwd>
<kwd>reflex-based control</kwd>
</kwd-group>
<contract-sponsor id="cn001">Stiftelsen Promobilia<named-content content-type="fundref-id">10.13039/100009389</named-content></contract-sponsor>
<contract-sponsor id="cn002">Vetenskapsr&#x000E5;det<named-content content-type="fundref-id">10.13039/501100004359</named-content></contract-sponsor>
<counts>
<fig-count count="10"/>
<table-count count="0"/>
<equation-count count="2"/>
<ref-count count="45"/>
<page-count count="12"/>
<word-count count="7875"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>In recent years, reinforcement learning (RL) (Sutton and Barto, <xref ref-type="bibr" rid="B37">2018</xref>) has emerged as a promising approach for motion synthesis, such as human walking, in which an agent learns to adapt its behavior by interacting with the environment. Many optimization techniques used to develop controllers for simulated locomotion are based on reinforcement learning. RL algorithms have been applied in several studies to develop torque-driven control for physically simulated articulated models. Peng et al. (<xref ref-type="bibr" rid="B26">2018</xref>) used RL to generate a set of human movements including walking and running. Schulman et al. (<xref ref-type="bibr" rid="B29">2015</xref>) simulated dynamic gaits using high-dimensional, general-purpose neural network function approximators for both the policy and the value function in a variety of robot models. Duan et al. (<xref ref-type="bibr" rid="B9">2016</xref>) presented a benchmark suite of continuous control tasks, including a simple humanoid, to systematically evaluate the effectiveness of RL algorithms in training deep neural network policies. Even though RL algorithms can successfully develop controllers capable of performing a versatile set of locomotion tasks, the resulting behaviors generally appear less natural than normal human movements (Heess et al., <xref ref-type="bibr" rid="B17">2017</xref>; Rajeswaran et al., <xref ref-type="bibr" rid="B28">2017</xref>; Peng et al., <xref ref-type="bibr" rid="B26">2018</xref>). More specifically, controllers trained with RL have exhibited large upper body motion, abnormal gaits, and unrealistic body posture (Heess et al., <xref ref-type="bibr" rid="B17">2017</xref>). One reason is the absence of biomechanical models that take into account the excitation and contraction of the muscles, the geometry and inertial properties of the body segments, and the external forces from the environment. 
Musculoskeletal models (Zajac, <xref ref-type="bibr" rid="B45">1989</xref>; Thelen et al., <xref ref-type="bibr" rid="B41">2003</xref>) represent a sophisticated dynamical system composed of bones as articulating rigid bodies and muscles as actuators. These models often account for the neural excitation of muscles and also muscle contraction dynamics, which are determined by muscles&#x00027; optimal lengths, shortening/lengthening velocities, and activations. Muscle contraction generates muscle force, which is then transmitted to the bone through a compliant tendon. Muscle forces produce joint torques on the body segments, thus generating human motion.</p>
<p>Musculoskeletal models are mainly driven by one of three locomotion control frameworks: trajectory tracking (Neilson and Neilson, <xref ref-type="bibr" rid="B23">1999</xref>; Fey et al., <xref ref-type="bibr" rid="B13">2012</xref>), optimal control (Pandy et al., <xref ref-type="bibr" rid="B25">1995</xref>; Suzuki, <xref ref-type="bibr" rid="B38">2010</xref>) and reflex-based control (Geyer and Herr, <xref ref-type="bibr" rid="B14">2010</xref>). In trajectory tracking, the controller solves an optimization problem that minimizes the squared error between simulated and predefined trajectories and the squared muscle activation over a specific time interval, outputting the muscle activations required to mimic the predefined movement (Silverman and Neptune, <xref ref-type="bibr" rid="B33">2012</xref>). Computed muscle control is a popular approach to estimate the muscle activations that generate motion tracking a desired trajectory (e.g., joint angles from experimental motion capture) (Thelen et al., <xref ref-type="bibr" rid="B41">2003</xref>). However, the method merely reproduces the predefined trajectory and cannot predict responses to new inputs. In optimal control, the controller solves the optimization problem by minimizing a specific cost function (e.g., metabolic energy expenditure or summed muscle activations) while achieving a task objective such as a steady-state gait. This control method does not depend on experimental data but requires sufficient domain knowledge to craft a cost function that represents natural human movement, and it can also be computationally expensive (Anderson and Pandy, <xref ref-type="bibr" rid="B3">2001</xref>). 
Performance criteria in predictive simulations with complex musculoskeletal models are frequently based on minimization of energy (Minetti et al., <xref ref-type="bibr" rid="B22">1994</xref>) or muscle activity (Miller et al., <xref ref-type="bibr" rid="B21">2012</xref>), but which criterion best represents reality remains unclear, and its formulation may vary depending on the musculoskeletal model. Recent reviews of predictive simulations of human movement describe both the potential and the challenges of realistic application to pathological motion (De Groote and Falisse, <xref ref-type="bibr" rid="B6">2021</xref>). Novel applications of optimal control are emerging to predict optimal orthosis properties for persons with gait pathology (Febrer Nafr&#x000ED;a et al., <xref ref-type="bibr" rid="B12">2022</xref>). In reflex-based control, the controller determines muscle activations through hypothesized reflex pathways to generate joint torques that drive the musculoskeletal model, mimicking human gait while optimizing a cost function (e.g., minimal metabolic cost or maximal walking distance). The muscle excitation is associated with computed muscle length or muscle force feedback, while the reflex pathways accommodate leg mechanics to prevent joint hyperextension and maintain gait stability (Seyfarth et al., <xref ref-type="bibr" rid="B31">2001</xref>; G&#x000FC;nther et al., <xref ref-type="bibr" rid="B15">2004</xref>). The reflex-based model does not require input from a predefined movement and can perform a natural walking motion through the interaction of the muscles and reflex pathways with the physics-based environment. Geyer and Herr (<xref ref-type="bibr" rid="B14">2010</xref>) presented a 2D human model controlled by reflexes that can perform stable walking through interaction with the ground, tolerating ground disturbances and adapting to slopes without parameter intervention. 
Their approach can also predict some individual muscle activation patterns observed in experimental data. The model was further extended to a 3D locomotion study that compared neural controls for 3D-related motions by adding degrees of freedom at the hips in the frontal plane (Song and Geyer, <xref ref-type="bibr" rid="B34">2013</xref>). Song and Geyer (<xref ref-type="bibr" rid="B35">2015</xref>) further developed this model by incorporating a higher-layer, longer-latency control that can alter some of the reflex gains. The added layer can adjust the desired foot placements and identify which leg to switch into swing control during double support. In a similar manner, Eilenberg et al. (<xref ref-type="bibr" rid="B10">2010</xref>) used an adaptive muscle-reflex controller for powered ankle-foot prostheses to adapt to environmental disturbances such as speed transients and terrain variation. Clinical trials have been successfully conducted with a transtibial amputee walking on level ground and in ramp ascent and ramp descent conditions. Thatte et al. (<xref ref-type="bibr" rid="B40">2018</xref>) implemented a reflex-based control policy on five subjects walking with a powered knee and ankle prosthesis and found that the level-ground walking torque and angle profiles from the prosthesis are similar to those of a weight- and height-matched subject with intact limbs. Sharbafi et al. (<xref ref-type="bibr" rid="B32">2018</xref>) developed a control algorithm for an exoskeleton with one biarticular actuator based on a reflex-based human walking model that employs leg force to adjust hip compliance.</p>
<p>This study aims to use model-based RL methods to develop control modes that can produce realistic human walking in a musculoskeletal model driven by 18 muscle-tendon units. The main novel contributions of this study are the RL-based approach to solving for the control parameters in a complex musculoskeletal model and the design of reward functions that can generate stable walking gaits reasonably similar to those of able-bodied persons in terms of joint kinematics and muscle activation patterns, at different walking speeds and even with muscle weakness. While inspired by the work of Song and Geyer (<xref ref-type="bibr" rid="B35">2015</xref>), we apply a different reward function, introducing a pelvis component that encourages the model to walk naturally with a reasonable pelvis tilt angle. We formulated the human walking problem as a standard Markov decision process (MDP). We modeled the policy with a reflex-based controller that outputs the muscle excitations that eventually activate the muscles. The MDP problem was solved and the control parameters were optimized with the derivative-free covariance matrix adaptation evolution strategy (CMA-ES). We then generated gait without imposing a target walking speed, i.e., allowing the model to settle on a stable walking speed. We also generated gait with a range of imposed target walking speeds. All simulations were performed without reference data from motion capture. We finally demonstrated the model&#x00027;s potential to predict pathological gaits, in this case, gait that may result from muscle weakness.</p></sec>
<sec sec-type="methods" id="s2">
<title>2. Methods</title>
<p>We used the OpenSim-RL platform (Kidzi&#x00144;ski et al., <xref ref-type="bibr" rid="B20">2018</xref>), which integrates OpenSim (Delp et al., <xref ref-type="bibr" rid="B7">2007</xref>) and OpenAI Gym (Brockman et al., <xref ref-type="bibr" rid="B5">2016</xref>), to simulate muscle-driven forward movement in a physics-based simulation environment. Experimental data were collected and used solely for comparison with simulation outcomes.</p>
<sec>
<title>2.1. Reinforcement learning</title>
<p>The goal of reinforcement learning is to train an agent to complete a task. The agent receives observations and a reward from the environment and sends actions back to the environment. In the current study, the environment is a musculoskeletal model that has 9 joint degrees of freedom and 18 muscles (<xref ref-type="fig" rid="F1">Figure 1</xref>). The observations contain movement information such as joint position, velocity, ground contact, etc. The actions are the muscle excitations of each muscle. The agent contains two components: a policy and a learning algorithm. The policy produces actions based on the observations from the environment. The learning algorithm continuously updates the policy parameters based on the actions, observations, and reward.</p>
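<p>As an illustration only, the interaction loop described above can be sketched in a few lines of Python. The environment below is a hypothetical toy stand-in, not the OpenSim-RL interface: the names <monospace>ToyEnvironment</monospace>, <monospace>policy</monospace>, and <monospace>rollout</monospace>, the placeholder observation, and the placeholder reward are all assumptions made for this sketch. Only the number of muscles (18) is taken from the text.</p>
<preformat>
```python
class ToyEnvironment:
    """Hypothetical stand-in for the musculoskeletal simulation (not OpenSim-RL)."""

    def __init__(self, n_muscles=18, max_steps=100):
        self.n_muscles = n_muscles
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 4  # placeholder observation vector

    def step(self, action):
        # One excitation per muscle is expected.
        assert len(action) == self.n_muscles
        self.t += 1
        # Placeholder reward: survival bonus minus a squared-excitation cost.
        reward = 0.01 - 0.001 * sum(e * e for e in action)
        observation = [float(self.t)] * 4
        done = self.t >= self.max_steps
        return observation, reward, done


def policy(observation, params):
    """Return muscle excitations clipped to [0, 1].

    This toy policy ignores the observation; a reflex-based controller
    would compute the excitations from it.
    """
    return [min(1.0, max(0.0, p)) for p in params]


def rollout(env, params):
    """Run one episode and return the accumulated reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation, params)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward
```
</preformat>
<p>In this sketch, a learning algorithm would call <monospace>rollout</monospace> repeatedly with different policy parameters and keep those that yield a higher accumulated reward.</p>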
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>An illustration of the RL flow.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0001.tif"/>
</fig></sec>
<sec>
<title>2.2. Musculoskeletal model</title>
<p>The musculoskeletal model in this paper is a simplified 2D model adapted from Delp et al. (<xref ref-type="bibr" rid="B8">1990</xref>), with 6 internal degrees of freedom (flexion/extension at the hips, the knees, and the ankles) and 18 Hill-type muscle-tendon units, nine per leg: iliopsoas (ILPSO), gluteus maximus (GMAX), hamstrings (HAM), rectus femoris (RF), vasti (VAS), biceps femoris short head (BFSH), gastrocnemius (GAS), soleus (SOL), and tibialis anterior (TA). Muscle parameters and moment arms follow Delp et al. (<xref ref-type="bibr" rid="B8">1990</xref>), and tendons were assumed to be non-compliant. OpenSim&#x00027;s forward-dynamics approach was used, in which the musculoskeletal system takes muscle excitations as inputs and outputs the body motions (<italic>q</italic>, <inline-formula><mml:math id="M1"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>.</mml:mo></mml:mover></mml:math></inline-formula>, and <inline-formula><mml:math id="M2"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>&#x000A8;</mml:mo></mml:mover></mml:math></inline-formula>) (<xref ref-type="fig" rid="F2">Figure 2</xref>). Since muscle cannot activate or relax instantaneously, there is a delay between muscle excitation, muscle activation, and the development of muscle force. This delay is modeled by the model&#x00027;s activation dynamics (Zajac, <xref ref-type="bibr" rid="B45">1989</xref>). The musculotendon dynamics (Anderson and Pandy, <xref ref-type="bibr" rid="B2">1993</xref>) describe the translation of muscle activation to muscle force. The musculoskeletal geometry determines the muscles&#x00027; moment arms. Joint moments are determined from muscle forces and moment arms. Finally, through multibody dynamics (Kane and Levinson, <xref ref-type="bibr" rid="B18">1983</xref>), accelerations, velocities, and angles for each joint are computed.</p>
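<p>The delay between excitation and activation can be illustrated with a commonly used first-order form of activation dynamics, in which the activation relaxes toward the excitation with separate time constants for activation and deactivation. The sketch below is a simplified assumption for illustration and is not necessarily the formulation used in OpenSim or in this study. The function names, the time constants (10 ms and 40 ms), and the 1 ms integration step are hypothetical choices.</p>
<preformat>
```python
def activation_step(a, e, dt, tau_act=0.010, tau_deact=0.040):
    """Advance activation a toward excitation e by one explicit Euler step.

    The time constants are assumed values: activation is taken to be
    faster (tau_act) than deactivation (tau_deact).
    """
    tau = tau_act if e > a else tau_deact
    return a + dt * (e - a) / tau


def simulate_activation(excitations, dt=0.001, a0=0.0):
    """Return the activation trace for a sequence of excitation samples."""
    a = a0
    trace = []
    for e in excitations:
        a = activation_step(a, e, dt)
        trace.append(a)
    return trace
```
</preformat>
<p>For a step in excitation from 0 to 1, the activation in this sketch rises gradually rather than instantaneously, and it falls more slowly once the excitation returns to 0.</p>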
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Forward dynamics that depict the production of human movement. <italic>q</italic>, <inline-formula><mml:math id="M3"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>.</mml:mo></mml:mover></mml:math></inline-formula>, and <inline-formula><mml:math id="M4"><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>&#x000A8;</mml:mo></mml:mover></mml:math></inline-formula> are vectors of the generalized coordinates, velocities, and accelerations, respectively.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0002.tif"/>
</fig></sec>
<sec>
<title>2.3. Markov decision processes (MDP) formulation</title>
<p>We formulated the forward-dynamics simulation of the musculoskeletal model as an MDP <inline-formula><mml:math id="M5"><mml:mrow><mml:mo>&#x0003C;</mml:mo><mml:mi mathvariant="script">S</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="script">A</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="script">R</mml:mi><mml:mo>&#x0003E;</mml:mo></mml:mrow></mml:math></inline-formula> which consists of the set of body states <inline-formula><mml:math id="M6"><mml:mrow><mml:mi mathvariant="script">S</mml:mi></mml:mrow></mml:math></inline-formula>, possible muscle excitation <inline-formula><mml:math id="M7"><mml:mrow><mml:mi mathvariant="script">A</mml:mi></mml:mrow></mml:math></inline-formula>, and the expected rewards <inline-formula><mml:math id="M8"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="script">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> received when going from state <italic>s</italic> to <italic>s</italic>&#x02032; (<inline-formula><mml:math id="M9"><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="script">S</mml:mi></mml:mrow></mml:math></inline-formula>) after performing action <italic>a</italic>. The body states <inline-formula><mml:math id="M10"><mml:mrow><mml:mi mathvariant="script">S</mml:mi></mml:mrow></mml:math></inline-formula> are joint position, velocity, ground contact, etc. We considered that the agent accumulates rewards through interacting with the environment. 
The agent chooses its actions according to a deterministic policy <inline-formula><mml:math id="M11"><mml:mi>&#x003C0;</mml:mi><mml:mo>:</mml:mo><mml:mrow><mml:mi mathvariant="script">S</mml:mi></mml:mrow><mml:mo>&#x02192;</mml:mo><mml:mrow><mml:mi mathvariant="script">A</mml:mi></mml:mrow></mml:math></inline-formula> which indicates how action <italic>a</italic> is chosen in state <italic>s</italic>. Our goal was to find an optimal policy <inline-formula><mml:math id="M12"><mml:mrow><mml:mi mathvariant="script">&#x003C0;</mml:mi></mml:mrow></mml:math></inline-formula> such that the expected future reward is maximized. In our case, the policy was modeled by a gait controller based on a reflex-based framework (Song and Geyer, <xref ref-type="bibr" rid="B35">2015</xref>) for human locomotion that maps the body states to muscle excitations.</p></sec>
<sec>
<title>2.4. Reward design</title>
<p>A forward-dynamics simulation was run by integrating the musculoskeletal model&#x00027;s dynamic equations starting from a user-specified initial state. Muscle states were set by equilibrating the force between the muscle and tendon at an activation based on the excitations calculated by the gait controller. New states were then computed at small time intervals (0.01 s) by numerical integration until the desired simulation time was reached or the pelvis of the human model fell below 0.6 m. During simulation, the agent gathered survival rewards (<italic>R</italic><sub><italic>alive</italic></sub>) and footstep rewards (<italic>R</italic><sub><italic>steps</italic></sub>). The total reward is high when the human model locomotes at the desired velocity with minimum muscle effort and pelvis tilt.
<disp-formula id="E1"><label>(1)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>i</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>i</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>t</italic> is the total number of simulation timesteps, <italic>r</italic><sub><italic>alive</italic></sub> is the survival reward for each timestep <italic>i</italic> [i.e., <italic>R</italic><sub><italic>alive</italic></sub> accumulates over the 0.01 s timesteps for as long as the simulation does not fail], <italic>j</italic> indexes the footsteps, and <italic>w</italic><sub><italic>steps</italic></sub>, <italic>w</italic><sub><italic>vel</italic></sub>, <italic>w</italic><sub><italic>pel</italic></sub>, <italic>w</italic><sub><italic>mul</italic></sub> are the weights for the step reward and the velocity, pelvis tilt, and muscle effort costs, respectively. The values of these weights were determined by trial and error and finally set to 10, 60, 20, and 1, respectively. The survival reward <italic>R</italic><sub><italic>alive</italic></sub> encourages the model to search for solutions to stay alive throughout the simulation. The footstep reward <italic>R</italic><sub><italic>steps</italic></sub> evaluates gait behaviors within footsteps rather than at discrete instances of time, for example, to allow the model&#x00027;s walking speed to vary within a footstep, similar to how humans walk. Specifically, <italic>r</italic><sub><italic>steps</italic></sub> was designed to encourage the model to take footsteps but not unnecessarily small steps. <italic>J</italic><sub><italic>vel</italic></sub> penalizes movements that deviate from the target speed. <italic>J</italic><sub><italic>pel</italic></sub> penalizes large pelvis tilt during locomotion. <italic>J</italic><sub><italic>mul</italic></sub> penalizes muscle excitations, encouraging a more efficient distribution of load among the muscles. Thus, the rewards and costs within footsteps are defined as:
<disp-formula id="E2"><label>(2)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mi>&#x00394;</mml:mi><mml:mi>t</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>g</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mi>&#x00394;</mml:mi><mml:mi>t</mml:mi><mml:mo>|</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mi>&#x00394;</mml:mi><mml:mi>t</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" 
accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>u</mml:mi><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mi>&#x00394;</mml:mi><mml:mi>t</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>t</italic><sub><italic>step</italic></sub> is the number of simulation timesteps in one footstep, <italic>ii</italic> is the timestep index within one footstep, &#x00394;<italic>t</italic> is the simulation interval of 0.01 <italic>s</italic>, <italic>v</italic><sub><italic>pel</italic></sub> and <italic>v</italic><sub><italic>tgt</italic></sub> are the velocity of the pelvis and the target velocity, respectively, &#x003B8;<sub><italic>pel</italic></sub> is the pelvis tilt, and <italic>e</italic><sub><italic>k</italic></sub> is the muscle excitation of the <italic>k</italic>th muscle.</p></sec>
<sec>
<title>2.5. Covariance matrix adaptation evolution strategy (CMA-ES)</title>
<p>We used CMA-ES (Hansen et al., <xref ref-type="bibr" rid="B16">2003</xref>), which represents the population by a full-covariance multivariate Gaussian, to solve the MDP problem, which comprised 37 control parameters for the gait controller and 12 parameters for the model&#x00027;s initial states. The control parameters are the target angles of the trunk, knee, and ankle, and the force feedback, length feedback, velocity feedback, proportional-derivative feedback, and co-stimulation of the muscles. The 12 initial states are the forward speed, rightward speed, pelvis height, and trunk lean angle, together with hip abduction/adduction, hip flexion/extension, knee flexion/extension, and ankle dorsiflexion/plantarflexion for both sides (Song and Geyer, <xref ref-type="bibr" rid="B35">2015</xref>). We set the population size to 16 for each generation and ran 1,000 generations in every trial. The parameters were updated after every 16 simulations whenever a higher reward was encountered. In every generation, CMA-ES generated 16 simulations with different values of the control parameters in parallel. Based on the highest reward achieved in these simulations, the control parameters were then seeded for the next 16 simulations. The number of generations was set large enough that the MDP problem could usually be solved by the end of each trial. To accelerate the optimization process, we set up an E2 virtual machine with 8 vCPUs and 32 GB memory on the Google Cloud Platform (GCP) to run parallel optimizations with the same initial parameters. The best solution from the previous generation was used to seed the next generation of optimizations (<xref ref-type="table" rid="T1">Algorithm 1</xref>).</p>
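<p>The generation loop described in this section can be sketched as follows. This is a simplified, illustrative stand-in rather than the implementation used in the study, and it is not a full CMA-ES: it samples 16 candidates per generation from an isotropic Gaussian around the best parameter vector found so far and re-seeds only when a higher reward is found, whereas CMA-ES additionally adapts a full covariance matrix and a step size. The function names, the step size, the number of generations, and the toy reward are all hypothetical; in the study, the reward would come from a musculoskeletal gait simulation.</p>
<preformat>
```python
import random


def toy_reward(params):
    """Hypothetical stand-in for one gait simulation: peak reward 0 at all-ones."""
    return -sum((p - 1.0) ** 2 for p in params)


def elitist_search(reward_fn, n_params, pop_size=16, generations=200,
                   sigma=0.3, seed=0):
    """Sample pop_size candidates per generation around the best vector so far
    and re-seed the next generation only when a higher reward is found."""
    rng = random.Random(seed)
    best = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]  # random initial guess
    best_reward = reward_fn(best)
    for _ in range(generations):
        population = [[b + rng.gauss(0.0, sigma) for b in best]
                      for _ in range(pop_size)]
        rewards = [reward_fn(candidate) for candidate in population]
        top = max(range(pop_size), key=rewards.__getitem__)
        if rewards[top] > best_reward:  # update only on improvement
            best, best_reward = population[top], rewards[top]
    return best, best_reward


# 37 control parameters + 12 initial states, as in the text
best_params, best_value = elitist_search(toy_reward, n_params=49)
```
</preformat>
<p>Because the 16 candidate evaluations within a generation are independent of each other, they are the natural unit to distribute across parallel workers.</p>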
<table-wrap position="float" id="T1">
<label>Algorithm 1</label>
<caption><p>Training algorithm structure and description.</p></caption>
<graphic xlink:href="fnbot-17-1244417-i0001.tif"/>
</table-wrap></sec>
<sec>
<title>2.6. Evaluation</title>
<p>We evaluated the performance of our simulation in three respects: generating gaits without a prescribed gait speed, simulating gaits over a range of speeds, and simulating gait impairment with reduced maximum isometric force (MIF) of SOL and GAS at a speed of 1.45 <italic>m</italic>/<italic>s</italic>. Experimental data from 8 able-bodied adults walking at several different speeds were used for visual comparison with the simulation results (<xref ref-type="app" rid="A1">Appendix</xref>). To evaluate the agreement between the simulated walking pattern at its natural speed of 1.45 <italic>m</italic>/<italic>s</italic> and the observed gait kinematics at a mean speed of 1.41 <italic>m</italic>/<italic>s</italic>, correlation coefficients (R) between simulated and observed gait were computed for hip, knee, and ankle kinematics.</p>
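<p>As an illustration of the agreement measure, the sketch below computes a correlation coefficient between a simulated and an observed joint-angle curve sampled at the same points of the gait cycle. It assumes that R is the Pearson coefficient, which the text does not state explicitly; the function name and the example curves are made up for illustration and are not study data.</p>
<preformat>
```python
import math


def pearson_r(simulated, observed):
    """Pearson correlation coefficient between two equally sampled curves."""
    if len(simulated) != len(observed) or len(simulated) == 0:
        raise ValueError("curves must be non-empty and of equal length")
    n = len(simulated)
    mean_s = sum(simulated) / n
    mean_o = sum(observed) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(simulated, observed))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    # Undefined (division by zero) if either curve is constant
    return cov / math.sqrt(var_s * var_o)


# Made-up example curves sampled at 0-100 % of the gait cycle (not study data)
phase = [2.0 * math.pi * i / 100 for i in range(101)]
observed_angle = [20.0 * math.cos(p) for p in phase]
simulated_angle = [18.0 * math.cos(p - 0.2) + 3.0 for p in phase]
r = pearson_r(simulated_angle, observed_angle)
```
</preformat>
<p>Note that this coefficient is insensitive to a constant offset or a scaling between the two curves, so a high R indicates similar shape and timing rather than equal joint angles.</p>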
<p>To generate gaits without a prescribed gait speed, we removed the velocity cost <italic>J</italic><sub><italic>vel</italic></sub> from the reward function and set the control parameters and initial states to random values. The duration of each simulation was set to 20 <italic>s</italic>. We then found the solution using CMA-ES with parallel computing. The steady walking speed that developed during the simulation was 1.45 <italic>m</italic>/<italic>s</italic>. Films illustrating the incremental learning process can be viewed in the <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>.</p>
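<p>Two of the cost terms that remain once the velocity cost is removed are the pelvis-tilt cost <italic>J</italic><sub><italic>pel</italic></sub> and the muscle-effort cost <italic>J</italic><sub><italic>mul</italic></sub> defined with the reward function. A minimal sketch of how these two sums could be accumulated over one footstep is given below; the function names and the input layout (one list entry per timestep) are assumptions, and the weighting and combination of the terms into the final reward is deliberately not reproduced.</p>
<preformat>
```python
DT = 0.01  # simulation interval in seconds, as stated in the text


def pelvis_tilt_cost(pelvis_tilt, dt=DT):
    """J_pel: sum over timesteps of squared pelvis tilt times dt."""
    return sum(theta ** 2 for theta in pelvis_tilt) * dt


def muscle_effort_cost(excitations, dt=DT):
    """J_mul: sum over timesteps and muscles of squared excitation times dt.

    excitations is assumed to hold one list of per-muscle values per timestep.
    """
    return sum(e ** 2 for step in excitations for e in step) * dt


# Made-up values for three timesteps (tilt) and two timesteps of two muscles
j_pel = pelvis_tilt_cost([0.1, -0.1, 0.2])
j_mul = muscle_effort_cost([[0.5, 0.5], [1.0, 0.0]])
```
</preformat>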
<p>To generate simulations of gait at a range of target speeds from 1.1 to 1.8 <italic>m</italic>/<italic>s</italic>, we first solved the MDP problem over a prediction horizon of 10 <italic>s</italic> using CMA-ES with parallel computing on GCP at a target speed of 1.45 <italic>m</italic>/<italic>s</italic>. In the initial optimization, the control parameters and initial states were randomly assigned, which means that we assumed no prior knowledge of the problem and that the initial position of the model was not taken into consideration. After the first MDP was solved, the optimized parameters were used to seed the optimization at each neighboring speed, and so on until the lowest or highest speed was reached, e.g., from 1.45 to 1.27 <italic>m</italic>/<italic>s</italic>, then from 1.27 to 1.10 <italic>m</italic>/<italic>s</italic>.</p>
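<p>The seeding order described above can be written down as a list of (source speed, target speed) pairs. The following sketch is illustrative only: the function name is hypothetical, and the intermediate speed of 1.62 <italic>m</italic>/<italic>s</italic> is an assumption, since the text names only 1.10, 1.27, and 1.45 <italic>m</italic>/<italic>s</italic> and the upper bound of 1.8 <italic>m</italic>/<italic>s</italic>.</p>
<preformat>
```python
def seeding_schedule(speeds, start):
    """Return (source, target) pairs: each speed is seeded from its solved
    neighbor, moving outward from the start speed in both directions."""
    ordered = sorted(speeds)
    i = ordered.index(start)
    slower = ordered[:i + 1][::-1]   # start, then progressively slower
    faster = ordered[i:]             # start, then progressively faster
    return list(zip(slower, slower[1:])) + list(zip(faster, faster[1:]))


# 1.10, 1.27, 1.45 and 1.80 m/s appear in the text; 1.62 m/s is assumed
target_speeds = [1.10, 1.27, 1.45, 1.62, 1.80]
schedule = seeding_schedule(target_speeds, start=1.45)
```
</preformat>
<p>Each pair means that the optimized parameters at the first speed initialize the optimization at the second, so every optimization after the first starts from a solution at a nearby speed rather than from random values.</p>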
<p>To generate gait impairment with simulated muscle weakness of SOL and GAS, we used the solution previously obtained at 1.45 <italic>m</italic>/<italic>s</italic> as the initial parameters for the controller. The MIF of SOL or GAS was set to 80% or 60% of its original value, with only one muscle weakened per simulation, giving 4 simulations of impaired gait.</p></sec></sec>
<sec sec-type="results" id="s3">
<title>3. Results</title>
<p>We present the salient results of gait kinematics and kinetics, specifically the sagittal plane hip, knee, and ankle angles, the vertical ground reaction force (vertical GRF), and the muscle excitations for all evaluations. The gait cycle, including descriptions of foot rockers, is described here according to Perry and Davids (<xref ref-type="bibr" rid="B27">1992</xref>). Only the simulations with the highest reward are presented. All simulated data are presented only for the model&#x00027;s right side and are normalized to one gait cycle, except for the vertical GRF, which is normalized to the stance phase. The number of simulated gait cycles varied between 10 and 15 depending on the walking speed, and results are shown as the ensemble average of these 10&#x02013;15 gait cycles &#x000B1; one standard deviation. Each optimization problem took between 12 and 18 h to complete.</p>
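<p>The normalization and ensemble averaging mentioned above can be illustrated with a short sketch. It assumes linear interpolation onto 101 points (0&#x02013;100% of the cycle) and the sample standard deviation, neither of which is specified in the text; the function names are hypothetical. Each cycle needs at least two samples, and at least two cycles are required.</p>
<preformat>
```python
import math


def resample(cycle, n_points=101):
    """Linearly interpolate one gait cycle onto n_points evenly spaced samples."""
    last = len(cycle) - 1
    out = []
    for j in range(n_points):
        pos = j * last / (n_points - 1)   # fractional index into the cycle
        lo = min(int(pos), last - 1)
        frac = pos - lo
        out.append(cycle[lo] * (1.0 - frac) + cycle[lo + 1] * frac)
    return out


def ensemble(cycles, n_points=101):
    """Point-wise mean and sample standard deviation across normalized cycles."""
    curves = [resample(c, n_points) for c in cycles]
    n = len(curves)
    mean = [sum(col) / n for col in zip(*curves)]
    sd = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
          for col, m in zip(zip(*curves), mean)]
    return mean, sd
```
</preformat>
<p>Resampling first allows cycles of different durations, as occur when walking speed or step timing varies, to be averaged point by point.</p>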
<sec>
<title>3.1. Simulated gait without prescribed speed</title>
<p>Without a prescribed speed, the model settled on a walking speed of 1.45 <italic>m</italic>/<italic>s</italic>; the simulated hip, knee, and ankle joint angles are illustrated (<xref ref-type="fig" rid="F3">Figure 3</xref>). Experimental kinematics from able-bodied subjects walking at a comfortable speed, which was on average 1.41 <italic>m</italic>/<italic>s</italic>, are also illustrated. Hip and knee kinematics matched reported experimental observations reasonably well, with correlation coefficients <italic>R</italic> = 0.97 and <italic>R</italic> = 0.96, respectively, and with somewhat better agreement in swing than in stance, but ankle kinematics differed from observed kinematics (<italic>R</italic> = 0.09). In the simulated ankle kinematics, the gait cycle began with a heel rocker (plantarflexion with heel contact during approximately 0&#x02013;8% of the gait cycle), but the ankle rocker (dorsiflexion via tibial advancement over the ankle with whole-foot contact during approximately 8&#x02013;30% of the gait cycle), forefoot rocker (dorsiflexion with forefoot contact during approximately 30&#x02013;50% of the gait cycle), and toe rocker (rapid plantarflexion with forefoot/toe contact during approximately 50&#x02013;60% of the gait cycle) were absent in the simulation. Instead, the ankle was dorsiflexed at foot contact and continued to plantarflex more or less constantly until toe-off.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Simulated (blue) and experimental (yellow) sagittal plane kinematics of the hip, knee, and ankle at the simulated walking speed of 1.45 <italic>m</italic>/<italic>s</italic> and an average experimental walking speed of 1.41 <italic>m</italic>/<italic>s</italic>, respectively. Positive joint angles indicate flexion/dorsiflexion. Correlation coefficient <italic>R</italic> is indicated.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0003.tif"/>
</fig></sec>
<sec>
<title>3.2. Simulated gait over a range of prescribed speeds</title>
<p>All simulations produced stable gait patterns at the five prescribed walking velocities between 1.1 and 1.8 <italic>m</italic>/<italic>s</italic> over the prediction horizon of 10 <italic>s</italic>. Simulated knee kinematics were more realistic at higher walking speeds than at lower speeds. The stance phase became relatively shorter as walking speed increased (toe-off shifted from 70 to 56% of the gait cycle). This temporal shift was more noticeable in the simulation than in the experimental data. The joint kinematics showed expected trends as speed increased; peak flexion and extension in the hip and ankle increased with higher walking speeds, as did peak knee flexion in loading response (<xref ref-type="fig" rid="F4">Figure 4</xref>). The timing of peak plantarflexion in pre-swing and peak knee flexion in swing was shifted temporally, reflecting the earlier toe-off. These kinematic trends agree reasonably well with observed trends in experimental data for subjects walking at 0.78 to 2.04 <italic>m</italic>/<italic>s</italic>.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>(A)</bold> Simulated and <bold>(B)</bold> experimental sagittal plane kinematics of the hip, knee, and ankle at different walking speeds. For the simulated data, walking speeds range from 1.1 to 1.8 <italic>m</italic>/<italic>s</italic>. For the experimental data, average walking speeds range from 0.78 to 2.04 <italic>m</italic>/<italic>s</italic>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0004.tif"/>
</fig>
<p>Several features of the simulated vertical GRF are common to the experimental GRFs; in both, the first vertical peak increases and the minimum vertical GRF at mid-stance decreases with increasing walking speed (<xref ref-type="fig" rid="F5">Figure 5</xref>). However, the expected second vertical GRF peak during pre-swing was not as prominent in simulated walking as in experimental data, and while it should increase with increasing speed, the simulated second vertical GRF peak actually decreased with increasing speed. This behavior has been observed by Keller et al. (<xref ref-type="bibr" rid="B19">1996</xref>) who indicated that walking at a higher speed can result in a lower second vertical GRF peak than at lower speeds.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Vertical GRF of simulated walking <bold>(A)</bold> at different speeds ranging from 1.1 to 1.8 <italic>m</italic>/<italic>s</italic> and experimental walking <bold>(B)</bold> at different speeds ranging from 0.99 to 1.83 <italic>m</italic>/<italic>s</italic>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0005.tif"/>
</fig>
<p>Computed muscle excitations indicate GMAX, HAM, and VAS activation during early stance and SOL and GAS activation during mid- and terminal stance (<xref ref-type="fig" rid="F6">Figure 6</xref>). Excitation of major muscles such as ILPSO, GMAX, HAM, SOL, and TA increased with increasing walking speed.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Muscle excitation during simulated walking at different speeds ranging from 1.1 <italic>m</italic>/<italic>s</italic> to 1.8 <italic>m</italic>/<italic>s</italic> in the iliopsoas (ILPSO), gluteus maximus (GMAX), hamstrings (HAM), rectus femoris (RF), vasti (VAS), biceps femoris short head (BFSH), gastrocnemius (GAS), soleus (SOL), and tibialis anterior (TA).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0006.tif"/>
</fig></sec>
<sec>
<title>3.3. Simulated gait with muscle weakness at 1.45 <italic>m</italic>/<italic>s</italic></title>
<p>When GAS muscle weakness was simulated, the hip tended to extend more and the ankle tended to plantarflex more in pre-swing (<xref ref-type="fig" rid="F7">Figure 7</xref>). With decreasing GAS strength, the SOL excitation increased and the GAS excitation decreased (<xref ref-type="fig" rid="F8">Figure 8</xref>).</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Sagittal plane angles of the hip, knee, and ankle during simulated walking at a speed of 1.45 <italic>m</italic>/<italic>s</italic> in 3 conditions: normal GAS strength, GAS strength decreased to 80% MIF, and GAS strength decreased to 60% MIF.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0007.tif"/>
</fig>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Muscle excitation of GAS and SOL during simulated walking at a speed of 1.45 <italic>m</italic>/<italic>s</italic> in 3 conditions: normal GAS strength, GAS strength decreased to 80% MIF, and GAS strength decreased to 60% MIF.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0008.tif"/>
</fig>
<p>When SOL muscle weakness was simulated, the hip tended to flex more during swing, and the ankle plantarflexed less; the ankle did not reach a plantarflexed position when SOL strength was reduced to 60% MIF (<xref ref-type="fig" rid="F9">Figure 9</xref>). The kinematic pattern shifted temporally to the left. With decreasing SOL strength, the SOL excitation decreased and the GAS excitation increased (<xref ref-type="fig" rid="F10">Figure 10</xref>).</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Sagittal plane angles of the hip, knee, and ankle during simulated walking at a speed of 1.45 <italic>m</italic>/<italic>s</italic> in 3 conditions: normal SOL strength, SOL strength decreased to 80% MIF, and SOL strength decreased to 60% MIF.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0009.tif"/>
</fig>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Muscle excitation of GAS and SOL during simulated walking at a speed of 1.45 <italic>m</italic>/<italic>s</italic> in 3 conditions: normal SOL strength, SOL strength decreased to 80% MIF, and SOL strength decreased to 60% MIF.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnbot-17-1244417-g0010.tif"/>
</fig></sec></sec>
<sec sec-type="discussion" id="s4">
<title>4. Discussion</title>
<p>In this study, we show that human walking can be realistically reproduced with a musculoskeletal model by solving the MDP problem using CMA-ES. The successful reproduction of human walking can be attributed to the reflex-based controller, which mimics simple feedback laws based on sensory data accessible at the spinal cord, such as muscle length, speed, force, and foot contact information, and to the specialized reward function, which can guide the model to achieve physiologically realistic walking. We experimented with different reward components and found that adding pelvic tilt makes the model converge to the target speed faster and produce better joint kinematics. Pelvic tilt plays a critical role in determining the alignment of the spine, hips, and lower limbs, which in turn affects muscle activations, joint forces, and stability. Incorporating it into simulations enhances the realism of the model by accurately representing how the pelvis&#x00027;s orientation impacts the rest of the body&#x00027;s kinematics and kinetics. In particular, the pelvis tilt component in the reward function can guide the model to maintain an upright upper body position and to increase muscle activation to achieve a higher walking speed as required. Without this component, the model would conveniently tilt the upper body and use gravity to accelerate, ultimately leading to either a fall or a less natural walking posture. The controller adopted in our model outputs muscle excitation signals that correspond reasonably well with reported muscle activation during normal gait (Perry and Davids, <xref ref-type="bibr" rid="B27">1992</xref>). These excitation signals could eventually generate biologically plausible torque patterns, whereas other controllers often employ inefficient or even impossible torque patterns for humans (Wang et al., <xref ref-type="bibr" rid="B44">2012</xref>). 
However, not all muscle activities from our simulation (<xref ref-type="fig" rid="F6">Figure 6</xref>) agree well with observed muscle activity during human gait, nor do they all corroborate previous literature using similar models (e.g., Geyer and Herr, <xref ref-type="bibr" rid="B14">2010</xref>; Song and Geyer, <xref ref-type="bibr" rid="B35">2015</xref>). This disagreement can at least in part be attributed to the complexity of stiff differential equations, combined with the sensitivity of movement simulations to changes in muscle excitations and kinematics, which makes predicting and optimizing movement patterns challenging. This is a common issue in fields that merge biomechanics, robotics, and physics simulations, where accurate representation of dynamic systems is crucial. To overcome this issue, Falisse et al. (<xref ref-type="bibr" rid="B11">2019</xref>) used direct collocation to solve the optimal control problem and thereby reduce the sensitivity of the cost function. However, their simulated kinematics did not entirely agree with those observed experimentally. They suggest that co-contraction can play a stabilizing role during walking, which agrees with our simulation. The computed gait kinematics, vertical GRF, and muscle excitations were compared and evaluated under a variety of target walking speeds. We also computed solutions with simulated muscle weakness of the plantarflexors SOL and GAS to demonstrate the potential use of the simulations in clinical applications.</p>
<p>In the simulation without a prescribed gait speed, the model converged to a steady walking speed of 1.45 <italic>m</italic>/<italic>s</italic>, even with various random control parameters and initial states. This indicates that the model is robust enough to not be influenced by prior guesses of the system. Our result corroborates findings by Umberger et al. (<xref ref-type="bibr" rid="B42">2003</xref>) whose model eventually developed a constant walking speed, provided that the objective functions, e.g., a minimal error between simulated and reference trajectory and minimal muscle effort, were optimized. Ackermann and Van den Bogert (<xref ref-type="bibr" rid="B1">2010</xref>) and Miller et al. (<xref ref-type="bibr" rid="B21">2012</xref>) used a &#x0201C;predictive&#x0201D; approach without tracking experimental data, in which the data-tracking solution had to serve as an initial guess for the control variables. A major benefit of our approach is that the forward simulation is free from any reference data such as captured motion and GRF data, even for setting initial control variables. As for other RL approaches used to produce human-like walking with a musculoskeletal model, Song et al. (<xref ref-type="bibr" rid="B36">2021</xref>) report a simulated gait pattern that did not resemble a natural gait pattern. Similar to Song et al., the simulated ankle kinematics from our study did not agree well with observed ankle kinematics during experiments. For example, our simulation did not display ankle, forefoot or toe rockers. Our simulated gait did exhibit some expected phases, such as heel rocker and dorsiflexion during swing to achieve foot clearance (<xref ref-type="fig" rid="F3">Figure 3</xref>). Our simulation also exhibited knee flexion during loading response, though somewhat less than observed data. We speculate that the model&#x00027;s knee joint reaction force is much higher than in reality. 
Since the reward function in our method does not penalize the joint reaction force, this can cause unnatural kinematics if a large joint reaction force is present.</p>
<p>In simulations over a range of walking speeds, our model was capable of developing stable gaits at different speeds with the prescribed walking speed in the reward function. We were able to predict expected temporal shifts toward a shorter stance phase with increasing walking speed, as well as expected increases in hip and ankle sagittal plane motion and reduced gait variability at higher speeds (<xref ref-type="fig" rid="F4">Figure 4</xref>). Our findings on gait variability agree with experimental findings by Terrier and Schutz (<xref ref-type="bibr" rid="B39">2003</xref>), who reported low intra-subject gait variability at preferred and high speeds, but higher variability at low walking speed. Schwartz et al. (<xref ref-type="bibr" rid="B30">2008</xref>) reported in an experimental study increases in maximum knee flexion during swing with higher walking speeds, which we did not see in our simulations. We attribute this to the small excitation of the knee flexor BFSH in the model. Ong et al. (<xref ref-type="bibr" rid="B24">2019</xref>) found a similar trend in their musculoskeletal simulations. The increased ankle plantarflexion at fast walking speeds suggests that the control framework adjusted the gait kinematics to a fast walking pace. The vertical GRF shows a local peak in loading response at all speeds and a greater standard deviation at 1.1 <italic>m</italic>/<italic>s</italic> than at higher speeds (<xref ref-type="fig" rid="F5">Figure 5</xref>), which suggests that the control framework could more easily converge to steady gait solutions at medium and fast walking speeds than at slow speeds. In the muscle excitation predicted in our simulations (<xref ref-type="fig" rid="F6">Figure 6</xref>), the hip extensors GMAX and HAM were activated in loading response, i.e., when the hip extended to advance the trunk over the support limb. 
The hip flexor ILPSO was activated during pre-swing and initial swing, resisting hip extension during stance and reversing the hip into flexion during swing. The knee extensors VAS and RF activated eccentrically to restrain knee flexion in loading response. The ankle plantarflexors GAS and SOL were activated during mid-stance, late stance, and pre-swing, i.e., when their activity is expected first to control tibial advancement and then to propel the leg into swing. The dorsiflexor TA was active throughout swing, contributing to foot clearance. The expected TA activity in loading response to prevent foot drop was, however, not predicted in our simulation. The expected hamstrings activation in late stance to decelerate knee extension was also not predicted in our simulation.</p>
<p>In simulations in which muscle weakness in GAS and SOL was modeled, the resulting gait kinematics were slightly different than with full muscle strength, wherein simulated SOL weakness influenced all kinematics, particularly ankle kinematics, more than GAS weakness (<xref ref-type="fig" rid="F7">Figures 7</xref>, <xref ref-type="fig" rid="F9">9</xref>). This is likely attributable to the muscle parameters in the musculoskeletal model; the uniarticular SOL is stronger than the biarticular GAS in the model, i.e., the modeled SOL MIF is higher than the modeled GAS MIF. The model could more easily compensate for GAS weakness with minimal kinematic changes than for SOL weakness. According to van der Krogt et al. (<xref ref-type="bibr" rid="B43">2012</xref>), who simulated how muscle weakness can be compensated by synergies in normal walking, GAS weakness led to increased SOL activation, and SOL weakness likewise led to increased GAS activation. Compensations for the weakness of individual muscles included increases in activation in unimpaired muscles, but not necessarily increases in the impaired muscle&#x00027;s activation, corroborating our findings in muscle activation in <xref ref-type="fig" rid="F8">Figures 8</xref>, <xref ref-type="fig" rid="F10">10</xref>. While Ong et al. (<xref ref-type="bibr" rid="B24">2019</xref>) found decreased walking speed when weakness in plantarflexor muscles was simulated, attributed to reduced push-off force in pre-swing, our simulation indicates that the prescribed walking speed can still be maintained, as long as the synergistic ankle plantarflexor can compensate for the weak muscle. Unlike the study by Falisse et al. 
(<xref ref-type="bibr" rid="B11">2019</xref>), in which gait with muscle weakness was simulated by imposing gait symmetry over a complete gait cycle and using reserve actuators to prevent the model from falling, our model did not assume periodicity of the gait cycle and included no reserve actuators or residuals to guarantee that the model could still achieve steady walking. Nevertheless, the model still managed to perform locomotive behaviors without the non-physical compensatory forces commonly seen in other physics-based environments. It is worth pointing out that our simulations with weakness were created to demonstrate how this model can be applied to study optimal phenomena in walking with or without muscle weakness; while accurate and individualized representation of pathological gait is on the horizon, it will require individualized muscle parameters, accurate reproduction of internal and external forces, and possibly subjective factors that affect how a person interacts with the external environment.</p>
<p>There are some limitations in the current study. Our simulation was not able to represent realistic ankle kinematics; we speculate that more realistic kinematics may be achieved by computing joint reaction forces and incorporating them in the cost function, and by using a more sophisticated contact model. The musculoskeletal model used in the simulation was also limited in that it is a 2D planar model and therefore cannot represent the full characteristics of gait. In the current study, the gait controller was based on a reflex-based framework (Song and Geyer, <xref ref-type="bibr" rid="B35">2015</xref>), though modified to not activate hip abductors and adductors; we restricted motion to the sagittal plane only, as the reinforcement learning algorithm could not converge to identify optimal control parameters in 3D. This warrants future implementation using a 3D musculoskeletal model that at least accounts for more degrees of freedom such as hip ab/adduction and hip rotation, which can stabilize the hip in the frontal plane and allow foot clearance with a smaller sagittal plane hip range of motion. We only present limited simulations of walking in the present study, whereas simulation of different conditions such as inclined or uneven surfaces, which require further adjustments of the OpenSim-RL environment, can further challenge the robustness of the RL approach. Computational efficiency is another limitation and was not prioritized in this study; the aim of our approach was instead to build a bridge between musculoskeletal modeling and reinforcement learning. Direct collocation could tremendously reduce the computational cost, but it normally optimizes for one footstep, and its implementation in this simulation to encode &#x0201C;robustness&#x0201D; in the solution may be challenging, whereas single shooting with CMA-ES, as used in our study, naturally optimizes over multiple steps.</p></sec>
<sec sec-type="conclusions" id="s5">
<title>5. Conclusion</title>
<p>We present a model-based RL approach to simulate realistic human walking in a musculoskeletal model, first allowing the model to settle on a stable speed, then prescribing faster and slower target speeds. The computed kinematics, ground reaction forces, and muscle excitation patterns and trends correspond reasonably well with those from reported normal gait, as indicated by good correlation of hip and knee kinematics and by similar trends over a range of walking speeds, with the exception of ankle kinematics, which were not realistic in simulations. We further generated pathological gaits that result from ankle plantarflexor muscle weakness using the same approach. Our simulation results illustrate that the proposed approach can reliably find solutions that perform steady locomotion and are not sensitive to the initial guess of the control parameters and states. The simulations were achieved in the absence of reference motion data from motion capture. With the proposed RL framework, neuromechanical simulations can be developed to model versatile human movements and predict human motor behavior.</p></sec>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p></sec>
<sec sec-type="ethics-statement" id="s7">
<title>Ethics statement</title>
<p>The studies involving human participants were reviewed and approved by the Swedish Ethical Review Authority. The participants provided their written informed consent to participate in this study.</p></sec>
<sec sec-type="author-contributions" id="s8">
<title>Author contributions</title>
<p>BS formulated the problem, performed the simulation, analyzed the results, and drafted the original manuscript. EG-F supervised the research process and gave concrete advice during implementation. All authors critically reviewed the manuscript. All authors contributed to the article and approved the submitted version.</p></sec>
</body>
<back>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>This work was generously funded by the Promobilia Foundation (ref nr. 20300) and by the Swedish Research Council (grant no. 2018-00750).</p>
</sec>
<ack><p>The experimental data were kindly shared by Israel Luis Pe&#x000F1;a, Ph.D. student at the KTH MoveAbility Lab.</p>
</ack>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s11">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fnbot.2023.1244417/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fnbot.2023.1244417/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Video_1.MP4" id="SM1" mimetype="video/mp4" xmlns:xlink="http://www.w3.org/1999/xlink"/></sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ackermann</surname> <given-names>M.</given-names></name> <name><surname>Van den Bogert</surname> <given-names>A. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Optimality principles for model-based prediction of human gait</article-title>. <source>J. Biomech.</source> <volume>43</volume>, <fpage>1055</fpage>&#x02013;<lpage>1060</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbiomech.2009.12.012</pub-id><pub-id pub-id-type="pmid">20074736</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>F. C.</given-names></name> <name><surname>Pandy</surname> <given-names>M. G.</given-names></name></person-group> (<year>1993</year>). <article-title>Storage and utilization of elastic strain energy during jumping</article-title>. <source>J. Biomech.</source> <volume>26</volume>, <fpage>1413</fpage>&#x02013;<lpage>1427</lpage>.<pub-id pub-id-type="pmid">8308046</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>F. C.</given-names></name> <name><surname>Pandy</surname> <given-names>M. G.</given-names></name></person-group> (<year>2001</year>). <article-title>Dynamic optimization of human walking</article-title>. <source>J. Biomech. Eng.</source> <volume>123</volume>, <fpage>381</fpage>&#x02013;<lpage>390</lpage>. <pub-id pub-id-type="doi">10.1115/1.1392310</pub-id><pub-id pub-id-type="pmid">11601721</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bohannon</surname> <given-names>R. W.</given-names></name></person-group> (<year>1997</year>). <article-title>Comfortable and maximum walking speed of adults aged 20&#x02013;79 years: reference values and determinants</article-title>. <source>Age Ageing</source> <volume>26</volume>, <fpage>15</fpage>&#x02013;<lpage>19</lpage>.<pub-id pub-id-type="pmid">9143432</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brockman</surname> <given-names>G.</given-names></name> <name><surname>Cheung</surname> <given-names>V.</given-names></name> <name><surname>Pettersson</surname> <given-names>L.</given-names></name> <name><surname>Schneider</surname> <given-names>J.</given-names></name> <name><surname>Schulman</surname> <given-names>J.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>OpenAI gym</article-title>. <source>arXiv preprint arXiv:1606.01540</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1606.01540</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Groote</surname> <given-names>F.</given-names></name> <name><surname>Falisse</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Perspective on musculoskeletal modelling and predictive simulations of human movement to assess the neuromechanics of gait</article-title>. <source>Proc. R. Soc. B Biol. Sci.</source> <volume>288</volume>, <fpage>1946</fpage>. <pub-id pub-id-type="doi">10.1098/rspb.2020.2432</pub-id><pub-id pub-id-type="pmid">33653141</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Delp</surname> <given-names>S. L.</given-names></name> <name><surname>Anderson</surname> <given-names>F. C.</given-names></name> <name><surname>Arnold</surname> <given-names>A. S.</given-names></name> <name><surname>Loan</surname> <given-names>P.</given-names></name> <name><surname>Habib</surname> <given-names>A.</given-names></name> <name><surname>John</surname> <given-names>C. T.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>OpenSim: open-source software to create and analyze dynamic simulations of movement</article-title>. <source>IEEE Trans. Biomed. Eng.</source> <volume>54</volume>, <fpage>1940</fpage>&#x02013;<lpage>1950</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2007.901024</pub-id><pub-id pub-id-type="pmid">18018689</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Delp</surname> <given-names>S. L.</given-names></name> <name><surname>Loan</surname> <given-names>J. P.</given-names></name> <name><surname>Hoy</surname> <given-names>M. G.</given-names></name> <name><surname>Zajac</surname> <given-names>F. E.</given-names></name> <name><surname>Topp</surname> <given-names>E. L.</given-names></name> <name><surname>Rosen</surname> <given-names>J. M.</given-names></name></person-group> (<year>1990</year>). <article-title>An interactive graphics-based model of the lower extremity to study orthopaedic surgical procedures</article-title>. <source>IEEE Trans. Biomed. Eng.</source> <volume>37</volume>, <fpage>757</fpage>&#x02013;<lpage>767</lpage>. <pub-id pub-id-type="doi">10.1109/10.102791</pub-id><pub-id pub-id-type="pmid">2210784</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Duan</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Houthooft</surname> <given-names>R.</given-names></name> <name><surname>Schulman</surname> <given-names>J.</given-names></name> <name><surname>Abbeel</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Benchmarking deep reinforcement learning for continuous control,&#x0201D;</article-title> in <source>International Conference on Machine Learning, Vol. 48</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>PMLR</publisher-name>), <fpage>1329</fpage>&#x02013;<lpage>1338</lpage>. <pub-id pub-id-type="doi">10.48550/arXiv.1604.06778</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eilenberg</surname> <given-names>M. F.</given-names></name> <name><surname>Geyer</surname> <given-names>H.</given-names></name> <name><surname>Herr</surname> <given-names>H.</given-names></name></person-group> (<year>2010</year>). <article-title>Control of a powered ankle&#x02013;foot prosthesis based on a neuromuscular model</article-title>. <source>IEEE Trans. Neural Syst. Rehabil. Eng.</source> <volume>18</volume>, <fpage>164</fpage>&#x02013;<lpage>173</lpage>. <pub-id pub-id-type="doi">10.1109/TNSRE.2009.2039620</pub-id><pub-id pub-id-type="pmid">20071268</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Falisse</surname> <given-names>A.</given-names></name> <name><surname>Serrancol&#x000ED;</surname> <given-names>G.</given-names></name> <name><surname>Dembia</surname> <given-names>C. L.</given-names></name> <name><surname>Gillis</surname> <given-names>J.</given-names></name> <name><surname>Jonkers</surname> <given-names>I.</given-names></name> <name><surname>De Groote</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Rapid predictive simulations with complex musculoskeletal models suggest that diverse healthy and pathological human gaits can emerge from similar control strategies</article-title>. <source>J. R. Soc. Interface</source> <volume>16</volume>, <fpage>20190402</fpage>. <pub-id pub-id-type="doi">10.1098/rsif.2019.0402</pub-id><pub-id pub-id-type="pmid">31431186</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Febrer Nafr&#x000ED;a</surname> <given-names>M.</given-names></name> <name><surname>Fregly</surname> <given-names>B. J.</given-names></name> <name><surname>Font-Llagunes</surname> <given-names>J. M.</given-names></name></person-group> (<year>2022</year>). <article-title>Evaluation of optimal control approaches for predicting active knee-ankle-foot-orthosis motion for individuals with spinal cord injury</article-title>. <source>Front. Neurorobot.</source> <volume>15</volume>, <fpage>748148</fpage>. <pub-id pub-id-type="doi">10.3389/fnbot.2021.748148</pub-id><pub-id pub-id-type="pmid">35140596</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fey</surname> <given-names>N. P.</given-names></name> <name><surname>Klute</surname> <given-names>G. K.</given-names></name> <name><surname>Neptune</surname> <given-names>R. R.</given-names></name></person-group> (<year>2012</year>). <article-title>Optimization of prosthetic foot stiffness to reduce metabolic cost and intact knee loading during below-knee amputee walking: a theoretical study</article-title>. <source>J. Biomech. Eng.</source> <volume>134</volume>, <fpage>111005</fpage>. <pub-id pub-id-type="doi">10.1115/1.4007824</pub-id><pub-id pub-id-type="pmid">23387787</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Geyer</surname> <given-names>H.</given-names></name> <name><surname>Herr</surname> <given-names>H.</given-names></name></person-group> (<year>2010</year>). <article-title>A muscle-reflex model that encodes principles of legged mechanics produces human walking dynamics and muscle activities</article-title>. <source>IEEE Trans. Neural Syst. Rehabil. Eng.</source> <volume>18</volume>, <fpage>263</fpage>&#x02013;<lpage>273</lpage>. <pub-id pub-id-type="doi">10.1109/TNSRE.2010.2047592</pub-id><pub-id pub-id-type="pmid">20378480</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>G&#x000FC;nther</surname> <given-names>M.</given-names></name> <name><surname>Keppler</surname> <given-names>V.</given-names></name> <name><surname>Seyfarth</surname> <given-names>A.</given-names></name> <name><surname>Blickhan</surname> <given-names>R.</given-names></name></person-group> (<year>2004</year>). <article-title>Human leg design: optimal axial alignment under constraints</article-title>. <source>J. Math. Biol.</source> <volume>48</volume>, <fpage>623</fpage>&#x02013;<lpage>646</lpage>. <pub-id pub-id-type="doi">10.1007/s00285-004-0269-3</pub-id><pub-id pub-id-type="pmid">15164226</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hansen</surname> <given-names>N.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>S. D.</given-names></name> <name><surname>Koumoutsakos</surname> <given-names>P.</given-names></name></person-group> (<year>2003</year>). <article-title>Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)</article-title>. <source>Evol. Comput.</source> <volume>11</volume>, <fpage>1</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1162/106365603321828970</pub-id><pub-id pub-id-type="pmid">12804094</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heess</surname> <given-names>N.</given-names></name> <name><surname>TB</surname> <given-names>D.</given-names></name> <name><surname>Sriram</surname> <given-names>S.</given-names></name> <name><surname>Lemmon</surname> <given-names>J.</given-names></name> <name><surname>Merel</surname> <given-names>J.</given-names></name> <name><surname>Wayne</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Emergence of locomotion behaviours in rich environments</article-title>. <source>arXiv preprint arXiv:1707.02286</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1707.02286</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kane</surname> <given-names>T. R.</given-names></name> <name><surname>Levinson</surname> <given-names>D. A.</given-names></name></person-group> (<year>1983</year>). <article-title>The use of Kane&#x00027;s dynamical equations in robotics</article-title>. <source>Int. J. Robot. Res.</source> <volume>2</volume>, <fpage>3</fpage>&#x02013;<lpage>21</lpage>.</citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keller</surname> <given-names>T. S.</given-names></name> <name><surname>Weisberger</surname> <given-names>A.</given-names></name> <name><surname>Ray</surname> <given-names>J.</given-names></name> <name><surname>Hasan</surname> <given-names>S.</given-names></name> <name><surname>Shiavi</surname> <given-names>R.</given-names></name> <name><surname>Spengler</surname> <given-names>D.</given-names></name></person-group> (<year>1996</year>). <article-title>Relationship between vertical ground reaction force and speed during walking, slow jogging, and running</article-title>. <source>Clin. Biomech.</source> <volume>11</volume>, <fpage>253</fpage>&#x02013;<lpage>259</lpage>.<pub-id pub-id-type="pmid">11415629</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kidzi&#x00144;ski</surname> <given-names>&#x00141;.</given-names></name> <name><surname>Mohanty</surname> <given-names>S. P.</given-names></name> <name><surname>Ong</surname> <given-names>C. F.</given-names></name> <name><surname>Hicks</surname> <given-names>J. L.</given-names></name> <name><surname>Carroll</surname> <given-names>S. F.</given-names></name> <name><surname>Levine</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>&#x0201C;Learning to run challenge: synthesizing physiologically accurate motion using deep reinforcement learning,&#x0201D;</article-title> in <source>The Springer Series on Challenges in Machine Learning book series (SSCML)</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>).</citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname> <given-names>R. H.</given-names></name> <name><surname>Umberger</surname> <given-names>B. R.</given-names></name> <name><surname>Hamill</surname> <given-names>J.</given-names></name> <name><surname>Caldwell</surname> <given-names>G. E.</given-names></name></person-group> (<year>2012</year>). <article-title>Evaluation of the minimum energy hypothesis and other potential optimality criteria for human running</article-title>. <source>Proc. R. Soc. B Biol. Sci.</source> <volume>279</volume>, <fpage>1498</fpage>&#x02013;<lpage>1505</lpage>. <pub-id pub-id-type="doi">10.1098/rspb.2011.2015</pub-id><pub-id pub-id-type="pmid">22072601</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Minetti</surname> <given-names>A.</given-names></name> <name><surname>Ardigo</surname> <given-names>L.</given-names></name> <name><surname>Saibene</surname> <given-names>F.</given-names></name></person-group> (<year>1994</year>). <article-title>The transition between walking and running in humans: metabolic and mechanical aspects at different gradients</article-title>. <source>Acta Physiol. Scand.</source> <volume>150</volume>, <fpage>315</fpage>&#x02013;<lpage>323</lpage>.<pub-id pub-id-type="pmid">8010138</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Neilson</surname> <given-names>P. D.</given-names></name> <name><surname>Neilson</surname> <given-names>M. D.</given-names></name></person-group> (<year>1999</year>). <article-title>A neuroengineering solution to the optimal tracking problem</article-title>. <source>Hum. Movement Sci.</source> <volume>18</volume>, <fpage>155</fpage>&#x02013;<lpage>183</lpage>.</citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname> <given-names>C. F.</given-names></name> <name><surname>Geijtenbeek</surname> <given-names>T.</given-names></name> <name><surname>Hicks</surname> <given-names>J. L.</given-names></name> <name><surname>Delp</surname> <given-names>S. L.</given-names></name></person-group> (<year>2019</year>). <article-title>Predicting gait adaptations due to ankle plantarflexor muscle weakness and contracture using physics-based musculoskeletal simulations</article-title>. <source>PLoS Comput. Biol.</source> <volume>15</volume>, <fpage>e1006993</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1006993</pub-id><pub-id pub-id-type="pmid">31589597</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pandy</surname> <given-names>M.</given-names></name> <name><surname>Garner</surname> <given-names>B.</given-names></name> <name><surname>Anderson</surname> <given-names>F.</given-names></name></person-group> (<year>1995</year>). <article-title>Optimal control of non-ballistic muscular movements: a constraint-based performance criterion for rising from a chair</article-title>. <source>J. Biomech. Eng.</source> <volume>117</volume>, <fpage>15</fpage>&#x02013;<lpage>26</lpage>.<pub-id pub-id-type="pmid">7609479</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>X. B.</given-names></name> <name><surname>Kanazawa</surname> <given-names>A.</given-names></name> <name><surname>Toyer</surname> <given-names>S.</given-names></name> <name><surname>Abbeel</surname> <given-names>P.</given-names></name> <name><surname>Levine</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Variational discriminator bottleneck: improving imitation learning, inverse RL, and GANs by constraining information flow</article-title>. <source>arXiv preprint arXiv:1810.00821</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1810.00821</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perry</surname> <given-names>J.</given-names></name> <name><surname>Davids</surname> <given-names>J. R.</given-names></name></person-group> (<year>1992</year>). <article-title>Gait analysis: normal and pathological function</article-title>. <source>J. Pediatr. Orthopaed.</source> <volume>12</volume>, <fpage>815</fpage>.</citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rajeswaran</surname> <given-names>A.</given-names></name> <name><surname>Kumar</surname> <given-names>V.</given-names></name> <name><surname>Gupta</surname> <given-names>A.</given-names></name> <name><surname>Vezzani</surname> <given-names>G.</given-names></name> <name><surname>Schulman</surname> <given-names>J.</given-names></name> <name><surname>Todorov</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Learning complex dexterous manipulation with deep reinforcement learning and demonstrations</article-title>. <source>arXiv preprint arXiv:1709.10087</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1709.10087</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schulman</surname> <given-names>J.</given-names></name> <name><surname>Moritz</surname> <given-names>P.</given-names></name> <name><surname>Levine</surname> <given-names>S.</given-names></name> <name><surname>Jordan</surname> <given-names>M.</given-names></name> <name><surname>Abbeel</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>High-dimensional continuous control using generalized advantage estimation</article-title>. <source>arXiv preprint arXiv:1506.02438</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1506.02438</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwartz</surname> <given-names>M. H.</given-names></name> <name><surname>Rozumalski</surname> <given-names>A.</given-names></name> <name><surname>Trost</surname> <given-names>J. P.</given-names></name></person-group> (<year>2008</year>). <article-title>The effect of walking speed on the gait of typically developing children</article-title>. <source>J. Biomech.</source> <volume>41</volume>, <fpage>1639</fpage>&#x02013;<lpage>1650</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbiomech.2008.03.015</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seyfarth</surname> <given-names>A.</given-names></name> <name><surname>G&#x000FC;nther</surname> <given-names>M.</given-names></name> <name><surname>Blickhan</surname> <given-names>R.</given-names></name></person-group> (<year>2001</year>). <article-title>Stable operation of an elastic three-segment leg</article-title>. <source>Biol. Cybernet.</source> <volume>84</volume>, <fpage>365</fpage>&#x02013;<lpage>382</lpage>. <pub-id pub-id-type="doi">10.1007/PL00007982</pub-id><pub-id pub-id-type="pmid">11357549</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sharbafi</surname> <given-names>M. A.</given-names></name> <name><surname>Barazesh</surname> <given-names>H.</given-names></name> <name><surname>Iranikhah</surname> <given-names>M.</given-names></name> <name><surname>Seyfarth</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Leg force control through biarticular muscles for human walking assistance</article-title>. <source>Front. Neurorobot.</source> <volume>12</volume>, <fpage>39</fpage>. <pub-id pub-id-type="doi">10.3389/fnbot.2018.00039</pub-id><pub-id pub-id-type="pmid">30050426</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Silverman</surname> <given-names>A. K.</given-names></name> <name><surname>Neptune</surname> <given-names>R. R.</given-names></name></person-group> (<year>2012</year>). <article-title>Muscle and prosthesis contributions to amputee walking mechanics: a modeling study</article-title>. <source>J. Biomech.</source> <volume>45</volume>, <fpage>2271</fpage>&#x02013;<lpage>2278</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbiomech.2012.06.008</pub-id><pub-id pub-id-type="pmid">22840757</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>S.</given-names></name> <name><surname>Geyer</surname> <given-names>H.</given-names></name></person-group> (<year>2013</year>). <article-title>&#x0201C;Generalization of a muscle-reflex control model to 3D walking,&#x0201D;</article-title> in <source>2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</source> (<publisher-loc>Osaka</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>7463</fpage>&#x02013;<lpage>7466</lpage>. <pub-id pub-id-type="doi">10.1109/EMBC.2013.6611284</pub-id><pub-id pub-id-type="pmid">24111471</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>S.</given-names></name> <name><surname>Geyer</surname> <given-names>H.</given-names></name></person-group> (<year>2015</year>). <article-title>A neural circuitry that emphasizes spinal feedback generates diverse behaviours of human locomotion</article-title>. <source>J. Physiol.</source> <volume>593</volume>, <fpage>3493</fpage>&#x02013;<lpage>3511</lpage>. <pub-id pub-id-type="doi">10.1113/JP270228</pub-id><pub-id pub-id-type="pmid">25920414</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>S.</given-names></name> <name><surname>Kidzi&#x00144;ski</surname> <given-names>&#x00141;.</given-names></name> <name><surname>Peng</surname> <given-names>X. B.</given-names></name> <name><surname>Ong</surname> <given-names>C.</given-names></name> <name><surname>Hicks</surname> <given-names>J. L.</given-names></name> <name><surname>Levine</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Deep reinforcement learning for modeling human locomotion control in neuromechanical simulation</article-title>. <source>J. Neuroeng. Rehabil.</source> <volume>18</volume>, <fpage>126</fpage>. <pub-id pub-id-type="doi">10.1186/s12984-021-00919-y</pub-id><pub-id pub-id-type="pmid">34399772</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sutton</surname> <given-names>R. S.</given-names></name> <name><surname>Barto</surname> <given-names>A. G.</given-names></name></person-group> (<year>2018</year>). <source>Reinforcement Learning: An Introduction</source>. <edition>2nd Ed</edition>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Suzuki</surname> <given-names>Y.</given-names></name></person-group> (<year>2010</year>). <article-title>Dynamic optimization of transfemoral prosthesis during swing phase with residual limb model</article-title>. <source>Prosthet. Orthot. Int.</source> <volume>34</volume>, <fpage>428</fpage>&#x02013;<lpage>438</lpage>. <pub-id pub-id-type="doi">10.3109/03093646.2010.484829</pub-id><pub-id pub-id-type="pmid">20521999</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terrier</surname> <given-names>P.</given-names></name> <name><surname>Schutz</surname> <given-names>Y.</given-names></name></person-group> (<year>2003</year>). <article-title>Variability of gait patterns during unconstrained walking assessed by satellite positioning (GPS)</article-title>. <source>Eur. J. Appl. Physiol.</source> <volume>90</volume>, <fpage>554</fpage>&#x02013;<lpage>561</lpage>. <pub-id pub-id-type="doi">10.1007/s00421-003-0906-3</pub-id><pub-id pub-id-type="pmid">12905048</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Thatte</surname> <given-names>N.</given-names></name> <name><surname>Duan</surname> <given-names>H.</given-names></name> <name><surname>Geyer</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;A method for online optimization of lower limb assistive devices with high dimensional parameter spaces,&#x0201D;</article-title> in <source>2018 IEEE International Conference on Robotics and Automation (ICRA)</source> (<publisher-loc>Brisbane, QLD</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/ICRA.2018.8460953</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thelen</surname> <given-names>D. G.</given-names></name> <name><surname>Anderson</surname> <given-names>F. C.</given-names></name> <name><surname>Delp</surname> <given-names>S. L.</given-names></name></person-group> (<year>2003</year>). <article-title>Generating dynamic simulations of movement using computed muscle control</article-title>. <source>J. Biomech.</source> <volume>36</volume>, <fpage>321</fpage>&#x02013;<lpage>328</lpage>. <pub-id pub-id-type="doi">10.1016/S0021-9290(02)00432-3</pub-id><pub-id pub-id-type="pmid">12594980</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Umberger</surname> <given-names>B. R.</given-names></name> <name><surname>Gerritsen</surname> <given-names>K. G.</given-names></name> <name><surname>Martin</surname> <given-names>P. E.</given-names></name></person-group> (<year>2003</year>). <article-title>A model of human muscle energy expenditure</article-title>. <source>Comput. Methods Biomech. Biomed. Eng.</source> <volume>6</volume>, <fpage>99</fpage>&#x02013;<lpage>111</lpage>. <pub-id pub-id-type="doi">10.1080/1025584031000091678</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van der Krogt</surname> <given-names>M. M.</given-names></name> <name><surname>Delp</surname> <given-names>S. L.</given-names></name> <name><surname>Schwartz</surname> <given-names>M. H.</given-names></name></person-group> (<year>2012</year>). <article-title>How robust is human gait to muscle weakness?</article-title> <source>Gait Post.</source> <volume>36</volume>, <fpage>113</fpage>&#x02013;<lpage>119</lpage>. <pub-id pub-id-type="doi">10.1016/j.gaitpost.2012.01.017</pub-id><pub-id pub-id-type="pmid">22386624</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>J. M.</given-names></name> <name><surname>Hamner</surname> <given-names>S. R.</given-names></name> <name><surname>Delp</surname> <given-names>S. L.</given-names></name> <name><surname>Koltun</surname> <given-names>V.</given-names></name></person-group> (<year>2012</year>). <article-title>Optimizing locomotion controllers using biologically-based actuators and objectives</article-title>. <source>ACM Trans. Graph.</source> <volume>31</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1145/2185520.2185521</pub-id><pub-id pub-id-type="pmid">26251560</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zajac</surname> <given-names>F. E.</given-names></name></person-group> (<year>1989</year>). <article-title>Muscle and tendon: properties, models, scaling, and application to biomechanics and motor control</article-title>. <source>Crit. Rev. Biomed. Eng.</source> <volume>17</volume>, <fpage>359</fpage>&#x02013;<lpage>411</lpage>.<pub-id pub-id-type="pmid">2676342</pub-id></citation></ref>
</ref-list>
<app-group>
<app id="A1">
<title>Appendix</title>
<p>Eight able-bodied adults (5 men/3 women; mean &#x000B1; standard deviation age: 37.8 &#x000B1; 9.6 years, height: 1.76 &#x000B1; 0.10 m, body mass: 76.6 &#x000B1; 14.4 kg) participated in this experiment. The experiment was approved by the Swedish Ethical Review Authority (Dnr. 2020-02311), and all participants provided written consent. Participation was voluntary and could be terminated at any time during the experiment. Each subject&#x00027;s cadence was recorded while the subject walked on a treadmill at 70, 85, 100, 115, and 130% of their preferred walking speed (PWS), in randomized order. The PWS was determined from the participant&#x00027;s gender, age, and height (Bohannon, <xref ref-type="bibr" rid="B4">1997</xref>). Subjects then walked along a 10-m pathway in an instrumented motion lab at the five speeds by matching, at each speed, the cadence recorded on the treadmill. Marker positions (100 Hz) and ground reaction forces (1,000 Hz) were measured using optical motion capture (Vicon V16) and strain gauge force platforms (AMTI, Watertown, MA, USA), respectively. Full-body markers were placed according to the Conventional Gait Model with the extended-foot model (CGM 2.4). Joint kinematics were calculated from marker coordinates using the Inverse Kinematics (IK) Tool in OpenSim (Delp et al., <xref ref-type="bibr" rid="B7">2007</xref>). Between 5 and 10 gait cycles per person and speed were analyzed, and the side (left or right) was chosen at random for each person.</p>
</app>
</app-group>
</back>
</article>