<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Robot. AI</journal-id>
<journal-title>Frontiers in Robotics and AI</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Robot. AI</abbrev-journal-title>
<issn pub-type="epub">2296-9144</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">736644</article-id>
<article-id pub-id-type="doi">10.3389/frobt.2021.736644</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Robotics and AI</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>General Framework for the Optimization of the Human-Robot Collaboration Decision-Making Process Through the Ability to Change Performance Metrics</article-title>
<alt-title alt-title-type="left-running-head">Hani Daniel Zakaria et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Optimizing Human-Robot Collaboration</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Hani Daniel Zakaria</surname>
<given-names>M&#xe9;lodie</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1359520/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lengagne</surname>
<given-names>S&#xe9;bastien</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1121975/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Corrales Ram&#xf3;n</surname>
<given-names>Juan Antonio</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/872811/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mezouar</surname>
<given-names>Youcef</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/314256/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>CNRS, Clermont Auvergne INP, Institut Pascal, Universit&#xe9; Clermont Auvergne, <addr-line>Clermont-Ferrand</addr-line>, <country>France</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Centro Singular de Investigaci&#xf3;n en Tecnolox&#xed;as Intelixentes (CiTIUS), Universidade de Santiago de Compostela, <addr-line>Santiago de Compostela</addr-line>, <country>Spain</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/753046/overview">Yanan Li</ext-link>, University of Sussex, United&#x20;Kingdom</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1350484/overview">Xiaoxiao Cheng</ext-link>, Imperial College London, United&#x20;Kingdom</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1421719/overview">Selma Music</ext-link>, Technical University of Munich, Germany</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: M&#xe9;lodie Hani Daniel Zakaria, <email>Melodie.HANI_DANIEL_ZAKARIA@uca.fr</email>, <email>melodie.daniel@yahoo.fr</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Human-Robot Interaction, a section of the journal Frontiers in Robotics and&#x20;AI</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>10</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>8</volume>
<elocation-id>736644</elocation-id>
<history>
<date date-type="received">
<day>05</day>
<month>07</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>09</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Hani Daniel Zakaria, Lengagne, Corrales Ram&#xf3;n and Mezouar.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Hani Daniel Zakaria, Lengagne, Corrales Ram&#xf3;n and Mezouar</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>This paper proposes a new decision-making framework in the context of Human-Robot Collaboration (HRC). State-of-the-art techniques treat HRC as an optimization problem in which the utility function, also called the reward function, is defined to accomplish the task regardless of how well the interaction is performed. When performance metrics are considered, they cannot easily be changed within the same framework. In contrast, our decision-making framework can easily handle a change of performance metrics from one scenario to another. Our method treats HRC as a constrained optimization problem in which the utility function is split into two main parts. First, a constraint defines how to accomplish the task. Second, a reward evaluates the performance of the collaboration; this is the only part that is modified when the performance metrics change. This split gives control over the way the interaction unfolds, and it also guarantees the adaptation of the robot&#x2019;s actions to the human&#x2019;s in real time. In this paper, the decision-making process is based on Nash Equilibrium and the perfect-information extensive form from game theory. It can handle collaborative interactions under different performance metrics, such as optimizing the time to complete the task or accounting for the probability of human errors. Simulations and a real experimental study on an assembly task (i.e.,&#x20;a game based on a construction kit) illustrate the effectiveness of the proposed framework.</p>
</abstract>
<kwd-group>
<kwd>human-robot collaboration</kwd>
<kwd>decision-making</kwd>
<kwd>game theory</kwd>
<kwd>Nash equilibrium</kwd>
<kwd>interaction optimality</kwd>
</kwd-group>
<contract-num rid="cn001">869855</contract-num>
<contract-sponsor id="cn001">Horizon 2020<named-content content-type="fundref-id">10.13039/501100007601</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">R&#xe9;gion Auvergne-Rh&#xf4;ne-Alpes<named-content content-type="fundref-id">10.13039/501100010115</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Nowadays, Human-Robot Collaboration (HRC) is a fast-growing sector in the robotics domain. HRC aims to make everyday human tasks easier. It is a challenging research field that interacts with many others: psychology, cognitive science, sociology, artificial intelligence, and computer science (<xref ref-type="bibr" rid="B37">Seel, 2012</xref>). HRC is based on the exchange of information between humans and robots sharing a common environment to achieve a task as teammates with a common goal (<xref ref-type="bibr" rid="B1">Ajoudani et&#x20;al., 2018</xref>).</p>
<p>HRC applications can have social and/or physical benefits for humans (<xref ref-type="bibr" rid="B3">B&#xfc;tepage and Kragic, 2017</xref>). Social collaboration tasks involve social, emotional, and cognitive aspects (<xref ref-type="bibr" rid="B9">Durantin et&#x20;al., 2017</xref>) such as care for the elderly (<xref ref-type="bibr" rid="B42">Wagner-Hartl et&#x20;al., 2020</xref>), therapy (<xref ref-type="bibr" rid="B5">Clabaugh et&#x20;al., 2019</xref>), companionship (<xref ref-type="bibr" rid="B17">Hosseini et&#x20;al., 2017</xref>), and education (<xref ref-type="bibr" rid="B35">Rosenberg-Kima et&#x20;al., 2019</xref>). Social robots, such as Nao, Pepper, and iCub, are dedicated to this type of task; however, their physical abilities are very limited (<xref ref-type="bibr" rid="B32">Nocentini et&#x20;al., 2019</xref>). In physical HRC (pHRC), physical contact is necessary to perform the task; it can occur directly between humans and robots or indirectly through the environment (<xref ref-type="bibr" rid="B1">Ajoudani et&#x20;al., 2018</xref>). pHRC applications are found mainly in industrial environments [e.g., assembly, handling, surface polishing, welding, etc., (<xref ref-type="bibr" rid="B25">Maurtua et&#x20;al., 2017</xref>)]. pHRC is also used in Advanced Driver-Assistance Systems (ADAS) for autonomous cars (<xref ref-type="bibr" rid="B11">Flad et&#x20;al., 2014</xref>).</p>
<p>Robots can adapt to humans in different situations by implementing five steps in a decision-making process (<xref ref-type="bibr" rid="B28">Negulescu, 2014</xref>): 1) gathering relevant information on possible actions, environment, and agents, 2) identifying alternatives, 3) weighing evidence, 4) choosing alternatives and selecting actions, and 5) examining the consequences of decisions. These steps are usually modeled in computer science using a decision-making method with a strategy and a utility function (<xref ref-type="bibr" rid="B12">F&#xfc;l&#xf6;p, 2005</xref>). The decision-making method models the whole situation (environment, actions, agents, task restrictions, etc.). The strategy defines the policy of choosing actions based on the value of their reward. The utility function (i.e.,&#x20;reward function) evaluates each action for each alternative by attributing a reward to&#x20;it.</p>
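<p>As a minimal illustration of this interplay (not the authors&#x2019; implementation; all names and reward values below are hypothetical), the utility function attributes a reward to each feasible action, and the strategy is the policy that picks actions from those rewards:</p>

```python
# Minimal sketch of a decision-making process (names are illustrative):
# the utility (reward) function scores each feasible action, and the
# strategy is the policy that picks actions from those scores.

def utility(action, state):
    """Reward attributed to an action in the current environment state."""
    return state.get(action, 0.0)

def greedy_strategy(actions, state):
    """Policy: choose the feasible action with the highest reward."""
    return max(actions, key=lambda a: utility(a, state))

# Hypothetical rewards for three candidate robot actions.
state = {"pick_cube": 2.0, "wait": 0.5, "place_cube": 1.5}
best = greedy_strategy(["pick_cube", "wait", "place_cube"], state)  # "pick_cube"
```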
<p>On the one hand, previous works, known as leader-follower systems (<xref ref-type="bibr" rid="B3">B&#xfc;tepage and Kragic, 2017</xref>), focused the decision process on choosing the actions that increase the robot&#x2019;s ability to accomplish the task, without considering how the collaboration is carried out. Examples include <xref ref-type="bibr" rid="B8">DelPreto and Rus (2019)</xref>, where a human-robot collaborative team lifts an object together, and <xref ref-type="bibr" rid="B19">Kwon et&#x20;al. (2019)</xref>, where robots influence humans to change the pre-defined leader-follower roles in order to rescue more people after a plane or ship crash at&#x20;sea.</p>
<p>On the other hand, other works deal with maximizing the collaboration performance by promoting mutual adaptation (<xref ref-type="bibr" rid="B31">Nikolaidis et&#x20;al., 2017b</xref>; <xref ref-type="bibr" rid="B4">Chen et&#x20;al., 2020</xref>) or reconsidering the task allocation (<xref ref-type="bibr" rid="B24">Malik and Bilberg, 2019</xref>). However, they only consider one or two unchangeable performance metrics for this evaluation in their utility function: postural or ergonomic optimization (<xref ref-type="bibr" rid="B38">Sharkawy et&#x20;al., 2020</xref>), time consumption (<xref ref-type="bibr" rid="B43">Weitschat and Aschemann, 2018</xref>), trajectory optimization (<xref ref-type="bibr" rid="B10">Fishman et&#x20;al., 2019</xref>), cognitive aspects (<xref ref-type="bibr" rid="B41">Tanevska et&#x20;al., 2020</xref>), and reduction of the number of human errors (<xref ref-type="bibr" rid="B40">Tabrez and Hayes, 2019</xref>).</p>
<p>In this paper, we optimize and quantitatively assess the collaboration between robots and humans based on the impact that a set of changeable performance metrics has on the human agents. An optimized collaboration aims to bring a benefit to humans, such as getting the task done faster or reducing the effort of the human agents. An unoptimized collaboration, by contrast, brings nothing to humans or even becomes a nuisance, such as slowing them down or overloading them, even if the task is finally accomplished. The main contribution of this paper is a framework that optimizes the performance of the collaboration between one or more humans and one or more robots based on changeable metrics. Contrary to previous works, our framework allows us to change the performance metrics easily, without changing the whole way the task is formalized, since we isolate the impact of the metrics in the utility function.</p>
<p>The benefit of this contribution is to increase the collaboration performance without having to improve the robot&#x2019;s abilities. This is important in relevant practical cases: for instance, when using social robots with strong limitations (e.g., slow movements and/or reduced dexterity), it is not easy or even possible to improve their abilities drastically. Our work therefore provides an interesting solution to enhance collaboration performance with such limited robots.</p>
<p>Our framework uses the state-of-the-art decision-making process composed of a decision-making method, a strategy, and a utility function. We divide the utility function into two main parts: the collaboration performance, evaluated by a reward according to one or several performance metrics, and the task accomplishment, which is treated as a constraint since we only deal with achievable&#x20;tasks.</p>
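<p>This split can be sketched in a few lines: task accomplishment acts as a hard constraint that filters out infeasible actions, and a pluggable reward function carries the performance metric. The sketch below uses hypothetical action names and attributes, not the authors&#x2019; code; its point is that swapping the metric changes only one argument:</p>

```python
# Sketch of the proposed utility split (illustrative, not the authors' code):
# the task-accomplishment constraint filters infeasible actions, and a
# swappable metric_reward function scores the remaining ones.

def constrained_utility(actions, accomplishes_task, metric_reward):
    """Reward only the actions that satisfy the task constraint.
    Changing the performance metric swaps metric_reward; nothing else."""
    return {name: metric_reward(attrs)
            for name, attrs in actions.items()
            if accomplishes_task(attrs)}

# Hypothetical actions with per-action attributes.
actions = {"fast": {"time": 2, "effort": 5, "valid": True},
           "slow": {"time": 6, "effort": 1, "valid": True},
           "drop": {"time": 1, "effort": 1, "valid": False}}
task_ok = lambda attrs: attrs["valid"]  # the task-accomplishment constraint

# Two interchangeable metrics over the same constraint.
minimize_time = constrained_utility(actions, task_ok, lambda a: -a["time"])
minimize_effort = constrained_utility(actions, task_ok, lambda a: -a["effort"])
best_for_time = max(minimize_time, key=minimize_time.get)        # "fast"
best_for_effort = max(minimize_effort, key=minimize_effort.get)  # "slow"
```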
<p>The paper is organized as follows. First, we review related work in <xref ref-type="sec" rid="s2">Section 2</xref>. Then, we present our framework formalization in <xref ref-type="sec" rid="s3">Section 3</xref>. <xref ref-type="sec" rid="s4">Section 4</xref> includes all the details regarding the decision-making process. The effectiveness of our new formalization is shown in <xref ref-type="sec" rid="s5">Section 5</xref> based on simulated and experimental tests of an assembly task (i.e.,&#x20;a game<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref> that involves placing cubes to build a path between two figurines) shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>. Finally, we sum up the effectiveness of our contribution and discuss the possible improvements in <xref ref-type="sec" rid="s6">Section&#x20;6</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Agents solving the Camelot Jr. game. <bold>(A)</bold> Agents play sequentially: the human starts to play, and then it is the robot&#x2019;s turn. <bold>(B)</bold> This puzzle starts with four cubes to assemble. <bold>(C)</bold> The cubes are correctly assembled, and the puzzle is solved (i.e.,&#x20;a path composed by cubes is created between both figurines).</p>
</caption>
<graphic xlink:href="frobt-08-736644-g001.tif"/>
</fig>
</sec>
<sec id="s2">
<title>2 Related Work</title>
<p>In this section, we present the most popular methods, strategies, utility functions, and performance metrics used in the decision-making process of human-robot collaboration, and situate our contributions with respect to them. A decision-making method models the relationship between the agents, the actions, the environment, the task, etc. A strategy defines how to select the optimal actions each agent can choose, based on the reward (utility) calculated by the utility function for each action. An optimal action profile is made up of the best actions each agent can choose. All methods and strategies can be used to perform different tasks, and no general rule implies that one will necessarily perform better than the others.</p>
<sec id="s2-1">
<title>2.1&#x20;Decision-Making Methods</title>
<p>Decision-making methods are used, as mentioned before, to model the relationship between the task, the agents accomplishing it, their actions, and their impact on the environment. Probabilistic methods, deep learning, and game theory are considered among the most widespread decision-making methods.</p>
<p>Probabilistic methods are the first and most widely used in decision-making processes. Markov decision processes and related models (e.g., Markov chains) are the most used ones. Other studies are based on Gaussian processes (e.g., Gaussian mixtures), Bayesian processes (e.g., likelihood functions), and graph theory. In <xref ref-type="bibr" rid="B36">Roveda et&#x20;al. (2021)</xref>, a Hidden Markov Model (HMM) is used to teach the robot how to achieve the task from human demonstrations, and an algorithm based on Bayesian Optimization (BO) is used to maximize task performance (i.e.,&#x20;avoid task failures while reducing the interaction force) and to enable the robot to compensate for task uncertainties.</p>
<p>Interest in using deep learning in decision-making methods arose early, due to the unsatisfactory results of probabilistic methods in managing uncertainties in complex tasks. In <xref ref-type="bibr" rid="B33">Oliff et&#x20;al. (2020)</xref>, Deep Q-Networks (DQN) are used to adapt the robot&#x2019;s behavior to changes in human behavior during industrial tasks. The drawbacks of deep learning methods are their computational cost and the slowness of learning.</p>
<p>Game theory methods in decision-making processes have only recently been exploited. They can model most tasks performed by a group of agents (players) in collaboration or competition, whether the choice of actions is simultaneous [normal form, also called matrix form modeling (<xref ref-type="bibr" rid="B6">Conitzer and Sandholm, 2006</xref>)] or sequential [extensive form, also known as tree form modeling (<xref ref-type="bibr" rid="B20">Leyton-Brown and Shoham, 2008</xref>)]. Game theory methods have been used in different HRC applications, for instance in analyzing and detecting human behavior so as to adapt the robot&#x2019;s behavior to it and reach better collaboration performance (<xref ref-type="bibr" rid="B18">Jarrass&#xe9; et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B22">Li et&#x20;al., 2019</xref>). Game theory has also been used in HRC for mutual adaptation in industrial assembly scenarios (<xref ref-type="bibr" rid="B13">Gabler et&#x20;al., 2017</xref>). We choose game theory as our decision-making method due to its simplicity and effectiveness in modeling most interactions between a group of participants and their reactions to each other&#x2019;s decisions. We specifically use the extensive form due to its sequential nature, which is well suited to HRC applications.</p>
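<p>As an illustration of how a perfect-information extensive-form game is solved (the tree and payoffs below are hypothetical, not the paper&#x2019;s task model), backward induction evaluates the game tree from the leaves up, each player at their turn selecting the subtree that maximizes their own payoff:</p>

```python
# Backward induction on a perfect-information extensive-form game.
# Internal nodes belong to the player to move; leaves carry a tuple of
# (human, robot) payoffs. Tree shape and payoffs are illustrative.

def backward_induction(node, player=0):
    if "payoffs" in node:  # leaf: no further moves, return its payoffs
        return node["payoffs"]
    # Evaluate each child subtree from the next player's perspective.
    outcomes = [backward_induction(c, 1 - player) for c in node["children"]]
    # The player to move picks the outcome maximizing their own payoff.
    return max(outcomes, key=lambda p: p[player])

# Human moves first (player 0), then the robot (player 1).
tree = {"children": [
    {"children": [{"payoffs": (3, 1)}, {"payoffs": (2, 4)}]},
    {"children": [{"payoffs": (1, 2)}, {"payoffs": (4, 0)}]},
]}
# The robot picks (2, 4) in the left subtree and (1, 2) in the right;
# anticipating this, the human prefers the left branch.
print(backward_induction(tree))  # (2, 4)
```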
</sec>
<sec id="s2-2">
<title>2.2&#x20;Decision-Making Strategies</title>
<p>The decision-making strategy is the policy of choosing actions based on the value of their reward, calculated by the utility function (i.e.,&#x20;the reward function). We present the most used strategies for multi-criteria decision-making in HRC, as well as some of their application areas. The following strategies are used intensively in deep learning and/or game theory (<xref ref-type="bibr" rid="B6">Conitzer and Sandholm, 2006</xref>; <xref ref-type="bibr" rid="B20">Leyton-Brown and Shoham, 2008</xref>):<list list-type="simple">
<list-item>
<p>&#x2022; Dominance: All the actions whose rewards are dominated by others are eliminated. Researchers used it to assess the human&#x2019;s confidence in a robot in <xref ref-type="bibr" rid="B34">Reinhardt et&#x20;al. (2017)</xref>.</p>
</list-item>
<list-item>
<p>&#x2022; Pareto optimality: An action profile is Pareto optimal if we cannot change it without penalizing at least one agent. It is used, for example, in disassembly and remanufacturing tasks (<xref ref-type="bibr" rid="B44">Xu et&#x20;al., 2020</xref>).</p>
</list-item>
<list-item>
<p>&#x2022; Nash Equilibrium (NE): Each agent responds to the others in the best possible way. A best response is the best action an agent can choose regardless of what the others do. This is the main strategy used in game theory. In <xref ref-type="bibr" rid="B2">Bansal et&#x20;al. (2020)</xref>, a NE strategy is used to ensure human safety in a nearby environment during a pick-and-place&#x20;task.</p>
</list-item>
<list-item>
<p>&#x2022; Stackelberg duopoly model: The agents make their decisions sequentially: one agent (the leader) decides first, and all other agents (followers) decide afterwards. The optimal action of the leader is the one that maximizes its own reward and minimizes the followers&#x2019; rewards, which means that the leader always has the largest reward. This strategy is used, for example, in a collaborative scenario between a human and a car to predict the driver&#x2019;s behavior (<xref ref-type="bibr" rid="B21">Li et&#x20;al., 2017</xref>), such as the driver&#x2019;s steering behavior in response to a collision avoidance control (<xref ref-type="bibr" rid="B26">Na and Cole, 2014</xref>).</p>
</list-item>
</list>
</p>
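<p>For instance, the pure-strategy Nash equilibria of a small two-agent matrix game can be found by checking every action profile for mutual best response. The payoff matrices below are illustrative (a simple coordination game, e.g., both agents preferring to work on the same cube), not taken from the paper:</p>

```python
import itertools

def pure_nash_equilibria(payoff_h, payoff_r):
    """Return all profiles (i, j) where neither agent can improve its
    reward by unilaterally deviating (mutual best response)."""
    rows, cols = len(payoff_h), len(payoff_h[0])
    equilibria = []
    for i, j in itertools.product(range(rows), range(cols)):
        # Human's action i is a best response to the robot's action j...
        best_h = payoff_h[i][j] >= max(payoff_h[k][j] for k in range(rows))
        # ...and the robot's action j is a best response to i.
        best_r = payoff_r[i][j] >= max(payoff_r[i][k] for k in range(cols))
        if best_h and best_r:
            equilibria.append((i, j))
    return equilibria

# Coordination game: both agents are rewarded for matching actions.
payoff_h = [[2, 0], [0, 1]]
payoff_r = [[2, 0], [0, 1]]
print(pure_nash_equilibria(payoff_h, payoff_r))  # [(0, 0), (1, 1)]
```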
</sec>
<sec id="s2-3">
<title>2.3 Performance Metrics</title>
<p>After the decision-making process is settled and used by a human-robot collaborative team to perform a task, other works tend to evaluate the performance of the collaboration using performance metrics. On the one hand, some works focus on evaluating one specific metric, as in <xref ref-type="bibr" rid="B16">Hoffman (2019)</xref>, where the author evaluates several human-robot collaborative teams, performing different tasks, using the fluency metric. On the other hand, other works build a global framework to evaluate HRC in general, based on several metrics. In <xref ref-type="bibr" rid="B14">Gervasi et&#x20;al. (2020)</xref>, the authors developed a global framework to evaluate HRC based on more than twenty performance metrics, including cognitive load and physical ergonomics. <xref ref-type="table" rid="T1">Table&#x20;1</xref> presents the main metrics considered in the state of the art to evaluate the optimality of the collaboration. In the <xref ref-type="sec" rid="s13">Supplementary Material</xref>, we present a more detailed table that introduces additional performance metrics and defines each metric according to its usage in different task&#x20;types.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Some metrics considered for the evaluation of HRC classified based on the task types (<xref ref-type="bibr" rid="B39">Steinfeld et&#x20;al., 2006</xref>; <xref ref-type="bibr" rid="B3">B&#xfc;tepage and Kragic, 2017</xref>; <xref ref-type="bibr" rid="B29">Nelles et&#x20;al., 2018</xref>).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Task</th>
<th align="center">Navigation</th>
<th align="center">Perception</th>
<th align="center">Management</th>
<th align="center">Manipulation</th>
<th align="center">Social</th>
<th align="center">Common metrics that can be used for all task types</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Performance metrics</td>
<td align="left">Failure rate, accuracy, ergonomy or posture, time to completion, and rapidity</td>
<td align="left">Velocity, accuracy, time to completion, effectiveness, and number of errors</td>
<td align="left">Time delivery, time request, number of human and robot errors, trust, cognitive load</td>
<td align="left">Positional accuracy and repeatability, velocity, dexterity, time to completion, and effort or force</td>
<td align="left">Persuasiveness, engagement in social characteristics, trust, and compliance</td>
<td align="left">Time to completion, number of human and robot errors, autonomy, cognitive load, and effectiveness</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2-4">
<title>2.4 Utility Functions</title>
<p>The utility is a reward calculated by the utility function to express the value of an action. Thanks to these utilities, the decision-making strategy can choose the right actions. Some previous works in the literature only considered task accomplishment (and no performance metrics) in their utility functions because their focus was on complex task accomplishment. For example, in <xref ref-type="bibr" rid="B30">Nikolaidis et&#x20;al. (2017a)</xref>, a human-robot collaborative team was carrying a table to move it from one room to another. The goal was to ensure mutual adaptation between the agents by having the human also adapt to the robot. In this type of work, none of the performance metrics in <xref ref-type="table" rid="T1">Table&#x20;1</xref> is considered.</p>
<p>More recent works include performance metrics (see <xref ref-type="table" rid="T1">Table&#x20;1</xref>). However, these metrics cannot be changed without significant modifications to the framework. A relevant example is <xref ref-type="bibr" rid="B23">Liu et&#x20;al. (2018)</xref>, where, by changing the task allocation, the authors make the robot respect the real-time duration of the assembly process while following the order required to assemble the parts. In this case, they consider a single metric (the time to completion), since respecting the parts&#x2019; assembly order is a constraint for accomplishing the task. This time metric, however, cannot be replaced by another one (e.g., effort or velocity) within their framework.</p>
</sec>
<sec id="s2-5">
<title>2.5 Contributions</title>
<p>Unlike the utility functions used in state-of-the-art works, which optimize a fixed set of metrics no matter how the human is behaving, we take into account a changeable, unrestricted number of performance metrics (from <xref ref-type="table" rid="T1">Table&#x20;1</xref>). To summarize our contributions, we propose a framework that allows us to:<list list-type="simple">
<list-item>
<p>&#x2022; easily change the performance metrics from one scenario to another without changing anything in our formalization except the part in the utility function related to the metrics,&#x20;and</p>
</list-item>
<list-item>
<p>&#x2022; improve the collaboration performance without having to change the robot&#x2019;s abilities.</p>
</list-item>
</list>
</p>
<p>In the following section, we define the problem formalization and present the utility function, which optimizes the performance metrics while treating task accomplishment as a constraint.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Formalization</title>
<p>A HRC<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref> consists of a global environment {<bold>E</bold>} and a task <italic>T</italic>. The environment state <italic>E</italic>
<sup>
<italic>k</italic>
</sup> at each iteration <italic>k</italic> (with <italic>k</italic>&#x20;&#x2208; [1, <italic>k</italic>
<sub>
<italic>f</italic>
</sub>], where <italic>k</italic>
<sub>
<italic>f</italic>
</sub> is the final iteration of the task) comprises a group of <italic>n</italic> agents (humans and robots), each of whom can carry out a finite set of actions (continuous or discrete). <italic>E</italic>
<sup>
<italic>k</italic>
</sup> changes according to the actions chosen by the agents. The global environment {<bold>E</bold>} is the set of changes in the environment state at each iteration.<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">E</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>Since the possible actions may change at each iteration, we define {<bold>A</bold>} as the global set of feasible actions for each iteration <italic>k</italic>: {<bold>A</bold>}<sup>
<italic>k</italic>
</sup>.<disp-formula id="e2">
<mml:math id="m2">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>The set {<bold>A</bold>}<sup>
<italic>k</italic>
</sup> contains a set of feasible actions for each agent <italic>i</italic> (with <italic>i</italic>&#x20;&#x2208; [1, <italic>n</italic>]) at iteration <italic>k</italic> denoted by <inline-formula id="inf1">
<mml:math id="m3">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>.<disp-formula id="e3">
<mml:math id="m4">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
<inline-formula id="inf2">
<mml:math id="m5">
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is the <italic>a</italic>
<sup>
<italic>th</italic>
</sup> feasible action of agent <italic>i</italic>, where <italic>a</italic>&#x20;&#x2208; [1, <italic>l</italic>] and <italic>l</italic> is the number of feasible actions of agent <italic>i</italic> at iteration <italic>k</italic>.<disp-formula id="e4">
<mml:math id="m6">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>At each iteration, an action profile <inline-formula id="inf3">
<mml:math id="m7">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> groups the actions chosen by each agent <italic>i</italic> denoted by <inline-formula id="inf4">
<mml:math id="m8">
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2282;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>.<disp-formula id="e5">
<mml:math id="m9">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>The optimal action profile <inline-formula id="inf5">
<mml:math id="m10">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> at iteration <italic>k</italic> is computed through the decision-making function <bold>d</bold>
<sub>
<italic>M</italic>,<italic>S</italic>
</sub> as presented in <xref ref-type="disp-formula" rid="e6">Eq. 6</xref>. <bold>d</bold>
<sub>
<italic>M</italic>,<italic>S</italic>
</sub> relies on the decision-making method <italic>M</italic>, the decision-making strategy <italic>S</italic>, and the utility profile <inline-formula id="inf6">
<mml:math id="m11">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> that contains all the utilities for all possible actions {<bold>A</bold>}<sup>
<italic>k</italic>
</sup> at iteration <italic>k</italic>. The decision-making method <italic>M</italic> takes into account task-related constraints, such as the order in which the agents act, i.e.,&#x20;sequentially or simultaneously. The decision-making strategy <italic>S</italic> defines how the agents choose their actions according to the utilities contained in <inline-formula id="inf7">
<mml:math id="m12">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>, as presented in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>.<disp-formula id="e6">
<mml:math id="m13">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(6)</label>
</disp-formula>The utility profile <inline-formula id="inf8">
<mml:math id="m14">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is computed by the utility function <bold>f</bold>
<sub>
<italic>u</italic>
</sub> based on different sets including: 1) the set of performance metrics <inline-formula id="inf9">
<mml:math id="m15">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> (cf. <xref ref-type="table" rid="T1">Table&#x20;1</xref>), 2) the set of constraints {<bold>G</bold>} to be respected in order to make the task <italic>T</italic> progress toward completion, 3) the set of rewards {<bold>R</bold>}, one per action in the action profile, calculated according to the task and the metrics, and 4) {<bold>
<italic>&#x3f5;</italic>
</bold>} a set of weighting coefficients (between 0 and 1) used to set the importance of each metric (e.g., favoring one metric over the others, especially when they are in opposition). We get:<disp-formula id="e7">
<mml:math id="m16">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">R</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">&#x3f5;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>Let us discuss how one can modify the different elements involved in <xref ref-type="disp-formula" rid="e7">Eq. 7</xref>. To change only the collaboration scenario through the performance metrics <inline-formula id="inf10">
<mml:math id="m17">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> in <bold>f</bold>
<sub>
<italic>u</italic>
</sub>, we first need to update the values of the metrics <inline-formula id="inf11">
<mml:math id="m18">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and the reward {<bold>R</bold>} of each action, and then recalculate the utilities <inline-formula id="inf12">
<mml:math id="m19">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>. To modify only the agents&#x2019; actions, the utilities <inline-formula id="inf13">
<mml:math id="m20">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> should be recalculated for the new actions. To change the task, we need to modify the constraints {<bold>G</bold>}, which define the task by restricting each agent to the actions that make the task progress, and then recalculate the utilities <inline-formula id="inf14">
<mml:math id="m21">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>. We can, of course, combine several modifications (e.g., changing the performance metrics and the task) by making the appropriate adaptations (e.g., first modifying the metrics <inline-formula id="inf15">
<mml:math id="m22">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, the rewards {<bold>R</bold>}, and the task constraints {<bold>G</bold>}, and afterwards recalculating the utilities <inline-formula id="inf16">
<mml:math id="m23">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>).</p>
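The modification workflow above hinges on how the utility function <bold>f</bold><sub><italic>u</italic></sub> combines its inputs. The following minimal Python sketch illustrates one possible instantiation of Eq. 7; the additive weighted-sum form and all names (<italic>utility_profile</italic>, the metric/constraint encodings) are our own illustrative assumptions, not the implementation used in the paper.

```python
# Illustrative sketch of the utility function f_u of Eq. 7. Metrics {M} are
# callables scored per action, weights {epsilon} are in [0, 1], constraints
# {G} are (predicate, penalty) pairs, and {R} maps actions to rewards.
def utility_profile(actions, metrics, weights, constraints, rewards):
    """Return the utility of every feasible action at iteration k."""
    utilities = {}
    for a in actions:
        # Weighted sum of the performance metrics {M} for this action.
        score = sum(weights[name] * metric(a) for name, metric in metrics.items())
        # Constraints {G}: every violated predicate subtracts a fixed cost.
        penalty = sum(cost for holds, cost in ((g(a), c) for g, c in constraints)
                      if not holds)
        utilities[a] = rewards[a] + score - penalty
    return utilities
```

Under this sketch, changing the scenario amounts to swapping the entries of the metric and reward mappings and calling the function again, while changing the task amounts to editing the constraint list.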
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Block diagram of our formalization of the decision-making process used to calculate the optimal action profile <inline-formula id="inf17">
<mml:math id="m24">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> at iteration <italic>k</italic>.</p>
</caption>
<graphic xlink:href="frobt-08-736644-g002.tif"/>
</fig>
<p>To illustrate how <xref ref-type="disp-formula" rid="e6">Eqs 6</xref>, <xref ref-type="disp-formula" rid="e7">7</xref> can be instantiated, let us consider the example of a collaborative team composed of a human and a robot, each holding an edge of a gutter on which there is a ball (<xref ref-type="bibr" rid="B15">Ghadirzadeh et&#x20;al., 2016</xref>). Their goal is to position the ball, for instance, in the center of the gutter. Our formalization of such a task is as follows:<list list-type="simple">
<list-item>
<p>&#x2022;Agents: Agent 1 is the human, and agent 2 is the robot. Both agents make their decisions simultaneously.</p>
</list-item>
<list-item>
<p>&#x2022;Human actions <inline-formula id="inf18">
<mml:math id="m25">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>: They are the angles of inclination of the gutter. The actions are continuous. The set of human actions remains the same for all iterations (<inline-formula id="inf19">
<mml:math id="m26">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>).</p>
</list-item>
<list-item>
<p>&#x2022;Robot actions <inline-formula id="inf20">
<mml:math id="m27">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>: They are the angles of inclination of the gutter imposed by the robot end-effector. The decision-making method will provide the corresponding joint values needed for the end-effector to reach the desired position. The actions are continuous (since it is a continuous control task). The set of robot actions remains the same for all iterations (<inline-formula id="inf21">
<mml:math id="m28">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>).</p>
</list-item>
<list-item>
<p>&#x2022;Constraints {<bold>G</bold>}: The angles of inclination should remain within [&#x2212;30&#xb0;, 30&#xb0;]; other values will be penalized.</p>
</list-item>
<list-item>
<p>&#x2022;Performance metrics <inline-formula id="inf22">
<mml:math id="m29">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>: Time to completion and human posture. Human posture is assessed through ISO standards that define uncomfortable work postures (<xref ref-type="bibr" rid="B7">Delleman and Dul, 2007</xref>). For example, these standards indicate that when the human inclines the gutter at an angle outside the interval [&#x2212;20&#xb0;, 20&#xb0;], the posture becomes painful for&#x20;them.</p>
</list-item>
<list-item>
<p>&#x2022;Rewards {<bold>R</bold>}: They are calculated as &#x2212;&#x2016;&#x2009;<italic>C</italic>
<sub>
<italic>b</italic>
</sub> &#x2212; <italic>C</italic>
<sub>
<italic>g</italic>
</sub>&#x2009;&#x2016;&#x2009;&#x2217;&#x2009;<italic>&#x3bb;</italic>, where <italic>C</italic>
<sub>
<italic>b</italic>
</sub> is the position of the center of the ball, <italic>C</italic>
<sub>
<italic>g</italic>
</sub> is the position of the center of the gutter (the desired position), and <italic>&#x3bb;</italic> is a fixed gain for a given scenario. <italic>&#x3bb;</italic> allows an action to be favored according to the performance metrics (<inline-formula id="inf23">
<mml:math id="m30">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>) and the constraints ({<bold>G</bold>}).</p>
</list-item>
<list-item>
<p>&#x2022;Weighting coefficients {<bold>
<italic>&#x3f5;</italic>
</bold>}: They are set to 1 for both performance metrics, giving them equal importance.</p>
</list-item>
<list-item>
<p>&#x2022;Decision-making method <italic>M</italic>: It is the reinforcement learning process, which is based on trial-and-error learning. Agent 2 (the robot, which learns) in state <italic>s</italic> performs an action <italic>A</italic>
<sub>2,<italic>a</italic>
</sub> that changes the state to <italic>s</italic>&#x2032;. The observation the agent receives from the environment describes the changes caused by moving from state <italic>s</italic> to <italic>s</italic>&#x2032;. The reward (<italic>R</italic>(<italic>s</italic>, <italic>A</italic>
<sub>2,<italic>a</italic>
</sub>)) evaluates the taken action <italic>A</italic>
<sub>2,<italic>a</italic>
</sub> (which leads to the new state <italic>s</italic>&#x2032;) with respect to the desired learning goal. The state <italic>s</italic> is made up of <italic>C</italic>
<sub>
<italic>b</italic>
</sub>, <italic>C</italic>
<sub>
<italic>g</italic>
</sub>, and the position of the robot&#x2019;s end-effector. All reinforcement learning algorithms learn the value <italic>V</italic>(<italic>s</italic>) attributed to each state, defined&#x20;below.</p>
</list-item>
<list-item>
<p>&#x2022;Decision-making strategy <italic>S</italic>: It is the dominance strategy. Once <italic>V</italic>(<italic>s</italic>) has been learned for all possible states, the optimal actions can be chosen. Most reinforcement learning algorithms rely on the Bellman equation to choose the optimal actions (<xref ref-type="bibr" rid="B27">Nachum et&#x20;al., 2017</xref>):</p>
</list-item>
</list>
<disp-formula id="e8">
<mml:math id="m31">
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
<italic>&#x3b3;</italic> is the discount factor that determines how much agent 2 cares about rewards in the distant future relative to those in the immediate future. <inline-formula id="inf24">
<mml:math id="m32">
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
</mml:math>
</inline-formula> is the strategy <italic>S</italic> for choosing the action (i.e.,&#x20;the dominance strategy).</p>
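To make Eq. 8 and the reward &#x2212;&#x2016;<italic>C</italic><sub><italic>b</italic></sub> &#x2212; <italic>C</italic><sub><italic>g</italic></sub>&#x2016; &#x2217; <italic>&#x3bb;</italic> concrete, the following toy Python sketch runs value iteration on a discretized version of the gutter-ball task. The paper&#x2019;s actions are continuous; the 1-D transition model, discretization, and all numeric values here are illustrative assumptions.

```python
# Toy, discretized sketch of Eq. 8: agent 2 learns V(s) by value iteration
# and picks actions with the dominance strategy (the max over A_2,a).
GAMMA = 0.9                       # discount factor gamma
LAMBDA = 1.0                      # fixed reward gain lambda
C_G = 0                           # center of the gutter (desired position)
STATES = range(-5, 6)             # discretized ball positions C_b
ACTIONS = (-1, 0, 1)              # discretized inclination changes

def step(s, a):
    """Hypothetical transition: inclining the gutter shifts the ball."""
    return max(min(s + a, 5), -5)

def reward(s, a):
    """R(s, A_2,a) = -||C_b - C_g|| * lambda, evaluated at the next state."""
    return -abs(step(s, a) - C_G) * LAMBDA

def value_iteration(sweeps=100):
    """Repeated Bellman backups of Eq. 8 over all states."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: max(reward(s, a) + GAMMA * V[step(s, a)] for a in ACTIONS)
             for s in STATES}
    return V

def best_action(V, s):
    """Dominance strategy: the action maximizing the Bellman backup."""
    return max(ACTIONS, key=lambda a: reward(s, a) + GAMMA * V[step(s, a)])
```

After convergence, the learned values drive the ball toward the gutter center from either side, which is the behavior the reward is designed to produce.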
<p>The decision-making method manages the way agents act (simultaneously or sequentially) as well as the different types of actions (continuous or discrete). It is also necessary to ensure that the decision-making strategy can handle the nature of the actions (discrete or continuous) and how they are chosen (sequentially or simultaneously). Since our framework allows us to easily change the decision-making method and strategy, we simply select them according to the nature of the actions and how they are chosen. <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> summarizes our formalization of the decision-making process as a block diagram. In <xref ref-type="sec" rid="s4">Section 4</xref>, we explain the decision-making method and strategy selected in our experiments as well as the performance metrics that can be taken into consideration.</p>
</sec>
<sec id="s4">
<title>4 Approach</title>
<p>To illustrate our contributions, we fix the decision-making method <italic>M</italic> and strategy <italic>S</italic>. As the decision-making method, we adopt the Perfect-Information Extensive Form (PIEF) from game theory (environment and actions are known), in which the full flow of the game is displayed as a tree. Using the Nash Equilibrium as the decision-making strategy ensures optimality in the choice of actions, which is what we seek to guarantee.</p>
<sec id="s4-1">
<title>4.1&#x20;Perfect-Information Extensive Form</title>
<p>As the decision-making method <italic>M</italic> in <xref ref-type="disp-formula" rid="e6">Eq. 6</xref>, we use the Perfect-Information Extensive Form (PIEF). With this method, each agent has full information about the actions and decisions of the other agents and about the environment. In game theory, a game (or task, or application) in PIEF is represented mathematically by the tuple <inline-formula id="inf25">
<mml:math id="m33">
<mml:mi>T</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">N</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">H</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">Z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3c7;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3c1;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">&#x3c3;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B20">Leyton-Brown and Shoham, 2008</xref>), with:<list list-type="simple">
<list-item>
<p>&#x2022; <italic>T</italic> represents the game (i.e.,&#x20;the task) as a tree (graph) structure.</p>
</list-item>
<list-item>
<p>&#x2022; {<bold>N</bold>} is a set of <italic>n</italic> agents.</p>
</list-item>
<list-item>
<p>&#x2022; {<bold>A</bold>} is a set of actions of all agents for all iterations.</p>
</list-item>
<list-item>
<p>&#x2022; {<bold>H</bold>} is a set of non-terminal choice nodes. A non-terminal choice node represents an agent that chooses the actions to perform.</p>
</list-item>
<list-item>
<p>&#x2022; {<bold>Z</bold>} is a set of terminal choice nodes; disjoint from {<bold>H</bold>}. A terminal choice node represents the utility values attributed to the actions <inline-formula id="inf26">
<mml:math id="m34">
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> that each agent <italic>i</italic> chose along an alternative (i.e.,&#x20;a branch of the tree).</p>
</list-item>
<list-item>
<p>&#x2022; <bold>
<italic>&#x3c7;</italic>
</bold>: {<bold>H</bold>}&#x21a6;{<bold>A</bold>}<sub>
<italic>@H</italic>
</sub> is the action function, which assigns to each choice node <italic>H</italic> a set of possible actions {<bold>A</bold>}<sub>
<italic>@H</italic>
</sub>.</p>
</list-item>
<list-item>
<p>&#x2022; <bold>
<italic>&#x3c1;</italic>
</bold>: {<bold>H</bold>}&#x21a6;{<bold>N</bold>} is the agent function, which assigns to each non-terminal choice node an agent <italic>i</italic>&#x20;&#x2208; {<bold>N</bold>} who chooses an action in that&#x20;node.</p>
</list-item>
<list-item>
<p>&#x2022; <bold>
<italic>&#x3c3;</italic>
</bold>: {<bold>H</bold>}&#x2009;&#xd7;&#x2009;{<bold>A</bold>}&#x21a6;{<bold>H</bold>} &#x222a; {<bold>Z</bold>} is the successor function, which maps a choice node and an action to a new choice node or terminal&#x20;node.</p>
</list-item>
<list-item>
<p>&#x2022; <inline-formula id="inf27">
<mml:math id="m35">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is the global utility profile for all iterations.</p>
</list-item>
</list>
</p>
<p>We apply this structure to represent the task in the following sections. In our case, since the number of nodes is small, <bold>
<italic>&#x3c7;</italic>
</bold>, <bold>
<italic>&#x3c1;</italic>
</bold>, and <bold>
<italic>&#x3c3;</italic>
</bold> are straightforward functions (cf. <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>).</p>
<p>From a high-level perspective, a perfect-information game in extensive form is simply a tree (e.g., <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>) which consists of:<list list-type="simple">
<list-item>
<p>&#x2022; Non-terminal nodes (squares): each square represents an agent that will choose actions.</p>
</list-item>
<list-item>
<p>&#x2022; Arrows: each one represents a possible action (there are as many arrows as available actions <inline-formula id="inf28">
<mml:math id="m36">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> for agent <italic>i</italic> at iteration&#x20;<italic>k</italic>).</p>
</list-item>
<list-item>
<p>&#x2022; Terminal nodes (ellipses): each ellipse represents the utilities calculated for each action chosen by each agent in an alternative (i.e.,&#x20;a branch of the tree).</p>
</list-item>
</list>
</p>
<p>Note that this kind of tree is built over all the possible alternatives (considering all the actions an agent might choose), even though some of them will never occur (an agent will never choose certain of its available actions). In this way, the tree represents all possible reactions of each agent to any alternative chosen by the others, even if, in the end, only one of these alternatives actually happens.</p>
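The tree structure described above can be sketched in a few lines of code. The following is a minimal illustration (the names <monospace>ChoiceNode</monospace> and <monospace>TerminalNode</monospace> are ours, not from the paper): each non-terminal node carries the agent assigned by the agent function, and its edges play the role of the successor function, mapping actions to child nodes.

```python
# Minimal sketch of a perfect-information extensive-form game tree.
# ChoiceNode/TerminalNode are illustrative names, not from the paper.

class TerminalNode:
    def __init__(self, utilities):
        # utilities: dict mapping each agent to its utility at this leaf
        self.utilities = utilities

class ChoiceNode:
    def __init__(self, agent, successors):
        self.agent = agent            # rho: the agent who chooses here
        self.successors = successors  # sigma: action -> ChoiceNode | TerminalNode

# A tiny two-agent tree: the human ("h") moves, then the robot ("r").
tree = ChoiceNode("h", {
    "good": ChoiceNode("r", {
        "wait":    TerminalNode({"h": 1.0, "r": 0.5}),
        "correct": TerminalNode({"h": 1.0, "r": -0.5}),
    }),
    "bad": ChoiceNode("r", {
        "wait":    TerminalNode({"h": -1.0, "r": 0.0}),
        "correct": TerminalNode({"h": -1.0, "r": 1.0}),
    }),
})

def actions(node):
    """Available actions at a choice node (one arrow per action)."""
    return list(node.successors)
```

The utility values above are placeholders; only the shape of the structure matters here.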
</sec>
<sec id="s4-2">
<title>4.2 Subgame Perfect Nash Equilibrium</title>
<p>As decision-making method <italic>S</italic> in <xref ref-type="disp-formula" rid="e6">Eq. 6</xref> we used Nash Equilibrium (NE). The game <italic>T</italic> can be divided into subgames <italic>T</italic>
<sup>
<italic>k</italic>
</sup> at each iteration. In game theory (<xref ref-type="bibr" rid="B20">Leyton-Brown and Shoham, 2008</xref>), we define a subgame of <italic>T</italic> (a PIEF game) rooted at node <italic>H</italic> as the restriction of <italic>T</italic> to the descendants of <italic>H</italic>. A Subgame Perfect Nash Equilibrium (SPNE) of <italic>T</italic> is the set of action profiles <inline-formula id="inf29">
<mml:math id="m37">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> such that for any subgame <italic>T</italic>
<sup>
<italic>k</italic>
</sup> of <italic>T</italic>, the restriction of <inline-formula id="inf30">
<mml:math id="m38">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> to <italic>T</italic>
<sup>
<italic>k</italic>
</sup> is a Nash Equilibrium of&#x20;<italic>T</italic>
<sup>
<italic>k</italic>
</sup>.</p>
<p>A Nash Equilibrium in pure strategy at iteration <italic>k</italic> is reached when each agent <italic>i</italic> best responds to the others (denoted by &#x2212;<italic>i</italic>). The Best Response (BR) at <italic>k</italic> is defined mathematically as:<disp-formula id="e9">
<mml:math id="m39">
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>B</mml:mi>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;iff&#x2009;</mml:mtext>
<mml:mo>&#x2200;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2265;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
<label>(9)</label>
</disp-formula>Hence, NE will ultimately be expressed as follows: <inline-formula id="inf31">
<mml:math id="m40">
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mo>&#x20d7;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is an optimal profile of actions following Nash&#x2019;s equilibrium in pure strategy iff <inline-formula id="inf32">
<mml:math id="m41">
<mml:mo>&#x2200;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>B</mml:mi>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>From a high-level perspective, to ensure that the actions chosen by an agent follow the NE strategy, it is enough to verify that each agent chooses the actions with the maximum possible utilities.</p>
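This verification can be implemented by backward induction: at each choice node, the acting agent keeps the action that maximizes its own utility, given the already-solved play in the subgame below, which yields a subgame perfect equilibrium. A minimal sketch under our own naming and encoding assumptions (a terminal node is a dict of utilities, a choice node is a pair of the acting agent and its action map):

```python
# Backward induction on a perfect-information game tree: at every choice
# node the acting agent best-responds to the solved subgames below, which
# yields a subgame perfect Nash equilibrium (names/encoding are ours).

def solve(node):
    """Return (chosen_actions, utilities) for the subgame rooted at node."""
    if isinstance(node, dict):          # terminal node: utilities per agent
        return {}, node
    agent, successors = node
    best = None
    for action, child in successors.items():
        plan, utils = solve(child)
        # Keep the action with the maximum own utility: a best response
        # to the equilibrium play of the subgame below.
        if best is None or utils[agent] > best[2][agent]:
            best = (action, plan, utils)
    action, plan, utils = best
    return {**plan, agent: action}, utils

# Human moves first, robot reacts; utilities given per agent at each leaf.
game = ("h", {
    "good": ("r", {"wait": {"h": 2, "r": 1}, "correct": {"h": 2, "r": -1}}),
    "bad":  ("r", {"wait": {"h": -1, "r": 0}, "correct": {"h": -1, "r": 1}}),
})
plan, utils = solve(game)
```

On this toy instance, the robot best-responds with "wait" after a good human move, and the human, anticipating this, plays "good".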
</sec>
<sec id="s4-3">
<title>4.3 Performance Metrics</title>
<p>Any metric that can be formulated mathematically, or at least measured during the execution of the task and expressed as a condition in the computation of the task reward, can be taken into account when choosing the actions through the performance metrics <inline-formula id="inf33">
<mml:math id="m42">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> (some examples are given in <xref ref-type="table" rid="T1">Table&#x20;1</xref>). In the next section, we present the tests conducted, and we mention the chosen performance metrics for each scenario.</p>
</sec>
</sec>
<sec id="s5">
<title>5 Experiments</title>
<p>We conduct real and simulated tests to demonstrate the effectiveness of our formalization. We test three utility-function case scenarios in which the reward values change according to the chosen performance metrics. In the state-of-the-art case scenario, no metric is optimized. In the real experimental tests, the time-to-completion metric is optimized. In the simulated tests, we optimize the time to completion while considering the probability of human errors and the time each agent takes to perform an action.</p>
<sec id="s5-1">
<title>5.1 The Task</title>
<p>We chose to solve Camelot Jr. as a task. To successfully complete this task, all the cubes must be positioned correctly to build a path between the two figures. We have divided the task completion process into iterations during which each agent chooses an action sequentially.</p>
<sec id="s5-1-1">
<title>5.1.1 Experiments Context</title>
<p>We make the collaborative team ({<bold>N</bold>}), composed of a human (<italic>h</italic>) and the humanoid robot Nao (<italic>r</italic>), do a task (<italic>T</italic>) that consists of building puzzles (cf. <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>). Nao is much slower than the human (<inline-formula id="inf34">
<mml:math id="m43">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>) in doing physical tasks (e.g., pick-and-place tasks), and we want to minimize the total task time (<inline-formula id="inf35">
<mml:math id="m44">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="double-struck">M</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>). This slowness depends on the nature of the robot itself (its motor capacity combined with the use of its camera) and on the complexity of the puzzle. For the robot, the puzzle becomes more complex as the number of cubes to assemble increases. For the human, it is quite different: the complexity depends on their &#x201c;intelligence&#x201d;, meaning that the puzzle is easier the more &#x201c;intelligent&#x201d; the human is. By &#x201c;intelligent&#x201d;, we mean that the human can rapidly discover, without making mistakes, the correct position of each&#x20;cube.</p>
<p>The advantage of collaborating with the robot is that it knows the solution to the construction task. Therefore, the robot always performs correctly, even though it is slower than the human. The human agent, however, can make mistakes. The human plays first, and then it is the robot&#x2019;s turn. The robot corrects the human&#x2019;s move if that move is wrong. The changes in the robot&#x2019;s decision-making between the three case scenarios, including all the details we will present in the following sections, are shown in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>. The implementation procedure and computation times for the conducted experiments are presented in the <xref ref-type="sec" rid="s13">Supplementary Material</xref>.</p>
</sec>
<sec id="s5-1-2">
<title>5.1.2 Assumptions</title>
<p>To illustrate the contributions of this paper, we consider the following assumptions:<list list-type="simple">
<list-item>
<p>&#x2022; The task is always achievable. We solve the task while optimizing the performance metrics through the utility function. The optimization of the metrics does not have an impact on the solvability of the&#x20;task.</p>
</list-item>
<list-item>
<p>&#x2022; We limit the number of agents to two: a human (<italic>h</italic>) and a robot (<italic>r</italic>). Hence, {<bold>N</bold>} &#x3d; {<italic>h</italic>, <italic>r</italic>} &#x21d2; <italic>n</italic>&#x20;&#x3d;&#x20;2.</p>
</list-item>
<list-item>
<p>&#x2022; We limit agents to choose only one discrete action per iteration (i.e.,&#x20;<inline-formula id="inf36">
<mml:math id="m45">
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>) and to optimize only one metric (time to completion) in the real experiment and two metrics (time to completion and the probability of human errors) in the simulated experiments.</p>
</list-item>
<list-item>
<p>&#x2022; The task is performed sequentially through iterations. An iteration <italic>k</italic> includes the human making an action, then the robot reacting.</p>
</list-item>
<list-item>
<p>&#x2022; Each agent&#x2019;s set of actions and the time the agent takes to perform an action are invariant across iterations.</p>
</list-item>
</list>
</p>
</sec>
<sec id="s5-1-3">
<title>5.1.3 The Actions</title>
<p>The set of human actions <xref ref-type="disp-formula" rid="e10">Eq. 10</xref> and the set of robot actions <xref ref-type="disp-formula" rid="e11">Eq. 11</xref> are the same at every iteration, and each one of them consists of three actions:<list list-type="simple">
<list-item>
<p>&#x2022; <italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub> &#x2261; <italic>A</italic>
<sub>
<italic>r</italic>,<italic>g</italic>
</sub>: perform the good action (i.e.,&#x20;grasp a free cube and release it at the right place).</p>
</list-item>
<list-item>
<p>&#x2022; <italic>A</italic>
<sub>
<italic>h</italic>,<italic>w</italic>
</sub> &#x2261; <italic>A</italic>
<sub>
<italic>r</italic>,<italic>w</italic>
</sub>: wait (i.e.,&#x20;the agent does nothing and passes its turn).</p>
</list-item>
<list-item>
<p>&#x2022; <italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>: perform the bad action (i.e.,&#x20;the human makes an error: grasping a free cube and releasing it at the wrong place).</p>
</list-item>
<list-item>
<p>&#x2022; <italic>A</italic>
<sub>
<italic>r</italic>,<italic>c</italic>
</sub>: correct a bad action (i.e.,&#x20;the robot removes the cube from the wrong place).</p>
</list-item>
</list>
<disp-formula id="e10">
<mml:math id="m46">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>
<disp-formula id="e11">
<mml:math id="m47">
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
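The iteration-invariant action sets of Eqs. 10, 11 can be written down directly; the following sketch uses short string labels of our own choosing for the four action types.

```python
# Iteration-invariant action sets of Eqs. 10 and 11 (labels are ours):
# g = good action, w = wait, b = bad action (human only),
# c = correct a bad action (robot only).
ACTIONS = {
    "h": {"g", "w", "b"},   # human actions: good, wait, bad
    "r": {"g", "w", "c"},   # robot actions: good, wait, correct
}

def available_actions(agent, k):
    """The action set of agent i is the same at every iteration k."""
    return ACTIONS[agent]
```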
</sec>
<sec id="s5-1-4">
<title>5.1.4 Utility Calculation</title>
<p>The following equation is the adaptation of <xref ref-type="disp-formula" rid="e7">Eq. 7</xref> to the current task. So, the utility of every available action <italic>a</italic> for each agent <italic>i</italic> is calculated as follows:<disp-formula id="e12">
<mml:math id="m48">
<mml:msubsup>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#xd7;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#xd7;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mi>t</mml:mi>
</mml:math>
<label>(12)</label>
</disp-formula>with:<list list-type="simple">
<list-item>
<p>&#x2022; <inline-formula id="inf37">
<mml:math id="m49">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>: the duration of action <italic>a</italic> of agent&#x20;<italic>i</italic>,</p>
</list-item>
<list-item>
<p>&#x2022; <italic>t</italic>: the total time for an iteration (<inline-formula id="inf38">
<mml:math id="m50">
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, here having <italic>n</italic>&#x20;&#x3d; 2, therefore <inline-formula id="inf39">
<mml:math id="m51">
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>),</p>
</list-item>
<list-item>
<p>&#x2022; <inline-formula id="inf40">
<mml:math id="m52">
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>: the constraint that ensures the task progression by penalizing the actions which make the task regress (cf. <xref ref-type="table" rid="T2">Table&#x20;2</xref>),</p>
</list-item>
<list-item>
<p>&#x2022; and <inline-formula id="inf41">
<mml:math id="m53">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>: the reward of action <italic>a</italic> of agent&#x20;<italic>i</italic>.</p>
</list-item>
</list>
</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>The value of the constraint of the task accomplishment for each action: making the task progress (<inline-formula id="inf42">
<mml:math id="m54">
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>), making no progression (<inline-formula id="inf43">
<mml:math id="m55">
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:math>
</inline-formula>), and making the task regress (<inline-formula id="inf44">
<mml:math id="m56">
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Action</th>
<th align="center">
<inline-formula id="inf45">
<mml:math id="m57">
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
</th>
<th align="center">Task progress</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3" align="left">Human</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub>
</td>
<td align="left">1</td>
<td align="left">Progression</td>
</tr>
<tr>
<td align="center">
<italic>A</italic>
<sub>
<italic>h</italic>,<italic>w</italic>
</sub>
</td>
<td align="left">0</td>
<td align="left">No progression</td>
</tr>
<tr>
<td align="center">
<italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>
</td>
<td align="left">&#x2212;1</td>
<td align="left">Regression</td>
</tr>
<tr>
<td rowspan="3" align="left">Robot</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>r</italic>,<italic>g</italic>
</sub>
</td>
<td align="left">1 if <italic>A</italic>
<sub>
<italic>h</italic>
</sub> &#x2260; <italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>; &#x2212;1 otherwise</td>
<td align="left">Progression</td>
</tr>
<tr>
<td align="center">
<italic>A</italic>
<sub>
<italic>r</italic>,<italic>w</italic>
</sub>
</td>
<td align="left">0</td>
<td align="left">No progression</td>
</tr>
<tr>
<td align="center">
<italic>A</italic>
<sub>
<italic>r</italic>,<italic>c</italic>
</sub>
</td>
<td align="left">1 if <italic>A</italic>
<sub>
<italic>h</italic>
</sub> &#x3d; <italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>; &#x2212;1 otherwise</td>
<td align="left">Progression</td>
</tr>
</tbody>
</table>
</table-wrap>
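Under our reading of Eq. 12 and Table 2, the utility computation can be sketched as follows. Function and variable names are ours; <monospace>t_total</monospace> stands for the iteration time <italic>t</italic>, and the timing values in the example (20 s for the human, 60 s for the robot) are illustrative assumptions.

```python
# Utility of Eq. 12: U = (1/t_action) * G * R * t_total, with the
# task-progression constraint G taken from Table 2. Names are ours.

def constraint_G(agent, action, human_action=None):
    """Task-progression constraint G for action a of agent i (Table 2)."""
    if action == "w":                      # waiting never changes the task
        return 0
    if agent == "h":
        return 1 if action == "g" else -1  # good progresses, bad regresses
    # Robot: "g" progresses only if the human did not err ("b");
    # "c" (correct) progresses only if the human did err.
    if action == "g":
        return 1 if human_action != "b" else -1
    return 1 if human_action == "b" else -1  # action == "c"

def utility(agent, action, t_action, reward, t_total, human_action=None):
    """Eq. 12: U = (1/t_action) * G * R * t_total."""
    g = constraint_G(agent, action, human_action)
    return (1.0 / t_action) * g * reward * t_total

# Example: the robot correcting (60 s) after a human error, with R = 1
# and an 80 s iteration (20 s human + 60 s robot):
u = utility("r", "c", t_action=60, reward=1, t_total=80, human_action="b")
```

Note how the 1/<italic>t</italic><sub>action</sub> factor favors the faster agent, while <italic>G</italic> zeroes out waiting and penalizes regressive actions.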
</sec>
<sec id="s5-1-5">
<title>5.1.5 Strategy of Action&#x2019;s Choice</title>
<p>In our formalization, we stated that the agents use Nash Equilibrium (NE) as the decision-making strategy. However, since the behavior (the decision-making strategy) differs from one human to another, we cannot claim that humans will follow the NE when choosing their actions. The robot, in contrast, is restricted to choosing its actions using the NE strategy: it chooses the action with the highest utility, knowing the action chosen by the human. Note that, in our case scenarios, the robot reacts to the human&#x2019;s action since they perform the task sequentially and the human starts.</p>
</sec>
</sec>
<sec id="s5-2">
<title>5.2 State-of-the-Art Utility Function</title>
<p>In state-of-the-art techniques, there is no optimization of the task. This is equivalent to always considering <inline-formula id="inf46">
<mml:math id="m58">
<mml:mo>&#x2200;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mspace width="0.3333em" class="nbsp"/>
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula> in our approach (in <xref ref-type="disp-formula" rid="e12">Eq. 12</xref>). For each iteration (each agent chooses an action with a utility), we can represent the task with the tree structure of <xref ref-type="fig" rid="F4">Figure&#x20;4A</xref>. In the rest of the article, we will refer to this case scenario as&#x20;<italic>C</italic>
<sub>1</sub>.</p>
<p>In this case, using NE, the robot&#x2019;s reaction to the human action will be as follows: <italic>A</italic>
<sub>
<italic>r</italic>,<italic>g</italic>
</sub> if the human chose <italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub> or <italic>A</italic>
<sub>
<italic>h</italic>,<italic>w</italic>
</sub>, and <italic>A</italic>
<sub>
<italic>r</italic>,<italic>c</italic>
</sub> if the human chose&#x20;<italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>.</p>
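This reaction can be checked with a small sketch: fixing all rewards to 1 (the state-of-the-art scenario) and taking the robot's best response under the utility of Eq. 12 with the progression constraint of Table 2 reproduces the rule above. The names and timing values (60 s robot action, 80 s iteration) are illustrative assumptions of ours.

```python
# Robot's NE reaction in scenario C1 (all rewards R = 1): choose the
# action with the highest utility U = (1/t_r) * G * 1 * t, given the
# human's action. Labels: g = good, w = wait, b = bad, c = correct.

def G(robot_action, human_action):
    """Progression constraint of Table 2 for the robot's actions."""
    if robot_action == "w":
        return 0
    if robot_action == "g":
        return 1 if human_action != "b" else -1
    return 1 if human_action == "b" else -1   # robot_action == "c"

def robot_reaction(human_action, t_r=60.0, t=80.0):
    utilities = {a: (1.0 / t_r) * G(a, human_action) * 1 * t
                 for a in ("g", "w", "c")}
    return max(utilities, key=utilities.get)

reaction = {h: robot_reaction(h) for h in ("g", "w", "b")}
```

For this choice of values, the computed reaction maps both a good human move and a wait to the robot's good action, and a human error to the correction, matching the text.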
</sec>
<sec id="s5-3">
<title>5.3 Real Experiments with Time Metric</title>
<p>We conducted tests<xref ref-type="fn" rid="FN3">
<sup>3</sup>
</xref> with a group of 20 volunteers. The objectives were to prove that the framework is applicable for a real task and to check human adaptation to the&#x20;robot.</p>
<sec id="s5-3-1">
<title>5.3.1 Experiment Procedure</title>
<p>After explaining the game rules to the participants, we asked them to complete two puzzles to make sure they understood the gameplay. Afterward, we asked each participant to complete three puzzles, chosen randomly among five, by collaborating with the Nao Robot.</p>
<p>The participant began the game; then it was Nao&#x2019;s turn, and so on until the puzzle was done. At each turn, the participant had 20&#xa0;s (<inline-formula id="inf47">
<mml:math id="m59">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>) to make an action or to decide to skip their turn. Nao took on average 60&#xa0;s <inline-formula id="inf48">
<mml:math id="m60">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> to perform an action. It skipped its turn when the human was doing well and corrected the human when they made an error. Nao did not move the cubes on its own (for human safety); instead, it showed and told the human which cube should be moved and where by pointing at it. <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> illustrates, as an example, the steps of solving puzzle two by a participant and&#x20;Nao.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Example of the steps of solving puzzle two by a participant and Nao. <bold>(A)</bold> The human puts a cube in a wrong position. <bold>(B)</bold> Nao asks them to remove that cube. <bold>(C)</bold> The human puts a cube in a correct position, and the robot does nothing. <bold>(D)</bold> The human puts another cube in a correct position, and the puzzle is solved.</p>
</caption>
<graphic xlink:href="frobt-08-736644-g003.tif"/>
</fig>
</sec>
<sec id="s5-3-2">
<title>5.3.2 Utility Function for Optimizing the Time</title>
<p>The reward values <xref ref-type="disp-formula" rid="e13">Eq. 13</xref> in the utility function <xref ref-type="disp-formula" rid="e12">Eq. 12</xref> optimize the time metric by penalizing the action taken by the robot (the slower agent, i.e.,&#x20;<inline-formula id="inf49">
<mml:math id="m61">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>) if the human (the faster agent denoted by <italic>i</italic>&#x2032;) chooses the correct action (denoted by <italic>a</italic>&#x2032;). This penalization will prevent the robot from interfering with the human actions if the human makes the right decision:<disp-formula id="e13">
<mml:math id="m62">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if</mml:mtext>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mtext>&#x2009;and&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mtext>&#x2009;and&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mn>1</mml:mn>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>otherwise</mml:mtext>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(13)</label>
</disp-formula>Thus, for each iteration, we can represent the task with the tree structure of <xref ref-type="fig" rid="F4">Figure&#x20;4B</xref>. In the rest of the article, we will refer to this case scenario as <italic>C</italic>
<sub>2</sub>. In this case, using NE, the robot&#x2019;s reaction to the human action will be as follows: <italic>A</italic>
<sub>
<italic>r</italic>,<italic>w</italic>
</sub> if the human chose <italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub>, <italic>A</italic>
<sub>
<italic>r</italic>,<italic>g</italic>
</sub> if the human chose <italic>A</italic>
<sub>
<italic>h</italic>,<italic>w</italic>
</sub>, and <italic>A</italic>
<sub>
<italic>r</italic>,<italic>c</italic>
</sub> if the human chose&#x20;<italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>.</p>
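<p>As an illustration, the reward rule of Eq. 13 can be sketched in Python (a minimal sketch; the function name, argument layout, and encoding of the goal values are our own, not taken from the authors' implementation):</p>

```python
def reward_c2(goal_i, goal_other, t_i, t_other):
    """Reward of agent i's action under C2 (cf. Eq. 13).

    goal_i     -- goal value of agent i's action (> 0 if it advances the task)
    goal_other -- goal value of the other agent's action (1 if correct)
    t_i        -- time agent i needs for its action
    t_other    -- time the other agent needs for its action
    """
    # Penalize the slower agent for acting when the faster agent
    # has already chosen the correct action.
    if goal_i > 0 and goal_other == 1 and t_other < t_i:
        return -1
    return 1
```

<p>For example, with the measured times of the experiment (human 20&#xa0;s, robot 60&#xa0;s), the robot's good action receives a reward of &#x2212;1 whenever the human chooses the correct action, which drives the robot to skip its turn.</p>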
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Tree representation of the task based on the utility function in <italic>C</italic>
<sub>1</sub> and <italic>C</italic>
<sub>2</sub>. Notice that the difference between the two panels is the utility value of the action <italic>A</italic>
<sub>
<italic>r</italic>,<italic>g</italic>
</sub> of the robot (1.33 and &#x2212;1.33). This is because <italic>C</italic>
<sub>1</sub> (unlike <italic>C</italic>
<sub>2</sub>) does not minimize the time, so the robot continues to make an action even though it is slower than the well-performing human. <bold>(A)</bold> This tree is obtained by simulating an iteration of the task without optimization (<italic>C</italic>
<sub>1</sub>). The utilities (first for human agent and second for robot in green ellipses) are calculated for <inline-formula id="inf50">
<mml:math id="m63">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>20</mml:mn>
<mml:mspace width="0.3333em" class="nbsp"/>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>60</mml:mn>
<mml:mspace width="0.3333em" class="nbsp"/>
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula> and <italic>t</italic>&#x20;&#x3d; 80&#x20;s. <bold>(B)</bold> This tree is obtained by simulating an iteration of the task optimized by the time metric (<italic>C</italic>
<sub>2</sub>). The utilities (first for human agent and second for robot in green ellipses) are calculated for <inline-formula id="inf51">
<mml:math id="m64">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>20</mml:mn>
<mml:mspace width="0.3333em" class="nbsp"/>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>60</mml:mn>
<mml:mspace width="0.3333em" class="nbsp"/>
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula> and <italic>t</italic>&#x20;&#x3d; 80&#x20;s.</p>
</caption>
<graphic xlink:href="frobt-08-736644-g004.tif"/>
</fig>
</sec>
<sec id="s5-3-3">
<title>5.3.3 Results</title>
<p>Experiments with humans (presented in <xref ref-type="sec" rid="s5-3-1">Section 5.3.1</xref>) were those where the robot used the utility function optimizing the time metric (case 2&#x20;(<italic>C</italic>
<sub>2</sub>)). It was very difficult to have enough participants to also test the case where the robot does not optimize any metric (the state-of-the-art case (<italic>C</italic>
<sub>1</sub>)). The only change in the procedure of the experiments using <italic>C</italic>
<sub>1</sub> would be that even if the human is performing well, the robot would not skip its turn (<italic>A</italic>
<sub>
<italic>r</italic>,<italic>w</italic>
</sub>) but would perform the good action (<italic>A</italic>
<sub>
<italic>r</italic>,<italic>g</italic>
</sub>). Hence, to compare the results of our technique with those of the state-of-the-art techniques, we assumed that human actions remain the same in case <italic>C</italic>
<sub>1</sub> as in the case <italic>C</italic>
<sub>2</sub>, and we merely changed the robot reactions.</p>
<p>We chose to keep human actions unchanged between the two cases to ensure that only the switching of the utility function (<italic>C</italic>
<sub>2</sub> to <italic>C</italic>
<sub>1</sub>) affects the robot reaction, and not changes in human behavior. <xref ref-type="table" rid="T3">Table&#x20;3</xref> provides an example of a scenario for solving puzzle two with <italic>C</italic>
<sub>2</sub> and <italic>C</italic>
<sub>1</sub> (<xref ref-type="fig" rid="F3">Figure&#x20;3</xref>). We also calculated in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref> the average time and the standard deviation of the measured times among the experiments (<italic>C</italic>
<sub>2</sub>) and the deduced times&#x20;(<italic>C</italic>
<sub>1</sub>).</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>The adaptation of time calculation from <italic>C</italic>
<sub>2</sub> to <italic>C</italic>
<sub>1</sub> for the resolution of one scenario of puzzle&#x20;two.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="left"/>
<th align="center">Iteration 1</th>
<th align="center">Iteration 2</th>
<th align="center">Iteration 3</th>
<th align="center">Total time (s)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3" align="left">
<italic>C</italic>
<sub>1</sub>
</td>
<td align="left">Human actions</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub> (20s)</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub> (20s)</td>
<td align="left"/>
<td rowspan="3" align="center">160</td>
</tr>
<tr>
<td align="left">Robot reactions</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>r</italic>,<italic>c</italic>
</sub> (60s)</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>r</italic>,<italic>g</italic>
</sub> (60s)</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Iteration time</td>
<td align="center">80&#xa0;s</td>
<td align="center">80&#xa0;s</td>
<td align="left"/>
</tr>
<tr>
<td rowspan="3" align="left">
<italic>C</italic>
<sub>2</sub>
</td>
<td align="left">Human actions</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub> (20s)</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub> (20s)</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub> (20s)</td>
<td rowspan="3" align="center">140</td>
</tr>
<tr>
<td align="left">Robot reactions</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>r</italic>,<italic>c</italic>
</sub> (60s)</td>
<td align="center">
<italic>A</italic>
<sub>
<italic>r</italic>,<italic>w</italic>
</sub> (20s)</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Iteration time</td>
<td align="center">80&#xa0;s</td>
<td align="center">40&#xa0;s</td>
<td align="center">20&#xa0;s</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>The average time and the standard deviation in seconds of the time taken to do the task with the state-of-the-art utility function (<italic>C</italic>
<sub>1</sub>) and the utility function used to optimize the time (<italic>C</italic>
<sub>2</sub>), which is our contribution.</p>
</caption>
<graphic xlink:href="frobt-08-736644-g005.tif"/>
</fig>
<p>In <italic>C</italic>
<sub>2</sub>, we assumed that if the human does the good action once, they will continue to do it each time. We notice from <xref ref-type="fig" rid="F5">Figure&#x20;5</xref> that <italic>C</italic>
<sub>1</sub> works better when the human is not &#x201c;intelligent&#x201d;, i.e.,&#x20;they make many errors. That is why the standard deviation values using <italic>C</italic>
<sub>2</sub> are larger than those using <italic>C</italic>
<sub>1</sub>. This is the case for the last three puzzles, where the average time using <italic>C</italic>
<sub>2</sub> is larger than that using <italic>C</italic>
<sub>1</sub>. For the first puzzle, however, the average time using <italic>C</italic>
<sub>2</sub> is smaller than that using <italic>C</italic>
<sub>1</sub>, but the standard deviation values using <italic>C</italic>
<sub>2</sub> are larger than those using <italic>C</italic>
<sub>1</sub>. The standard deviation values of this puzzle (using <italic>C</italic>
<sub>1</sub> and <italic>C</italic>
<sub>2</sub>) are the largest among all puzzles presented in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>. Large standard deviation values mean that this puzzle was harder to solve for some participants and easier for others. That is why the average time and the standard deviation values using <italic>C</italic>
<sub>2</sub> and <italic>C</italic>
<sub>1</sub> do not have the same&#x20;trend.</p>
<p>Conversely, <italic>C</italic>
<sub>2</sub> performs better when the human is &#x201c;intelligent&#x201d;. Therefore, the time taken to accomplish the task depends on human &#x201c;intelligence&#x201d;, which is related to the probability of human errors and to the ratio between the times each agent takes to do an action. Without taking these two additional metrics into account, we cannot guarantee that the time to completion is minimized when the human makes many mistakes.</p>
<p>In the next case (<italic>C</italic>
<sub>3</sub>), we present a third utility function that takes into account the time the agents take to make an action and optimizes the time to completion by encouraging the human agent to reduce the number of errors. Each metric has the same weight <italic>&#x3f5;</italic> &#x3d; 1 (<xref ref-type="disp-formula" rid="e7">Eq. 7</xref>) since all these metrics are compatible, meaning that optimizing one metric depends on optimizing the others.</p>
</sec>
</sec>
<sec id="s5-4">
<title>5.4 Simulated Experiments with Time and Number of Human Errors Metrics</title>
<p>We use case (<italic>C</italic>
<sub>3</sub>) to show that our framework can handle changes in the performance metrics from one case scenario to another. In this case (<italic>C</italic>
<sub>3</sub>), we select between <italic>C</italic>
<sub>1</sub> and <italic>C</italic>
<sub>2</sub> the case that minimizes the total time by considering the probability of human errors and the ratio between the time each agent takes to make an action. The difference between <italic>C</italic>
<sub>1</sub> and <italic>C</italic>
<sub>2</sub> lies in the robot reaction when the human agent makes the good action (<italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub>). With <italic>C</italic>
<sub>1</sub>, the robot makes the good action (<italic>A</italic>
<sub>
<italic>r</italic>,<italic>g</italic>
</sub>), while with <italic>C</italic>
<sub>2</sub>, the robot decides to wait (<italic>A</italic>
<sub>
<italic>r</italic>,<italic>w</italic>
</sub>) so as not to slow down the human. <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> presents an algorithmic block diagram showing which case the robot will choose to make an action.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>
<italic>C</italic>
<sub>3</sub> algorithm block diagram.</p>
</caption>
<graphic xlink:href="frobt-08-736644-g006.tif"/>
</fig>
<sec id="s5-4-1">
<title>5.4.1 Assumptions on Humans</title>
<p>We did not have enough participants to run real tests, so we ran simulated tests instead. For this, we simulated the human decision process as a probability distribution over the set of feasible actions such that: <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>1</sub>, <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>w</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>2</sub>, and <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>3</sub> &#x3d; 1&#x20;&#x2212; (<italic>I</italic>
<sub>1</sub> &#x2b; <italic>I</italic>
<sub>2</sub>). <italic>I</italic>
<sub>1</sub>, <italic>I</italic>
<sub>2</sub>, and <italic>I</italic>
<sub>3</sub> vary from one participant to another, with 0 &#x3c; <italic>I</italic>
<sub>1</sub> &#x2b; <italic>I</italic>
<sub>2</sub> &#x2264;&#x20;1.</p>
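<p>The simulated human decision process can be sketched as follows (a minimal sketch; the function name is our own, and the returned labels simply mirror the subscripts of the three human actions):</p>

```python
import random

def sample_human_action(i1, i2, rng=random):
    """Draw one simulated human action; the labels mirror the subscripts
    of A_{h,g}, A_{h,w}, A_{h,b}, drawn with probabilities
    P = I1, I2, and 1 - (I1 + I2) respectively."""
    assert 0 < i1 + i2 <= 1
    u = rng.random()
    if u < i1:
        return "g"
    if u < i1 + i2:
        return "w"
    return "b"
```

<p>Repeatedly calling this function with the per-participant values of <italic>I</italic><sub>1</sub> and <italic>I</italic><sub>2</sub> reproduces the assumed probability distribution of human actions over many iterations.</p>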
</sec>
<sec id="s5-4-2">
<title>5.4.2 Utility Function for Optimizing the Time to Completion While Considering the Probability of Human Errors</title>
<p>Compared to <xref ref-type="disp-formula" rid="e12">Eq. 12</xref>, only the reward values (<inline-formula id="inf52">
<mml:math id="m65">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>) change. The reward values of the utility function for <italic>C</italic>
<sub>3</sub> are calculated by the following function:<disp-formula id="e14">
<mml:math id="m66">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="matrix">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mtd>
<mml:mtd columnalign="center">
<mml:mtext>if</mml:mtext>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mtext>&#x2009;and&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mtext>&#x2009;and&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mn>1</mml:mn>
</mml:mtd>
<mml:mtd columnalign="center">
<mml:mtext>otherwise</mml:mtext>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(14)</label>
</disp-formula>where <inline-formula id="inf53">
<mml:math id="m67">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> determines which case (1 or 2) best optimizes the total time (cf. <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>) and thus reduces the number of human errors. Hence, if <inline-formula id="inf54">
<mml:math id="m68">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is true, <italic>C</italic>
<sub>2</sub> will be faster than <italic>C</italic>
<sub>1</sub>, and vice versa. <italic>t</italic>
<sub>
<italic>C</italic>
</sub> is the generic equation for calculating time payoffs <inline-formula id="inf55">
<mml:math id="m69">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> and <inline-formula id="inf56">
<mml:math id="m70">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> <xref ref-type="disp-formula" rid="e15">Eq. 15</xref>. It considers the probability that the human agent will perform each feasible action (<italic>P</italic>(<italic>A</italic>
<sub>
<italic>i</italic>&#x2032;</sub>) &#x3d; probability distribution of human actions), which we assume to be known, and the time that each agent takes to make an action. <inline-formula id="inf57">
<mml:math id="m71">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is the time required for the other agent <italic>i</italic>&#x2032; (i.e.,&#x20;human) to make the chosen action <italic>a</italic>&#x2032; and <inline-formula id="inf58">
<mml:math id="m72">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is the time taken by the agent <italic>i</italic> (i.e.,&#x20;robot) to react by making the action <italic>a</italic>. <italic>N</italic>
<sub>
<italic>c</italic>
</sub> is the number of cubes correctly placed by taking actions <italic>a</italic> and <italic>a</italic>&#x2032;.</p>
<p>
<italic>C</italic>
<sub>2</sub> did not always perform well because it assumed that if the human does the good action once, they will continue to do it every time. That is why, in <xref ref-type="disp-formula" rid="e13">Eq. 13</xref>, the comparison of the times (<inline-formula id="inf59">
<mml:math id="m73">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>) did not include the probability of the human actions (including the probability of making errors). When the human often performs the bad action (e.g., <italic>I</italic>
<sub>3</sub> &#x2265; 0.6), the robot is encouraged not to wait but to perform the good action (<italic>C</italic>
<sub>1</sub>), despite its slowness. This is done to reduce the number of iterations and thus reduce the number of times the human will make a mistake, as they will have fewer turns to play (i.e.,&#x20;reducing the number of human errors). That is why in <italic>C</italic>
<sub>3</sub>, we consider the probability distribution of human actions, including that of doing the bad action (committing errors) while calculating <italic>t</italic>
<sub>
<italic>C</italic>
</sub> (cf. <xref ref-type="disp-formula" rid="e14">Eq. 14</xref>). The robot chooses <italic>C</italic>
<sub>1</sub> if the human is likely to make many errors and <italic>C</italic>
<sub>2</sub> in the opposite case.<disp-formula id="e15">
<mml:math id="m74">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(15)</label>
</disp-formula>
</p>
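<p>Eq. 15 and the selection rule of Figure 6 can be sketched in Python (a minimal sketch under our own data layout; the probabilities, times, and <italic>N</italic><sub><italic>c</italic></sub> values in the usage note below are illustrative, not measured):</p>

```python
def expected_time_per_cube(p_human, t_human, t_robot, n_correct):
    """Generic t_C of Eq. 15: expected iteration time divided by the
    expected number of correctly placed cubes.  All lists are indexed
    by the human action a':
      p_human[a']   -- P(A_{i',a'}), probability of the human action
      t_human[a']   -- t_{A_{i',a'}}, time of the human action
      t_robot[a']   -- t_{A_{i,a}}, time of the robot reaction in this case
      n_correct[a'] -- N_c, cubes correctly placed by the pair (a, a')
    """
    num = sum(p * (th + tr) for p, th, tr in zip(p_human, t_human, t_robot))
    den = sum(p * n for p, n in zip(p_human, n_correct))
    return num / den

def choose_case(p_human, t_human, t_robot_c1, t_robot_c2, n_c1, n_c2):
    """Pick C2 when t_C2 < t_C1, and C1 otherwise (cf. Figure 6)."""
    t_c1 = expected_time_per_cube(p_human, t_human, t_robot_c1, n_c1)
    t_c2 = expected_time_per_cube(p_human, t_human, t_robot_c2, n_c2)
    return "C2" if t_c2 < t_c1 else "C1"
```

<p>With an error-prone human, few cubes are placed per iteration under <italic>C</italic><sub>2</sub>, so its expected time per cube grows and the robot falls back to <italic>C</italic><sub>1</sub>; a mostly correct human makes waiting (<italic>C</italic><sub>2</sub>) the faster option.</p>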
</sec>
<sec id="s5-4-3">
<title>5.4.3 Simulation Conditions</title>
<p>A simulated test depends on:<list list-type="simple">
<list-item>
<p>&#x2022; The values of <italic>I</italic>
<sub>1</sub> and <italic>I</italic>
<sub>2</sub> (we tested for <italic>I</italic>
<sub>1</sub> &#x3d; (0 : 0.1 : 1) and <italic>I</italic>
<sub>2</sub> &#x3d; (0 : 0.1 : 1) except for <italic>I</italic>
<sub>1</sub> &#x3d; <italic>I</italic>
<sub>2</sub> &#x3d;&#x20;0).</p>
</list-item>
<list-item>
<p>&#x2022; The ratio between <inline-formula id="inf60">
<mml:math id="m75">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> and <inline-formula id="inf61">
<mml:math id="m76">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> (we tested for 1/1, 1/1.5, 1/2, 1/3, 1/4,&#x20;1/5).</p>
</list-item>
<list-item>
<p>&#x2022; The number of cubes required to solve the puzzle (we tested for 2, 3, 4, and&#x20;5).</p>
</list-item>
<list-item>
<p>&#x2022; The number of simulations (10000) we conducted to calculate the average time and the standard deviation.</p>
</list-item>
</list>
</p>
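<p>The simulation protocol above can be sketched as a Monte Carlo loop (a self-contained sketch under strongly simplified dynamics: we assume a good human action places one cube, a bad one is undone by the robot's correction, and a skipped human turn is followed by the robot placing a cube; none of these constants or rules come from the authors' simulator):</p>

```python
import random
import statistics

def simulate_once(i1, i2, n_cubes, t_h, t_r, rng):
    """One simulated run of C2: the robot waits after a good human
    action and reacts (good action / correction) otherwise."""
    placed, total = 0, 0.0
    while placed < n_cubes:
        u = rng.random()
        total += t_h
        if u < i1:                     # human good action
            placed += 1                # robot waits (negligible time)
        elif u < i1 + i2:              # human skips their turn
            total += t_r; placed += 1  # robot places a cube itself
        else:                          # human bad action
            total += t_r               # robot asks to remove it
    return total

def average_time(i1, i2, n_cubes=4, t_h=20, t_r=60, runs=1000, seed=0):
    """Average total time and standard deviation over many runs."""
    rng = random.Random(seed)
    times = [simulate_once(i1, i2, n_cubes, t_h, t_r, rng)
             for _ in range(runs)]
    return statistics.mean(times), statistics.stdev(times)
```

<p>Sweeping <italic>I</italic><sub>1</sub>, <italic>I</italic><sub>2</sub>, the time ratio, and the number of cubes over the grids listed above then yields one average time and standard deviation per configuration.</p>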
</sec>
<sec id="s5-4-4">
<title>5.4.4 Simulation Results</title>
<p>We illustrate the efficiency of our utility function <italic>C</italic>
<sub>3</sub> by showing the improvement in time to completion and the reduction of the number of human errors obtained while solving the puzzles.</p>
<sec id="s5-4-4-1">
<title>5.4.4.1 Time Improvement</title>
<p>We validate the efficiency of our utility function <italic>C</italic>
<sub>3</sub> by comparing the resulting average total times with those of similar cases using <italic>C</italic>
<sub>1</sub> over 10000 simulations<xref ref-type="fn" rid="FN4">
<sup>4</sup>
</xref>. As in the real experiments, we assumed that human actions remain constant and changed only the robot actions.</p>
<p>We calculate the time improvement <xref ref-type="disp-formula" rid="e16">Eq. 16</xref> by comparing the average total times <inline-formula id="inf62">
<mml:math id="m77">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> calculated using the utility function of <italic>C</italic>
<sub>3</sub> to the average total times <inline-formula id="inf63">
<mml:math id="m78">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> calculated using the state-of-the-art utility function (i.e.,&#x20;<italic>C</italic>
<sub>1</sub>). This is illustrated in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref> for a 4-cube puzzle with a ratio <inline-formula id="inf64">
<mml:math id="m79">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>5</mml:mn>
</mml:math>
</inline-formula>. As can be observed, the experiment times improve by up to 66.7<italic>%</italic>. Another example is given in <xref ref-type="sec" rid="s13">Supplementary Material</xref> for a 3-cube puzzle with a ratio <inline-formula id="inf65">
<mml:math id="m80">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>3</mml:mn>
</mml:math>
</inline-formula>. The percentage of time improvement depends on how &#x201c;intelligent&#x201d; the human participant is.<disp-formula id="e16">
<mml:math id="m81">
<mml:mtext>Percentage&#x2009;of&#x2009;time&#x2009;improvement</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#x2217;</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mn>100</mml:mn>
</mml:math>
<label>(16)</label>
</disp-formula>Theoretically, however, this percentage can reach a value close to 100<italic>%</italic> when the human takes a very short time (which leads to a very small <inline-formula id="inf66">
<mml:math id="m82">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>) and the robot takes a very long time (which leads to a very large <inline-formula id="inf67">
<mml:math id="m83">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>). Note that a time improvement percentage equal to 0 signifies that we are using <italic>C</italic>
<sub>1</sub>, while utilizing <italic>C</italic>
<sub>2</sub> increases the time improvement percentage. This means that, in the worst-case scenario, our method is as efficient as its state-of-the-art&#x20;peers.</p>
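Eq. 16 can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' code; the function and argument names (`percentage_time_improvement`, `mean_duration_c1`, `mean_duration_c3`, standing for the average experiment durations under cases C1 and C3) are ours.

```python
def percentage_time_improvement(mean_duration_c1: float,
                                mean_duration_c3: float) -> float:
    """Eq. 16: 100 * (mean duration under C1 - mean duration under C3)
    divided by the mean duration under C1."""
    return (mean_duration_c1 - mean_duration_c3) / mean_duration_c1 * 100.0

# Example: if C1 needs 90 s on average and C3 only 30 s,
# the improvement is (90 - 30) / 90 * 100, i.e., about 66.7%.
print(round(percentage_time_improvement(90.0, 30.0), 1))
```

A value of 0 corresponds to the worst case discussed above, where the proposed utility function performs exactly like the state-of-the-art one.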
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Percentage of time improvement between <italic>C</italic>
<sub>3</sub> and <italic>C</italic>
<sub>1</sub> for a 4-cube puzzle. <inline-formula id="inf68">
<mml:math id="m84">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>15,0,15</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf69">
<mml:math id="m85">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>75,0,75</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, so the ratio <inline-formula id="inf70">
<mml:math id="m86">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>5</mml:mn>
</mml:math>
</inline-formula>. <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>1</sub>, <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>w</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>2</sub>, and <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>3</sub> &#x3d; 1 &#x2212; (<italic>I</italic>
<sub>1</sub> &#x2b; <italic>I</italic>
<sub>2</sub>). In this figure, each dotted line corresponds to a specific <italic>I</italic>
<sub>1</sub> value, and each dot to an <italic>I</italic>
<sub>2</sub> value (read on the x-axis). For each dot, knowing <italic>I</italic>
<sub>1</sub> and <italic>I</italic>
<sub>2</sub>, we can deduce its <italic>I</italic>
<sub>3</sub> value using <italic>I</italic>
<sub>3</sub> &#x3d; 1 &#x2212; (<italic>I</italic>
<sub>1</sub> &#x2b; <italic>I</italic>
<sub>2</sub>). As an illustration, we give the <italic>I</italic>
<sub>1</sub>, <italic>I</italic>
<sub>2</sub>, and <italic>I</italic>
<sub>3</sub> values of the dot marked in the figure.</p>
</caption>
<graphic xlink:href="frobt-08-736644-g007.tif"/>
</fig>
</sec>
<sec id="s5-4-4-2">
<title>5.4.4.2 Reduction of the Number of Human Errors</title>
<p>To reduce the time to completion, we consider the probability of human errors in <xref ref-type="disp-formula" rid="e15">Eq. 15</xref>. We then choose, between <italic>C</italic>
<sub>1</sub> and <italic>C</italic>
<sub>2</sub>, the case that minimizes the time by reducing the number of iterations needed to solve the puzzle, i.e., the case that reduces the number of human errors, as explained in <xref ref-type="sec" rid="s5-4-2">Section 5.4.2</xref>. We calculate, in <xref ref-type="disp-formula" rid="e17">Eq. 17</xref>, the percentage of human errors reduction (PHER) using the difference between the predicted probability of human errors <italic>I</italic>
<sub>3</sub> and the measured probability of human errors, averaged over the 10,000 simulations, <inline-formula id="inf71">
<mml:math id="m87">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>.<disp-formula id="e17">
<mml:math id="m88">
<mml:mtext>Percentage&#x2009;of&#x2009;human&#x2009;errors&#x2009;reduction</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="matrix">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mspace width="0.3333em" class="nbsp"/>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mspace width="0.3333em" class="nbsp"/>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#x2217;</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mn>100</mml:mn>
</mml:mtd>
<mml:mtd columnalign="center">
<mml:mtext>if</mml:mtext>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd columnalign="center">
<mml:mtext>if&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(17)</label>
</disp-formula>where <italic>I</italic>
<sub>3</sub> is the predicted probability that the human makes a wrong move (makes an error), <inline-formula id="inf72">
<mml:math id="m89">
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> the measured number of human errors, and <inline-formula id="inf73">
<mml:math id="m90">
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> the measured total number of human actions. So, <inline-formula id="inf74">
<mml:math id="m91">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula> will be the measured probability that the human makes an error in one simulation. The smaller <inline-formula id="inf75">
<mml:math id="m92">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>&#x304;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is, the larger the reduction in the number of human&#x20;errors.</p>
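Eq. 17 (PHER), including its zero-prediction branch, can be sketched as follows. This is an illustrative reimplementation under our own naming (`i3` for the predicted error probability I3, `mean_measured_error_rate` for the average of N_he/N_ha over the simulations), not the authors' code.

```python
def percentage_human_errors_reduction(i3: float,
                                      mean_measured_error_rate: float) -> float:
    """Eq. 17: PHER = (I3 - mean(N_he / N_ha)) / I3 * 100,
    defined as 0 when I3 = 0 (the human is predicted to never err)."""
    if i3 == 0:
        return 0.0
    return (i3 - mean_measured_error_rate) / i3 * 100.0

# Example: predicted error probability 0.3, measured average 0.15,
# so half of the predicted errors were avoided (PHER = 50%).
print(percentage_human_errors_reduction(0.3, 0.15))
```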
<p>The reduction percentage of the number of human errors increases with the reduction of <inline-formula id="inf76">
<mml:math id="m93">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> (the time the robot takes to make an action) and with the reduction of the number of cubes to be assembled to solve a puzzle. In other words, the human has fewer turns to play and thus fewer chances to make mistakes. The best result we obtained is presented in <xref ref-type="fig" rid="F8">Figure&#x20;8</xref> (for a 2-cube puzzle with <inline-formula id="inf77">
<mml:math id="m94">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>): the percentage of human errors reduction reaches up to 50.6<italic>%</italic>. The result can be even better when the robot performs an action faster than the human (<inline-formula id="inf78">
<mml:math id="m95">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>). Note that when <italic>I</italic>
<sub>3</sub> is equal to 0<italic>%</italic>, the percentage of human errors reduction is also equal to 0<italic>%</italic>: the human never makes errors, so there is nothing to improve. Another example is presented in <xref ref-type="sec" rid="s13">Supplementary Material</xref> for a 3-cube puzzle with a ratio <inline-formula id="inf79">
<mml:math id="m96">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>3</mml:mn>
</mml:math>
</inline-formula>.</p>
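The intuition that fewer human turns means fewer error opportunities can be illustrated with a minimal Monte Carlo sketch, assuming each human turn is an independent error with probability I3. This is a simplification of the paper's simulations; the function name `mean_human_errors` and its parameters are ours.

```python
import random

def mean_human_errors(i3: float, n_human_turns: int,
                      n_simulations: int = 10_000, seed: int = 0) -> float:
    """Average, over many simulated runs, of the total number of human
    errors when each of n_human_turns is an error with probability i3."""
    rng = random.Random(seed)
    total_errors = 0
    for _ in range(n_simulations):
        # Each human turn independently results in an error with prob. i3.
        total_errors += sum(rng.random() < i3 for _ in range(n_human_turns))
    return total_errors / n_simulations

# With i3 = 0.3, six human turns yield roughly 6 * 0.3 = 1.8 errors on
# average, versus roughly 0.6 when the robot's speed leaves the human
# only two turns: fewer turns, fewer errors.
print(mean_human_errors(0.3, 6), mean_human_errors(0.3, 2))
```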
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Percentage of human errors reduction between the predicted probability of human errors and the measured one for a 2-cube puzzle. <inline-formula id="inf80">
<mml:math id="m97">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>15,0,15</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf81">
<mml:math id="m98">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mn>15,0,15</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, so the ratio <inline-formula id="inf82">
<mml:math id="m99">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>. <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>g</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>1</sub>, <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>w</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>2</sub>, and <italic>P</italic>(<italic>A</italic>
<sub>
<italic>h</italic>,<italic>b</italic>
</sub>) &#x3d; <italic>I</italic>
<sub>3</sub> &#x3d; 1 &#x2212; (<italic>I</italic>
<sub>1</sub> &#x2b; <italic>I</italic>
<sub>2</sub>). In this figure, each dotted line corresponds to a specific <italic>I</italic>
<sub>1</sub> value, and each dot to an <italic>I</italic>
<sub>2</sub> value (read on the x-axis). For each dot, knowing <italic>I</italic>
<sub>1</sub> and <italic>I</italic>
<sub>2</sub>, we can deduce its <italic>I</italic>
<sub>3</sub> value using <italic>I</italic>
<sub>3</sub> &#x3d; 1 &#x2212; (<italic>I</italic>
<sub>1</sub> &#x2b; <italic>I</italic>
<sub>2</sub>).</p>
</caption>
<graphic xlink:href="frobt-08-736644-g008.tif"/>
</fig>
</sec>
</sec>
</sec>
</sec>
<sec id="s6">
<title>6 Conclusion and Future Work</title>
<p>We propose a new formalization of the decision-making process that allows the task to be performed and accomplished more efficiently. Our experiments show that this formalization can be applied to feasible tasks and optimizes the human-robot collaboration in terms of all defined metrics. They also show that we can switch between the three studied case scenarios by changing the performance metrics in the utility function (i.e.,&#x20;reward function) without changing the entire framework.</p>
<p>To validate this, experiments were carried out by simulating the task of solving the construction puzzle. They show that using our proposed utility function instead of the state-of-the-art one improves the experiment time by up to 66.7<italic>%</italic>, and hence improves the human-robot collaboration without extending the robot&#x2019;s abilities. Theoretically, this improvement can reach a value close to 100<italic>%</italic>. We also obtained a percentage of human errors reduction of up to 50.6<italic>%</italic> by considering the predicted probability that the human makes errors when optimizing the time to completion.</p>
<p>There are still some points to improve in future work. First, we want to add to the formalization a predictive function that estimates human behavior from a realistic database, which can be used in a reinforcement learning procedure. Second, in this paper the decision-making method and the strategy are fixed; we want to develop another formalization in which they are variable and dynamically adaptable to the&#x20;task.</p>
</sec>
</body>
<back>
<sec id="s7">
<title>Data Availability Statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: <ext-link ext-link-type="uri" xlink:href="https://github.com/MelodieDANIEL/Optimizing_Human_Robot_Collaboration_Frontiers">https://github.com/MelodieDANIEL/Optimizing_Human_Robot_Collaboration_Frontiers</ext-link>.</p>
</sec>
<sec id="s8">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by the ethics committee of the Clermont-Auvergne University under the number IRB00011540-2020-48. The patients/participants provided their written informed consent to participate in this study.</p>
</sec>
<sec id="s9">
<title>Author Contributions</title>
<p>MH designed and implemented the core algorithm presented in the paper and carried out the experiments on the Nao humanoid robot. SL, JC, and YM contributed to the presented ideas and to the review of the final manuscript.</p>
</sec>
<sec id="s10">
<title>Funding</title>
<p>This work has received funding from the Auvergne-Rh&#xf4;ne-Alpes Region through the ATTRIHUM project and from the European Union&#x2019;s Horizon 2020 research and innovation programme under grant agreement No 869855 (Project &#x201c;SoftManBot&#x201d;).</p>
</sec>
<sec sec-type="COI-statement" id="s11">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s12">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>We would like to thank the European Union&#x2019;s Horizon 2020 research and innovation programme under grant agreement No 869855 (Project &#x201c;SoftManBot&#x201d;) for funding this work. We would like to thank Sayed Mohammadreza Shetab Bushehri and Miguel Aranda for providing feedback and English editing on the previous versions of this manuscript and for giving us valuable advice.</p>
</ack>
<sec id="s13">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frobt.2021.736644/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frobt.2021.736644/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="Image3.JPEG" id="SM1" mimetype="image/jpeg" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image2.JPEG" id="SM2" mimetype="image/jpeg" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet1.pdf" id="SM3" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image1.jpg" id="SM4" mimetype="image/jpeg" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="FN1">
<label>1</label>
<p>Camelot Jr. is a game created by Smart Games: <ext-link ext-link-type="uri" xlink:href="https://www.smartgames.eu/uk/one-player-games/camelot-jr">https://www.smartgames.eu/uk/one-player-games/camelot-jr</ext-link>.</p>
</fn>
<fn id="FN2">
<label>2</label>
<p>We denote functions by lower case letters in bold, sets and subsets between braces with upper case letters in bold, indexes by lower case letters, parameters by upper case letters, and vectors (i.e.,&#x20;profiles) by letters in bold topped by an arrow between parenthesis.</p>
</fn>
<fn id="FN3">
<label>3</label>
<p>The experiment protocol was approved by the ethics committee of the Clermont-Auvergne University under the number: IRB00011540-2020-48.</p>
</fn>
<fn id="FN4">
<label>4</label>
<p>All the results are presented on <ext-link ext-link-type="uri" xlink:href="https://github.com/MelodieDANIEL/Optimizing_Human_Robot_Collaboration_Frontiers">https://github.com/MelodieDANIEL/Optimizing_Human_Robot_Collaboration_Frontiers</ext-link>.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ajoudani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Zanchettin</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Ivaldi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Albu-Sch&#xe4;ffer</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kosuge</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Khatib</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Progress and Prospects of the Human-Robot Collaboration</article-title>. <source>Auton. Robot</source> <volume>42</volume>, <fpage>957</fpage>&#x2013;<lpage>975</lpage>. <pub-id pub-id-type="doi">10.1007/s10514-017-9677-2</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bansal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Howard</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Isbell</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <source>A Bayesian Framework for Nash Equilibrium Inference in Human-Robot Parallel Play</source>. <publisher-loc>Corvalis, OR</publisher-loc>: <publisher-name>arXiv</publisher-name>. <comment>preprint arXiv:2006.05729</comment>. </citation>
</ref>
<ref id="B3">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>B&#xfc;tepage</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kragic</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Human-robot Collaboration: From&#x20;Psychology to Social Robotics</source>. <publisher-name>arXiv</publisher-name>. <comment>preprint arXiv:1705.10146</comment>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nikolaidis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Soh</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Srinivasa</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Trust-Aware Decision Making for Human-Robot Collaboration</article-title>. <source>J.&#x20;Hum. Robot Interact.</source> <volume>9</volume>, <fpage>1</fpage>&#x2013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1145/3359616</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clabaugh</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Mahajan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Jain</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pakkar</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Becerra</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Long-term Personalization of an in-home Socially Assistive Robot for Children with Autism Spectrum Disorders</article-title>. <source>Front. Robot. AI.</source> <volume>6</volume>, <fpage>110</fpage>. <pub-id pub-id-type="doi">10.3389/frobt.2019.00110</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Conitzer</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Sandholm</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2006</year>). &#x201c;<article-title>Computing the Optimal Strategy to Commit to</article-title>,&#x201d; in <conf-name>Proceedings of the 7th ACM conference on Electronic commerce</conf-name>, <conf-loc>Ann Arbor, MI</conf-loc>, <conf-date>June 11&#x2013;15</conf-date>, <fpage>82</fpage>&#x2013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1145/1134707.1134717</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Delleman</surname>
<given-names>N. J.</given-names>
</name>
<name>
<surname>Dul</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>International Standards on Working Postures and Movements ISO 11226 and EN 1005-4</article-title>. <source>Ergonomics</source> <volume>50</volume>, <fpage>1809</fpage>&#x2013;<lpage>1819</lpage>. <pub-id pub-id-type="doi">10.1080/00140130701674430</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>DelPreto</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Rus</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Sharing the Load: Human-Robot Team Lifting Using Muscle Activity</article-title>,&#x201d; in <conf-name>2019 International Conference on Robotics and Automation</conf-name>, <conf-loc>Montreal, QC</conf-loc>, <conf-date>May 20&#x2013;24</conf-date> (<publisher-name>ICRA</publisher-name>), <fpage>7906</fpage>&#x2013;<lpage>7912</lpage>. <pub-id pub-id-type="doi">10.1109/ICRA.2019.8794414</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Durantin</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Heath</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wiles</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Social Moments: A Perspective on Interaction for Social Robotics</article-title>. <source>Front. Robot. AI.</source> <volume>4</volume>, <fpage>24</fpage>. <pub-id pub-id-type="doi">10.3389/frobt.2017.00024</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Fishman</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Paxton</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Ratliff</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Boots</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Collaborative Interaction Models for Optimized Human-Robot Teamwork</article-title>, &#x201d;in <conf-name>2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</conf-name>, <conf-loc>Las Vegas, NV</conf-loc>, <conf-date>October 24&#x2013;January 24</conf-date>, <fpage>11221</fpage>&#x2013;<lpage>11228</lpage>. <comment>
<italic>preprint arXiv:1910.04339</italic>
</comment>. </citation>
</ref>
<ref id="B11">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Flad</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Otten</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Schwab</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hohmann</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Steering Driver Assistance System: A Systematic Cooperative Shared Control Design Approach</article-title>,&#x201d; in <conf-name>2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (IEEE)</conf-name>, <conf-loc>San Diego, CA</conf-loc>, <conf-date>October 5&#x2013;8</conf-date>, <fpage>3585</fpage>&#x2013;<lpage>3592</lpage>. <pub-id pub-id-type="doi">10.1109/smc.2014.6974486</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>F&#xfc;l&#xf6;p</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2005</year>). &#x201c;<article-title>Introduction to Decision Making Methods</article-title>,&#x201d; in <source>BDEI-3 Workshop</source>. <publisher-loc>Washington</publisher-loc>, <fpage>1</fpage>&#x2013;<lpage>15</lpage>. </citation>
</ref>
<ref id="B13">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Gabler</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Stahl</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Huber</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Oguz</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Wollherr</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>A Game-Theoretic Approach for Adaptive Action Selection in Close Proximity Human-Robot-Collaboration</article-title>,&#x201d; in <conf-name>In 2017 IEEE International Conference on Robotics and Automation (ICRA)</conf-name>, <conf-loc>Singapore</conf-loc>, <conf-date>May 29&#x2013;June 3</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>2897</fpage>&#x2013;<lpage>2903</lpage>. <pub-id pub-id-type="doi">10.1109/icra.2017.7989336</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gervasi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Mastrogiacomo</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Franceschini</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Conceptual Framework to Evaluate Human-Robot Collaboration</article-title>. <source>Int. J.&#x20;Adv. Manuf. Technol.</source> <volume>108</volume>, <fpage>841</fpage>&#x2013;<lpage>865</lpage>. <pub-id pub-id-type="doi">10.1007/s00170-020-05363-1</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ghadirzadeh</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>B&#xfc;tepage</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Maki</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kragic</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bj&#xf6;rkman</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>A Sensorimotor Reinforcement Learning Framework for Physical Human-Robot Interaction</article-title>,&#x201d; in <conf-name>2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</conf-name>, <conf-loc>Daejeon, Korea</conf-loc>, <conf-date>October 9&#x2013;14</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>2682</fpage>&#x2013;<lpage>2688</lpage>. <pub-id pub-id-type="doi">10.1109/iros.2016.7759417</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffman</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Evaluating Fluency in Human-Robot Collaboration</article-title>. <source>IEEE Trans. Human-mach. Syst.</source> <volume>49</volume>, <fpage>209</fpage>&#x2013;<lpage>218</lpage>. <pub-id pub-id-type="doi">10.1109/thms.2019.2904558</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Hosseini</surname>
<given-names>S. M. F.</given-names>
</name>
<name>
<surname>Lettinga</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Vasey</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Jeon</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>C. H.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). &#x201c;<article-title>Both &#x201c;Look and Feel&#x201d; Matter: Essential Factors for Robotic Companionship</article-title>,&#x201d; in <conf-name>2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN)</conf-name>, <conf-loc>Lisbon, Portugal</conf-loc>, <conf-date>August 28&#x2013;September 1</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>150</fpage>&#x2013;<lpage>155</lpage>. <pub-id pub-id-type="doi">10.1109/roman.2017.8172294</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jarrass&#xe9;</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Charalambous</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Burdet</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A Framework to Describe, Analyze and Generate Interactive Motor Behaviors</article-title>. <source>PloS one</source> <volume>7</volume>, <fpage>e49945</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0049945</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kwon</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bucquet</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sadigh</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Influencing Leading and Following in Human-Robot Teams</article-title>,&#x201d; in <conf-name>Proceedings of Robotics: Science and Systems</conf-name>, <conf-loc>Freiburg im Breisgau, Germany</conf-loc>, <conf-date>June 22&#x2013;26</conf-date>. <pub-id pub-id-type="doi">10.15607/rss.2019.xv.075</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leyton-Brown</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Shoham</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Essentials of Game Theory: A Concise Multidisciplinary Introduction</article-title>. <source>Synth. Lectures Artif. Intelligence Machine Learn.</source> <volume>2</volume>, <fpage>1</fpage>&#x2013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.2200/s00108ed1v01y200802aim003</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Oyler</surname>
<given-names>D. W.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Yildiz</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kolmanovsky</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Girard</surname>
<given-names>A. R.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Game Theoretic Modeling of Driver and Vehicle Interactions for Verification and Validation of Autonomous Vehicle Control Systems</article-title>. <source>IEEE Trans. Control Syst. Technol.</source> <volume>26</volume>, <fpage>1782</fpage>&#x2013;<lpage>1797</lpage>. </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Carboni</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Gonzalez</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Campolo</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Burdet</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Differential Game Theory for Versatile Physical Human-Robot Interaction</article-title>. <source>Nat. Mach. Intell.</source> <volume>1</volume>, <fpage>36</fpage>&#x2013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1038/s42256-018-0010-3</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Pham</surname>
<given-names>D. T.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Human-robot Collaborative Manufacturing Using Cooperative Game: Framework and Implementation</article-title>. <source>Proced. CIRP.</source> <volume>72</volume>, <fpage>87</fpage>&#x2013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1016/j.procir.2018.03.172</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Malik</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Bilberg</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Complexity-based Task Allocation in Human-Robot Collaborative Assembly</article-title>. <source>Ind. Robot: Int. J.&#x20;robotics Res. Appl.</source> <volume>46</volume>, <fpage>471</fpage>&#x2013;<lpage>480</lpage>. <pub-id pub-id-type="doi">10.1108/ir-11-2018-0231</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maurtua</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Ibarguren</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kildal</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Susperregi</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sierra</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Human&#x2013;robot Collaboration in Industrial Applications: Safety, Interaction and Trust</article-title>. <source>Int. J.&#x20;Adv. Robotic Syst.</source> <volume>14</volume>, <fpage>1729881417716010</fpage>. <pub-id pub-id-type="doi">10.1177/1729881417716010</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Na</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cole</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Game Theoretic Modelling of a Human Driver&#x2019;s Steering Interaction with Vehicle Active Steering Collision Avoidance System</article-title>. <source>IEEE Trans. Human-mach. Syst.</source> <volume>45</volume>, <fpage>25</fpage>&#x2013;<lpage>38</lpage>. </citation>
</ref>
<ref id="B27">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Nachum</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Norouzi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Schuurmans</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Bridging the gap between Value and Policy Based Reinforcement Learning</article-title>,&#x201d; in <conf-name>31st Conference on Neural Information Processing Systems (NIPS 2017)</conf-name>, <conf-loc>Long Beach, CA</conf-loc>. </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Negulescu</surname>
<given-names>O.-H.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Using a Decision Making Process Model in Strategic Management</article-title>. <source>Rev. Gen. Manage.</source> <volume>19</volume>. </citation>
</ref>
<ref id="B29">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Nelles</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kwee-Meier</surname>
<given-names>S. T.</given-names>
</name>
<name>
<surname>Mertens</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Evaluation Metrics Regarding Human Well-Being and System Performance in Human-Robot Interaction - A Literature Review</article-title>,&#x201d; in <conf-name>Congress of the International Ergonomics Association</conf-name>, <conf-loc>Florence, Italy</conf-loc>, <conf-date>August 26&#x2013;30</conf-date> (<publisher-name>Springer</publisher-name>), <fpage>124</fpage>&#x2013;<lpage>135</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-96068-5_14</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nikolaidis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Srinivasa</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017a</year>). <article-title>Human-robot Mutual Adaptation in Collaborative Tasks: Models and Experiments</article-title>. <source>Int. J.&#x20;Robotics Res.</source> <volume>36</volume>, <fpage>618</fpage>&#x2013;<lpage>634</lpage>. <pub-id pub-id-type="doi">10.1177/0278364917690593</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Nikolaidis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nath</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Procaccia</surname>
<given-names>A. D.</given-names>
</name>
<name>
<surname>Srinivasa</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017b</year>). &#x201c;<article-title>Game-theoretic Modeling of Human Adaptation in Human-Robot Collaboration</article-title>,&#x201d; in <conf-name>Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction - HRI &#x2019;17</conf-name>, <conf-loc>Vienna, Austria</conf-loc>, <conf-date>March 6&#x2013;9</conf-date> (<publisher-name>ACM Press</publisher-name>). <pub-id pub-id-type="doi">10.1145/2909824.3020253</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nocentini</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Fiorini</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Acerbi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Sorrentino</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mancioppi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Cavallo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Survey of Behavioral Models for Social Robots</article-title>. <source>Robotics</source> <volume>8</volume>, <fpage>54</fpage>. <pub-id pub-id-type="doi">10.3390/robotics8030054</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oliff</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ryan</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Reinforcement Learning for Facilitating Human-Robot-Interaction in Manufacturing</article-title>. <source>J.&#x20;Manufacturing Syst.</source> <volume>56</volume>, <fpage>326</fpage>&#x2013;<lpage>340</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmsy.2020.06.018</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Reinhardt</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pereira</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Beckert</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bengler</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Dominance and Movement Cues of Robot Motion: A User Study on Trust and Predictability</article-title>,&#x201d; in <conf-name>2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC)</conf-name>, <conf-loc>Banff, AB</conf-loc>, <conf-date>October 5&#x2013;8</conf-date> (<publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/smc.2017.8122825</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Rosenberg-Kima</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Koren</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yachini</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Human-Robot-Collaboration (HRC): Social Robots as Teaching Assistants for Training Activities in Small Groups</article-title>,&#x201d; in <conf-name>2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI)</conf-name>, <conf-loc>Daegu, South Korea</conf-loc>, <conf-date>March 11&#x2013;14</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>522</fpage>&#x2013;<lpage>523</lpage>. <pub-id pub-id-type="doi">10.1109/hri.2019.8673103</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roveda</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Magni</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cantoni</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Piga</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bucca</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Human-robot Collaboration in Sensorless Assembly Task Learning Enhanced by Uncertainties Adaptation via Bayesian Optimization</article-title>. <source>Robotics Autonomous Syst.</source> <volume>136</volume>, <fpage>103711</fpage>. <pub-id pub-id-type="doi">10.1016/j.robot.2020.103711</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Seel</surname>
<given-names>N. M.</given-names>
</name>
</person-group> (<year>2012</year>). <source>Encyclopedia of the Sciences of Learning</source> (<publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Springer US</publisher-name>). <pub-id pub-id-type="doi">10.1007/978-1-4419-1428-6</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharkawy</surname>
<given-names>A.-N.</given-names>
</name>
<name>
<surname>Papakonstantinou</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Papakostopoulos</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Moulianitis</surname>
<given-names>V. C.</given-names>
</name>
<name>
<surname>Aspragathos</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Task Location for High Performance Human-Robot Collaboration</article-title>. <source>J.&#x20;Intell. Robotic Syst.</source> <volume>100</volume>, <fpage>1</fpage>&#x2013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1007/s10846-020-01181-5</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Steinfeld</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fong</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kaber</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Scholtz</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Schultz</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2006</year>). &#x201c;<article-title>Common Metrics for Human-Robot Interaction</article-title>,&#x201d; in <conf-name>Proceedings of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction</conf-name>, <conf-loc>Salt Lake City, UT</conf-loc>, <conf-date>March 2&#x2013;3</conf-date>, <fpage>33</fpage>&#x2013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1145/1121241.1121249</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Tabrez</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hayes</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Improving Human-Robot Interaction through Explainable Reinforcement Learning</article-title>,&#x201d; in <conf-name>2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI)</conf-name>, <conf-loc>Daegu, South Korea</conf-loc>, <conf-date>March 11&#x2013;14</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>751</fpage>&#x2013;<lpage>753</lpage>. <pub-id pub-id-type="doi">10.1109/hri.2019.8673198</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tanevska</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Rea</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Sandini</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Ca&#xf1;amero</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sciutti</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Socially Adaptable Framework for Human-Robot Interaction</article-title>. <source>Front. Robot. AI</source> <volume>7</volume>, <fpage>121</fpage>. <pub-id pub-id-type="doi">10.3389/frobt.2020.00121</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wagner-Hartl</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Gehring</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kopp</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Link</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Machill</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Pottin</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zitz</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gunser</surname>
<given-names>V. E.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Who Would Let a Robot Take Care of Them? - Gender and Age Differences</article-title>,&#x201d; in <conf-name>International Conference on Human-Computer Interaction</conf-name>, <conf-loc>Copenhagen, Denmark</conf-loc>, <conf-date>July 19&#x2013;24</conf-date> (<publisher-name>Springer</publisher-name>), <fpage>196</fpage>&#x2013;<lpage>202</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-50726-8_26</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weitschat</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Aschemann</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Safe and Efficient Human-Robot Collaboration Part II: Optimal Generalized Human-In-The-Loop Real-Time Motion Generation</article-title>. <source>IEEE Robot. Autom. Lett.</source> <volume>3</volume>, <fpage>3781</fpage>&#x2013;<lpage>3788</lpage>. <pub-id pub-id-type="doi">10.1109/lra.2018.2856531</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Pham</surname>
<given-names>D. T.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Disassembly Sequence Planning Using Discrete Bees Algorithm for Human-Robot Collaboration in Remanufacturing</article-title>. <source>Robotics and Computer-Integrated Manufacturing</source> <volume>62</volume>, <fpage>101860</fpage>. <pub-id pub-id-type="doi">10.1016/j.rcim.2019.101860</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>