<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="brief-report" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Pharmacol.</journal-id>
<journal-title>Frontiers in Pharmacology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Pharmacol.</abbrev-journal-title>
<issn pub-type="epub">1663-9812</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1094281</article-id>
<article-id pub-id-type="doi">10.3389/fphar.2022.1094281</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Pharmacology</subject>
<subj-group>
<subject>Perspective</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Reinforcement learning as an innovative model-based approach: Examples from precision dosing, digital health and computational psychiatry</article-title>
<alt-title alt-title-type="left-running-head">Ribba</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fphar.2022.1094281">10.3389/fphar.2022.1094281</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Ribba</surname>
<given-names>Benjamin</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2032337/overview"/>
</contrib>
</contrib-group>
<aff>
<institution>Roche Pharma Research and Early Development (pRED)</institution>, <institution>F. Hoffmann-La Roche Ltd</institution>, <addr-line>Basel</addr-line>, <country>Switzerland</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1062253/overview">Zinnia P. Parra-Guillen</ext-link>, University of Navarra, Spain</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/2093143/overview">Nadia Terranova</ext-link>, Merck, Switzerland</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Benjamin Ribba, <email>benjamin.ribba@roche.com</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Translational Pharmacology, a section of the journal Frontiers in Pharmacology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>17</day>
<month>02</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>1094281</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>11</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>12</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Ribba.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Ribba</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Model-based approaches are instrumental for successful drug development and use. Anchored in pharmacological principles, they contribute, through mathematical modeling, to the quantification of drug response variability and enable precision dosing. Reinforcement learning (RL), a set of computational methods addressing optimization problems as a continuous learning process, shows relevance for precision dosing, with high flexibility for dosing rule adaptation and for coping with high-dimensional efficacy and/or safety markers, making it a relevant approach for taking advantage of data from digital health technologies. RL can also support the successful development of digital health applications, recognized as key players in future healthcare systems, in particular for reducing the burden of non-communicable diseases on society. RL is also pivotal in computational psychiatry, a way to characterize mental dysfunctions in terms of aberrant brain computations, and represents an innovative modeling approach for psychiatric indications such as depression or substance abuse disorders, for which digital therapeutics are foreseen as promising modalities.</p>
</abstract>
<kwd-group>
<kwd>pharmacometrics</kwd>
<kwd>digital health</kwd>
<kwd>reinforcement learning</kwd>
<kwd>precision dosing</kwd>
<kwd>computational psychiatry</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Reinforcement learning for precision dosing</title>
<p>Precision dosing, or the ability to identify and deliver the right dose and schedule (i.e. the dose and schedule with the highest likelihood of maximizing efficacy and minimizing toxicity), is critical for public health and society. Precision dosing is important not only for marketed drugs, to reduce the consequences of imprecise dosing in terms of costs and adverse events, but also for therapeutics in development, to reduce attrition, which is often related to the challenge of precisely characterizing the therapeutic window due to a suboptimal understanding of drug-response variability. Achieving the societal benefit of precision dosing requires the identification of the main drivers of response variability, as early as possible in the drug development process, and deployment into clinical practice through an infrastructure designed for real-time dosing decisions in patients (<xref ref-type="bibr" rid="B20">Maxfield and Zineh, 2021</xref>; <xref ref-type="bibr" rid="B24">Peck, 2021</xref>).</p>
<p>Model-based approaches to clinical pharmacology, also known as clinical pharmacometrics (PMX), play a critical role in precision dosing. First, they contribute to the identification of the determinants of response variability through quantitative analysis of pharmacokinetic (PK) and pharmacodynamic (PD) relationships. Second, they constitute a central part of the infrastructure by providing a simulation engine that predicts an individual patient&#x2019;s response to a dose, and from which optimal dosing is identified through reverse engineering. Often this reverse engineering comprises two steps: first, the individual parameters of the PMX model are calculated through Bayesian inference, i.e. through the calculation of the mode of the posterior distribution (maximum a posteriori or MAP); second, an optimal dosing schedule is calculated, often <italic>via</italic> a heuristic approach that simulates various feasible dosing scenarios on the inferred individual model instances.</p>
<p>Many examples exist in the literature describing relevant PKPD models for precision dosing. For instance, in oncology, a model describing the time course of neutrophils following chemotherapy treatment is an ideal candidate for optimizing chemotherapy delivery (see (<xref ref-type="bibr" rid="B7">Friberg et al., 2002</xref>) as an example). Studies have also reported clinical investigations of model-based precision dosing approaches. For instance, the clinical study &#x201c;MODEL1&#x201d; was a phase I/II trial that clinically tested a personalized dosing regimen of docetaxel and epirubicin in patients with metastatic breast cancer, and was shown to lead to an improved efficacy-toxicity balance (<xref ref-type="bibr" rid="B9">Henin et al., 2016</xref>).</p>
<p>Reinforcement learning (RL) has also been used for precision dosing. Still in oncology, Maier et al. extended the classical framework of model-driven precision dosing with RL, coupled or not with data assimilation techniques (<xref ref-type="bibr" rid="B19">Maier et al., 2021</xref>). Previously, RL applications, although without clinical confirmation, were developed for brain tumors (<xref ref-type="bibr" rid="B36">Yauney and Shah, 2018</xref>) based on a model of tumor size response to chemotherapy (<xref ref-type="bibr" rid="B29">Ribba et al., 2012</xref>). We have recently evaluated the performance of RL algorithms for precision dosing of propofol in general anesthesia, a setting for which a meta-analysis showed that monitoring the bispectral index (BIS), a PD endpoint, contributes to reducing the amount of propofol given and the incidence of adverse reactions (<xref ref-type="bibr" rid="B35">Wang et al., 2021</xref>). In (<xref ref-type="bibr" rid="B28">Ribba et al., 2022</xref>), we performed a theoretical analysis of propofol precision dosing, confronting RL with hallmarks of clinical pharmacology problems during drug development, i.e. the low number of patients and tested dosing regimens, the incomplete understanding of the drivers of response and the presence of high variability in the data.</p>
<p>While RL is not a universal solution for all types of precision dosing problems, it is an interesting modeling paradigm worth exploring. In comparison to the way PMX traditionally addresses precision dosing, RL presents several advantages. The first is the possibility of taking into account high-dimensional PKPD variables, while classical model-based approaches are often limited to a low number of variables (plasma concentration and one endpoint). In doing so, it represents an opportunity for the integration of digital health data, such as data from wearable devices or from digital health technologies in general. The second is the definition of the precision dosing policy in a dynamic and adaptable manner, through the continuous learning of the algorithm from real and simulated experience (data). RL is an approach by which both the underlying model and the optimal dosing rules are learnt simultaneously, while in classical approaches these represent two sequential steps: in other words, the consequence of the dose does not influence the model structure. Recently, studies have been published illustrating methodologies for adapting PKPD model structures through data assimilation (<xref ref-type="bibr" rid="B18">Lu et al., 2021</xref>; <xref ref-type="bibr" rid="B2">Bram et al., 2022</xref>). While high dosing frequency is not a prerequisite for the applicability of RL to precision dosing, this approach is well suited when the solution space of dosing is large, making heuristic approaches to finding optimal dosing solutions inadequate. In our example on propofol, dosing could happen every 5&#xa0;s, so over a short period of 2&#xa0;min, the space of solutions to explore when considering dichotomous dosing events is greater than 16 million possibilities.</p>
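<p>As a worked check of this order of magnitude, and assuming (for illustration only) one unconstrained dose-or-no-dose decision every 5&#xa0;s, a 2&#xa0;min window contains 24 decision times, so that the number of distinct dosing sequences is
<disp-formula>
<tex-math><![CDATA[\frac{120\ \mathrm{s}}{5\ \mathrm{s}} = 24 \quad\Longrightarrow\quad 2^{24} = 16\,777\,216 ,]]></tex-math>
</disp-formula>
i.e. slightly above 16 million candidate schedules, which is consistent with the figure quoted for the propofol example. Any clinical constraint on admissible sequences would reduce this count.</p>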
<p>RL is at the crossroads between two scientific fields: the field of learning by trial and error, which started with the study of the psychology of animal learning, and the field of optimal control (<xref ref-type="bibr" rid="B33">Sutton and Barto, 2018</xref>). RL problems are often formally described as a Markov decision process (MDP), which includes all the important features a learning agent should have, namely, being able to sense the environment, being able to take actions and having clarity on the goal. In RL, a learning agent takes an action and, as a result, transitions from one state to another. After each action taken, the interaction between the agent and its environment produces a reward. The goal of the RL problem is to map situations (states) to actions, i.e. to know which action to take in each state to maximize the accumulated reward. As long as the optimization problem can be formulated within the MDP framework, RL can be applied and its efficiency explored.</p>
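<p>As a generic sketch following standard textbook notation (<xref ref-type="bibr" rid="B33">Sutton and Barto, 2018</xref>), and not the formulation of any specific study discussed here, the accumulated reward from decision time <italic>t</italic> can be written as a discounted return, and the optimal policy selects, in each state, the action with the highest expected return:
<disp-formula>
<tex-math><![CDATA[G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad q_{*}(s,a) = \max_{\pi} \mathbb{E}_{\pi}\left[ G_t \mid S_t = s,\ A_t = a \right], \qquad \pi_{*}(s) = \arg\max_{a} q_{*}(s,a).]]></tex-math>
</disp-formula>
Here <italic>S</italic>, <italic>A</italic> and <italic>R</italic> denote state, action and reward, and the discount factor &#x3b3; (between 0 and 1) is a symbol introduced for illustration: it sets how strongly future rewards are weighted relative to immediate ones.</p>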
<p>For precision dosing of propofol, the state can be represented by a table, an approach also called tabular solution methods. In the next two sections, the state will be defined by a continuous function. The reward was determined based on the value reached by the BIS as a direct consequence of the action taken: the closer the BIS to the target, the higher the reward. Finally, given the theoretical nature of the study, the true PKPD model (linking the dose application to BIS) was used as an experience (data) generator. The left column of <xref ref-type="table" rid="T1">Table 1</xref> summarizes the characteristics of the application of RL to the propofol precision dosing problem.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Main characteristics of RL algorithm implementation for the precision dosing of pharmacological interventions (left column), the precision dosing of digital interventions (middle column), and computational psychiatry (right column). While there are multiple similarities between the precision dosing of pharmacological and digital interventions, the application of RL in computational psychiatry represents a paradigm shift. The RL computational machinery is not deployed as a technical approach to address the optimal control problem of precision dosing but is fitted to (cognitive task) data, assuming that the algorithm itself presents mechanistic similarities with how the participants&#x2019; brains functioned during the task.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Precision dosing of a pharmacological intervention</th>
<th align="center">Precision dosing of a digital intervention</th>
<th align="center">Computational psychiatry</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Study case [References]</td>
<td align="left">Optimal dosing of propofol administration (<xref ref-type="bibr" rid="B28">Ribba et al., 2022</xref>)</td>
<td align="left">Just-in-time-adaptive-intervention for HeartSteps, mobile app aimed at reducing physical inactivity (<xref ref-type="bibr" rid="B17">Liao et al., 2020</xref>)</td>
<td align="left">Population analysis of signal-detection task in anhedonic subjects (<xref ref-type="bibr" rid="B10">Huys et al., 2013</xref>)</td>
</tr>
<tr>
<td align="left">Type of RL solution</td>
<td align="center">Tabular</td>
<td colspan="2" align="center">Continuous</td>
</tr>
<tr>
<td rowspan="2" align="left">State</td>
<td colspan="2" align="center">
<italic>Is directly linked to the state of the patient</italic>
</td>
<td align="center">
<italic>Is linked to the situation the participant in the task is presented with and based on which an action must be taken</italic>
</td>
</tr>
<tr>
<td align="left">PK drivers and/or PD endpoint such as the BIS</td>
<td align="left">Contextual drivers (e.g. weather conditions, time of the day) and patient-related status derived from wearable device equipment</td>
<td align="left">Belief of the correctness (weight) of each stimulus present in the task</td>
</tr>
<tr>
<td align="left">Action</td>
<td align="left">Dose or not</td>
<td align="left">Dose (walking suggestion message) or not</td>
<td align="left">Participant&#x2019;s answer choice</td>
</tr>
<tr>
<td rowspan="2" align="left">Reward</td>
<td colspan="2" align="center">
<italic>Defined to enable the algorithm to converge to the optimal dosing solution</italic>
</td>
<td align="left">
<italic>Corresponds to whether the answer is correct or wrong</italic>
</td>
</tr>
<tr>
<td align="left">Simple function of BIS leading to high reward when actual BIS is close to its target</td>
<td align="left">Step count in the 30&#xa0;min window after each decision time</td>
<td align="left">Automatically derived from the answer as per task design and setup</td>
</tr>
<tr>
<td rowspan="2" align="left">Use of simulated experience?</td>
<td colspan="2" align="center">
<italic>Yes</italic>
</td>
<td align="center">
<italic>No</italic>
</td>
</tr>
<tr>
<td align="left">The true underlying PKPD model is used</td>
<td align="left">Linear model assimilating real data</td>
<td align="left">No need for simulated experience, RL algorithm is mapped to the trial-by-trial data</td>
</tr>
<tr>
<td align="left">Algorithm</td>
<td align="left">Temporal difference Q-learning</td>
<td align="left">Thompson sampling</td>
<td align="left">Temporal difference Q-learning</td>
</tr>
<tr>
<td rowspan="2" align="left">Free parameters</td>
<td colspan="2" align="center">
<italic>Used to calibrate the model of the patient&#x2019;s response to a dosing event</italic>
</td>
<td align="center">
<italic>Used to calibrate RL algorithm</italic>
</td>
</tr>
<tr>
<td align="left">Parameters of the PKPD model</td>
<td align="left">Parameters of the linear model for reward prediction under alternative dosing scenarios</td>
<td align="left">Learning rate and reward sensitivity parameter</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The minimal set of RL characteristics makes it a very flexible paradigm, suitable for a large variety of problems. Herein, we illustrate this flexibility by showing how this framework can be viewed as a bridge between a priori distinct areas such as precision dosing of pharmacological drugs, digital health and computational psychiatry.</p>
<p>In the appendix, we propose to demystify how RL algorithms&#x2014;such as temporal difference Q-learning, repeatedly mentioned here&#x2014;work, taking a simple illustration from video gaming.</p>
</sec>
<sec id="s2">
<title>2 Reinforcement learning in digital health</title>
<p>For several years, many reports have indicated the key importance of digital health for reducing the burden on society of non-communicable diseases such as cardiovascular disease, diabetes, cancer or psychiatric disorders, a burden that is in part due to the aging of the population and, paradoxically, to the success of pharmacological interventions in increasing the life expectancy of people affected by pathological conditions (<xref ref-type="bibr" rid="B6">Fleisch et al., 2021</xref>). Prevention and interventions targeting lifestyle are essential tools to address this societal challenge of ever-growing importance, as our healthcare systems risk collapse under cost pressure.</p>
<p>For 2008, it was estimated that physical inactivity caused 6% of the burden of coronary heart disease, 7% of type II diabetes, 10% of breast cancer and 10% of colon cancer and, overall, more than 5.3 million of the 57 million deaths which occurred that year (<xref ref-type="bibr" rid="B16">Lee et al., 2012</xref>). In that study, the authors also estimated that with a 25% reduction of physical inactivity, 1.3 million deaths could be averted every year. Given the constant increase of smartphone coverage worldwide, it is natural to think of mobile health technologies to support healthy lifestyle habits and prevention. The think tank Metaforum from KU Leuven dedicated its position paper 17 to the use of wearables and mobile technologies for collecting information on individual behavior and physical status, combined with data from the individual&#x2019;s environment, to personalize recommendations (interventions) that lead the subject to adopt a healthier lifestyle (<xref ref-type="bibr" rid="B3">Claes, 2022</xref>).</p>
<p>When an intervention intended to have a therapeutic benefit is associated with a demonstration of clinical effectiveness and approved by regulatory bodies, it falls in the field of digital therapeutics (<xref ref-type="bibr" rid="B34">Sverdlov et al., 2018</xref>). This point of junction between digital health applications and pharmacological drugs represents a ground for attempting to reframe PMX, a recognized key player in the development of the latter, as a key support to the development of the former, in particular when it comes to precision dosing for digital health.</p>
<p>The precision dosing of digital therapeutics overlaps with the concept of just-in-time adaptive intervention or JITAI (<xref ref-type="bibr" rid="B22">Nahum-Shani et al., 2018</xref>). In the mobile technology literature, JITAI has been primarily considered as a critical topic for increasing adherence and retention of users; but within a therapeutic perspective, it should encompass both the topic of adherence and retention to the therapeutic modality and the topic of its optimal dosing in order to maximize clinical benefit. For clarity, these two different learning problems should be distinguished, as many existing applications focus primarily on the first one. For example, a growing number of mobile applications developed under the concept of virtual coaching aim to optimize the design of the interventions (time and content, e.g. messages sent by the app to the users in the form of a prompt appearing on a locked screen) to incite the user to take action. HeartSteps was designed to encourage users to increase their physical activity; in this app, content delivery, such as tailored walking suggestion messages, is optimized with an RL algorithm (<xref ref-type="bibr" rid="B17">Liao et al., 2020</xref>). Here, RL is used to address the first learning problem: how to deliver the content so that the user does what is recommended. We each need different forms of prompting and potentially different forms of exercise to increase our physical activity. Overall, this problem is similar to that of adherence to a pharmacological regimen. But a second problem is: what is the right dose of the desired intervention? In other words: how many steps are optimal for each patient? This is the usual precision dosing problem for drugs and there is a clear opportunity for digital health applications to extend the domain of application of JITAIs to that problem as well.</p>
<p>One of the particularly interesting aspects of the research on RL algorithms for HeartSteps is that, beyond the innovative nature of the work purely related to the design of personalized interventions, it also includes ways to objectively evaluate the efficiency of these interventions. An experimental design called micro-randomized trial (MRT) is proposed as a framework to evaluate the effectiveness of personalized <italic>versus</italic> non-personalized interventions (<xref ref-type="bibr" rid="B14">Klasnja et al., 2015</xref>; <xref ref-type="bibr" rid="B26">Qian et al., 2022</xref>). The principle of MRT is to randomize the interventions multiple times for each subject. Statistical approaches have been studied to leverage MRT-derived data in order to inform treatment effects and the response variability (<xref ref-type="bibr" rid="B25">Qian et al., 2020</xref>). In the theoretical propofol example described in the previous section, we used the true PKPD model to simulate experience. In the real-life RL application of HeartSteps, the authors aimed to design a method that learns quickly and accommodates noisy data (<xref ref-type="bibr" rid="B17">Liao et al., 2020</xref>). To address these points, the authors used a simulation engine, built with simple linear models, to enhance the data collected from real experience. Specifically, the authors modeled the difference in reward function under alternative dosing options with low-dimensional linear models, whose features were selected based on retrospective analysis of previous HeartSteps data and on experts&#x2019; guidance. The precision dosing problem was addressed using posterior sampling <italic>via</italic> Thompson sampling, identified as performant in balancing exploration and exploitation (<xref ref-type="bibr" rid="B31">Russo and Van Roy, 2014</xref>; <xref ref-type="bibr" rid="B30">Russo et al., 2018</xref>).
The definition of the state was based on several individual features, including contextual information or sensor data from wearable devices, while the reward was defined as the step count within 30&#xa0;min after the &#x201c;dosing&#x201d; event. The middle column of <xref ref-type="table" rid="T1">Table 1</xref> summarizes the main characteristics of the RL application to this problem.</p>
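<p>As a generic sketch of posterior (Thompson) sampling for a binary dosing decision, and not the specific algorithm implemented in HeartSteps, suppose (an assumption made here purely for illustration) that the expected gain in reward from dosing in state <italic>s</italic> is modeled linearly through a feature vector <italic>f</italic>(<italic>s</italic>) and a parameter vector &#x3b8;, with a Gaussian posterior over &#x3b8; given the data collected so far. Each decision then draws one parameter sample from the posterior and acts greedily with respect to that sample:
<disp-formula>
<tex-math><![CDATA[\tilde{\theta}_t \sim p(\theta \mid \mathcal{D}_t) = \mathcal{N}(\mu_t, \Sigma_t), \qquad a_t = \mathbf{1}\left\{ f(s_t)^{\top} \tilde{\theta}_t > 0 \right\}.]]></tex-math>
</disp-formula>
When the posterior is wide, the sign of the sampled effect varies from draw to draw, which produces exploration; as data accumulate and the posterior concentrates, the decisions become increasingly exploitative. All symbols in this sketch are hypothetical and introduced only for illustration.</p>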
</sec>
<sec id="s3">
<title>3 Reinforcement learning in computational psychiatry</title>
<p>Like mechanistic modelling, computational psychiatry refers to a systems approach aimed at integrating underlying pathophysiological processes. However, while mechanistic modelling efforts typically use multiscale biological processes as building blocks, some models that fall within the remit of computational psychiatry (such as RL) use different types of building blocks, and in particular brain cognitive processes.</p>
<p>Model-based approaches have shown relevance for addressing major challenges in neuroscience (see (<xref ref-type="bibr" rid="B4">Conrado et al., 2020</xref>) for an example for Alzheimer disease). Quantitative systems pharmacology and mechanistic-based multiscale modelling are, in particular, associated with major hopes while acknowledging significant challenges such as the lack of quantitative and validated biomarkers, the subjective nature of clinical endpoints and the high selectivity of drug candidates not reflecting the complex interactions of different brain circuits (<xref ref-type="bibr" rid="B8">Geerts et al., 2020</xref>; <xref ref-type="bibr" rid="B1">Bloomingdale et al., 2021</xref>). These challenges are equally valid for attempting to address psychiatric conditions. This can partly explain the efficiency of non-pharmacological interventions, such as targeted psychotherapy approaches, recognized as one of the most precise and powerful approaches (<xref ref-type="bibr" rid="B12">Insel and Cuthbert, 2015</xref>).</p>
<p>The efficiency of such interventions is a testimony to how the brain&#x2019;s intrinsic plasticity can alter neural circuits. Some (discursive) disease models, with a focus on systems dimensions, propose new perspectives in the understanding of such conditions. For instance, it has been reported that emotion-cognition interactions gone awry can lead to anxiety and depression, with anxious individuals displaying attentional bias toward threatening stimuli and having difficulty disengaging from them (<xref ref-type="bibr" rid="B5">Crocker et al., 2013</xref>). Further data-driven understanding at the systems level is key to increasing the likelihood of success of such non-pharmacological interventions, as is equally the case for research and development of pharmaceutical compounds (<xref ref-type="bibr" rid="B23">Pao and Nagel, 2022</xref>). Such data-driven understanding can be integrated in the design of relevant non-pharmacological interventions, some of which are known to be amenable to digital delivery through, for instance, digital therapeutics (<xref ref-type="bibr" rid="B13">Jacobson et al., 2022</xref>).</p>
<p>A precision medicine initiative, precision psychiatry, has been launched for psychiatric indications, such as major depression or substance abuse disorder, which constitute a major part of non-communicable diseases (<xref ref-type="bibr" rid="B12">Insel and Cuthbert, 2015</xref>). The core idea of precision psychiatry lies in reframing the diagnosis and care of affected subjects by moving away from a symptom-based to a data-driven categorization, through a focus on systems dimensions <italic>via</italic> integration of data from cognitive, affective and social neuroscience, overall shifting the way to characterize these conditions in terms of brain circuit (dys-)functioning. This concept materialized in the proposal of the Research Domain Criteria (RDoC) in 2010 (<xref ref-type="bibr" rid="B11">Insel et al., 2010</xref>) as a framework for research in the pathophysiology of psychiatric conditions.</p>
<p>Integrating data from cognitive, affective and social neuroscience into a multiscale modelling framework is an objective of computational psychiatry, defined as a way to characterize mental dysfunction in terms of aberrant computation in the brain (<xref ref-type="bibr" rid="B21">Montague et al., 2012</xref>). Not surprisingly, because it mimics human and animal learning processes, RL plays a key role in computational psychiatry. RL in computational psychiatry proposes to map brain functioning onto an algorithmic language, which then offers the possibility to explore, through simulations, the dysfunctioning of these processes as well as the theoretical benefit of interventional strategies. Two examples are developed further here, and readers can refer to (<xref ref-type="bibr" rid="B32">Seri&#xe8;s, 2020</xref>) for an overview of further computational psychiatry methods, models and study cases.</p>
<p>In an RL framework, actions by the learner are chosen according to their value function, which holds the expected accumulated reward. The value function is updated through experience, using the feedback from the environment on the action taken. This update is driven by a quantity called the temporal difference. An analogy has been drawn between this temporal difference and the reward prediction error signals carried by dopamine in decision-making. Temporal difference reinforcement learning algorithms learn by estimating a value function based on temporal differences. Learning stops as this difference converges to zero (see <xref ref-type="sec" rid="s10">Supplementary Material</xref> for further details). Such a framework can be used to reframe addiction as a decision-making process gone awry. Based on the observation that addictive drugs produce a transient increase in dopamine through neuropharmacological mechanisms, the proposed model assumes that an addictive drug produces a positive temporal difference independent of the value function, so that the action of taking the drug will always be preferred over other actions (<xref ref-type="bibr" rid="B27">Redish, 2004</xref>). This model provides a tool to explore the efficiency of public health strategies. For instance, the model proposes some hypotheses to explain the incomplete success of strategies based on offering money as an alternative to drug intake.</p>
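<p>For reference, the standard textbook form of the temporal difference Q-learning update (<xref ref-type="bibr" rid="B33">Sutton and Barto, 2018</xref>) can be written as follows, where the learning rate &#x3b1; and the discount factor &#x3b3; are symbols introduced here for illustration:
<disp-formula>
<tex-math><![CDATA[\delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t), \qquad Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\, \delta_t .]]></tex-math>
</disp-formula>
The value estimate no longer changes once the temporal difference &#x3b4; is zero on average. In this notation, and only as a schematic reading of the addiction model described above, a drug-induced contribution that keeps &#x3b4; positive regardless of the current value estimate means that the value attached to drug taking keeps increasing instead of settling.</p>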
<p>RL models are used for the analysis of data from cognitive tasks, and in particular tasks related to decision-making. Instead of focusing on the summary statistics of such tests (e.g. total number of errors), RL-based approaches allow for the integration of trial-by-trial data, similarly to what model-based approaches typically do with longitudinal data analysis to better decipher response variability <italic>via</italic> the characterization of PK and PD processes. In the same way, trial-by-trial data can be leveraged to estimate the parameters of RL-based models which, in turn, can be compared to clinical endpoints, such as measures of symptom severity, to disentangle the role of brain circuit mechanisms, overall contributing to a better understanding of response variability. RL for cognitive testing data in psychiatric populations is a complete paradigm change with respect to its application for precision dosing problems. While, in the two previous examples, RL was used to solve the problem of optimal dosing, now the RL algorithm is mapped to neuro-cognitive processes. Quantitatively characterizing these processes for each patient (estimating parameters from RL algorithms) is proposed as a methodology for extracting relevant information towards disease characterization and thus, response variability.</p>
<p>In (<xref ref-type="bibr" rid="B10">Huys et al., 2013</xref>), the authors use RL models to analyse population data of a behavioural test (signal-detection task) to study aspects of anhedonia, a core symptom of depression, related to reward learning. The authors proposed an RL model based on a Q-learning update integrating two parameters: the classical learning rate and a parameter related to reward sensitivity, which modulates the fraction of the reward value that actually contributes to the update of the Q value function. By performing a correlation analysis of the inferred parameters with an anhedonic depression questionnaire, the authors found a negative correlation with the reward sensitivity but no correlation with the learning rate. Overall, these results led to the conclusion that the sensitivity to the reward, and not the learning rate, could be the main driver explaining why, in anhedonic individuals, reward has less impact than in non-anhedonic individuals. Unravelling these two mechanisms is important for the planning of successful digital, behavioural and pharmacological strategies. The right column in <xref ref-type="table" rid="T1">Table 1</xref> depicts the summary characteristics of RL applied to that study.</p>
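<p>To make the distinct roles of the two parameters concrete, one way to write such an update is given below. This is a sketch whose symbols (learning rate &#x3b5;, reward sensitivity &#x3c1;, reward <italic>r</italic>) are chosen here for illustration and do not necessarily match the exact parameterization used in the cited study:
<disp-formula>
<tex-math><![CDATA[Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \epsilon \left[ \rho\, r_t - Q_t(s_t, a_t) \right].]]></tex-math>
</disp-formula>
Under the simplifying assumptions of a constant reward <italic>r</italic>, an initial value of zero and a learning rate between 0 and 1, repeated application of this update gives
<disp-formula>
<tex-math><![CDATA[Q_n = \rho\, r \left[ 1 - (1 - \epsilon)^{n} \right],]]></tex-math>
</disp-formula>
so that, in this sketch, the reward sensitivity sets the level towards which the learned value converges, whereas the learning rate only sets how quickly that level is approached.</p>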
</sec>
<sec sec-type="conclusion" id="s4">
<title>4 Conclusion</title>
<p>In this perspective, we have illustrated the flexibility of the RL framework through the described applications in precision dosing, digital health and computational psychiatry, and have thereby shown the benefit for the modeling community of becoming familiar with these approaches. The converse is also true: the fields of precision digital therapeutics and computational psychiatry can benefit greatly from closer proximity to the PMX community.</p>
<p>First, PMX methods could make RL even better. The field of computational psychiatry could benefit from input from the PMX community on statistical aspects related to parameter inference and clinical endpoint modelling. For these two areas, PMX has adopted as its state of the art, respectively, the population approach (with powerful algorithms such as the stochastic approximation expectation-maximization algorithm (<xref ref-type="bibr" rid="B15">Lavielle, 2014</xref>)) and joint modelling.</p>
<p>Second, the field of digital health should benefit from one of the essential objectives of model-based drug development approaches, namely elucidating response variability. For the successful development of digital therapeutic interventions, it is particularly important to know how to characterize efficacy and safety profiles and how to develop personalization strategies based on this understanding. The fact that these interventions are digital should not prevent developers from prioritizing research into the underlying causal biological and (patho)physiological processes of response, which will always be a key factor in successful therapy development, whether pharmacological or not. <xref ref-type="fig" rid="F1">Figure 1</xref> illustrates these mutual benefits.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Illustration of the mutual benefits of increased permeability between model-based approaches to precision dosing and digital health, on one hand, and computational psychiatry on the other hand.</p>
</caption>
<graphic xlink:href="fphar-13-1094281-g001.tif"/>
</fig>
</sec>
<sec id="s5">
<title>5 Legend</title>
<p>
<xref ref-type="table" rid="T1">Table 1</xref>
<bold>:</bold> Main characteristics of RL algorithm implementation for the precision dosing of pharmacological interventions (left column), the precision dosing of digital interventions (middle column), and computational psychiatry (right column). While there are multiple similarities between the precision dosing of pharmacological and digital interventions, the application of RL in computational psychiatry represents a paradigm shift. The RL computational machinery is not deployed as a technical approach to address the optimal control problem of precision dosing but is fitted to (cognitive task) data, assuming that the algorithm itself presents mechanistic similarities with how the participants&#x2019; brains functioned during the task.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s10">Supplementary Material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7">
<title>Author contributions</title>
<p>BR: manuscript writing.</p>
</sec>
<ack>
<p>The author wishes to acknowledge Lucy Hutchinson, Richard Peck and Denis Engelmann for providing input on the draft manuscript.</p>
</ack>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of interest</title>
<p>The author is employed by F. Hoffmann-La Roche Ltd.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s10">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fphar.2022.1094281/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fphar.2022.1094281/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.docx" id="SM1" mimetype="application/docx" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bloomingdale</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Karelina</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Cirit</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Muldoon</surname>
<given-names>S. F.</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>McCarty</surname>
<given-names>W. J.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Quantitative systems pharmacology in neuroscience: Novel methodologies and technologies</article-title>. <source>CPT Pharmacometrics Syst. Pharmacol.</source> <volume>10</volume> (<issue>5</issue>), <fpage>412</fpage>&#x2013;<lpage>419</lpage>. <pub-id pub-id-type="doi">10.1002/psp4.12607</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bram</surname>
<given-names>D. S.</given-names>
</name>
<name>
<surname>Parrott</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hutchinson</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Steiert</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Introduction of an artificial neural network-based method for concentration-time predictions</article-title>. <source>CPT Pharmacometrics Syst. Pharmacol.</source> <volume>11</volume> (<issue>6</issue>), <fpage>745</fpage>&#x2013;<lpage>754</lpage>. <pub-id pub-id-type="doi">10.1002/psp4.12786</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Claes</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Mobile health revolution in healthcare: Are we ready?</article-title> <comment>Metaforum position paper 17 2019 [cited 2022 October 10]; Available at: <ext-link ext-link-type="uri" xlink:href="https://www.kuleuven.be/metaforum/visie-en-debatteksten/2019-mobile-health-revolution-in-healthcare">https://www.kuleuven.be/metaforum/visie-en-debatteksten/2019-mobile-health-revolution-in-healthcare</ext-link>
</comment>.</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Conrado</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Duvvuri</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Geerts</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Burton</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Biesdorf</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ahamadi</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Challenges in Alzheimer&#x27;s disease drug discovery and development: The role of modeling, simulation, and open data</article-title>. <source>Clin. Pharmacol. Ther.</source> <volume>107</volume> (<issue>4</issue>), <fpage>796</fpage>&#x2013;<lpage>805</lpage>. <pub-id pub-id-type="doi">10.1002/cpt.1782</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crocker</surname>
<given-names>L. D.</given-names>
</name>
<name>
<surname>Heller</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Warren</surname>
<given-names>S. L.</given-names>
</name>
<name>
<surname>O&#x27;Hare</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Infantolino</surname>
<given-names>Z. P.</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>G. A.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Relationships among cognition, emotion, and motivation: Implications for intervention and neuroplasticity in psychopathology</article-title>. <source>Front. Hum. Neurosci.</source> <volume>7</volume>, <fpage>261</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2013.00261</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Fleisch</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Franz</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Herrmann</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2021</year>). <source>The digital pill</source>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friberg</surname>
<given-names>L. E.</given-names>
</name>
<name>
<surname>Henningsson</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Maas</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Karlsson</surname>
<given-names>M. O.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Model of chemotherapy-induced myelosuppression with parameter consistency across drugs</article-title>. <source>J. Clin. Oncol.</source> <volume>20</volume> (<issue>24</issue>), <fpage>4713</fpage>&#x2013;<lpage>4721</lpage>. <pub-id pub-id-type="doi">10.1200/JCO.2002.02.140</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Geerts</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wikswo</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>van der Graaf</surname>
<given-names>P. H.</given-names>
</name>
<name>
<surname>Bai</surname>
<given-names>J. P. F.</given-names>
</name>
<name>
<surname>Gaiteri</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bennett</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Quantitative systems pharmacology for neuroscience drug discovery and development: Current status, opportunities, and challenges</article-title>. <source>CPT Pharmacometrics Syst. Pharmacol.</source> <volume>9</volume> (<issue>1</issue>), <fpage>5</fpage>&#x2013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1002/psp4.12478</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Henin</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Meille</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Barbolosi</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Guitton</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Iliadis</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Revisiting dosing regimen using PK/PD modeling: The MODEL1 phase I/II trial of docetaxel plus epirubicin in metastatic breast cancer patients</article-title>. <source>Breast Cancer Res. Treat.</source> <volume>156</volume> (<issue>2</issue>), <fpage>331</fpage>&#x2013;<lpage>341</lpage>. <pub-id pub-id-type="doi">10.1007/s10549-016-3760-9</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huys</surname>
<given-names>Q. J.</given-names>
</name>
<name>
<surname>Pizzagalli</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Bogdan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Dayan</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Mapping anhedonia onto reinforcement learning: A behavioural meta-analysis</article-title>. <source>Biol. Mood Anxiety Disord.</source> <volume>3</volume> (<issue>1</issue>), <fpage>12</fpage>. <pub-id pub-id-type="doi">10.1186/2045-5380-3-12</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Insel</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Cuthbert</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Garvey</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Heinssen</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Pine</surname>
<given-names>D. S.</given-names>
</name>
<name>
<surname>Quinn</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2010</year>). <article-title>Research domain criteria (RDoC): Toward a new classification framework for research on mental disorders</article-title>. <source>Am. J. Psychiatry</source> <volume>167</volume> (<issue>7</issue>), <fpage>748</fpage>&#x2013;<lpage>751</lpage>. <pub-id pub-id-type="doi">10.1176/appi.ajp.2010.09091379</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Insel</surname>
<given-names>T. R.</given-names>
</name>
<name>
<surname>Cuthbert</surname>
<given-names>B. N.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Medicine. Brain disorders? Precisely</article-title>. <source>Science</source> <volume>348</volume> (<issue>6234</issue>), <fpage>499</fpage>&#x2013;<lpage>500</lpage>. <pub-id pub-id-type="doi">10.1126/science.aab2358</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jacobson</surname>
<given-names>N. C.</given-names>
</name>
<name>
<surname>Kowatsch</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Marsch</surname>
<given-names>L. A.</given-names>
</name>
</person-group> (<year>2022</year>). <source>Digital therapeutics for mental health and addiction: The state of the science and vision for the future</source>. <publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>Academic Press</publisher-name>, <fpage>270</fpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Klasnja</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hekler</surname>
<given-names>E. B.</given-names>
</name>
<name>
<surname>Shiffman</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Boruvka</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Almirall</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Tewari</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>Microrandomized trials: An experimental design for developing just-in-time adaptive interventions</article-title>. <source>Health Psychol.</source> <volume>34S</volume>, <fpage>1220</fpage>&#x2013;<lpage>1228</lpage>. <pub-id pub-id-type="doi">10.1037/hea0000305</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lavielle</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Mixed effects models for the population approach: Models, tasks, methods and tools</source>. <edition>1st edition</edition>. <publisher-name>Chapman and Hall/CRC</publisher-name>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>I. M.</given-names>
</name>
<name>
<surname>Shiroma</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Lobelo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Puska</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Blair</surname>
<given-names>S. N.</given-names>
</name>
<name>
<surname>Katzmarzyk</surname>
<given-names>P. T.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Effect of physical inactivity on major non-communicable diseases worldwide: An analysis of burden of disease and life expectancy</article-title>. <source>Lancet</source> <volume>380</volume> (<issue>9838</issue>), <fpage>219</fpage>&#x2013;<lpage>229</lpage>. <pub-id pub-id-type="doi">10.1016/S0140-6736(12)61031-9</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liao</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Greenewald</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Klasnja</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity</article-title>. <source>Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.</source> <volume>4</volume> (<issue>1</issue>), <fpage>18</fpage>. <pub-id pub-id-type="doi">10.1145/3381007</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Neural-ODE for pharmacokinetics modeling and its advantage to alternative machine learning models in predicting new dosing regimens</article-title>. <source>iScience</source> <volume>24</volume> (<issue>7</issue>), <fpage>102804</fpage>. <pub-id pub-id-type="doi">10.1016/j.isci.2021.102804</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maier</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hartung</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Kloft</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Huisinga</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>de Wiljes</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Reinforcement learning and Bayesian data assimilation for model-informed precision dosing in oncology</article-title>. <source>CPT Pharmacometrics Syst. Pharmacol.</source> <volume>10</volume> (<issue>3</issue>), <fpage>241</fpage>&#x2013;<lpage>254</lpage>. <pub-id pub-id-type="doi">10.1002/psp4.12588</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maxfield</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zineh</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Precision dosing: A clinical and public health imperative</article-title>. <source>JAMA</source> <volume>325</volume> (<issue>15</issue>), <fpage>1505</fpage>&#x2013;<lpage>1506</lpage>. <pub-id pub-id-type="doi">10.1001/jama.2021.1004</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Montague</surname>
<given-names>P. R.</given-names>
</name>
<name>
<surname>Dolan</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Friston</surname>
<given-names>K. J.</given-names>
</name>
<name>
<surname>Dayan</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Computational psychiatry</article-title>. <source>Trends Cogn. Sci.</source> <volume>16</volume> (<issue>1</issue>), <fpage>72</fpage>&#x2013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2011.11.018</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nahum-Shani</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>S. N.</given-names>
</name>
<name>
<surname>Spring</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Collins</surname>
<given-names>L. M.</given-names>
</name>
<name>
<surname>Witkiewitz</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Tewari</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Just-in-Time adaptive interventions (JITAIs) in mobile health: Key components and design principles for ongoing health behavior support</article-title>. <source>Ann. Behav. Med.</source> <volume>52</volume> (<issue>6</issue>), <fpage>446</fpage>&#x2013;<lpage>462</lpage>. <pub-id pub-id-type="doi">10.1007/s12160-016-9830-8</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Nagel</surname>
<given-names>Y. A.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Paradigms for the development of transformative medicines-lessons from the EGFR story</article-title>. <source>Ann. Oncol.</source> <volume>33</volume> (<issue>5</issue>), <fpage>556</fpage>&#x2013;<lpage>560</lpage>. <pub-id pub-id-type="doi">10.1016/j.annonc.2022.02.005</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peck</surname>
<given-names>R. W.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Precision dosing: An industry perspective</article-title>. <source>Clin. Pharmacol. Ther.</source> <volume>109</volume> (<issue>1</issue>), <fpage>47</fpage>&#x2013;<lpage>50</lpage>.</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qian</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Klasnja</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>S. A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Linear mixed models with endogenous covariates: Modeling sequential treatment effects with application to a mobile health study</article-title>. <source>Stat. Sci.</source> <volume>35</volume> (<issue>3</issue>), <fpage>375</fpage>&#x2013;<lpage>390</lpage>. <pub-id pub-id-type="doi">10.1214/19-sts720</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qian</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Walton</surname>
<given-names>A. E.</given-names>
</name>
<name>
<surname>Collins</surname>
<given-names>L. M.</given-names>
</name>
<name>
<surname>Klasnja</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Lanza</surname>
<given-names>S. T.</given-names>
</name>
<name>
<surname>Nahum-Shani</surname>
<given-names>I.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>The microrandomized trial for developing digital interventions: Experimental design and data analysis considerations</article-title>. <source>Psychol. Methods</source> <volume>27</volume>, <fpage>874</fpage>&#x2013;<lpage>894</lpage>. <pub-id pub-id-type="doi">10.1037/met0000283</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Redish</surname>
<given-names>A. D.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Addiction as a computational process gone awry</article-title>. <source>Science</source> <volume>306</volume> (<issue>5703</issue>), <fpage>1944</fpage>&#x2013;<lpage>1947</lpage>. <pub-id pub-id-type="doi">10.1126/science.1102384</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ribba</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Model enhanced reinforcement learning to enable precision dosing: A theoretical case study with dosing of propofol</article-title>. <source>CPT Pharmacometrics Syst. Pharmacol.</source>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ribba</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kaloshi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Peyre</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ricard</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Calvez</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Tod</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>A tumor growth inhibition model for low-grade glioma treated with chemotherapy or radiotherapy</article-title>. <source>Clin. Cancer Res.</source> <volume>18</volume> (<issue>18</issue>), <fpage>5071</fpage>&#x2013;<lpage>5080</lpage>. <pub-id pub-id-type="doi">10.1158/1078-0432.CCR-12-0084</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Russo</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Van Roy</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kazerouni</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Osband</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A tutorial on Thompson sampling</article-title>. <source>Found. Trends&#xae; Mach. Learn.</source> <volume>11</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>96</lpage>. <pub-id pub-id-type="doi">10.1561/2200000070</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Russo</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Van Roy</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Learning to optimize via posterior sampling</article-title>. <source>Math. Operations Res.</source> <volume>39</volume> (<issue>4</issue>), <fpage>1221</fpage>&#x2013;<lpage>1243</lpage>. <pub-id pub-id-type="doi">10.1287/moor.2014.0650</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Seri&#xe8;s</surname>
<given-names>P. E.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Computational psychiatry</source>. <publisher-name>The MIT Press</publisher-name>.</citation>
</ref>
<ref id="B33">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sutton</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Barto</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Reinforcement learning: An introduction</source>. <edition>Second edition</edition>.</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sverdlov</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>van Dam</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hannesdottir</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Thornton-Wells</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Digital therapeutics: An integral component of digital innovation in drug development</article-title>. <source>Clin. Pharmacol. Ther.</source> <volume>104</volume> (<issue>1</issue>), <fpage>72</fpage>&#x2013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1002/cpt.1036</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Bispectral index monitoring of the clinical effects of propofol closed-loop target-controlled infusion: Systematic review and meta-analysis of randomized controlled trials</article-title>. <source>Med. Baltim.</source> <volume>100</volume> (<issue>4</issue>), <fpage>e23930</fpage>. <pub-id pub-id-type="doi">10.1097/MD.0000000000023930</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Yauney</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Reinforcement learning with action-derived rewards for chemotherapy and clinical trial dosing regimen selection</article-title>,&#x201d; in <conf-name>Proceedings of the 3rd Machine Learning for Healthcare Conference</conf-name> (<publisher-name>PMLR: Proceedings of Machine Learning Research</publisher-name>), <fpage>161</fpage>&#x2013;<lpage>226</lpage>. <comment>D.-V. Finale, et al., Editors.</comment>
</citation>
</ref>
</ref-list>
</back>
</article>