<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Oncol.</journal-id>
<journal-title>Frontiers in Oncology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Oncol.</abbrev-journal-title>
<issn pub-type="epub">2234-943X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fonc.2023.1124458</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Oncology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An integrated solution of deep reinforcement learning for automatic IMRT treatment planning in non-small-cell lung cancer</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Wang</surname>
<given-names>Hanlin</given-names>
</name>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2138932"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bai</surname>
<given-names>Xue</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/843341"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Yajuan</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lu</surname>
<given-names>Yanfei</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Binbing</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/843343"/>
</contrib>
</contrib-group>
<aff id="aff1">
<institution>Department of Radiation Physics, Zhejiang Key Laboratory of Radiation Oncology, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital)</institution>, <addr-line>Hangzhou, Zhejiang</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Tonghe Wang, Memorial Sloan Kettering Cancer Center, United States</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Piergiorgio Cerello, National Institute of Nuclear Physics of Turin, Italy; Chin-Shiuh Shieh, National Kaohsiung University of Science and Technology, Taiwan</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Hanlin Wang, <email xlink:href="mailto:wanghanlins@163.com">wanghanlins@163.com</email>
</p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Radiation Oncology, a section of the journal Frontiers in Oncology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>03</day>
<month>02</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>13</volume>
<elocation-id>1124458</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>01</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Wang, Bai, Wang, Lu and Wang</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Wang, Bai, Wang, Lu and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec>
<title>Purpose</title>
<p>To develop and evaluate an integrated solution for automatic intensity-modulated radiation therapy (IMRT) planning in non-small-cell lung cancer (NSCLC) cases.</p>
</sec>
<sec>
<title>Methods</title>
<p>A novel algorithm, named the multi-objectives adjustment policy network (MOAPN), was proposed and trained to learn how to adjust multiple optimization objectives in the commercial Eclipse treatment planning system (TPS), based on a multi-agent deep reinforcement learning (DRL) scheme. Furthermore, a three-dimensional (3D) dose prediction module was developed to generate patient-specific initial optimization objectives and thereby reduce the overall exploration space during MOAPN training. A total of 114 previously treated NSCLC cases suitable for stereotactic body radiotherapy (SBRT) were selected from the clinical database. Of these, 87 cases were used for model training, and the remaining 27 cases were used to evaluate the feasibility and effectiveness of MOAPN in automatic treatment planning.</p>
</sec>
<sec>
<title>Results</title>
<p>For all tested cases, the average number of adjustment steps was 21 &#xb1; 5.9 (mean &#xb1; 1 standard deviation). Compared with the MOAPN initial plans, the actual doses to the chest wall, spinal cord, heart, lung (affected side), esophagus and bronchus in the MOAPN final plans were reduced by 14.5%, 11.6%, 4.7%, 16.7%, 1.6% and 7.7%, respectively. The OAR doses in the MOAPN final plans were similar to those in the clinical plans. With the integrated solution, a complete automatic treatment plan for a new case was generated in about 5-6&#xa0;min.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We successfully developed an integrated solution for automatic treatment planning. Using the 3D dose prediction module to obtain the patient-specific optimization objectives, the action-value policy formed by MOAPN can simultaneously adjust multiple objectives to obtain a high-quality plan in a shorter time. This integrated solution contributes to improving the efficiency of the overall planning workflow and reducing the variation in plan quality across regions and treatment centers. Although improvement is warranted, this proof-of-concept study has demonstrated the feasibility of this integrated solution in automatic treatment planning based on the Eclipse TPS.</p>
</sec>
</abstract>
<kwd-group>
<kwd>automatic treatment planning</kwd>
<kwd>deep reinforcement learning</kwd>
<kwd>integrated solution</kwd>
<kwd>multi-objectives adjustment policy network</kwd>
<kwd>intensity-modulated radiation therapy</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content>
</contract-sponsor>
<counts>
<fig-count count="8"/>
<table-count count="3"/>
<equation-count count="8"/>
<ref-count count="36"/>
<page-count count="13"/>
<word-count count="7186"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<label>1</label>
<title>Introduction</title>
<p>A major advance in radiotherapy technology is the application of intensity-modulated radiation therapy (IMRT) as one of the principal delivery techniques (<xref ref-type="bibr" rid="B1">1</xref>). Current IMRT plans are usually inversely planned using treatment planning systems (TPS) (<xref ref-type="bibr" rid="B2">2</xref>) to safely deliver a uniform dose to the target while minimizing damage to the nearby healthy tissues and organs at risk (OARs).</p>
<p>In the whole plan design process, plan optimization is integral to inverse treatment planning, which is a trial-and-error process to find a good set of optimization objectives, including weighting factors, dose limits, and volume constraints. Among them, the patient-specific optimization objectives are critically important for the creation of a high-quality plan. In the clinical workflow, the trial-and-error process is usually a manual, tedious and time-consuming task. In addition, the final plan quality is dependent on planners&#x2019; experience, planning difficulty and available time (<xref ref-type="bibr" rid="B3">3</xref>). Plans are often accepted under clinical pressure, although further improvement is still possible.</p>
<p>To address these issues, the concept of automatic planning has been proposed and implemented using a range of methodologies (<xref ref-type="bibr" rid="B4">4</xref>). One of the most commonly used approaches is protocol-based automatic iterative optimization (PB-AIO), which is designed to mimic the planning operations of physicians through a set of predefined protocols (<xref ref-type="bibr" rid="B5">5</xref>). Earlier works on the iterative improvement of optimization objectives and weights built on the seminal work of Xing et al. (<xref ref-type="bibr" rid="B6">6</xref>). This approach is often combined with optimization algorithms in commercial TPSs (<xref ref-type="bibr" rid="B7">7</xref>) (<xref ref-type="bibr" rid="B8">8</xref>) (<xref ref-type="bibr" rid="B9">9</xref>), such as RayStation (RaySearch Laboratories AB, Stockholm, Sweden), Pinnacle (Philips Healthcare GmbH, Hamburg, Germany) and Eclipse (Varian Medical Systems, Palo Alto, CA). Another approach is knowledge-based planning (KBP), which directly utilizes prior knowledge and experience either to predict an achievable dose for a new patient from a similar population or to derive a better starting point for further optimization. The commercial Eclipse TPS uses the KBP-based RapidPlan module in automatic planning for various tumor sites to estimate two-dimensional dose-volume histograms (DVH) (<xref ref-type="bibr" rid="B10">10</xref>) (<xref ref-type="bibr" rid="B11">11</xref>). In addition, deep learning prediction models are a newer solution that uses neural networks to analyze spatial features and predict a patient-specific three-dimensional (3D) dose distribution (<xref ref-type="bibr" rid="B12">12</xref>). Such dose prediction models have been applied to automate treatment planning in some studies (<xref ref-type="bibr" rid="B13">13</xref>) (<xref ref-type="bibr" rid="B14">14</xref>).</p>
<p>Recently, deep reinforcement learning (DRL) has shown exceptional performance in some sequential decision-making problems. Notably, it outperformed human experts in Atari games (<xref ref-type="bibr" rid="B15">15</xref>) and made a breakthrough in Go (<xref ref-type="bibr" rid="B16">16</xref>). In a TPS, searching for the optimal optimization objectives of a treatment plan through trial and error is essentially a sequential decision-making problem. Compared with PB-AIO and KBP approaches, the DRL model is universal and does not rely on previous training experience. To date, the feasibility of DRL in treatment planning has been demonstrated in preliminary studies (<xref ref-type="bibr" rid="B17">17</xref>) (<xref ref-type="bibr" rid="B18">18</xref>). Pu et&#xa0;al. developed an intelligent DRL-based brachytherapy treatment planning framework that can utilize the learned dwell time adjustment policy to obtain a satisfactory plan (<xref ref-type="bibr" rid="B19">19</xref>). Shen et&#xa0;al. proposed the virtual treatment planner network based on an in-house TPS (<xref ref-type="bibr" rid="B20">20</xref>). Through end-to-end training, the network could operate the in-house TPS parameters to generate high-quality plans. Previous studies showed that the DRL framework can support decisions for certain tasks in a human-like fashion and achieve similar or even better performance compared with humans.</p>
<p>Although the DRL-based solutions to the sequential decision-making problem of treatment planning are encouraging, the training efficiency of such models is a major concern. Clinically, more adjustable optimization objectives need to be added as the complexity of treatment planning increases. With a single-agent model, only one objective is selected for adjustment in each optimization. In this scenario, a longer time is required to solve the treatment planning optimization problem, which considerably prolongs the training process. In this paper, we proposed a multi-objectives adjustment policy network (MOAPN) based on a multi-agent DRL scheme, and further explored its feasibility for adjusting multiple objectives in the commercial Eclipse TPS. Furthermore, we developed a 3D dose prediction module, which worked with MOAPN to form an integrated solution for automatic treatment planning. The patient-specific initial optimization objectives were generated based on the predicted 3D dose distribution of each patient case. Therefore, the overall exploration space of each treatment plan could be reduced during MOAPN training. This study aimed to demonstrate the feasibility of this approach for automatic stereotactic body radiotherapy (SBRT) planning for lung cancer.</p>
</sec>
<sec id="s2" sec-type="material|methods">
<label>2</label>
<title>Material and methods</title>
<sec id="s2_1">
<label>2.1</label>
<title>Patient cases collection</title>
<p>A total of 114 peripheral non-small-cell lung cancer (NSCLC) cases, dated from June 2018 to March 2022, were retrospectively selected from the clinical database as the total study cohort. Two or more radiotherapists agreed that they were suitable for SBRT. Lung scanning was performed using a GE LightSpeed-RT simulator or a Philips large-aperture CT simulator. The clinical target volume (CTV), lung, chest wall, esophagus, bronchus, spinal cord and heart were delineated by experienced radiation oncologists. To account for set-up errors and organ motion, the planning target volume (PTV) was obtained by expanding the CTV by 5&#xa0;mm in all three dimensions. The prescribed dose was 50 Gy in 5 fractions, scaled to cover 95% of the PTV for all cases.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Auto-planning creation</title>
<p>In this study, the proposed automated optimization process for SBRT plans was implemented with the Eclipse Scripting Application Programming Interface (ESAPI) provided by the Eclipse TPS. Python-based scripts were developed in research mode to implement the proposed automatic planning procedure for NSCLC patients, including plan creation, parameter setting and modification, plan optimization, data reading, and plan evaluation. All studied cases were performed on a Varian TrueBeam linear accelerator with coplanar beams at an energy of 6 MV, equipped with the Millennium 120 MLC. All treatment plans were uniformly designed as IMRT, and the angular interval of the fields was set to 40&#xb0;.</p>
<p>In the inverse SBRT planning process, dose-limiting rings are often introduced (<xref ref-type="bibr" rid="B21">21</xref>) to control the dose gradient outside the target to an acceptable level. In this study, five 3D ring structures with a width of 4&#xa0;mm were generated at distances of 0.3&#xa0;cm, 1.0&#xa0;cm, 2.4&#xa0;cm, 4.4&#xa0;cm and 6&#xa0;cm outside the PTV, named Ring1, Ring2, Ring3, Ring4 and Ring5, respectively. These ring structures took part in the plan optimization to control the dose gradient outside the target and to spare the normal tissues surrounding the target as much as possible. Moreover, an additional ring structure named D2cm was generated by a prewritten script to evaluate the resulting dose gradient. This ring structure, with a width of 1&#xa0;cm, was obtained by expanding the PTV outward by 2&#xa0;cm in all three dimensions. All settings and requirements were the same for all studied cases. For the target, 100% of the prescription dose needed to cover 95% of the PTV. For the OARs, the dose constraints followed the recommendations of the Radiation Therapy Oncology Group (RTOG) 0915 report (<xref ref-type="bibr" rid="B22">22</xref>). The OAR dose was expected to be as low as possible without compromising the dose coverage of the target.</p>
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Optimization objective function</title>
<p>In this paper, the Eclipse Photon optimization (PO) algorithm (<xref ref-type="bibr" rid="B23">23</xref>) was used as the optimizer. The optimization was performed on an overall objective composed of multiple dose-volume histogram (DVH) constraint terms. Each DVH constraint, designed for various considerations and corresponding to a set of planning optimization objectives, was composed of four input parameters: an optimization weight factor, a 2D position on the DVH graph (a dose value and a volume level) representing the dose-volume objective, and a Boolean variable describing the direction of the constraint (upper or lower) on the DVH curve. If a dose-volume objective is not met, a weighted quadratic cost is calculated as follows:</p>
<disp-formula>
<label>(1)</label>
<mml:math display="block" id="M1">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:mi>w</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Where <italic>D<sub>actual</sub>
</italic> and <italic>D<sub>obj</sub>
</italic> refer to the actual dose after optimization and the desired objective set before optimization (e.g. for <italic>V</italic>
<sub>20</sub>, the relative volume of the lung receiving doses of &gt; 20 Gy should be less than 15%), respectively, and <italic>w</italic> is the weight factor of the dose-volume objective. For the upper objective, the cost is applied to the portion of doses that exceed the desired dose value and volume level. For the lower objective, the cost is applied to the portion of doses that fall short of the desired dose value and volume level.</p>
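<p>As an illustration of Eq. (1), the one-sided quadratic penalty can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the Eclipse PO implementation: the function name <monospace>dvh_point_cost</monospace> and its arguments are hypothetical, the example values are invented, and the dose at the volume level of the objective is assumed to have already been read from the DVH.</p>
<preformat preformat-type="code"><![CDATA[
```python
def dvh_point_cost(d_actual, d_obj, weight, upper=True):
    """Weighted quadratic cost of Eq. (1) for one dose-volume objective.

    d_actual: dose (Gy) achieved at the volume level of the objective.
    d_obj:    desired dose (Gy) at that volume level.
    upper:    True for an upper objective, False for a lower objective.
    All names are hypothetical and used only for illustration.
    """
    diff = d_actual - d_obj
    # An upper objective is violated when the dose exceeds the desired
    # value; a lower objective is violated when the dose falls short.
    violated = diff > 0 if upper else diff < 0
    # No cost is charged when the objective is already met.
    return 0.5 * weight * diff ** 2 if violated else 0.0


# Hypothetical example values, chosen only for illustration.
print(dvh_point_cost(22.0, 20.0, weight=10.0, upper=True))    # upper objective exceeded
print(dvh_point_cost(48.0, 50.0, weight=100.0, upper=False))  # lower objective missed
print(dvh_point_cost(18.0, 20.0, weight=10.0, upper=True))    # upper objective met
```
]]></preformat>
<p>In this sketch the Boolean direction flag plays the role of the fourth input parameter of a DVH constraint, while the weight and the dose-volume point supply the other three.</p>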
<p>In addition, an alternative to the dose-volume objectives is the generalized form of equivalent uniform dose (gEUD) (<xref ref-type="bibr" rid="B24">24</xref>). Compared with physical dose, the gEUD considers radiobiologic factors and has the potential to improve the sparing of the critical structures in IMRT (<xref ref-type="bibr" rid="B25">25</xref>). However, the target is insensitive to the existence of hot spots within the gEUD optimization, and potentially has worse dose coverage. During optimization, the gEUD value of each structure is evaluated, and a quadratic cost is applied when an objective is not met, in the same manner as in Eq. (1):</p>
<disp-formula>
<label>(2)</label>
<mml:math display="block" id="M2">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Where w is the weight factor of the gEUD objective, <italic>D<sub>EUD</sub>
</italic> is the desired gEUD objective, and <italic>gEUD<sub>actual</sub>
</italic> is the actual gEUD value after optimization, calculated as follows:</p>
<disp-formula>
<label>(3)</label>
<mml:math display="block" id="M3">
<mml:mrow>
<mml:mtext>gEUD</mml:mtext>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>V</mml:mi>
</mml:mfrac>
<mml:mo>&#xd7;</mml:mo>
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>V</mml:mi>
</mml:msub>
<mml:mi>D</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:msup>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>a</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Where a is a biological parameter controlling the dose distribution inside the structure. Typical values range from -40 to 40. For serial structures (e.g., spinal cord, esophagus, chest wall) or ring structures, the gEUD parameter is assigned a large positive value (a = 40), so that the gEUD approaches the maximum dose. For parallel structures such as lung, a small positive value is used (a = 1), because the dose response may be more closely related to the mean dose (<xref ref-type="bibr" rid="B26">26</xref>). V is the volume of the structure. D(x) represents the dose at position x inside the volume V. For the lower gEUD objective, the cost value is 0 when gEUD &gt; <italic>D<sub>EUD</sub>
</italic>. For the upper gEUD objective, the cost value is 0 when gEUD &lt; <italic>D<sub>EUD</sub>
</italic>.</p>
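<p>The behaviour of the parameter a in Eq. (3) and the one-sided cost of Eq. (2) can be illustrated with the short Python sketch below. It is a sketch under stated assumptions rather than the optimizer's implementation: the names <monospace>geud</monospace> and <monospace>geud_cost</monospace> are hypothetical, the structure is assumed to be sampled as equal-volume voxels so that the sum over V becomes a plain average, and all voxel doses are assumed to be positive when a is negative.</p>
<preformat preformat-type="code"><![CDATA[
```python
def geud(voxel_doses, a):
    """Generalized equivalent uniform dose of Eq. (3), equal-volume voxels."""
    if a == 0:
        raise ValueError("a must be non-zero")
    d_ref = max(voxel_doses)
    if d_ref == 0:
        return 0.0
    # Scale by the maximum dose so that large positive a does not overflow;
    # the scaling cancels out and leaves Eq. (3) unchanged.
    mean_power = sum((d / d_ref) ** a for d in voxel_doses) / len(voxel_doses)
    return d_ref * mean_power ** (1.0 / a)


def geud_cost(geud_actual, d_eud, weight, upper=True):
    """Quadratic cost of Eq. (2), charged only when the objective is violated."""
    diff = geud_actual - d_eud
    violated = diff > 0 if upper else diff < 0
    return weight * diff ** 2 if violated else 0.0


# Hypothetical voxel doses in Gy, chosen only for illustration.
doses = [10.0, 20.0, 30.0]
print(geud(doses, a=1))    # a = 1 reproduces the mean dose (parallel organ)
print(geud(doses, a=40))   # a = 40 lies close to the maximum dose (serial organ)
print(geud_cost(geud(doses, a=40), d_eud=25.0, weight=10.0, upper=True))
```
]]></preformat>
<p>With a = 1 the sketch returns the arithmetic mean of the voxel doses, and with a = 40 the result is dominated by the hottest voxel, which matches the distinction between parallel and serial structures described above.</p>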
<p>In the Eclipse TPS, the optimization of an IMRT plan using the PO optimizer was defined as a minimization problem. To obtain the ideal dose coverage and spare the critical structures, a hybrid objective function combining gEUD and physical dose constraints was developed as:</p>
<disp-formula>
<label>(4)</label>
<mml:math display="block" id="M4">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:munder>
<mml:mrow>
<mml:mtext>min</mml:mtext>
</mml:mrow>
<mml:mi>D</mml:mi>
</mml:munder>
<mml:mo stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>w</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>+</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>i</mml:mi>
</mml:munder>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>j</mml:mi>
</mml:munder>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo stretchy="false">)</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>Where <inline-formula>
<mml:math display="inline" id="im1">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im2">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are the weights for the lower and upper objectives of the PTV, respectively. <italic>D<sub>p</sub>
</italic> and <italic>D<sub>max</sub>
</italic> are the prescription dose and the maximum dose for the PTV. <inline-formula>
<mml:math display="inline" id="im3">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is a weight of the <italic>i<sup>th</sup>
</italic> OAR (chest wall, spinal cord, esophagus, bronchus, heart or lung). <inline-formula>
<mml:math display="inline" id="im4">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c9;</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is a weight of the <italic>j<sup>th</sup>
</italic> ring structure (Ring1, Ring2, Ring3, Ring4 or Ring5).</p>
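<p>To make the structure of Eq. (4) concrete, the following Python sketch sums the four groups of terms for a single plan evaluation. It is only a sketch under stated assumptions: the function names and all numerical values are hypothetical, the PTV dose is treated as a single scalar as written in Eq. (4), and the gEUD values of the OARs and rings are assumed to have been computed beforehand.</p>
<preformat preformat-type="code"><![CDATA[
```python
def one_sided_sq(actual, target, upper=True):
    """Squared deviation, counted only when the objective is violated."""
    diff = actual - target
    if (upper and diff > 0) or (not upper and diff < 0):
        return diff ** 2
    return 0.0


def hybrid_objective(ptv_dose, d_p, d_max, w_low, w_high, oar_terms, ring_terms):
    """Value of the hybrid objective in Eq. (4).

    oar_terms, ring_terms: iterables of (weight, gEUD_actual, D_EUD) tuples,
    one tuple per OAR or ring structure (hypothetical data layout).
    """
    # Physical-dose terms for the PTV: lower objective at the prescription
    # dose and upper objective at the maximum dose.
    total = w_low * one_sided_sq(ptv_dose, d_p, upper=False)
    total += w_high * one_sided_sq(ptv_dose, d_max, upper=True)
    # Upper gEUD terms for the OARs and the ring structures.
    for weight, geud_actual, d_eud in list(oar_terms) + list(ring_terms):
        total += weight * one_sided_sq(geud_actual, d_eud, upper=True)
    return total


# Hypothetical values in Gy, chosen only for illustration.
oars = [(10.0, 22.0, 20.0), (5.0, 18.0, 20.0)]
rings = [(2.0, 31.0, 30.0)]
print(hybrid_objective(48.0, d_p=50.0, d_max=60.0, w_low=100.0, w_high=50.0,
                       oar_terms=oars, ring_terms=rings))
```
]]></preformat>
<p>Terms whose objectives are already met contribute nothing, so the optimizer is driven only by the violated objectives and their weights.</p>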
</sec>
<sec id="s2_4">
<label>2.4</label>
<title>Dose prediction module</title>
<p>At the initial step of optimization, clinical planners are accustomed to loading a set of predefined objectives, generated from plan experience templates, into the PO optimizer and performing the first optimization. With such predefined objectives, one fixed set of optimization parameters is applied to a large number of patient cases, regardless of the variations between them. A good set of initial optimization objective parameters can not only achieve a good optimization result, but also shorten the optimization process and improve work efficiency. In this study, a dose prediction module was developed to predict the 3D dose distribution and obtain the patient-specific initial optimization objective parameters.</p>
<p>Building on our previous experience with dose prediction (<xref ref-type="bibr" rid="B27">27</xref>), a dose prediction module based on the U-Net architecture was configured with 76 clinically treated SBRT plans for peripheral NSCLC cases (60 cases for training and 16 cases for validation). All training plans were manually created by senior physicians following consistent dose prescription and planning protocols. The U-Net model had a total of 10 layers, and all convolutional layers applied a 3&#xd7;3&#xd7;3 kernel except the output layer, which used a 1&#xd7;1&#xd7;1 kernel. The input data were the contours of the planning structures, which were converted from DICOM format files. The input was a 3D matrix (64&#xd7;256&#xd7;256) with one channel, covering the PTV, lung, chest wall, esophagus, bronchus, spinal cord and heart. According to the structure type, each voxel was assigned a specific value (e.g. 1.0 for PTV, 0.88 for heart, 0.75 for spinal cord, 0.63 for esophagus, 0.5 for chest wall, 0.43 for bronchus and 0 for voxels outside the body). The output data was the predicted 3D dose distribution. The model was implemented in Keras, and the Adam optimization algorithm was used to minimize the sharp loss function (<xref ref-type="bibr" rid="B28">28</xref>).</p>
<p>A complete dose prediction module is composed of the following three steps, as shown in <xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1</bold>
</xref>. First, a deep learning (DL) model was trained on previously treated plans to predict the 3D dose distribution. Second, the CT of the new case, together with the structure information of the target and OARs, was used as input to automatically generate the 3D dose distribution. Third, based on the 3D dose distribution predicted for the new case, a prewritten script was used to count the dose values of all dose voxels in the entire dose grid. The resulting statistics formed a 2D dose-volume histogram, yielding smooth dose-volume curves for all structures by interpolation. Referring to the clinical guidelines, specific dose or volume values for each structure (e.g. lung V<sub>5</sub> and spinal cord D<italic>
<sub>max</sub>
</italic>) were selected in dose-volume curves and converted into initial optimization objective values, e.g. <inline-formula>
<mml:math display="inline" id="im5">
<mml:mrow>
<mml:msubsup>
<mml:mtext>D</mml:mtext>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im6">
<mml:mrow>
<mml:msubsup>
<mml:mtext>D</mml:mtext>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:mi>U</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, to participate in the optimization in Eq. (4).</p>
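<p>The third step, turning predicted voxel doses into dose-volume values, can be sketched in Python as follows. This is a simplified illustration and not the prewritten script used in this study: the function names are hypothetical, the voxel doses are invented, equal voxel volumes are assumed, and the interpolation that smooths the curves is omitted.</p>
<preformat preformat-type="code"><![CDATA[
```python
def volume_at_dose(voxel_doses, dose_gy):
    """Percentage of the structure volume receiving at least dose_gy."""
    n_hit = sum(1 for d in voxel_doses if d >= dose_gy)
    return 100.0 * n_hit / len(voxel_doses)


def cumulative_dvh(voxel_doses, bin_width=1.0):
    """Cumulative DVH as a list of (dose in Gy, volume in %) points."""
    n_bins = int(max(voxel_doses) // bin_width) + 1
    return [(i * bin_width, volume_at_dose(voxel_doses, i * bin_width))
            for i in range(n_bins + 1)]


# Hypothetical predicted doses (Gy) for the voxels of one structure.
predicted = [1.0, 4.0, 6.0, 10.0]

v5 = volume_at_dose(predicted, 5.0)   # analogous to lung V5
d_max = max(predicted)                # analogous to spinal cord Dmax
dvh = cumulative_dvh(predicted)

print(v5, d_max)
print(dvh[0], dvh[-1])
```
]]></preformat>
<p>Values read from such curves, for example a V<sub>5</sub> or a maximum dose, are the kind of quantities that would then be converted into initial optimization objective values for each structure.</p>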
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>The overall workflow of an integrated solution for automatic treatment planning.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-13-1124458-g001.tif"/>
</fig>
</sec>
<sec id="s2_5">
<label>2.5</label>
<title>Multi-objective adjustment policy network</title>
<sec id="s2_5_1">
<label>2.5.1</label>
<title>Network architecture</title>
<p>To build the proposed MOAPN, we used the Q-learning framework, a reinforcement learning algorithm (<xref ref-type="bibr" rid="B29">29</xref>) for solving Markov decision processes (<xref ref-type="bibr" rid="B30">30</xref>). Q-learning seeks the optimal action-value function, and its update rule is:</p>
<disp-formula>
<label>(5)</label>
<mml:math display="block" id="M5">
<mml:mrow>
<mml:mtext>Q</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2190;</mml:mo>
<mml:mtext>Q</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mtext>&#x3b1;</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mtext>&#x3b3;maxQ</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mtext>Q</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>s<sub>t</sub></italic> and <italic>s<sub>t</sub></italic><sub>+1</sub> are the states at steps t and t+1; <italic>a<sub>t</sub></italic> and <italic>a<sub>t</sub></italic><sub>+1</sub> are the actions at steps t and t+1; <italic>r<sub>t</sub></italic> is the reward at step t; &#x3b3; is the discount factor; and &#x3b1; is the learning rate.</p>
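<p>As a concrete reading of Eq. (5), the following sketch applies one tabular Q-learning update. The dictionary-based Q-table, the state labels and the values of &#x3b1; and &#x3b3; are illustrative assumptions, not settings from this study.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one Q-learning update (Eq. 5) in place and return the new value."""
    # Best achievable value from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Toy example: Q-values default to 0 except one known entry.
q = defaultdict(float)
q[("s1", 0)] = 2.0
new_value = q_update(q, "s0", 1, reward=1.0, next_state="s1", actions=[0, 1])
```
]]></preformat>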
<p>For large state and action spaces, the computational burden of standard Q-learning becomes prohibitive. The deep Q network (DQN) (<xref ref-type="bibr" rid="B15">15</xref>) was therefore proposed to approximate the action-value function <italic>via</italic> a multi-layered neural network. The loss function of DQN is generally defined as:</p>
<disp-formula>
<label>(6)</label>
<mml:math display="block" id="M6">
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Where <inline-formula>
<mml:math display="inline" id="im7">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im8">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are the parameters of the evaluation network and the target network, respectively.</p>
<p>To avoid overestimating action values during training, the double deep Q network (DDQN) (<xref ref-type="bibr" rid="B31">31</xref>) was proposed to decouple the selection of the action used in the target from the calculation of the target Q-value. The target value for the DDQN algorithm is:</p>
<disp-formula>
<label>(7)</label>
<mml:math display="block" id="M7">
<mml:mrow>
<mml:msubsup>
<mml:mi>Q</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>Q</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>a</mml:mi>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In this study, we developed MOAPN based on the DDQN architecture. In addition, experience replay and fixed Q-target techniques were used in MOAPN to stabilize the training process.</p>
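<p>The difference between the targets in Eqs. (6) and (7) can be sketched as follows. The function names and the toy Q-values are assumptions for illustration; in practice the two lists would be the outputs of the evaluation network and the target network for the next state.</p>
<preformat><![CDATA[
```python
def dqn_target(reward, gamma, q_target_next):
    """DQN target: the target network both selects and evaluates the action."""
    return reward + gamma * max(q_target_next)


def ddqn_target(reward, gamma, q_eval_next, q_target_next):
    """DDQN target (Eq. 7): the evaluation network selects the action,
    the target network evaluates it."""
    best_action = max(range(len(q_eval_next)), key=lambda a: q_eval_next[a])
    return reward + gamma * q_target_next[best_action]


# Toy Q-values for the four actions in the next state.
q_eval_next = [1.0, 3.0, 2.0, 0.5]    # Q(s_{t+1}, a; theta_eva)
q_target_next = [4.0, 1.5, 2.5, 0.0]  # Q(s_{t+1}, a; theta_tar)

y_dqn = dqn_target(1.0, 0.9, q_target_next)
y_ddqn = ddqn_target(1.0, 0.9, q_eval_next, q_target_next)
```
]]></preformat>
<p>In this toy case the DQN target follows the largest target-network value, whereas the DDQN target uses the target-network value of the action preferred by the evaluation network, which is the mechanism that limits overestimation.</p>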
</sec>
<sec id="s2_5_2">
<label>2.5.2</label>
<title>States and actions</title>
<p>In the MOAPN, the state (the network input) was the DVH matrix (850&#xd7;12) of the plan optimized with the current set of objectives, covering the PTV, all ring structures, lung, heart, spinal cord, esophagus, bronchus and chest wall. To shorten the training process and increase the robustness of the network, we used the dose prediction module to obtain patient-specific initial optimization objectives. The MOAPN consisted of five similar DDQN networks, each controlled by its own agent and with independent input data and memory buffers, as shown in <xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2</bold>
</xref>. During optimization, four input parameters were used for each optimization objective: an optimization weight factor, a 2D position goal on the DVH graph, and a Boolean variable describing the direction of the constraint, as described in Section 2.3. In this model, the optimization weight factors were fixed at 300 for the target objectives and 150 for the objectives of all other structures, so that the target received more attention than other structures during plan optimization. The objective values of the five ring structures, obtained from the dose-volume goals on the DVH graph, were adjusted to achieve the desired dose gradient outside the target. Because the OAR doses were affected by the adjustment of the ring doses, the dose constraints of the OARs themselves were not adjusted during optimization; once the ring doses reached the established requirements, the OAR doses could meet the clinical requirements. The input to each DDQN network comprised three columns corresponding to the DVHs of the PTV, the body and the ring structure. In addition, we defined four possible adjustment actions for each objective: (a) 0-action: decrease the objective by 10%; (b) 1-action: decrease the objective by 3%; (c) 2-action: keep the objective unchanged; (d) 3-action: increase the objective by 5%. The network output was the matrix of Q-values for all actions.</p>
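<p>The four adjustment actions can be expressed as multiplicative factors applied to the ring objectives. The sketch below is illustrative only: the function name and the chosen action vector are assumptions, and the ring doses are the Patient 1 values listed in Table 2.</p>
<preformat><![CDATA[
```python
# Multiplicative factors for the four actions listed in the text:
# 0: -10 %, 1: -3 %, 2: unchanged, 3: +5 %.
ACTION_FACTORS = {0: 0.90, 1: 0.97, 2: 1.00, 3: 1.05}


def apply_actions(ring_objectives, actions):
    """Return the adjusted ring objectives (Gy), one action per ring."""
    if len(ring_objectives) != len(actions):
        raise ValueError("one action per objective is required")
    return [dose * ACTION_FACTORS[a]
            for dose, a in zip(ring_objectives, actions)]


rings = [51.4, 35.4, 23.4, 18.2, 15.2]  # Ring1..Ring5, Patient 1 (Table 2)
adjusted = apply_actions(rings, [2, 0, 1, 3, 2])
```
]]></preformat>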
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>The network architecture of the MOAPN.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-13-1124458-g002.tif"/>
</fig>
</sec>
<sec id="s2_5_3">
<label>2.5.3</label>
<title>Reward function</title>
<p>To evaluate plan quality, we used a plan quality scoring system, Eva(s). Eva(s) was derived from the Plan IQ evaluation system (Sun Nuclear, Melbourne, FL) and included the subset of its functions considered applicable to our study. The scoring system consisted of a set of criteria for evaluating plan quality based on target coverage, dose gradient and sparing of OARs. Following the UK 2022 consensus (<xref ref-type="bibr" rid="B32">32</xref>) and the RTOG 0915 report (<xref ref-type="bibr" rid="B22">22</xref>), Eva(s) applied stricter criteria to achieve better plan quality. For the <italic>D</italic>
<sub>95%</sub> of the PTV, the scoring criterion was defined as a piecewise linear function: no penalty was applied when the prescription dose was met, and the score decayed rapidly once the actual dose fell below the prescription dose. The other scoring criteria were defined as linear functions to reduce the dose to the surrounding OARs as much as possible while maintaining the clinical prescription dose. The components of Eva(s) are shown in <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref>.</p>
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>Plan quality metrics and assigned scores for assessing automatic plan quality.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="middle" align="center">Structure</th>
<th valign="middle" align="center">Metric</th>
<th valign="middle" align="center">Scoring Criterion</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="middle" align="center">PTV</td>
<td valign="middle" align="center">D<sub>95%</sub> (Gy)</td>
<td valign="middle" align="center">
<disp-formula>
<mml:math display="block" id="im9">
<mml:mrow>
<mml:mtext>score</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>100</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2009;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mn>95</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>50</mml:mn>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>100</mml:mn>
<mml:mo>*</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mn>95</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>48.5</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">/</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>50</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>48.5</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mn>95</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mo>&lt;</mml:mo>
<mml:mn>50</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow> </mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
</td>
</tr>
<tr>
<td valign="middle" align="center">PTV</td>
<td valign="middle" align="center">CI</td>
<td valign="middle" align="center">score=100*(CI&#x2212;0.75)/(1&#x2212;0.75)</td>
</tr>
<tr>
<td valign="middle" align="center">PTV</td>
<td valign="middle" align="center">GI</td>
<td valign="middle" align="center">score=100*(GI&#x2212;3.75)/(3&#x2212;3.75)</td>
</tr>
<tr>
<td valign="middle" align="center">Ring1</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=100*(<italic>D</italic>
<sub>
<italic>r</italic>
<italic>e</italic>
<italic>f</italic>
</sub>&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/500</td>
</tr>
<tr>
<td valign="middle" align="center">Ring2</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=100*(<italic>D</italic>
<sub>
<italic>r</italic>
<italic>e</italic>
<italic>f</italic>
</sub>&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/500</td>
</tr>
<tr>
<td valign="middle" align="center">Ring3</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=100*(<italic>D</italic>
<sub>
<italic>r</italic>
<italic>e</italic>
<italic>f</italic>
</sub>&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/500</td>
</tr>
<tr>
<td valign="middle" align="center">Ring4</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=100*(<italic>D</italic>
<sub>
<italic>r</italic>
<italic>e</italic>
<italic>f</italic>
</sub>&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/500</td>
</tr>
<tr>
<td valign="middle" align="center">Ring5</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=100*(<italic>D</italic>
<sub>
<italic>r</italic>
<italic>e</italic>
<italic>f</italic>
</sub>&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/500</td>
</tr>
<tr>
<td valign="middle" align="center">Chest wall</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=50*(50&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/(50&#x2212;20)</td>
</tr>
<tr>
<td valign="middle" align="center">Spinal cord</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=50*(15&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/15</td>
</tr>
<tr>
<td valign="middle" align="center">Heart</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=50*(15&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/15</td>
</tr>
<tr>
<td valign="middle" align="center">Lung (AS)</td>
<td valign="middle" align="center">V<sub>5</sub> (%)</td>
<td valign="middle" align="center">score=50*(35&#x2212;<italic>V</italic>
<sub>5</sub>)/35</td>
</tr>
<tr>
<td valign="middle" align="center">Bronchus</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=50*(15&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/15</td>
</tr>
<tr>
<td valign="middle" align="center">Esophagus</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">score=50*(15&#x2212;<italic>D</italic>
<sub>
<italic>m</italic>
<italic>a</italic>
<italic>x</italic>
</sub>)/15</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>D<sub>max</sub>, maximum dose; D<sub>ref</sub>, reference dose; D<sub>x%</sub>, absolute dose received by x% of the volume; V<sub>x</sub>, relative volume receiving &gt; x Gy absolute dose; CI, conformity index, CI = (TV<sub>RI</sub>/TV)&#xd7;(TV<sub>RI</sub>/V<sub>RI</sub>), where TV<sub>RI</sub> is the target volume covered by the prescription dose, TV is the target volume and V<sub>RI</sub> is the volume covered by the prescription dose; GI, gradient index, GI = V<sub>50%</sub>/V<sub>p</sub>, where V<sub>50%</sub> is the volume covered by 50% of the prescription dose and V<sub>p</sub> is the volume covered by the prescription dose; Lung (AS), lung (affected side).</p>
</fn>
</table-wrap-foot>
</table-wrap>
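<p>Several of the scoring criteria in Table 1 can be written directly as short functions. The sketch below covers the PTV criteria and the D<sub>max</sub>-based OAR criteria; the function and parameter names are hypothetical, and the way the individual scores are combined into Eva(s) is not specified here, so no aggregate is computed.</p>
<preformat><![CDATA[
```python
def score_ptv_d95(d95, prescription=50.0, floor=48.5):
    """Piecewise-linear D95% score: 100 at or above the prescription dose,
    linear below it (0 at `floor` Gy, negative beyond)."""
    if d95 >= prescription:
        return 100.0
    return 100.0 * (d95 - floor) / (prescription - floor)


def score_ci(ci):
    """Conformity index score: 0 at CI = 0.75, 100 at CI = 1."""
    return 100.0 * (ci - 0.75) / (1.0 - 0.75)


def score_gi(gi):
    """Gradient index score: 0 at GI = 3.75, 100 at GI = 3."""
    return 100.0 * (gi - 3.75) / (3.0 - 3.75)


def score_oar_dmax(d_max, limit, lower=0.0):
    """Linear OAR score: 50 at `lower` Gy, 0 at `limit` Gy.

    Chest wall uses limit=50, lower=20; spinal cord, heart, bronchus and
    esophagus use limit=15, lower=0 (Table 1).
    """
    return 50.0 * (limit - d_max) / (limit - lower)


# Toy plan metrics, chosen only to exercise the functions.
example_scores = {
    "PTV D95": score_ptv_d95(49.25),
    "CI": score_ci(0.875),
    "GI": score_gi(3.0),
    "Chest wall": score_oar_dmax(35.0, limit=50.0, lower=20.0),
    "Spinal cord": score_oar_dmax(7.5, limit=15.0),
}
```
]]></preformat>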
<p>With the scoring system Eva(s), the reward function in MOAPN quantifies the change in plan quality caused by adjusting the objectives. The reward function is defined as follows:</p>
<disp-formula>
<label>(8)</label>
<mml:math display="block" id="M8">
<mml:mrow>
<mml:mtext>r</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>200</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mn>95</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&lt;</mml:mo>
<mml:mn>95</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>&#xd7;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>50</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mi>i</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mi>i</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:msup>
<mml:mi>g</mml:mi>
<mml:mi>i</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>20</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&lt;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>E</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>20</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&gt;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>This reward function assigned a positive reward when the plan score increased at the next step and a negative reward when it decreased. A large penalty was assigned to discourage adjusting objectives in the wrong direction. When the plan score was unchanged, a smaller positive reward was assigned to encourage MOAPN to maintain the current state and thus avoid the non-convergence caused by endless exploration for a better state. During training, MOAPN was allowed to search for a better state by sacrificing a certain amount of target coverage, and a decay factor penalized the plan score when the target did not meet the prescription dose. When the target coverage fell below the minimum limit, a larger negative reward was assigned and the optimization was stopped.</p>
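<p>Eq. (8) can be read as a cascade of conditions. The sketch below assumes that the cases are checked from top to bottom in the order given in Eq. (8), which the equation itself does not state explicitly; the function and argument names are hypothetical.</p>
<preformat><![CDATA[
```python
def reward(d95_ptv, prescription, ring_dose, ring_objective, action,
           eva_next, eva_current):
    """Reward of Eq. (8) for one agent, i.e. one ring objective.

    Assumption: the cases are evaluated top to bottom, so the coverage
    check takes precedence over all other conditions.
    """
    if d95_ptv < 0.95 * prescription:
        return -200  # target coverage below the minimum limit
    if ring_dose <= ring_objective and action == 3:
        return -50   # objective increased although it was already met
    if eva_next < eva_current:
        return -20   # plan score decreased
    if eva_next == eva_current:
        return 5     # plan score unchanged
    return 20        # plan score increased


# Toy values for a 50 Gy prescription.
r_coverage = reward(47.0, 50.0, 30.0, 35.0, 1, 800.0, 790.0)
r_wrong_direction = reward(50.0, 50.0, 30.0, 35.0, 3, 800.0, 790.0)
r_worse = reward(50.0, 50.0, 36.0, 35.0, 0, 780.0, 790.0)
r_same = reward(50.0, 50.0, 36.0, 35.0, 2, 790.0, 790.0)
r_better = reward(50.0, 50.0, 36.0, 35.0, 0, 800.0, 790.0)
```
]]></preformat>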
</sec>
<sec id="s2_5_4">
<label>2.5.4</label>
<title>Prioritized experience replay</title>
<p>In the training process, the states <italic>s<sub>t</sub>
</italic> and <italic>s<sub>t</sub>
</italic>
<sub>+1</sub>, the chosen action <italic>a<sub>t</sub>
</italic>, and the reward <italic>r<sub>t</sub>
</italic> were stored as a transition (<italic>s<sub>t</sub>
</italic>, <italic>a<sub>t</sub>
</italic>, <italic>r<sub>t</sub>
</italic>, <italic>s<sub>t</sub>
</italic>
<sub>+1</sub>) in the experience replay pool. The temporal difference error (TD error), which represented the difference in Q-value between the evaluation network and the target network, was used to update the network parameters in MOAPN. Based on the TD error, a Sum-Tree structure was introduced to assign a sampling priority to each transition stored in the experience replay pool (<xref ref-type="bibr" rid="B33">33</xref>), and these priorities were updated continually during training. When sampling from the experience replay pool, transitions with a larger TD error were more likely to be sampled.</p>
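<p>A minimal sum-tree sketch shows how priority-proportional sampling can be organized. It is a generic illustration, not the implementation used in this study: the class and function names are hypothetical, and the priority is assumed to be the absolute TD error plus a small constant, a common choice that the text does not specify.</p>
<preformat><![CDATA[
```python
import random


class SumTree:
    """Binary tree whose leaves hold priorities and whose root holds their sum."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # node i has children 2i and 2i+1
        self.data = [None] * capacity
        self.write = 0                      # next leaf to overwrite

    def total(self):
        return self.tree[1]

    def update(self, leaf, priority):
        """Set the priority of one leaf and refresh the sums above it."""
        node = leaf + self.capacity
        self.tree[node] = priority
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def add(self, transition, priority):
        """Store a transition, overwriting the oldest one when full."""
        self.data[self.write] = transition
        self.update(self.write, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, value):
        """Return (leaf, transition) whose cumulative priority range holds value."""
        node = 1
        while node < self.capacity:
            left = 2 * node
            if value <= self.tree[left]:
                node = left
            else:
                value -= self.tree[left]
                node = left + 1
        leaf = node - self.capacity
        return leaf, self.data[leaf]


def priority_from_td_error(td_error, eps=0.01):
    """Assumed priority: |TD error| plus a small constant to keep it positive."""
    return abs(td_error) + eps


def sample(tree, batch_size, rng=random):
    """Draw one transition from each of `batch_size` equal priority segments."""
    segment = tree.total() / batch_size
    return [tree.get(rng.uniform(i * segment, (i + 1) * segment))
            for i in range(batch_size)]


pool = SumTree(4)
for name, p in [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 4.0)]:
    pool.add(name, p)
```
]]></preformat>
<p>With this layout, a transition whose priority is four times larger occupies a four times wider range and is drawn correspondingly more often; a pool of the size described in Section 2.5.5 would use a capacity of 4096.</p>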
</sec>
<sec id="s2_5_5">
<label>2.5.5</label>
<title>Training strategy</title>
<p>The MOAPN was trained on 11 patient cases selected from the total study cohort. Each training case started from the patient-specific initial optimization objectives generated by the dose prediction module, as shown in <xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>. For each training case, the five DDQN networks independently performed three processes: objective adjustment, transition storage and sampling, and network training. After all objectives were adjusted, an optimization process was performed. In each step, MOAPN used the &#x3f5;-greedy algorithm to select actions. Specifically, each DDQN network selected a random action among all possible actions with probability &#x3f5;, and otherwise selected the action with the highest output value, with probability 1 &#x2212; &#x3f5;. The probability &#x3f5; started at 0.95 and decayed at a rate of 0.95 per episode during training, with a minimum of 0.1. Because there were five objectives to adjust, five independent experience replay pools, each with a maximum capacity of 4096, were created to store transitions (<italic>S<sub>t</sub>
</italic>, <italic>a<sub>t</sub>
</italic>, <italic>r<sub>t</sub>
</italic>, <italic>S<sub>t</sub>
</italic>
<sub>+1</sub>) based on the Sum-Tree structure. In each iteration, 20 training samples were selected to update the parameters <italic>&#x3b8;<sup>eva</sup>
</italic> of evaluation-network. The parameters <italic>&#x3b8;<sup>tar</sup>
</italic> of the target network were replaced with those of the evaluation network every 80 steps. The adjustment, optimization and training processes were repeated for each training case until either of the following termination criteria was met: (1) the maximum number of 20 adjustment steps was reached; (2) the target coverage fell below the minimum limit, i.e. <italic>D</italic>
<sub>95%</sub> (<italic>PTV</italic>) &lt; 95% &#xd7; <italic>D<sub>p</sub>
</italic>.
</p>
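<p>The &#x3f5;-greedy selection and its decay schedule can be sketched as follows. The decay is assumed here to be multiplicative (&#x3f5; is multiplied by 0.95 after each episode), which is one possible reading of the stated rate; the function names are hypothetical.</p>
<preformat><![CDATA[
```python
import random


def epsilon_for_episode(episode, start=0.95, decay=0.95, minimum=0.1):
    """Exploration probability, assuming a multiplicative decay per episode."""
    return max(minimum, start * decay ** episode)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over the Q-values of one DDQN network."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action
    # Exploit: index of the highest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# One greedy choice for a toy Q-value vector over the four actions.
greedy_action = select_action([0.1, 0.9, 0.3, 0.2], epsilon=0.0)
```
]]></preformat>
<p>Under this assumed schedule, &#x3f5; reaches its floor of 0.1 after 44 episodes.</p>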
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>The initial patient-specific objectives of two representative cases based on the 3D dose prediction module.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="middle" rowspan="2" align="center">Structure</th>
<th valign="middle" rowspan="2" align="center">Objective type</th>
<th valign="middle" rowspan="2" align="center">Parameter a/Volume (%)</th>
<th valign="middle" colspan="2" align="center">Patient 1</th>
<th valign="middle" colspan="2" align="center">Patient 2</th>
</tr>
<tr>
<th valign="middle" align="center">Dose (Gy)</th>
<th valign="middle" align="center">Weight</th>
<th valign="middle" align="center">Dose (Gy)</th>
<th valign="middle" align="center">Weight</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="middle" align="center">PTV</td>
<td valign="middle" align="center">Point (lower)</td>
<td valign="middle" align="center">Volume=100</td>
<td valign="middle" align="center">52.0</td>
<td valign="middle" align="center">300</td>
<td valign="middle" align="center">52.0</td>
<td valign="middle" align="center">300</td>
</tr>
<tr>
<td valign="middle" align="center">PTV</td>
<td valign="middle" align="center">Point (upper)</td>
<td valign="middle" align="center">Volume=0</td>
<td valign="middle" align="center">67.5</td>
<td valign="middle" align="center">300</td>
<td valign="middle" align="center">67.5</td>
<td valign="middle" align="center">300</td>
</tr>
<tr>
<td valign="middle" align="center">Ring1</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">51.4</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">52.2</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Ring2</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">35.4</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">34.9</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Ring3</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">23.4</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">21.1</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Ring4</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">18.2</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">14.0</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Ring5</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">15.2</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">8.5</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Chest wall</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">41.1</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">37.1</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Spinal cord</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">6.8</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">6.5</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Esophagus</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">10.2</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">7.3</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Heart</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">14.9</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">24.1</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Lung(AS)</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 1</td>
<td valign="middle" align="center">8.6</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">5.5</td>
<td valign="middle" align="center">150</td>
</tr>
<tr>
<td valign="middle" align="center">Bronchus</td>
<td valign="middle" align="center">gEUD (upper)</td>
<td valign="middle" align="center">a = 40</td>
<td valign="middle" align="center">12.8</td>
<td valign="middle" align="center">150</td>
<td valign="middle" align="center">4.0</td>
<td valign="middle" align="center">150</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Point (lower), the type of optimization objective function is dose-volume objective (minimum constraint); Point (upper), the type of optimization objective function is dose-volume objective (maximum constraint); gEUD (upper), the type of optimization objective function is gEUD objective (maximum constraint); Lung (AS), lung (affected side).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>All computations were performed in Python on a Varian desktop workstation with 16 Intel Xeon Silver 4110 CPU processors, 32&#xa0;GB of memory and two NVIDIA Quadro P5000 GPU cards. PyCharm, together with Python-ESAPI, was used to interact with the TPS and to debug all scripts.</p>
</sec>
<sec id="s2_5_6">
<label>2.5.6</label>
<title>Evaluation</title>
<p>MOAPN was evaluated on 27 cases selected from the study cohort. The prescription dose for all cases was 50 Gy/5 fractions. Using the dose prediction module, the optimization of each case started from patient-specific initial optimization objectives. The trained MOAPN was then used to adjust the objectives and run the optimization. The iteration was terminated when the target no longer met the prescription dose or when the maximum of 25 adjustment steps was reached. The Wilcoxon signed-rank test was used to assess the significance of the differences between the final plans and both the initial plans and the clinical plans, with the significance threshold set at P&#xa0;&lt;&#xa0;0.05.</p>
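<p>As an illustration of the paired comparison described above, the following minimal Python sketch implements a two-sided Wilcoxon signed-rank test with the normal approximation. It is not the script used in this study: the function name <monospace>wilcoxon_signed_rank</monospace> and the sample values are hypothetical, and a statistics package would normally be preferred, particularly for exact P values in small samples.</p>
<preformat><![CDATA[
```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w, p), where w is the smaller of the positive and negative
    rank sums and p is the two-sided P value.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p


# Hypothetical paired Dmax values (Gy) for illustration only, not study data.
initial = [43.1, 40.2, 38.7, 45.0, 41.3, 39.9, 44.2, 42.8]
final = [37.5, 36.1, 35.0, 40.2, 38.8, 36.4, 39.0, 37.9]
w_stat, p_value = wilcoxon_signed_rank(final, initial)
print(f"W = {w_stat:.1f}, P = {p_value:.4f}, significant: {p_value < 0.05}")
```
]]></preformat>
<p>The normal approximation is used here only to keep the sketch dependency-free; with 27 paired cases it is a reasonable approximation, but it is an assumption of this example rather than a statement of how the reported P values were obtained.</p>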
</sec>
</sec>
</sec>
<sec id="s3" sec-type="results">
<label>3</label>
<title>Results</title>
<p>In this study, MOAPN was successfully trained on lung SBRT cases, with a training time of about 3 days. An automatic treatment plan was generated by MOAPN for each tested case. The complete planning optimization process took about 5&#x2013;6&#xa0;min.</p>
<p>For all training cases, the relationship between the total training steps and the cumulative reward value is shown in <xref ref-type="fig" rid="f3">
<bold>Figure&#xa0;3</bold>
</xref>. The increasing trend in reward over the training steps indicates the effectiveness of the proposed MOAPN framework. In the original curve, the cumulative reward value fluctuated slightly, reflecting the trial-and-error approach that MOAPN used to explore the optimal action-value policy. In the fitted curve, the cumulative reward per 4000 iterations increased by 65% and 67%, which indicated that MOAPN gradually learned an objective adjustment policy that improves plan quality.</p>
<fig id="f3" position="float">
<label>Figure&#xa0;3</label>
<caption>
<p>Correlation between the total training iterations and the cumulative reward value.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-13-1124458-g003.tif"/>
</fig>
<p>
<xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4</bold>
</xref> shows the actual dose to the ring structures of all tested cases over the adjustment steps. The average number of adjustment steps for all tested cases was 21 &#xb1; 5.9. For all ring structures, the median, extrema and quartiles of the average <italic>D<sub>max</sub></italic> decreased markedly during the first 13 adjustment steps. After the 13th adjustment step, the median <italic>D<sub>max</sub></italic> of all ring structures tended to converge. This suggests that the trained MOAPN can complete the optimization of a new case in about 13 adjustment steps.</p>
<fig id="f4" position="float">
<label>Figure&#xa0;4</label>
<caption>
<p>Box plots showing the relationship between the adjustment steps and the maximum dose of Ring1, Ring2, Ring3, Ring4, Ring5 and D2cm in all tested cases. Asterisks (*) mark outliers.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-13-1124458-g004.tif"/>
</fig>
<p>A summary of the quantitative comparison of MOAPN final plans, MOAPN initial plans and clinical plans for the OARs sparing is presented in <xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>. Compared with the MOAPN initial plans, the dose to the listed OARs in the MOAPN final plans decreased significantly, except for the esophagus (p &gt; 0.05). In particular, the average <italic>D<sub>max</sub></italic> for the chest wall and the average <italic>V<sub>5</sub></italic> for the affected-side lung were reduced by 14.5% and 16.7%, respectively. The doses to the OARs in the MOAPN final plans were similar to those in the clinical plans, e.g. for the spinal cord, heart, affected-side lung, esophagus, bronchus and D2cm structure. Among these, the average <italic>D<sub>max</sub></italic> for the spinal cord, esophagus, bronchus and D2cm structure showed no significant difference (p &gt; 0.05).</p>
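<p>The relative reductions quoted above can be checked against the group means reported in <xref ref-type="table" rid="T3"><bold>Table&#xa0;3</bold></xref>. The short Python sketch below performs this arithmetic. It assumes that the percentages are derived from the mean values (an average of per-case reductions could differ slightly), and the helper name <monospace>relative_reduction</monospace> is illustrative.</p>
<preformat><![CDATA[
```python
def relative_reduction(initial, final):
    """Percentage reduction of a dosimetric parameter from initial to final."""
    return (initial - final) / initial * 100.0


# Group means taken from Table 3: (initial plans, final plans).
table3_means = {
    "Chest wall Dmax (Gy)": (43.33, 37.02),
    "Lung (AS) V5 (%)": (29.45, 24.51),
}

for name, (initial, final) in table3_means.items():
    reduction = relative_reduction(initial, final)
    print(f"{name}: {reduction:.2f}% reduction")
```
]]></preformat>
<p>Computed from the means, the reductions agree with the reported 14.5% and 16.7% to within about 0.1 percentage points.</p>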
<table-wrap id="T3" position="float">
<label>Table&#xa0;3</label>
<caption>
<p>Comparison of dosimetric parameters between the MOAPN final plans and both the MOAPN initial plans and the clinical plans (mean &#xb1; standard deviation).</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="middle" align="center">Structure</th>
<th valign="middle" align="center">Parameter</th>
<th valign="middle" align="center">Final plans</th>
<th valign="middle" align="center">Initial plans</th>
<th valign="middle" align="center">P value (vs. initial)</th>
<th valign="middle" align="center">Clinical plans</th>
<th valign="middle" align="center">P value (vs. clinical)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="middle" align="center">Chest wall</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">37.02 &#xb1; 11.44</td>
<td valign="middle" align="center">43.33 &#xb1; 9.27</td>
<td valign="middle" align="center">&lt;0.001</td>
<td valign="middle" align="center">42.50 &#xb1; 9.44</td>
<td valign="middle" align="center">&lt;0.001</td>
</tr>
<tr>
<td valign="middle" align="center">Spinal cord</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">8.05 &#xb1; 3.94</td>
<td valign="middle" align="center">9.11 &#xb1; 5.43</td>
<td valign="middle" align="center">0.044</td>
<td valign="middle" align="center">8.06 &#xb1; 5.46</td>
<td valign="middle" align="center">0.885</td>
</tr>
<tr>
<td valign="middle" align="center">Heart</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">9.24 &#xb1; 6.97</td>
<td valign="middle" align="center">9.70 &#xb1; 6.89</td>
<td valign="middle" align="center">0.01</td>
<td valign="middle" align="center">9.91 &#xb1; 8.01</td>
<td valign="middle" align="center">0.031</td>
</tr>
<tr>
<td valign="middle" align="center">Lung (AS)</td>
<td valign="middle" align="center">V<sub>5</sub> (%)</td>
<td valign="middle" align="center">24.51 &#xb1; 9.31</td>
<td valign="middle" align="center">29.45 &#xb1; 9.08</td>
<td valign="middle" align="center">&lt;0.001</td>
<td valign="middle" align="center">25.26 &#xb1; 7.87</td>
<td valign="middle" align="center">0.037</td>
</tr>
<tr>
<td valign="middle" align="center">Esophagus</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">8.21 &#xb1; 3.86</td>
<td valign="middle" align="center">8.37 &#xb1; 3.78</td>
<td valign="middle" align="center">0.683</td>
<td valign="middle" align="center">7.26 &#xb1; 3.77</td>
<td valign="middle" align="center">0.064</td>
</tr>
<tr>
<td valign="middle" align="center">Bronchus</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">10.70 &#xb1; 5.64</td>
<td valign="middle" align="center">11.60 &#xb1; 5.61</td>
<td valign="middle" align="center">0.017</td>
<td valign="middle" align="center">10.84 &#xb1; 5.59</td>
<td valign="middle" align="center">0.904</td>
</tr>
<tr>
<td valign="middle" align="center">D2cm_PTV</td>
<td valign="middle" align="center">D<sub>max</sub> (Gy)</td>
<td valign="middle" align="center">22.14 &#xb1; 2.29</td>
<td valign="middle" align="center">26.47 &#xb1; 1.23</td>
<td valign="middle" align="center">&lt;0.001</td>
<td valign="middle" align="center">22.25 &#xb1; 2.18</td>
<td valign="middle" align="center">0.597</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Lung (AS), lung (affected side); D<sub>max</sub>, maximum dose; V<sub>x</sub>, relative volume receiving more than x Gy absolute dose; P value, the first and second P value columns give the results for the final plans versus the initial plans and versus the clinical plans, respectively.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>To better demonstrate the effectiveness of MOAPN, the result of one representative case is shown in <xref ref-type="fig" rid="f5">
<bold>Figures&#xa0;5</bold>
</xref>, <xref ref-type="fig" rid="f6">
<bold>6</bold>
</xref>. In the first 10 adjustment steps, the 0-action and 1-action were frequently selected for the five ring structures, decreasing the corresponding objectives significantly or slightly, respectively. After 15 adjustment steps, the 2-action was selected for each ring structure to keep the objectives unchanged. In addition, the 2-action was selected for Ring2 23 times over all adjustment steps, indicating that the result of the dose prediction module had essentially reached the desired objective of MOAPN. The advantages of MOAPN can also be seen in the plan score over the adjustment steps, as shown in <xref ref-type="fig" rid="f6">
<bold>Figure&#xa0;6</bold>
</xref>. The average plan score increased from 18.3 to 809.1 and converged after 15 adjustment steps.</p>
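<p>The discrete adjustment process described above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the MOAPN implementation: the scaling factors, the meaning of the fourth action, the choice to discard a step that violates target coverage, and the names <monospace>ACTION_FACTORS</monospace>, <monospace>adjust</monospace>, <monospace>choose_action</monospace> and <monospace>target_ok</monospace> are all hypothetical. In practice, the action would come from the trained networks and the coverage check from a TPS optimization.</p>
<preformat><![CDATA[
```python
# Hypothetical scaling factors; the actual step sizes are not restated here.
ACTION_FACTORS = {
    0: 0.90,  # decrease the objective dose significantly (assumed 10%)
    1: 0.97,  # decrease the objective dose slightly (assumed 3%)
    2: 1.00,  # keep the objective unchanged
    3: 1.03,  # assumed fourth action: relax the objective slightly
}
MAX_STEPS = 25  # maximum number of adjustment steps used in the evaluation


def adjust(objectives, choose_action, target_ok, max_steps=MAX_STEPS):
    """Apply one action per ring objective at each step until a stop condition.

    objectives: dict mapping ring name to objective dose (Gy)
    choose_action: callable(ring, dose, step) returning an action index
    target_ok: callable(objectives) returning False once target coverage is lost
    Returns the list of accepted objective sets, starting with the initial one.
    """
    history = [dict(objectives)]
    for step in range(max_steps):
        proposal = {
            ring: dose * ACTION_FACTORS[choose_action(ring, dose, step)]
            for ring, dose in history[-1].items()
        }
        if not target_ok(proposal):
            break  # stop: the target no longer meets the prescription
        history.append(proposal)
    return history


# Toy usage: always tighten Ring1 strongly; coverage is assumed to fail below 30 Gy.
steps = adjust(
    {"Ring1": 40.0},
    choose_action=lambda ring, dose, step: 0,
    target_ok=lambda objs: min(objs.values()) >= 30.0,
)
print(len(steps) - 1, "accepted steps; final objective:", steps[-1])
```
]]></preformat>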
<fig id="f5" position="float">
<label>Figure&#xa0;5</label>
<caption>
<p>The selected action values of Ring1, Ring2, Ring3, Ring4 and Ring5 in the adjustment steps for one representative case.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-13-1124458-g005.tif"/>
</fig>
<fig id="f6" position="float">
<label>Figure&#xa0;6</label>
<caption>
<p>The obtained plan score of one representative case in the MOAPN adjustment steps.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-13-1124458-g006.tif"/>
</fig>
<p>To better illustrate the changes in plan quality, the DVH curves of all structures for the representative case at the 0th, 4th, 8th, 15th and 25th adjustment steps are compared in <xref ref-type="fig" rid="f7">
<bold>Figure&#xa0;7</bold>
</xref>. A clear improvement in the DVH curves can be observed for the five ring structures. As the ring structure objectives were adjusted, the OAR doses decreased to varying degrees. In addition, the DVH curves at the 15th and 25th steps essentially coincided for all structures, indicating that the representative case had converged towards the optimal plan by the 15th adjustment step. The isodose distributions for the representative case at the 0th, 4th, 8th, 12th, 16th and 25th adjustment steps are compared in <xref ref-type="fig" rid="f8">
<bold>Figure&#xa0;8</bold>
</xref>. Before the 12th adjustment step, the isodose lines visibly tightened and the maximum point dose in the transverse slice kept increasing. The isodose comparisons for the other tested cases were similar.</p>
<fig id="f7" position="float">
<label>Figure&#xa0;7</label>
<caption>
<p>DVH curves of the ring structures and OARs for one representative case at the 0th, 4th, 8th, 15th and 25th adjustment steps.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-13-1124458-g007.tif"/>
</fig>
<fig id="f8" position="float">
<label>Figure&#xa0;8</label>
<caption>
<p>Comparison of the transverse isodose distributions at the 0th, 4th, 8th, 12th, 16th and 25th adjustment steps for one representative case.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-13-1124458-g008.tif"/>
</fig>
</sec>
<sec id="s4" sec-type="discussion">
<label>4</label>
<title>Discussion</title>
<p>Based on ESAPI, this study developed an integrated solution for automatic treatment planning. First, a 3D dose prediction module was built to obtain patient-specific initial optimization objectives. With the help of a multi-agent DRL scheme, the MOAPN was developed and trained to operate an optimization engine to generate high-quality plans. The clinical feasibility of automatic optimization of radiotherapy plans based on the Eclipse TPS was demonstrated. The MOAPN generated an action-value policy similar to the physician&#x2019;s iterative operation during the planning optimization. With this integrated solution, a more efficient overall workflow of treatment planning was achieved.</p>
<p>Numerous studies have reported RL-based automatic planning for different tumor sites. Hrinivich et&#xa0;al. proposed an RL VMAT algorithm in an in-house-developed TPS, which can learn a machine control policy from the geometry of previous cases (<xref ref-type="bibr" rid="B17">17</xref>). This policy was used for new cases to rapidly optimize treatment plans and generate sequences of deliverable machine parameters without adjusting optimization objectives. Shen et&#xa0;al. constructed a hierarchical virtual treatment planner network consisting of structure-net, parameter-net and action-net (<xref ref-type="bibr" rid="B34">34</xref>). This network selected the structure and parameter to adjust and determined the specific adjustment sequence, similar to the behavior of a planner. In contrast to these in-house automatic planning strategies, our approach used the optimization engine and algorithms of a commercial TPS to adjust the optimization parameters and generate the dose distribution. To our knowledge, this is the first RL implementation for automatic IMRT planning <italic>via</italic> ESAPI. The main limitation is that the Eclipse optimization algorithm is treated as a black box in the iteration process. Hrinivich&#x2019;s approach used a uniform objective map, which cannot optimally reflect the variation between cases. Once the objective values were met, no further improvement was sought. Based on the dose prediction result, our approach further searched for patient-specific optimization parameters, which helped achieve better plan quality. In addition, Shen&#x2019;s hierarchical DRL network is well suited to the parameter decision-making process when the number of optimization parameters is sufficiently large. We aim to further enhance the hierarchical framework of MOAPN to improve its application to complex clinical situations.</p>
<p>The anatomical geometry of the tumor and body varies from case to case, and this difference can affect the final dose distribution in various ways. Unreasonable objectives enlarge the parameter space to be explored, resulting in a time-consuming trial-and-error optimization process. Therefore, it is necessary to obtain a set of patient-specific optimization parameters for each case (<xref ref-type="bibr" rid="B5">5</xref>). In our study, the dose prediction module comprised two steps: predicting the 3D dose distribution with a trained model and generating the two-dimensional dose-volume parameters. However, a major limitation of this KBP approach is that the results depend strongly on the training database used. Higher performance of the dose prediction model can be achieved with a sufficiently large, high-quality planning database. If the ground truth doses are suboptimal, the predicted doses will also be suboptimal. Moreover, the dose prediction model needs to consider the unique clinical characteristics of each case, including the PTV size, which varies considerably, and the spatial complexity of the neighboring anatomy. In addition, the KBP approach is region-specific, as it is typically used with independent databases and models in different treatment centers to improve prediction accuracy. This limits the universal applicability of effective planning models across regions. In our study, the integrated solution combines the respective advantages of the dose prediction module and the DRL method, which removes the need for optimal prediction results. The 3D dose prediction module is used to obtain the patient-specific initial optimization objectives. With this module, uniform optimization objectives are not necessary, which reduces the parameter search space and saves computational resources. When the predicted results are ideal, MOAPN can complete the planning optimization process in a few iterations. 
Even with an inaccurate predicted dose, the action-value policy formed by MOAPN can adjust the objectives and gradually generate a satisfactory plan. Therefore, this integrated solution has wide application potential across regions and treatment centers. In addition, the patient-specific initial optimization objectives not only better reflect the variation between cases, but also contribute to faster convergence of MOAPN during training and improve overall work efficiency.</p>
<p>This study addressed automatic treatment planning for lung SBRT, a relatively simple planning problem with a moderate number of OARs. Five dose-limiting ring structures were generated using a prewritten script and used to control the dose gradient outside the target to an acceptable level without adjusting the dose constraints of all OARs. In <xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>, the experimental result shows that when the objectives of ring structures are adjusted, the dose of OARs can be controlled to clinically acceptable levels. Compared with the single-agent model, the MOAPN can provide independent training process for each objective and simultaneously adjust multiple objectives based on the current state. Thus, this method could improve the work efficiency and reduce the variation of plan quality in clinical application. As this method is established using the ESAPI module, it can be readily integrated into any Eclipse TPS (v. &#x2265; 15.6) as a plug-in application. Script-based automatic approaches have been proven to have great potential for reducing workloads in clinical radiotherapy (<xref ref-type="bibr" rid="B35">35</xref>). In addition, it is very flexible and can be modified for various prescriptions, tumor sites (e.g. rectum and head-and-neck), and OARs. In a follow-up study, we aim to develop an automatic planning assistance module for TPS using ESAPI. Planners can directly use the automatic planning design program by inputting the relevant parameters in the prompt box, without fully understanding the code architecture of this method, providing more convenience for clinical work.</p>
<p>However, the current study has several limitations. First, to demonstrate the feasibility of the proposed method, we constructed a simple plan quality scoring system and reward function. As these may not fully reflect the clinical criteria for plan quality evaluation, more clinical criteria need to be added to the reward function. In addition, this study demonstrated the feasibility of the MOAPN approach in only a few cases. A larger number of cases is required to find and close loopholes in its clinical implementation and to improve its robustness in handling clinical cases. The automatic plans generated by MOAPN may also not be judged clinically acceptable by physicians. Second, MOAPN had only four action options for adjusting the objectives, which may limit its adjustment ability to some degree. To provide more diverse adjustment steps, we aim to extend MOAPN to continuous action control based on the current state (<xref ref-type="bibr" rid="B36">36</xref>). Third, the input features of MOAPN were based only on the DVH matrix, since the DVH is a concise representation of plan quality. However, it is well known that the DVH cannot capture spatial information such as the location of hot/cold spots, which is critical to physician decisions. We expect that 3D dose information will be used to handle more complicated tasks in the future. Fourth, although this study is a new attempt at automatic planning, the current application scope of the automatic plans is limited to peripheral NSCLC cases. Building on the original model, we plan to introduce data from more tumor sites for training to obtain a more general model. Nevertheless, it must be acknowledged that these improvements will inevitably lead to a larger MOAPN architecture and a higher computational burden, for example when training with 3D dose information, adding more adjustable objectives, and handling more complex plans. 
We need to upgrade hardware devices and software to improve the efficiency of algorithm execution. Although our method shows good performance and great potential in the field of automatic planning, it does not completely replace manual intervention, especially for some complex cases. If the planners are not satisfied with the results of the automatic planning, the obtained plan can be an intermediate step for further manual intervention, which can accelerate the trial-and-error process.</p>
</sec>
<sec id="s5" sec-type="conclusion">
<label>5</label>
<title>Conclusion</title>
<p>With the help of ESAPI, we proposed an effective and efficient integrated solution for automatic treatment planning. It includes a dose prediction module for obtaining patient-specific initial optimization objectives. We demonstrated that the trained MOAPN can mimic the operations of physicians during optimization and adjust multiple objectives to obtain a high-quality plan in the Eclipse TPS. The quality of the automatic plans created by MOAPN improves progressively over the adjustment steps and is close to that of the clinical plans. Moreover, this integrated solution helps improve the efficiency of the overall planning workflow and reduce the variation of plan quality in clinical practice. In conclusion, the proposed integrated solution is a promising, practical and effective approach for automatic planning in a commercial TPS.</p>
</sec>
<sec id="s6" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary Material</bold>
</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7" sec-type="author-contributions">
<title>Author contributions</title>
<p>HW is the first author of this paper and contributed to every part of the research, including collecting the clinical data sets, designing the methodology, writing the scripts and writing the manuscript. XB was responsible for the dose prediction module. YW and YL provided clinical assistance and manual planning. HW and XB reviewed the data analysis results. XB and BW reviewed the manuscript. HW was responsible for the revision and improvement of the manuscript.</p>
</sec>
</body>
<back>
<sec id="s8" sec-type="funding-information">
<title>Funding</title>
<p>This work was supported by National Natural Science Foundation of China (12005190).</p>
</sec>
<ack>
<title>Acknowledgments</title>
<p>The authors thank all physicists at Zhejiang Cancer Hospital for their assistance. The authors thank Varian Medical Systems for providing an open research environment for all researchers through their treatment planning system and declare that this company was not involved in the study design, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication. In addition, the authors thank the anonymous reviewers and editors for their suggestions, which greatly improved the quality of this paper.</p>
</ack>
<sec id="s9" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s10" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s11" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fonc.2023.1124458/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fonc.2023.1124458/full#supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet_1.xlsx" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet">
<label>Data sheet&#xa0;1</label>
<caption>
<p>This file records the dosimetric parameters of all tested cases for the MOAPN final plans, the clinical plans, and the MOAPN initial plans.</p>
</caption>
</supplementary-material>
<supplementary-material xlink:href="DataSheet_2.xlsx" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"><label>Data sheet&#xa0;2</label>
<caption>
<p>This file records how the actual dose of the six ring structures (Ring1, Ring2, Ring3, Ring4, Ring5, D2cm) changes over the adjustment steps.</p>
</caption>
</supplementary-material>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<collab>Intensity Modulated Radiation Therapy Collaborative Working Group</collab>
</person-group>. <article-title>Intensity-modulated radiotherapy: Current status and issues of interest</article-title>. <source>Int J Radiat Oncol Biol Phys</source> (<year>2001</year>) <volume>51</volume>(<issue>4</issue>):<fpage>880</fpage>&#x2013;<lpage>914</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/s0360-3016(01)01749-7</pub-id>
</citation>
</ref>
<ref id="B2">
<label>2</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Webb</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>The physical basis of IMRT and inverse planning</article-title>. <source>Br J Radiol</source> (<year>2003</year>) <volume>76</volume>(<issue>910</issue>):<page-range>678&#x2013;89</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1259/bjr/65676879</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nelms</surname> <given-names>BE</given-names>
</name>
<name>
<surname>Robinson</surname> <given-names>G</given-names>
</name>
<name>
<surname>Markham</surname> <given-names>J</given-names>
</name>
<name>
<surname>Velasco</surname> <given-names>K</given-names>
</name>
<name>
<surname>Boyd</surname> <given-names>S</given-names>
</name>
<name>
<surname>Narayan</surname> <given-names>S</given-names>
</name>
<etal/>
</person-group>. <article-title>Variation in external beam treatment plan quality: An inter-institutional study of planners and planning systems</article-title>. <source>Pract Radiat Oncol</source> (<year>2012</year>) <volume>2</volume>(<issue>4</issue>):<fpage>296</fpage>&#x2013;<lpage>305</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.prro.2011.11.012</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hussein</surname> <given-names>M</given-names>
</name>
<name>
<surname>Heijmen</surname> <given-names>BJM</given-names>
</name>
<name>
<surname>Verellen</surname> <given-names>D</given-names>
</name>
<name>
<surname>Nisbet</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Automation in intensity modulated radiotherapy treatment planning-a review of recent innovations</article-title>. <source>Br J Radiol</source> (<year>2018</year>) <volume>91</volume>(<issue>1092</issue>):<elocation-id>20180270</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1259/bjr.20180270</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>H</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>R</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Yao</surname> <given-names>K</given-names>
</name>
<name>
<surname>Yue</surname> <given-names>H</given-names>
</name>
<etal/>
</person-group>. <article-title>Tree-based exploration of the optimization objectives for automatic cervical cancer IMRT treatment planning</article-title>. <source>Br J Radiol</source> (<year>2021</year>) <volume>94</volume>(<issue>1123</issue>):<elocation-id>20210214</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1259/bjr.20210214</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xing</surname> <given-names>L</given-names>
</name>
<name>
<surname>Li</surname> <given-names>JG</given-names>
</name>
<name>
<surname>Donaldson</surname> <given-names>S</given-names>
</name>
<name>
<surname>Le</surname> <given-names>QT</given-names>
</name>
<name>
<surname>Boyer</surname> <given-names>AL</given-names>
</name>
</person-group>. <article-title>Optimization of importance factors in inverse planning</article-title>. <source>Phys Med Biol</source> (<year>1999</year>) <volume>44</volume>(<issue>10</issue>):<page-range>2525&#x2013;36</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1088/0031-9155/44/10/311</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>X</given-names>
</name>
<name>
<surname>Li</surname> <given-names>X</given-names>
</name>
<name>
<surname>Quan</surname> <given-names>EM</given-names>
</name>
<name>
<surname>Pan</surname> <given-names>X</given-names>
</name>
<name>
<surname>Li</surname> <given-names>Y</given-names>
</name>
</person-group>. <article-title>A methodology for automatic intensity-modulated radiation treatment planning for lung cancer</article-title>. <source>Phys Med Biol</source> (<year>2011</year>) <volume>56</volume>(<issue>13</issue>):<page-range>3873&#x2013;93</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1088/0031-9155/56/13/009</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xhaferllari</surname> <given-names>I</given-names>
</name>
<name>
<surname>Wong</surname> <given-names>E</given-names>
</name>
<name>
<surname>Bzdusek</surname> <given-names>K</given-names>
</name>
<name>
<surname>Lock</surname> <given-names>M</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>J</given-names>
</name>
</person-group>. <article-title>Automated IMRT planning with regional optimization using planning scripts</article-title>. <source>J Appl Clin Med Phys</source> (<year>2013</year>) <volume>14</volume>(<issue>1</issue>):<elocation-id>4052</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1120/jacmp.v14i1.4052</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Shao</surname> <given-names>K</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>M</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Shan</surname> <given-names>G</given-names>
</name>
</person-group>. <article-title>Automatic planning for nasopharyngeal carcinoma based on progressive optimization in RayStation treatment planning system</article-title>. <source>Technol Cancer Res Treat</source> (<year>2020</year>) <volume>19</volume>(<issue>3</issue>):<elocation-id>1533033820915710</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1177/1533033820915710</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tol</surname> <given-names>JP</given-names>
</name>
<name>
<surname>Delaney</surname> <given-names>AR</given-names>
</name>
<name>
<surname>Dahele</surname> <given-names>M</given-names>
</name>
<name>
<surname>Slotman</surname> <given-names>BJ</given-names>
</name>
<name>
<surname>Verbakel</surname> <given-names>WF</given-names>
</name>
</person-group>. <article-title>Evaluation of a knowledge-based planning solution for head and neck cancer</article-title>. <source>Int J Radiat Oncol Biol Phys</source> (<year>2015</year>) <volume>91</volume>(<issue>3</issue>):<page-range>612&#x2013;20</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.ijrobp.2014.11.014</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fogliata</surname> <given-names>A</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>PM</given-names>
</name>
<name>
<surname>Belosi</surname> <given-names>F</given-names>
</name>
<name>
<surname>Clivio</surname> <given-names>A</given-names>
</name>
<name>
<surname>Nicolini</surname> <given-names>G</given-names>
</name>
<name>
<surname>Vanetti</surname> <given-names>E</given-names>
</name>
<etal/>
</person-group>. <article-title>Assessment of a model based optimization engine for volumetric modulated arc therapy for patients with advanced hepatocellular cancer</article-title>. <source>Radiat Oncol</source> (<year>2014</year>) <volume>9</volume>:<elocation-id>236</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s13014-014-0236-0</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>C</given-names>
</name>
<name>
<surname>Zhu</surname> <given-names>X</given-names>
</name>
<name>
<surname>Hong</surname> <given-names>JC</given-names>
</name>
<name>
<surname>Zheng</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Artificial intelligence in radiotherapy treatment planning: Present and future</article-title>. <source>Technol Cancer Res Treat</source> (<year>2019</year>) <volume>18</volume>:<elocation-id>1533033819873922</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1177/1533033819873922</pub-id>
</citation>
</ref>
<ref id="B13">
<label>13</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nguyen</surname> <given-names>D</given-names>
</name>
<name>
<surname>Long</surname> <given-names>T</given-names>
</name>
<name>
<surname>Jia</surname> <given-names>X</given-names>
</name>
<name>
<surname>Lu</surname> <given-names>W</given-names>
</name>
<name>
<surname>Gu</surname> <given-names>X</given-names>
</name>
<name>
<surname>Iqbal</surname> <given-names>Z</given-names>
</name>
<etal/>
</person-group>. <article-title>A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning</article-title>. <source>Sci Rep</source> (<year>2019</year>) <volume>9</volume>(<issue>1</issue>):<fpage>1076</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41598-018-37741-x</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fan</surname> <given-names>J</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Hu</surname> <given-names>C</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Hu</surname> <given-names>W</given-names>
</name>
</person-group>. <article-title>Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique</article-title>. <source>Med Phys</source> (<year>2019</year>) <volume>46</volume>(<issue>1</issue>):<page-range>370&#x2013;81</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1002/mp.13271</pub-id>
</citation>
</ref>
<ref id="B15">
<label>15</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mnih</surname> <given-names>V</given-names>
</name>
<name>
<surname>Kavukcuoglu</surname> <given-names>K</given-names>
</name>
<name>
<surname>Silver</surname> <given-names>D</given-names>
</name>
<name>
<surname>Rusu</surname> <given-names>AA</given-names>
</name>
<name>
<surname>Veness</surname> <given-names>J</given-names>
</name>
<name>
<surname>Bellemare</surname> <given-names>MG</given-names>
</name>
<etal/>
</person-group>. <article-title>Human-level control through deep reinforcement learning</article-title>. <source>Nature</source> (<year>2015</year>) <volume>518</volume>(<issue>7540</issue>):<page-range>529&#x2013;33</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/nature14236</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Silver</surname> <given-names>D</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>A</given-names>
</name>
<name>
<surname>Maddison</surname> <given-names>CJ</given-names>
</name>
<name>
<surname>Guez</surname> <given-names>A</given-names>
</name>
<name>
<surname>Sifre</surname> <given-names>L</given-names>
</name>
<name>
<surname>van den Driessche</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>Mastering the game of Go with deep neural networks and tree search</article-title>. <source>Nature</source> (<year>2016</year>) <volume>529</volume>(<issue>7587</issue>):<page-range>484&#x2013;9</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/nature16961</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hrinivich</surname> <given-names>WT</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>J</given-names>
</name>
</person-group>. <article-title>Artificial intelligence-based radiotherapy machine parameter optimization using reinforcement learning</article-title>. <source>Med Phys</source> (<year>2020</year>) <volume>47</volume>(<issue>12</issue>):<page-range>6140&#x2013;50</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1002/mp.14544</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname> <given-names>C</given-names>
</name>
<name>
<surname>Gonzalez</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Klages</surname> <given-names>P</given-names>
</name>
<name>
<surname>Qin</surname> <given-names>N</given-names>
</name>
<name>
<surname>Jung</surname> <given-names>H</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>L</given-names>
</name>
<etal/>
</person-group>. <article-title>Intelligent inverse treatment planning <italic>via</italic> deep reinforcement learning, a proof-of-principle study in high dose-rate brachytherapy for cervical cancer</article-title>. <source>Phys Med Biol</source> (<year>2019</year>) <volume>64</volume>(<issue>11</issue>):<fpage>115013</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1088/1361-6560/ab18bf</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pu</surname> <given-names>G</given-names>
</name>
<name>
<surname>Jiang</surname> <given-names>S</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Hu</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>Z</given-names>
</name>
<etal/>
</person-group>. <article-title>Deep reinforcement learning for treatment planning in high-dose-rate cervical brachytherapy</article-title>. <source>Phys Med</source> (<year>2022</year>) <volume>94</volume>:<fpage>1</fpage>&#x2013;<lpage>7</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.ejmp.2021.12.009</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname> <given-names>C</given-names>
</name>
<name>
<surname>Nguyen</surname> <given-names>D</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>L</given-names>
</name>
<name>
<surname>Gonzalez</surname> <given-names>Y</given-names>
</name>
<name>
<surname>McBeth</surname> <given-names>R</given-names>
</name>
<name>
<surname>Qin</surname> <given-names>N</given-names>
</name>
<etal/>
</person-group>. <article-title>Operating a treatment planning system using a deep-reinforcement learning-based virtual treatment planner for prostate cancer intensity-modulated radiation therapy treatment planning</article-title>. <source>Med Phys</source> (<year>2020</year>) <volume>47</volume>(<issue>6</issue>):<page-range>2329&#x2013;36</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1002/mp.14114</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duan</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Gan</surname> <given-names>W</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>H</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>H</given-names>
</name>
<name>
<surname>Gu</surname> <given-names>H</given-names>
</name>
<name>
<surname>Shao</surname> <given-names>Y</given-names>
</name>
<etal/>
</person-group>. <article-title>On the optimal number of dose-limiting shells in the SBRT auto-planning design for peripheral lung cancer</article-title>. <source>J Appl Clin Med Phys</source> (<year>2020</year>) <volume>21</volume>(<issue>9</issue>):<page-range>134&#x2013;42</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1002/acm2.12983</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Videtic</surname> <given-names>GM</given-names>
</name>
<name>
<surname>Hu</surname> <given-names>C</given-names>
</name>
<name>
<surname>Singh</surname> <given-names>AK</given-names>
</name>
<name>
<surname>Chang</surname> <given-names>JY</given-names>
</name>
<name>
<surname>Parker</surname> <given-names>W</given-names>
</name>
<name>
<surname>Olivier</surname> <given-names>K</given-names>
</name>
<etal/>
</person-group>. <article-title>Radiation therapy oncology group (RTOG) protocol 0915: A randomized phase 2 study comparing 2 stereotactic body radiation therapy (SBRT) schedules for medically inoperable patients with stage I peripheral non-small cell lung cancer</article-title>. <source>Int J Radiat Oncol Biol Phys</source> (<year>2013</year>) <volume>87</volume>(<issue>2</issue>):<fpage>S3</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.ijrobp.2013.06.016</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Binny</surname> <given-names>D</given-names>
</name>
<name>
<surname>Kairn</surname> <given-names>T</given-names>
</name>
<name>
<surname>Lancaster</surname> <given-names>CM</given-names>
</name>
<name>
<surname>Trapp</surname> <given-names>JV</given-names>
</name>
<name>
<surname>Crowe</surname> <given-names>SB</given-names>
</name>
</person-group>. <article-title>Photon optimizer (PO) vs progressive resolution optimizer (PRO): a conformality- and complexity-based comparison for intensity-modulated arc therapy plans</article-title>. <source>Med Dosim</source> (<year>2018</year>) <volume>43</volume>(<issue>3</issue>):<page-range>267&#x2013;75</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.meddos.2017.10.003</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Niemierko</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Reporting and analyzing dose distributions: a concept of equivalent uniform dose</article-title>. <source>Med Phys</source> (<year>1997</year>) <volume>24</volume>(<issue>1</issue>):<page-range>103&#x2013;10</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1118/1.598063</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname> <given-names>Q</given-names>
</name>
<name>
<surname>Djajaputra</surname> <given-names>D</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Zhou</surname> <given-names>J</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>HH</given-names>
</name>
<name>
<surname>Mohan</surname> <given-names>R</given-names>
</name>
</person-group>. <article-title>Intensity-modulated radiotherapy optimization with gEUD-guided dose-volume objectives</article-title>. <source>Phys Med Biol</source> (<year>2003</year>) <volume>48</volume>(<issue>3</issue>):<page-range>279&#x2013;91</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1088/0031-9155/48/3/301</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname> <given-names>Q</given-names>
</name>
<name>
<surname>Mohan</surname> <given-names>R</given-names>
</name>
<name>
<surname>Niemierko</surname> <given-names>A</given-names>
</name>
<name>
<surname>Schmidt-Ullrich</surname> <given-names>R</given-names>
</name>
</person-group>. <article-title>Optimization of intensity-modulated radiotherapy plans based on the equivalent uniform dose</article-title>. <source>Int J Radiat Oncol Biol Phys</source> (<year>2002</year>) <volume>52</volume>(<issue>1</issue>):<page-range>224&#x2013;35</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/s0360-3016(01)02585-8</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bai</surname> <given-names>X</given-names>
</name>
<name>
<surname>Shan</surname> <given-names>G</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>M</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>B</given-names>
</name>
</person-group>. <article-title>Approach and assessment of automated stereotactic radiotherapy planning for early stage non-small-cell lung cancer</article-title>. <source>BioMed Eng Online</source> (<year>2019</year>) <volume>18</volume>(<issue>1</issue>):<fpage>101</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s12938-019-0721-7</pub-id>
</citation>
</ref>
<ref id="B28">
<label>28</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bai</surname> <given-names>X</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>B</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>S</given-names>
</name>
<name>
<surname>Xiang</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Hou</surname> <given-names>Q</given-names>
</name>
</person-group>. <article-title>Sharp loss: a new loss function for radiotherapy dose prediction based on fully convolutional networks</article-title>. <source>BioMed Eng Online</source> (<year>2021</year>) <volume>20</volume>(<issue>1</issue>):<fpage>101</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s12938-021-00937-w</pub-id>
</citation>
</ref>
<ref id="B29">
<label>29</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Watkins</surname> <given-names>CJ</given-names>
</name>
<name>
<surname>Dayan</surname> <given-names>P</given-names>
</name>
</person-group>. <article-title>Q-learning</article-title>. <source>Mach Learn</source> (<year>1992</year>) <volume>8</volume>:<page-range>279&#x2013;92</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/BF00992698</pub-id>
</citation>
</ref>
<ref id="B30">
<label>30</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sutton</surname> <given-names>R</given-names>
</name>
<name>
<surname>Barto</surname> <given-names>A</given-names>
</name>
</person-group>. <source>Reinforcement Learning: An Introduction</source>. <publisher-name>MIT Press</publisher-name> (<year>1998</year>).</citation>
</ref>
<ref id="B31">
<label>31</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Hasselt</surname> <given-names>H</given-names>
</name>
<name>
<surname>Guez</surname> <given-names>A</given-names>
</name>
<name>
<surname>Silver</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Deep reinforcement learning with double Q-learning</article-title>. <source>Comput Sci</source> (<year>2015</year>). doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.1509.06461</pub-id>
</citation>
</ref>
<ref id="B32">
<label>32</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Diez</surname> <given-names>P</given-names>
</name>
<name>
<surname>Hanna</surname> <given-names>GG</given-names>
</name>
<name>
<surname>Aitken</surname> <given-names>KL</given-names>
</name>
<name>
<surname>van As</surname> <given-names>N</given-names>
</name>
<name>
<surname>Carver</surname> <given-names>A</given-names>
</name>
<name>
<surname>Colaco</surname> <given-names>RJ</given-names>
</name>
<etal/>
</person-group>. <article-title>UK 2022 consensus on normal tissue dose-volume constraints for oligometastatic, primary lung and hepatocellular carcinoma stereotactic ablative radiotherapy</article-title>. <source>Clin Oncol (R Coll Radiol)</source> (<year>2022</year>) <volume>34</volume>(<issue>5</issue>):<fpage>288</fpage>&#x2013;<lpage>300</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.clon.2022.02.010</pub-id>
</citation>
</ref>
<ref id="B33">
<label>33</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schaul</surname> <given-names>T</given-names>
</name>
<name>
<surname>Quan</surname> <given-names>J</given-names>
</name>
<name>
<surname>Antonoglou</surname> <given-names>I</given-names>
</name>
<name>
<surname>Silver</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Prioritized experience replay</article-title>. <source>Comput Sci</source> (<year>2015</year>). doi:&#xa0;<pub-id pub-id-type="doi">10.48550/arXiv.1511.05952</pub-id>
</citation>
</ref>
<ref id="B34">
<label>34</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname> <given-names>C</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>L</given-names>
</name>
<name>
<surname>Jia</surname> <given-names>X</given-names>
</name>
</person-group>. <article-title>A hierarchical deep reinforcement learning framework for intelligent automatic treatment planning of prostate cancer intensity modulated radiation therapy</article-title>. <source>Phys Med Biol</source> (<year>2021</year>) <volume>66</volume>(<issue>13</issue>):<fpage>134002</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1088/1361-6560/ac09a2</pub-id>
</citation>
</ref>
<ref id="B35">
<label>35</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xia</surname> <given-names>X</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Li</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Peng</surname> <given-names>J</given-names>
</name>
<name>
<surname>Fan</surname> <given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>J</given-names>
</name>
<etal/>
</person-group>. <article-title>An artificial intelligence-based full-process solution for radiotherapy: A proof of concept study on rectal cancer</article-title>. <source>Front Oncol</source> (<year>2021</year>) <volume>10</volume>:<elocation-id>616721</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.3389/fonc.2020.616721</pub-id>
</citation>
</ref>
<ref id="B36">
<label>36</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Merrick</surname> <given-names>K</given-names>
</name>
<name>
<surname>Jin</surname> <given-names>L</given-names>
</name>
<name>
<surname>Abbass</surname> <given-names>HA</given-names>
</name>
</person-group>. <article-title>Hierarchical deep reinforcement learning for continuous action control</article-title>. <source>IEEE Trans Neural Netw Learn Syst</source> (<year>2018</year>) <volume>29</volume>(<issue>11</issue>):<page-range>5174&#x2013;84</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/TNNLS.2018.2805379</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>