<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="brief-report" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Energy Res.</journal-id>
<journal-title>Frontiers in Energy Research</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Energy Res.</abbrev-journal-title>
<issn pub-type="epub">2296-598X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">760525</article-id>
<article-id pub-id-type="doi">10.3389/fenrg.2021.760525</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Energy Research</subject>
<subj-group>
<subject>Brief Research Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Intelligent Frequency Control Strategy Based on Reinforcement Learning of Multi-Objective Collaborative Reward Function</article-title>
<alt-title alt-title-type="left-running-head">Zhang et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Intelligent Strategy for Frequency Control</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Lei</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1465582/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Xie</surname>
<given-names>Yumiao</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1432073/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Ye</surname>
<given-names>Jing</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Xue</surname>
<given-names>Tianliang</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cheng</surname>
<given-names>Jiangzhou</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Zhenhua</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Tao</given-names>
</name>
</contrib>
</contrib-group>
<aff>College of Electrical Engineering and New Energy, China Three Gorges University, <addr-line>Yichang</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1378126/overview">Zhenhao Tang</ext-link>, Northeast Electric Power University, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1418370/overview">Zhu Zhang</ext-link>, Hefei University of Technology, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1431864/overview">Yuanchao Hu</ext-link>, Shandong University of Technology, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Jing Ye, <email>x1620730050@163.com</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Smart Grids, a section of the journal Frontiers in Energy Research</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>30</day>
<month>09</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>9</volume>
<elocation-id>760525</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>09</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Zhang, Xie, Ye, Xue, Cheng, Li and Zhang.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Zhang, Xie, Ye, Xue, Cheng, Li and Zhang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Large-scale integration of wind power into the grid poses a serious challenge to power system frequency control. If the Control Performance Standard (CPS) index is the only criterion used to evaluate frequency quality, short-term concentrated frequency limit violations can easily occur, which degrades the effect of intelligent Automatic Generation Control (AGC) on frequency quality. To solve this problem, a multi-objective collaborative reward function is constructed by introducing a collaborative evaluation mechanism with multiple evaluation indexes. In addition, a Negotiated W-Learning strategy is proposed to globally optimize the solution of the objective function across multiple dimensions, which avoids the poor learning efficiency of the traditional greedy strategy. Simulation of an AGC control model of a standard two-area interconnected power grid shows that the proposed intelligent strategy effectively improves frequency control performance and the frequency quality of the system over the whole time&#x20;scale.</p>
</abstract>
<kwd-group>
<kwd>wind power grid-connected</kwd>
<kwd>intelligent frequency control strategy</kwd>
<kwd>multi-dimensional frequency control performance standard</kwd>
<kwd>Negotiated W-Learning algorithm</kwd>
<kwd>global optimization</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Automatic Generation Control (AGC) is an important means of balancing active power supply and load demand in the power system, and the quality of the frequency control strategy is an important factor in AGC performance (<xref ref-type="bibr" rid="B2">Alhelou et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B10">Shen et&#x20;al., 2021a</xref>; <xref ref-type="bibr" rid="B12">Shen and Raksincharoensak, 2021a</xref>). However, the control strategies applied in engineering, such as the threshold-zone AGC control strategy that combines the proportional, integral, and Control Performance Standard (CPS) control components of the regional control deviation (<xref ref-type="bibr" rid="B3">Arya and Kumar, 2017</xref>; <xref ref-type="bibr" rid="B14">Shen et&#x20;al., 2020a</xref>; <xref ref-type="bibr" rid="B21">Xi et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B13">Shen and Raksincharoensak, 2021b</xref>), can no longer cope with the increasingly complex frequency control of interconnected power grids (<xref ref-type="bibr" rid="B16">Shen et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B28">Zhang and Luo, 2018</xref>).</p>
<p>In recent years, intelligent frequency control strategies based on reinforcement learning have received considerable attention (<xref ref-type="bibr" rid="B27">Yu et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B1">Abouheaf et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B20">Xi et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B15">Shen et&#x20;al., 2020b</xref>; <xref ref-type="bibr" rid="B8">Liu et&#x20;al., 2020</xref>), because they do not rely on models and do not require precise training samples or prior knowledge of the system (<xref ref-type="bibr" rid="B19">Watkins and Dayan, 1992</xref>; <xref ref-type="bibr" rid="B26">Yang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B6">Li et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B24">Yang et&#x20;al., 2021a</xref>; <xref ref-type="bibr" rid="B11">Shen et&#x20;al., 2021b</xref>).</p>
<p>However, most intelligent control strategies are built on the CPS frequency control performance evaluation standard. The CPS index has low sensitivity for short-term inter-area power support evaluation, and cannot take into account the short-term benefits of frequency control performance (<xref ref-type="bibr" rid="B5">Kumar and Singh, 2019</xref>; <xref ref-type="bibr" rid="B23">Yang et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B29">Zhu et&#x20;al., 2019</xref>). In a system with large-scale wind power grid connection, the ability of each region to comply with CPS indicators is limited. The intelligent AGC control strategy that only considers the CPS control criteria can easily cause short-term concentrated frequency crossings, which seriously affects the control effect of the intelligent AGC control strategy (<xref ref-type="bibr" rid="B17">Wang and James, 2013</xref>; <xref ref-type="bibr" rid="B22">Xie et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B25">Yang et&#x20;al., 2021b</xref>).</p>
<p>In fact, with the development of grid-connected new energy sources and smart grids, the grid frequency control evaluation standard is transitioning from single-scale evaluation to multi-time-scale and multi-dimensional evaluation. The North American Electric Reliability Corporation (NERC) proposed a new frequency performance evaluation index named Balancing Authority ACE Limits (BAAL), which ensures the short-term frequency quality of the system by constraining the mean value of the frequency deviation over any 30&#xa0;min not to exceed the limit. However, an intelligent AGC control strategy that satisfies both the BAAL and CPS indicators is a multi-objective control problem, and, to the authors&#x2019; knowledge, it has not yet been studied in the literature.</p>
<p>In response to the above problems, this paper proposes an intelligent frequency control strategy based on the collaborative evaluation of multi-dimensional control standards. The strategy constructs a collaborative reward function that considers both the CPS index and the BAAL index and introduces it into a multi-objective reinforcement learning algorithm. The Negotiated W-Learning strategy is then used to learn over the action space of the agent, which effectively solves the problem that the agent cannot fully explore the action space (<xref ref-type="bibr" rid="B9">Nathan and Ballard, 2003</xref>; <xref ref-type="bibr" rid="B7">Liu et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B18">Wang et&#x20;al., 2019</xref>). Simulation examples show that the proposed intelligent control strategy effectively improves the overall frequency quality of the power system.</p>
</sec>
<sec id="s2">
<title>2 Frequency Control Performance Evaluation Standard of Interconnected Power Grid</title>
<sec id="s2-1">
<title>2.1 CPS1 Frequency Control Performance Evaluation Standard</title>
<p>NERC uses the BAL (BAL-001) disturbance control series of indicators to evaluate the frequency control quality of the interconnected power grid. Among them, the CPS1 (BAL-001-2: R1) indicator is the most widely used in China, as shown in <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>:<disp-formula id="e1">
<mml:math id="m1">
<mml:mi>A</mml:mi>
<mml:mi>V</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>10</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2264;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
<label>(1)</label>
</disp-formula>where &#x394;<italic>F</italic>
<sub>1&#x2009;</sub> <sub>min</sub> and <inline-formula id="inf1">
<mml:math id="m2">
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mspace width="0.3333em" class="nbsp"/>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> are, respectively, the average values of the frequency deviation and the power deviation in the control area within 1&#xa0;min; <italic>B</italic>
<sub>
<italic>m</italic>
</sub> is the frequency bias coefficient of area <italic>m</italic> and represents the frequency adjustment responsibility assigned to that area; <italic>AVG</italic>
<sub>1,<italic>T</italic>
</sub> (&#x22c5;) denotes the average over 12&#xa0;months; and <italic>&#x25b;</italic> is the upper limit on the frequency deviation for area <italic>m</italic>.</p>
<p>Taking the situation that the actual frequency is higher than the planned frequency as an example, expand <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> as follows:<disp-formula id="e2">
<mml:math id="m3">
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msubsup>
<mml:mrow>
<mml:mo>&#x222b;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>tie&#x2009;</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>10</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mi>d</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
<label>(2)</label>
</disp-formula>where: <italic>T</italic> is the entire time period, &#x394;<italic>F</italic>/<italic>&#x25b;</italic> is the frequency deviation contribution of the region itself, &#x394;<italic>P</italic>
<sub>tie</sub>/&#x2212; 10<italic>B</italic>
<sub>
<italic>m</italic>
</sub>
<italic>&#x25b;</italic> is the frequency contribution of other regions to this region, and &#x394;<italic>P</italic>
<sub>tie</sub>/&#x2212; 10<italic>B</italic>
<sub>
<italic>m</italic>
</sub>
<italic>&#x25b;</italic> &#x2b; &#x394;<italic>F</italic>/<italic>&#x25b;</italic> is the comprehensive frequency deviation contribution. For the convenience of analysis, define <inline-formula id="inf2">
<mml:math id="m4">
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>F</mml:mi>
<mml:mo>/</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>tie&#x2009;</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>10</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>&#x3b5;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>F</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> as the comprehensive frequency deviation factor, denoted by <italic>&#x3c8;</italic>.</p>
<p>The CPS1 indicator statistically evaluates the rolling root mean square of the frequency deviation time series in the evaluation area over the period <italic>T</italic>. When <italic>T</italic> is large enough, the qualification rate of the system frequency deviation exceeds 99.99%. CPS1 is therefore a long-term evaluation index of the frequency quality of interconnected power&#x20;grids.</p>
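<p>As an illustration only, the left-hand side of <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> can be evaluated from paired 1-min averages as in the following Python sketch. The function name, the argument names, and the conversion to a percentage score, CPS1 &#x3d; (2 &#x2212; CF) &#xd7; 100%, are assumptions introduced here for illustration (the conversion follows the usual NERC convention) and are not taken from this paper.</p>
<preformat>
```python
from statistics import fmean


def cps1_percent(ace_1min, delta_f_1min, bias_b, epsilon):
    """Return CPS1 in percent from paired 1-min averages (hypothetical helper).

    ace_1min     : 1-min average ACE values (MW)
    delta_f_1min : 1-min average frequency deviations (Hz)
    bias_b       : frequency bias B_m (assumed in MW/0.1 Hz, negative by convention)
    epsilon      : frequency deviation limit of Eq. 1 (Hz)
    """
    if len(ace_1min) != len(delta_f_1min) or not ace_1min:
        raise ValueError("need two non-empty series of equal length")
    # Left-hand side of Eq. 1: AVG[(ACE / (-10 B)) * dF]
    control_term = fmean(
        ace / (-10.0 * bias_b) * df for ace, df in zip(ace_1min, delta_f_1min)
    )
    compliance_factor = control_term / epsilon ** 2
    # Assumption: NERC-style score, CPS1 = (2 - CF) * 100 %
    return (2.0 - compliance_factor) * 100.0
```
</preformat>
<p>Under these assumptions, a series with zero ACE gives a score of 200%, and a series that sits exactly on the limit of <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> gives 100%.</p>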
</sec>
<sec id="s2-2">
<title>2.2 BAAL Frequency Control Performance Evaluation Standard</title>
<p>NERC proposed the BAAL (BAL-001-2: R2) evaluation index in 2013 and began to implement it in 2016, as shown in <xref ref-type="disp-formula" rid="e3">Eqs 3</xref>, <xref ref-type="disp-formula" rid="e4">4</xref>:<disp-formula id="e3">
<mml:math id="m5">
<mml:mi>T</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2265;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>10</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>L</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
<label>(3)</label>
</disp-formula>
<disp-formula id="e4">
<mml:math id="m6">
<mml:mi>T</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>10</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>T</mml:mi>
<mml:mi>L</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>min</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
<label>(4)</label>
</disp-formula>where <italic>F</italic>
<sub>
<italic>A</italic>
</sub> is the actual frequency value; <italic>F</italic>
<sub>
<italic>S</italic>
</sub> is the planned frequency value; <italic>F</italic>
<sub>FTL-high</sub>/<italic>F</italic>
<sub>FTL-low</sub> are the high/low frequency trigger limits; <italic>T</italic><sub><italic>v</italic></sub> is the specified limit on the allowable continuous over-limit time; and <italic>T</italic> [&#x22c5;] is the continuous over-limit&#x20;time.</p>
<p>Taking the situation that the actual frequency is higher than the planned frequency as an example, <xref ref-type="disp-formula" rid="e3">Eq. 3</xref> can be transformed into the following form in the same way:<disp-formula id="e5">
<mml:math id="m7">
<mml:mi>T</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:msubsup>
<mml:mrow>
<mml:mo>&#x222b;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2217;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>tie</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>10</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mi>d</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>v</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
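<p>The high-frequency check of <xref ref-type="disp-formula" rid="e3">Eq. 3</xref> can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names and argument names are hypothetical, the frequency bias is assumed to be negative and expressed in MW/0.1&#xa0;Hz, and only minutes in which the actual frequency is above the planned frequency are checked.</p>
<preformat>
```python
def baal_high(f_actual, f_sched, ftl_high, bias_b):
    """Right-hand side of the inequality inside T[.] in Eq. 3 (hypothetical helper)."""
    deviation = f_actual - f_sched
    if not deviation > 0.0:
        raise ValueError("BAAL-high is only defined when F_A is above F_S")
    return -10.0 * bias_b * (ftl_high - f_sched) ** 2 / deviation


def longest_high_violation(ace_1min, f_actual_1min, f_sched, ftl_high, bias_b):
    """Longest run of consecutive minutes with ACE at or above BAAL-high."""
    longest = run = 0
    for ace, f_act in zip(ace_1min, f_actual_1min):
        violated = False
        if f_act > f_sched:  # Eq. 3 applies to high-frequency minutes only
            violated = ace >= baal_high(f_act, f_sched, ftl_high, bias_b)
        run = run + 1 if violated else 0
        longest = max(longest, run)
    return longest
```
</preformat>
<p>With these helpers, the requirement of <xref ref-type="disp-formula" rid="e3">Eq. 3</xref> corresponds to checking that the value returned by <monospace>longest_high_violation</monospace> does not exceed the allowable continuous time limit, expressed in minutes.</p>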
</sec>
<sec id="s2-3">
<title>2.3 Performance Analysis Under the Joint Control of BAAL Standard and CPS1 Standard</title>
<p>To further study the features of the two indices, <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> shows the curve of the comprehensive frequency deviation factor <italic>&#x3c8;</italic> over time under the different performance indicators.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>The distribution curve of the comprehensive frequency deviation factor over&#x20;time.</p>
</caption>
<graphic xlink:href="fenrg-09-760525-g001.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>, point A is the critical point at which the frequency crosses the limit. When only CPS1 is considered, the system frequency can cross the limit and still meet the control performance index, but this affects the safe operation of equipment in the system and reduces power quality. If only the BAAL indicator is considered, the system frequency may exhibit a &#x201c;vertical drop&#x201d; and &#x201c;tip oscillation,&#x201d; as shown at point B in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>. In this case, the synchronous generator frequently receives opposite frequency deviation signals within a short period of time, which increases the wear of the unit. When the CPS1 and BAAL indicators are considered together, the frequency reverses under the BAAL constraint after a short-term limit violation.</p>
<p>In summary, if CPS1 and BAAL indicators can be coordinated to constrain the system frequency closely, it can guarantee not only the long-term frequency quality but also the short-term frequency safety.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Intelligent AGC Control Strategy Considering Cooperative Evaluation of Multi-Dimensional Control Standards</title>
<p>Based on the analysis in <xref ref-type="sec" rid="s2-3">Section 2.3</xref>, this paper constructs an AGC control model whose frequency control strategy is based on reinforcement learning with a multi-objective collaborative reward function. As shown in <xref ref-type="fig" rid="F2">Figure&#x20;2A</xref>, the model mainly consists of the system governor, the equivalent generator module, the dynamic model of the system frequency deviation, and the intelligent brain controller. Here, <italic>R</italic>, <italic>T</italic>
<sub>
<italic>g</italic>
</sub>, <italic>T</italic>
<sub>
<italic>t</italic>
</sub>, <italic>M</italic>, and <italic>D</italic> are, respectively, the equivalent unit adjustment coefficient, governor time constant, equivalent generator time constant, equivalent inertia coefficient, and equivalent damping coefficient of the power system in area <italic>m</italic>; &#x394;<italic>P</italic>
<sub>tie</sub> is the exchange power deviation of the tie line in area m, &#x394;<italic>X</italic>
<sub>
<italic>g</italic>
</sub>, &#x394;<italic>P</italic>
<sub>
<italic>g</italic>
</sub>, &#x394;<italic>P</italic>
<sub>
<italic>d</italic>
</sub> are, respectively, the changes in the regulating valve position, the generator output power, and the load disturbance; and &#x394;<italic>P</italic>
<sub>&#x3a3;</sub> is the total adjustment command of the&#x20;unit.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Intelligent AGC control strategy: <bold>(A)</bold> Intelligent AGC control strategy for collaborative evaluation of multi-scale standards, <bold>(B)</bold> The framework of Negotiated W-Learning.</p>
</caption>
<graphic xlink:href="fenrg-09-760525-g002.tif"/>
</fig>
<p>Frequency controller intelligent learning stage: This paper uses a reinforcement learning strategy with a multi-objective collaborative reward function to train the intelligent frequency controller. The strategy has two parts: the cooperative reward function of the CPS1 and BAAL indices, and the intelligent frequency control learning algorithm based on Negotiated W-Learning. First, following the idea of multi-objective reinforcement learning (MORL), the immediate reward functions of the CPS1 index and the BAAL index are constructed, and dynamic coordination factors characterize the influence of each index as the environment changes. Then, the immediate rewards obtained during MORL learning are used to update the respective state-action value sets of the CPS1 index and the BAAL index. Finally, Negotiated W-Learning performs a global search to obtain the final action, which satisfies the CPS1 and BAAL indicators and reflects the feedback information from the environment.</p>
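<p>The final negotiation step can be illustrated with the following Python sketch. It follows a commonly described form of negotiated W-learning, in which each objective reports how much value it would lose if the current leader&#x2019;s action were executed, and the objective with the largest loss takes over; the variant used in this paper may differ in detail. The function name and the Q-table layout (one dictionary per objective, mapping a state to a list of action values) are assumptions made for illustration, and the Q-value updates themselves are not shown.</p>
<preformat>
```python
def negotiated_w_select(q_tables, state, leader=0):
    """Pick one action by negotiation among per-objective Q-tables.

    q_tables : list of dicts; q_tables[i][state] is a list of Q-values per action
    Returns (index of the winning objective, index of the chosen action).
    """
    def best_action(i):
        q = q_tables[i][state]
        return max(range(len(q)), key=q.__getitem__)

    action = best_action(leader)
    leader_w = 0.0
    while True:
        # W_i: value objective i would lose if the leader's action is executed
        losses = [
            (max(q[state]) - q[state][action], i)
            for i, q in enumerate(q_tables)
            if i != leader
        ]
        if not losses:
            break
        worst_loss, challenger = max(losses)
        if worst_loss > leader_w:
            # The challenger would suffer more than the leader, so it takes over
            leader, leader_w = challenger, worst_loss
            action = best_action(leader)
        else:
            break
    return leader, action
```
</preformat>
<p>The loop terminates because the leading W value strictly increases at every change of leader and only finitely many such values exist for a given state.</p>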
<p>Frequency controller online deployment stage: In each AGC control cycle, the trained frequency controller collects the frequency deviation, ACE, CPS, BAAL, and other data in real time from the SCADA database of the Energy Management System (EMS) and issues real-time frequency control actions.</p>
<sec id="s3-1">
<title>3.1 Collaborative Reward Function of CPS1 Indicator and BAAL Indicator</title>
<p>This paper constructs a cooperative reward function based on the CPS1 indicator and the BAAL indicator, which is expressed as follows:<disp-formula id="e6">
<mml:math id="m8">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>B</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>S</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>S</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(6)</label>
</disp-formula>
</p>
<p>Here, <inline-formula id="inf3">
<mml:math id="m9">
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> is the immediate reward of the <italic>i</italic>th objective when the system transitions from state <italic>s</italic> to state <italic>s</italic>&#x2032; through action <italic>a</italic>; <italic>ACE</italic>(<italic>t</italic>) is the real-time value of the regional control deviation at the current moment; <italic>s</italic> is the system state [<italic>ACE</italic>(<italic>t</italic>)] at time <italic>t</italic>; <italic>s</italic>&#x2032; is the state [<italic>ACE</italic>(<italic>t</italic>&#x20;&#x2b; 1)] at time <italic>t</italic>&#x20;&#x2b; 1; and <italic>a</italic> is the system action <inline-formula id="inf4">
<mml:math id="m10">
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="normal">&#x394;</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:math>
</inline-formula> taken when the system moves from state <italic>s</italic> to state <italic>s</italic>&#x2032;. <italic>BAAL</italic>(<italic>t</italic>) is the instantaneous value of BAAL at time <italic>t</italic>; <italic>CPS</italic>1(<italic>t</italic>) is the instantaneous value of <italic>CPS</italic>1 at time <italic>t</italic>; and <italic>CPS</italic>1&#x2a; is the target value, generally&#x20;200<italic>%</italic>.</p>
<p>
<italic>&#x3bb;</italic>
<sub>
<italic>i</italic>
</sub> is the dynamic coordination factor of the cooperative reward function, that is, <italic>&#x3bb;</italic>
<sub>
<italic>i</italic>
</sub> changes dynamically with each state transition. To determine its value, this paper adopts a comprehensive weighting method with multiplicative synthesis, which takes into account both the preferences of decision makers and the inherent statistical regularities of the index data.</p>
<p>First, define <italic>K</italic> as a parameter that rates the importance of the frequency performance evaluation indicators. <italic>K</italic><sub><italic>i</italic>,<italic>j</italic></sub> represents the importance of one evaluation index relative to the other in the frequency performance evaluation. When an out-of-bounds situation such as ACE &#x3c; BAAL or <italic>CPS</italic>1 &#x3e; 200 occurs, the importance of the corresponding indicator increases accordingly. When the two indicators play equal or unimportant roles in the frequency evaluation process, <italic>K</italic><sub><italic>i</italic>,<italic>j</italic></sub> and <italic>K</italic><sub><italic>j</italic>,<italic>i</italic></sub> are both 4 or both 0. Each time the relative importance of one index increases by one point, the corresponding value (<italic>K</italic><sub><italic>i</italic>,<italic>j</italic></sub> or <italic>K</italic><sub><italic>j</italic>,<italic>i</italic></sub>) increases by 1 and the opposite value (<italic>K</italic><sub><italic>j</italic>,<italic>i</italic></sub> or <italic>K</italic><sub><italic>i</italic>,<italic>j</italic></sub>) decreases by 1. The weighting factor of each objective in each action cycle is then obtained as:<disp-formula id="e7">
<mml:math id="m11">
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
<p>To offset the subjectivity of these weights, the entropy method is used to calculate the difference coefficient of the two indicators, <italic>&#x3b2;</italic>
<sub>
<italic>i</italic>
</sub>:<disp-formula id="e8">
<mml:math id="m12">
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>ln</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>ln</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(8)</label>
</disp-formula>
<disp-formula id="e9">
<mml:math id="m13">
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
<label>(9)</label>
</disp-formula>where <italic>x</italic><sub><italic>y</italic>,<italic>i</italic></sub> is the standardized value of the <italic>i</italic>th frequency control performance evaluation index at the <italic>y</italic>th sampling instant; <italic>K</italic> is the number of samples of the <italic>i</italic>th index collected from time 0 to the current time <italic>t</italic> (a count, distinct from the importance parameter <italic>K</italic><sub><italic>i</italic>,<italic>j</italic></sub> above); and <italic>N</italic> is the number of objectives. <italic>P</italic><sub><italic>y</italic>,<italic>i</italic></sub> is the proportion of <italic>x</italic><sub><italic>y</italic>,<italic>i</italic></sub> in the sum of the values of the <italic>i</italic>th index from 0 to&#x20;<italic>t</italic>.</p>
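<p>As an illustration of <xref ref-type="disp-formula" rid="e8">Eqs 8</xref>, <xref ref-type="disp-formula" rid="e9">9</xref>, the following Python sketch computes the difference coefficients from a small table of standardized index values. It is a minimal sketch under stated assumptions, not the implementation used for the simulations: the function name <monospace>difference_coefficients</monospace>, the row and column layout of <monospace>x</monospace>, the sample values, and the convention that a zero share contributes nothing to the sum are all chosen for the example. The normalizing constant ln(<italic>N</italic>) is taken from Eq. 8 as printed, so at least two objectives are assumed.</p>
<preformat><![CDATA[
```python
import math


def difference_coefficients(x):
    """Entropy-based difference coefficients beta_i (Eqs 8, 9).

    x[y][i] is the standardized value of index i at sample y,
    so x has K rows (samples) and N columns (objectives).
    """
    n_obj = len(x[0])
    raw = []
    for i in range(n_obj):
        column = [row[i] for row in x]
        total = sum(column)
        # Eq. 9: share of each sample in the column total.
        shares = [value / total for value in column]
        # Sum of P*ln(P); a zero share contributes 0 (assumed convention).
        plogp = sum(p * math.log(p) for p in shares if p > 0)
        # Numerator of Eq. 8, with ln(N) as printed there.
        raw.append(1.0 + plogp / math.log(n_obj))
    norm = sum(raw)
    # Denominator of Eq. 8: normalize so the coefficients sum to 1.
    return [value / norm for value in raw]


# Hypothetical data: two objectives, two samples.
beta = difference_coefficients([[1.0, 1.0], [1.0, 3.0]])
```
]]></preformat>
<p>In this toy input the first index is constant over both samples, so its difference coefficient is zero and the second index receives the full coefficient.</p>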
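<p>Finally, the coordination factor is determined by the multiplicative weighting method, which combines the weighting factor <italic>w</italic><sub><italic>i</italic></sub> of <xref ref-type="disp-formula" rid="e7">Eq. 7</xref> with the difference coefficient <italic>&#x3b2;</italic><sub><italic>i</italic></sub> of <xref ref-type="disp-formula" rid="e8">Eqs 8</xref>, <xref ref-type="disp-formula" rid="e9">9</xref>:<disp-formula id="e10">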
<mml:math id="m14">
<mml:msub>
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msqrt>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(10)</label>
</disp-formula>
</p>
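<p>The combination in <xref ref-type="disp-formula" rid="e7">Eqs 7</xref>, <xref ref-type="disp-formula" rid="e10">10</xref> can be sketched in Python for the two-objective case as follows. The function names <monospace>pairwise_weights</monospace> and <monospace>coordination_factors</monospace>, as well as the importance scores and difference coefficients passed to them, are hypothetical and serve only to show the arithmetic; they are not values from the simulations in this paper.</p>
<preformat><![CDATA[
```python
import math


def pairwise_weights(k_ij, k_ji):
    """Weights (w_i, w_j) of two objectives from Eq. 7.

    Eq. 7 puts K_{j,i} in the numerator of w_i; w_j follows by
    swapping the two indices.
    """
    total = k_ij + k_ji
    return k_ji / total, k_ij / total


def coordination_factors(w, beta):
    """Coordination factors lambda_i from Eq. 10."""
    roots = [math.sqrt(w_i * b_i) for w_i, b_i in zip(w, beta)]
    norm = sum(roots)
    return [r / norm for r in roots]


# Hypothetical importance scores and difference coefficients.
w = pairwise_weights(k_ij=3, k_ji=5)          # (0.625, 0.375)
lam = coordination_factors(w, beta=[0.5, 0.5])
```
]]></preformat>
<p>With these hypothetical inputs the two difference coefficients are equal, so the ordering of the coordination factors is decided by the weighting factors alone, and the factors sum to 1 by construction.</p>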
</sec>
<sec id="s3-2">
<title>3.2 Negotiated W-Learning Intelligent Frequency Control Learning Algorithm</title>
<p>The update formula of MORL is the same as the state-action value function update of traditional Q-learning, as shown in <xref ref-type="disp-formula" rid="e11">Eq. 11</xref>. To facilitate the selection of an optimal action that satisfies each of the objectives, this paper uses the vector <bold>
<italic>MQ</italic>
</bold> (<italic>s</italic>, <italic>a</italic>) to collect the Q values of action <italic>a</italic> in state <italic>s</italic> for the <italic>N</italic> objectives, as shown in <xref ref-type="disp-formula" rid="e12">Eq. 12</xref>, and the optimal action strategy <inline-formula id="inf5">
<mml:math id="m15">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> for each objective in the current state is expressed in <xref ref-type="disp-formula" rid="e13">Eq. 13</xref>:<disp-formula id="e11">
<mml:math id="m16">
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2190;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(11)</label>
</disp-formula>
<disp-formula id="e12">
<mml:math id="m17">
<mml:mi mathvariant="bold-italic">M</mml:mi>
<mml:mi mathvariant="bold-italic">Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>N</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(12)</label>
</disp-formula>
<disp-formula id="e13">
<mml:math id="m18">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c0;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>MQ</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>arg</mml:mi>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi mathvariant="bold-italic">M</mml:mi>
<mml:mi mathvariant="bold-italic">Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(13)</label>
</disp-formula>
</p>
<p>In <xref ref-type="disp-formula" rid="e11">Eq. 11</xref>, <italic>&#x3b1;</italic> (0 &#x3c; <italic>&#x3b1;</italic> &#x3c; 1) is the learning rate, which is set to 0.01 in this paper; <italic>&#x3b3;</italic> is the discount factor, which is set to 0.9 in this paper; and <italic>Q</italic><sub><italic>i</italic></sub>(<italic>s</italic>, <italic>a</italic>) is the Q value of the <italic>i</italic>th objective for choosing action <italic>a</italic> in state&#x20;<italic>s</italic>.</p>
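<p>A minimal Python sketch of the per-objective update in <xref ref-type="disp-formula" rid="e11">Eq. 11</xref> is given below, using the learning rate and discount factor stated above. The tabular layout (one list of action values per state), the function name <monospace>update_q</monospace>, and the toy table and reward are assumptions made for illustration only.</p>
<preformat><![CDATA[
```python
ALPHA = 0.01   # learning rate stated in the text
GAMMA = 0.9    # discount factor stated in the text


def update_q(q_i, s, a, reward, s_next, alpha=ALPHA, gamma=GAMMA):
    """Apply the Eq. 11 update in place to one objective's Q table.

    q_i is a list of rows, one per state, each holding one Q value
    per action; s, a and s_next are integer indices.
    """
    td_target = reward + gamma * max(q_i[s_next])
    q_i[s][a] += alpha * (td_target - q_i[s][a])
    return q_i[s][a]


# Toy table with 2 states and 2 actions (values are illustrative).
q = [[0.0, 0.0], [1.0, 2.0]]
new_value = update_q(q, s=0, a=1, reward=-1.0, s_next=1)
```
]]></preformat>
<p>For the toy table, the temporal-difference target is &#x2212;1 &#x2b; 0.9 &#xd7; 2 &#x3d; 0.8, so the updated entry is 0.01 &#xd7; 0.8 &#x3d; 0.008. In the multi-objective setting the same update would be applied once per objective, each with its own reward.</p>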
<p>However, the optimal action selection strategy above cannot guarantee that the agent fully explores the entire state-action space. In this paper, the Negotiated W-learning strategy is used to optimize the <bold>
<italic>MQ</italic>
</bold> (<italic>s</italic>, <italic>a</italic>) vector space. This strategy defines variable <italic>W</italic>
<sub>
<italic>i</italic>
</sub> as a leader parameter. The operation steps are as follows; <xref ref-type="fig" rid="F2">Figure&#x20;2B</xref> gives a reference flow chart:</p>
<p>Step 1: Choose one objective function in the <bold>
<italic>MQ</italic>
</bold>(<italic>s</italic>, <italic>a</italic>) vector space as the guidance objective function. Its investigation parameter is denoted <italic>W</italic><sub><italic>i</italic></sub>. For the first guidance objective function, the guidance value is uniformly set to <italic>W</italic><sub><italic>cir</italic></sub> &#x3d; 0, and the guidance action is obtained as follows:<disp-formula id="e14">
<mml:math id="m19">
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>arg</mml:mi>
<mml:mi>max</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(14)</label>
</disp-formula>
</p>
<p>Step 2: For each of the remaining objective functions, <italic>W</italic><sub><italic>i</italic></sub> is calculated according to <xref ref-type="disp-formula" rid="e15">Eq. 15</xref>:<disp-formula id="e15">
<mml:math id="m20">
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>max</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(15)</label>
</disp-formula>
</p>
<p>Step 3: Take the maximum value of <italic>W</italic><sub><italic>i</italic></sub> over all objective functions other than the guidance objective function, denoted <italic>W</italic><sub><italic>i</italic>,max</sub>, and compare it with <italic>W</italic><sub><italic>cir</italic></sub>. If <italic>W</italic><sub><italic>i</italic>,max</sub> &#x3e; <italic>W</italic><sub><italic>cir</italic></sub>, the objective function corresponding to this maximum value is selected as the new guidance objective function, the guidance value <italic>W</italic><sub><italic>cir</italic></sub> is updated to <italic>W</italic><sub><italic>i</italic>,max</sub>, and the corresponding action <italic>a</italic> becomes the new guidance action <italic>a</italic><sub><italic>cir</italic></sub>; the procedure then returns to Step 2 and iterates until this condition is no longer&#x20;met.</p>
<p>If <italic>W</italic><sub><italic>i</italic></sub> &#x2264; <italic>W</italic><sub><italic>cir</italic></sub> holds for all remaining objective functions, the guidance action <italic>a</italic><sub><italic>cir</italic></sub> and the guidance objective function at this time are recorded as the final&#x20;result.</p>
</sec>
</sec>
<sec id="s4">
<title>4 Simulation Results</title>
<p>This paper builds a typical two-region interconnected power grid AGC model for controlling load frequency. The parameter settings of the two regions in the model system are the same, and the system base capacity is 1000&#xa0;MW.</p>
<p>
<xref ref-type="fig" rid="F3">Figure&#x20;3A,B</xref> show the pre-learning process for the single CPS1 objective and for the Negotiated W-Learning algorithm. In the pre-learning stage, a continuous sinusoidal load disturbance with a period of 1,200&#xa0;s, an amplitude of 100&#xa0;MW and a duration of 20,000&#xa0;s is applied to area A, and the 2-norm criterion on the Q function matrix <inline-formula id="inf7">
<mml:math id="m22">
<mml:msup>
<mml:mrow>
<mml:mfenced open="&#x2016;" close="&#x2016;">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>&#x3b6;</mml:mi>
</mml:math>
</inline-formula> (<italic>&#x3b6;</italic> is a constant) is used as the criterion for judging that pre-learning has reached the optimal strategy (<xref ref-type="bibr" rid="B4">Imthias Ahamed et&#x20;al., 2002</xref>).</p>
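<p>The pre-learning disturbance and the stopping criterion can be sketched in Python as follows. The function names, the value of <italic>&#x3b6;</italic>, and the two toy Q tables are hypothetical; the squared norm is taken over all entries of the table (Frobenius norm), which is an assumption of this sketch, since the text only refers to a 2-norm.</p>
<preformat><![CDATA[
```python
import math


def sinusoidal_load(t, amplitude_mw=100.0, period_s=1200.0):
    """Load disturbance in MW at time t (s), as used in pre-learning."""
    return amplitude_mw * math.sin(2.0 * math.pi * t / period_s)


def has_converged(q_now, q_prev, zeta):
    """Check the stopping criterion ||Q_t - Q_{t-1}||^2 <= zeta.

    Both tables are lists of rows; the norm is taken over all
    entries (Frobenius norm), which is an assumption of this sketch.
    """
    squared = sum((x - y) ** 2
                  for row_now, row_prev in zip(q_now, q_prev)
                  for x, y in zip(row_now, row_prev))
    return squared <= zeta


# Toy Q tables before and after one update (illustrative values).
q_prev = [[0.0, 0.0], [1.0, 2.0]]
q_now = [[0.0, 0.008], [1.0, 2.0]]
done = has_converged(q_now, q_prev, zeta=1e-4)
```
]]></preformat>
<p>In a pre-learning loop, such a check would be evaluated after each update over the 20,000&#xa0;s disturbance, and learning would be considered complete once it holds.</p>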
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Simulation results: <bold>(A)</bold> Differential convergence result of Q function under CPS1 objective, <bold>(B)</bold> <italic>CPS</italic>1<sub>avg</sub>-<sub>10&#x2212;min</sub> curve, <bold>(C)</bold> Self-contribution curve, <bold>(D)</bold> Curve reflecting the change of CPS1 value, <bold>(E)</bold> The curve of CPS1.</p>
</caption>
<graphic xlink:href="fenrg-09-760525-g003.tif"/>
</fig>
<p>It can be seen from <xref ref-type="fig" rid="F3">Figure&#x20;3A</xref> that after many iterations, the Q function tends to stabilize, reaching the optimal strategy for the CPS1 target. <xref ref-type="fig" rid="F3">Figure&#x20;3B</xref> shows the average value of CPS1 (<italic>CPS</italic>1<sub>avg</sub>-<sub>10&#x2212;min</sub>) in area A every 10&#xa0;min during the pre-learning process. It is found that the curve almost remains at a stable and acceptable value in the later stage, which shows that the Negotiated W-Learning algorithm has approached the optimal CPS1 control strategy. At the same time, the Q matrix corresponding to the target BAAL has also converged.</p>
<p>In addition, to compare algorithm learning time, each of the four algorithms was simulated many times and the average calculation time was recorded; see <xref ref-type="table" rid="T1">Table&#x20;1</xref> for details. Because of the difference in the number of optimization objectives and the difficulty of calculating the coordination factor, the single-objective CPS1-MORL has the shortest calculation time. Since the Coordinate Q-MORL algorithm cannot fully explore the action set, its calculation time is the second shortest. Compared with Greedy-MORL, the global search algorithm Negotiated W-Learning goes through more search steps, so its calculation time is the longest.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Simulation results under different algorithms.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Algorithms</th>
<th align="center">Calculation time/s (pre-learning)</th>
<th align="center">&#x7c;&#x394;<italic>f</italic>&#x7c;/Hz</th>
<th align="center">CPS1<italic>%</italic>
</th>
<th align="center">BAAL<italic>%</italic>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">CPS1-MORL</td>
<td align="char" char=".">12,031</td>
<td align="char" char=".">0.0143</td>
<td align="char" char=".">196</td>
<td align="char" char=".">86.4</td>
</tr>
<tr>
<td align="left">Coordinate Q-MORL</td>
<td align="char" char=".">18,546</td>
<td align="char" char=".">0.0132</td>
<td align="char" char=".">197</td>
<td align="char" char=".">96.2</td>
</tr>
<tr>
<td align="left">Greedy-MORL</td>
<td align="char" char=".">20,015</td>
<td align="char" char=".">0.0129</td>
<td align="char" char=".">199</td>
<td align="char" char=".">97.2</td>
</tr>
<tr>
<td align="left">Negotiated W-Learning</td>
<td align="char" char=".">21,457</td>
<td align="char" char=".">0.0064</td>
<td align="char" char=".">200</td>
<td align="char" char=".">98.5</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In order to further verify the adaptability of Negotiated W-Learning in the constantly changing power grid environment, this paper applies a random disturbance with a period of 1,200&#xa0;s and an amplitude of 100&#xa0;MW in area A. Four types of algorithms are set for comparison as follows.</p>
<p>
<statement content-type="algorithm" id="alg1">
<label>Algorithm 1</label>
<p>Traditional single-objective reinforcement learning algorithm for intelligent frequency control based on CPS1 frequency control performance evaluation index (CPS1-MORL).</p>
</statement>
</p>
<p>
<statement content-type="algorithm" id="alg2">
<label>Algorithm 2</label>
<p>Multi-objective reinforcement learning algorithm for intelligent frequency control based on the traditional greedy strategy of multi-dimensional frequency control performance evaluation index and multi-objective Q function (Coordinate Q-MORL).</p>
</statement>
</p>
<p>
<statement content-type="algorithm" id="alg3">
<label>Algorithm 3</label>
<p>Multi-objective reinforcement learning algorithm for intelligent frequency control that applies the traditional greedy strategy with a cooperative reward function based on multi-dimensional frequency control performance evaluation indicators (Greedy-MORL).</p>
</statement>
</p>
<p>
<statement content-type="algorithm" id="alg4">
<label>Algorithm 4</label>
<p>The Negotiated W-Learning algorithm proposed in this paper, which uses the collaborative reward function based on the multi-dimensional frequency control performance evaluation index for multi-objective reinforcement learning and intelligent frequency control (Negotiated W-MORL).</p>
</statement>
</p>
<sec id="s4-1">
<title>4.1 Control Strategy Performance Analysis</title>
<p>
<xref ref-type="fig" rid="F3">Figure&#x20;3C</xref> shows the frequency deviation self-contribution degree (&#x394;<italic>f</italic>/<italic>&#x25b;</italic>) and the CPS1 index curves of <xref ref-type="other" rid="alg1">Algorithm 1</xref> and <xref ref-type="other" rid="alg4">Algorithm 4</xref>. In this paper, a threshold is used for the calculation, where <inline-formula id="inf6">
<mml:math id="m21">
<mml:mi mathvariant="script">E</mml:mi>
</mml:math>
</inline-formula> is 0.01. The frequency contribution degree reflects the frequency quality achieved by the different algorithms. If the frequency contribution degree exceeds &#xb1;1, the frequency at that moment has exceeded the prescribed limit 3<italic>&#x25b;</italic>. It can be seen that the frequency contribution curve of <xref ref-type="other" rid="alg1">Algorithm 1</xref> exceeds the short-term continuous frequency limit time specified in this paper and drops steeply in this interval, which poses a greater risk to system operation safety. In contrast, the frequency contribution curve of <xref ref-type="other" rid="alg4">Algorithm 4</xref> stays within the defined range. There are two main reasons for this. First, <xref ref-type="other" rid="alg4">Algorithm 4</xref> controls the frequency by adjusting the weights of the two indicators in real time: if frequency fluctuations or &#x201c;frequency drops&#x201d; occur, the BAAL indicator is given greater weight, and if the frequency continuously exceeds the limit during the simulation period, CPS1 is given a larger weight for regulation. Second, <xref ref-type="other" rid="alg4">Algorithm 4</xref> lets both indicators participate in the evaluation of AGC control at the same time, while <xref ref-type="other" rid="alg1">Algorithm 1</xref> considers only the impact of CPS1. Moreover, the CPS1 curve of <xref ref-type="other" rid="alg4">Algorithm 4</xref> in <xref ref-type="fig" rid="F3">Figure&#x20;3D</xref> fluctuates less throughout the simulation cycle than that of <xref ref-type="other" rid="alg1">Algorithm 1</xref>, which further indicates that <xref ref-type="other" rid="alg4">Algorithm 4</xref> is superior to <xref ref-type="other" rid="alg1">Algorithm 1</xref> in terms of frequency control effect.</p>
<p>In summary, combining the BAAL and CPS1 indicators to constrain the system frequency can effectively improve the frequency quality of the system at the full time&#x20;scale.</p>
</sec>
<sec id="s4-2">
<title>4.2 The Influence of Cooperative Reward Function on Frequency Control Performance</title>
<p>To verify the effectiveness of the collaborative reward function proposed in this paper, the control performance indicators of <xref ref-type="other" rid="alg2">Algorithm 2</xref> and <xref ref-type="other" rid="alg3">Algorithm 3</xref> in <xref ref-type="table" rid="T1">Table&#x20;1</xref> can be compared. The control performance indicators of <xref ref-type="other" rid="alg3">Algorithm 3</xref> are better than those of <xref ref-type="other" rid="alg2">Algorithm 2</xref> (&#x7c;&#x394;<italic>f</italic>&#x7c; of 0.0129&#xa0;Hz against 0.0132&#xa0;Hz, CPS1 of 199% against 197%, and BAAL of 97.2% against 96.2%). This is because introducing coordination factors between the multi-objective state-action value functions may prevent the agent from fully exploring the action set, leading to the omission of key actions, whereas the collaborative reward function effectively avoids this problem.</p>
<p>In summary, the introduction of a collaborative reward function can effectively improve the system frequency quality and various frequency performance indicators.</p>
</sec>
<sec id="s4-3">
<title>4.3 The Influence of Different Learning Strategies on Control Performance</title>
<p>To verify the effectiveness of <xref ref-type="other" rid="alg4">Algorithm 4</xref> proposed in this paper, <xref ref-type="fig" rid="F3">Figure&#x20;3D</xref> shows the CPS1 curves of <xref ref-type="other" rid="alg3">Algorithm 3</xref> and <xref ref-type="other" rid="alg4">Algorithm 4</xref>. It can be seen from <xref ref-type="fig" rid="F3">Figure&#x20;3E</xref> that <xref ref-type="other" rid="alg4">Algorithm 4</xref> converges faster and fluctuates less than <xref ref-type="other" rid="alg3">Algorithm 3</xref> after a load disturbance occurs. This is because the Negotiated W-Learning strategy selects actions from a global perspective, which effectively remedies the tendency of the traditional greedy strategy to fall into local optima.</p>
<p>In summary, the global search strategy Negotiated W-Learning is more time-consuming than the local search strategies Greedy and Coordinate Q, but its search quality is higher.</p>
</sec>
</sec>
<sec id="s5">
<title>5 Conclusion</title>
<p>This paper proposes a multi-objective intelligent frequency control strategy based on multi-dimensional evaluation criteria and a cooperative reward function.</p>
<p>The simulation results show that: 1) compared with the other algorithms considered, the Negotiated W-Learning algorithm effectively improves the quality of the system frequency on the full time scale and explores the global action set more thoroughly; 2) the collaborative reward function proposed in this paper improves on the linear weighting of the traditional multi-objective Q function. In general, the intelligent AGC control strategy based on the collaboration of the CPS1 and BAAL learning criteria proposed in this paper can effectively deal with the short-term power disturbance problem caused by the grid connection of new energy sources such as wind power, and improve the stability of the system.</p>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>LZ put forward the main research points; LZ, YX completed manuscript writing and revision; JY completed simulation research; TX collected relevant background information; JC, ZL, TZ revised grammar and expression.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This manuscript was supported in part by the National Natural Science Foundation of China 52007103.</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abouheaf</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gueaieb</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Sharaf</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Load Frequency Regulation for Multi-Area Power System Using Integral Reinforcement Learning</article-title>. <source>IET Generation, Transm. Distribution</source> <volume>13</volume> (<issue>19</issue>), <fpage>4311</fpage>&#x2013;<lpage>4323</lpage>. <pub-id pub-id-type="doi">10.1049/iet-gtd.2019.0218</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alhelou</surname>
<given-names>H. H.</given-names>
</name>
<name>
<surname>Hamedani-Golshan</surname>
<given-names>M.-E.</given-names>
</name>
<name>
<surname>Zamani</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Heydarian-Forushani</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Siano</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Challenges and Opportunities of Load Frequency Control in Conventional, Modern and Future Smart Power Systems: A Comprehensive Review</article-title>. <source>Energies</source> <volume>11</volume> (<issue>10</issue>), <fpage>2497</fpage>. <pub-id pub-id-type="doi">10.3390/en11102497</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arya</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Optimal Control Strategy&#x2013;Based AGC of Electrical Power Systems: A Comparative Performance Analysis</article-title>. <source>Optimal Control Appl. Methods</source> <volume>38</volume> (<issue>6</issue>), <fpage>982</fpage>&#x2013;<lpage>992</lpage>. <pub-id pub-id-type="doi">10.1002/oca.2304</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Imthias Ahamed</surname>
<given-names>T. P.</given-names>
</name>
<name>
<surname>Rao</surname>
<given-names>P. S. N.</given-names>
</name>
<name>
<surname>Sastry</surname>
<given-names>P. S.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>A Reinforcement Learning Approach to Automatic Generation Control</article-title>. <source>Electric Power Syst. Res.</source> <volume>63</volume> (<issue>1</issue>), <fpage>9</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1016/s0378-7796(02)00088-3</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kumar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Recent Strategies for Automatic Generation Control of Multi-Area Interconnected Power Systems</article-title>,&#x201d; in <conf-name>2019 3rd International Conference on Recent Developments in Control, Automation &#x26; Power Engineering (RDCAPE)</conf-name>, <conf-loc>Xi&#x2019;an, China</conf-loc>, <conf-date>April 25&#x2013;27, 2019</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>153</fpage>&#x2013;<lpage>158</lpage>. <pub-id pub-id-type="doi">10.1109/rdcape47089.2019.8979071</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Abu-Siada</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Research on a Composite Voltage and Current Measurement Device for HVDC Networks</article-title>. <source>IEEE Trans. Ind. Electron.</source> <volume>68</volume>, <fpage>8930</fpage>&#x2013;<lpage>8941</lpage>. <pub-id pub-id-type="doi">10.1109/tie.2020.3013772</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Real-time Vehicle-to-Grid Control for Frequency Regulation with High Frequency Regulating Signal</article-title>. <source>Prot. Control Mod. Power Syst.</source> <volume>3</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1186/s41601-018-0085-1</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Multi-lateral Participants Decision-Making: A Distribution System Planning Approach with Incomplete Information Game</article-title>. <source>IEEE Access</source> <volume>8</volume>, <fpage>88933</fpage>&#x2013;<lpage>88950</lpage>. <pub-id pub-id-type="doi">10.1109/access.2020.2991181</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Nathan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ballard</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2003</year>). <source>Multiple-goal Reinforcement Learning with Modular Sarsa (0)</source>. <comment>Doctoral Dissertation</comment> <publisher-loc>Rochester</publisher-loc>: <publisher-name>University of Rochester</publisher-name>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ouyang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Zhuang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Sample-based Neural Approximation Approach for Probabilistic Constrained Programs</article-title>. <source>IEEE Trans. Neural Networks Learn. Syst.</source>, <fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1109/tnnls.2021.3102323</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ouyang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Khajorntraidet</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhuang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Mixture Density Networks-Based Knock Simulator</article-title>. <source>IEEE/ASME Trans. Mechatronics</source>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/tmech.2021.3059775</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Raksincharoensak</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Pedestrian-aware Statistical Risk Assessment</article-title>. <source>IEEE Trans. Intell. Transportation Syst.</source>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1109/tits.2021.3074522</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Raksincharoensak</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Statistical Models of Near-Accident Event and Pedestrian Behavior at Non-signalized Intersections</article-title>. <source>J.&#x20;Appl. Stat.</source>, <fpage>1</fpage>&#x2013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1080/02664763.2021.1962263</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Sata</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Gaussian Mixture Model Clustering-Based Knock Threshold Learning in Automotive Engines</article-title>. <source>IEEE/ASME Trans. Mechatronics</source> <volume>25</volume> (<issue>6</issue>), <fpage>2981</fpage>&#x2013;<lpage>2991</lpage>. <pub-id pub-id-type="doi">10.1109/tmech.2020.3000732</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ouyang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Raksincharoensak</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Cooperative Comfortable-Driving at Signalized Intersections for Connected and Automated Vehicles</article-title>. <source>IEEE Robotics Automation Lett.</source> <volume>5</volume> (<issue>4</issue>), <fpage>6247</fpage>&#x2013;<lpage>6254</lpage>. <pub-id pub-id-type="doi">10.1109/lra.2020.3014010</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Khajorntraidet</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Spark Advance Self-Optimization with Knock Probability Threshold for Lean-Burn Operation Mode of SI Engine</article-title>. <source>Energy</source> <volume>122</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1016/j.energy.2017.01.065</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>James</surname>
<given-names>D. M. C.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Impact of Wind Power on Control Performance Standards</article-title>. <source>Int. J.&#x20;Electr. Power Energ. Syst.</source> <volume>47</volume>, <fpage>225</fpage>&#x2013;<lpage>234</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijepes.2012.11.010</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Lei</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Multiobjective Reinforcement Learning-Based Intelligent Approach for Optimization of Activation Rules in Automatic Generation Control</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>17480</fpage>&#x2013;<lpage>17492</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2894756</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Watkins</surname>
<given-names>C. J.&#x20;C. H.</given-names>
</name>
<name>
<surname>Dayan</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1992</year>). <article-title>Q-learning</article-title>. <source>Machine Learn.</source> <volume>8</volume> (<issue>3-4</issue>), <fpage>279</fpage>&#x2013;<lpage>292</lpage>. <pub-id pub-id-type="doi">10.1023/a:1022676722315</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xi</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Novel Multi-Agent DDQN-AD Method-Based Distributed Strategy for Automatic Generation Control of Integrated Energy Systems</article-title>. <source>IEEE Trans. Sustain. Energ.</source> <volume>11</volume> (<issue>4</issue>), <fpage>2417</fpage>&#x2013;<lpage>2426</lpage>. <pub-id pub-id-type="doi">10.1109/tste.2019.2958361</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xi</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Multi-step Unified Reinforcement Learning Method for Automatic Generation Control in Multi-Area Interconnected Power Grid</article-title>. <source>IEEE Trans. Sustain. Energ.</source> <volume>12</volume> (<issue>2</issue>), <fpage>1406</fpage>&#x2013;<lpage>1415</lpage>. <pub-id pub-id-type="doi">10.1109/tste.2020.3047137</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Development Approach of a Programmable and Open Software Package for Power System Frequency Response Calculation</article-title>. <source>Prot. Control Mod. Power Syst.</source> <volume>2</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1186/s41601-017-0045-1</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hou</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Adaptive Nonparametric Kernel Density Estimation Approach for Joint Probability Density Function Modeling of Multiple Wind Farms</article-title>. <source>Energies</source> <volume>12</volume> (<issue>7</issue>), <fpage>1356</fpage>. <pub-id pub-id-type="doi">10.3390/en12071356</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xing</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>An Improved Robust SCUC Approach Considering Multiple Uncertainty and Correlation</article-title>. <source>IEEJ&#x20;Trans. Electr. Electron. Eng.</source> <volume>16</volume> (<issue>1</issue>), <fpage>21</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1002/tee.23265</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Intelligent Data-Driven Decision-Making Method for Dynamic Multi-Sequence: An E-Seq2seq Based SCUC Expert System</article-title>. <source>IEEE Trans. Ind. Inform.</source>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/tii.2021.3107406</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Research on Modelling and Solution of Stochastic SCUC under AC Power Flow Constraints</article-title>. <source>IET Generation, Transm. Distribution</source> <volume>12</volume> (<issue>15</issue>), <fpage>3618</fpage>&#x2013;<lpage>3625</lpage>. </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y. M.</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>W. J.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>K. W.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Stochastic Optimal Generation Command Dispatch Based on Improved Hierarchical Reinforcement Learning Approach</article-title>. <source>IET Generation, Transm. Distribution</source> <volume>5</volume> (<issue>8</issue>), <fpage>789</fpage>&#x2013;<lpage>797</lpage>. <pub-id pub-id-type="doi">10.1049/iet-gtd.2010.0600</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Combined Heat and Power Scheduling: Utilizing Building-Level Thermal Inertia for Short-Term Thermal Energy Storage in District Heat System</article-title>. <source>IEEJ&#x20;Trans. Electr. Electron. Eng.</source> <volume>13</volume> (<issue>6</issue>), <fpage>804</fpage>&#x2013;<lpage>814</lpage>. <pub-id pub-id-type="doi">10.1002/tee.22633</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Don</surname>
<given-names>M. V.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Coat Circuits for DC&#x2013;DC Converters to Improve Voltage Conversion Ratio</article-title>. <source>IEEE Trans. Power Electron.</source> <volume>35</volume> (<issue>4</issue>), <fpage>3679</fpage>&#x2013;<lpage>3687</lpage>. </citation>
</ref>
</ref-list>
</back>
</article>