<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2022.877569</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Impact of Box-Cox Transformation on Machine-Learning Algorithms</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Blum</surname> <given-names>Luca</given-names></name>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1683535/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Elgendi</surname> <given-names>Mohamed</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/499556/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Menon</surname> <given-names>Carlo</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/76764/overview"/>
</contrib>
</contrib-group>
<aff><institution>Biomedical and Mobile Health Technology Laboratory, ETH Zurich</institution>, <addr-line>Zurich</addr-line>, <country>Switzerland</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Fabrizio Riguzzi, University of Ferrara, Italy</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Kazushi Maruo, University of Tsukuba, Japan; Abbas Cheddad, Blekinge Institute of Technology, Sweden</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Mohamed Elgendi <email>moe.elgendi&#x00040;hest.ethz.ch</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Machine Learning and Artificial Intelligence, a section of the journal Frontiers in Artificial Intelligence</p></fn>
<fn fn-type="equal" id="fn002"><p>&#x02020;These authors share first authorship</p></fn></author-notes>
<pub-date pub-type="epub">
<day>07</day>
<month>04</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>5</volume>
<elocation-id>877569</elocation-id>
<history>
<date date-type="received">
<day>16</day>
<month>02</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>03</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Blum, Elgendi and Menon.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Blum, Elgendi and Menon</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>This paper studies the effect of applying the Box-Cox transformation in classification tasks. Different optimization strategies were evaluated on four synthetic datasets and two real-world datasets, with promising results. A grid exploration with cross-validation demonstrated a consistent improvement in accuracy. In conclusion, applying the Box-Cox transformation could improve accuracy by up to 12%. Moreover, the optimal choice of the Box-Cox parameters depended on both the data and the classifier used.</p></abstract>
<kwd-group>
<kwd>Box-Cox transformation</kwd>
<kwd>power transformation</kwd>
<kwd>non-linear mappings</kwd>
<kwd>feature transformation</kwd>
<kwd>accuracy improvement</kwd>
<kwd>classifier optimization</kwd>
<kwd>preprocessing data</kwd>
<kwd>monotonic transformation</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="8"/>
<equation-count count="4"/>
<ref-count count="11"/>
<page-count count="16"/>
<word-count count="9348"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Feature transformation can improve the performance of machine learning algorithms. Even simple transformations have been shown to have a significant impact on classification performance (Bicego and Baldo, <xref ref-type="bibr" rid="B1">2016</xref>; Liang et al., <xref ref-type="bibr" rid="B8">2020</xref>). Motivated by these findings, we studied the impact of the Box-Cox transformation on classification tasks. The Box-Cox transformation is often used to increase the Gaussianity of data. This can help in some special cases; however, we observed that transformations that do not maximize Gaussianity are often superior in terms of classification accuracy. Bicego and Baldo (<xref ref-type="bibr" rid="B1">2016</xref>) likewise showed that Gaussianity is not critical: by letting the Box-Cox transformation operate in ranges that do not necessarily increase Gaussianity, class separability can be improved. They also proposed an automatic procedure for obtaining an optimal transformation, based on the <italic>spherical</italic> and <italic>diagonal</italic> optimization of statistical criteria such as the maximum likelihood or the Fisher criterion. Both settings improved the classification results, although the <italic>diagonal</italic> case often yielded higher accuracy, which is to be expected given its larger number of free parameters. Furthermore, they demonstrated that the best choice of optimization criterion depends on the classifier.</p>
<p>Gao et al. (<xref ref-type="bibr" rid="B5">2017</xref>) addressed the problem of finding the optimal Box-Cox transformation for big data. Focusing on regression, they sought the maximum likelihood estimate (MLE) of the Box-Cox parameter when the dataset is massive. Using MapReduce, they proposed an algorithm that runs in parallel and processes big data in chunks.</p>
<p>Cheddad (<xref ref-type="bibr" rid="B4">2020</xref>) investigated the effect of the Box-Cox transformation on images and proposed an image pre-processing tool that applies the Box-Cox transformation to image histograms. The transformation parameters were estimated by MLE. Because the estimation operates on the image histogram rather than on the raw image data, its time complexity is constant, that is, independent of the image size.</p>
<p>In contrast, our focus is on the classification of tabular data that fits into main memory. We sought to generalize the approach of Bicego and Baldo (<xref ref-type="bibr" rid="B1">2016</xref>) and to provide an optimization procedure that is classifier dependent.</p>
</sec>
<sec id="s2">
<title>Box-Cox Transformation</title>
<p>The original Box-Cox transformation is a one-dimensional transformation with a single parameter, usually denoted &#x003BB;, that is applied element-wise to a vector <italic>y</italic> of strictly positive values (Box and Cox, <xref ref-type="bibr" rid="B2">1964</xref>):</p>
<disp-formula id="E1"><mml:math id="M1"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>Let&#x000A0;</mml:mtext><mml:mi>y</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msubsup><mml:mi>&#x0211D;</mml:mi><mml:mrow><mml:mo>&#x0003E;</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mtext>&#x000A0;and&#x000A0;</mml:mtext><mml:mi>&#x003BB;</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x0211D;</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mi>y</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003BB;</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign='left left'><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>y</mml:mi><mml:mi>i</mml:mi><mml:mi>&#x003BB;</mml:mi></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mfrac></mml:mtd><mml:mtd><mml:mtext>if&#x000A0;</mml:mtext><mml:mi>&#x003BB;</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>ln</mml:mi><mml:mo>&#x02061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mtext>if&#x000A0;</mml:mtext><mml:mi>&#x003BB;</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
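<p>The piecewise definition translates directly into NumPy. The following minimal sketch implements (<italic>y</italic><sub><italic>i</italic></sub><sup>&#x003BB;</sup> &#x02212; 1)/&#x003BB; and its logarithmic limit for strictly positive inputs, and checks the result against SciPy's <monospace>scipy.special.boxcox</monospace>, which computes the same one-parameter transformation. The helper name <monospace>box_cox</monospace> and the test values are illustrative choices, not part of the original study.</p>
<preformat preformat-type="code">
```python
import numpy as np
from scipy.special import boxcox


def box_cox(y, lam):
    """Element-wise Box-Cox transformation of a strictly positive array y."""
    y = np.asarray(y, dtype=float)
    if not np.all(y > 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(y)              # limiting case, lambda = 0
    return (y ** lam - 1.0) / lam     # general case, lambda != 0


y = np.linspace(1.0, 2.0, 5)
for lam in (-2.0, 0.0, 0.5, 1.0, 3.0):
    # The direct implementation agrees with SciPy's implementation
    assert np.allclose(box_cox(y, lam), boxcox(y, lam))

# For lambda = 1 the transformation is only a shift by -1
assert np.allclose(box_cox(y, 1.0), y - 1.0)
```
</preformat>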
<p>Many criteria have been proposed for choosing an optimal &#x003BB;. The most widely used, introduced by Box and Cox (<xref ref-type="bibr" rid="B2">1964</xref>), is maximum likelihood estimation (MLE). Other approaches include a Bayesian approach (Sweeting, <xref ref-type="bibr" rid="B10">1984</xref>), robust estimators (Carroll and Ruppert, <xref ref-type="bibr" rid="B3">1985</xref>; Lawrance, <xref ref-type="bibr" rid="B7">1988</xref>; Kim et al., <xref ref-type="bibr" rid="B6">1996</xref>), and an iterative maximization of Gaussianity (V&#x000E9;lez et al., <xref ref-type="bibr" rid="B11">2015</xref>). The Box-Cox transformation has mostly been studied for regression tasks. For &#x003BB; &#x0003E; 1 the transformation is convex, and for &#x003BB; &#x0003C; 1 it is concave. As described by Bicego and Baldo (<xref ref-type="bibr" rid="B1">2016</xref>), the data is therefore stretched in the positive direction for &#x003BB; &#x0003E; 1 and in the negative direction for &#x003BB; &#x0003C; 1. Assuming the data is range-standardized to [1, 2], this means that for &#x003BB; &#x0003E; 1, points near 1 end up closer together after the transformation than equally spaced points near 2, because the derivative <italic>y</italic><sup>&#x003BB;&#x02212;1</sup> increases with <italic>y</italic> (Bicego and Baldo, <xref ref-type="bibr" rid="B1">2016</xref>); the opposite holds for &#x003BB; &#x0003C; 1. For &#x003BB; &#x0003D; 1, the data is merely shifted by 1 in the negative direction. Because the Box-Cox transformation is monotonically increasing, it does not change the ordering of the data. These properties might help to increase class separability. For multi-dimensional data, <italic>X</italic>&#x02208;&#x0211D;<sup><italic>n</italic>&#x000D7;<italic>p</italic></sup>, the transformation is usually applied column-wise as <italic>p</italic> one-dimensional mappings, each with its own &#x003BB;. The overall transformation is therefore specified by a <italic>p</italic>-dimensional vector, &#x0039B; &#x0003D; [&#x003BB;<sub>1</sub>, &#x003BB;<sub>2</sub>, &#x02026;, &#x003BB;<sub><italic>p</italic></sub>].</p>
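<p>To make the column-wise application and the spacing argument concrete, the sketch below applies a parameter vector &#x0039B; to a two-column matrix with values in [1, 2]. It relies on NumPy broadcasting of <monospace>scipy.special.boxcox</monospace>, so each column receives its own &#x003BB;; the data and &#x003BB; values are arbitrary illustrations. The checks confirm that ordering is preserved and that gaps near 2 are stretched relative to equal gaps near 1 for &#x003BB; &#x0003E; 1, with the opposite for &#x003BB; &#x0003C; 1.</p>
<preformat preformat-type="code">
```python
import numpy as np
from scipy.special import boxcox

# Two features range-standardized to [1, 2] (illustrative values)
X = np.column_stack([np.linspace(1.0, 2.0, 11), np.linspace(1.0, 2.0, 11)])

# One lambda per column: Lambda = [lambda_1, lambda_2]
Lam = np.array([3.0, -2.0])

# boxcox is a NumPy ufunc, so Lam (shape (p,)) broadcasts over the rows of X (shape (n, p))
X_bc = boxcox(X, Lam)

# Monotonic: the ordering within every column is preserved
assert np.all(np.diff(X_bc, axis=0) > 0)

# Compare the first gap (near 1) with the last gap (near 2) in each column
gaps = np.diff(X_bc, axis=0)
first_gap, last_gap = gaps[0], gaps[-1]
print("lambda > 1 stretches near 2:", last_gap[0] > first_gap[0])  # True (convex column)
print("lambda < 1 stretches near 1:", first_gap[1] > last_gap[1])  # True (concave column)
```
</preformat>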
<p>The parameter vector &#x0039B; can be optimized in several ways. A natural choice is to optimize each &#x003BB;<sub><italic>i</italic></sub> independently on its corresponding column <italic>X</italic><sub><italic>i</italic></sub> with a traditional criterion such as MLE (Box and Cox, <xref ref-type="bibr" rid="B2">1964</xref>) or the Bayesian approach (Sweeting, <xref ref-type="bibr" rid="B10">1984</xref>). This is referred to as the <italic>diagonal</italic> setting:</p>
<disp-formula id="E2"><label>(1)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>&#x0039B;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mi>L</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mi>L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mi>L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
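<p>As one possible instance of the <italic>diagonal</italic> setting in Equation (1), the sketch below takes <italic>L</italic> to be the negative Box-Cox log-likelihood (the classical MLE criterion, computed with <monospace>scipy.stats.boxcox_llf</monospace>) and minimizes it separately for each column with a bounded scalar search. The search interval [&#x02212;5, 5], the helper names, and the synthetic data are assumptions for illustration only.</p>
<preformat preformat-type="code">
```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import boxcox_llf


def neg_llf(lam, x):
    """Criterion L(lambda, x): negative Box-Cox log-likelihood of one column."""
    return -boxcox_llf(lam, x)


def diagonal_lambdas(X, bounds=(-5.0, 5.0)):
    """Optimize each lambda_i independently on its own column X[:, i] (Eq. 1)."""
    lambdas = []
    for i in range(X.shape[1]):
        res = minimize_scalar(neg_llf, bounds=bounds, args=(X[:, i],), method="bounded")
        lambdas.append(res.x)
    return np.array(lambdas)


rng = np.random.default_rng(42)
# Column 0 is log-normal, so its MLE lambda should be close to 0 (a log transform);
# column 1 is approximately normal with a positive mean
X = np.column_stack([rng.lognormal(size=500), rng.normal(loc=10.0, scale=1.0, size=500)])
print(diagonal_lambdas(X))
```
</preformat>
<p>Because each column is handled on its own, the cost grows linearly with <italic>p</italic>, and each one-dimensional search is cheap.</p>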
<p>where <italic>L</italic>(&#x000B7;, &#x000B7;) is the criterion to be minimized. The <italic>spherical</italic> setting is a simplification of this case in which a single scalar &#x003BB; is optimized and applied to every column:</p>
<disp-formula id="E3"><label>(2)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BB;</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
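<p>For the <italic>spherical</italic> setting in Equation (2), a single &#x003BB; is shared by all columns, so the per-column criteria are summed before a one-dimensional search. The sketch below again assumes the negative Box-Cox log-likelihood as <italic>L</italic> and the interval [&#x02212;5, 5]; both are illustrative choices rather than the criteria used in this study.</p>
<preformat preformat-type="code">
```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import boxcox_llf


def spherical_lambda(X, bounds=(-5.0, 5.0)):
    """Find one lambda shared by all columns by minimizing the summed criterion (Eq. 2)."""
    def total_loss(lam):
        # L(lambda, X_i) = negative Box-Cox log-likelihood, summed over columns i = 1..p
        return -sum(boxcox_llf(lam, X[:, i]) for i in range(X.shape[1]))

    return minimize_scalar(total_loss, bounds=bounds, method="bounded").x


rng = np.random.default_rng(0)
# Two log-normal columns, so a shared lambda close to 0 (a log transform) is expected
X = rng.lognormal(mean=0.0, sigma=1.0, size=(400, 2))
lam = spherical_lambda(X)
print(lam)
```
</preformat>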
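<p>The most general case, called <italic>full</italic>, optimizes all components of &#x0039B; jointly with respect to a criterion <italic>L</italic>(&#x0039B;, <italic>X</italic>) that depends on the entire data matrix:</p>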
<disp-formula id="E4"><label>(3)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>&#x0039B;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mi>L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x0039B;</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder 
class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mi>L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x0039B;</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">argmin</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mi>L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x0039B;</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>X</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
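<p>In the <italic>full</italic> setting, the criterion may couple the columns. As an illustrative assumption, the sketch below uses the negative profile log-likelihood of a multivariate normal model for the transformed data, in the spirit of multivariate Box-Cox estimation, and minimizes it jointly over &#x0039B; with the Nelder-Mead method of <monospace>scipy.optimize.minimize</monospace>. A classifier-dependent criterion, such as cross-validated accuracy, could be substituted for the hypothetical helper <monospace>joint_neg_llf</monospace>.</p>
<preformat preformat-type="code">
```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import boxcox


def joint_neg_llf(Lam, X):
    """Criterion L(Lambda, X) that couples all columns: the negative profile
    log-likelihood of a multivariate normal model for the transformed data."""
    n = X.shape[0]
    Z = boxcox(X, Lam)                          # column-wise Box-Cox with vector Lambda
    cov = np.cov(Z, rowvar=False, bias=True)    # MLE covariance estimate (divides by n)
    sign, logdet = np.linalg.slogdet(cov)
    if not sign > 0:
        return np.inf                           # degenerate covariance: reject this Lambda
    # Jacobian term of the transformation: sum_j (lambda_j - 1) * sum_i log(x_ij)
    jacobian = np.sum((Lam - 1.0) * np.log(X).sum(axis=0))
    return 0.5 * n * logdet - jacobian


def full_lambdas(X, x0=None):
    """Optimize all lambdas jointly (Eq. 3) with a derivative-free simplex search."""
    x0 = np.ones(X.shape[1]) if x0 is None else np.asarray(x0, dtype=float)
    return minimize(joint_neg_llf, x0, args=(X,), method="Nelder-Mead").x


rng = np.random.default_rng(1)
# Correlated log-normal features: log(X) is bivariate normal, so Lambda near [0, 0] is expected
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=500)
X = np.exp(Z)
print(full_lambdas(X))
```
</preformat>
<p>Unlike the <italic>diagonal</italic> and <italic>spherical</italic> settings, changing one &#x003BB;<sub><italic>i</italic></sub> here alters the covariance term shared by all columns, which is why the search runs over the whole vector at once.</p>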
</sec>
<sec id="s3">
<title>Motivation</title>
<p>To demonstrate the influence of the Box-Cox transformation, stratified 10-fold cross-validation with 5 repetitions was performed on several synthetic two-dimensional binary classification tasks for varying &#x0039B;. For each dimension <italic>i</italic>&#x02208;{1, 2}, &#x003BB;<sub><italic>i</italic></sub> took the values of an evenly spaced grid over [&#x02212;5, 5] with a step of 1, giving 11 &#x000D7; 11 &#x0003D; 121 accuracy estimates per classifier and dataset. Accuracy was measured for the classifiers described in <xref ref-type="table" rid="T1">Table 1</xref>, as implemented in the Python library scikit-learn (Pedregosa et al., <xref ref-type="bibr" rid="B9">2011</xref>); the table also lists the corresponding acronyms. Unless otherwise stated, default parameters were used and, where available, random seeds/states were set to 42. Python 3.6.0, scikit-learn 0.24.2, NumPy 1.19.5, and SciPy 1.5.4 were used.</p>
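<p>The evaluation grid and the resampling scheme described above can be set up with scikit-learn's <monospace>RepeatedStratifiedKFold</monospace>. This is a minimal sketch with variable names of our own; it only builds the 11 &#x000D7; 11 grid of (&#x003BB;<sub>1</sub>, &#x003BB;<sub>2</sub>) pairs and the 10 &#x000D7; 5 = 50 train/test splits.</p>
<preformat preformat-type="code">
```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Candidate lambdas per dimension: -5, -4, ..., 5 (spacing 1)
lambda_grid = np.arange(-5, 6)
# All (lambda_1, lambda_2) combinations for a two-dimensional dataset
Lambda_grid = np.array(np.meshgrid(lambda_grid, lambda_grid, indexing="ij")).reshape(2, -1).T

# Stratified 10-fold cross-validation, repeated 5 times, seeded with 42
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=42)

print(len(lambda_grid), Lambda_grid.shape, cv.get_n_splits())
```
</preformat>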
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Classifiers evaluated in the grid exploration and used to test the proposed optimization method on real-world data. Details can be found in Pedregosa et al. (<xref ref-type="bibr" rid="B9">2011</xref>).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Classifier</bold></th>
<th valign="top" align="left"><bold>Description</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Linear</td>
<td valign="top" align="left">Linear classifier with Perceptron loss, trained with stochastic gradient descent</td>
</tr>
<tr>
<td valign="top" align="left">KNN</td>
<td valign="top" align="left"><italic>k</italic>-nearest neighbors voting with <italic>k</italic> &#x0003D; 5</td>
</tr>
<tr>
<td valign="top" align="left">Bayesian</td>
<td valign="top" align="left">Gaussian naive Bayes classifier</td>
</tr>
<tr>
<td valign="top" align="left">SVC</td>
<td valign="top" align="left">C-Support Vector Classification with a radial basis function kernel</td>
</tr>
<tr>
<td valign="top" align="left">NN</td>
<td valign="top" align="left">Multi-layer neural network with 2 hidden layers of 10 neurons each, ReLU activation, and cross-entropy loss</td>
</tr>
</tbody>
</table>
</table-wrap>
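<p>One plausible mapping of <xref ref-type="table" rid="T1">Table 1</xref> onto scikit-learn estimators is sketched below. Apart from the settings listed in the table and the seed of 42, it keeps library defaults; the exact estimator classes and arguments used by the authors are an assumption here.</p>
<preformat preformat-type="code">
```python
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

SEED = 42  # random seed/state used wherever an estimator accepts one

classifiers = {
    # Linear model with perceptron loss, fitted by stochastic gradient descent
    "Linear": SGDClassifier(loss="perceptron", random_state=SEED),
    # Majority vote among the k = 5 nearest neighbors
    "KNN": KNeighborsClassifier(n_neighbors=5),
    # Gaussian naive Bayes
    "Bayesian": GaussianNB(),
    # C-support vector classification with an RBF kernel
    "SVC": SVC(kernel="rbf", random_state=SEED),
    # Two hidden layers of 10 ReLU units; MLPClassifier minimizes cross-entropy (log-loss)
    "NN": MLPClassifier(hidden_layer_sizes=(10, 10), activation="relu", random_state=SEED),
}
```
</preformat>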
<p><xref ref-type="fig" rid="F1">Figure 1</xref> shows the different datasets that were used to study the accuracy for different values of &#x0039B;.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Various artificial binary classification problems were created to study the influence of the Box-Cox transformation with a grid exploration. <bold>(A)</bold> Gaussian quantiles, <bold>(B)</bold> interleaving half circles, <bold>(C)</bold> isotropic Gaussian blobs, and <bold>(D)</bold> random dataset.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-877569-g0001.tif"/>
</fig>
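<p>Datasets resembling the four panels of <xref ref-type="fig" rid="F1">Figure 1</xref> can be generated with scikit-learn's dataset generators. The choice of generators, the sample size of 200, and the noise level are assumptions for illustration; the range standardization to [1, 2] follows the preprocessing described in the text.</p>
<preformat preformat-type="code">
```python
import numpy as np
from sklearn.datasets import make_blobs, make_classification, make_gaussian_quantiles, make_moons
from sklearn.preprocessing import MinMaxScaler

N = 200      # hypothetical sample size
SEED = 42

datasets = {
    "(A) Gaussian quantiles": make_gaussian_quantiles(
        n_samples=N, n_features=2, n_classes=2, random_state=SEED),
    "(B) half circles": make_moons(n_samples=N, noise=0.1, random_state=SEED),
    "(C) Gaussian blobs": make_blobs(n_samples=N, centers=2, n_features=2, random_state=SEED),
    "(D) random": make_classification(
        n_samples=N, n_features=2, n_informative=2, n_redundant=0,
        n_clusters_per_class=1, random_state=SEED),
}

# Range-standardize every feature to [1, 2] so the data is strictly positive for Box-Cox
scaled = {
    name: (MinMaxScaler(feature_range=(1, 2)).fit_transform(X), y)
    for name, (X, y) in datasets.items()
}
```
</preformat>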
<p><xref ref-type="fig" rid="F2">Figure 2</xref> shows the accuracy measurements for the exhaustive grid exploration of &#x0039B; on the random classification dataset (<xref ref-type="fig" rid="F1">Figure 1D</xref>). The corresponding pseudo-code is given in <xref ref-type="table" rid="TA1">Algorithm 1</xref>. Before applying the Box-Cox transformation, all datasets were range-standardized to [1, 2]. Ideally, no other preprocessing would be applied, so that only the behavior of the Box-Cox transformation is observed; however, the transformation requires strictly positive data, which the lower bound of 1 guarantees. The upper bound of 2 prevents the features from exploding when transformed with large values of &#x0039B;. The Box-Cox-transformed features were then standardized to zero mean and unit variance before being passed to the classifiers.</p>
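<p>A runnable Python sketch of the grid exploration in <xref ref-type="table" rid="TA1">Algorithm 1</xref> is given below. It is not the authors' code; it assumes that range standardization is done once on the full dataset, that the same fold assignment is reused for every grid cell, and that standard scaling is fitted on each training fold only. The usage example shrinks the grid and uses a single repetition to keep the run short.</p>
<preformat preformat-type="code">
```python
import numpy as np
from scipy.special import boxcox
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def grid_exploration(X, y, clf, lambdas=np.arange(-5, 6), n_splits=10, n_repeats=5, seed=42):
    """Mean cross-validated accuracy for every (lambda_1, lambda_2) pair (cf. Algorithm 1)."""
    X = MinMaxScaler(feature_range=(1, 2)).fit_transform(X)  # strictly positive, bounded input
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    A = np.zeros((len(lambdas), len(lambdas)))               # matrix to store accuracies
    for i, lam1 in enumerate(lambdas):
        for j, lam2 in enumerate(lambdas):
            # Box-Cox has no fitted state, so it can be applied before splitting
            X_bc = boxcox(X, np.array([lam1, lam2], dtype=float))
            scores = []
            for train, test in cv.split(X_bc, y):
                scaler = StandardScaler().fit(X_bc[train])     # fit scaling on the training fold only
                model = clone(clf).fit(scaler.transform(X_bc[train]), y[train])
                scores.append(model.score(scaler.transform(X_bc[test]), y[test]))
            A[i, j] = np.mean(scores)
    return A


X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=42)
A = grid_exploration(X, y, KNeighborsClassifier(n_neighbors=5), lambdas=np.arange(-2, 3), n_repeats=1)
print(A.round(3))
```
</preformat>
<p>Because the Box-Cox step has no fitted state, applying it before splitting does not leak information from the test folds; only the standard scaler is fitted per fold.</p>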
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Accuracy heatmaps generated by <xref ref-type="table" rid="TA1">Algorithm 1</xref> for the random dataset. The numbers 1, 2, and 3 mark the optimal solutions for the <italic>spherical</italic>, <italic>diagonal</italic>, and <italic>full</italic> optimization, respectively. Where multiple optimal solutions exist, only one is shown. The optimal parameter choice for the Box-Cox transformation depends on the classifier. The heatmaps show multiple local maxima, and the <italic>full</italic> optimization led to the best result. <bold>(A)</bold> Linear classifier, <bold>(B)</bold> KNN classifier, <bold>(C)</bold> Bayesian classifier, <bold>(D)</bold> SVC classifier, and <bold>(E)</bold> NN classifier.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-877569-g0002.tif"/>
</fig>
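<p>Given an accuracy heatmap such as those in <xref ref-type="fig" rid="F2">Figure 2</xref>, the <italic>full</italic> optimum is the best cell on the whole grid, whereas the <italic>spherical</italic> optimum is restricted to the main diagonal &#x003BB;<sub>1</sub> &#x0003D; &#x003BB;<sub>2</sub>. The sketch below reads off both; the accuracy values are toy numbers, not results from this study, and the <italic>diagonal</italic> case is omitted because it depends on the per-column criterion.</p>
<preformat preformat-type="code">
```python
import numpy as np


def spherical_and_full_optima(A, lambdas):
    """Read off optimal Box-Cox parameters from a heatmap A[i, j] = acc(lambdas[i], lambdas[j])."""
    # Full: best cell anywhere on the grid (ties resolved by the first occurrence)
    i, j = np.unravel_index(np.argmax(A), A.shape)
    full = (lambdas[i], lambdas[j])
    # Spherical: both dimensions share one lambda, so only the main diagonal is admissible
    k = np.argmax(np.diag(A))
    spherical = (lambdas[k], lambdas[k])
    return spherical, full


lambdas = np.array([-1, 0, 1])
# Toy accuracy values for illustration only (not results from this study)
A = np.array([[0.70, 0.72, 0.75],
              [0.71, 0.74, 0.80],
              [0.69, 0.73, 0.76]])
print(spherical_and_full_optima(A, lambdas))
```
</preformat>
<p>Since the full search covers a superset of the spherical candidates, its best accuracy on the grid can never be lower; as in the figure, only one solution is reported when there are ties (here, the first maximum found by <monospace>np.argmax</monospace>).</p>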
<table-wrap position="float" id="TA1">
<caption><p><bold>Algorithm 1</bold>: 2D Accuracy Grid Exploration</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>D</italic>: dataset (<italic>X, Y</italic>)</monospace></td>
</tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<inline-formula><mml:math id="M5"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mn>4</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>4</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>5</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<inline-formula><mml:math id="M6"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>-</mml:mo><mml:mn>4</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>4</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>5</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>repetitions</italic>&#x02190;5</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>kfolds</italic>&#x02190;10 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; number of folds in cross-validation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>: classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>A</italic>: matrix to store accuracies</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;for each item &#x003BB;<sub>1</sub> in <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for each item &#x003BB;<sub>2</sub> in <inline-formula><mml:math id="M8"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>a</italic>&#x02190;0 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; current accuracy</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for <italic>rep</italic> &#x0003D; 1 to <italic>repetitions</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for <italic>I</italic><sub><italic>train</italic></sub>, <italic>I</italic><sub><italic>test</italic></sub> in cvpartition(<italic>D</italic>, <italic>kfolds</italic>) <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <inline-formula><mml:math id="M9"><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">boxcox</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></monospace></td>
</tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <inline-formula><mml:math id="M10"><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; split data</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>Y</italic><sub><italic>train</italic></sub>&#x02190;<italic>Y</italic>[<italic>I</italic><sub><italic>train</italic></sub>]</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <inline-formula><mml:math id="M11"><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>Y</italic><sub><italic>test</italic></sub>&#x02190;<italic>Y</italic>[<italic>I</italic><sub><italic>test</italic></sub>]</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; Scaler = Standard_Scaler()</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>train</italic></sub>= Scaler.fit_transform(<italic>X</italic><sub><italic>train</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; train model</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>C</italic><sub><italic>t</italic></sub> &#x0003D; train(<italic>C, X</italic><sub><italic>train</italic></sub>, <italic>Y</italic><sub><italic>train</italic></sub>)</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>test</italic></sub>= Scaler.transform (<italic>X</italic><sub><italic>test</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate model</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>P</italic> &#x0003D; predict(<italic>C</italic><sub><italic>t</italic></sub>, <italic>X</italic><sub><italic>test</italic></sub>)</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>a</italic>&#x02190;<italic>a</italic> &#x0002B; accuracy (<italic>P, Y</italic><sub><italic>test</italic></sub>)</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <inline-formula><mml:math id="M12"><mml:mi>A</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:mo>&#x0002A;</mml:mo><mml:mi>k</mml:mi><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mi>d</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>for</bold></monospace></td></tr> 
</tbody>
</table>
</table-wrap>
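<p>As an illustration, the heatmap computation of Algorithm 1 can be sketched in Python with NumPy, SciPy, and scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper name <monospace>accuracy_heatmap</monospace>, the use of <monospace>scipy.special.boxcox</monospace> and <monospace>RepeatedStratifiedKFold</monospace>, the toy dataset, and the small grid in the usage lines are hypothetical choices, and the two features are assumed to be strictly positive (here, min-max scaled into [1, 2] beforehand).</p>
<preformat>
```python
import numpy as np
from scipy.special import boxcox
from sklearn.base import clone
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def accuracy_heatmap(X, y, clf, grid, k_folds=10, repetitions=5, seed=0):
    """Mean cross-validated accuracy A[i, j] for Lambda = [grid[i], grid[j]].

    Hypothetical helper; X must be strictly positive and have two features.
    """
    A = np.zeros((len(grid), len(grid)))
    # An integer random_state gives the same folds for every heatmap cell.
    cv = RepeatedStratifiedKFold(n_splits=k_folds, n_repeats=repetitions,
                                 random_state=seed)
    for i, lam1 in enumerate(grid):
        for j, lam2 in enumerate(grid):
            # Fixed per-feature Box-Cox parameters, broadcast over the columns.
            X_bc = boxcox(X, np.array([lam1, lam2], dtype=float))
            acc = 0.0
            for train_idx, test_idx in cv.split(X_bc, y):
                scaler = StandardScaler()
                X_train = scaler.fit_transform(X_bc[train_idx])  # fit on training fold only
                X_test = scaler.transform(X_bc[test_idx])
                model = clone(clf).fit(X_train, y[train_idx])
                acc += accuracy_score(y[test_idx], model.predict(X_test))
            A[i, j] = acc / (k_folds * repetitions)
    return A


# Usage on a toy dataset with a deliberately small grid and one repetition.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
X = MinMaxScaler(feature_range=(1, 2)).fit_transform(X)  # strictly positive features
A = accuracy_heatmap(X, y, LogisticRegression(), grid=[-2, 1, 4],
                     k_folds=5, repetitions=1)
print(np.round(A, 3))
```
</preformat>
<p>Because &#x0039B; is fixed for each heatmap cell, applying the transformation before splitting does not leak information from the test folds; only the standard scaler has to be fitted on the training folds.</p>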
<p>The heatmaps differed considerably between classifiers; hence, the effect of the Box-Cox transformation depended on the classifier itself. For example, &#x0039B; &#x0003D; [&#x02212;5, 4] gave the best performance for the SVC classifier but was almost the worst for the neural network. Likewise, &#x0039B; &#x0003D; [1, 5] was the best for the Bayesian classifier but performed poorly for the KNN classifier. This suggests that the optimal Box-Cox parameters depend not only on the data but also on the classifier, an observation also made by Bicego and Baldo (<xref ref-type="bibr" rid="B1">2016</xref>).</p>
<p>The heatmaps also showed multiple local maxima; hence, the optimization problem is non-convex. Similar observations were made for the other datasets, and the corresponding heatmaps are provided in <xref ref-type="supplementary-material" rid="SM1">Appendix A</xref>.</p>
<p>Finally, the <italic>full</italic> optimization clearly gave better results than the <italic>spherical</italic> and <italic>diagonal</italic> settings. The possible <italic>spherical</italic> configurations lie on the diagonal of the heatmap (i.e., &#x0039B;&#x02208;{[&#x02212;5, &#x02212;5], [&#x02212;4, &#x02212;4], &#x02026;, [5, 5]}). The <italic>diagonal</italic> setting can be illustrated by first fixing &#x003BB;<sub><italic>i</italic></sub> &#x0003D; 1 in one direction and optimizing in the other direction, and then vice versa; the two independently optimized values are then combined. Possible optimal solutions for the <italic>spherical, diagonal</italic>, and <italic>full</italic> optimization are indicated by the numbers 1, 2, and 3, respectively. If the <italic>diagonal</italic> optimization had multiple optimal values in one direction, the one that led to the higher final accuracy was used.</p>
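<p>The three settings can be read off an accuracy heatmap as in the following Python/NumPy sketch. The helper <monospace>pick_lambdas</monospace> and the 3 &#x000D7; 3 accuracy grid in the usage lines are hypothetical, made-up values chosen only to show how combining two independently optimized directions can land on a poor cell; ties are broken by taking the first maximum rather than by the rule described above.</p>
<preformat>
```python
import numpy as np


def pick_lambdas(A, grid):
    """Lambda chosen by the spherical, diagonal, and full settings.

    A[i, j] is the accuracy for Lambda = [grid[i], grid[j]]; grid must contain 1.
    Ties are broken by taking the first maximum (np.argmax).
    """
    grid = list(grid)
    one = grid.index(1)  # index of lambda = 1, i.e. "no transformation"

    # Spherical: a single shared lambda, i.e. only the diagonal of the heatmap.
    k = int(np.argmax(np.diag(A)))
    spherical = [grid[k], grid[k]]

    # Diagonal: fix one direction at 1, optimize the other, then combine.
    lam1 = grid[int(np.argmax(A[:, one]))]  # lambda_2 fixed at 1
    lam2 = grid[int(np.argmax(A[one, :]))]  # lambda_1 fixed at 1
    diagonal = [lam1, lam2]

    # Full: joint search over every cell of the heatmap.
    i, j = np.unravel_index(int(np.argmax(A)), A.shape)
    full = [grid[int(i)], grid[int(j)]]
    return spherical, diagonal, full


# Hypothetical toy accuracies (rows: lambda_1, columns: lambda_2).
grid = [-2, 1, 3]
A = np.array([
    [0.80, 0.86, 0.79],   # lambda_1 = -2
    [0.83, 0.84, 0.85],   # lambda_1 = 1
    [0.82, 0.83, 0.81],   # lambda_1 = 3
])
spherical, diagonal, full = pick_lambdas(A, grid)
print(spherical, diagonal, full)  # [1, 1] [-2, 3] [-2, 1]
print(A[0, 2], A[1, 1])           # 0.79 0.84: the diagonal pick is worse than no transformation
```
</preformat>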
<p><xref ref-type="table" rid="T2">Table 2</xref> summarizes the accuracy heatmaps for all four datasets in <xref ref-type="fig" rid="F1">Figure 1</xref>. It shows the accuracy before applying the Box-Cox transformation and after applying it with the best configuration of &#x0039B;. The numbers are rounded to one decimal place. The accuracy before applying the Box-Cox transformation corresponds to a Box-Cox transformation with &#x0039B; &#x0003D; [1, 1]: with &#x003BB; &#x0003D; 1, the transformation (<italic>x</italic> &#x02212; 1)/1 &#x0003D; <italic>x</italic> &#x02212; 1 only shifts each feature by &#x02212;1, which the subsequent standard scaling removes, so it does not influence the classification result.</p>
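<p>The statement that &#x0039B; &#x0003D; [1, 1] leaves the classification unchanged can be checked numerically. The following sketch assumes the standard Box-Cox definition, (<italic>x</italic><sup>&#x003BB;</sup> &#x02212; 1)/&#x003BB; for &#x003BB; &#x02260; 0 and log <italic>x</italic> for &#x003BB; &#x0003D; 0, and uses a hand-written standardization in place of a library scaler; the data are random and purely illustrative.</p>
<preformat>
```python
import numpy as np


def boxcox(x, lam):
    """Box-Cox transform of strictly positive x: (x**lam - 1) / lam, or log(x) if lam == 0."""
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def standardize(x):
    # Zero mean and unit variance per feature, as a standard scaler would produce.
    return (x - x.mean(axis=0)) / x.std(axis=0)


rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(100, 2))    # features already in [1, 2]

X_bc = boxcox(X, 1)                          # lambda = 1 gives X - 1
print(np.allclose(X_bc, X - 1.0))            # True: only a shift by -1
print(np.allclose(standardize(X_bc), standardize(X)))  # True: scaling removes the shift
```
</preformat>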
<table-wrap position="float" id="T2">
<label>Table 2</label>
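<caption><p>Accuracy of five classifiers before and after applying the Box-Cox transformation using three optimization strategies. &#x003B4; denotes the change in accuracy (in percentage points) relative to the accuracy before the transformation.</p></caption>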
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Classifier</bold></th>
<th valign="top" align="center"><bold>Acc before [%]</bold></th>
<th valign="top" align="center"><bold>Acc after [%]</bold></th>
<th valign="top" align="center"><bold>Full (&#x003B4;) [%]</bold></th>
<th valign="top" align="center"><bold>Spherical (&#x003B4;) [%]</bold></th>
<th valign="top" align="center"><bold>Diagonal (&#x003B4;) [%]</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="6"><bold>Gaussian quantiles</bold></td>
</tr>
<tr>
<td valign="top" align="left">Linear</td>
<td valign="top" align="center">49.0</td>
<td valign="top" align="center">54.1</td>
<td valign="top" align="center">6.1</td>
<td valign="top" align="center">5.2</td>
<td valign="top" align="center">5.2</td>
</tr>
<tr>
<td valign="top" align="left">KNN</td>
<td valign="top" align="center">96.2</td>
<td valign="top" align="center">96.6</td>
<td valign="top" align="center">0.4</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">0.3</td>
</tr>
<tr>
<td valign="top" align="left">Bayesian</td>
<td valign="top" align="center">96.8</td>
<td valign="top" align="center">96.9</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">0.1</td>
</tr>
<tr>
<td valign="top" align="left">SVC</td>
<td valign="top" align="center">98.8</td>
<td valign="top" align="center">99.2</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">0.2</td>
</tr>
<tr>
<td valign="top" align="left">NN</td>
<td valign="top" align="center">98.9</td>
<td valign="top" align="center">99.2</td>
<td valign="top" align="center">0.3</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">0.1</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6"><bold>Interleaving half circles</bold></td>
</tr>
<tr>
<td valign="top" align="left">Linear</td>
<td valign="top" align="center">83.5</td>
<td valign="top" align="center">84.8</td>
<td valign="top" align="center">1.3</td>
<td valign="top" align="center">0.4</td>
<td valign="top" align="center">&#x02212;0.4</td>
</tr>
<tr>
<td valign="top" align="left">KNN</td>
<td valign="top" align="center">100.0</td>
<td valign="top" align="center">100.0</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">0.0</td>
</tr>
<tr>
<td valign="top" align="left">Bayesian</td>
<td valign="top" align="center">87.4</td>
<td valign="top" align="center">89.3</td>
<td valign="top" align="center">1.9</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">1.9</td>
</tr>
<tr>
<td valign="top" align="left">SVC</td>
<td valign="top" align="center">99.7</td>
<td valign="top" align="center">99.7</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">0.0</td>
</tr>
<tr>
<td valign="top" align="left">NN</td>
<td valign="top" align="center">99.9</td>
<td valign="top" align="center">100.0</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">0.0</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6"><bold>Isotropic Gaussian blobs</bold></td>
</tr>
<tr>
<td valign="top" align="left">Linear</td>
<td valign="top" align="center">68.1</td>
<td valign="top" align="center">70.5</td>
<td valign="top" align="center">2.4</td>
<td valign="top" align="center">1.9</td>
<td valign="top" align="center">&#x02212;0.8</td>
</tr>
<tr>
<td valign="top" align="left">KNN</td>
<td valign="top" align="center">74.6</td>
<td valign="top" align="center">75.2</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.2</td>
</tr>
<tr>
<td valign="top" align="left">Bayesian</td>
<td valign="top" align="center">76.1</td>
<td valign="top" align="center">76.4</td>
<td valign="top" align="center">0.3</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">&#x02212;1.0</td>
</tr>
<tr>
<td valign="top" align="left">SVC</td>
<td valign="top" align="center">75.7</td>
<td valign="top" align="center">76.1</td>
<td valign="top" align="center">0.4</td>
<td valign="top" align="center">0.4</td>
<td valign="top" align="center">0.1</td>
</tr>
<tr>
<td valign="top" align="left">NN</td>
<td valign="top" align="center">76.0</td>
<td valign="top" align="center">76.5</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.3</td>
<td valign="top" align="center">0.3</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6"><bold>Random dataset</bold></td>
</tr>
<tr>
<td valign="top" align="left">Linear</td>
<td valign="top" align="center">77.3</td>
<td valign="top" align="center">80.5</td>
<td valign="top" align="center">3.5</td>
<td valign="top" align="center">2.7</td>
<td valign="top" align="center">1.7</td>
</tr>
<tr>
<td valign="top" align="left">KNN</td>
<td valign="top" align="center">87.2</td>
<td valign="top" align="center">87.7</td>
<td valign="top" align="center">0.5</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">0.5</td>
</tr>
<tr>
<td valign="top" align="left">Bayesian</td>
<td valign="top" align="center">85.9</td>
<td valign="top" align="center">86.6</td>
<td valign="top" align="center">0.8</td>
<td valign="top" align="center">0.6</td>
<td valign="top" align="center">0.6</td>
</tr>
<tr>
<td valign="top" align="left">SVC</td>
<td valign="top" align="center">87.2</td>
<td valign="top" align="center">87.5</td>
<td valign="top" align="center">0.3</td>
<td valign="top" align="center">0.0</td>
<td valign="top" align="center">&#x02212;0.2</td>
</tr>
<tr>
<td valign="top" align="left">NN</td>
<td valign="top" align="center">87.7</td>
<td valign="top" align="center">87.9</td>
<td valign="top" align="center">0.2</td>
<td valign="top" align="center">0.2</td>
<td valign="top" align="center">0.0</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The Acc after column corresponds to full optimization, which gave the best results. Spherical optimization achieved smaller but consistent improvements. Diagonal optimization achieved some gains but sometimes decreased the accuracy</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>The linear classifier benefited most from the Box-Cox transformation. The other classifiers also benefited, unless the classification result was already almost perfect before applying the transformation (KNN and NN in the interleaving half circles dataset). Thus, with <italic>full</italic> optimization, the Box-Cox transformation never decreased the accuracy and improved it in most cases.</p>
<p>In most cases, <italic>spherical</italic> optimization did not achieve the same improvements as <italic>full</italic> optimization. This is expected because a single &#x003BB; shared by all features offers fewer degrees of freedom. <italic>Diagonal</italic> optimization, in contrast, sometimes resulted in even lower accuracy than no transformation at all. This was observed, for example, for the linear classifier in the interleaving half circles dataset. Optimizing each direction separately increased the accuracy (fixing &#x003BB;<sub>1</sub> &#x0003D; 1 led to &#x003BB;<sub>2</sub> &#x0003D; 4 with an improvement of 0.68%, and fixing &#x003BB;<sub>2</sub> &#x0003D; 1 led to &#x003BB;<sub>1</sub> &#x0003D; &#x02212;3 with an improvement of 0.54%). However, combining the two independent results led to &#x0039B; &#x0003D; [&#x02212;3, 4] and a change in accuracy of &#x02212;0.36%. Such a combination can become arbitrarily bad because its outcome cannot be predicted from the independent optimizations.</p>
<p>To further study the behavior of the different optimization methods, the random dataset (<xref ref-type="fig" rid="F1">Figure 1D</xref>) was regenerated 10 times with different random seeds, and the accuracy of each optimization method was measured for the five classifiers listed in <xref ref-type="table" rid="T1">Table 1</xref> using stratified 10-fold cross-validation with 5 repetitions. The average accuracies are given in <xref ref-type="table" rid="T3">Table 3</xref>.</p>
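<p>The evaluation protocol can be sketched with scikit-learn as follows. This is a hedged sketch, not the code used to produce Table 3: approximating the random dataset with <monospace>make_classification</monospace>, using <monospace>LogisticRegression</monospace> as a stand-in for the linear classifier, the <monospace>clip=True</monospace> safeguard on the min-max scaler, and the helper names <monospace>apply_boxcox</monospace> and <monospace>mean_accuracy</monospace> are all assumptions, and the usage lines use only two seeds to keep the run short.</p>
<preformat>
```python
import numpy as np
from scipy.special import boxcox
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, StandardScaler


def apply_boxcox(X, lam):
    # Fixed per-feature Box-Cox parameters, broadcast over the columns of X.
    return boxcox(X, lam)


def mean_accuracy(lam, n_seeds=10, n_splits=10, n_repeats=5):
    """Average CV accuracy of a fixed-Lambda pipeline over regenerated datasets."""
    scores = []
    for seed in range(n_seeds):
        # Assumption: the "random dataset" is approximated with make_classification.
        X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                                   n_redundant=0, random_state=seed)
        pipe = make_pipeline(
            MinMaxScaler(feature_range=(1, 2), clip=True),  # positive inputs for Box-Cox
            FunctionTransformer(apply_boxcox,
                                kw_args={"lam": np.asarray(lam, dtype=float)}),
            StandardScaler(),
            LogisticRegression(),  # assumption: stand-in for the linear classifier
        )
        cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                     random_state=seed)
        scores.append(cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean())
    return float(np.mean(scores))


before = mean_accuracy([1, 1], n_seeds=2)   # Lambda = [1, 1] acts as "no transformation"
after = mean_accuracy([-2, 3], n_seeds=2)   # an arbitrary Lambda, for illustration only
print(f"before: {before:.3f}, after: {after:.3f}")
```
</preformat>
<p>Placing all scalers inside the pipeline ensures that they are refitted on the training folds only.</p>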
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Average accuracy of five classifiers before and after applying the Box-Cox transformation using three optimization strategies, for the random dataset (<xref ref-type="fig" rid="F1">Figure 1D</xref>) regenerated 10 times with different random seeds. &#x003B4; denotes the change in accuracy (in percentage points) relative to the accuracy before the transformation.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Classifier</bold></th>
<th valign="top" align="center"><bold>Acc before [%]</bold></th>
<th valign="top" align="center"><bold>Acc after [%]</bold></th>
<th valign="top" align="center"><bold>Full (&#x003B4;) [%]</bold></th>
<th valign="top" align="center"><bold>Spherical (&#x003B4;) [%]</bold></th>
<th valign="top" align="center"><bold>Diagonal (&#x003B4;) [%]</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Linear</td>
<td valign="top" align="center">84.1</td>
<td valign="top" align="center">86.7</td>
<td valign="top" align="center">2.6</td>
<td valign="top" align="center">1.8</td>
<td valign="top" align="center">0.3</td>
</tr>
<tr>
<td valign="top" align="left">KNN</td>
<td valign="top" align="center">92.0</td>
<td valign="top" align="center">92.4</td>
<td valign="top" align="center">0.4</td>
<td valign="top" align="center">0.2</td>
<td valign="top" align="center">0.3</td>
</tr>
<tr>
<td valign="top" align="left">Bayesian</td>
<td valign="top" align="center">89.0</td>
<td valign="top" align="center">90.2</td>
<td valign="top" align="center">1.3</td>
<td valign="top" align="center">0.8</td>
<td valign="top" align="center">1.0</td>
</tr>
<tr>
<td valign="top" align="left">SVC</td>
<td valign="top" align="center">91.9</td>
<td valign="top" align="center">92.3</td>
<td valign="top" align="center">0.4</td>
<td valign="top" align="center">0.2</td>
<td valign="top" align="center">0.2</td>
</tr>
<tr>
<td valign="top" align="left">NN</td>
<td valign="top" align="center">92.3</td>
<td valign="top" align="center">92.6</td>
<td valign="top" align="center">0.3</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">0.1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The Acc after column corresponds to full optimization, which gave the best results. Spherical optimization achieved smaller but consistent improvements. Diagonal optimization achieved smaller gains</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p><italic>Full</italic> optimization consistently led to the largest improvement in accuracy. Both <italic>spherical</italic> and <italic>diagonal</italic> optimization achieved an improvement for all classifiers. <italic>Diagonal</italic> optimization was better than or equal to <italic>spherical</italic> optimization for all classifiers except the linear classifier.</p>
</sec>
<sec id="s4">
<title>Model and Optimization</title>
<p>The previous section showed that <italic>full</italic> optimization led to the best improvements and that the optimization depended on the classifier. Therefore, we propose a procedure for classifier-dependent, multi-dimensional, non-convex optimization. First, the general setup is described. Then, a naive optimization is introduced; it was used as a baseline but suffers from the curse of dimensionality. Next, an iterative optimization that addresses the dimensionality problem is described. Finally, various techniques for improving the iterative procedure are presented.</p>
<p>The general setup, used with the different optimization techniques, consisted of a training function and a prediction function, as shown in <xref ref-type="table" rid="TA2">Algorithm 2</xref>. First, a model was trained to find the optimal parameter, &#x0039B;, of the Box-Cox transformation for a given classifier. Then, the prediction function used the optimized Box-Cox parameter, &#x0039B;, to create predictions.</p>
<table-wrap position="float" id="TA2">
<caption><p><bold>Algorithm 2</bold> : Setup</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic><sub><italic>train</italic></sub>: training features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Y</italic><sub><italic>train</italic></sub>: training class labels</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic><sub><italic>test</italic></sub>: testing features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>: classifier/model to optimize for</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>M</italic>: min-max scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>S</italic>: standard scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0039B;, <italic>C, M, S</italic>&#x02190; fit_model(<italic>X</italic><sub><italic>train</italic></sub>, <italic>Y</italic><sub><italic>train</italic></sub>, <italic>C</italic>)</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Y</italic><sub><italic>test</italic></sub>&#x02190; prediction(<italic>X</italic><sub><italic>test</italic></sub>, &#x0039B;, <italic>C, M, S</italic>)</monospace></td></tr> 
</tbody>
</table>
</table-wrap>
<p>The training procedure is given in <xref ref-type="table" rid="TA3">Algorithm 3</xref>. It requires the features, the corresponding class labels, a classifier, and an optimization procedure for &#x0039B;. Suitable optimization procedures are given in <xref ref-type="table" rid="TA5">Algorithm 5</xref> (restricted to 2-dimensional data) and <xref ref-type="table" rid="TA6">Algorithm 6</xref>, with further improvements to the latter in Algorithms <xref ref-type="table" rid="TA7">6.1</xref>, <xref ref-type="table" rid="TA8">6.2</xref>, and <xref ref-type="table" rid="TA9">6.3</xref>. The procedure first scaled the data into the range [1, 2], to ensure that the features were positive so that the Box-Cox transformation could be applied, and to prevent the transformed features from exploding for larger values of &#x0039B;. Then, an optimization procedure was applied to find suitable values for &#x0039B;; as described in the previous section, these depended on the classifier itself. Next, the Box-Cox transformation was applied to the features with the optimized &#x0039B;. Then, the data were standard scaled to help classifiers that depend on a distance measure. Finally, the classifier was trained.</p>
<table-wrap position="float" id="TA3">
<caption><p><bold>Algorithm 3</bold> : fit_model(<italic>X, Y, C</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace> 1:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic>: features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 2:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Y</italic>: corresponding class labels</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 3:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>: classifier to optimize for</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 4:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Opt</italic>: optimization procedure for optimizing &#x0039B;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 5:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>M</italic>: min-max scaler into the range [1, 2]</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 6:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>S</italic>: standard scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 7:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 8:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic><sub><italic>M</italic></sub>&#x02190;<italic>M</italic>.<italic>fit</italic>_<italic>transform</italic>(<italic>X</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; fit min-max scaler and apply it</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 9:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0039B;&#x02190;<italic>Opt</italic>(<italic>X</italic><sub><italic>M</italic></sub>, <italic>Y, C</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; find optimized &#x0039B;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 10:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; (e.g. <xref ref-type="table" rid="TA6">Algorithm 6</xref>, <xref ref-type="table" rid="TA7">6.1</xref>, <xref ref-type="table" rid="TA8">6.2</xref>, <xref ref-type="table" rid="TA9">6.3</xref>, and <xref ref-type="table" rid="TA5">5</xref> for 2D)</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 11:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic><sub><italic>B</italic></sub>&#x02190; boxcox(<italic>X</italic><sub><italic>M</italic></sub>, &#x0039B;) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; apply Box-Cox transformation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 12:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic><sub><italic>S</italic></sub>&#x02190;<italic>S</italic>.<italic>fit</italic>_<italic>transform</italic>(<italic>X</italic><sub><italic>B</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; fit standard scaler and apply it</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 13:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>.train(<italic>X</italic><sub><italic>S</italic></sub>, <italic>Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; train classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 14:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 15:&#x000A0;&#x000A0;&#x000A0;&#x000A0;return &#x0039B;, <italic>C, M, S</italic></monospace></td></tr> 
</tbody>
</table>
</table-wrap>
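<p>A minimal Python sketch of Algorithm 3, assuming scikit-learn scalers and <monospace>scipy.special.boxcox</monospace>, is given below. The optimizer interface <monospace>opt(X_M, y, clf)</monospace>, the placeholder optimizer in the usage lines, and the <monospace>clip=True</monospace> option (which keeps unseen data inside [1, 2] at prediction time) are assumptions rather than details stated in the text.</p>
<preformat>
```python
import numpy as np
from scipy.special import boxcox
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def fit_model(X, y, clf, opt):
    """Sketch of Algorithm 3; returns (Lambda, trained clf, fitted M, fitted S)."""
    M = MinMaxScaler(feature_range=(1, 2), clip=True)  # clip=True: assumed safeguard
    X_M = M.fit_transform(X)                            # features in [1, 2], positive
    lam = np.asarray(opt(X_M, y, clf), dtype=float)     # classifier-dependent Lambda
    X_B = boxcox(X_M, lam)                              # per-feature Box-Cox
    S = StandardScaler()
    X_S = S.fit_transform(X_B)                          # fit standard scaler and apply it
    clf.fit(X_S, y)                                     # train classifier
    return lam, clf, M, S


# Usage with a placeholder optimizer that always returns Lambda = [1, 1];
# any optimizer with the signature opt(X_M, y, clf) could be passed instead.
X, y = make_moons(n_samples=100, noise=0.1, random_state=0)
lam, clf, M, S = fit_model(X, y, LogisticRegression(),
                           opt=lambda X_M, y, c: [1, 1])
print(lam, clf.coef_.shape)
```
</preformat>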
<p>The prediction procedure is presented in <xref ref-type="table" rid="TA4">Algorithm 4</xref>. It required the features, the parameter &#x0039B; optimized during training, a fitted classifier, a fitted min-max scaler, and a fitted standard scaler. The method first min-max scaled the data, then applied the Box-Cox transformation with the given &#x0039B;, then applied standard scaling, and finally predicted the labels with the given classifier.</p>
<table-wrap position="float" id="TA4">
<caption><p><bold>Algorithm 4</bold> : prediction(<italic>X</italic>, &#x0039B;, <italic>C, M, S</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic>: features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0039B;: optimized parameters of Box-Cox transformation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>: trained classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>M</italic>: fitted min-max scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>S</italic>: fitted standard scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic><sub><italic>M</italic></sub>&#x02190;<italic>M</italic>.<italic>transform</italic>(<italic>X</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; apply fitted min-max scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic><sub><italic>B</italic></sub>&#x02190; boxcox(<italic>X</italic><sub><italic>M</italic></sub>, &#x0039B;) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; apply Box-Cox transformation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic><sub><italic>S</italic></sub>&#x02190;<italic>S</italic>.<italic>transform</italic>(<italic>X</italic><sub><italic>B</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; apply fitted standard scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Y</italic>&#x02190;<italic>C</italic>.predict(<italic>X</italic><sub><italic>S</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; predict labels with trained classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;return <italic>Y</italic></monospace></td></tr> 
</tbody>
</table>
</table-wrap>
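<p>Algorithm 4 can be sketched in the same way. The point the sketch illustrates is that every transformation reuses the objects fitted during training; nothing is refitted on the test features. The fitted objects in the usage lines are built inline with an illustrative &#x0039B; &#x0003D; [1, 1] only so that the snippet runs on its own; in practice they would come from the training procedure, and <monospace>clip=True</monospace> is an assumed safeguard.</p>
<preformat>
```python
import numpy as np
from scipy.special import boxcox
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def prediction(X, lam, clf, M, S):
    """Sketch of Algorithm 4: reuse the fitted scalers, never refit on test data."""
    X_M = M.transform(X)        # fitted min-max scaler (training range -> [1, 2])
    X_B = boxcox(X_M, lam)      # Box-Cox with the Lambda found during training
    X_S = S.transform(X_B)      # fitted standard scaler
    return clf.predict(X_S)


# Minimal fitted objects so the sketch runs; in practice they come from training.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
lam = np.array([1.0, 1.0])  # illustrative Lambda
M = MinMaxScaler(feature_range=(1, 2), clip=True).fit(X_train)
S = StandardScaler().fit(boxcox(M.transform(X_train), lam))
clf = LogisticRegression().fit(S.transform(boxcox(M.transform(X_train), lam)), y_train)

y_pred = prediction(X_test, lam, clf, M, S)
print("test accuracy:", np.mean(y_pred == y_test))
```
</preformat>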
<p>To follow the notation introduced earlier in this paper, the optimization criterion <italic>L</italic>(&#x000B7;, &#x000B7;) is defined as 1&#x02212;<italic>ACC</italic>, so that minimizing <italic>L</italic> maximizes the accuracy <italic>ACC</italic>. The first optimization procedure used in the training function was a grid search: a set of candidate values was specified for every &#x003BB;<sub><italic>i</italic></sub>, and all combinations were tried. Treating the cost of model fitting and prediction as constant, this exhaustive search runs in <italic>O</italic>(<italic>L</italic><sup><italic>p</italic></sup>) time, where <italic>L</italic> is the number of candidate values and <italic>p</italic> is the number of features; that is, its cost grows exponentially with the number of features. Therefore, the grid search suffers from the curse of dimensionality: for example, trying 10 values for each of 10 features requires 10<sup>10</sup> (10 billion) evaluations, which quickly becomes infeasible. Nevertheless, it was used as a reference model for lower-dimensional datasets. The pseudo-code for the 2-dimensional case is given in <xref ref-type="table" rid="TA5">Algorithm 5</xref> and was used directly as the optimization procedure <italic>Opt</italic> in line 9 of <xref ref-type="table" rid="TA3">Algorithm 3</xref>.</p>
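<p>The grid search generalizes to <italic>p</italic> features with <monospace>itertools.product</monospace>, which also makes the <italic>O</italic>(<italic>L</italic><sup><italic>p</italic></sup>) cost explicit. In the following sketch, the helper name <monospace>grid_search</monospace>, the 5-fold cross-validated accuracy used to estimate <italic>ACC</italic> inside the search, and the small grid {&#x02212;1, 1, 3} in the usage lines are assumptions; the grid {&#x02212;5, &#x02026;, 5} of Algorithm 5 would require 11<sup>2</sup> &#x0003D; 121 evaluations in two dimensions.</p>
<preformat>
```python
import itertools

import numpy as np
from scipy.special import boxcox
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def grid_search(X_M, y, clf, grid=range(-5, 6), cv=5):
    """Exhaustive search over all len(grid)**p Lambda vectors, minimizing L = 1 - ACC.

    X_M must be strictly positive (e.g. min-max scaled into [1, 2]).
    Returns the best Lambda, its accuracy, and the number of evaluations.
    """
    best_lam, best_loss, n_evals = None, np.inf, 0
    for lam in itertools.product(grid, repeat=X_M.shape[1]):
        X_B = boxcox(X_M, np.asarray(lam, dtype=float))
        model = make_pipeline(StandardScaler(), clf)  # scaler refitted in every fold
        loss = 1.0 - cross_val_score(model, X_B, y, cv=cv).mean()
        n_evals += 1
        if best_loss > loss:  # strict improvement keeps the first optimum on ties
            best_lam, best_loss = list(lam), loss
    return best_lam, 1.0 - best_loss, n_evals


X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
X_M = MinMaxScaler(feature_range=(1, 2)).fit_transform(X)
lam, acc, n_evals = grid_search(X_M, y, LogisticRegression(), grid=[-1, 1, 3])
print(lam, round(acc, 3), n_evals)  # n_evals is 3**2 = 9
print(11 ** 2, 10 ** 10)            # 121 (Algorithm 5 grid, 2D) and 10 billion (10 values, 10 features)
```
</preformat>
<p>With its extra arguments left at their defaults, the function fits the <italic>Opt</italic>(<italic>X</italic><sub><italic>M</italic></sub>, <italic>Y, C</italic>) slot in line 9 of Algorithm 3.</p>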
<table-wrap position="float" id="TA5">
<caption><p><bold>Algorithm 5</bold> : 2D grid search(<italic>X, Y, C</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic>: features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Y</italic>: corresponding class labels</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>: classifier to optimize for</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>S</italic>: standard scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0039B;<sub><italic>opt</italic></sub>: optimized Box-Cox parameters</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>grid</italic>&#x02190;{&#x02212;5, &#x02212;4, &#x02026;, 4, 5} &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; candidate values for each direction</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>A</italic>&#x02190;0 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; best accuracy obtained during search</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;for each &#x003BB;<sub>1</sub> in <italic>grid</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for each &#x003BB;<sub>2</sub> in <italic>grid</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>&#x02190;[&#x003BB;<sub>1</sub>, &#x003BB;<sub>2</sub>]</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>B</italic></sub>&#x02190; boxcox(<italic>X</italic>, &#x0039B;<sub><italic>tmp</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; apply Box-Cox transformation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>S</italic></sub>&#x02190;<italic>S</italic>.<italic>fit</italic>_<italic>transform</italic>(<italic>X</italic><sub><italic>B</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; fit standard scaler and apply it</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>C</italic>.train(<italic>X</italic><sub><italic>S</italic></sub>, <italic>Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; train classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>P</italic>&#x02190;<italic>C</italic>.predict(<italic>X</italic><sub><italic>S</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic><sub><italic>tmp</italic></sub>&#x02190; accuracy(<italic>P, Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate accuracy</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>A</italic><sub><italic>tmp</italic></sub>&#x0003E;<italic>A</italic> <bold>then</bold> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; update &#x0039B;<sub><italic>opt</italic></sub> if accuracy is improved</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic>&#x02190;<italic>A</italic><sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>opt</italic></sub>&#x02190;&#x0039B;<sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;return &#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr>
</tbody>
</table>
</table-wrap>
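<p><xref ref-type="table" rid="TA5">Algorithm 5</xref> fixes <italic>p</italic> &#x0003D; 2 by using two nested loops. The sketch below shows one way to write the same exhaustive search for any <italic>p</italic> in Python. It uses <monospace>itertools.product</monospace> to enumerate all <italic>gridsize</italic><sup><italic>p</italic></sup> combinations. To stay independent of a specific model, it assumes a generic <monospace>score</monospace> callable that returns the accuracy for a candidate &#x0039B;, for example the training accuracy after the Box-Cox transformation, scaling, and classifier fitting of Algorithm 5. The toy score in the usage lines is invented for illustration.</p>

```python
from itertools import product


def grid_search(score, p, grid=tuple(range(-5, 6))):
    """Exhaustive grid search over all len(grid)**p candidate vectors."""
    best_lams, best_score, evaluations = None, float("-inf"), 0
    for candidate in product(grid, repeat=p):
        evaluations += 1
        s = score(list(candidate))
        if s > best_score:  # keep only strict improvements, as in Algorithm 5
            best_lams, best_score = list(candidate), s
    return best_lams, best_score, evaluations


def toy_score(lams):
    """Invented score with a single peak at [2, -1]; stands in for classifier accuracy."""
    return -((lams[0] - 2) ** 2 + (lams[1] + 1) ** 2)


print(grid_search(toy_score, p=2))  # ([2, -1], 0, 121)
```

The evaluation counter makes the exponential cost visible: with the 11-point grid, <italic>p</italic> &#x0003D; 2 needs 121 evaluations and <italic>p</italic> &#x0003D; 3 already needs 1,331.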
<p>To address the dimensionality problem of the grid search, we proposed an iterative optimization. First, an initial point <italic>G</italic>&#x02208;&#x0211D;<sup><italic>p</italic></sup> for &#x0039B; was specified. Starting from this point, all directions except one were fixed. The free direction was then optimized with a 1-dimensional grid search over a predefined set of candidate values, and the value giving the highest accuracy was kept. Next, the following direction was freed and all other directions were fixed, and the best value was again selected with a 1-dimensional grid search. This procedure was repeated until every direction had been optimized once, which we refer to as one <italic>epoch</italic>. The next epoch restarted from the previously optimized solution instead of the initial point <italic>G</italic>. The pseudocode for this iterative optimization, referred to as <italic>Iterative grid search</italic>, is given in <xref ref-type="table" rid="TA6">Algorithm 6</xref>. It was used directly as the optimization procedure for training in line 9 of <xref ref-type="table" rid="TA3">Algorithm 3</xref>. Assuming that model fitting and prediction take constant time, the advantage of this method is that its cost, <italic>O</italic>(<italic>epochs</italic>&#x000B7;<italic>p</italic>&#x000B7;<italic>gridsize</italic>), scales linearly in the number of features <italic>p</italic>, where <italic>gridsize</italic> denotes the number of points used for the 1-dimensional grid search. The procedure has three hyperparameters that influence the result: the initial starting point <italic>G</italic>, the number of epochs, and the grid.</p>
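<p>A few lines of Python can make the difference between the two cost expressions concrete. The code simply counts the model fits that each method performs, under the same constant-cost assumption. The grid size of 11 matches the grid {&#x02212;5, &#x02212;4, &#x02026;, 4, 5}, and the value of 10 epochs is an arbitrary example.</p>

```python
def exhaustive_fits(gridsize, p):
    """Number of model fits of a full grid search: gridsize ** p."""
    return gridsize ** p


def iterative_fits(epochs, p, gridsize):
    """Number of model fits of the iterative grid search: epochs * p * gridsize."""
    return epochs * p * gridsize


for p in (2, 10, 30):
    print(f"p={p}: exhaustive={exhaustive_fits(11, p)}, iterative={iterative_fits(10, p, 11)}")
```

For <italic>p</italic> &#x0003D; 2, the exhaustive search is actually cheaper in this setting (121 versus 220 fits), which is consistent with using it as a reference for low-dimensional data. For <italic>p</italic> &#x0003D; 10, however, it needs 25,937,424,601 fits versus 1,100.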
<table-wrap position="float" id="TA6">
<caption><p><bold>Algorithm 6</bold> : Iterative grid search(<italic>X, Y, C</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; full optimization to get optimal &#x0039B; vector</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 1:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic>: features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 2:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Y</italic>: corresponding class labels</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 3:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>: classifier to optimize for</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 4:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>p</italic>: number of features/directions</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 5:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>S</italic>: standard scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 6:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>G</italic>: initial starting point</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 7:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0039B;<sub><italic>opt</italic></sub>&#x02190;<italic>G</italic> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; optimized Box-Cox parameters</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 8:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>grid</italic>&#x02190;{&#x02212;5, &#x02212;4, &#x02026;, 4, 5} &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; candidate values for each direction</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 9:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>epochs</italic>&#x02190;<italic>e</italic> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; number of epochs</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 10:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>A</italic>&#x02190;0 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; best accuracy obtained during search</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 11:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 12:&#x000A0;&#x000A0;&#x000A0;&#x000A0;for <italic>epoch</italic> &#x0003D; 1 to <italic>epochs</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 13:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for <italic>dir</italic> &#x0003D; 1 to <italic>p</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 14:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>&#x02190;&#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 15:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for each &#x003BB;<sub><italic>i</italic></sub> in <italic>grid</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 16:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>[<italic>dir</italic>]&#x02190;&#x003BB;<sub><italic>i</italic></sub> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; change one direction</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 17:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>B</italic></sub>&#x02190; boxcox(<italic>X</italic>, &#x0039B;<sub><italic>tmp</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; apply Box-Cox transformation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 18:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>S</italic></sub>&#x02190;<italic>S</italic>.<italic>fit</italic>_<italic>transform</italic>(<italic>X</italic><sub><italic>B</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; fit standard scaler and apply it</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 19:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>C</italic>.train(<italic>X</italic><sub><italic>S</italic></sub>, <italic>Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; train classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 20:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>P</italic>&#x02190;<italic>C</italic>.predict(<italic>X</italic><sub><italic>S</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 21:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic><sub><italic>tmp</italic></sub>&#x02190; accuracy(<italic>P, Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate accuracy</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 22:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 23:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>A</italic><sub><italic>tmp</italic></sub>&#x0003E;<italic>A</italic> <bold>then</bold> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; update &#x0039B;<sub><italic>opt</italic></sub> if accuracy is improved</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 24:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic>&#x02190;<italic>A</italic><sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 25:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>opt</italic></sub>&#x02190;&#x0039B;<sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 26:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 27:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 28:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 29:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 30:&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 31:&#x000A0;&#x000A0;&#x000A0;&#x000A0;return &#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr>
</tbody>
</table>
</table-wrap>
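<p>A compact Python reading of <xref ref-type="table" rid="TA6">Algorithm 6</xref> is sketched below. It assumes a generic <monospace>score</monospace> callable that returns the accuracy for a candidate &#x0039B;. The best accuracy starts at negative infinity rather than 0, so the first evaluated candidate is always recorded. The separable toy score in the usage lines is invented. On such a score, a single epoch already reaches the optimum.</p>

```python
def iterative_grid_search(score, start, epochs=3, grid=tuple(range(-5, 6))):
    """Coordinate-wise grid search: optimize one direction at a time with all others fixed."""
    lam_opt = list(start)               # initial starting point G
    best = float("-inf")                # best accuracy obtained so far
    evaluations = 0
    for _ in range(epochs):
        for d in range(len(lam_opt)):
            lam_tmp = list(lam_opt)     # restart each direction from the current optimum
            for value in grid:
                lam_tmp[d] = value      # change one direction only
                evaluations += 1
                s = score(lam_tmp)
                if s > best:            # update only on strict improvement
                    best, lam_opt = s, list(lam_tmp)
    return lam_opt, best, evaluations


def toy_score(lams):
    """Invented separable score with its peak at [2, -1, 4]."""
    return -((lams[0] - 2) ** 2 + (lams[1] + 1) ** 2 + (lams[2] - 4) ** 2)


print(iterative_grid_search(toy_score, start=[0, 0, 0]))  # ([2, -1, 4], 0, 99)
```

The evaluation count 3&#x000B7;3&#x000B7;11 &#x0003D; 99 follows the <italic>O</italic>(<italic>epochs</italic>&#x000B7;<italic>p</italic>&#x000B7;<italic>gridsize</italic>) expression above. An exhaustive search over the same grid would need 11<sup>3</sup> &#x0003D; 1,331 evaluations.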
<p><xref ref-type="fig" rid="F3">Figure 3A</xref> shows why multiple epochs were beneficial. Starting from the initial point <italic>G</italic> &#x0003D; [1, 1], optimizing first vertically in the &#x003BB;<sub>1</sub> direction and then horizontally in the &#x003BB;<sub>2</sub> direction yielded a value of 2 at &#x0039B; &#x0003D; [2, 2]. Adding another epoch, that is, another optimization pass over both directions, reached the global optimum, with a value of 3, at &#x0039B; &#x0003D; [3, 2]. Hence, multiple epochs helped to find better solutions.</p>
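<p>The effect of an extra epoch can be reproduced on a small, hypothetical 3&#x000D7;3 landscape. Its values were chosen so that the search follows the path described above and are not read from <xref ref-type="fig" rid="F3">Figure 3A</xref>. The sketch applies the coordinate-wise search, optimizing &#x003BB;<sub>1</sub> first and &#x003BB;<sub>2</sub> second, and compares one and two epochs starting from <italic>G</italic> &#x0003D; [1, 1].</p>

```python
# Hypothetical accuracy values indexed by (lambda1, lambda2); invented, not taken from Figure 3A.
LANDSCAPE = {
    (1, 1): 0, (1, 2): 0, (1, 3): 0,
    (2, 1): 1, (2, 2): 2, (2, 3): 0,
    (3, 1): 0, (3, 2): 3, (3, 3): 0,
}


def coordinate_search(start, epochs, grid=(1, 2, 3)):
    """Iterative grid search on the toy landscape: lambda1 first, then lambda2, once per epoch."""
    lam_opt, best = list(start), float("-inf")
    for _ in range(epochs):
        for d in (0, 1):
            lam_tmp = list(lam_opt)
            for value in grid:
                lam_tmp[d] = value
                s = LANDSCAPE[tuple(lam_tmp)]
                if s > best:
                    best, lam_opt = s, list(lam_tmp)
    return lam_opt, best


print(coordinate_search([1, 1], epochs=1))  # ([2, 2], 2)
print(coordinate_search([1, 1], epochs=2))  # ([3, 2], 3)
```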
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Motivations for improvements to the iterative method. Multiple epochs helped to further advance the optimization to the maximum. Multiple starting points and shuffling were introduced for escaping or avoiding a local maximum, and a finer grid provided the ability to explore hidden details. <bold>(A)</bold> Multiple epochs, <bold>(B)</bold> Multiple starting points and shuffling optimization order, and <bold>(C)</bold> Finer grid.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-877569-g0003.tif"/>
</fig>
<p>Another useful improvement was to restart the optimization from a different initial point, as illustrated in <xref ref-type="fig" rid="F3">Figure 3B</xref>. Starting at <italic>G</italic> &#x0003D; [1, 1] and optimizing first vertically and then horizontally resulted in &#x0039B;<sub><italic>opt</italic></sub> &#x0003D; [2, 1]. The iterative method cannot improve this point further, even though a better solution exists at &#x0039B; &#x0003D; [1, 2]. If the optimization had instead started at, for example, <italic>G</italic> &#x0003D; [2, 2], it would have reached the global optimum. It was therefore beneficial to restart the optimization procedure from multiple initial points. The corresponding modifications to <italic>Iterative grid search</italic> are given in <xref ref-type="table" rid="TA7">Algorithm 6.1</xref>. This variant introduced a new hyperparameter, <italic>shift_epoch</italic>, which determined after how many <italic>epochs</italic> a new starting point <italic>G</italic> was generated. The search then proceeded from <italic>G</italic> until it found an accuracy higher than the best one obtained so far, after which it continued from the improved &#x0039B;<sub><italic>opt</italic></sub>.</p>
<table-wrap position="float" id="TA7">
<caption><p><bold>Algorithm 6.1</bold>: Shift(<italic>X, Y, C</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; iterative grid search with multiple start points</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x02026; &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; same as <xref ref-type="table" rid="TA6">Algorithm 6</xref> line 1&#x02212;10</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>shift</italic>_<italic>epoch</italic>&#x02190;<italic>s</italic> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; number of epochs until new starting point</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>shift</italic>&#x02190; False &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; Boolean flag to indicate a new starting point</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;for <italic>epoch</italic> &#x0003D; 1 to <italic>epochs</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>epoch</italic> mod <italic>shift</italic>_<italic>epoch</italic> &#x0003D; 0 <bold>then</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>G</italic>&#x02190; generate_initial_point()</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>shift</italic>&#x02190; True</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for <italic>dir</italic> &#x0003D; 1 to <italic>p</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>shift</italic> <bold>then</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>&#x02190;<italic>G</italic> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; search from the new starting point</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; else</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>&#x02190;&#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for each &#x003BB;<sub><italic>i</italic></sub> in <italic>grid</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x02026; &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; same as <xref ref-type="table" rid="TA6">Algorithm 6</xref> line 16&#x02212;21</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>A</italic><sub><italic>tmp</italic></sub>&#x0003E;<italic>A</italic> <bold>then</bold> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; update &#x0039B;<sub><italic>opt</italic></sub> if accuracy is improved</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic>&#x02190;<italic>A</italic><sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>opt</italic></sub>&#x02190;&#x0039B;<sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>shift</italic>&#x02190; False</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;return &#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr> 
</tbody>
</table>
</table-wrap>
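<p>The multi-start idea can be rendered in Python as a sketch under stated assumptions. The section does not specify how <monospace>generate_initial_point()</monospace> draws a new <italic>G</italic>, so the sketch passes it in as a hypothetical callable, <monospace>new_start</monospace>. It also re-initializes &#x0039B;<sub><italic>tmp</italic></sub> at the start of every direction, as line 14 of <xref ref-type="table" rid="TA6">Algorithm 6</xref> does. The toy landscape is invented to mimic the situation in <xref ref-type="fig" rid="F3">Figure 3B</xref>: the search from [1, 1] stalls at [2, 1], whereas a restart from [2, 2] reaches the better solution at [1, 2].</p>

```python
def shift_grid_search(score, start, new_start, epochs=4, shift_epoch=2, grid=(1, 2, 3)):
    """Iterative grid search that jumps to a new starting point G every shift_epoch epochs."""
    lam_opt, best = list(start), float("-inf")
    g, shift = list(start), False
    for epoch in range(1, epochs + 1):
        if epoch % shift_epoch == 0:
            g, shift = list(new_start()), True   # hypothetical stand-in for generate_initial_point()
        for d in range(len(lam_opt)):
            # Search from G until it beats the best accuracy so far, otherwise from the optimum.
            lam_tmp = list(g) if shift else list(lam_opt)
            for value in grid:
                lam_tmp[d] = value
                s = score(lam_tmp)
                if s > best:
                    best, lam_opt, shift = s, list(lam_tmp), False
    return lam_opt, best


# Invented landscape: value at (lambda1, lambda2); the best solution is at [1, 2].
LANDSCAPE = {
    (1, 1): 0, (1, 2): 2, (1, 3): 0,
    (2, 1): 1, (2, 2): 0, (2, 3): 0,
    (3, 1): 0, (3, 2): 0, (3, 3): 0,
}


def score(lams):
    return LANDSCAPE[tuple(lams)]


def restart_at_2_2():
    return [2, 2]


print(shift_grid_search(score, [1, 1], restart_at_2_2, epochs=1))  # ([2, 1], 1): no restart yet
print(shift_grid_search(score, [1, 1], restart_at_2_2, epochs=4))  # ([1, 2], 2)
```

In practice, <monospace>new_start</monospace> would typically draw a random point, for example from the candidate grid. Because the best accuracy is never reset, a restart can only replace &#x0039B;<sub><italic>opt</italic></sub> when it finds a strictly better value.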
<p>The previous problem could also be solved by changing the order of the optimization directions. So far, the directions had been optimized in numerical order: &#x003BB;<sub>1</sub> first, then &#x003BB;<sub>2</sub>, and so on. In <xref ref-type="fig" rid="F3">Figure 3B</xref>, starting again at the initial point <italic>G</italic> &#x0003D; [1, 1] but optimizing first in the horizontal direction instead of the vertical one directly found the global solution. Hence, shuffling the order of the optimization directions was also helpful. The corresponding changes to <italic>Iterative grid search</italic> are given in <xref ref-type="table" rid="TA8">Algorithm 6.2</xref>. This variant again introduced a new hyperparameter, <italic>shuffle_epoch</italic>, which determined after how many <italic>epochs</italic> the optimization order was shuffled.</p>
<table-wrap position="float" id="TA8">
<caption><p><bold>Algorithm 6.2</bold> : Shuffle(<italic>X, Y, C</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; iterative grid search with changing order of the optimization directions</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x02026; &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; same as <xref ref-type="table" rid="TA6">Algorithm 6</xref> line 1&#x02212;10</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>dir</italic>_<italic>order</italic>&#x02190;[1, 2, &#x02026;, <italic>p</italic>] &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; order of directions for optimization</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>shuffle</italic>_<italic>epoch</italic>&#x02190;<italic>h</italic> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; number of epochs until the order</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; of direction gets shuffled</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;for <italic>epoch</italic> &#x0003D; 1 to <italic>epochs</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>epoch</italic> mod <italic>shuffle</italic>_<italic>epoch</italic> &#x0003D; 0 <bold>then</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; shuffle(<italic>dir</italic>_<italic>order</italic>)</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for each <italic>dir</italic> in <italic>dir</italic>_<italic>order</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x02026; &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; same as <xref ref-type="table" rid="TA6">Algorithm 6</xref> line 14&#x02212;28</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;return &#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr> 
</tbody>
</table>
</table-wrap>
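<p>Shuffling the direction order can be sketched in the same style. By default, the sketch uses Python's <monospace>random.Random(seed).shuffle</monospace>. The <monospace>shuffle</monospace> parameter is a hypothetical hook, added only so that the order can be controlled in the example. On an invented landscape that mimics <xref ref-type="fig" rid="F3">Figure 3B</xref>, the fixed order stalls at [2, 1], whereas optimizing &#x003BB;<sub>2</sub> before &#x003BB;<sub>1</sub> reaches [1, 2] in a single epoch.</p>

```python
import random


def shuffle_grid_search(score, start, epochs=2, shuffle_epoch=1, grid=(1, 2, 3), shuffle=None, seed=0):
    """Iterative grid search whose direction order is reshuffled every shuffle_epoch epochs."""
    shuffle = shuffle or random.Random(seed).shuffle   # in-place permutation of a list
    lam_opt, best = list(start), float("-inf")
    dir_order = list(range(len(lam_opt)))              # 0-based version of [1, 2, ..., p]
    for epoch in range(1, epochs + 1):
        if epoch % shuffle_epoch == 0:
            shuffle(dir_order)
        for d in dir_order:
            lam_tmp = list(lam_opt)
            for value in grid:
                lam_tmp[d] = value
                s = score(lam_tmp)
                if s > best:
                    best, lam_opt = s, list(lam_tmp)
    return lam_opt, best


# Invented landscape: value at (lambda1, lambda2); the best solution is at [1, 2].
LANDSCAPE = {
    (1, 1): 0, (1, 2): 2, (1, 3): 0,
    (2, 1): 1, (2, 2): 0, (2, 3): 0,
    (3, 1): 0, (3, 2): 0, (3, 3): 0,
}


def score(lams):
    return LANDSCAPE[tuple(lams)]


# shuffle_epoch larger than epochs: the order never changes, so the search stalls.
print(shuffle_grid_search(score, [1, 1], epochs=2, shuffle_epoch=3))  # ([2, 1], 1)
# Reversing [0, 1] to [1, 0] optimizes lambda2 first.
print(shuffle_grid_search(score, [1, 1], epochs=1, shuffle=lambda order: order.reverse()))  # ([1, 2], 2)
```

With the default random shuffle, the outcome on this toy landscape depends on the seed, which illustrates why shuffling is combined with multiple epochs rather than applied once.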
<p>Lastly, a denser grid might yield a better solution, as demonstrated in <xref ref-type="fig" rid="F3">Figure 3C</xref>. If the grid used only integer values, it was impossible to find one of the globally optimal solutions &#x0003D; 2. Hence, the grid would have to be refined to 0.5 increments, which doubles the computational demand; further refinements may improve the result even more but increase the computational demand accordingly. One way to avoid this growing computational cost was local refinement, in which the grid becomes denser but covers a smaller region. <italic>Iterative grid search</italic> uses the same global grid for every 1-dimensional grid search (e.g., {&#x02212;5, &#x02212;4, &#x02026;, 4, 5}). To obtain a finer grid with the same number of points, the grid had to be attached locally to the current &#x0039B;<sub><italic>tmp</italic></sub>. Because the number of grid points remains the same while the grid becomes denser, it spans a smaller range of values. For example, starting with the grid {&#x02212;5, &#x02212;4, &#x02026;, 4, 5} and doubling the resolution gives the grid {&#x02212;2.5, &#x02212;2, &#x02026;, 2, 2.5}; both contain 11 points. Instead of testing globally whether any &#x003BB;<sub><italic>i</italic></sub>&#x02208;{&#x02212;5, &#x02212;4, &#x02026;, 4, 5} improved the result, the current optimal value in this direction was used as the center to which the refined grid was attached. Therefore, it was tested whether any &#x003BB;<sub><italic>i</italic></sub>&#x02208;{&#x003BB;<sub><italic>tmp, i</italic></sub>&#x02212;2.5, &#x003BB;<sub><italic>tmp, i</italic></sub>&#x02212;2, &#x02026;, &#x003BB;<sub><italic>tmp, i</italic></sub>&#x0002B;2, &#x003BB;<sub><italic>tmp, i</italic></sub>&#x0002B;2.5} improved the accuracy. 
To take advantage of both global and local optimization, a global search was used at the beginning of the optimization to cover the full search domain, and after a number of epochs, local refinement was used to obtain a finer search space. With this modification, the computational cost of each 1-dimensional search remained the same, while more and finer candidate values that could improve the result became available. The incorporation of this method into <italic>Iterative grid search</italic> is shown in <xref ref-type="table" rid="TA9">Algorithm 6.3</xref>. As before, an additional hyperparameter, <italic>finer_epoch</italic>, was introduced to specify after how many <italic>epochs</italic> the grid is refined.</p>
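<p>The candidate construction described above can be sketched in Python as follows. This is a minimal illustration assuming a base grid of the integers from &#x02212;5 to 5; the function name <monospace>refined_candidates</monospace> and its parameters are hypothetical and not taken from the original implementation. With a refinement factor of 0.5 and an assumed current optimum of 1.5, the 11 candidates span 1.5 &#x000B1; 2.5 in steps of 0.5.</p>

```python
def refined_candidates(base_grid, center, scale, use_global):
    """Candidate values for one direction, mirroring the candidates step of Algorithm 6.3.

    base_grid  -- global grid, e.g. -5, -4, ..., 4, 5
    center     -- current optimum in this direction (Lambda_opt[dir])
    scale      -- accumulated refinement factor (1.0 before any refinement)
    use_global -- True during the initial global phase
    """
    scaled = [scale * g for g in base_grid]    # same number of points, smaller span
    if use_global:
        return scaled                           # global phase: grid is not shifted
    return [center + g for g in scaled]         # local phase: grid attached to the optimum


base = list(range(-5, 6))                       # {-5, -4, ..., 4, 5}, 11 points
print(refined_candidates(base, 1.5, 1.0, True))
print(refined_candidates(base, 1.5, 0.5, False))
# second line: [-1.0, -0.5, 0.0, ..., 3.5, 4.0], i.e. 1.5 +/- 2.5 in steps of 0.5
```

Keeping the number of points fixed is what keeps the cost of each 1-dimensional search constant while the resolution increases.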
<table-wrap position="float" id="TA9">
<caption><p><bold>Algorithm 6.3</bold> : Finer(<italic>X, Y, C</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; iterative grid search with a refined grid</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x02026; &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; same as <xref ref-type="table" rid="TA6">Algorithm 6</xref> line 1&#x02212;10</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>finer</italic>&#x02190;0.5 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; refinement of grid</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>finer</italic>_<italic>epoch</italic>&#x02190;<italic>f</italic> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; number of epochs until the grid gets finer</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;global = 1 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; use global grid search at the beginning</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;for <italic>epoch</italic> &#x0003D; 1 to <italic>epochs</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>epoch</italic> mod <italic>finer</italic>_<italic>epoch</italic> &#x0003D; =0 <bold>then</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>global</italic> &#x0003D; 0</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>grid</italic>&#x02190;<italic>finer</italic>&#x0002A;<italic>grid</italic> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; element-wise scale each element in grid</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for <italic>dir</italic> &#x0003D; 1 to <italic>p</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>&#x02190;&#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>candidates</italic>&#x02190;<italic>grid</italic>&#x0002B;(1&#x02212;<italic>global</italic>)&#x0002A;&#x0039B;<sub><italic>opt</italic></sub>[<italic>dir</italic>]</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for each &#x003BB;<sub><italic>i</italic></sub> in <italic>candidates</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x02026; &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; same as <xref ref-type="table" rid="TA6">Algorithm 6</xref> line 16&#x02212;26</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;return &#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr>
</tbody>
</table>
</table-wrap>
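<p>A runnable sketch of the refinement logic in Algorithm 6.3 is given below. Because the body of Algorithm 6 (lines 16&#x02212;26) is not reproduced here, the Box-Cox transformation, scaling, training, and accuracy evaluation are replaced by a hypothetical <monospace>score</monospace> callable that maps a &#x0039B; vector to an accuracy; all names are illustrative assumptions. The toy objective in the usage example simply rewards closeness to an assumed target vector, so the effect of refinement is visible: the integer grid alone cannot reach 2.25, whereas the locally refined grid can.</p>

```python
def finer_grid_search(score, start, base_grid, epochs, finer_epoch, finer=0.5):
    """Coordinate-wise grid search with local grid refinement (sketch of Algorithm 6.3).

    score -- hypothetical callable: Lambda vector -> accuracy to maximize; stands in
             for Box-Cox transform, scaling, training and evaluation
    start -- initial Lambda vector (the text uses the column-wise MLE)
    """
    lam_opt = list(start)
    best = score(lam_opt)
    grid = list(base_grid)
    use_global = True                              # global search at the beginning
    for epoch in range(1, epochs + 1):
        if epoch % finer_epoch == 0:
            use_global = False
            grid = [finer * g for g in grid]       # element-wise scaling of the grid
        for d in range(len(lam_opt)):
            offset = 0.0 if use_global else lam_opt[d]
            candidates = [g + offset for g in grid]
            for lam in candidates:
                trial = list(lam_opt)
                trial[d] = lam                     # change one direction only
                acc = score(trial)
                if acc > best:                     # keep strict improvements only
                    best, lam_opt = acc, trial
    return lam_opt, best


target = [2.3, -0.6]                               # hypothetical optimum of a toy objective

def toy_score(lam):
    return -sum((x - t) ** 2 for x, t in zip(lam, target))

lam, acc = finer_grid_search(toy_score, [0.0, 0.0], range(-5, 6), epochs=4, finer_epoch=2)
print(lam)  # [2.25, -0.5]
```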
<p>Additionally, the <italic>spherical</italic> and <italic>diagonal</italic> optimizations are given in <xref ref-type="table" rid="TA10">Algorithms 7</xref>, <xref ref-type="table" rid="TA11">8</xref>; they were used for comparison with the proposed <italic>full</italic> optimization. Like the <italic>full</italic> optimization, both methods optimize classification accuracy rather than a statistical criterion (such as the MLE or the Fisher criterion; Bicego and Baldo, <xref ref-type="bibr" rid="B1">2016</xref>). The reason for this approach is that the previous study showed that the optimal Box-Cox parameter is classifier-dependent.</p>
<table-wrap position="float" id="TA10">
<caption><p><bold>Algorithm 7</bold> : Spherical grid search(<italic>X, Y, C</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; spherical optimization: one shared &#x003BB; for all features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 1:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic>: features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 2:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Y</italic>: corresponding class labels</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 3:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>: classifier to optimize for</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 4:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>p</italic>: number of features/directions</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 5:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>S</italic>: standard scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 6:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0039B;<sub><italic>opt</italic></sub>&#x02190;0 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; optimized Box-Cox parameter</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 7:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>grid</italic>&#x02190;{&#x02212;5, &#x02212;4, &#x02026;, 4, 5} &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; candidate values</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 8:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>A</italic>&#x02190;0 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; best accuracy obtained during search</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 9:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 10:&#x000A0;&#x000A0;&#x000A0;&#x000A0;for each &#x003BB;<sub><italic>i</italic></sub> in <italic>grid</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 11:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>&#x02190;[&#x003BB;<sub><italic>i</italic></sub>, &#x003BB;<sub><italic>i</italic></sub>, &#x02026;, &#x003BB;<sub><italic>i</italic></sub>]</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 12:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>B</italic></sub>&#x02190; boxcox(<italic>X</italic>, &#x0039B;<sub><italic>tmp</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; apply Box-Cox transformation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 13:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>S</italic></sub>&#x02190;<italic>S</italic>.<italic>fit</italic>_<italic>transform</italic>(<italic>X</italic><sub><italic>B</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; fit standard scaler and apply it</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 14:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>C</italic>.train(<italic>X</italic><sub><italic>S</italic></sub>, <italic>Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; train classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 15:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>P</italic>&#x02190;<italic>C</italic>.predict(<italic>X</italic><sub><italic>S</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 16:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic><sub><italic>tmp</italic></sub>&#x02190; accuracy(<italic>P, Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate accuracy</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 17:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 18:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>A</italic><sub><italic>tmp</italic></sub>&#x0003E;<italic>A</italic> <bold>then</bold> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; update &#x0039B;<sub><italic>opt</italic></sub> if accuracy is improved</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 19:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic>&#x02190;<italic>A</italic><sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 20:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>opt</italic></sub>&#x02190;&#x0039B;<sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 21:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 22:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 23:&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 24:&#x000A0;&#x000A0;&#x000A0;&#x000A0;return &#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr> 
</tbody>
</table>
</table-wrap>
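<p>To make the spherical strategy concrete, Algorithm 7 can be sketched in Python as below. The helper names are hypothetical, and lines 13&#x02212;16 (scaling, training, prediction, and accuracy) are folded into an assumed <monospace>evaluate</monospace> callable. The Box-Cox transformation requires strictly positive inputs. The sketch starts from a best score of negative infinity instead of 0 so that it accepts arbitrary scores; this only differs from Algorithm 7 when no candidate reaches a positive accuracy.</p>

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of one strictly positive value."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def boxcox_columns(X, lambdas):
    """Transform each column of X (rows = samples) with its own lambda."""
    return [[boxcox(x, lam) for x, lam in zip(row, lambdas)] for row in X]


def spherical_search(X, Y, evaluate, grid=range(-5, 6)):
    """Sketch of Algorithm 7: the same lambda is used for all p features.

    evaluate -- hypothetical callable (X_B, Y) -> accuracy; stands in for fitting
                the standard scaler, training the classifier, and scoring it
    """
    p = len(X[0])
    best_acc, lam_opt = float("-inf"), None
    for lam in grid:
        lam_tmp = [lam] * p                         # one value in every direction
        acc = evaluate(boxcox_columns(X, lam_tmp), Y)
        if acc > best_acc:                          # update only on improvement
            best_acc, lam_opt = acc, lam_tmp
    return lam_opt, best_acc


X = [[1.0, 4.0], [2.0, 9.0]]

def toy_evaluate(XB, Y):
    # Hypothetical stand-in for an accuracy: closeness of one entry to 2.0
    return -abs(XB[1][1] - 2.0)

print(spherical_search(X, None, toy_evaluate)[0])  # [0, 0]
```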
<table-wrap position="float" id="TA11">
<caption><p><bold>Algorithm 8</bold> : Diagonal grid search(<italic>X, Y, C</italic>)</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; diagonal optimization: one direction changed at a time, the others fixed at 1</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 1:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>X</italic>: features</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 2:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>Y</italic>: corresponding class labels</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 3:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>C</italic>: classifier to optimize for</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 4:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>p</italic>: number of features/directions</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 5:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>S</italic>: standard scaler</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 6:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x0039B;<sub><italic>opt</italic></sub>&#x02190;[1, 1, &#x02026;, 1] &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; optimized Box-Cox parameter</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 7:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>grid</italic>&#x02190;{&#x02212;5, &#x02212;4, &#x02026;, 4, 5} &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; candidate values for each direction</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 8:&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>A</italic>&#x02190;0 &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; best accuracy obtained during search</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 9:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 10:&#x000A0;&#x000A0;&#x000A0;&#x000A0;for <italic>dir</italic> &#x0003D; 1 to <italic>p</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 11:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>&#x02190;[1, 1, &#x02026;, 1]</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 12:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; for each &#x003BB;<sub><italic>i</italic></sub> in <italic>grid</italic> <bold>do</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 13:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>tmp</italic></sub>[<italic>dir</italic>]&#x02190;&#x003BB;<sub><italic>i</italic></sub> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; change one direction</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 14:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>B</italic></sub>&#x02190; boxcox(<italic>X</italic>, &#x0039B;<sub><italic>tmp</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; apply Box-Cox transformation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 15:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>X</italic><sub><italic>S</italic></sub>&#x02190;<italic>S</italic>.<italic>fit</italic>_<italic>transform</italic>(<italic>X</italic><sub><italic>B</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; fit standard scaler and apply it</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 16:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>C</italic>.train(<italic>X</italic><sub><italic>S</italic></sub>, <italic>Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; train classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 17:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>P</italic>&#x02190;<italic>C</italic>.predict(<italic>X</italic><sub><italic>S</italic></sub>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate classifier</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 18:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic><sub><italic>tmp</italic></sub>&#x02190; accuracy(<italic>P, Y</italic>) &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; evaluate accuracy</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 19:&#x000A0;&#x000A0;&#x000A0;&#x000A0;</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 20:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; if <italic>A</italic><sub><italic>tmp</italic></sub>&#x0003E;<italic>A</italic> <bold>then</bold> &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x022B3; update &#x0039B;<sub><italic>opt</italic></sub> if accuracy is improved</monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 21:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>A</italic>&#x02190;<italic>A</italic><sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 22:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x0039B;<sub><italic>opt</italic></sub>&#x02190;&#x0039B;<sub><italic>tmp</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 23:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>if</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 24:&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 25:&#x000A0;&#x000A0;&#x000A0;&#x000A0;end <bold>for</bold></monospace></td></tr>
<tr><td align="left" valign="top"><monospace> 26:&#x000A0;&#x000A0;&#x000A0;&#x000A0;return &#x0039B;<sub><italic>opt</italic></sub></monospace></td></tr> 
</tbody>
</table>
</table-wrap>
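<p>In Algorithm 8, every candidate vector equals [1, 1, &#x02026;, 1] except in a single direction. A minimal Python sketch is given below, assuming a hypothetical <monospace>evaluate</monospace> callable that maps a &#x0039B; vector to an accuracy (standing in for the Box-Cox transformation, scaling, training, and evaluation). Because &#x0039B;<sub><italic>tmp</italic></sub> is modified in place, the sketch stores a copy when updating &#x0039B;<sub><italic>opt</italic></sub>; a literal implementation of lines 13 and 22 without copying would let &#x0039B;<sub><italic>opt</italic></sub> change together with &#x0039B;<sub><italic>tmp</italic></sub>.</p>

```python
def diagonal_search(p, evaluate, grid=range(-5, 6)):
    """Sketch of Algorithm 8: change one direction at a time, the others stay at 1.

    p        -- number of features/directions
    evaluate -- hypothetical callable: Lambda vector -> accuracy; stands in for the
                Box-Cox transform, standard scaling, training and evaluation
    """
    best_acc, lam_opt = float("-inf"), [1] * p
    for d in range(p):
        lam_tmp = [1] * p                           # reset every direction to 1
        for lam in grid:
            lam_tmp[d] = lam                        # change one direction only
            acc = evaluate(lam_tmp)
            if acc > best_acc:
                # copy, because lam_tmp keeps being modified in the loop
                best_acc, lam_opt = acc, list(lam_tmp)
    return lam_opt, best_acc


target = [2, -1, 1]                                 # hypothetical toy optimum

def toy_evaluate(lam):
    return -sum((a - t) ** 2 for a, t in zip(lam, target))

print(diagonal_search(3, toy_evaluate))  # ([1, -1, 1], -1)
```

In this toy example the optimum [2, &#x02212;1, 1] cannot be reached, because only one direction may differ from 1; the search returns the best single-direction change instead.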
</sec>
<sec sec-type="results" id="s5">
<title>Results</title>
<p>In the following, the proposed optimization procedure was applied to different real-world datasets. The setup in <xref ref-type="table" rid="TA2">Algorithm 2</xref> was used; that is, <xref ref-type="table" rid="TA3">Algorithm 3</xref> was used to train the model on the training data with the iterative optimization from <xref ref-type="table" rid="TA6">Algorithm 6</xref> and the corresponding improvements <xref ref-type="table" rid="TA7">6.1</xref>, <xref ref-type="table" rid="TA8">6.2</xref>, and <xref ref-type="table" rid="TA9">6.3</xref>. The performance was then measured using the prediction function in <xref ref-type="table" rid="TA4">Algorithm 4</xref>. The examined classifiers are given in <xref ref-type="table" rid="T1">Table 1</xref>. All results were measured with 10-fold stratified cross-validation and 5 repetitions. To test the proposed method, various hyperparameter settings were used; they are given in <xref ref-type="table" rid="T4">Table 4</xref>. The optimization in one direction used evenly spaced points over the interval [&#x02212;5, 5], where gridsize is the number of grid points (e.g., a gridsize of 11 gives the set {&#x02212;5, &#x02212;4, &#x02026;, 4, 5} as candidate values). <italic>Iterative grid search</italic> denotes the plain iterative optimization described in <xref ref-type="table" rid="TA6">Algorithm 6</xref>, without any further improvements. <italic>Shift, Shuffle</italic>, and <italic>Finer</italic> each isolate the influence of one improvement: restarting the optimization from a new starting point (<xref ref-type="table" rid="TA7">Algorithm 6.1</xref>), changing the order of the directions (<xref ref-type="table" rid="TA8">Algorithm 6.2</xref>), or refining the optimization grid (<xref ref-type="table" rid="TA9">Algorithm 6.3</xref>), respectively. <italic>Combined 1</italic> and <italic>Combined 2</italic> show how these improvements to the <italic>Iterative grid search</italic> behave in combination. 
The initial starting point, <italic>G</italic>&#x02208;&#x0211D;<sup><italic>p</italic></sup>, for the iterative optimization was chosen in each direction as the MLE. For comparison, the <italic>spherical</italic> and <italic>diagonal</italic> optimizations given in <xref ref-type="table" rid="TA10">Algorithms 7</xref>, <xref ref-type="table" rid="TA11">8</xref> were also evaluated. Furthermore, the traditional Box-Cox optimization of the log-likelihood function as in Box and Cox (<xref ref-type="bibr" rid="B2">1964</xref>) was applied column-wise. This approach maximized the Gaussianity of each column and is called <italic>MLE</italic> in the following tables.</p>
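<p>The <italic>MLE</italic> baseline maximizes, for each column separately, the Box-Cox log-likelihood of Box and Cox (1964), whose profile form, up to additive constants, is &#x02212;(<italic>n</italic>/2) log &#x003C3;<sup>2</sup>(&#x003BB;) &#x0002B; (&#x003BB; &#x02212; 1) &#x02211; log <italic>x</italic><sub><italic>i</italic></sub>, where &#x003C3;<sup>2</sup>(&#x003BB;) is the maximum-likelihood variance of the transformed column. The sketch below maximizes this function over a dense grid on [&#x02212;5, 5] and also shows how a gridsize maps to evenly spaced candidate values. The grid-based maximization and the helper names are illustrative assumptions; the text does not state which optimizer was used, and library routines such as <monospace>scipy.stats.boxcox</monospace> use a numerical optimizer instead of a grid.</p>

```python
import math


def grid_points(lo, hi, gridsize):
    """Evenly spaced grid on [lo, hi]; gridsize=11 on [-5, 5] gives -5, -4, ..., 5."""
    step = (hi - lo) / (gridsize - 1)
    return [lo + i * step for i in range(gridsize)]


def boxcox_llf(lam, column):
    """Box-Cox profile log-likelihood for one strictly positive, non-constant column."""
    n = len(column)
    logs = [math.log(x) for x in column]
    if math.isclose(lam, 0.0, abs_tol=1e-12):
        y = logs                                    # limit case lambda = 0
    else:
        y = [(x ** lam - 1.0) / lam for x in column]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n       # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(logs)


def mle_lambdas(X, gridsize=1001):
    """Column-wise lambda that maximizes the log-likelihood on a grid over [-5, 5]."""
    grid = grid_points(-5.0, 5.0, gridsize)
    return [max(grid, key=lambda lam: boxcox_llf(lam, col)) for col in zip(*X)]


print(grid_points(-5, 5, 11))   # eleven points: -5.0, -4.0, ..., 5.0
```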
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Hyperparameter settings to test the iterative optimization on real-world data.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Name</bold></th>
<th valign="top" align="center"><bold>Gridsize</bold></th>
<th valign="top" align="center"><bold>Epochs</bold></th>
<th valign="top" align="center"><bold>Shift_epoch</bold></th>
<th valign="top" align="center"><bold>Shuffle_epoch</bold></th>
<th valign="top" align="center"><bold>Finer_epoch</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Iterative grid search</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left">Shift</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">8</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">8</td>
</tr>
<tr>
<td valign="top" align="left">Finer</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">16</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2</td>
<td valign="top" align="center">21</td>
<td valign="top" align="center">16</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">4</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec>
<title> Sonar Dataset</title>
<p>The sonar dataset had 207 samples and 60 features. The labels were binary and indicated whether the sonar signal was reflected by a rock or by metal. The accuracy results of the repeated cross-validation are given in <xref ref-type="table" rid="T5">Table 5</xref>; the corresponding F1-scores are given in Table B1 of the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref>.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Improvement &#x003B4; in accuracy for different iterative optimization settings in the sonar dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>Linear [%]</bold></th>
<th valign="top" align="center"><bold>KNN [%]</bold></th>
<th valign="top" align="center"><bold>Bayesian [%]</bold></th>
<th valign="top" align="center"><bold>SVC [%]</bold></th>
<th valign="top" align="center"><bold>NN [%]</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">75.195</td>
<td valign="top" align="center">81.343</td>
<td valign="top" align="center">67.700</td>
<td valign="top" align="center">84.052</td>
<td valign="top" align="center">84.024</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">0.076</td>
<td valign="top" align="center">0.290</td>
<td valign="top" align="center">1.829</td>
<td valign="top" align="center">&#x02013;0.395</td>
<td valign="top" align="center">&#x02013;0.957</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">0.586</td>
<td valign="top" align="center">0.095</td>
<td valign="top" align="center">6.067</td>
<td valign="top" align="center">&#x02013;0.300</td>
<td valign="top" align="center">&#x02013;1.910</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">&#x02013;0.167</td>
<td valign="top" align="center">0.000</td>
<td valign="top" align="center">6.443</td>
<td valign="top" align="center">1.810</td>
<td valign="top" align="center">&#x02013;0.291</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">1.162</td>
<td valign="top" align="center">1.824</td>
<td valign="top" align="center">7.919</td>
<td valign="top" align="center">2.286</td>
<td valign="top" align="center">&#x02013;0.386</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">0.976</td>
<td valign="top" align="center">1.919</td>
<td valign="top" align="center">7.919</td>
<td valign="top" align="center">2.381</td>
<td valign="top" align="center">&#x02013;0.386</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">1.162</td>
<td valign="top" align="center">1.824</td>
<td valign="top" align="center">7.919</td>
<td valign="top" align="center">2.286</td>
<td valign="top" align="center">&#x02013;0.386</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">0.876</td>
<td valign="top" align="center">1.824</td>
<td valign="top" align="center">7.919</td>
<td valign="top" align="center">2.286</td>
<td valign="top" align="center">&#x02013;0.386</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">0.600</td>
<td valign="top" align="center">1.838</td>
<td valign="top" align="center">7.157</td>
<td valign="top" align="center">2.190</td>
<td valign="top" align="center">&#x02013;0.386</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">2.795</td>
<td valign="top" align="center">3.267</td>
<td valign="top" align="center">8.100</td>
<td valign="top" align="center">2.010</td>
<td valign="top" align="center">&#x02013;0.486</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The proposed optimization achieved a consistent improvement for all classifiers except the neural network. The hyperparameter settings defined in <xref ref-type="table" rid="T4">Table 4</xref> influenced the classifiers differently: Combined 2 gave the largest improvement for the linear, KNN, and Bayesian classifiers, whereas Shift already delivered the best performance for SVC. Additionally, the proposed optimization achieved larger improvements than Diagonal, Spherical, and MLE for all classifiers except the neural network</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>The proposed optimization improved the accuracy of all classifiers except the neural network. In particular, the Bayesian classifier improved by 7.8% on average, whereas the neural network decreased by 0.4% on average. The influence of the modifications to the basic iterative optimization in <xref ref-type="table" rid="TA6">Algorithm 6</xref> (<italic>Shift, Shuffle</italic>, and <italic>Finer</italic>) was observed mainly for the linear and KNN classifiers. <italic>Shift</italic>, which restarted the optimization from a new initial point, seemed to decrease the accuracy for the linear classifier but slightly increased it for KNN. In contrast, shuffling the order of the directions during optimization gave no advantage for any classifier. Refining the grid after some epochs did not increase the accuracy compared to the basic iterative optimization. Combining these methods into one optimization sometimes decreased the performance (SVC) and sometimes increased it (KNN). Interestingly, <italic>Combined 1</italic> performed better than <italic>Combined 2</italic>, which used a finer grid for &#x0039B;, for the SVC and the neural network; the opposite was observed for the other classifiers. The <italic>Diagonal, Spherical</italic>, and <italic>MLE</italic> optimizations performed worse than the proposed optimization, except that the <italic>MLE</italic> optimization led to a smaller decrease in accuracy for the neural network. The same behavior was observed for the F1-scores given in Table B1 of the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref>.</p>
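<p>The averages quoted above appear to be unweighted means over the six proposed settings in <xref ref-type="table" rid="T5">Table 5</xref> (<italic>Iterative grid search</italic> through <italic>Combined 2</italic>). Assuming so, the following check reproduces them from the tabulated values.</p>

```python
# Accuracy improvements of the six proposed settings in Table 5, in row order:
# Iterative grid search, Shift, Shuffle, Finer, Combined 1, Combined 2.
bayesian = [7.919, 7.919, 7.919, 7.919, 7.157, 8.100]
neural_network = [-0.386, -0.386, -0.386, -0.386, -0.386, -0.486]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


print(round(mean(bayesian), 1))        # 7.8
print(round(mean(neural_network), 1))  # -0.4
```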
<p>A study on 2-dimensional feature sets was also performed. Two 2-dimensional datasets (<xref ref-type="fig" rid="F4">Figures 4A,B</xref>) were created by extracting two random features from the sonar dataset. Next, the features were ranked using a chi-square test, and two further datasets were built: one with the two highest-ranking features (<xref ref-type="fig" rid="F4">Figure 4C</xref>) and one with the third- and fourth-highest-ranking features (<xref ref-type="fig" rid="F4">Figure 4D</xref>). The performance of a grid search was measured and served as the baseline. This was done using the training function in <xref ref-type="table" rid="TA3">Algorithm 3</xref> with the 2D grid search from <xref ref-type="table" rid="TA5">Algorithm 5</xref> as the optimization procedure and <xref ref-type="table" rid="TA4">Algorithm 4</xref> to create predictions. The grid search used the grid {&#x02212;5, &#x02212;4, &#x02026;, 4, 5} in every direction. The datasets are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>; the accuracies are given in <xref ref-type="table" rid="T6">Table 6</xref> and the F1-scores in Table B2 of the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref>.</p>
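<p>The text does not specify which chi-square test was used for the ranking. As an assumption, the sketch below follows the statistic computed by scikit-learn's <monospace>sklearn.feature_selection.chi2</monospace>, which treats non-negative feature values like counts: for each class, the observed sum of a feature is compared with the sum expected from the class frequencies. The helper names and the toy data are hypothetical.</p>

```python
def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against the class labels.

    Observed: per-class sum of the feature. Expected: class frequency times the
    feature's total sum. Requires non-negative features with a non-zero total.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    class_freq = {c: sum(1 for label in y if label == c) / n for c in classes}
    scores = []
    for j in range(p):
        total = sum(row[j] for row in X)
        stat = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_freq[c] * total
            stat += (observed - expected) ** 2 / expected
        scores.append(stat)
    return scores


def rank_features(X, y):
    """Feature indices ordered from highest to lowest chi-square score."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): feature 2 is constant and therefore uninformative
X = [[3, 1, 5], [2, 0, 5], [0, 1, 5], [1, 2, 5]]
y = [0, 0, 1, 1]
order = rank_features(X, y)
print(order)                      # [0, 1, 2]
top_pair, next_pair = order[0:2], order[2:4]
```

Applied to the sonar features, <monospace>order[0:2]</monospace> would correspond to the pair used in Figure 4C and <monospace>order[2:4]</monospace> to the pair used in Figure 4D.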
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>2-D datasets extracted from the sonar dataset to compare proposed iterative optimization with a grid search. <bold>(A)</bold> features 8 and 41, <bold>(B)</bold> features 2 and 48, <bold>(C)</bold> features 11 and 45, and <bold>(D)</bold> features 12 and 36.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-877569-g0004.tif"/>
</fig>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Improvement &#x003B4; in accuracy for different iterative optimization settings on 2-dimensional subsets of the sonar dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>Linear [%]</bold></th>
<th valign="top" align="center"><bold>KNN [%]</bold></th>
<th valign="top" align="center"><bold>Bayesian [%]</bold></th>
<th valign="top" align="center"><bold>SVC [%]</bold></th>
<th valign="top" align="center"><bold>NN [%]</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="6">Features 8 and 41</td>
</tr>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">53.576</td>
<td valign="top" align="center">54.090</td>
<td valign="top" align="center">57.390</td>
<td valign="top" align="center">62.224</td>
<td valign="top" align="center">63.267</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">&#x02013;2.952</td>
<td valign="top" align="center">0.729</td>
<td valign="top" align="center">2.276</td>
<td valign="top" align="center">&#x02013;0.652</td>
<td valign="top" align="center">&#x02013;0.100</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">6.148</td>
<td valign="top" align="center">0.824</td>
<td valign="top" align="center">4.067</td>
<td valign="top" align="center">&#x02013;0.471</td>
<td valign="top" align="center">0.005</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">&#x02013;0.310</td>
<td valign="top" align="center">0.643</td>
<td valign="top" align="center">4.267</td>
<td valign="top" align="center">0.857</td>
<td valign="top" align="center">0.014</td>
</tr>
<tr>
<td valign="top" align="left">2D grid search (&#x003B4;)</td>
<td valign="top" align="center">7.081</td>
<td valign="top" align="center">&#x02013;0.924</td>
<td valign="top" align="center">3.476</td>
<td valign="top" align="center">&#x02013;1.138</td>
<td valign="top" align="center">&#x02013;1.243</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">7.595</td>
<td valign="top" align="center">&#x02013;1.195</td>
<td valign="top" align="center">3.481</td>
<td valign="top" align="center">&#x02013;0.195</td>
<td valign="top" align="center">0.100</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">8.648</td>
<td valign="top" align="center">&#x02013;1.005</td>
<td valign="top" align="center">3.681</td>
<td valign="top" align="center">&#x02013;0.767</td>
<td valign="top" align="center">&#x02013;0.567</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">7.595</td>
<td valign="top" align="center">&#x02013;1.195</td>
<td valign="top" align="center">3.481</td>
<td valign="top" align="center">&#x02013;0.195</td>
<td valign="top" align="center">0.100</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">8.657</td>
<td valign="top" align="center">&#x02013;1.200</td>
<td valign="top" align="center">3.386</td>
<td valign="top" align="center">&#x02013;0.386</td>
<td valign="top" align="center">&#x02013;0.376</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">8.843</td>
<td valign="top" align="center">&#x02013;2.252</td>
<td valign="top" align="center">3.671</td>
<td valign="top" align="center">&#x02013;0.276</td>
<td valign="top" align="center">&#x02013;0.376</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">6.743</td>
<td valign="top" align="center">&#x02013;1.190</td>
<td valign="top" align="center">4.357</td>
<td valign="top" align="center">0.200</td>
<td valign="top" align="center">&#x02013;0.671</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6">Features 2 and 48</td>
</tr>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">55.948</td>
<td valign="top" align="center">64.862</td>
<td valign="top" align="center">62.590</td>
<td valign="top" align="center">66.538</td>
<td valign="top" align="center">66.719</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">3.686</td>
<td valign="top" align="center">0.267</td>
<td valign="top" align="center">4.943</td>
<td valign="top" align="center">0.886</td>
<td valign="top" align="center">&#x02013;1.329</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">11.276</td>
<td valign="top" align="center">&#x02013;0.424</td>
<td valign="top" align="center">4.957</td>
<td valign="top" align="center">1.467</td>
<td valign="top" align="center">0.214</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">&#x02013;1.186</td>
<td valign="top" align="center">1.424</td>
<td valign="top" align="center">5.519</td>
<td valign="top" align="center">0.019</td>
<td valign="top" align="center">&#x02013;0.852</td>
</tr>
<tr>
<td valign="top" align="left">2D grid search (&#x003B4;)</td>
<td valign="top" align="center">11.562</td>
<td valign="top" align="center">0.090</td>
<td valign="top" align="center">5.224</td>
<td valign="top" align="center">1.271</td>
<td valign="top" align="center">&#x02013;0.086</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">12.638</td>
<td valign="top" align="center">&#x02013;0.014</td>
<td valign="top" align="center">4.633</td>
<td valign="top" align="center">0.305</td>
<td valign="top" align="center">0.105</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">12.343</td>
<td valign="top" align="center">0.186</td>
<td valign="top" align="center">4.633</td>
<td valign="top" align="center">0.305</td>
<td valign="top" align="center">0.300</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">12.638</td>
<td valign="top" align="center">&#x02013;0.014</td>
<td valign="top" align="center">4.633</td>
<td valign="top" align="center">0.305</td>
<td valign="top" align="center">0.105</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">12.724</td>
<td valign="top" align="center">0.567</td>
<td valign="top" align="center">4.448</td>
<td valign="top" align="center">0.800</td>
<td valign="top" align="center">0.105</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">12.052</td>
<td valign="top" align="center">0.476</td>
<td valign="top" align="center">4.543</td>
<td valign="top" align="center">0.510</td>
<td valign="top" align="center">0.200</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">11.571</td>
<td valign="top" align="center">&#x02013;0.100</td>
<td valign="top" align="center">4.543</td>
<td valign="top" align="center">0.705</td>
<td valign="top" align="center">&#x02013;0.557</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6">Features 11 and 45</td>
</tr>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">63.995</td>
<td valign="top" align="center">71.371</td>
<td valign="top" align="center">64.805</td>
<td valign="top" align="center">73.486</td>
<td valign="top" align="center">72.995</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">1.943</td>
<td valign="top" align="center">&#x02013;0.176</td>
<td valign="top" align="center">8.310</td>
<td valign="top" align="center">&#x02013;1.148</td>
<td valign="top" align="center">0.319</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">8.433</td>
<td valign="top" align="center">&#x02013;0.186</td>
<td valign="top" align="center">7.362</td>
<td valign="top" align="center">&#x02013;0.186</td>
<td valign="top" align="center">&#x02013;0.757</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">0.205</td>
<td valign="top" align="center">&#x02013;2.124</td>
<td valign="top" align="center">5.910</td>
<td valign="top" align="center">&#x02013;0.281</td>
<td valign="top" align="center">0.400</td>
</tr>
<tr>
<td valign="top" align="left">2D grid search (&#x003B4;)</td>
<td valign="top" align="center">9.495</td>
<td valign="top" align="center">&#x02013;0.186</td>
<td valign="top" align="center">8.886</td>
<td valign="top" align="center">&#x02013;0.095</td>
<td valign="top" align="center">&#x02013;1.162</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">8.838</td>
<td valign="top" align="center">&#x02013;0.957</td>
<td valign="top" align="center">9.462</td>
<td valign="top" align="center">0.090</td>
<td valign="top" align="center">&#x02013;1.257</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">9.029</td>
<td valign="top" align="center">&#x02013;0.662</td>
<td valign="top" align="center">9.367</td>
<td valign="top" align="center">&#x02013;0.004</td>
<td valign="top" align="center">&#x02013;1.062</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">8.838</td>
<td valign="top" align="center">&#x02013;0.957</td>
<td valign="top" align="center">9.462</td>
<td valign="top" align="center">0.090</td>
<td valign="top" align="center">&#x02013;1.257</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">9.124</td>
<td valign="top" align="center">&#x02013;0.567</td>
<td valign="top" align="center">9.562</td>
<td valign="top" align="center">0.190</td>
<td valign="top" align="center">&#x02013;1.152</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">8.838</td>
<td valign="top" align="center">&#x02013;0.567</td>
<td valign="top" align="center">9.562</td>
<td valign="top" align="center">0.190</td>
<td valign="top" align="center">&#x02013;1.252</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">9.319</td>
<td valign="top" align="center">&#x02013;0.752</td>
<td valign="top" align="center">9.948</td>
<td valign="top" align="center">0.190</td>
<td valign="top" align="center">&#x02013;1.048</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6">Features 12 and 36</td>
</tr>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">59.510</td>
<td valign="top" align="center">70.443</td>
<td valign="top" align="center">70.705</td>
<td valign="top" align="center">68.876</td>
<td valign="top" align="center">69.824</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">3.662</td>
<td valign="top" align="center">&#x02013;1.271</td>
<td valign="top" align="center">&#x02013;0.595</td>
<td valign="top" align="center">1.448</td>
<td valign="top" align="center">0.967</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">10.286</td>
<td valign="top" align="center">&#x02013;1.381</td>
<td valign="top" align="center">1.138</td>
<td valign="top" align="center">0.776</td>
<td valign="top" align="center">0.200</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">5.343</td>
<td valign="top" align="center">&#x02013;1.281</td>
<td valign="top" align="center">&#x02013;0.290</td>
<td valign="top" align="center">1.157</td>
<td valign="top" align="center">1.452</td>
</tr>
<tr>
<td valign="top" align="left">2D grid search (&#x003B4;)</td>
<td valign="top" align="center">12.267</td>
<td valign="top" align="center">0.681</td>
<td valign="top" align="center">&#x02013;0.695</td>
<td valign="top" align="center">1.833</td>
<td valign="top" align="center">0.386</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">12.133</td>
<td valign="top" align="center">1.733</td>
<td valign="top" align="center">&#x02013;1.543</td>
<td valign="top" align="center">0.767</td>
<td valign="top" align="center">1.062</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">11.957</td>
<td valign="top" align="center">1.733</td>
<td valign="top" align="center">&#x02013;1.543</td>
<td valign="top" align="center">0.867</td>
<td valign="top" align="center">0.776</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">12.133</td>
<td valign="top" align="center">1.733</td>
<td valign="top" align="center">&#x02013;1.543</td>
<td valign="top" align="center">0.767</td>
<td valign="top" align="center">1.062</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">12.443</td>
<td valign="top" align="center">1.924</td>
<td valign="top" align="center">&#x02013;1.929</td>
<td valign="top" align="center">0.771</td>
<td valign="top" align="center">0.686</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">12.157</td>
<td valign="top" align="center">2.019</td>
<td valign="top" align="center">&#x02013;2.024</td>
<td valign="top" align="center">1.157</td>
<td valign="top" align="center">1.462</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">12.652</td>
<td valign="top" align="center">2.010</td>
<td valign="top" align="center">&#x02013;1.914</td>
<td valign="top" align="center">0.676</td>
<td valign="top" align="center">1.552</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>In 16 of the 20 classification cases (4 feature pairs &#x000D7; 5 classifiers), at least one hyperparameter setting improved on the base accuracy. In 13 of the 20 cases, iterative optimization was better than the 2D grid search. It was also better than Diagonal optimization in 13 of 20 cases and better than Spherical and MLE optimization in 14 of 20 cases each. The influence of the hyperparameter settings is data- and classifier-dependent</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>For at least one hyperparameter setting, the iterative method improved the base accuracy in 16 of the 20 cases. Only the accuracy of the KNN classifier for features [8, 41] and [11, 45], the Bayesian classifier for features [12, 36], and the neural network for features [11, 45] could not be improved. Taking the best hyperparameter setting in each case, the largest decrease in accuracy was only &#x02212;1.543% (Bayesian classifier for features [12, 36]). In contrast, the best hyperparameter setting achieved a gain of 12.724% (linear classifier for features [2, 48]). Compared to the grid search, the iterative optimization achieved better results for at least one hyperparameter setting for all classifiers on features [8, 41] except the KNN classifier. Further, it was better for the linear, KNN, and neural network classifiers on features [2, 48]; the Bayesian, SVC, and neural network classifiers on features [11, 45]; and the linear, KNN, and neural network classifiers on features [12, 36]. Hence, in 13 of the 20 cases, iterative optimization outperformed the <italic>2D grid search</italic>. The gain in accuracy for the linear classifier was always positive, as it was for the Bayesian classifier except for features [12, 36]. The KNN classifier fluctuated around zero: for features [8, 41] and [11, 45], none of the tested hyperparameter settings achieved an improvement, whereas for the other two datasets at least one did. For the SVC, at least one hyperparameter choice always led to an improvement. The same holds for the neural network, except for features [11, 45]; even there, the <italic>Shift, Finer</italic>, and <italic>Combined 2</italic> settings resulted in a smaller loss of accuracy than the grid search.</p>
<p>Across classifiers and datasets, there was often a hyperparameter setting that improved on the result of the <italic>Iterative grid search</italic> optimization. The influence of <italic>Shift</italic> and <italic>Finer</italic> relative to the <italic>Iterative grid search</italic> was sometimes positive and sometimes negative. It varied between classifiers applied to the same dataset; for example, for features [8, 41], <italic>Finer</italic> increased the accuracy of the linear classifier but decreased that of the KNN classifier. It also varied between datasets for the same classifier; for example, <italic>Shift</italic> increased the accuracy of the linear classifier for features [8, 41] but decreased it for features [2, 48]. The same observation holds for <italic>Combined 1</italic> and <italic>Combined 2</italic>. <italic>Shuffle</italic> did not change the accuracy compared to the <italic>Iterative grid search</italic>. The proposed method achieved better results than <italic>Diagonal</italic> optimization in 13 of the 20 cases, and better results than <italic>Spherical</italic> and <italic>MLE</italic> optimization in 14 of the 20 cases each. The F1-score measurements are given in the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref> in Table B2. In terms of F1-score, the proposed optimization was better than the base F1-score in 12 of the 20 cases, than the <italic>2D grid search</italic> in 14 of 20, than <italic>Diagonal</italic> optimization in 13 of 20, than <italic>Spherical</italic> optimization in 16 of 20, and than <italic>MLE</italic> optimization in 12 of 20 cases.</p>
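<p>The case counts above follow directly from <xref ref-type="table" rid="T6">Table 6</xref>. The sketch below re-derives several of them, assuming that "the proposed method" in a given case means the best of its six settings (<italic>Iterative grid search, Shift, Shuffle, Finer, Combined 1</italic>, and <italic>Combined 2</italic>). The values are copied from the table; the variable names are illustrative.</p>

```python
# Improvement (delta, %) from Table 6, sonar 2-D subsets.
# Value: ([Iterative grid search, Shift, Shuffle, Finer, Combined 1, Combined 2], 2D grid search)
TABLE_6 = {
    ((8, 41), "Linear"): ([7.595, 8.648, 7.595, 8.657, 8.843, 6.743], 7.081),
    ((8, 41), "KNN"): ([-1.195, -1.005, -1.195, -1.200, -2.252, -1.190], -0.924),
    ((8, 41), "Bayesian"): ([3.481, 3.681, 3.481, 3.386, 3.671, 4.357], 3.476),
    ((8, 41), "SVC"): ([-0.195, -0.767, -0.195, -0.386, -0.276, 0.200], -1.138),
    ((8, 41), "NN"): ([0.100, -0.567, 0.100, -0.376, -0.376, -0.671], -1.243),
    ((2, 48), "Linear"): ([12.638, 12.343, 12.638, 12.724, 12.052, 11.571], 11.562),
    ((2, 48), "KNN"): ([-0.014, 0.186, -0.014, 0.567, 0.476, -0.100], 0.090),
    ((2, 48), "Bayesian"): ([4.633, 4.633, 4.633, 4.448, 4.543, 4.543], 5.224),
    ((2, 48), "SVC"): ([0.305, 0.305, 0.305, 0.800, 0.510, 0.705], 1.271),
    ((2, 48), "NN"): ([0.105, 0.300, 0.105, 0.105, 0.200, -0.557], -0.086),
    ((11, 45), "Linear"): ([8.838, 9.029, 8.838, 9.124, 8.838, 9.319], 9.495),
    ((11, 45), "KNN"): ([-0.957, -0.662, -0.957, -0.567, -0.567, -0.752], -0.186),
    ((11, 45), "Bayesian"): ([9.462, 9.367, 9.462, 9.562, 9.562, 9.948], 8.886),
    ((11, 45), "SVC"): ([0.090, -0.004, 0.090, 0.190, 0.190, 0.190], -0.095),
    ((11, 45), "NN"): ([-1.257, -1.062, -1.257, -1.152, -1.252, -1.048], -1.162),
    ((12, 36), "Linear"): ([12.133, 11.957, 12.133, 12.443, 12.157, 12.652], 12.267),
    ((12, 36), "KNN"): ([1.733, 1.733, 1.733, 1.924, 2.019, 2.010], 0.681),
    ((12, 36), "Bayesian"): ([-1.543, -1.543, -1.543, -1.929, -2.024, -1.914], -0.695),
    ((12, 36), "SVC"): ([0.767, 0.867, 0.767, 0.771, 1.157, 0.676], 1.833),
    ((12, 36), "NN"): ([1.062, 0.776, 1.062, 0.686, 1.462, 1.552], 0.386),
}

# Best delta over the six iterative settings for each (feature pair, classifier).
best = {case: max(settings) for case, (settings, _) in TABLE_6.items()}

improved_over_base = sum(1 for delta in best.values() if delta > 0)
beats_grid_search = sum(1 for case, (_, grid) in TABLE_6.items() if best[case] > grid)
largest_gain_case = max(best, key=best.get)
largest_loss_case = min(best, key=best.get)

print(improved_over_base, beats_grid_search)      # prints 16 13
print(largest_gain_case, best[largest_gain_case])  # prints ((2, 48), 'Linear') 12.724
print(largest_loss_case, best[largest_loss_case])  # prints ((12, 36), 'Bayesian') -1.543
```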
</sec>
<sec>
<title>Breast Cancer Dataset</title>
<p>This dataset consisted of 569 samples with 30 features. The task was a binary classification distinguishing between benign and malignant fine needle aspirate (FNA) samples. The influence of the iterative method on accuracy is given in <xref ref-type="table" rid="T7">Table 7</xref>, and the F1-score measurements are given in the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref> in Table C3.</p>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>Improvement &#x003B4; in accuracy for different iterative optimization settings on the breast cancer dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>Linear [%]</bold></th>
<th valign="top" align="center"><bold>KNN [%]</bold></th>
<th valign="top" align="center"><bold>Bayesian [%]</bold></th>
<th valign="top" align="center"><bold>SVC [%]</bold></th>
<th valign="top" align="center"><bold>NN [%]</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">96.487</td>
<td valign="top" align="center">96.838</td>
<td valign="top" align="center">93.289</td>
<td valign="top" align="center">97.539</td>
<td valign="top" align="center">98.103</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">0.317</td>
<td valign="top" align="center">&#x02013;0.140</td>
<td valign="top" align="center">1.232</td>
<td valign="top" align="center">0.352</td>
<td valign="top" align="center">&#x02013;0.246</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">0.422</td>
<td valign="top" align="center">0.000</td>
<td valign="top" align="center">1.443</td>
<td valign="top" align="center">0.176</td>
<td valign="top" align="center">&#x02013;1.230</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">&#x02013;0.177</td>
<td valign="top" align="center">0.279</td>
<td valign="top" align="center">1.477</td>
<td valign="top" align="center">0.246</td>
<td valign="top" align="center">&#x02013;0.281</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">0.175</td>
<td valign="top" align="center">0.245</td>
<td valign="top" align="center">1.371</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">&#x02013;0.598</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">0.175</td>
<td valign="top" align="center">0.069</td>
<td valign="top" align="center">1.229</td>
<td valign="top" align="center">0.105</td>
<td valign="top" align="center">&#x02013;0.773</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">0.175</td>
<td valign="top" align="center">0.245</td>
<td valign="top" align="center">1.336</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">&#x02013;0.598</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">0.316</td>
<td valign="top" align="center">0.245</td>
<td valign="top" align="center">1.371</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">&#x02013;0.598</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">0.069</td>
<td valign="top" align="center">0.315</td>
<td valign="top" align="center">1.336</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">&#x02013;0.563</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">0.239</td>
<td valign="top" align="center">0.140</td>
<td valign="top" align="center">1.442</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">&#x02013;0.669</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Every classifier except the NN benefited from the proposed optimization. Finer improved the linear classifier the most, Combined 1 the KNN, and Combined 2 the Bayesian classifier; the best hyperparameter setting therefore depended on the classifier. Diagonal optimization was better than the proposed optimization for the linear classifier, SVC, and NN; Spherical optimization for the linear and Bayesian classifiers; and MLE optimization for the Bayesian classifier, SVC, and NN</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>The observations resemble those for the sonar dataset. Accuracy improved for all classifiers except the neural network. Again, the Bayesian classifier improved the most, and the SVC was almost insensitive to the choice of hyperparameter setting. Compared to the <italic>Iterative grid search</italic>, <italic>Shift</italic> decreased the performance of all classifiers except the linear classifier, whose accuracy was unchanged. Extending the <italic>Iterative grid search</italic> optimization with <italic>Shuffle</italic> had a small negative effect on the Bayesian classifier but did not change the accuracy of the other classifiers. <italic>Finer</italic> improved the result for the linear classifier but did not affect the other classifiers. <italic>Combined 1</italic> improved the results of the KNN and neural network classifiers compared to the <italic>Iterative grid search</italic> optimization, whereas <italic>Combined 2</italic> improved the linear and Bayesian classifiers but decreased the accuracy of the KNN classifier and the neural network. The proposed optimization achieved higher accuracy than <italic>Diagonal</italic> optimization for the KNN and Bayesian classifiers, than <italic>Spherical</italic> optimization for the KNN, SVC, and NN, and than <italic>MLE</italic> optimization for the linear and KNN classifiers. The same was observed for the F1-scores in the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref> in Table C3, except that the proposed optimization additionally improved on <italic>Diagonal</italic> optimization for the linear classifier.</p>
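<p>The per-setting comparisons above can be read off <xref ref-type="table" rid="T7">Table 7</xref> mechanically. The sketch below stores the six iterative settings per classifier, picks the best setting for each classifier, and computes each setting's difference from the <italic>Iterative grid search</italic> row. The values are copied from the table; the function names are illustrative, and ties are resolved in favor of the first setting listed.</p>

```python
SETTINGS = ["Iterative grid search", "Shift", "Shuffle", "Finer", "Combined 1", "Combined 2"]

# Improvement (delta, %) from Table 7, breast cancer dataset, in SETTINGS order.
TABLE_7 = {
    "Linear": [0.175, 0.175, 0.175, 0.316, 0.069, 0.239],
    "KNN": [0.245, 0.069, 0.245, 0.245, 0.315, 0.140],
    "Bayesian": [1.371, 1.229, 1.336, 1.371, 1.336, 1.442],
    "SVC": [0.211, 0.105, 0.211, 0.211, 0.211, 0.211],
    "NN": [-0.598, -0.773, -0.598, -0.598, -0.563, -0.669],
}


def best_setting(classifier: str) -> str:
    """Name of the setting with the largest delta (first one wins ties)."""
    values = TABLE_7[classifier]
    return SETTINGS[values.index(max(values))]


def change_vs_iterative(setting: str) -> dict:
    """Delta of a setting minus the Iterative grid search delta, per classifier."""
    i = SETTINGS.index(setting)
    return {clf: round(vals[i] - vals[0], 3) for clf, vals in TABLE_7.items()}


if __name__ == "__main__":
    for clf in TABLE_7:
        print(clf, best_setting(clf))
    print("Shift:", change_vs_iterative("Shift"))
    print("Combined 2:", change_vs_iterative("Combined 2"))
```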
<p>Again, a 2-dimensional feature study was performed. Two 2-dimensional datasets (<xref ref-type="fig" rid="F5">Figures 5A,B</xref>) were created by randomly selecting two features from the breast cancer dataset. In addition, using the ranks from a chi-square test, a dataset with the two highest-ranking features (<xref ref-type="fig" rid="F5">Figure 5C</xref>) and a dataset with the third- and fourth-highest-ranking features (<xref ref-type="fig" rid="F5">Figure 5D</xref>) were created. The datasets are shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. The accuracy measurements are given in <xref ref-type="table" rid="T8">Table 8</xref> and the F1-scores in the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref> in Table C4. A grid search with the grid {&#x02212;5, &#x02212;4, &#x02026;, 4, 5} was also executed to obtain a baseline. For this, <xref ref-type="table" rid="TA5">Algorithm 5</xref> was used as the optimization procedure in the training method of <xref ref-type="table" rid="TA3">Algorithm 3</xref>, and <xref ref-type="table" rid="TA4">Algorithm 4</xref> was used to create predictions.</p>
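<p>The chi-square ranking used to build the datasets in panels C and D can be sketched as follows. The article does not state which implementation was used; this sketch assumes the statistic computed by scikit-learn's <monospace>sklearn.feature_selection.chi2</monospace>, which treats each non-negative feature as a frequency: for every class, the observed value is the sum of the feature over that class's samples, and the expected value is the feature's total multiplied by the class proportion. The function names and the toy data are hypothetical.</p>

```python
from collections import Counter
from typing import List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-square score of each non-negative feature against the class labels."""
    n = len(y)
    class_counts = Counter(y)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for label, n_label in class_counts.items():
            observed = sum(row[j] for row, lab in zip(X, y) if lab == label)
            expected = total * n_label / n
            if expected > 0:  # an all-zero feature carries no information
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Feature indices ordered from highest to lowest chi-square score."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Toy data: 4 samples, 4 non-negative features, 2 classes.
    X = [
        [1.0, 0.5, 3.0, 2.0],
        [1.0, 0.5, 3.0, 1.0],
        [0.0, 0.5, 3.0, 0.0],
        [0.0, 1.0, 3.0, 1.0],
    ]
    y = [0, 0, 1, 1]
    ranking = rank_features(X, y)
    top_pair, next_pair = ranking[:2], ranking[2:4]
    print(top_pair, next_pair)  # prints [0, 3] [1, 2]
```

<p>In this toy example, feature 0 separates the classes perfectly and ranks first, while the constant feature 2 scores zero and ranks last; the two pairs correspond to the "two highest" and "third and fourth highest" datasets.</p>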
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>2-D datasets extracted from the breast cancer dataset to compare proposed iterative optimization with a grid search. <bold>(A)</bold> features 2 and 6, <bold>(B)</bold> features 5 and 27, <bold>(C)</bold> features 4 and 24, and <bold>(D)</bold> features 14 and 23.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-877569-g0005.tif"/>
</fig>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption><p>Improvement &#x003B4; in accuracy for different optimization settings on 2-dimensional subsets of the breast cancer dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>Linear [%]</bold></th>
<th valign="top" align="center"><bold>KNN [%]</bold></th>
<th valign="top" align="center"><bold>Bayesian [%]</bold></th>
<th valign="top" align="center"><bold>SVC [%]</bold></th>
<th valign="top" align="center"><bold>NN [%]</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="6">Features 2 and 6</td>
</tr>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">75.610</td>
<td valign="top" align="center">79.615</td>
<td valign="top" align="center">81.622</td>
<td valign="top" align="center">82.678</td>
<td valign="top" align="center">83.66</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">0.664</td>
<td valign="top" align="center">&#x02013;0.177</td>
<td valign="top" align="center">0.843</td>
<td valign="top" align="center">0.456</td>
<td valign="top" align="center">0.283</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">6.258</td>
<td valign="top" align="center">&#x02013;0.105</td>
<td valign="top" align="center">1.161</td>
<td valign="top" align="center">0.456</td>
<td valign="top" align="center">&#x02013;0.138</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">0.107</td>
<td valign="top" align="center">0.176</td>
<td valign="top" align="center">1.197</td>
<td valign="top" align="center">&#x02013;0.211</td>
<td valign="top" align="center">&#x02013;0.034</td>
</tr>
<tr>
<td valign="top" align="left">2D grid search (&#x003B4;)</td>
<td valign="top" align="center">7.064</td>
<td valign="top" align="center">0.176</td>
<td valign="top" align="center">1.019</td>
<td valign="top" align="center">0.736</td>
<td valign="top" align="center">&#x02013;0.175</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">6.153</td>
<td valign="top" align="center">0.034</td>
<td valign="top" align="center">0.951</td>
<td valign="top" align="center">0.490</td>
<td valign="top" align="center">&#x02013;0.245</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">6.293</td>
<td valign="top" align="center">0.034</td>
<td valign="top" align="center">0.845</td>
<td valign="top" align="center">0.490</td>
<td valign="top" align="center">&#x02013;0.175</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">6.187</td>
<td valign="top" align="center">0.034</td>
<td valign="top" align="center">0.635</td>
<td valign="top" align="center">0.455</td>
<td valign="top" align="center">&#x02013;0.140</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">6.222</td>
<td valign="top" align="center">0.068</td>
<td valign="top" align="center">0.635</td>
<td valign="top" align="center">0.490</td>
<td valign="top" align="center">&#x02013;0.105</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">5.871</td>
<td valign="top" align="center">0.246</td>
<td valign="top" align="center">0.705</td>
<td valign="top" align="center">0.490</td>
<td valign="top" align="center">&#x02013;0.070</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">6.713</td>
<td valign="top" align="center">&#x02013;0.037</td>
<td valign="top" align="center">0.739</td>
<td valign="top" align="center">0.597</td>
<td valign="top" align="center">0.001</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6">Features 5 and 27</td>
</tr>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">76.007</td>
<td valign="top" align="center">84.538</td>
<td valign="top" align="center">84.677</td>
<td valign="top" align="center">85.736</td>
<td valign="top" align="center">86.754</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">2.421</td>
<td valign="top" align="center">&#x02013;0.070</td>
<td valign="top" align="center">0.987</td>
<td valign="top" align="center">0.383</td>
<td valign="top" align="center">&#x02013;0.352</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">8.393</td>
<td valign="top" align="center">0.142</td>
<td valign="top" align="center">0.107</td>
<td valign="top" align="center">0.489</td>
<td valign="top" align="center">&#x02013;0.175</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">3.261</td>
<td valign="top" align="center">&#x02013;0.070</td>
<td valign="top" align="center">0.140</td>
<td valign="top" align="center">0.842</td>
<td valign="top" align="center">&#x02013;1.266</td>
</tr>
<tr>
<td valign="top" align="left">2D grid search (&#x003B4;)</td>
<td valign="top" align="center">8.566</td>
<td valign="top" align="center">0.072</td>
<td valign="top" align="center">&#x02013;0.034</td>
<td valign="top" align="center">0.138</td>
<td valign="top" align="center">&#x02013;0.175</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">8.919</td>
<td valign="top" align="center">0.071</td>
<td valign="top" align="center">&#x02013;0.034</td>
<td valign="top" align="center">0.383</td>
<td valign="top" align="center">&#x02013;0.317</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">8.568</td>
<td valign="top" align="center">0.071</td>
<td valign="top" align="center">&#x02013;0.174</td>
<td valign="top" align="center">0.348</td>
<td valign="top" align="center">&#x02013;0.598</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">8.919</td>
<td valign="top" align="center">0.071</td>
<td valign="top" align="center">&#x02013;0.034</td>
<td valign="top" align="center">0.383</td>
<td valign="top" align="center">&#x02013;0.317</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">9.200</td>
<td valign="top" align="center">0.036</td>
<td valign="top" align="center">&#x02013;0.174</td>
<td valign="top" align="center">0.489</td>
<td valign="top" align="center">&#x02013;0.457</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">9.235</td>
<td valign="top" align="center">0.142</td>
<td valign="top" align="center">&#x02013;0.315</td>
<td valign="top" align="center">0.419</td>
<td valign="top" align="center">&#x02013;0.527</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">9.060</td>
<td valign="top" align="center">0.177</td>
<td valign="top" align="center">0.177</td>
<td valign="top" align="center">0.419</td>
<td valign="top" align="center">&#x02013;0.493</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6">Features 4 and 24</td>
</tr>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">90.468</td>
<td valign="top" align="center">92.864</td>
<td valign="top" align="center">90.893</td>
<td valign="top" align="center">92.439</td>
<td valign="top" align="center">92.829</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">0.148</td>
<td valign="top" align="center">&#x02013;0.106</td>
<td valign="top" align="center">&#x02013;0.352</td>
<td valign="top" align="center">&#x02013;0.281</td>
<td valign="top" align="center">0.142</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">1.484</td>
<td valign="top" align="center">0.211</td>
<td valign="top" align="center">&#x02013;0.246</td>
<td valign="top" align="center">&#x02013;0.211</td>
<td valign="top" align="center">0.281</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">&#x02013;0.207</td>
<td valign="top" align="center">0.000</td>
<td valign="top" align="center">&#x02013;2.494</td>
<td valign="top" align="center">&#x02013;0.174</td>
<td valign="top" align="center">0.211</td>
</tr>
<tr>
<td valign="top" align="left">2D grid search (&#x003B4;)</td>
<td valign="top" align="center">1.553</td>
<td valign="top" align="center">&#x02013;0.211</td>
<td valign="top" align="center">0.140</td>
<td valign="top" align="center">&#x02013;0.281</td>
<td valign="top" align="center">0.212</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">1.694</td>
<td valign="top" align="center">&#x02013;0.140</td>
<td valign="top" align="center">&#x02013;0.246</td>
<td valign="top" align="center">&#x02013;0.175</td>
<td valign="top" align="center">0.177</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">1.694</td>
<td valign="top" align="center">&#x02013;0.105</td>
<td valign="top" align="center">&#x02013;0.387</td>
<td valign="top" align="center">&#x02013;0.316</td>
<td valign="top" align="center">0.212</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">1.694</td>
<td valign="top" align="center">&#x02013;0.140</td>
<td valign="top" align="center">&#x02013;0.246</td>
<td valign="top" align="center">&#x02013;0.175</td>
<td valign="top" align="center">0.177</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">1.519</td>
<td valign="top" align="center">&#x02013;0.246</td>
<td valign="top" align="center">&#x02013;0.246</td>
<td valign="top" align="center">&#x02013;0.175</td>
<td valign="top" align="center">0.387</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">1.519</td>
<td valign="top" align="center">&#x02013;0.070</td>
<td valign="top" align="center">&#x02013;0.246</td>
<td valign="top" align="center">&#x02013;0.211</td>
<td valign="top" align="center">0.352</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">1.729</td>
<td valign="top" align="center">&#x02013;0.387</td>
<td valign="top" align="center">&#x02013;0.106</td>
<td valign="top" align="center">&#x02013;0.281</td>
<td valign="top" align="center">0.211</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6">Features 14 and 23</td>
</tr>
<tr>
<td valign="top" align="left">Base accuracy</td>
<td valign="top" align="center">88.435</td>
<td valign="top" align="center">90.439</td>
<td valign="top" align="center">90.966</td>
<td valign="top" align="center">92.407</td>
<td valign="top" align="center">92.057</td>
</tr>
<tr>
<td valign="top" align="left">Diagonal</td>
<td valign="top" align="center">0.695</td>
<td valign="top" align="center">&#x02013;0.247</td>
<td valign="top" align="center">0.632</td>
<td valign="top" align="center">&#x02013;0.421</td>
<td valign="top" align="center">&#x02013;0.702</td>
</tr>
<tr>
<td valign="top" align="left">Spherical</td>
<td valign="top" align="center">3.339</td>
<td valign="top" align="center">&#x02013;0.316</td>
<td valign="top" align="center">&#x02013;0.071</td>
<td valign="top" align="center">&#x02013;0.035</td>
<td valign="top" align="center">&#x02013;0.597</td>
</tr>
<tr>
<td valign="top" align="left">MLE</td>
<td valign="top" align="center">&#x02013;0.423</td>
<td valign="top" align="center">&#x02013;0.037</td>
<td valign="top" align="center">0.738</td>
<td valign="top" align="center">&#x02013;0.595</td>
<td valign="top" align="center">&#x02013;0.352</td>
</tr>
<tr>
<td valign="top" align="left">2D grid search (&#x003B4;)</td>
<td valign="top" align="center">3.620</td>
<td valign="top" align="center">&#x02013;0.315</td>
<td valign="top" align="center">0.385</td>
<td valign="top" align="center">0.528</td>
<td valign="top" align="center">&#x02013;0.773</td>
</tr>
<tr>
<td valign="top" align="left">Iterative grid search (&#x003B4;)</td>
<td valign="top" align="center">3.409</td>
<td valign="top" align="center">&#x02013;0.386</td>
<td valign="top" align="center">0.349</td>
<td valign="top" align="center">&#x02013;0.350</td>
<td valign="top" align="center">&#x02013;0.387</td>
</tr>
<tr>
<td valign="top" align="left">Shift (&#x003B4;)</td>
<td valign="top" align="center">3.586</td>
<td valign="top" align="center">&#x02013;0.316</td>
<td valign="top" align="center">0.384</td>
<td valign="top" align="center">&#x02013;0.386</td>
<td valign="top" align="center">&#x02013;0.492</td>
</tr>
<tr>
<td valign="top" align="left">Shuffle (&#x003B4;)</td>
<td valign="top" align="center">3.409</td>
<td valign="top" align="center">&#x02013;0.386</td>
<td valign="top" align="center">0.349</td>
<td valign="top" align="center">&#x02013;0.350</td>
<td valign="top" align="center">&#x02013;0.387</td>
</tr>
<tr>
<td valign="top" align="left">Finer (&#x003B4;)</td>
<td valign="top" align="center">3.516</td>
<td valign="top" align="center">&#x02013;0.352</td>
<td valign="top" align="center">0.384</td>
<td valign="top" align="center">&#x02013;0.350</td>
<td valign="top" align="center">&#x02013;0.387</td>
</tr>
<tr>
<td valign="top" align="left">Combined 1 (&#x003B4;)</td>
<td valign="top" align="center">3.586</td>
<td valign="top" align="center">&#x02013;0.422</td>
<td valign="top" align="center">0.419</td>
<td valign="top" align="center">&#x02013;0.385</td>
<td valign="top" align="center">&#x02013;0.598</td>
</tr>
<tr>
<td valign="top" align="left">Combined 2 (&#x003B4;)</td>
<td valign="top" align="center">2.919</td>
<td valign="top" align="center">&#x02013;0.387</td>
<td valign="top" align="center">0.385</td>
<td valign="top" align="center">&#x02013;0.281</td>
<td valign="top" align="center">&#x02013;0.457</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The highest improvement, 9.235%, was achieved by the linear classifier for features [5, 27]. In 13 of the 20 feature classification cases (4 tests &#x000D7; 5 classifiers), the iterative optimization improved the overall accuracy of the base classification model. It also achieved higher accuracy than the 2D grid search in 12 of 20 cases, than Diagonal optimization in 16 of 20 cases, than Spherical optimization in 15 of 20 cases, and than MLE in 13 of 20 cases. The best hyperparameter choice depended on both the data and the classifier</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>The iterative method improved on the base accuracy in 13 of the 20 cases for at least one hyperparameter setting. The largest loss for the best hyperparameter setting was &#x02212;0.387%, incurred by the neural network for features [14, 23]. In contrast, the largest gain, 9.235%, was achieved by the linear classifier for features [5, 27]. Compared to the 2D grid search, the iterative method achieved the same or better performance in 13 of 20 cases for at least one hyperparameter setting. The linear classifier always benefited from the proposed optimization. The Bayesian classifier also improved consistently, except for features [4, 24]. The KNN classifier improved in only half of the cases (features [2, 6] and [5, 27]), as did the neural network (features [4, 24] and [2, 6]). As already observed in the experiment with the sonar dataset, <italic>Shift</italic> and <italic>Finer</italic> could either increase or decrease the performance relative to the <italic>Iterative grid search</italic> optimization, and the effect varied between classifiers and datasets. For example, for features [2, 6] the linear classifier benefited from <italic>Shift</italic> but the Bayesian classifier did not, whereas for features [5, 27] <italic>Shift</italic> decreased the accuracy of the linear classifier. <italic>Shuffle</italic> did not change the performance of the <italic>Iterative grid search</italic> method. <italic>Combined 1</italic> and <italic>Combined 2</italic> had varying effects depending on the classifier and the dataset. Compared to the proposed optimization, <italic>Diagonal</italic> optimization was worse in 16 of 20 cases, <italic>Spherical</italic> optimization in 15 of 20 cases, and <italic>MLE</italic> in 13 of 20 cases.
The F1-scores are given in Table C4 of the <xref ref-type="supplementary-material" rid="SM1">Appendix</xref> and show comparable results. The proposed optimization achieved a better F1-score than the base classifier in 10 cases, than the <italic>2D grid search</italic> in 12 cases, than <italic>Diagonal</italic> optimization in 14 cases, than <italic>Spherical</italic> optimization in 15 cases, and than <italic>MLE</italic> in 13 cases.</p>
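<p>The following Python sketch makes the counting rule explicit. It recomputes the tallies from the three feature pairs shown in full in the table above ([5, 27], [4, 24], and [14, 23]). A case counts as improved if the best of the iterative-family rows (<italic>Iterative grid search</italic>, <italic>Shift</italic>, <italic>Shuffle</italic>, <italic>Finer</italic>, <italic>Combined 1</italic>, <italic>Combined 2</italic>) is positive. This is our reading of "for at least one hyperparameter setting", and wins over the 2D grid search are counted strictly. The values are copied from the table. Features [2, 6] are omitted, so the counts cover only 15 of the 20 cases. The names TABLE, best_iterative, and tally are hypothetical.</p>

```python
# Accuracy changes (%, relative to the base accuracy) copied from the table above
# for the three fully shown feature pairs. Each row lists the five classifiers in
# the table's column order: column 1 is the linear classifier and column 5 the
# neural network. Features [2, 6] are omitted, so tallies cover 15 of 20 cases.
GRID = "2D grid search"
ITERATIVE_ROWS = ("Iterative grid search", "Shift", "Shuffle", "Finer",
                  "Combined 1", "Combined 2")

TABLE = {
    "5, 27": {
        GRID: [8.566, 0.072, -0.034, 0.138, -0.175],
        "Iterative grid search": [8.919, 0.071, -0.034, 0.383, -0.317],
        "Shift": [8.568, 0.071, -0.174, 0.348, -0.598],
        "Shuffle": [8.919, 0.071, -0.034, 0.383, -0.317],
        "Finer": [9.200, 0.036, -0.174, 0.489, -0.457],
        "Combined 1": [9.235, 0.142, -0.315, 0.419, -0.527],
        "Combined 2": [9.060, 0.177, 0.177, 0.419, -0.493],
    },
    "4, 24": {
        GRID: [1.553, -0.211, 0.140, -0.281, 0.212],
        "Iterative grid search": [1.694, -0.140, -0.246, -0.175, 0.177],
        "Shift": [1.694, -0.105, -0.387, -0.316, 0.212],
        "Shuffle": [1.694, -0.140, -0.246, -0.175, 0.177],
        "Finer": [1.519, -0.246, -0.246, -0.175, 0.387],
        "Combined 1": [1.519, -0.070, -0.246, -0.211, 0.352],
        "Combined 2": [1.729, -0.387, -0.106, -0.281, 0.211],
    },
    "14, 23": {
        GRID: [3.620, -0.315, 0.385, 0.528, -0.773],
        "Iterative grid search": [3.409, -0.386, 0.349, -0.350, -0.387],
        "Shift": [3.586, -0.316, 0.384, -0.386, -0.492],
        "Shuffle": [3.409, -0.386, 0.349, -0.350, -0.387],
        "Finer": [3.516, -0.352, 0.384, -0.350, -0.387],
        "Combined 1": [3.586, -0.422, 0.419, -0.385, -0.598],
        "Combined 2": [2.919, -0.387, 0.385, -0.281, -0.457],
    },
}


def best_iterative(block):
    """Best change over all iterative-family settings, per classifier column."""
    n_cols = len(block[GRID])
    return [max(block[row][c] for row in ITERATIVE_ROWS) for c in range(n_cols)]


def tally(table):
    """Count cases where the best iterative setting beats the base and the 2D grid."""
    improved = beats_grid = total = 0
    for block in table.values():
        for c, best in enumerate(best_iterative(block)):
            total += 1
            improved += int(best > 0.0)
            beats_grid += int(best > block[GRID][c])
    return improved, beats_grid, total


bests = {pair: best_iterative(block) for pair, block in TABLE.items()}
largest_gain = max(max(values) for values in bests.values())
largest_loss = min(min(values) for values in bests.values())
print(tally(TABLE), largest_gain, largest_loss)  # prints (8, 10, 15) 9.235 -0.387
```

<p>On these 15 cases the sketch finds 8 improvements and 10 strict wins over the 2D grid search. The largest gain (9.235%) and the largest loss of the best setting (&#x02212;0.387%) match the values stated in the text. Reproducing the totals out of 20 would also require the rows for features [2, 6].</p>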
</sec>
</sec>
<sec sec-type="discussion" id="s6">
<title>Discussion</title>
<p>Using grid exploration, we showed that the Box-Cox transformation can consistently improve accuracy. Bicego and Baldo (<xref ref-type="bibr" rid="B1">2016</xref>) observed the same behavior, and, consistent with their results, we found that the optimal transformation depends on the classifier. Furthermore, we observed that <italic>full</italic> optimization leads to larger improvements. We therefore introduced a suitable optimization procedure and extended it to handle several problems that arise during optimization. On two real-world datasets, we demonstrated that the proposed procedure improves both accuracy and F1-score. Moreover, the proposed optimization outperformed grid search and <italic>diagonal, spherical</italic>, and <italic>MLE</italic> optimization in the majority of cases. We suspect that the iterative procedure introduces some implicit regularization: a grid search is likely to overfit the training data, whereas the iterative method may not find a global optimum on the training set and hence suffers less from overfitting. The proposed optimization also scales linearly with the number of features, rather than exponentially as a full grid search does, which makes finer grids feasible. The real-world experiments showed that the best hyperparameter setting depends on both the data and the classifier. Restarting the optimization from multiple starting points and refining the grid influenced the results, whereas shuffling the optimization order had no meaningful impact.</p>
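<p>The iterative, one-feature-at-a-time idea can be illustrated with a minimal, hypothetical Python sketch. It uses the standard library only and is not the authors' implementation, which is available through the repository given in the Data Availability Statement. The sketch makes three assumptions: the features are strictly positive, the classifier is a nearest-centroid model, and a single train/validation split is used. All function names and the synthetic data are illustrative. Each sweep runs a 1-dimensional grid search over &#x003BB; for one feature while the other &#x003BB; values stay fixed, and a change is kept only if the validation accuracy strictly improves. The search starts from &#x003BB; = 1, which for the Box-Cox transformation only shifts a feature by &#x02212;1 and leaves nearest-centroid distances unchanged. The returned validation accuracy therefore never falls below the untransformed baseline.</p>

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x."""
    if lam == 0.0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def transform(rows, lambdas):
    """Apply a separate Box-Cox parameter to each feature column."""
    return [[box_cox(v, lam) for v, lam in zip(row, lambdas)] for row in rows]


def nearest_centroid_accuracy(train, y_train, test, y_test):
    """Fit one centroid per class on train and return the accuracy on test."""
    centroids = {}
    for label in set(y_train):
        members = [row for row, y in zip(train, y_train) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    correct = 0
    for row, y in zip(test, y_test):
        pred = min(centroids, key=lambda k: sum((a - b) ** 2 for a, b in zip(row, centroids[k])))
        correct += int(pred == y)
    return correct / len(y_test)


def iterative_grid_search(train, y_train, val, y_val, grid, sweeps=3):
    """Coordinate-wise search: a 1-D grid search over one feature's lambda at a time."""
    def score(lams):
        return nearest_centroid_accuracy(transform(train, lams), y_train,
                                         transform(val, lams), y_val)

    lambdas = [1.0] * len(train[0])  # lambda = 1 only shifts each feature by -1
    baseline = best = score(lambdas)
    for _ in range(sweeps):
        changed = False
        for j in range(len(lambdas)):
            for lam in grid:
                candidate = lambdas[:j] + [lam] + lambdas[j + 1:]
                s = score(candidate)
                if s > best:  # keep a change only if validation accuracy strictly improves
                    best, lambdas, changed = s, candidate, True
        if not changed:  # stop early once a full sweep brings no improvement
            break
    return lambdas, baseline, best


def make_data(n, seed):
    """Hypothetical two-class data with skewed, strictly positive features."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        rows.append([rng.lognormvariate(0.5 * y, 0.8), rng.lognormvariate(0.0, 1.0 + 0.5 * y)])
        labels.append(y)
    return rows, labels


train, y_train = make_data(200, seed=1)
val, y_val = make_data(200, seed=2)
grid = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
lambdas, base, best = iterative_grid_search(train, y_train, val, y_val, grid)
print(lambdas, base, best)
```

<p>With <italic>d</italic> features and <italic>g</italic> candidate values, a full grid needs <italic>g</italic><sup><italic>d</italic></sup> evaluations, whereas each sweep of this sketch needs only <italic>d</italic> &#x000D7; <italic>g</italic>. Because only strict improvements are accepted, the search can stop at a coordinate-wise optimum rather than a global one.</p>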
<p>The Box-Cox transformation is data-dependent: the optimal choice of &#x003BB; varies between datasets, so we recommend selecting it with an appropriate optimization method. We have shown that this optimization should take the classifier into account. However, classifier-independent methods such as MLE can also perform well. In practice, we therefore recommend evaluating several optimization procedures and comparing their results to obtain the best Box-Cox transformation.</p>
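<p>For comparison, the classifier-independent MLE criterion chooses &#x003BB; by maximizing the Box-Cox profile log-likelihood (Box and Cox, <xref ref-type="bibr" rid="B2">1964</xref>). Up to an additive constant, this is &#x02113;(&#x003BB;) = &#x02212;(<italic>n</italic>/2) log <italic>s</italic><sup>2</sup>(&#x003BB;) + (&#x003BB; &#x02212; 1) &#x02211;<sub><italic>i</italic></sub> log <italic>x</italic><sub><italic>i</italic></sub>, where <italic>s</italic><sup>2</sup>(&#x003BB;) is the maximum-likelihood (divide-by-<italic>n</italic>) variance of the transformed values. The standard-library sketch below assumes a single positive feature and a grid over &#x003BB;. SciPy's scipy.stats.boxcox maximizes the same log-likelihood numerically when no &#x003BB; is given. Whether the MLE baseline in this study was computed exactly this way is an assumption. On log-normal data, the estimate should land near &#x003BB; = 0, the log transform. This illustrates how the preferred &#x003BB; is a property of the data.</p>

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x (Box and Cox, 1964)."""
    if lam == 0.0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def box_cox_llf(lam, data, sum_log):
    """Profile log-likelihood of lam, up to an additive constant."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # ML variance estimate (divide by n)
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum_log


def mle_lambda(data, grid):
    """Return the grid value of lam that maximizes the profile log-likelihood."""
    sum_log = sum(math.log(x) for x in data)  # Jacobian term, independent of lam
    return max(grid, key=lambda lam: box_cox_llf(lam, data, sum_log))


rng = random.Random(0)
# Log-normal data: the log transform (lam = 0) makes it exactly normal.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]
grid = [i / 50 for i in range(-100, 101)]  # -2.0 to 2.0 in steps of 0.02
lam_hat = mle_lambda(sample, grid)
print(lam_hat)
```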
</sec>
<sec sec-type="conclusions" id="s7">
<title>Conclusion</title>
<p>We examined the impact of the Box-Cox transformation on classification tasks. We extended the optimization of the transformation parameters to a full, dataset-dependent problem and showed that this generalization improved performance. We proposed an optimization procedure, tested it successfully, and achieved accuracy improvements of up to 12%.</p>
<p>Future work should apply the method to a wider range of datasets to test the optimization more extensively. The influence of the hyperparameters should also be analyzed. Furthermore, the optimization could be improved by, for instance, replacing the 1-dimensional grid search with another 1-dimensional optimization method. Although the Box-Cox transformation has been shown to increase the accuracy of a base classifier, it remains unclear whether it can push a classifier beyond state-of-the-art performance. Finally, because the framework is built around the general train-predict interface that is common in machine learning, our method could also be applied to other tasks such as regression.</p>
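<p>Golden-section search is one possible replacement for the 1-dimensional grid search. It shrinks the search interval for &#x003BB; by a constant factor of about 0.618 per function evaluation instead of evaluating every grid point. The sketch below assumes that the 1-dimensional objective is unimodal on the interval. This is an assumption, not a guarantee: a validation accuracy viewed as a function of &#x003BB; is piecewise constant and need not be unimodal. The quadratic objective is a hypothetical stand-in for the per-feature score, and the function names are illustrative.</p>

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618


def golden_section_max(f, lo, hi, tol=1e-4):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


calls = []


def objective(lam):
    """Hypothetical stand-in for a per-feature score with its peak at lam = 0.3."""
    calls.append(lam)
    return -(lam - 0.3) ** 2


lam_best = golden_section_max(objective, -2.0, 2.0)
print(lam_best, len(calls))
```

<p>With a tolerance of 10<sup>&#x02212;4</sup> on [&#x02212;2, 2], the search needs fewer than 30 evaluations. A uniform grid at the same resolution would need about 40,000.</p>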
</sec>
<sec sec-type="data-availability" id="s8">
<title>Data Availability Statement</title>
<p>The developed code and data are publicly available via the following link: <ext-link ext-link-type="uri" xlink:href="http://github.com/Luca-Blum/Box-Cox-for-machine-learning">github.com/Luca-Blum/Box-Cox-for-machine-learning</ext-link>.</p>
</sec>
<sec id="s9">
<title>Author Contributions</title>
<p>ME designed and led the study. LB, ME, and CM conceived the study. All authors approved the final manuscript.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec sec-type="supplementary-material" id="s11">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2022.877569/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2022.877569/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/></sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bicego</surname> <given-names>M.</given-names></name> <name><surname>Baldo</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Properties of the Box&#x02013;Cox transformation for pattern classification</article-title>. <source>Neurocomputing</source> <volume>218</volume>, <fpage>390</fpage>&#x02013;<lpage>400</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2016.08.081</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Box</surname> <given-names>G. E. P.</given-names></name> <name><surname>Cox</surname> <given-names>D. R.</given-names></name></person-group> (<year>1964</year>). <article-title>An analysis of transformations</article-title>. <source>J. R. Stat. Soc. B</source> <volume>26</volume>, <fpage>211</fpage>&#x02013;<lpage>252</lpage>.</citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carroll</surname> <given-names>R. J.</given-names></name> <name><surname>Ruppert</surname> <given-names>D.</given-names></name></person-group> (<year>1985</year>). <article-title>Transformations in regression: a robust analysis</article-title>. <source>Technometrics</source> <volume>27</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheddad</surname> <given-names>A</given-names></name></person-group>. (<year>2020</year>). <article-title>On Box-Cox transformation for image normality and pattern classification</article-title>. <source>IEEE Access</source> <volume>8</volume>, <fpage>154975</fpage>&#x02013;<lpage>154983</lpage>. <pub-id pub-id-type="doi">10.1109/access.2020.3018874</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>T.</given-names></name> <name><surname>Yang</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Finding the best Box-Cox transformation in big data with meta-model learning: a case study on QCT developer cloud,&#x0201D;</article-title> in <source>2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud)</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>31</fpage>&#x02013;<lpage>34</lpage>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>C.</given-names></name> <name><surname>Storer</surname> <given-names>B. E.</given-names></name> <name><surname>Jeong</surname> <given-names>M.</given-names></name></person-group> (<year>1996</year>). <article-title>Note on Box-Cox transformation diagnostics</article-title>. <source>Technometrics</source> <volume>38</volume>, <fpage>178</fpage>&#x02013;<lpage>180</lpage>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lawrance</surname> <given-names>A. J</given-names></name></person-group>. (<year>1988</year>). <article-title>Regression transformation diagnostics using local influence</article-title>. <source>J. Am. Stat. Assoc.</source> <volume>83</volume>, <fpage>1067</fpage>&#x02013;<lpage>1072</lpage>.<pub-id pub-id-type="pmid">34194585</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liang</surname> <given-names>Y.</given-names></name> <name><surname>Hussain</surname> <given-names>A.</given-names></name> <name><surname>Abbott</surname> <given-names>D.</given-names></name> <name><surname>Menon</surname> <given-names>C.</given-names></name> <name><surname>Ward</surname> <given-names>R.</given-names></name> <name><surname>Elgendi</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Impact of data transformation: an ECG heartbeat classification approach</article-title>. <source>Front. Digital Health</source> <volume>2</volume>, <fpage>53</fpage>. <pub-id pub-id-type="doi">10.3389/fdgth.2020.610956</pub-id><pub-id pub-id-type="pmid">34713072</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pedregosa</surname> <given-names>F.</given-names></name> <name><surname>Varoquaux</surname> <given-names>G.</given-names></name> <name><surname>Gramfort</surname> <given-names>A.</given-names></name> <name><surname>Michel</surname> <given-names>V.</given-names></name> <name><surname>Thirion</surname> <given-names>B.</given-names></name> <name><surname>Grisel</surname> <given-names>O.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Scikit-learn: machine learning in Python</article-title>. <source>J. Mach. Learn. Res.</source> <volume>12</volume>, <fpage>2825</fpage>&#x02013;<lpage>2830</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sweeting</surname> <given-names>T. J</given-names></name></person-group>. (<year>1984</year>). <article-title>On the choice of prior distribution for the Box-Cox transformed linear model</article-title>. <source>Biometrika</source> <volume>71</volume>, <fpage>127</fpage>&#x02013;<lpage>134</lpage>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>V&#x000E9;lez</surname> <given-names>J. I.</given-names></name> <name><surname>Correa</surname> <given-names>J. C.</given-names></name> <name><surname>Marmolejo-Ramos</surname> <given-names>F.</given-names></name></person-group> (<year>2015</year>). <article-title>A new approach to the box&#x02013;cox transformation</article-title>. <source>Front. Appl. Math. Stat.</source> <volume>1</volume>, <fpage>12</fpage>. <pub-id pub-id-type="doi">10.3389/fams.2015.00012</pub-id></citation>
</ref>
</ref-list>
</back>
</article>