<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2023.1131493</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Technology and Code</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>IP4GS: Bringing genomic selection analysis to breeders</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Li</surname><given-names>Tong</given-names>
</name>
<xref ref-type="author-notes" rid="fn003"><sup>&#x2020;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2171842"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jiang</surname><given-names>Shan</given-names>
</name>
<xref ref-type="author-notes" rid="fn003"><sup>&#x2020;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2152430"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fu</surname><given-names>Ran</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/2152424"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname><given-names>Xiangfeng</given-names>
</name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Cheng</surname><given-names>Qian</given-names>
</name>
<xref ref-type="author-notes" rid="fn001"><sup>*</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2146013"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Jiang</surname><given-names>Shuqin</given-names>
</name>
<xref ref-type="author-notes" rid="fn001"><sup>*</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><institution>Frontiers Science Center for Molecular Design Breeding, College of Agriculture and Biotechnology, China Agricultural University</institution>, <addr-line>Beijing</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Feng Cheng, Institute of Vegetables and Flowers (CAAS), China</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Darko Jevremovi&#x107;, Fruit Research Institute, Serbia; Sean Robert Asselin, Agriculture and Agri-Food Canada (AAFC), Canada; Wentao Zhang, National Research Council Canada, Canada</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Qian Cheng, <email xlink:href="mailto:qchengray@cau.edu.cn">qchengray@cau.edu.cn</email>;  Shuqin Jiang, <email xlink:href="mailto:wanshi0066@126.com">wanshi0066@126.com</email>
</p>
</fn>
<fn fn-type="equal" id="fn003">
<p>&#x2020;These authors have contributed equally to this work</p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Plant Bioinformatics, a section of the journal Frontiers in Plant Science</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>03</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1131493</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>02</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Li, Jiang, Fu, Wang, Cheng and Jiang</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Li, Jiang, Fu, Wang, Cheng and Jiang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Genomic selection (GS), a strategy that uses genotypes to predict phenotypes <italic>via</italic> statistical or machine learning models, has become routine practice in plant breeding programs. GS can accelerate genetic gain by reducing phenotyping costs and/or shortening breeding cycles. GS analysis is complicated, involving data cleanup and formatting, training and test population analysis, model selection and evaluation, and parameter optimization. It also requires programming skills and knowledge of statistical modeling. Breeders therefore need more practical GS tools. To alleviate this difficulty, we developed the web-based platform IP4GS (<ext-link ext-link-type="uri" xlink:href="https://ngdc.cncb.ac.cn/ip4gs/">https://ngdc.cncb.ac.cn/ip4gs/</ext-link>), which offers a user-friendly interface for performing GS analysis simply through point-and-click actions. IP4GS currently includes seven commonly used models, eleven evaluation metrics, and visualization modules, making GS analysis accessible to plant breeders with limited bioinformatics knowledge.</p>
</abstract>
<kwd-group>
<kwd>bioinformatics</kwd>
<kwd>genomic selection</kwd>
<kwd>genotype-to-phenotype prediction</kwd>
<kwd>web-based platform</kwd>
<kwd>R shiny</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="2"/>
<equation-count count="0"/>
<ref-count count="32"/>
<page-count count="9"/>
<word-count count="4238"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<title>Introduction</title>
<p>Polygenic traits of plants, such as grain yield (GY), flowering time (FT), and plant height (PH), are usually controlled by many genes with minor effects. For such traits, traditional marker-assisted selection (MAS), which relies on sufficient statistical power to detect marker/gene-trait associations, cannot be applied effectively to expedite trait improvement (<xref ref-type="bibr" rid="B7">Collard and Mackill, 2008</xref>; <xref ref-type="bibr" rid="B29">Xu and Crouch, 2008</xref>). Genomic selection (GS), which predicts phenotypes from genome-wide molecular markers, may serve as a complementary approach for improving polygenic traits (<xref ref-type="bibr" rid="B10">Desta and Ortiz, 2014</xref>; <xref ref-type="bibr" rid="B8">Crossa et&#xa0;al., 2017</xref>). Because GS models use markers across the whole genome, they can capture as much as possible of the phenotypic variation contributed by many minor-effect genes (<xref ref-type="bibr" rid="B5">Cerrudo et&#xa0;al., 2018</xref>). GS analysis proceeds in two steps: first, a GS model is built from the genotypic and phenotypic data of a training population; the model is then used to predict the phenotypes of a candidate population from their genotypic data alone (<xref ref-type="bibr" rid="B20">Meuwissen et&#xa0;al., 2001</xref>; <xref ref-type="bibr" rid="B10">Desta and Ortiz, 2014</xref>). Advanced next-generation sequencing technologies, e.g., genotyping by targeted sequencing (GBTS), have significantly reduced the cost of genotyping (<xref ref-type="bibr" rid="B14">Guo et&#xa0;al., 2019</xref>). Because genotyping cost is no longer a bottleneck, GS can now be applied widely to more and more plant species (<xref ref-type="bibr" rid="B2">Bauck et&#xa0;al., 2018</xref>; <xref ref-type="bibr" rid="B3">Belamkar et&#xa0;al., 2018</xref>; <xref ref-type="bibr" rid="B1">Atanda et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B27">Wang et&#xa0;al., 2021</xref>).</p>
<p>The most popular GS models are linear (or parametric) regression models. These include the best linear unbiased prediction (BLUP) methods, represented by RRBLUP (ridge regression BLUP), and the Bayesian methods, represented by BayesA and BayesB (<xref ref-type="bibr" rid="B20">Meuwissen et&#xa0;al., 2001</xref>; <xref ref-type="bibr" rid="B11">Endelman, 2011</xref>). In recent years, machine learning (ML) methods such as support vector machine (SVM), random forest (RF), deep learning, and light gradient boosting (LGB) algorithms have been introduced to build GS models (<xref ref-type="bibr" rid="B4">Blondel et&#xa0;al., 2015</xref>; <xref ref-type="bibr" rid="B23">Qiu et&#xa0;al., 2016</xref>; <xref ref-type="bibr" rid="B18">Ma et&#xa0;al., 2018</xref>; <xref ref-type="bibr" rid="B31">Yan et&#xa0;al., 2021</xref>). ML methods have been proposed as better able than linear models to capture nonlinear relationships between genotypes and phenotypes. However, ML methods may require relatively large datasets to perform well, and their advantage over linear models appears mainly on more complex datasets (<xref ref-type="bibr" rid="B13">Gonz&#xe1;lez-Recio and Forni, 2011</xref>; <xref ref-type="bibr" rid="B31">Yan et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B30">Yan and Wang, 2022</xref>). In current plant breeding practice, training datasets are usually small and relatively simple, mainly because phenotyping remains costly. In this situation, BLUP and Bayesian methods outperform most ML methods and have therefore gained popularity in plant breeding (<xref ref-type="bibr" rid="B26">Tanaka and Iwata, 2018</xref>; <xref ref-type="bibr" rid="B31">Yan et&#xa0;al., 2021</xref>).</p>
<p>Among all available GS methods, RRBLUP, a &#x201c;simple&#x201d; model that assumes all marker effects share the same variance, is the most widely used in GS applications. It has therefore become the baseline against which the prediction performance of other GS models is evaluated. However, the assumption of equal variance of marker effects may not be realistic. Bayesian methods, which assume prior distributions for the variances of marker effects, have become another popular GS approach because they can potentially estimate marker effects more accurately. Currently, most GS tools are command-line programs that require advanced data management skills, good programming skills, and knowledge of statistical modeling, which restricts their practical use by breeders in the seed industry. There is therefore an urgent need for a web-based platform with a user-friendly interface to facilitate the use of GS strategies in the plant breeding community. To meet this need, we developed IP4GS, a web-based interactive platform for genomic selection. IP4GS includes seven GS models and eleven evaluation metrics to help users select the optimal model for their GS analysis. It also includes bioinformatics pipelines for preprocessing genotypic data and visualization modules for population analysis. All modules can be invoked simply by point-and-click actions through a user-friendly interface developed with Shiny.</p>
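<p>The equal-variance assumption of RRBLUP can be made explicit with the standard ridge-regression BLUP formulation (<xref ref-type="bibr" rid="B11">Endelman, 2011</xref>), shown here with an overall mean as the only fixed effect for simplicity. In this sketch, <bold>y</bold> is the vector of <italic>n</italic> phenotypes, <bold>Z</bold> is the <italic>n</italic> &#xd7; <italic>m</italic> marker matrix, <bold>u</bold> holds the <italic>m</italic> marker effects, and &#x3c3;<sub><italic>u</italic></sub><sup>2</sup> and &#x3c3;<sub><italic>e</italic></sub><sup>2</sup> are the marker-effect and residual variances.</p>
<preformat preformat-type="code">
```latex
\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
\quad \mathbf{u} \sim N(\mathbf{0}, \mathbf{I}\sigma_u^2),
\quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)

\hat{\mathbf{u}} = \left(\mathbf{Z}^{\top}\mathbf{Z} + \lambda\mathbf{I}\right)^{-1}
  \mathbf{Z}^{\top}\left(\mathbf{y} - \mathbf{1}\hat{\mu}\right),
\quad \lambda = \sigma_e^2 / \sigma_u^2
```
</preformat>
<p>Because every marker shares the single variance &#x3c3;<sub><italic>u</italic></sub><sup>2</sup>, one penalty &#x3bb; shrinks all marker effects by the same rule. Bayesian models such as BayesA instead give each marker its own variance, so markers with large effects can be shrunk less.</p>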
</sec>
<sec id="s2">
<title>Methods</title>
<sec id="s2_1">
<title>Demo datasets for developing IP4GS</title>
<p>A public dataset of 1,404 maize F<sub>1</sub> hybrids, generated by crossing 1,404 inbred lines with the elite tester line Zheng58, was used to develop the IP4GS web platform (<xref ref-type="bibr" rid="B17">Liu et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B28">Xiao et&#xa0;al., 2021</xref>). The genotypic data of the 1,404 F<sub>1</sub> hybrids comprise 4,903 SNPs selected from a total of 14.8 million SNPs; the SNP selection procedure has been described previously (<xref ref-type="bibr" rid="B6">Cheng et&#xa0;al., 2021</xref>). The phenotypes are flowering time (FT), plant height (PH), and grain yield (GY) (<xref ref-type="bibr" rid="B28">Xiao et&#xa0;al., 2021</xref>). The demo genotype and phenotype datasets are available at <ext-link ext-link-type="uri" xlink:href="https://github.com/furan2019/IP4GSdata">https://github.com/furan2019/IP4GSdata</ext-link>.</p>
</sec>
<sec id="s2_2">
<title>Implementation of a web-based IP4GS platform</title>
<p>Shiny is an R-based framework that allows programmers to develop interactive web applications (<ext-link ext-link-type="uri" xlink:href="https://www.rstudio.com/products/shiny/">https://www.rstudio.com/products/shiny/</ext-link>). Owing to its extensibility and usability, Shiny has been widely used to build online applications for bioinformatics software and interactive plotting tools (<xref ref-type="bibr" rid="B19">McMurdie and Holmes, 2015</xref>; <xref ref-type="bibr" rid="B32">Yu et&#xa0;al., 2018</xref>; <xref ref-type="bibr" rid="B25">Sievert, 2020</xref>). A Shiny application can run on a local host or be deployed on a public server for open access. It usually consists of two parts: a user-interface (UI) script (IP4GS_UI.r for IP4GS) and a server script (IP4GS_server.r for IP4GS). The UI script controls the layout of the panels and the display of results, and bridges user inputs and back-end functions. IP4GS uses several packages to enrich the interactive UI experience: &#x201c;DT&#x201d; for dynamic tables, &#x201c;plotly&#x201d; for dynamic plots, &#x201c;shinycssloaders&#x201d; for loading animations, &#x201c;shinybusy&#x201d; for progress notifications, and &#x201c;shinyWidgets&#x201d; for inputting various types of parameters and for dynamic controls. HTML5 and conditional panels were also used to optimize the panel layout. The server script does the core work of a Shiny application: all IP4GS functions, including data input, data preprocessing, and GS model building and evaluation, are implemented in it. For real-time interaction, user-defined parameters and operations are passed to the server script, which executes the corresponding functions and formats the outputs. Finally, the server script returns the results to specific locations in the UI, identified by flags (output IDs) shared between the UI and server scripts.</p>
</sec>
<sec id="s2_3">
<title>GS models and evaluation metrics in the GS analysis panel</title>
<p>IP4GS integrates five linear models and two ML methods for GS, all of which are called from R packages (<xref ref-type="table" rid="T1"><bold>Table&#xa0;1</bold></xref>). The RRBLUP method is called from the package &#x201c;rrBLUP&#x201d;. The RRBLUP model assumes that all markers have small, nonzero effects with equal variance (<xref ref-type="bibr" rid="B11">Endelman, 2011</xref>). The three Bayesian methods, BayesA, BayesB, and BayesC, are called from the package &#x201c;BGLR&#x201d; (Bayesian generalized linear regression) (<xref ref-type="bibr" rid="B9">de los Campos and P&#xe9;rez, 2015</xref>). BayesA uses a scaled <italic>t</italic>-distribution to estimate marker effects. BayesB is similar to BayesA, the main difference being that it combines shrinkage with variable selection when estimating marker effects. By contrast, BayesC estimates marker effects based on a Gaussian distribution. The LASSO (least absolute shrinkage and selection operator) method, which likewise combines shrinkage and variable selection to estimate marker effects, is called from the package &#x201c;glmnet&#x201d; (Lasso and elastic-net regularized generalized linear models). The two ML methods, SVR (support vector regression) and RFR (random forest regression), are called from the &#x201c;e1071&#x201d; and &#x201c;randomForest&#x201d; packages, respectively. SVR fits a line (or a hyperplane in higher dimensions) that keeps most observations within a specified error margin, whereas RFR averages the predictions of many regression trees, each grown on a bootstrap sample of the observations (<xref ref-type="bibr" rid="B16">Liaw and Wiener, 2002</xref>; <xref ref-type="bibr" rid="B21">Meyer et&#xa0;al., 2014</xref>; <xref ref-type="bibr" rid="B22">Ornella et&#xa0;al., 2014</xref>).</p>
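<p>To make the contrast between uniform shrinkage (RRBLUP) and shrinkage with variable selection (LASSO) concrete, the NumPy sketch below trains both models on a training population and predicts a candidate population. It is an illustration on hypothetical simulated genotypes, not the rrBLUP or glmnet code that IP4GS calls: ridge regression with a single penalty stands in for RRBLUP (the two give the same marker-effect estimates when the penalty equals the ratio of residual to marker-effect variance), and LASSO is solved by plain cyclic coordinate descent. The helper names (fit_ridge, fit_lasso, predict), sample sizes, seed, and penalty values are assumptions chosen for illustration.</p>
<preformat preformat-type="code">
```python
import numpy as np

# Hypothetical simulated data (not the IP4GS demo set): 200 individuals x 300 SNPs
# coded 0/1/2, many small additive effects, residual variance 1.
rng = np.random.default_rng(42)
n, m = 200, 300
geno = rng.integers(0, 3, size=(n, m)).astype(float)
effects = rng.normal(0.0, 0.1, size=m)            # marker-effect variance 0.01
pheno = geno @ effects + rng.normal(0.0, 1.0, size=n)


def fit_ridge(Z, y, lam):
    """Ridge regression: one penalty lam shrinks every marker effect equally,
    mirroring RRBLUP's common-variance assumption. Intercept is unpenalized."""
    centers = Z.mean(axis=0)
    Zc = Z - centers
    mu = y.mean()
    beta = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ (y - mu))
    return mu, centers, beta


def fit_lasso(Z, y, lam, n_iter=50):
    """LASSO by cyclic coordinate descent on (1/2n)||y - mu - Zc b||^2 + lam*||b||_1.
    Soft-thresholding sets weak marker effects exactly to zero (variable selection)."""
    centers = Z.mean(axis=0)
    Zc = Z - centers
    mu = y.mean()
    n_obs, n_mark = Zc.shape
    beta = np.zeros(n_mark)
    resid = y - mu                                 # residuals while all effects are 0
    col_ss = (Zc ** 2).sum(axis=0) / n_obs
    for _ in range(n_iter):
        for j in range(n_mark):
            if col_ss[j] == 0.0:                   # monomorphic marker: nothing to fit
                continue
            resid += Zc[:, j] * beta[j]            # partial residual without marker j
            rho = Zc[:, j] @ resid / n_obs
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            resid -= Zc[:, j] * beta[j]
    return mu, centers, beta


def predict(model, Z_new):
    """Predict phenotypes of new individuals from their genotypes alone."""
    mu, centers, beta = model
    return mu + (Z_new - centers) @ beta


# Train on the first 150 individuals, then predict the remaining 50 candidates.
train, cand = slice(0, 150), slice(150, n)
models = {
    "ridge": fit_ridge(geno[train], pheno[train], lam=1.0 / 0.01),  # sigma_e^2 / sigma_u^2
    "lasso": fit_lasso(geno[train], pheno[train], lam=0.1),          # illustrative value
}
results = {}
for name, model in models.items():
    pred = predict(model, geno[cand])
    r = float(np.corrcoef(pheno[cand], pred)[0, 1])
    nonzero = int(np.count_nonzero(model[2]))
    results[name] = (r, nonzero)
    print(f"{name}: r = {r:.2f}, non-zero effects = {nonzero}/{m}")
```
</preformat>
<p>Every ridge coefficient stays non-zero, whereas LASSO keeps only a subset of markers. Which behaviour predicts better depends on the genetic architecture of the trait, which is why comparing several models on the same data is useful.</p>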
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>Seven GS methods integrated into IP4GS.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Model</th>
<th valign="top" align="center">Key parameter(s)</th>
<th valign="top" align="center">Package</th>
<th valign="top" align="center">Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">BayesA</td>
<td valign="top" rowspan="3" align="center">nIter<break/>burnIn<break/>thin</td>
<td valign="top" rowspan="3" align="center">BGLR</td>
<td valign="top" rowspan="3" align="left">(<xref ref-type="bibr" rid="B9">de los Campos and P&#xe9;rez, 2015</xref>)</td>
</tr>
<tr>
<td valign="top" align="left">BayesB</td>
</tr>
<tr>
<td valign="top" align="left">BayesC</td>
</tr>
<tr>
<td valign="top" align="left">Least absolute shrinkage and selection operator (LASSO)</td>
<td valign="top" align="center">alpha</td>
<td valign="top" align="center">glmnet</td>
<td valign="top" align="left">(<xref ref-type="bibr" rid="B12">Friedman et&#xa0;al., 2010</xref>)</td>
</tr>
<tr>
<td valign="top" align="left">Ridge regression best linear unbiased prediction (RRBLUP)</td>
<td valign="top" align="center">N/A</td>
<td valign="top" align="center">rrBLUP</td>
<td valign="top" align="left">(<xref ref-type="bibr" rid="B11">Endelman, 2011</xref>)</td>
</tr>
<tr>
<td valign="top" align="left">Support vector regression (SVR)</td>
<td valign="top" align="center">gamma; cost; kernel</td>
<td valign="top" align="center">e1071</td>
<td valign="top" align="left">(<xref ref-type="bibr" rid="B21">Meyer et&#xa0;al., 2014</xref>)</td>
</tr>
<tr>
<td valign="top" align="left">Random forest regression (RFR)</td>
<td valign="top" align="center">ntree; nodesize</td>
<td valign="top" align="center">randomForest</td>
<td valign="top" align="left">(<xref ref-type="bibr" rid="B16">Liaw and Wiener, 2002</xref>)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To enable a comprehensive evaluation of the prediction accuracy of the selected GS models, IP4GS integrates eleven evaluation metrics: five correlation-based metrics that globally measure the relationship between observed and predicted phenotypes, and six threshold-based metrics that assess how accurately the top-ranked individuals are predicted (<xref ref-type="table" rid="T2"><bold>Table&#xa0;2</bold></xref>). The correlation-based metrics are the Pearson correlation coefficient (PCC), Kendall rank correlation coefficient (KCC), Spearman rank correlation coefficient (SCC), coefficient of determination (<italic>R</italic><sup>2</sup>), and mean squared error (MSE). These metrics measure the global performance of a model; KCC, for example, treats all pairs of individuals equally. In breeding practice, however, more attention should be paid to extreme values, such as high yield and short flowering time (<xref ref-type="bibr" rid="B4">Blondel et&#xa0;al., 2015</xref>). Thus, six threshold-based metrics that focus on the top-<italic>k</italic> individuals with ideal phenotypic values were introduced: normalized discounted cumulative gain (NDCG), mean NDCG, relative efficiency (RE), Accuracy, <italic>F</italic>-score, and Cohen&#x2019;s kappa coefficient (Kappa) (<xref ref-type="bibr" rid="B22">Ornella et&#xa0;al., 2014</xref>; <xref ref-type="bibr" rid="B4">Blondel et&#xa0;al., 2015</xref>). The formulas for these metrics are given in <xref ref-type="table" rid="T2"><bold>Table&#xa0;2</bold></xref>, where <italic>X</italic> is an array of observed phenotypic values and <italic>Y</italic> is an array of predicted phenotypic values. For Accuracy, <italic>F</italic>-score, and Kappa, positive samples are those with ideal phenotypic values.</p>
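<p>The correlation-based metrics and the two NDCG-based metrics of Table&#xa0;2 can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the IP4GS source code: Kendall's coefficient is computed in its tau-a form (tied pairs count as neither concordant nor discordant), tied values share their average rank for Spearman's coefficient, NDCG uses the common discount d(<italic>i</italic>) = 1/log<sub>2</sub>(<italic>i</italic> + 1), and larger, non-negative phenotypic values are assumed to be more desirable. The six observed and predicted values are hypothetical.</p>
<preformat preformat-type="code">
```python
import math
from collections import Counter


def pcc(x, y):
    """Pearson correlation coefficient between observed x and predicted y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kcc(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n(n-1)/2); ties count as neither."""
    n, n1, n2 = len(x), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                n1 += 1            # same sign: concordant pair
            elif s != 0:
                n2 += 1            # opposite sign: discordant pair
    return (n1 - n2) / (n * (n - 1) / 2)


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    s = sorted(v)
    first = {}
    for idx, val in enumerate(s):
        first.setdefault(val, idx)
    counts = Counter(v)
    return [first[val] + (counts[val] + 1) / 2 for val in v]


def scc(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pcc(ranks(x), ranks(y))


def r2(x, y):
    """Coefficient of determination of predictions y for observations x."""
    mx = sum(x) / len(x)
    sse = sum((a - b) ** 2 for a, b in zip(x, y))
    sst = sum((a - mx) ** 2 for a in x)
    return 1 - sse / sst


def mse(x, y):
    """Mean squared error over all n samples."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def ndcg(k, x, y):
    """NDCG@k: observed values taken in predicted order, relative to the ideal order."""
    by_pred = sorted(range(len(x)), key=lambda i: y[i], reverse=True)
    ideal = sorted(x, reverse=True)
    dcg = sum(x[by_pred[i - 1]] / math.log2(i + 1) for i in range(1, k + 1))
    idcg = sum(ideal[i - 1] / math.log2(i + 1) for i in range(1, k + 1))
    return dcg / idcg


def mean_ndcg(K, x, y):
    """Average of NDCG@k for k = 1..K."""
    return sum(ndcg(k, x, y) for k in range(1, K + 1)) / K


# Hypothetical observed (X) and predicted (Y) phenotypes for six individuals.
obs = [5.1, 3.2, 4.8, 2.0, 6.3, 3.9]
pred = [4.9, 3.0, 5.2, 2.5, 5.8, 4.1]
for name, fn in [("PCC", pcc), ("KCC", kcc), ("SCC", scc), ("R2", r2), ("MSE", mse)]:
    print(f"{name}: {fn(obs, pred):.3f}")
print(f"NDCG@2: {ndcg(2, obs, pred):.3f}, meanNDCG@3: {mean_ndcg(3, obs, pred):.3f}")
```
</preformat>
<p>For a trait where smaller values are preferable, such as flowering time, the values would need to be transformed so that larger means better (keeping the gains non-negative) before NDCG is computed.</p>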
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>Eleven metrics for model evaluation integrated into IP4GS.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Evaluation Metric</th>
<th valign="top" align="center">Formula</th>
<th valign="top" align="center">Remarks</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Pearson correlation coefficient (PCC, <italic>r</italic>, <italic>R</italic>)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im1">
<mml:mrow>
<mml:mtext>PCC</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msqrt>
<mml:msqrt>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im2">
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:math>
</inline-formula> is the mean of <italic>X</italic>, <inline-formula>
<mml:math display="inline" id="im3">
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:math>
</inline-formula> is the mean of <italic>Y</italic>
</td>
</tr>
<tr>
<td valign="top" align="left">Kendall rank correlation coefficient (KCC, tau, &#x3c4;)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im4">
<mml:mrow>
<mml:mtext>KCC</mml:mtext>
<mml:mo>&#xa0;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
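<td valign="top" align="left"><italic>N</italic>1: number of concordant pairs, i.e., pairs (<italic>i</italic>, <italic>j</italic>) for which <italic>x</italic><sub><italic>i</italic></sub> &#x2212; <italic>x</italic><sub><italic>j</italic></sub> and <italic>y</italic><sub><italic>i</italic></sub> &#x2212; <italic>y</italic><sub><italic>j</italic></sub> have the same sign;<break/><italic>N</italic>2: number of discordant pairs (opposite signs);<break/><italic>n</italic> is the total number of samples</td>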
</tr>
<tr>
<td valign="top" align="left">Spearman rank correlation coefficient (SCC, rho, &#x3c1;)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im5">
<mml:mrow>
<mml:mtext>SCC</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext>cov</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left"><italic>R</italic>(<italic>X</italic>) and <italic>R</italic>(<italic>Y</italic>) are the ranks of <italic>X</italic> and <italic>Y</italic>; &#x3c3; denotes the standard deviation of the ranks</td>
</tr>
<tr>
<td valign="top" align="left">Coefficient of determination, <italic>R</italic> squared (<italic>R</italic><sup>2</sup>, <italic>r</italic><sup>2</sup>)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im6">
<mml:mrow>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#xa0;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im7">
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:math>
</inline-formula> is the mean of <italic>X</italic>
</td>
</tr>
<tr>
<td valign="top" align="left">Mean squared error (MSE)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im8">
<mml:mrow>
<mml:mtext>MSE</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" align="left"><italic>n</italic> is the total number of samples</td>
</tr>
<tr>
<td valign="top" align="left">Normalized discounted cumulative gain (NDCG)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im9">
<mml:mrow>
<mml:mtext>NDCG</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" rowspan="3" align="left">
<inline-formula>
<mml:math display="inline" id="im10">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula> is a discount function; <italic>x</italic>(<italic>i</italic>, <italic>Y</italic>) is the <italic>i</italic>th value of <italic>X</italic> when individuals are ranked by <italic>Y</italic>; <italic>x</italic>(<italic>i</italic>, <italic>X</italic>) is the <italic>i</italic>th value of <italic>X</italic> when individuals are ranked by <italic>X</italic>;<break/><italic>k</italic> is the number of top-ranked individuals (those with the most desirable phenotypic values) considered; <inline-formula>
<mml:math display="inline" id="im11">
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:math>
</inline-formula> is the mean of <italic>X</italic>
</td>
</tr>
<tr>
<td valign="top" align="left">Mean normalized discounted cumulative gain (mean NDCG)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im12">
<mml:mrow>
<mml:mtext>meanNDCG</mml:mtext>
<mml:mo>@</mml:mo>
<mml:mi>K</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>K</mml:mi>
</mml:mfrac>
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>K</mml:mi>
</mml:munderover>
<mml:mtext>NDCG</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">Relative efficiency (RE)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im13">
<mml:mrow>
<mml:mtext>RE</mml:mtext>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>k</mml:mi>
</mml:mfrac>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>k</mml:mi>
</mml:mfrac>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">Accuracy</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im14">
<mml:mrow>
<mml:mtext>Accuracy</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mtext>TN</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td valign="top" rowspan="3" align="left">TP: true positive; TN: true negative; FP: false positive; FN: false negative; <italic>N</italic>: total number of individuals; in the subscripts, O and P denote observed and predicted classes, and p and n denote positive and negative classes;<break/>default <italic>&#x3b2;</italic> = 1; <inline-formula>
<mml:math display="inline" id="im15">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>; <italic>N<sub>Op</sub> = N</italic><sub>TP</sub> <italic>+ N</italic><sub>FN</sub>; <italic>N<sub>On</sub> = N</italic><sub>FP</sub> <italic>+ N</italic><sub>TN</sub>; <italic>N<sub>Pp</sub> = N</italic><sub>TP</sub> <italic>+ N</italic><sub>FP</sub>; <italic>N<sub>Pn</sub> = N</italic><sub>FN</sub> <italic>+ N</italic><sub>TN</sub>; <italic>P</italic>: precision; <italic>R</italic>: recall.</td>
</tr>
<tr>
<td valign="top" align="left"><italic>F</italic>-score</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im16">
<mml:mrow>
<mml:msub>
<mml:mi>F</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
<mml:mtext>-score</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td valign="top" align="left">Cohen&#x2019;s kappa coefficient (Kappa)</td>
<td valign="top" align="left">
<inline-formula>
<mml:math display="inline" id="im17">
<mml:mrow>
<mml:mtext>Kappa</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula>
<mml:math display="inline" id="im18">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mtext>TN</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula>
<mml:math display="inline" id="im19">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xd7;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xd7;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>
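<p>The ranking metrics in Table 2 (NDCG, mean NDCG, and RE) can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the IP4GS implementation (which is built with R); the function names are illustrative, and the sketch assumes that larger phenotypic values are more desirable, that observed values <italic>X</italic> are non-negative, and that the discount takes the standard form <italic>d</italic>(<italic>i</italic>) = 1/log<sub>2</sub>(<italic>i</italic> + 1). RE is undefined when <italic>k</italic> equals the population size, because its denominator is then zero.</p>
<preformat preformat-type="code">
```python
import math


def discount(i):
    """Discount d(i) = 1 / log2(i + 1) for rank i (ranks start at 1)."""
    return 1.0 / math.log2(i + 1)


def ranked_values(x, key):
    """Return the values of x ordered from the highest to the lowest key value.

    ranked_values(x, y)[i - 1] plays the role of x(i, Y) in Table 2,
    and ranked_values(x, x)[i - 1] the role of x(i, X).
    """
    order = sorted(range(len(x)), key=lambda j: key[j], reverse=True)
    return [x[j] for j in order]


def ndcg(k, x, y):
    """NDCG(k, X, Y): DCG of observed values ranked by predictions over the ideal DCG."""
    by_pred = ranked_values(x, y)
    ideal = ranked_values(x, x)
    dcg = sum(by_pred[i - 1] * discount(i) for i in range(1, k + 1))
    ideal_dcg = sum(ideal[i - 1] * discount(i) for i in range(1, k + 1))
    return dcg / ideal_dcg


def mean_ndcg(K, x, y):
    """meanNDCG@K(X, Y): average of NDCG(k, X, Y) over k = 1..K."""
    return sum(ndcg(k, x, y) for k in range(1, K + 1)) / K


def relative_efficiency(k, x, y):
    """RE(k): gain of the top k chosen by Y over the top k chosen by X, both relative to mean(X)."""
    x_bar = sum(x) / len(x)
    by_pred = ranked_values(x, y)
    ideal = ranked_values(x, x)
    return (sum(by_pred[:k]) / k - x_bar) / (sum(ideal[:k]) / k - x_bar)


# Hypothetical observed (X) and predicted (Y) phenotypes for five individuals.
observed = [5.2, 3.1, 4.4, 6.0, 2.7]
predicted = [5.0, 3.5, 4.0, 4.8, 3.0]
print(ndcg(3, observed, predicted), mean_ndcg(3, observed, predicted))
print(relative_efficiency(1, observed, predicted))
```
</preformat>
<p>Sorting indices by <italic>Y</italic> and reading off the corresponding values of <italic>X</italic> yields <italic>x</italic>(<italic>i</italic>, <italic>Y</italic>) directly; individuals with tied predictions keep their input order because Python&#x2019;s sort is stable.</p>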
</sec>
<sec id="s2_4">
<title>Bioinformatics pipelines in the data preprocessing panel</title>
<p>Currently, IP4GS accepts genotypes in two formats: AA/AB/BB alleles (allele format) and 0/1/2 codes (numeric format). When the data processing button is pressed, the data processing function computes allele frequencies, defines the major and minor alleles, and filters and formats the submitted genotypic data. Note that this function does not apply to genotypic data in numeric format; IP4GS assumes that numeric-format data have already been processed by users. For genotypic data in allele format, the function computes the allele frequency and missing rate of each SNP and defines its major and minor alleles. Markers are then filtered by minor allele frequency (MAF) and missing rate: by default, markers with MAF at or below 0.05 (MAF &#x2264; 0.05) or a missing rate at or above 0.2 (missing rate &#x2265; 0.2) are removed, and users can redefine both criteria. For format conversion, IP4GS uses the common 0, 1, 2 coding scheme based on the defined major and minor alleles: AA (homozygous genotype comprising major alleles), AB (heterozygous genotype), and BB (homozygous genotype comprising minor alleles) are coded as 0, 1, and 2, respectively. IP4GS provides two methods for imputing missing genotype values: &#x201c;Mean,&#x201d; which uses the mean value of each SNP, and &#x201c;Major,&#x201d; which uses the most frequent code of each SNP. In addition, IP4GS can reduce the dimensionality of genotypic data using three algorithms: the &#x201c;prcomp&#x201d; function in the &#x201c;stats&#x201d; package with default parameters (e.g., center = TRUE and scale. = FALSE) for principal component analysis (PCA), the &#x201c;umap&#x201d; package for uniform manifold approximation and projection (UMAP), and the &#x201c;tsne&#x201d; package for <italic>t</italic>-distributed stochastic neighbor embedding (<italic>t</italic>-SNE). In the current version of IP4GS, all three algorithms run with default parameters, except for the number of output dimensions, which users can define.</p>
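<p>The allele-format pipeline described above (MAF and missing-rate filtering, 0/1/2 coding, and &#x201c;Mean&#x201d; or &#x201c;Major&#x201d; imputation) can be illustrated with the following minimal Python sketch. It is not the IP4GS code: it assumes that each genotype call is a two-letter string such as &#x201c;AG&#x201d; with None marking a missing call, treats the most frequent allele as the major allele, does not handle multi-allelic SNPs, and uses illustrative function names.</p>
<preformat preformat-type="code">
```python
from collections import Counter
from statistics import mean


def snp_summary(calls):
    """Return (minor_allele, maf, missing_rate) for one SNP.

    calls holds one genotype per individual as a two-letter string
    (e.g. "AG"), or None for a missing call.
    """
    counts = Counter()
    for call in calls:
        if call is not None:
            counts.update(call)  # counts each of the two alleles in the call
    missing_rate = sum(call is None for call in calls) / len(calls)
    ranked = counts.most_common()  # major allele first
    if len(ranked) >= 2:
        minor = ranked[1][0]
        maf = counts[minor] / sum(counts.values())
    else:  # monomorphic or entirely missing SNP
        minor, maf = None, 0.0
    return minor, maf, missing_rate


def encode(call, minor):
    """Code a call as its number of minor alleles: AA -> 0, AB -> 1, BB -> 2."""
    if call is None:
        return None
    return call.count(minor) if minor else 0


def impute(codes, method="Mean"):
    """Fill missing codes with the SNP mean ("Mean") or its most frequent code ("Major")."""
    observed = [c for c in codes if c is not None]
    if method == "Mean":
        fill = mean(observed)
    elif method == "Major":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError("method must be 'Mean' or 'Major'")
    return [fill if c is None else c for c in codes]


def preprocess(genotypes, maf_cutoff=0.05, mr_cutoff=0.2, method="Mean"):
    """Filter, 0/1/2-code and impute a dict mapping SNP id -> list of calls."""
    processed = {}
    for snp, calls in genotypes.items():
        minor, maf, missing_rate = snp_summary(calls)
        # Remove SNPs with MAF at or below maf_cutoff or MR at or above mr_cutoff.
        if maf_cutoff >= maf or missing_rate >= mr_cutoff:
            continue
        codes = [encode(call, minor) for call in calls]
        processed[snp] = impute(codes, method)
    return processed


example = {
    "snp1": ["AA", "AG", "GG", "AA", None, "AA"],  # kept: MAF 0.3, MR 1/6
    "snp2": ["CC", None, "CT", None, "TT", "CC"],  # removed: MR 1/3
    "snp3": ["GG", "GG", "GG", "GG", "GG", "GG"],  # removed: monomorphic
}
print(preprocess(example))                  # {'snp1': [0, 1, 2, 0, 0.6, 0]}
print(preprocess(example, method="Major"))  # {'snp1': [0, 1, 2, 0, 0, 0]}
```
</preformat>
<p>The default arguments mirror the thresholds given in the text; both cutoffs are exposed as parameters because IP4GS lets users redefine them.</p>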
</sec>
<sec id="s2_5">
<title>Permissions and accessibility</title>
<p>The free academic version of IP4GS is publicly available at <ext-link ext-link-type="uri" xlink:href="https://ngdc.cncb.ac.cn/ip4gs/">https://ngdc.cncb.ac.cn/ip4gs/</ext-link>. All functions of the public IP4GS described herein are freely accessible for small datasets. Because the high-performance computing resources of the public web server hosting IP4GS are limited, we suggest that submitted datasets contain fewer than 10,000 SNP markers and fewer than 1,000 individuals. Because industrial users may need to keep breeding data confidential, an offline version of IP4GS, which can be installed on a private server or local device and has no limit on the number of SNPs or samples, is also available. Users interested in this unrestricted version may contact the corresponding author for access.</p>
</sec>
</sec>
<sec id="s3" sec-type="results">
<title>Results</title>
<sec id="s3_1">
<title>Overall workflow</title>
<p>The IP4GS platform can be divided into two main panels of functional modules: the &#x201c;Data preprocessing&#x201d; panel and the &#x201c;GS analysis&#x201d; panel. The Data preprocessing panel comprises not only bioinformatics pipelines for data processing and quality control of input datasets, but also a variety of dimensionality reduction (DR) algorithms for population structure visualization based on genotypes (<xref ref-type="fig" rid="f1"><bold>Figure&#xa0;1</bold></xref>, left). Additionally, users may either preview the processed data through the web browser or download the data to a local computer. The GS analysis panel takes the input datasets generated from the first panel and performs G2P prediction with seven regression-based GS models and eleven evaluation metrics (<xref ref-type="fig" rid="f1"><bold>Figure&#xa0;1</bold></xref>, right). The parameters for each GS model can be predefined on the parameter input panel by users. Visualization of evaluation results allows users to identify and output predicted phenotypes from the optimal model.</p>
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>Overall workflow of IP4GS. IP4GS comprises two panels of functional modules: the Data preprocessing panel (left) and the GS analysis panel (right).</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-14-1131493-g001.tif"/>
</fig>
</sec>
<sec id="s3_2">
<title>Functional modules in the data preprocessing panel</title>
<p>To run the analytical modules in the Data preprocessing panel, users need to prepare mandatory genotypic and phenotypic data files and, optionally, a file containing data for fixed effects considered by the model (<xref ref-type="fig" rid="f2"><bold>Figure&#xa0;2A</bold></xref>). Acceptable formats for the genotypic data file are the standard &#x201c;HapMap&#x201d; format and a plain text file containing a matrix with SNP markers in columns and individuals in rows. In the matrix file, users must code the genotype of each SNP as &#x201c;0,&#x201d; &#x201c;1,&#x201d; or &#x201c;2,&#x201d; representing homozygous major alleles, heterozygous alleles, and homozygous minor alleles, respectively. If users select the &#x201c;Custom&#x201d; option from the pulldown list of file formats, the genotype of each SNP can be entered in the character format &#x201c;A, C, G, T,&#x201d; which IP4GS automatically converts to the &#x201c;0, 1, 2&#x201d; format on the basis of the allele frequencies it computes. For phenotypic data, IP4GS accepts either a .csv (comma-separated values) file or a tab-delimited .txt (text) file.</p>
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>Preprocessing of genotypic data. <bold>(A)</bold> IP4GS accepts genotypic data in HapMap, Matrix, and Custom file formats, phenotypic data in CSV and tab-delimited text formats, and an optional file including features as fixed effects. <bold>(B)</bold> Control console for genotypic data filtration (left) and display window for preview of a variety of data tables (right). <bold>(C)</bold> Preview of processed genotypic data with statistics for MAF, AF, and MR for each SNP. <bold>(D)</bold> Preview of DR data generated by PCA, UMAP, and t-SNE algorithms.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-14-1131493-g002.tif"/>
</fig>
<p>After the input files are uploaded, the genotypic data are first filtered on two criteria, minor allele frequency (MAF) and missing rate (MR), to remove low-quality SNPs; by default, SNPs with MAF &#x2264; 0.05 or MR &#x2265; 0.2 are removed (<xref ref-type="fig" rid="f2"><bold>Figure&#xa0;2B</bold></xref>). IP4GS also lets users choose whether to impute missing genotypes. Because reference haplotype-based imputation can consume substantial computing resources, IP4GS offers only a simplified imputation method that replaces missing genotypic values using the &#x201c;Mean&#x201d; or &#x201c;Major&#x201d; option (see Methods). The processed and filtered genotypic data can then be partially previewed or fully downloaded by clicking the download button in the console (<xref ref-type="fig" rid="f2"><bold>Figure&#xa0;2C</bold></xref>). In addition to the raw and processed genotypic and phenotypic data, the DR data computed using the PCA (principal component analysis), UMAP (uniform manifold approximation and projection), and <italic>t</italic>-SNE (<italic>t</italic>-distributed stochastic neighbor embedding) algorithms can also be previewed and downloaded (<xref ref-type="fig" rid="f2"><bold>Figure&#xa0;2D</bold></xref>).</p>
<p>The last module in the Data preprocessing panel is interactive visualization of genotypic and phenotypic data with a control console and display window. This function facilitates not only visualization of the genetic composition of the population subjected to GS analysis (<xref ref-type="fig" rid="f3"><bold>Figure&#xa0;3A</bold></xref>) but also understanding of the distribution of phenotypes about to be predicted (<xref ref-type="fig" rid="f3"><bold>Figure&#xa0;3B</bold></xref>). When all data preprocessing is complete, IP4GS performs a quality-control assessment on the processed data to ensure proper execution of the subsequent GS analysis.</p>
<fig id="f3" position="float">
<label>Figure&#xa0;3</label>
<caption>
<p>Data visualization console. <bold>(A)</bold> Parameter-setting console (left) for visualization of population structure (right) based on genotypic data. <bold>(B)</bold> Parameter-setting console (left) for visualization of data distribution (right) of selected phenotypic data.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-14-1131493-g003.tif"/>
</fig>
</sec>
<sec id="s3_3">
<title>Functional modules in the GS analysis panel</title>
<p>After quality control is complete, users may proceed to the GS analysis panel. The current version of IP4GS supports seven GS methods commonly used in plant breeding: five linear methods (RRBLUP, BayesA, BayesB, BayesC, and LASSO) and two ML methods (the RFR and SVR algorithms) (<xref ref-type="table" rid="T1"><bold>Table&#xa0;1</bold></xref>). We strongly recommend that users try all seven methods for an initial evaluation of G2P prediction results on a given dataset, because previous studies have reported that no single GS method is superior for all traits and species (<xref ref-type="bibr" rid="B22">Ornella et&#xa0;al., 2014</xref>; <xref ref-type="bibr" rid="B31">Yan et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B24">Robert et&#xa0;al., 2022</xref>). Selecting the optimal model for a designated trait and species is important to obtain the most accurate predictions. Additionally, IP4GS offers eleven evaluation metrics for comprehensive evaluation of model performance; these include not only the correlation-based Pearson, Spearman, and Kendall coefficients but also other metrics such as the <italic>F</italic>-score and MSE (<xref ref-type="table" rid="T2"><bold>Table&#xa0;2</bold></xref>).</p>
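<p>The classification-type metrics in Table 2 (Accuracy, <italic>F</italic>-score, and Kappa) can be computed from confusion-matrix counts as sketched below. This is a hypothetical Python illustration, not IP4GS code; because this section does not state how individuals are assigned to positive and negative classes, the function takes the counts as input and uses the standard definitions of precision, <italic>N</italic><sub>TP</sub>/(<italic>N</italic><sub>TP</sub> + <italic>N</italic><sub>FP</sub>), and recall, <italic>N</italic><sub>TP</sub>/(<italic>N</italic><sub>TP</sub> + <italic>N</italic><sub>FN</sub>).</p>
<preformat preformat-type="code">
```python
def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """Accuracy, precision, recall, F-beta score and Cohen's kappa from confusion counts."""
    n = tp + tn + fp + fn
    n_op, n_on = tp + fn, fp + tn  # observed positives / observed negatives
    n_pp, n_pn = tp + fp, fn + tn  # predicted positives / predicted negatives
    accuracy = (tp + tn) / n
    precision = tp / n_pp
    recall = tp / n_op
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    p0 = accuracy  # observed agreement
    pe = (n_pp / n) * (n_op / n) + (n_pn / n) * (n_on / n)  # agreement expected by chance
    kappa = (p0 - pe) / (1 - pe)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
        "kappa": kappa,
    }


# Hypothetical counts for 100 individuals.
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```
</preformat>
<p>With <italic>&#x3b2;</italic> = 1, the <italic>F</italic>-score reduces to the harmonic mean of precision and recall; for the counts shown, it equals 16/19 (about 0.842), and Kappa equals 0.7.</p>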
<p>The GS analysis panel is composed of five major parts: the modeling console, parameter display window, results display window, visualization console, and plot display window. The modeling console consists of a G2P console and a CV console. From the G2P console, users can select GS methods and their arguments, define the indices of training and test samples, and select evaluation metrics and their arguments (<xref ref-type="fig" rid="f4"><bold>Figure&#xa0;4A</bold></xref>). The CV console offers three commonly used CV schemes: <italic>k</italic>-fold, holdout, and leave-one-out. From this console, users can also set the number of CV repeats and the proportion of samples included in the test set. Once all parameters for G2P models and CV methods are set, the display window shows these preset parameters so that users can double-check them (<xref ref-type="fig" rid="f4"><bold>Figure&#xa0;4B</bold></xref>). If no further corrections are needed, users can press the execution button to run the GS analysis. Notably, users can select all seven GS methods and eleven evaluation metrics and run G2P prediction and CV evaluation simultaneously, generating a table of prediction and evaluation results for comparison.</p>
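<p>The three CV schemes offered by the CV console can be sketched as index splits over <italic>n</italic> individuals. This is a hypothetical Python illustration, not the IP4GS implementation: the function names are illustrative, the repeat time is interpreted as rerunning a scheme with a different random seed, and the test-set proportion is applied to the holdout scheme.</p>
<preformat preformat-type="code">
```python
import random


def kfold_splits(n, k, seed=None):
    """Yield (train, test) index lists for k-fold CV over n individuals."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[f::k]) for f in range(k)]
    for f, test in enumerate(folds):
        # Every individual outside the current fold goes into the training set.
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


def holdout_split(n, test_proportion, seed=None):
    """Return one (train, test) split with round(n * test_proportion) test individuals."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_proportion))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def leave_one_out_splits(n):
    """Yield n (train, test) splits, each holding out a single individual."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


# Repeating a scheme: rerun it with a different seed for each repeat.
for r in range(3):
    train, test = holdout_split(10, 0.2, seed=r)
    print(f"repeat {r}: {len(train)} training, {len(test)} test individuals")
```
</preformat>
<p>In a full CV run, a GS model would be fitted on each training index set and scored with the selected evaluation metrics on the matching test set.</p>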
<fig id="f4" position="float">
<label>Figure&#xa0;4</label>
<caption>
<p>GS analysis panel. <bold>(A)</bold> Modeling console for selection of GS methods, evaluation metrics, and setting of model parameters. <bold>(B)</bold> Display window for viewing selected models and parameter settings. <bold>(C)</bold> Previews of prediction results from multiple GS methods (upper panel) and evaluation results from different metrics (bottom panel). <bold>(D)</bold> Visualization console for viewing prediction results. <bold>(E)</bold> Display window for viewing scatter plots of observed and predicted phenotypes.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-14-1131493-g004.tif"/>
</fig>
<p>From the results display window, G2P prediction and model evaluation results derived from all GS methods and evaluation metrics selected can be previewed before downloading of the full results (<xref ref-type="fig" rid="f4"><bold>Figure&#xa0;4C</bold></xref>). Users may further use the visualization console to visually compare either observed phenotypes and predicted phenotypes or any two sets of predicted phenotypes from any two selected methods (<xref ref-type="fig" rid="f4"><bold>Figure&#xa0;4D</bold></xref>). When parameters are set up on the visualization console, a scatter plot depicting the correlation of observed and predicted phenotypes is generated on the plot display window (<xref ref-type="fig" rid="f4"><bold>Figure&#xa0;4E</bold></xref>). When IP4GS finishes the analysis of a set of breeding data, users may select the best prediction results from the optimal model to download.</p>
</sec>
</sec>
<sec id="s4" sec-type="discussion">
<title>Discussion</title>
<p>Owing to the rapid advancement of next-generation sequencing, GBTS has greatly reduced the cost of genotyping, making GS-assisted breeding feasible for a growing number of plant species. However, GS analysis requires not only basic bioinformatics skills for data management but also experience in data modeling. IP4GS was developed with the R shiny package as an interactive, user-friendly web interface that allows breeders to perform GS analysis without bioinformatics skills. As with any web-based application, however, IP4GS has limitations. We integrated only seven GS methods commonly used in plant breeding into IP4GS. It is impossible to include all existing methods, especially ML methods that require intensive computing resources for model training and parameter tuning. The seven methods were selected on the basis of a previously published evaluation of multiple statistical and ML methods, and all seven satisfy three basic criteria (<xref ref-type="bibr" rid="B31">Yan et&#xa0;al., 2021</xref>). First, prediction accuracy should not drop substantially when the training set is smaller than the test set, because the ratio of training to test set size is usually 1:4 in the seed industry (<xref ref-type="bibr" rid="B31">Yan et&#xa0;al., 2021</xref>). Second, model training and CV evaluation should not require excessive CPU and memory. Third, the method should need few parameters and little manual tuning to perform GS analysis properly. Another common issue for all web-based applications is the upper size limit of input files uploaded for analysis. Users are advised to keep marker sets below 10,000 SNPs and populations below 1,000 individuals. Therefore, we do not recommend IP4GS for GS analysis of large datasets or for ML methods that consume intensive computing resources. Furthermore, as a GS analysis platform, IP4GS should in principle be applicable to other crops in which the GS strategy has been applied successfully, such as rice and wheat. Other species for which genotypic and phenotypic data can be provided in the same formats could also be analyzed, although the effectiveness of IP4GS for them requires further investigation.</p>
<p>Previous reports indicate that no single GS method outperforms the others for all evaluated traits and species (<xref ref-type="bibr" rid="B15">Heffner et&#xa0;al., 2010</xref>; <xref ref-type="bibr" rid="B28">Xiao et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B31">Yan et&#xa0;al., 2021</xref>). The practical solution is therefore to evaluate multiple GS methods and select the optimal one for a specific trait and species. To meet this need, we integrated multiple GS methods and evaluation metrics so that users can compare results from different predictive models. According to our previous evaluation, the seven selected methods usually generate similar prediction results; hence, the current version of IP4GS does not integrate multi-model prediction results. However, for users who wish to combine models, we provide an R script that integrates prediction results from two algorithms; the tool is freely available at <ext-link ext-link-type="uri" xlink:href="https://github.com/furan2019/IP4GSdata.git">https://github.com/furan2019/IP4GSdata.git</ext-link>.</p>
</sec>
<sec id="s5" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec id="s6" sec-type="author-contributions">
<title>Author contributions</title>
<p>SQJ and QC conceived and supervised the project. QC, SQJ, and XFW wrote the manuscript. TL developed the main platform and genomic selection modules of IP4GS. SJ developed the bioinformatics pipeline and analytical modules. RF developed the evaluation modules and visualization modules. All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>This work was supported by the Hainan Yazhou Bay Seed Laboratory (B21HJ0505), the Chinese Universities Scientific Fund (2022TC139), and the 2115 Talent Development Program of China Agricultural University.</p>
</sec>
<ack>
<title>Acknowledgments</title>
<p>We thank the China National Center for Bioinformation at the Beijing Institute of Genomics for providing the public server domain and high-performance computing resources that host the IP4GS platform. We also thank Ms. Yang Zhang for deploying the IP4GS code on the server.</p>
</ack>
<sec id="s8" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s9" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Atanda</surname> <given-names>S. A.</given-names>
</name>
<name>
<surname>Olsen</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Burgue&#xf1;o</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Crossa</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Dzidzienyo</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Beyene</surname> <given-names>Y.</given-names>
</name>
<etal/>
</person-group>. (<year>2021</year>). <article-title>Maximizing efficiency of genomic selection in CIMMYT&#x2019;s tropical maize breeding program</article-title>. <source>Theor. Appl. Genet.</source> <volume>134</volume>, <fpage>279</fpage>&#x2013;<lpage>294</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s00122-020-03696-9</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bauck</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Morota</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Kachman</surname> <given-names>S. D.</given-names>
</name>
<name>
<surname>Spangler</surname> <given-names>M. L.</given-names>
</name>
<name>
<surname>Fernandes</surname> <given-names>S. B.</given-names>
</name>
<etal/>
</person-group>. (<year>2018</year>). <article-title>Efficiency of multi-trait, indirect, and trait-assisted genomic selection for improvement of biomass sorghum</article-title>. <source>Theor. Appl. Genet.</source> <volume>131</volume>, <fpage>747</fpage>&#x2013;<lpage>755</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s00122-017-3033-y</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Belamkar</surname> <given-names>V.</given-names>
</name>
<name>
<surname>Guttieri</surname> <given-names>M. J.</given-names>
</name>
<name>
<surname>Hussain</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Jarquin</surname> <given-names>D.</given-names>
</name>
<name>
<surname>El-Basyoni</surname> <given-names>I.</given-names>
</name>
<name>
<surname>Poland</surname> <given-names>J.</given-names>
</name>
<etal/>
</person-group>. (<year>2018</year>). <article-title>Genomic selection in preliminary yield trials in a winter wheat breeding program</article-title>. <source>G3: Genes Genomes Genet.</source> <volume>8</volume>, <fpage>2735</fpage>&#x2013;<lpage>2747</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1534/g3.118.200415</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blondel</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Onogi</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Iwata</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Ueda</surname> <given-names>N.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>A ranking approach to genomic selection</article-title>. <source>PLoS One</source> <volume>10</volume>, <elocation-id>e0128570</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1371/journal.pone.0128570</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cerrudo</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Cao</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Yuan</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Martinez</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Suarez</surname> <given-names>E. A.</given-names>
</name>
<name>
<surname>Babu</surname> <given-names>R.</given-names>
</name>
<etal/>
</person-group>. (<year>2018</year>). <article-title>Genomic selection outperforms marker assisted selection for grain yield and physiological traits in a maize doubled haploid population across water treatments</article-title>. <source>Front. Plant Sci.</source> <volume>9</volume>, <elocation-id>366</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.3389/fpls.2018.00366</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheng</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Jiang</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Xu</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Xiao</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>R.</given-names>
</name>
<etal/>
</person-group>. (<year>2021</year>). <article-title>Genome optimization <italic>via</italic> virtual simulation to accelerate maize hybrid breeding</article-title>. <source>Briefings Bioinf.</source> <volume>23</volume>, <elocation-id>bbab447</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/bib/bbab447</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Collard</surname> <given-names>B. C.</given-names>
</name>
<name>
<surname>Mackill</surname> <given-names>D. J.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Marker-assisted selection: An approach for precision plant breeding in the twenty-first century</article-title>. <source>Philos. Trans. R. Soc. B: Biol. Sci.</source> <volume>363</volume>, <fpage>557</fpage>&#x2013;<lpage>572</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1098/rstb.2007.2170</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crossa</surname> <given-names>J.</given-names>
</name>
<name>
<surname>P&#xe9;rez-Rodr&#xed;guez</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Cuevas</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Montesinos-L&#xf3;pez</surname> <given-names>O.</given-names>
</name>
<name>
<surname>Jarqu&#xed;n</surname> <given-names>D.</given-names>
</name>
<name>
<surname>de los Campos</surname> <given-names>G.</given-names>
</name>
<etal/>
</person-group>. (<year>2017</year>). <article-title>Genomic selection in plant breeding: Methods, models, and perspectives</article-title>. <source>Trends Plant Sci.</source> <volume>22</volume>, <fpage>961</fpage>&#x2013;<lpage>975</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tplants.2017.08.011</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>de los Campos</surname> <given-names>G.</given-names>
</name>
<name>
<surname>P&#xe9;rez</surname> <given-names>P.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>BGLR: Bayesian generalized linear regression</article-title>,&#x201d; in <source>R package version 1.0.4</source>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Desta</surname> <given-names>Z. A.</given-names>
</name>
<name>
<surname>Ortiz</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Genomic selection: Genome-wide prediction in plant improvement</article-title>. <source>Trends Plant Sci.</source> <volume>19</volume>, <fpage>592</fpage>&#x2013;<lpage>601</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tplants.2014.05.006</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Endelman</surname> <given-names>J. B.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Ridge regression and other kernels for genomic selection with R package rrBLUP</article-title>. <source>Plant Genome</source> <volume>4</volume>, <fpage>250</fpage>&#x2013;<lpage>255</lpage>. doi: <pub-id pub-id-type="doi">10.3835/plantgenome2011.08.0024</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedman</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Hastie</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Tibshirani</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Regularization paths for generalized linear models <italic>via</italic> coordinate descent</article-title>. <source>J. Stat. Software</source> <volume>33</volume>, <fpage>1</fpage>&#x2013;<lpage>22</lpage>. doi: <pub-id pub-id-type="doi">10.18637/jss.v033.i01</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gonz&#xe1;lez-Recio</surname> <given-names>O.</given-names>
</name>
<name>
<surname>Forni</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Genome-wide prediction of discrete traits using Bayesian regressions and machine learning</article-title>. <source>Genet. Sel. Evol.</source> <volume>43</volume>, <elocation-id>7</elocation-id>. doi: <pub-id pub-id-type="doi">10.1186/1297-9686-43-7</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Tao</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Ren</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Xu</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>K.</given-names>
</name>
<etal/>
</person-group>. (<year>2019</year>). <article-title>Development of multiple SNP marker panels affordable to breeders through genotyping by target sequencing (GBTS) in maize</article-title>. <source>Mol. Breed.</source> <volume>39</volume>, <fpage>1</fpage>&#x2013;<lpage>12</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s11032-019-0940-4</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Heffner</surname> <given-names>E. L.</given-names>
</name>
<name>
<surname>Lorenz</surname> <given-names>A. J.</given-names>
</name>
<name>
<surname>Jannink</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Sorrells</surname> <given-names>M. E.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Plant breeding with genomic selection: gain per unit time and cost</article-title>. <source>Crop Sci.</source> <volume>50</volume>, <fpage>1681</fpage>&#x2013;<lpage>1690</lpage>. doi: <pub-id pub-id-type="doi">10.2135/cropsci2009.11.0662</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liaw</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Wiener</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Classification and regression by randomForest</article-title>. <source>R News</source> <volume>2</volume>, <fpage>18</fpage>&#x2013;<lpage>22</lpage>.</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Xiao</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Luo</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Qiao</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>W.</given-names>
</name>
<etal/>
</person-group>. (<year>2020</year>). <article-title>CUBIC: An atlas of genetic architecture promises directed maize improvement</article-title>. <source>Genome Biol.</source> <volume>21</volume>, <fpage>1</fpage>&#x2013;<lpage>17</lpage>. doi: <pub-id pub-id-type="doi">10.1186/s13059-020-1930-x</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ma</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Qiu</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Song</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Cheng</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Zhai</surname> <given-names>J.</given-names>
</name>
<etal/>
</person-group>. (<year>2018</year>). <article-title>A deep convolutional neural network approach for predicting phenotypes from genotypes</article-title>. <source>Planta</source> <volume>248</volume>, <fpage>1307</fpage>&#x2013;<lpage>1318</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s00425-018-2976-9</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McMurdie</surname> <given-names>P. J.</given-names>
</name>
<name>
<surname>Holmes</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Shiny-phyloseq: Web application for interactive microbiome analysis with provenance tracking</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>282</fpage>&#x2013;<lpage>283</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/bioinformatics/btu616</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meuwissen</surname> <given-names>T. H.</given-names>
</name>
<name>
<surname>Hayes</surname> <given-names>B. J.</given-names>
</name>
<name>
<surname>Goddard</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Prediction of total genetic value using genome-wide dense marker maps</article-title>. <source>Genetics</source> <volume>157</volume>, <fpage>1819</fpage>&#x2013;<lpage>1829</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/genetics/157.4.1819</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Dimitriadou</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Hornik</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Weingessel</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Leisch</surname> <given-names>F.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Misc functions of the Department of Statistics (e1071), TU Wien</article-title>,&#x201d; in <source>R package version 1.6-3</source>.</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ornella</surname> <given-names>L.</given-names>
</name>
<name>
<surname>P&#xe9;rez</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Tapia</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Gonz&#xe1;lez-Camacho</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Burgue&#xf1;o</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>X.</given-names>
</name>
<etal/>
</person-group>. (<year>2014</year>). <article-title>Genomic-enabled prediction with classification algorithms</article-title>. <source>Heredity</source> <volume>112</volume>, <fpage>616</fpage>&#x2013;<lpage>626</lpage>. doi: <pub-id pub-id-type="doi">10.1038/hdy.2013.144</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Qiu</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Cheng</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Song</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Tang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Ma</surname> <given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Application of machine learning-based classification to genomic selection and performance improvement</source> (<publisher-loc>Cham, Switzerland</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>412</fpage>&#x2013;<lpage>421</lpage>.</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robert</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Auzanneau</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Goudemand</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Oury</surname> <given-names>F.-X.</given-names>
</name>
<name>
<surname>Rolland</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Heumez</surname> <given-names>E.</given-names>
</name>
<etal/>
</person-group>. (<year>2022</year>). <article-title>Phenomic selection in wheat breeding: Identification and optimisation of factors influencing prediction accuracy and comparison to genomic selection</article-title>. <source>Theor. Appl. Genet.</source> <volume>135</volume>, <fpage>895</fpage>&#x2013;<lpage>914</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s00122-021-04005-8</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sievert</surname> <given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Interactive web-based data visualization with R, plotly, and shiny</source> (<publisher-loc>Boca Raton, Florida</publisher-loc>: <publisher-name>CRC Press</publisher-name>).</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tanaka</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Iwata</surname> <given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Bayesian optimization for genomic selection: A method for discovering the best genotype among a large number of candidates</article-title>. <source>Theor. Appl. Genet.</source> <volume>131</volume>, <fpage>93</fpage>&#x2013;<lpage>105</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s00122-017-2988-z</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Xu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Qu</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Cui</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Chater</surname> <given-names>J. M.</given-names>
</name>
<etal/>
</person-group>. (<year>2021</year>). <article-title>Boosting predictabilities of agronomic traits in rice using bivariate genomic selection</article-title>. <source>Briefings Bioinf.</source> <volume>22</volume>, <elocation-id>bbaa103</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/bib/bbaa103</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiao</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Jiang</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Cheng</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Yan</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>R.</given-names>
</name>
<etal/>
</person-group>. (<year>2021</year>). <article-title>The genetic mechanism of heterosis utilization in maize improvement</article-title>. <source>Genome Biol.</source> <volume>22</volume>, <fpage>1</fpage>&#x2013;<lpage>29</lpage>. doi: <pub-id pub-id-type="doi">10.1186/s13059-021-02370-7</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Crouch</surname> <given-names>J. H.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Marker-assisted selection in plant breeding: From publications to practice</article-title>. <source>Crop Sci.</source> <volume>48</volume>, <fpage>391</fpage>&#x2013;<lpage>407</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.2135/cropsci2007.04.0191</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yan</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>X.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Unsupervised and semi-supervised learning: The next frontier in machine learning for plant systems biology</article-title>. <source>Plant J.</source> <volume>111</volume>, <fpage>1527</fpage>&#x2013;<lpage>1538</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1111/tpj.15905</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yan</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Xu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Cheng</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Jiang</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Xiao</surname> <given-names>Y.</given-names>
</name>
<etal/>
</person-group>. (<year>2021</year>). <article-title>LightGBM: Accelerated genomically designed crop breeding through ensemble learning</article-title>. <source>Genome Biol.</source> <volume>22</volume>, <fpage>1</fpage>&#x2013;<lpage>24</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s13059-021-02492-y</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Ouyang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Yao</surname> <given-names>W.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>shinyCircos: An R/Shiny application for interactive creation of circos plot</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>1229</fpage>&#x2013;<lpage>1231</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/bioinformatics/btx763</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>
