<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2016.01914</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Meta-Analysis Based Method for Prioritizing Candidate Genes Involved in a Pre-specific Function</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zhai</surname> <given-names>Jingjing</given-names></name>
<xref ref-type="author-notes" rid="fn003"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/345828/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Tang</surname> <given-names>Yunjia</given-names></name>
<xref ref-type="author-notes" rid="fn003"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/397756/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Yuan</surname> <given-names>Hao</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/366770/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Longteng</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/366527/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Shang</surname> <given-names>Haoli</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/397778/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Ma</surname> <given-names>Chuang</given-names></name>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/78201/overview"/>
</contrib>
</contrib-group>
<aff><institution>State Key Laboratory of Crop Stress Biology for Arid Areas, College of Life Sciences, Northwest A&#x00026;F University</institution> <country>Yangling, China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Yasset Perez-Riverol, European Bioinformatics Institute, UK</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Zoran Nikoloski, Max Planck Institute of Molecular Plant Physiology, Germany; Enrique Audain, Universit&#x000E4;tsklinikum Schleswig-Holstein, Germany</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Chuang Ma <email>chuangma2006&#x00040;gmail.com</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Plant Science</p></fn>
<fn fn-type="other" id="fn003"><p>&#x02020;These authors have contributed equally to this work.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>15</day>
<month>12</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>7</volume>
<elocation-id>1914</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>08</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>12</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2016 Zhai, Tang, Yuan, Wang, Shang and Ma.</copyright-statement>
<copyright-year>2016</copyright-year>
<copyright-holder>Zhai, Tang, Yuan, Wang, Shang and Ma</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>The identification of genes associated with a given biological function in plants remains a challenge, although network-based gene prioritization algorithms have been developed for <italic>Arabidopsis thaliana</italic> and many non-model plant species. These network-based algorithms suffer from several problems, in particular unsatisfactory prediction accuracy due to limited network coverage, varying link quality, and/or uncertain network connectivity. Thus, a model that integrates complementary biological data may be expected to increase the prediction accuracy of gene prioritization. Toward this goal, we developed a novel gene prioritization method named RafSee, which ranks candidate genes using a random forest algorithm that integrates sequence, evolutionary, and epigenetic features of plants. Subsequently, we proposed an integrative approach named RAP (Rank Aggregation-based data fusion for gene Prioritization), in which an order statistics-based meta-analysis was used to aggregate the ranks from the network-based gene prioritization method and RafSee, for accurately prioritizing candidate genes involved in a pre-specific biological function. Finally, we showcased the utility of RAP by prioritizing 380 flowering-time genes in <italic>Arabidopsis</italic>. The &#x0201C;leave-one-out&#x0201D; cross-validation experiment showed that RafSee could work as a complement to a current state-of-the-art network-based gene prioritization system (AraNet v2). Moreover, RAP ranked 53.68% (204/380) of the flowering-time genes higher than AraNet v2, resulting in a 39.46% improvement in terms of the first quartile rank. Further evaluations also showed that RAP was effective in prioritizing genes related to different abiotic stresses. 
To enhance the usability of RAP for <italic>Arabidopsis</italic> and non-model plant species, an R package implementing the method is freely available at <ext-link ext-link-type="uri" xlink:href="http://bioinfo.nwafu.edu.cn/software">http://bioinfo.nwafu.edu.cn/software</ext-link>.</p>
</abstract>
<kwd-group>
<kwd>biological network</kwd>
<kwd>data fusion</kwd>
<kwd>flowering time</kwd>
<kwd>gene prioritization</kwd>
<kwd>machine learning</kwd>
<kwd>meta-analysis</kwd>
<kwd>rank aggregation</kwd>
<kwd>systems biology</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="3"/>
<equation-count count="0"/>
<ref-count count="49"/>
<page-count count="12"/>
<word-count count="8020"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>A major challenge in plant biology is to identify, from large lists of candidate genes (e.g., all genes in the whole genome), those that play an important role in an agricultural trait or a complex biological process (Lee et al., <xref ref-type="bibr" rid="B19">2010</xref>; Li et al., <xref ref-type="bibr" rid="B24">2015</xref>; Sabaghian et al., <xref ref-type="bibr" rid="B37">2015</xref>). However, experimental validation of every candidate gene is time-consuming and costly. A biologist would have to manually select the promising genes based on their potential function, a difficult task considering the paucity and disparity of functional annotation in plant species (Rhee and Mutwil, <xref ref-type="bibr" rid="B34">2014</xref>). Computational methods are thus required to help biologists automatically prioritize candidate genes by integrating the large amounts of functional genomic data that are now publicly available.</p>
<p>Gene prioritization was first developed to identify disease-associated human genes within a multigene locus identified by a positional genetic study (Perez-Iratxeta et al., <xref ref-type="bibr" rid="B33">2002</xref>). This application was subsequently expanded to studies that generate candidate genes from the whole genome using genome-wide association analyses and &#x0201C;&#x02013;omics&#x0201D; experiments (Moreau and Tranchevent, <xref ref-type="bibr" rid="B30">2012</xref>). A number of computational approaches and bioinformatics tools have been developed to prioritize disease-related human genes with the use of various data sources such as scientific texts, protein-protein interactions, and functional annotations or pathways (Tranchevent et al., <xref ref-type="bibr" rid="B43">2011</xref>; Moreau and Tranchevent, <xref ref-type="bibr" rid="B30">2012</xref>). However, to the best of our knowledge, none of these approaches and tools designed for human studies can be directly applied to tackle the gene prioritization problem in plants.</p>
<p>Although functional genomic data are becoming available for many plant species, gene prioritization is still nascent in plant science. Until recently, only a few computational algorithms had been developed to address the challenge of gene prioritization for the model plant <italic>Arabidopsis thaliana</italic> (Lee et al., <xref ref-type="bibr" rid="B19">2010</xref>, <xref ref-type="bibr" rid="B22">2015b</xref>; Ma et al., <xref ref-type="bibr" rid="B27">2014</xref>; Sabaghian et al., <xref ref-type="bibr" rid="B37">2015</xref>). Network-based gene prioritization is a commonly used strategy because it is capable of characterizing the complex relationships among genes. In addition to gene co-expression networks (Ma et al., <xref ref-type="bibr" rid="B27">2014</xref>), integrated functional association networks have been recently developed to prioritize genes in plants (Lee et al., <xref ref-type="bibr" rid="B19">2010</xref>, <xref ref-type="bibr" rid="B21">2015a</xref>,<xref ref-type="bibr" rid="B22">b</xref>; Warde-Farley et al., <xref ref-type="bibr" rid="B44">2010</xref>; Sabaghian et al., <xref ref-type="bibr" rid="B37">2015</xref>). One of the better known functional association networks is AraNet (<ext-link ext-link-type="uri" xlink:href="http://www.functionalnet.org/aranet">http://www.functionalnet.org/aranet</ext-link>), which was originally built for prioritizing genes for <italic>Arabidopsis thaliana</italic> using a modified Bayesian system for integrating 24 distinct types of gene-gene associations derived from plant and non-plant species (Lee et al., <xref ref-type="bibr" rid="B19">2010</xref>). The power of AraNet in gene prioritization has been demonstrated by the identification of regulators of drought sensitivity and lateral root development in <italic>Arabidopsis</italic> (Lee et al., <xref ref-type="bibr" rid="B19">2010</xref>). 
Given the importance of network-based gene prioritization in the identification of plant gene function and in the genetic analysis of plant traits, network-based gene prioritization systems that integrate functional associations (e.g., AraNet v2; <ext-link ext-link-type="uri" xlink:href="http://www.inetbio.org/aranet">http://www.inetbio.org/aranet</ext-link>) have been implemented for 28 non-model plant organisms, including important crops such as <italic>Zea mays</italic> (MaizeNet; <ext-link ext-link-type="uri" xlink:href="http://www.inetbio.org/maizenet">http://www.inetbio.org/maizenet</ext-link>) and <italic>Oryza sativa</italic> (RiceNet v2; <ext-link ext-link-type="uri" xlink:href="http://www.inetbio.org/ricenet">http://www.inetbio.org/ricenet</ext-link>; Lee et al., <xref ref-type="bibr" rid="B21">2015a</xref>).</p>
<p>The rapid increase in functional association networks could accelerate the discovery of genes that are involved in a specific biological process or associated with plant traits of interest. However, the performance of network-based gene prioritization is still unsatisfactory, owing to limited network coverage, varying link quality, and/or uncertain network connectivity (Lee et al., <xref ref-type="bibr" rid="B19">2010</xref>, <xref ref-type="bibr" rid="B20">2011</xref>). Hence, novel gene prioritization algorithms that integrate sequence, evolutionary, and epigenetic features could complement and strengthen network-based gene prioritization algorithms, because some sequence-based features are capable of predicting protein functions (Lee et al., <xref ref-type="bibr" rid="B18">2009</xref>; Libbrecht and Noble, <xref ref-type="bibr" rid="B25">2015</xref>; Lloyd et al., <xref ref-type="bibr" rid="B26">2015</xref>). To summarize, there is a recognized need for developing novel gene prioritization algorithms capable of integrating different types of features, and for investigating how these integrative algorithms may complement the conventional network-based gene prioritization algorithms (e.g., AraNet v2).</p>
<p>In this study, we first develop a novel gene prioritization method named RafSee: it applies a random forest algorithm to integrate features from protein sequences, evolutionary conservation, and epigenetic methylation marks. We then propose an integrative approach named RAP: it prioritizes the most promising genes by aggregating the prediction results from the network-based gene prioritization algorithm and RafSee using an order statistics-based meta-analysis strategy (Kolde et al., <xref ref-type="bibr" rid="B16">2012</xref>). We go on to evaluate the prioritization ability of RafSee, of RAP, and of one state-of-the-art network-based gene prioritization system (AraNet v2), using 449 known flowering-time-related <italic>Arabidopsis</italic> genes manually compiled from different sources. We show that RafSee could be used as a robust complement to AraNet v2 for the prioritization of flowering-time genes in <italic>Arabidopsis</italic>. Moreover, we show that RAP performs better than either AraNet v2 or RafSee in most cases. The RAP method has been implemented as an R package available for public use.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and methods</title>
<sec>
<title>Workflow of RAP</title>
<p>The workflow of RAP is shown in Figure <xref ref-type="fig" rid="F1">1</xref>. Starting from a set of seed genes, RAP first builds an integrative random forest-based gene prioritization method (RafSee) using sequence, evolutionary, and epigenetic features. Then, using an order statistics-based meta-analysis approach, RAP aggregates prediction results from RafSee and one network-based gene prioritization system (AraNet v2) to deliver the top-ranked candidate genes for further experimental validation.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Schematic of the RAP-based gene prioritization</bold>.</p></caption>
<graphic xlink:href="fpls-07-01914-g0001.tif"/>
</fig>
</sec>
<sec>
<title>Compilation of seed genes</title>
<p>Seed genes, a set of genes with a known function in flowering-time control, were used to identify candidate flowering-time genes. They were collected from four different sources: (1) 293 flowering-time genes annotated in WikiPathways, an open, collaborative platform for the curation of pathways by researchers across the biology community (<ext-link ext-link-type="uri" xlink:href="http://www.wikipathways.org/index.php/Pathway:WP2312">http://www.wikipathways.org/index.php/Pathway:WP2312</ext-link>; Kutmon et al., <xref ref-type="bibr" rid="B17">2016</xref>); (2) 293 flowering-time genes collected by Zhu et al. (<xref ref-type="bibr" rid="B49">2011</xref>) according to annotations of flowering-related traits in The Arabidopsis Information Resource (TAIR) database (TAIR10; <ext-link ext-link-type="uri" xlink:href="https://www.arabidopsis.org">https://www.arabidopsis.org</ext-link>); (3) 406 flowering-time genes manually collected from the literature by Chen et al. (<xref ref-type="bibr" rid="B7">2012</xref>); and (4) 174 flowering-time genes collected by the research group of Professor George Coupland at the Max Planck Institute for Plant Breeding Research (<ext-link ext-link-type="uri" xlink:href="http://www.mpipz.mpg.de/14637/Arabidopsis_flowering_genes">http://www.mpipz.mpg.de/14637/Arabidopsis_flowering_genes</ext-link>). After eliminating 14 microRNA genes, we obtained a total of 449 protein-coding genes related to flowering time in <italic>Arabidopsis</italic> (Table <xref ref-type="supplementary-material" rid="SM1">S1</xref>).</p>
</sec>
<sec>
<title>RafSee, an integrative random forest-based gene prioritization method</title>
<p>The main process for developing an integrative random forest-based gene prioritization method RafSee had four steps.</p>
<p><bold>Step 1: Sample labeling</bold>. A total of 27,416 <italic>Arabidopsis</italic> genes annotated in the TAIR10 database were partitioned into three sample sets (positive, negative, and undocumented), of which the first two were required for training the random forest-based machine learning system. The positive sample set consisted of the 449 known flowering-time genes (i.e., seed genes). The negative sample set consisted of 8503 protein-coding <italic>Arabidopsis</italic> genes that had weak or no functional associations with the 449 flowering-time genes, as annotated in the <italic>Arabidopsis</italic> protein interaction network from the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database (version 10.0; <ext-link ext-link-type="uri" xlink:href="http://string-db.org">http://string-db.org</ext-link>; 10,637,352 <italic>Arabidopsis</italic> protein interactions among 24,283 <italic>Arabidopsis</italic> proteins; downloaded on February 25, 2016). The STRING network was built by estimating an association confidence score for each protein pair using a Bayesian method that integrated various sources of interactions such as yeast two-hybrid experiments, text mining, co-expression, and protein homology (Jensen et al., <xref ref-type="bibr" rid="B13">2009</xref>). Two proteins were treated as unlinked in the network when their association confidence was weak (i.e., score &#x0003C;0.15). Note that some of the positive samples may be erroneously annotated as flowering-time genes, while some of the negative samples may in fact be true flowering-time genes not yet discovered. The remaining 18,464 protein-coding genes annotated for the <italic>Arabidopsis</italic> genome in the TAIR10 database were labeled as undocumented samples.</p>
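<p>The labeling rule in Step 1 can be illustrated with a minimal Python sketch (the published implementation is an R package; this is not that code). The function name <monospace>label_samples</monospace>, the dictionary layout of the STRING scores, and the reading that a gene is negative when it has no link scoring at least 0.15 to any seed gene are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
# Cutoff from the text: protein pairs scoring below 0.15 are treated as unlinked.
WEAK_SCORE = 0.15


def label_samples(all_genes, seed_genes, string_scores):
    """Partition genes into positive, negative and undocumented sets.

    string_scores maps frozenset({gene_a, gene_b}) to a confidence score in [0, 1]
    (a hypothetical in-memory layout, not the STRING file format).
    """
    universe = set(all_genes)
    positives = set(seed_genes) & universe
    # Collect every gene that takes part in a non-weak link involving a seed gene.
    linked_to_seed = set()
    for pair, score in string_scores.items():
        if score >= WEAK_SCORE and pair & positives:
            linked_to_seed |= pair
    # Negatives: genes with weak or no association to any seed gene.
    negatives = universe - positives - linked_to_seed
    # Everything else is left unlabeled and becomes a prioritization candidate.
    undocumented = universe - positives - negatives
    return positives, negatives, undocumented
```
]]></preformat>
<p>In this sketch the undocumented set is exactly the set of non-seed genes that keep at least one non-weak link to a seed gene, which are the genes later ranked as candidates.</p>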
<p><bold>Step 2: Feature encoding</bold>. To serve as input to the random forest-based machine learning system, each of the 27,416 <italic>Arabidopsis</italic> genes was characterized by sequence, evolutionary, and epigenetic features, yielding a 1012-dimensional feature vector (1008 sequence-based, three evolutionary, and one epigenetic feature) built with seven encoding schemes. The sequence-based features were generated with four encoding schemes, which are described in detail below.</p>
<list list-type="bullet">
<list-item><p><bold>Amino acid composition (AAC):</bold> The AAC was a 420-dimensional numeric vector, which measured the occurrence frequency of 20 amino acids and 400 amino acid pairs in a protein sequence.</p></list-item>
<list-item><p><bold>Pseudo amino acid composition (PAAC):</bold> The PAAC incorporates both the composition of amino acids and their sequence-order information in a protein (Chou, <xref ref-type="bibr" rid="B8">2001</xref>). There were 25 PAAC-related numeric features generated using the R package &#x0201C;protr&#x0201D; (version 1.1-1; <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/protr">https://cran.r-project.org/web/packages/protr</ext-link>; Zhang et al., <xref ref-type="bibr" rid="B47">2013</xref>) with the parameters &#x003BB; &#x0003D; 5, &#x003C9; &#x0003D; 0.05. The first 20 features are associated with the occurrence frequency of the 20 amino acids, whereas the next five features (21&#x02013;25) reflect the effect of sequence order (see Supplementary Data <xref ref-type="supplementary-material" rid="SM8">1</xref> for full details).</p></list-item>
<list-item><p><bold>Amphiphilic pseudo amino acid composition (APAAC):</bold> A total of 30 APAAC-related numeric features were generated using the R package &#x0201C;protr&#x0201D; with the parameters &#x003BB; &#x0003D; 5, &#x003C9; &#x0003D; 0.05. The first 20 features reflect the components of 20 amino acids, whereas the additional 10 features are a set of correlation factors that represent different hydrophobicity and hydrophilicity distribution patterns along a protein sequence (see Supplementary Data <xref ref-type="supplementary-material" rid="SM8">1</xref> for full details).</p></list-item>
<list-item><p><bold>Physicochemical properties (PCPs):</bold> For each amino acid, 533 PCPs were generated to describe various physicochemical properties using the R package &#x0201C;Interpol&#x0201D; (version 1.3.1; <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/Interpol">https://cran.r-project.org/web/packages/Interpol</ext-link>). The score matrix of PCPs (533 PCPs in rows, 20 amino acids in columns) is given in Table <xref ref-type="supplementary-material" rid="SM2">S2</xref>. As described in Jeong et al. (<xref ref-type="bibr" rid="B14">2009</xref>), for a given protein with a sequence length L, the normalized value for a specific physicochemical property <italic>j</italic> was calculated using the formula: <inline-formula><mml:math id="M1"><mml:mrow><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>L</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi mathvariant='bold'>i</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mn>1</mml:mn></mml:mstyle></mml:mrow><mml:mi mathvariant='bold'>L</mml:mi></mml:msubsup><mml:mrow><mml:mfrac><mml:mrow><mml:msubsup><mml:mi mathvariant='bold'>p</mml:mi><mml:mi mathvariant='bold'>i</mml:mi><mml:mi mathvariant='bold'>j</mml:mi></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi mathvariant='bold'>p</mml:mi><mml:mrow><mml:mi mathvariant='bold'>m</mml:mi><mml:mi mathvariant='bold'>i</mml:mi><mml:mi mathvariant='bold'>n</mml:mi></mml:mrow><mml:mi mathvariant='bold'>j</mml:mi></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mi mathvariant='bold'>p</mml:mi><mml:mrow><mml:mi mathvariant='bold'>m</mml:mi><mml:mi mathvariant='bold'>a</mml:mi><mml:mi mathvariant='bold'>x</mml:mi></mml:mrow><mml:mi 
mathvariant='bold'>j</mml:mi></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi mathvariant='bold'>p</mml:mi><mml:mrow><mml:mi mathvariant='bold'>m</mml:mi><mml:mi mathvariant='bold'>i</mml:mi><mml:mi mathvariant='bold'>n</mml:mi></mml:mrow><mml:mi mathvariant='bold'>j</mml:mi></mml:msubsup></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M2"><mml:mrow><mml:msubsup><mml:mi mathvariant='bold'>p</mml:mi><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> is the score of property <italic>j</italic> for the residue at position <italic>i</italic>, <inline-formula><mml:math id="M3"><mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mi>j</mml:mi></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M4"><mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow><mml:mi>j</mml:mi></mml:msubsup></mml:math></inline-formula> are the maximum and minimum values of the property <italic>j</italic>, respectively. The PCP-based encoding scheme generated a total of 533 numeric features for the corresponding physicochemical properties.</p></list-item>
</list>
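<p>Two of the sequence-based encodings listed above, AAC and the normalized PCP value <italic>P</italic>(<italic>j</italic>), are simple enough to sketch in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the dipeptide frequencies are assumed to be normalized by the number of adjacent residue pairs, and the minimum and maximum of a property are assumed to be taken over the 20 standard amino acids. Non-standard residues and constant property scales are not handled.</p>
<preformat><![CDATA[
```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(seq):
    """Return 420 values: 20 residue frequencies followed by 400 dipeptide frequencies."""
    n = len(seq)
    single = [seq.count(a) / n for a in AMINO_ACIDS]
    # Overlapping adjacent residue pairs, e.g. "ACA" -> ["AC", "CA"].
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    n_pairs = max(len(pairs), 1)  # guard against single-residue input
    dipep = [pairs.count(a + b) / n_pairs for a, b in product(AMINO_ACIDS, repeat=2)]
    return single + dipep


def pcp_feature(seq, scale):
    """P(j): mean of the min-max normalized property score over all residues.

    scale maps each amino acid letter to its raw score for one property j.
    """
    lo, hi = min(scale.values()), max(scale.values())
    return sum((scale[res] - lo) / (hi - lo) for res in seq) / len(seq)
```
]]></preformat>
<p>Calling <monospace>pcp_feature</monospace> once per row of the property score matrix would give the 533 PCP features for a protein under these assumptions.</p>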
<p>The evolutionary features comprised one sequence conservation (SC)-related feature and two whole genome duplication (WGD)-related features. The SC-related feature measures the sequence identity between <italic>Arabidopsis</italic> protein sequences and protein sequences from 34 other plant species (26 dicotyledonous, six monocotyledonous, and two other embryophyte species). For a given <italic>Arabidopsis</italic> gene, a BLASTP (basic local alignment search tool for proteins; <ext-link ext-link-type="uri" xlink:href="http://blast.ncbi.nlm.nih.gov/Blast.cgi">http://blast.ncbi.nlm.nih.gov/Blast.cgi</ext-link>) similarity search was first performed to compare its protein sequence with those from the 34 other plant species. Then, the identity of the best BLASTP match in each of the 34 species was recorded. Finally, the SC value was assigned as the median of these 34 identities. For the two WGD-related features, two binary values were used to indicate the presence (1) or absence (0) of a paralog produced in the &#x003B1; and &#x003B2;&#x003B3; WGD events, respectively. Genes with paralogs derived from &#x003B1; and/or &#x003B2;&#x003B3; WGD events were identified by Bowers et al. (<xref ref-type="bibr" rid="B4">2003</xref>). These three evolutionary features have recently been used for the identification of essential genes in <italic>Arabidopsis</italic> (Lloyd et al., <xref ref-type="bibr" rid="B26">2015</xref>).</p>
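<p>Once the BLASTP searches have been run, the three evolutionary features reduce to a median and two flags. The Python sketch below assumes that the per-species best-hit identities are already available as a list of numbers; how a species without any BLASTP hit is represented is not stated in the text and is left to the caller. The function name is hypothetical.</p>
<preformat><![CDATA[
```python
from statistics import median


def evolutionary_features(best_hit_identities, has_alpha_paralog, has_beta_gamma_paralog):
    """Return [SC, WGD_alpha, WGD_beta_gamma] for one gene.

    best_hit_identities: one percent identity per compared species (34 in the text).
    """
    sc = median(best_hit_identities)
    # Presence (1) or absence (0) of a paralog from each duplication event.
    return [sc, int(has_alpha_paralog), int(has_beta_gamma_paralog)]
```
]]></preformat>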
<p>The epigenetic feature is a binary value indicating whether the gene body is methylated or not. Body-methylated genes were identified by Takuno and Gaut (<xref ref-type="bibr" rid="B41">2012</xref>), and the status of body-methylation for 27,416 <italic>Arabidopsis</italic> genes was obtained from Lloyd et al. (<xref ref-type="bibr" rid="B26">2015</xref>).</p>
<p><bold>Step 3: Feature selection</bold>. Two feature selection methods, the Student&#x00027;s <italic>t</italic>-test and the chi-square test, were used to select numeric and binary features, respectively, that are capable of distinguishing positive from negative samples. The difference in the distribution of a given feature between positive and negative samples was deemed significant when the <italic>P</italic>-value was &#x0003C;0.05. In this way, a total of 766 statistically significant features were identified (Table <xref ref-type="supplementary-material" rid="SM3">S3</xref>).</p>
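<p>For the binary features, the chi-square filter of Step 3 can be sketched with the Python standard library alone, because the chi-square distribution with one degree of freedom has a closed-form tail probability. The sketch assumes a plain Pearson statistic on the 2 x 2 table without continuity correction; whether the original analysis applied a correction is not stated. The numeric features would need a <italic>t</italic>-distribution, which is not covered here. The function name is hypothetical.</p>
<preformat><![CDATA[
```python
import math


def chi_square_2x2(pos_values, neg_values):
    """Pearson chi-square test (1 d.f., no continuity correction) for a 0/1 feature.

    Returns (statistic, p_value).
    """
    a = sum(pos_values)            # positives with feature = 1
    b = len(pos_values) - a        # positives with feature = 0
    c = sum(neg_values)            # negatives with feature = 1
    d = len(neg_values) - c        # negatives with feature = 0
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0            # an empty margin: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / denom
    # Tail of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```
]]></preformat>
<p>A feature would be retained under the rule in the text when the returned <italic>P</italic>-value is below 0.05.</p>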
<p><bold>Step 4: Random forest-based prediction model construction</bold>. To implement the integrative random forest-based gene prioritization method RafSee, a prediction model was constructed using the random forest-based machine learning algorithm (Touw et al., <xref ref-type="bibr" rid="B42">2013</xref>). This algorithm generated hundreds of decision trees, each built using a subset of samples and features randomly selected from a user-input feature matrix (positive and negative samples for training in rows, 766 selected features in columns). Using the trained prediction model, RafSee ranked candidate genes based on their probability of being a true flowering-time gene, as estimated from the votes of all the trees. The random forest algorithm was implemented using the R package &#x0201C;randomForest&#x0201D; (version 4.6-12; <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/randomForest">https://cran.r-project.org/web/packages/randomForest</ext-link>). The number of decision trees was set to ntree &#x0003D; 500 (all other parameters used default values).</p>
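<p>The final ranking step can be illustrated independently of how the trees are grown. The Python sketch below assumes that each tree casts a 0/1 vote per gene and takes the fraction of positive votes as the probability score; the function name, the input layout, and the alphabetical tie-breaking are assumptions made for illustration and do not describe the R package.</p>
<preformat><![CDATA[
```python
def rank_by_votes(tree_votes):
    """Rank genes by the fraction of trees voting 'flowering-time gene'.

    tree_votes maps a gene ID to a list of 0/1 votes, one per tree.
    Returns (gene, score) pairs, best first; ties are broken by gene ID.
    """
    scores = {gene: sum(votes) / len(votes) for gene, votes in tree_votes.items()}
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return [(gene, scores[gene]) for gene in ordered]
```
]]></preformat>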
</sec>
<sec>
<title>Network-based gene prioritization</title>
<p>The network-based gene prioritization was performed using the functional association network AraNet v2 web server (<ext-link ext-link-type="uri" xlink:href="http://www.inetbio.org/aranet/">http://www.inetbio.org/aranet/</ext-link>), which was developed for identifying candidate genes of interest from <italic>Arabidopsis</italic> and 28 non-model plant species (Lee et al., <xref ref-type="bibr" rid="B22">2015b</xref>). The functional associations between gene pairs (links) in AraNet v2 are inferred using a Bayesian statistics framework that integrates 19 distinct types of data, including protein-protein interactions, co-expression, genomic context, domain co-occurrence, and phylogenetic profile similarity (Lee et al., <xref ref-type="bibr" rid="B22">2015b</xref>). The integration of diverse biological data greatly improves network coverage and accuracy. Currently, AraNet v2 consists of 895,000 co-functional links, covering 83.5% (22,894 out of 27,416) of all <italic>Arabidopsis</italic> protein-coding genes annotated in the TAIR10 database. The input of AraNet v2 is a set of seed genes of interest; the output is the rank of other genes in the network as determined by the co-functional prediction score of a Bayesian statistics framework.</p>
</sec>
<sec>
<title>Order statistics-based meta-analysis</title>
<p>In RAP, an order statistics algorithm, known as robust rank aggregation, was applied to aggregate the prediction results from different gene prioritization methods (i.e., AraNet v2 and RafSee). Robust rank aggregation is a meta-analysis algorithm that uses a rank-order statistic not only to take into account the positional information of input genes, but also to assign a significance score (<italic>P</italic>-value) to each gene within a theoretical model (Aerts et al., <xref ref-type="bibr" rid="B1">2006</xref>; Kolde et al., <xref ref-type="bibr" rid="B16">2012</xref>). Let <italic>M</italic> be the number of candidate genes, and let <bold><italic>R</italic></bold><sub><italic>i</italic></sub> &#x0003D; (<italic>r</italic><sub><italic>i</italic>, 1</sub>, <italic>r</italic><sub><italic>i</italic>, 2</sub>, &#x02026;, <italic>r</italic><sub><italic>i, n</italic></sub>) be the vector of ranks for candidate gene <italic>i</italic> from the different gene prioritization methods (here <italic>n</italic> &#x0003D; 2 for RafSee and AraNet v2). We first normalized gene ranks into percentiles <bold><italic>U</italic></bold><sub><italic>i</italic></sub> &#x0003D; (<italic>u</italic><sub><italic>i</italic>, 1</sub>, <italic>u</italic><sub><italic>i</italic>, 2</sub>, &#x02026;, <italic>u</italic><sub><italic>i, n</italic></sub>) with the formula: <italic>u</italic><sub><italic>i, j</italic></sub> &#x0003D; <italic>r</italic><sub><italic>i, j</italic></sub>/<italic>M</italic> (<italic>j</italic> &#x0003D; 1, 2, &#x02026;, <italic>n</italic>). 
The <italic>k</italic>th smallest percentile among <italic>u</italic><sub><italic>i</italic>, 1</sub>, <italic>u</italic><sub><italic>i</italic>, 2</sub>, &#x02026;, and <italic>u</italic><sub><italic>i, n</italic></sub> is an order statistic that follows a beta distribution <italic>B</italic>(<italic>k, n</italic> &#x0002B; 1 &#x02212; <italic>k</italic>), under the assumption that the percentiles are uniformly distributed from 0 to 1. Based on the beta distribution, we then assigned a <italic>P</italic>-value to each ordered percentile in <bold><italic>U</italic></bold><sub><italic>i</italic></sub>, indicating how much better it is ranked compared with a null model expecting random ordering. The significance score of candidate gene <italic>i</italic> is defined as the minimum of these <italic>P</italic>-values. The robust rank aggregation method was implemented using the R package &#x0201C;RobustRankAggreg&#x0201D; (version 1.1; <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/RobustRankAggreg">https://cran.r-project.org/web/packages/RobustRankAggreg</ext-link>; Kolde et al., <xref ref-type="bibr" rid="B16">2012</xref>).</p>
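<p>The scoring rule described in this section can be written out in a short Python sketch. It follows the text literally (percentile normalization, beta-distributed order statistics, minimum <italic>P</italic>-value) and is not a port of the R package, which may apply further corrections that are not reproduced here. For integer parameters, the beta cumulative distribution function equals a binomial tail sum, so no external library is needed. The function names are hypothetical.</p>
<preformat><![CDATA[
```python
from math import comb


def beta_cdf_int(x, k, n):
    """CDF at x of Beta(k, n + 1 - k): P(k-th smallest of n uniform values <= x).

    Equivalent to the probability that at least k of the n values fall below x.
    """
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_score(ranks, m):
    """Significance score of one gene from its ranks (1 = best) among m candidates."""
    u = sorted(r / m for r in ranks)          # normalized ranks, ascending
    n = len(u)
    # One P-value per order statistic; the gene's score is the smallest of them.
    return min(beta_cdf_int(u[k - 1], k, n) for k in range(1, n + 1))
```
]]></preformat>
<p>As a worked example with <italic>n</italic> &#x0003D; 2, a gene ranked 1st by one method and 50th by the other among <italic>M</italic> &#x0003D; 100 candidates has percentiles 0.01 and 0.5. The <italic>P</italic>-values are 1 &#x02212; 0.99<sup>2</sup> &#x0003D; 0.0199 for the smaller percentile and 0.5<sup>2</sup> &#x0003D; 0.25 for the larger one, so the score is 0.0199. Genes are then ordered by increasing score.</p>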
</sec>
<sec>
<title>Implementation of RAP method</title>
<p>The RAP method has been implemented as an R package, which provides functions for generating sequence-based features (AAC, PAAC, APAAC, and PCP) and for extracting informative features with feature selection methods such as Student&#x00027;s <italic>t</italic>-test and the chi-square test. Additionally, RAP provides functions to implement the integrative random forest-based gene prioritization method RafSee and to evaluate the prediction performance of gene prioritization methods with the cross-validation approach. To perform gene prioritization in <italic>Arabidopsis</italic>, the user is only required to provide a set of genes of interest and the network-based gene prioritization results from the AraNet v2 system. With these data, RAP first ranks undocumented genes using the automatically built random forest-based gene prioritization method RafSee, and then re-ranks them using the order statistics-based meta-analysis approach. The source code, sample data, and user manual of this R package are available at <ext-link ext-link-type="uri" xlink:href="http://bioinfo.nwafu.edu.cn/software">http://bioinfo.nwafu.edu.cn/software</ext-link>.</p>
</sec>
<sec>
<title>Performance evaluation using a cross-validation algorithm</title>
<p>Cross-validation is a widely used evaluation method in machine learning for assessing the performance of prediction models. To evaluate the predictive performance of RafSee in distinguishing positives and negatives, we used the 10-fold cross-validation algorithm and receiver operating characteristic (ROC) curve analysis. In a 10-fold cross-validation algorithm, positive and negative samples are randomly partitioned into 10 groups having an approximately equal number of genes; each group is successively used for testing the performance of RafSee trained with the other nine groups of positive and negative samples. For each round of cross-validation, the prediction accuracy of RafSee was assessed using ROC-curve analysis, which measures how the true positive rate (<italic>y</italic> axis) changes as a function of the false positive rate (<italic>x</italic> axis) at all possible thresholds. The area under the ROC curve (i.e., AUC) was used to quantitatively score the prediction accuracy of RafSee. An AUC value can range from 0 to 1; a higher AUC value indicates better prediction accuracy for RafSee. After testing with each of the 10 groups, the mean value of the 10 AUCs represented the overall performance of RafSee.</p>
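<p>The 10-fold procedure and the AUC score can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the placeholder model is a one-dimensional linear score rather than a random forest, the data are simulated, and all function names are hypothetical. The AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, which is equivalent to the area under the ROC curve.</p>
<preformat>
```python
import random


def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a random positive
    scores higher than a random negative (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def k_fold_indices(n_items, k, seed=0):
    """Randomly partition range(n_items) into k groups of near-equal size."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(positives, negatives, fit, k=10, seed=0):
    """Mean AUC over k folds; fit(train_pos, train_neg) returns a scoring function."""
    pos_folds = k_fold_indices(len(positives), k, seed)
    neg_folds = k_fold_indices(len(negatives), k, seed + 1)
    aucs = []
    for f in range(k):
        test_p, test_n = set(pos_folds[f]), set(neg_folds[f])
        train_pos = [x for i, x in enumerate(positives) if i not in test_p]
        train_neg = [x for i, x in enumerate(negatives) if i not in test_n]
        score = fit(train_pos, train_neg)
        aucs.append(auc([score(positives[i]) for i in test_p],
                        [score(negatives[i]) for i in test_n]))
    return sum(aucs) / k, aucs


def fit_mean_direction(train_pos, train_neg):
    """Placeholder model (not a random forest): a linear score oriented
    from the negative class mean towards the positive class mean."""
    mu_p = sum(train_pos) / len(train_pos)
    mu_n = sum(train_neg) / len(train_neg)
    return lambda x: (mu_p - mu_n) * x


# Simulated one-dimensional feature values for a toy data set.
rng = random.Random(1)
toy_pos = [rng.gauss(1.0, 1.0) for _ in range(50)]
toy_neg = [rng.gauss(-1.0, 1.0) for _ in range(200)]
mean_auc, fold_aucs = cross_validated_auc(toy_pos, toy_neg, fit_mean_direction)
```
</preformat>
<p>Positives and negatives are partitioned separately here so that every fold contains both classes; whether the original implementation stratifies the folds in this way is an assumption.</p>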
<p>The &#x0201C;leave-one-out&#x0201D; cross-validation technique was used to assess the prediction performance of different gene prioritization methods (i.e., AraNet v2, RafSee, and RAP) in ranking the flowering-time genes. In this technique, each flowering-time gene in the positive sample set was retained in turn as the testing sample, while the remaining positive samples were used as the seed genes for the three gene prioritization methods. The undocumented samples, negative samples, and the retained flowering-time gene were used as candidate genes for testing. A higher ranking of the retained flowering-time gene indicated a greater prediction accuracy of the gene prioritization method(s).</p>
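<p>The leave-one-out scheme can be sketched as follows. The helper <italic>prioritize</italic> stands in for any of the three methods, and the gene names, feature values, and the distance-based placeholder method are hypothetical and serve only to make the example runnable.</p>
<preformat>
```python
def leave_one_out_ranks(known_genes, candidate_pool, prioritize):
    """Rank of each known gene when it is hidden among the candidates.

    prioritize(seeds, candidates) must return the candidates ordered from
    most to least promising; it stands in for any prioritization method.
    """
    ranks = {}
    for held_out in known_genes:
        seeds = [g for g in known_genes if g != held_out]
        candidates = list(candidate_pool) + [held_out]
        ordering = prioritize(seeds, candidates)
        ranks[held_out] = ordering.index(held_out) + 1  # 1 = top rank
    return ranks


# Hypothetical one-dimensional feature per gene, for illustration only.
toy_feature = {"k1": 1.0, "k2": 1.4, "k3": -0.5,
               "c1": 5.0, "c2": 2.0, "c3": -3.0}


def nearest_to_seed_mean(seeds, candidates):
    """Placeholder method: order candidates by distance to the mean seed feature."""
    centre = sum(toy_feature[g] for g in seeds) / len(seeds)
    return sorted(candidates, key=lambda g: abs(toy_feature[g] - centre))


loo_ranks = leave_one_out_ranks(["k1", "k2", "k3"],
                                ["c1", "c2", "c3"],
                                nearest_to_seed_mean)
```
</preformat>
<p>The resulting ranks of the retained genes are the raw material for summary statistics such as the median or quartile ranks of a method.</p>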
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Sequence, evolutionary, and epigenetic characteristics of flowering-time genes</title>
<p>We first generated 1012 features for each protein sequence of 27,416 <italic>Arabidopsis</italic> genes (449 positive samples, 8503 negative samples, and 18,464 undocumented samples), and then identified 766 features that differed between the positive and negative samples at a significance level of 0.05 (Table <xref ref-type="supplementary-material" rid="SM3">S3</xref>). Among these 766 features there were 255 AAC-related features, including the occurrence frequency of 18 amino acids and 237 amino acid pairs (Figures <xref ref-type="fig" rid="F2">2A,B</xref>). Besides the occurrence frequency of amino acids, we also noticed that the order of amino acids in flowering-time genes was not completely random. For example, seven of 20 amino acid pairs starting with histidine (H) were significantly different in their occurrence frequency between positive and negative samples, while five of 20 amino acid pairs ending with histidine (H) showed significant differences (Figure <xref ref-type="fig" rid="F2">2B</xref>).</p>
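<p>As an illustration of the AAC-related features discussed above, the following Python sketch computes the occurrence frequency of the 20 amino acids and of the 400 ordered pairs of adjacent amino acids for a single sequence. The exact normalization used in the RAP package is not restated here, so the definitions below are assumptions, and the function names and the toy peptide are hypothetical.</p>
<preformat>
```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Occurrence frequency of each of the 20 amino acids in a protein sequence."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    total = max(len(residues), 1)  # avoid division by zero for empty input
    return {a: residues.count(a) / total for a in AMINO_ACIDS}


def pair_composition(seq):
    """Occurrence frequency of each of the 400 ordered adjacent amino acid pairs.

    Pairs containing a non-standard residue are skipped.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for a, b in zip(seq, seq[1:]):
        if a + b in counts:
            counts[a + b] += 1
    total = max(sum(counts.values()), 1)
    return {pair: c / total for pair, c in counts.items()}


toy_seq = "MHHKLHAH"  # hypothetical peptide, not a real Arabidopsis protein
aac = amino_acid_composition(toy_seq)
pairs = pair_composition(toy_seq)

# Pairs that start or end with histidine, as examined in Figure 2B.
starts_with_h = {p: f for p, f in pairs.items() if p[0] == "H" and f > 0}
ends_with_h = {p: f for p, f in pairs.items() if p[1] == "H" and f > 0}
```
</preformat>
<p>Computed for every gene, such frequencies give 20 single-residue and 400 pair features, each of which can then be compared between the positive and negative sample sets.</p>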
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Distribution of sequence, evolutionary, and epigenetic features in the positive and negative sample sets. (A)</bold> Boxplot distributions for the occurrence frequency of 20 amino acids in the positive and negative sample sets. Asterisks (<sup>&#x0002A;</sup>) indicate that the differences between positive and negative samples are statistically significant at the level of 0.05. <bold>(B)</bold> Differences in the occurrence frequency of 400 amino acid pairs between positive and negative samples. &#x0201C;Sig&#x0201D; represents a significant difference and &#x0201C;NS&#x0201D; represents a non-significant difference at the level of 0.05. <bold>(C)</bold> Density distributions of the median percentage of identity of positive and negative samples to the top BLASTP matches in 34 plant species. <bold>(D)</bold> Percentage of positive and negative samples that have a paralog derived from &#x003B1; and &#x003B2;&#x003B3; whole genome duplicates. <bold>(E)</bold> Percentage of genes with methylation in the positive and negative sample sets.</p></caption>
<graphic xlink:href="fpls-07-01914-g0002.tif"/>
</fig>
<p>We also detected significant differences in the hydrophilicity and hydrophobicity patterns of protein sequences, corresponding to six APAAC-related features: the third-order correlation factor in terms of the hydrophilicity of amino acids, and the first- through fifth-order correlation factors in terms of the hydrophobicity of amino acids.</p>
<p>The PCPs are a group of essential features for characterizing physicochemical properties of protein sequences. Because of their interpretability, PCPs have been widely used in the prediction of protein structure, functional sites, and biological functions (Mallick et al., <xref ref-type="bibr" rid="B28">2007</xref>; Li et al., <xref ref-type="bibr" rid="B23">2013</xref>; Chaudhary et al., <xref ref-type="bibr" rid="B6">2015</xref>). Here, 462 out of 533 PCP-related features were significantly different between positive and negative samples (Table <xref ref-type="supplementary-material" rid="SM3">S3</xref>). Among the top 10 PCP-related features ranked by level of statistical significance, five were related to the hydrophobicity of amino acids as calculated with different measures (top 3, 4, 6&#x02013;8; Table <xref ref-type="table" rid="T1">1</xref>). Another three of the top 10 PCP-related features were energy-related: the free energy of transfer of amino acids from organic solvent to water (top 1; Nozaki and Tanford, <xref ref-type="bibr" rid="B32">1971</xref>), the contribution of amino acids to the stability of proteins (top 2; Zhou and Zhou, <xref ref-type="bibr" rid="B48">2004</xref>), and the energy required to transfer amino acid side chains from water to less polar environments (top 9; Guy, <xref ref-type="bibr" rid="B12">1985</xref>). The remaining two PCP-related features were the retention coefficients of amino acids in NaH<sub>2</sub>PO<sub>4</sub> and NaClO<sub>4</sub> (top 5, 10; Meek and Rossetti, <xref ref-type="bibr" rid="B29">1981</xref>).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>List of the top 10 PCP-related features</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Rank</bold></th>
<th valign="top" align="left"><bold>AA index ID</bold></th>
<th valign="top" align="left"><bold>Description</bold></th>
<th valign="top" align="center"><bold><italic>P</italic>-value</bold></th>
<th valign="top" align="left"><bold>References</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">NOZY710101</td>
<td valign="top" align="left">Transfer energy, organic solvent/water</td>
<td valign="top" align="center">1.16E-86</td>
<td valign="top" align="left">Nozaki and Tanford, <xref ref-type="bibr" rid="B32">1971</xref></td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">ZHOH040101</td>
<td valign="top" align="left">The stability scale from the knowledge-based atom-atom potential</td>
<td valign="top" align="center">7.16E-86</td>
<td valign="top" align="left">Zhou and Zhou, <xref ref-type="bibr" rid="B48">2004</xref></td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">SWER830101</td>
<td valign="top" align="left">Optimal matching hydrophobicity</td>
<td valign="top" align="center">5.04E-83</td>
<td valign="top" align="left">Sweet and Eisenberg, <xref ref-type="bibr" rid="B40">1983</xref></td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">CORJ870102</td>
<td valign="top" align="left">SWEIG index</td>
<td valign="top" align="center">6.67E-83</td>
<td valign="top" align="left">Cornette et al., <xref ref-type="bibr" rid="B11">1987</xref></td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">MEEJ810102</td>
<td valign="top" align="left">Retention coefficient in NaH<sub>2</sub>PO<sub>4</sub></td>
<td valign="top" align="center">9.77E-82</td>
<td valign="top" align="left">Meek and Rossetti, <xref ref-type="bibr" rid="B29">1981</xref></td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left">CIDH920104</td>
<td valign="top" align="left">Normalized hydrophobicity scales for alpha/beta-proteins</td>
<td valign="top" align="center">5.62E-81</td>
<td valign="top" align="left">Cid et al., <xref ref-type="bibr" rid="B9">1992</xref></td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="left">CIDH920103</td>
<td valign="top" align="left">Normalized hydrophobicity scales for alpha &#x0002B; beta-proteins</td>
<td valign="top" align="center">5.74E-80</td>
<td valign="top" align="left">Cid et al., <xref ref-type="bibr" rid="B9">1992</xref></td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="left">CIDH920105</td>
<td valign="top" align="left">Normalized average hydrophobicity scales</td>
<td valign="top" align="center">8.34E-80</td>
<td valign="top" align="left">Cid et al., <xref ref-type="bibr" rid="B9">1992</xref></td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="left">GUYH850102</td>
<td valign="top" align="left">Apparent partition energies calculated from Wertz-Scheraga index</td>
<td valign="top" align="center">1.82E-79</td>
<td valign="top" align="left">Guy, <xref ref-type="bibr" rid="B12">1985</xref></td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="left">MEEJ810101</td>
<td valign="top" align="left">Retention coefficient in NaClO<sub>4</sub></td>
<td valign="top" align="center">2.20E-79</td>
<td valign="top" align="left">Meek and Rossetti, <xref ref-type="bibr" rid="B29">1981</xref></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Altered flowering time has been suggested as an evolutionary strategy adopted by plants to quickly adapt to different environments (Kazan and Lyons, <xref ref-type="bibr" rid="B15">2016</xref>). Therefore, we suspect that differences may exist in the evolutionary patterns between positive and negative samples. The SC measures the identity of an <italic>Arabidopsis</italic> protein against protein sequences from 34 other plant species (see Section Materials and Methods). As shown in Figure <xref ref-type="fig" rid="F2">2C</xref>, flowering-time genes have, on average, a 65% shared identity with those of the other 34 species, while the negative samples have just about a 40% identity.</p>
<p>During the evolutionary process, the <italic>Arabidopsis</italic> genome has experienced at least two ancient whole-genome duplication (WGD) events (&#x003B1; WGD and &#x003B2;&#x003B3; WGD; Yun et al., <xref ref-type="bibr" rid="B46">2012</xref>). With the sequenced genome, 6830 and 2896 <italic>Arabidopsis</italic> genes with paralogs derived from &#x003B1; and &#x003B2;&#x003B3; WGD events were identified, respectively (Yun et al., <xref ref-type="bibr" rid="B46">2012</xref>). We found that over 15 and 30% of flowering-time genes have a paralog derived from the &#x003B1; and &#x003B2;&#x003B3; WGD events, respectively (Figure <xref ref-type="fig" rid="F2">2D</xref>). In contrast, &#x0003C;10 and 20% of the negative samples have a paralog derived from these two WGD events, respectively.</p>
<p>Following the hypothesis that body-methylated genes would be more functionally important than non-methylated genes (Coleman-Derr and Zilberman, <xref ref-type="bibr" rid="B10">2012</xref>), we examined the percentage of body-methylated genes in the positive and negative sample sets. We found that 27.17% (122/449) of the flowering-time genes were body-methylated, whereas only 8.49% (722/8503) of the genes in the negative sample set were body-methylated (Figure <xref ref-type="fig" rid="F2">2E</xref>). This result supports the view that changes in the epigenome are important in regulating the flowering time of plants (Yaish et al., <xref ref-type="bibr" rid="B45">2011</xref>).</p>
</sec>
<sec>
<title>Performance evaluation of RafSee in distinguishing positives and negatives</title>
<p>Using 766 statistically significant features with <italic>P</italic> &#x0003C; 0.05, we built a novel integrative random forest-based gene prioritization method named RafSee, the prediction performance of which was evaluated with 10-fold cross validation and ROC analysis. Figure <xref ref-type="fig" rid="F3">3A</xref> shows the ROC curves of RafSee trained with 766 features, while Figure <xref ref-type="fig" rid="F3">3B</xref> presents the distribution of 10 AUC values generated from the 10-fold cross validation for RafSee trained with different sets of statistically significant features. We found that RafSee trained with 461 PCP-related statistically significant features had a mean AUC value of 0.84 (Figure <xref ref-type="fig" rid="F3">3B</xref>). In contrast, RafSee trained with 26 APAAC-, 20 PAAC-, or 255 AAC-related statistically significant features could more accurately distinguish positive and negative samples, as suggested by a higher mean AUC value of &#x0007E;0.87 (Figure <xref ref-type="fig" rid="F3">3B</xref>). The mean AUC value reached 0.89 when all these statistically significant features extracted from protein sequences were considered (Figure <xref ref-type="fig" rid="F3">3A</xref>). The mean AUC value was further improved from 0.89 to 0.91 by integrating these protein sequence-based features with four additional features (one SC-related feature, two WGD-related features, and one methylation-related feature; Figure <xref ref-type="fig" rid="F3">3A</xref>).</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Performance of RafSee in distinguishing positives and negatives using 10-fold cross validation. (A)</bold> The ROC curves of 10-fold cross validation for RafSee trained with 766 statistically significant features. The dashed curves denote the ROC curves from the testing dataset in each round of 10-fold cross-validation. The solid curves represent the average curve of the 10 ROC curves. <bold>(B)</bold> Boxplot distribution of 10 AUC values of the 10-fold cross validation for RafSee trained with different sets of features. APAAC, PAAC, AAC, and PCP indicate the 26 APAAC-, 20 PAAC-, 255 AAC-, and 461 PCP-related statistically significant features extracted from protein sequences, respectively.</p></caption>
<graphic xlink:href="fpls-07-01914-g0003.tif"/>
</fig>
<p>Taken together, these results suggest that RafSee significantly outperformed the random selection (i.e., AUC &#x0003D; 0.5) in the identification of flowering-time genes.</p>
</sec>
<sec>
<title>Performance comparison of AraNet v2, RafSee, and RAP in the prioritization of flowering-time genes</title>
<p>The &#x0201C;leave-one-out&#x0201D; cross-validation experiment was first employed to evaluate the performance of the network-based gene prioritization system (AraNet v2). We found that 380 of 449 (84.63%) flowering-time genes can be prioritized by the AraNet v2 system (Table <xref ref-type="supplementary-material" rid="SM4">S4</xref>). For a fair comparison, these 380 flowering-time genes were also used to perform the &#x0201C;leave-one-out&#x0201D; cross-validation experiment for the other two gene prioritization methods (RafSee and RAP). We observed that genes tend to be ranked higher by AraNet v2 when they are connected to more known flowering-time genes in the network (Figure <xref ref-type="fig" rid="F4">4A</xref>). However, this trend was not observed for RafSee (Figure <xref ref-type="fig" rid="F4">4B</xref>). This is expected, as the AraNet v2 system uses edge-based network properties for gene prioritization, whereas RafSee does not. The agreement between the ranks of these 380 flowering-time genes prioritized by AraNet v2 and RafSee was very low, with a Spearman correlation coefficient of 0.31 (Figure <xref ref-type="fig" rid="F4">4C</xref>; Table <xref ref-type="supplementary-material" rid="SM4">S4</xref>), and 33.94% (129/380) of the genes prioritized by RafSee had a higher rank than that given by AraNet v2 (Figure <xref ref-type="fig" rid="F4">4C</xref>).</p>
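<p>The agreement measure used above can be reproduced with a short Python sketch. It computes the Spearman coefficient as the Pearson correlation of rank vectors (ties share their average position) and the fraction of genes ranked better by one method than by the other. The function names and the five toy rank pairs are hypothetical and are not taken from Table S4.</p>
<preformat>
```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def fraction_ranked_better(ranks_a, ranks_b):
    """Fraction of genes for which method B gives a better (smaller) rank than A."""
    better = sum(1 for a, b in zip(ranks_a, ranks_b) if a > b)
    return better / len(ranks_a)


# Hypothetical ranks of five genes under two methods.
toy_a = [1, 2, 3, 4, 5]
toy_b = [2, 1, 4, 3, 5]
rho = spearman(toy_a, toy_b)
share_better_in_b = fraction_ranked_better(toy_a, toy_b)
```
</preformat>
<p>A low coefficient combined with a substantial fraction of genes ranked better by the second method is the pattern that motivates aggregating the two rankings.</p>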
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Performance of three different gene prioritization methods for identifying flowering-time genes. (A)</bold> Relationships between gene rank and their connectivity with known flowering-time genes for AraNet v2. <bold>(B)</bold> Relationships between gene rank and their connectivity with known flowering-time genes for RafSee. <bold>(C)</bold> Pairwise comparison between gene ranks predicted by AraNet v2 and RafSee. Each symbol denotes a flowering-time gene, and its coordinates represent the ranks assigned by the corresponding two gene prioritization methods. The dashed diagonal line denotes a 1:1 correspondence. <bold>(D)</bold> Pairwise comparison between gene ranks predicted by AraNet v2 and RAP.</p></caption>
<graphic xlink:href="fpls-07-01914-g0004.tif"/>
</fig>
<p>These results indicate that the integrative random forest-based gene prioritization method (RafSee) could be used as a complement to the network-based gene prioritization method (AraNet v2). As such, it provided an opportunity for us to present a novel integrative approach (RAP) for improving gene prioritization by aggregating gene ranks produced by these two different gene prioritization methods. We found that RAP improved the rank of 53.68% (204 of 380) of the flowering-time genes (Figure <xref ref-type="fig" rid="F4">4D</xref>). We further evaluated the performance of the three gene prioritization methods using different ranking statistics: namely the minimum, first quartile, median, third quartile, and maximum rank (Table <xref ref-type="table" rid="T2">2</xref>). For all these statistics, AraNet v2 ranked flowering-time genes higher than RafSee did. However, by exploiting the complementarity between these two gene prioritization methods, RAP obtained the best results for all these ranking statistics (except the maximum rank). For example, RAP obtained a first quartile rank of 90.5, whereas AraNet v2 and RafSee had corresponding values of 149.5 and 415.5, respectively. We note that feature selection is an important factor affecting the performance of RafSee and RAP. For example, using the stricter feature selection criterion of <italic>P</italic> &#x0003C; 0.01, RafSee and RAP showed a slightly decreased performance, corresponding to first quartile ranks of 419.75 and 116.5, respectively (Table <xref ref-type="supplementary-material" rid="SM5">S5</xref>). Even so, RAP still obtained the best results for the minimum, first quartile, and third quartile rank.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Performance statistics for ranking flowering-time genes using different gene prioritization methods</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Methods</bold></th>
<th valign="top" align="center"><bold>Minimum</bold></th>
<th valign="top" align="center"><bold>First quartile</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Third quartile</bold></th>
<th valign="top" align="center"><bold>Maximum</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">RafSee</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">415.5</td>
<td valign="top" align="center">1908.5</td>
<td valign="top" align="center">5419.5</td>
<td valign="top" align="center">18678</td>
</tr>
<tr>
<td valign="top" align="left">AraNet v2</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center">149.5</td>
<td valign="top" align="center">830</td>
<td valign="top" align="center">3019.25</td>
<td valign="top" align="center"><bold>9817</bold></td>
</tr>
<tr>
<td valign="top" align="left">RAP</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center"><bold>90.5</bold></td>
<td valign="top" align="center"><bold>743</bold></td>
<td valign="top" align="center"><bold>2508.25</bold></td>
<td valign="top" align="center">12099</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Bold denotes the best method for the corresponding ranking criteria</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>These results demonstrate that the integrative analysis further improved the performance of single gene prioritization methods (i.e., AraNet v2 and RafSee).</p>
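<p>The ranking statistics reported in Table 2 are simple summaries of the leave-one-out ranks. The Python sketch below shows one way to compute them, together with the relative reduction of a rank statistic. The quantile definition (linear interpolation, the &#x0201C;inclusive&#x0201D; method of the Python standard library) is an assumption, since the text does not state which definition was used, and the function names and toy ranks are hypothetical.</p>
<preformat>
```python
from statistics import quantiles


def rank_summary(ranks):
    """Minimum, first quartile, median, third quartile, and maximum of a rank list."""
    q1, med, q3 = quantiles(ranks, n=4, method="inclusive")
    return {"min": min(ranks), "q1": q1, "median": med, "q3": q3, "max": max(ranks)}


def relative_improvement(baseline, improved):
    """Percentage reduction of a rank statistic (lower ranks are better)."""
    return 100.0 * (baseline - improved) / baseline


toy_summary = rank_summary([1, 3, 5, 7, 9])  # hypothetical ranks
q1_gain = relative_improvement(149.5, 90.5)  # first quartile ranks from Table 2
```
</preformat>
<p>Applied to the first quartile ranks of AraNet v2 (149.5) and RAP (90.5) in Table 2, the second function gives a reduction of about 39.46%.</p>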
</sec>
<sec>
<title>Validation of the RAP-based gene prioritization with network analysis and evidence from the literature</title>
<p>With the input of 449 flowering-time genes, RAP was applied to rank the remaining 26,968 genes annotated in the TAIR10 database (Table <xref ref-type="supplementary-material" rid="SM5">S5</xref>). Further, network analysis revealed that the top 20 ranked genes connect with 150 known flowering-time genes in the AraNet v2 system, resulting in the generation of a hierarchical network that contains three modules and 418 functional associations (Figure <xref ref-type="fig" rid="F5">5</xref>; Table <xref ref-type="supplementary-material" rid="SM6">S6</xref>). This result indicates that the top 20 candidates identified by RAP might be functionally associated with flowering time in <italic>Arabidopsis</italic>. To validate the new candidate genes identified by the RAP method, we performed a linkage disequilibrium analysis of a flowering-time-related genome-wide association study dataset (Atwell et al., <xref ref-type="bibr" rid="B3">2010</xref>) using the TASSEL software (<ext-link ext-link-type="uri" xlink:href="http://www.maizegenetics.net/tassel">http://www.maizegenetics.net/tassel</ext-link>). The linkage disequilibrium plots also showed potential functional associations of the top 20 ranked genes with flowering time in <italic>Arabidopsis</italic> (Figure <xref ref-type="supplementary-material" rid="SM10">S1</xref>). In addition, through a literature review, we found that nine of the top 20 candidates (AT2G25170, AT2G23760, AT1G21700, AT1G19220, AT4G36870, AT4G38130, AT1G28420, AT5G18620, and AT1G48410) have recently been demonstrated in phenotype experiments to have roles in the control of flowering time (Table <xref ref-type="supplementary-material" rid="SM7">S7</xref>). From these results, we conclude that RAP is likely to be reliable and effective for prioritizing large numbers of candidate genes in <italic>Arabidopsis</italic>.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>A hierarchical network of functional associations between the top 20 ranked genes and 449 known flowering-time genes</bold>.</p></caption>
<graphic xlink:href="fpls-07-01914-g0005.tif"/>
</fig>
</sec>
<sec>
<title>Evaluating the accuracy of RafSee, AraNet v2, and RAP with stress-related genes</title>
<p>We further performed the &#x0201C;leave-one-out&#x0201D; cross-validation experiment to evaluate the performance of AraNet v2, RafSee, and RAP in prioritizing <italic>Arabidopsis</italic> genes related to different abiotic stresses (salt, water, cold, and temperature). Stress-related genes were obtained from the Gene Ontology (GO) database (<ext-link ext-link-type="uri" xlink:href="http://geneontology.org">http://geneontology.org</ext-link>) by extracting terms (i.e., response to salt/water/cold/temperature) annotated with the experimental evidence codes IDA (inferred from direct assay), IEP (inferred from expression pattern), IGI (inferred from genetic interaction), IPI (inferred from physical interaction), and/or IMP (inferred from mutant phenotype). The four positive sample sets contained 388, 373, 289, and 238 genes that were mostly experimentally validated to be related to salt, temperature, cold, and water stresses, respectively.</p>
<p>Table <xref ref-type="table" rid="T3">3</xref> lists the evaluation results of AraNet v2, RafSee, and RAP in terms of five ranking criteria (the minimum, first quartile, median, third quartile, and maximum rank). In the prioritization of salt- and temperature-related genes, RAP performed at least as well as RafSee and AraNet v2 for all these ranking statistics (except the maximum rank). In the prioritization of cold-related genes, RafSee had the best result only for the first quartile rank, AraNet v2 had the best results for the third quartile and maximum rank, and RAP had the best results for the minimum and median rank. Tests on the water stress-related gene set showed that RAP outperformed RafSee and AraNet v2 in terms of both the first quartile and median rank.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>Performance statistics for ranking stress-responsive genes using different gene prioritization methods</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Stress-responsive genes</bold></th>
<th valign="top" align="left"><bold>Methods</bold></th>
<th valign="top" align="center"><bold>Minimum</bold></th>
<th valign="top" align="center"><bold>First quartile</bold></th>
<th valign="top" align="center"><bold>Median</bold></th>
<th valign="top" align="center"><bold>Third quartile</bold></th>
<th valign="top" align="center"><bold>Maximum</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Response to salt (388 genes)</td>
<td valign="top" align="left">RafSee</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">1035.5</td>
<td valign="top" align="center">8928.75</td>
<td valign="top" align="center">8928.75</td>
<td valign="top" align="center">19202</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">AraNet v2</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">1088</td>
<td valign="top" align="center">6131.5</td>
<td valign="top" align="center">6131.5</td>
<td valign="top" align="center"><bold>11539</bold></td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">RAP</td>
<td valign="top" align="center"><bold>4</bold></td>
<td valign="top" align="center"><bold>1011.5</bold></td>
<td valign="top" align="center"><bold>6103</bold></td>
<td valign="top" align="center"><bold>6103</bold></td>
<td valign="top" align="center">13130</td>
</tr>
<tr>
<td valign="top" align="left">Response to temperature (373 genes)</td>
<td valign="top" align="left">RafSee</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center">1270</td>
<td valign="top" align="center">3520</td>
<td valign="top" align="center">8623</td>
<td valign="top" align="center">19550</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">AraNet v2</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center">1025</td>
<td valign="top" align="center">2668</td>
<td valign="top" align="center">5810</td>
<td valign="top" align="center"><bold>11319</bold></td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">RAP</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center"><bold>933</bold></td>
<td valign="top" align="center"><bold>2532</bold></td>
<td valign="top" align="center"><bold>5667</bold></td>
<td valign="top" align="center">13029</td>
</tr>
<tr>
<td valign="top" align="left">Response to cold (289 genes)</td>
<td valign="top" align="left">RafSee</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center"><bold>870</bold></td>
<td valign="top" align="center">3423</td>
<td valign="top" align="center">9165</td>
<td valign="top" align="center">21571</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">AraNet v2</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">1282</td>
<td valign="top" align="center">2832</td>
<td valign="top" align="center"><bold>5217</bold></td>
<td valign="top" align="center"><bold>10416</bold></td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">RAP</td>
<td valign="top" align="center"><bold>4</bold></td>
<td valign="top" align="center">916</td>
<td valign="top" align="center"><bold>2414</bold></td>
<td valign="top" align="center">5542</td>
<td valign="top" align="center">12355</td>
</tr>
<tr>
<td valign="top" align="left">Response to water (238 genes)</td>
<td valign="top" align="left">RafSee</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center">1026.25</td>
<td valign="top" align="center">3712.5</td>
<td valign="top" align="center">8260.25</td>
<td valign="top" align="center">22045</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">AraNet v2</td>
<td valign="top" align="center">18</td>
<td valign="top" align="center">754</td>
<td valign="top" align="center">2155</td>
<td valign="top" align="center"><bold>4569</bold></td>
<td valign="top" align="center"><bold>9908</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">RAP</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center"><bold>626.25</bold></td>
<td valign="top" align="center"><bold>1783</bold></td>
<td valign="top" align="center">5070.25</td>
<td valign="top" align="center">12007</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Bold denotes the best method for the corresponding ranking criteria</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>The number of available genome sequences and gene networks is steadily increasing in the field of plant biology. Network-based gene prioritization approaches has been widely applied to identify new genes involved in biological processes of interests (Li et al., <xref ref-type="bibr" rid="B24">2015</xref>), such as abiotic stress responses (Ma et al., <xref ref-type="bibr" rid="B27">2014</xref>; Sircar and Parekh, <xref ref-type="bibr" rid="B39">2015</xref>), secondary wall formation (Ruprecht et al., <xref ref-type="bibr" rid="B36">2011</xref>), glucosinolate secondary metabolism (Chan et al., <xref ref-type="bibr" rid="B5">2011</xref>), and plant growth (Sabaghian et al., <xref ref-type="bibr" rid="B37">2015</xref>). In this study, we presented an integrative random forest method called RafSee and a meta-analysis based approach called RAP to prioritize genes from a large set of candidates. We validated the predictive power through the &#x0201C;leave-one-out&#x0201D; cross-validation approach in five different case studies including flowering time and four stress-related studies (salt, cold, water, and temperature). All these studies showed that RafSee can be used as a complement to a current state-of-art network-based gene prioritization system (AraNet v2). Moreover, RAP can be used to improve the performance of the network-based gene prioritization system. We anticipate that RAP will accelerate the discovery of genes involved in many biological processes and plant traits of interest.</p>
<p>RAP has several inherent advantages compared with network-based gene prioritization methods. First, instead of using edge-based network properties, RafSee builds gene prioritization models using features extracted from protein sequences, evolutionary conservation, and epigenetic methylation marks. This allowed RafSee to rank 69 flowering-time genes that could not be ranked by the AraNet v2 system (Table <xref ref-type="supplementary-material" rid="SM4">S4</xref>). Second, the order statistics-based meta-analysis approach can effectively aggregate the ranks produced by RafSee and by the network-based gene prioritization system AraNet v2. When prioritizing flowering-time genes, RAP improved the first quartile rank obtained with AraNet v2 from 149.5 to 90.5, a 39.46% improvement. Finally, the RAP method has been implemented as an R package, providing a flexible framework for aggregating gene prioritizations from different types of biological networks. Besides functional association networks (e.g., AraNet v2 and STRING), co-expression networks, which capture the functional relationships between genes solely from gene expression datasets, can also be integrated in RAP for gene functional analysis in several crop species, including maize, rice, soybean and wheat (Mutwil et al., <xref ref-type="bibr" rid="B31">2011</xref>; Aoki et al., <xref ref-type="bibr" rid="B2">2016</xref>; Ruprecht et al., <xref ref-type="bibr" rid="B35">2016</xref>; Serin et al., <xref ref-type="bibr" rid="B38">2016</xref>).</p>
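<p>As an illustration only, the following minimal Python sketch shows one way an order statistics-based rank aggregation of two rankings could be computed, in the style of the robust rank aggregation of Kolde et al. (<xref ref-type="bibr" rid="B16">2012</xref>), together with the arithmetic behind the 39.46% figure. It is not the RAP R package. The function names (<italic>rho_score</italic>, <italic>aggregate</italic>, <italic>relative_improvement</italic>), the toy gene names, and the toy ranks are hypothetical, and assigning the worst possible rank to a gene that one method cannot rank is an assumption made here for simplicity.</p>
<preformat>
```python
from math import comb


def rho_score(norm_ranks):
    """Smallest order-statistic probability for one gene.

    norm_ranks: the gene's ranks divided by the number of candidates,
    one value per prioritization method, each in the interval (0, 1].
    """
    r = sorted(norm_ranks)
    n = len(r)
    p_values = []
    for k in range(1, n + 1):
        x = r[k - 1]
        # Probability that the k-th smallest of n uniform ranks is at most x,
        # i.e. that a Binomial(n, x) variable is at least k.
        p = sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))
        p_values.append(p)
    return min(p_values)


def aggregate(rank_lists, n_candidates):
    """Order genes by rho score (smaller is better), ties broken by name.

    rank_lists: list of dicts mapping gene -> rank (1 = best).
    Assumption: a gene missing from a list receives the worst rank.
    """
    genes = sorted(set().union(*rank_lists))
    scores = {
        g: rho_score([ranks.get(g, n_candidates) / n_candidates for ranks in rank_lists])
        for g in genes
    }
    return sorted(genes, key=lambda g: (scores[g], g))


def relative_improvement(before, after):
    """Percentage reduction of a rank statistic (lower ranks are better)."""
    return 100.0 * (before - after) / before


# Hypothetical toy rankings of four candidates by two methods.
rafsee = {"geneA": 1, "geneB": 3, "geneC": 2, "geneD": 4}
aranet = {"geneA": 2, "geneB": 1, "geneC": 4}  # geneD is not ranked here
n_candidates = 4

print(aggregate([rafsee, aranet], n_candidates))
print(round(relative_improvement(149.5, 90.5), 2))
```
</preformat>
<p>With these toy inputs the aggregated order is geneA, geneB, geneC, geneD, and the second line reproduces the 39.46% improvement in the first quartile rank (from 149.5 to 90.5) reported above.</p>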
<p>Nonetheless, we are aware of several limitations of our proposed method. First, feature selection is applied to select a set of informative features, and the selection criteria may affect the performance of RafSee and RAP (Table <xref ref-type="table" rid="T2">2</xref>, Table <xref ref-type="supplementary-material" rid="SM5">S5</xref>). Second, the power of RafSee and RAP is affected by the number of seed genes. We performed 140 simulation experiments to examine the performance of three gene prioritization methods trained with varying numbers of seed genes (350, 300, 250, 200, 150, 100, and 50 randomly selected flowering genes; 10 replications per gene size). We found that RAP improved the performance of AraNet v2 in terms of the third quartile rank in the majority of simulation experiments (73%; 73/100) when the number of seed genes was equal to or higher than 150. However, for the same criterion, RAP improved the performance of AraNet v2 in only 35% (14/40) of simulation experiments when the number of seed genes was &#x0003C;150 (Supplementary Data <xref ref-type="supplementary-material" rid="SM9">2</xref>).</p>
<p>In the future, we plan to investigate the effectiveness of RAP in gene prioritization using different biological networks and machine learning algorithms. We also plan to extend the application of RAP from model species to crop species.</p>
</sec>
<sec id="s5">
<title>Author contributions</title>
<p>Designed the experiments: JZ, CM. Performed the experiments: JZ, HY, and LW. Analyzed the data: JZ, YT, CM, and HS. Wrote the paper: CM, JZ, and YT. All authors read and approved the final manuscript.</p>
</sec>
<sec id="s6">
<title>Funding</title>
<p>This work was supported by the National Natural Science Foundation of China (31570371), the Agricultural Science and Technology Innovation and Research Project of Shaanxi Province, China (2015NY011), and the Fund of Northwest A&#x00026;F University.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<sec sec-type="supplementary-material" id="s7">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://journal.frontiersin.org/article/10.3389/fpls.2016.01914/full#supplementary-material">http://journal.frontiersin.org/article/10.3389/fpls.2016.01914/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table1.XLSX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S1</label>
<caption><p><bold>The 449 known flowering time-related genes</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table2.XLSX" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S2</label>
<caption><p><bold>The 533 physicochemical properties of 20 amino acids</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table3.XLSX" id="SM3" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S3</label>
<caption><p><bold>Significance estimation of 1012 features generated with different encoding schemes</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table4.XLSX" id="SM4" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S4</label>
<caption><p><bold>The &#x0201C;leave-one-out&#x0201D; cross-validation results of collected flowering-time genes using different gene prioritization methods</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table5.XLSX" id="SM5" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S5</label>
<caption><p><bold>Performance statistics for ranking 380 flowering-time genes using different gene prioritization methods with fairly strict feature selection criteria of <italic><bold>P</bold></italic> &#x0003C; 0.01</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table6.xlsx" id="SM6" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S6</label>
<caption><p><bold>Functional associations between top 20 ranked genes and known flowering-time genes</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table7.XLSX" id="SM7" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S7</label>
<caption><p><bold>Interpretation of the top 20 ranked genes related to flowering time in <italic><bold>Arabidopsis</bold></italic>. The top 20 genes identified by RAP, and their potential roles in flowering time from a survey of the literature</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="DataSheet1.DOCX" id="SM8" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Data 1</label>
<caption><p><bold>Definition of pseudo amino acid composition (PAAC) and amphiphilic pseudo amino acid composition (APAAC)</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="DataSheet2.XLSX" id="SM9" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Data 2</label>
<caption><p><bold>Performance of three gene prioritization methods affected by the size of seed genes</bold>.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Image1.PDF" id="SM10" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S1</label>
<caption><p><bold>Linkage disequilibrium plots for top 20 candidate genes</bold>.</p></caption>
</supplementary-material>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aerts</surname> <given-names>S.</given-names></name> <name><surname>Lambrechts</surname> <given-names>D.</given-names></name> <name><surname>Maity</surname> <given-names>S.</given-names></name> <name><surname>Van Loo</surname> <given-names>P.</given-names></name> <name><surname>Coessens</surname> <given-names>B.</given-names></name> <name><surname>De Smet</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Gene prioritization through genomic data fusion</article-title>. <source>Nat. Biotechnol.</source> <volume>24</volume>, <fpage>537</fpage>&#x02013;<lpage>544</lpage>. <pub-id pub-id-type="doi">10.1038/nbt1203</pub-id><pub-id pub-id-type="pmid">16680138</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aoki</surname> <given-names>Y.</given-names></name> <name><surname>Okamura</surname> <given-names>Y.</given-names></name> <name><surname>Tadaka</surname> <given-names>S.</given-names></name> <name><surname>Kinoshita</surname> <given-names>K.</given-names></name> <name><surname>Obayashi</surname> <given-names>T.</given-names></name></person-group> (<year>2016</year>). <article-title>ATTED-II in 2016: a plant coexpression database towards lineage-specific coexpression</article-title>. <source>Plant Cell Physiol.</source> <volume>57</volume>:<fpage>e5</fpage>. <pub-id pub-id-type="doi">10.1093/pcp/pcv165</pub-id><pub-id pub-id-type="pmid">26546318</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Atwell</surname> <given-names>S.</given-names></name> <name><surname>Huang</surname> <given-names>Y. S.</given-names></name> <name><surname>Vilhjalmsson</surname> <given-names>B. J.</given-names></name> <name><surname>Willems</surname> <given-names>G.</given-names></name> <name><surname>Horton</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Genome-wide association study of 107 phenotypes in <italic>Arabidopsis thaliana</italic> inbred lines</article-title>. <source>Nature</source> <volume>465</volume>, <fpage>627</fpage>&#x02013;<lpage>631</lpage>. <pub-id pub-id-type="doi">10.1038/nature08800</pub-id><pub-id pub-id-type="pmid">20336072</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bowers</surname> <given-names>J. E.</given-names></name> <name><surname>Chapman</surname> <given-names>B. A.</given-names></name> <name><surname>Rong</surname> <given-names>J.</given-names></name> <name><surname>Paterson</surname> <given-names>A. H.</given-names></name></person-group> (<year>2003</year>). <article-title>Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events</article-title>. <source>Nature</source> <volume>422</volume>, <fpage>433</fpage>&#x02013;<lpage>438</lpage>. <pub-id pub-id-type="doi">10.1038/nature01521</pub-id><pub-id pub-id-type="pmid">12660784</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chan</surname> <given-names>E. K.</given-names></name> <name><surname>Rowe</surname> <given-names>H. C.</given-names></name> <name><surname>Corwin</surname> <given-names>J. A.</given-names></name> <name><surname>Joseph</surname> <given-names>B.</given-names></name> <name><surname>Kliebenstein</surname> <given-names>D. J.</given-names></name></person-group> (<year>2011</year>). <article-title>Combining genome-wide association mapping and transcriptional networks to identify novel genes controlling glucosinolates in <italic>Arabidopsis thaliana</italic></article-title>. <source>PLoS Biol.</source> <volume>9</volume>:<fpage>e1001125</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pbio.1001125</pub-id><pub-id pub-id-type="pmid">21857804</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chaudhary</surname> <given-names>P.</given-names></name> <name><surname>Naganathan</surname> <given-names>A. N.</given-names></name> <name><surname>Gromiha</surname> <given-names>M. M.</given-names></name></person-group> (<year>2015</year>). <article-title>Folding RaCe: a robust method for predicting changes in protein folding rates upon point mutations</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>2091</fpage>&#x02013;<lpage>2097</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv091</pub-id><pub-id pub-id-type="pmid">25686635</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Declerck</surname> <given-names>G.</given-names></name> <name><surname>Tian</surname> <given-names>F.</given-names></name> <name><surname>Spooner</surname> <given-names>W.</given-names></name> <name><surname>McCouch</surname> <given-names>S.</given-names></name> <name><surname>Buckler</surname> <given-names>E.</given-names></name></person-group> (<year>2012</year>). <article-title>PICARA, an analytical pipeline providing probabilistic inference about a priori candidates genes underlying genome-wide association QTL in plants</article-title>. <source>PLoS ONE</source> <volume>7</volume>:<fpage>e46596</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0046596</pub-id><pub-id pub-id-type="pmid">23144785</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chou</surname> <given-names>K. C.</given-names></name></person-group> (<year>2001</year>). <article-title>Prediction of protein cellular attributes using pseudo-amino acid composition</article-title>. <source>Proteins</source> <volume>43</volume>, <fpage>246</fpage>&#x02013;<lpage>255</lpage>. <pub-id pub-id-type="doi">10.1002/prot.1035</pub-id><pub-id pub-id-type="pmid">11288174</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cid</surname> <given-names>H.</given-names></name> <name><surname>Bunster</surname> <given-names>M.</given-names></name> <name><surname>Canales</surname> <given-names>M.</given-names></name> <name><surname>Gazitua</surname> <given-names>F.</given-names></name></person-group> (<year>1992</year>). <article-title>Hydrophobicity and structural classes in proteins</article-title>. <source>Protein Eng.</source> <volume>5</volume>, <fpage>373</fpage>&#x02013;<lpage>375</lpage>. <pub-id pub-id-type="doi">10.1093/protein/5.5.373</pub-id><pub-id pub-id-type="pmid">1518784</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coleman-Derr</surname> <given-names>D.</given-names></name> <name><surname>Zilberman</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Deposition of histone variant H2A.Z within gene bodies regulates responsive genes</article-title>. <source>PLoS Genet.</source> <volume>8</volume>:<fpage>e1002988</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pgen.1002988</pub-id><pub-id pub-id-type="pmid">23071449</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cornette</surname> <given-names>J. L.</given-names></name> <name><surname>Cease</surname> <given-names>K. B.</given-names></name> <name><surname>Margalit</surname> <given-names>H.</given-names></name> <name><surname>Spouge</surname> <given-names>J. L.</given-names></name> <name><surname>Berzofsky</surname> <given-names>J. A.</given-names></name> <name><surname>Delisi</surname> <given-names>C.</given-names></name></person-group> (<year>1987</year>). <article-title>Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins</article-title>. <source>J. Mol. Biol.</source> <volume>195</volume>, <fpage>659</fpage>&#x02013;<lpage>685</lpage>. <pub-id pub-id-type="doi">10.1016/0022-2836(87)90189-6</pub-id><pub-id pub-id-type="pmid">3656427</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guy</surname> <given-names>H. R.</given-names></name></person-group> (<year>1985</year>). <article-title>Amino acid side-chain partition energies and distribution of residues in soluble proteins</article-title>. <source>Biophys. J.</source> <volume>47</volume>, <fpage>61</fpage>&#x02013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1016/S0006-3495(85)83877-7</pub-id><pub-id pub-id-type="pmid">3978191</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jensen</surname> <given-names>L. J.</given-names></name> <name><surname>Kuhn</surname> <given-names>M.</given-names></name> <name><surname>Stark</surname> <given-names>M.</given-names></name> <name><surname>Chaffron</surname> <given-names>S.</given-names></name> <name><surname>Creevey</surname> <given-names>C.</given-names></name> <name><surname>Muller</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>STRING 8&#x02013;a global view on proteins and their functional interactions in 630 organisms</article-title>. <source>Nucleic Acids Res.</source> <volume>37</volume>, <fpage>D412</fpage>&#x02013;<lpage>D416</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkn760</pub-id><pub-id pub-id-type="pmid">18940858</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jeong</surname> <given-names>J. H.</given-names></name> <name><surname>Song</surname> <given-names>H. R.</given-names></name> <name><surname>Ko</surname> <given-names>J. H.</given-names></name> <name><surname>Jeong</surname> <given-names>Y. M.</given-names></name> <name><surname>Kwon</surname> <given-names>Y. E.</given-names></name> <name><surname>Seol</surname> <given-names>J. H.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Repression of FLOWERING LOCUS T chromatin by functionally redundant histone H3 lysine 4 demethylases in <italic>Arabidopsis</italic></article-title>. <source>PLoS ONE</source> <volume>4</volume>:<fpage>e8033</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0008033</pub-id><pub-id pub-id-type="pmid">19946624</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kazan</surname> <given-names>K.</given-names></name> <name><surname>Lyons</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>The link between flowering time and stress tolerance</article-title>. <source>J. Exp. Bot.</source> <volume>67</volume>, <fpage>47</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1093/jxb/erv441</pub-id><pub-id pub-id-type="pmid">26428061</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kolde</surname> <given-names>R.</given-names></name> <name><surname>Laur</surname> <given-names>S.</given-names></name> <name><surname>Adler</surname> <given-names>P.</given-names></name> <name><surname>Vilo</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>Robust rank aggregation for gene list integration and meta-analysis</article-title>. <source>Bioinformatics</source> <volume>28</volume>, <fpage>573</fpage>&#x02013;<lpage>580</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btr709</pub-id><pub-id pub-id-type="pmid">22247279</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kutmon</surname> <given-names>M.</given-names></name> <name><surname>Riutta</surname> <given-names>A.</given-names></name> <name><surname>Nunes</surname> <given-names>N.</given-names></name> <name><surname>Hanspers</surname> <given-names>K.</given-names></name> <name><surname>Willighagen</surname> <given-names>E. L.</given-names></name> <name><surname>Bohler</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>WikiPathways: capturing the full diversity of pathway knowledge</article-title>. <source>Nucleic Acids Res.</source> <volume>44</volume>, <fpage>D488</fpage>&#x02013;<lpage>D494</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv1024</pub-id><pub-id pub-id-type="pmid">26481357</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>B. J.</given-names></name> <name><surname>Shin</surname> <given-names>M. S.</given-names></name> <name><surname>Oh</surname> <given-names>Y. J.</given-names></name> <name><surname>Oh</surname> <given-names>H. S.</given-names></name> <name><surname>Ryu</surname> <given-names>K. H.</given-names></name></person-group> (<year>2009</year>). <article-title>Identification of protein functions using a machine-learning approach based on sequence-derived properties</article-title>. <source>Proteome Sci.</source> <volume>7</volume>:<fpage>27</fpage>. <pub-id pub-id-type="doi">10.1186/1477-5956-7-27</pub-id><pub-id pub-id-type="pmid">19664241</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>I.</given-names></name> <name><surname>Ambaru</surname> <given-names>B.</given-names></name> <name><surname>Thakkar</surname> <given-names>P.</given-names></name> <name><surname>Marcotte</surname> <given-names>E. M.</given-names></name> <name><surname>Rhee</surname> <given-names>S. Y.</given-names></name></person-group> (<year>2010</year>). <article-title>Rational association of genes with traits using a genome-scale gene network for <italic>Arabidopsis thaliana</italic></article-title>. <source>Nat. Biotechnol.</source> <volume>28</volume>, <fpage>149</fpage>&#x02013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.1603</pub-id><pub-id pub-id-type="pmid">20118918</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>I.</given-names></name> <name><surname>Seo</surname> <given-names>Y. S.</given-names></name> <name><surname>Coltrane</surname> <given-names>D.</given-names></name> <name><surname>Hwang</surname> <given-names>S.</given-names></name> <name><surname>Oh</surname> <given-names>T.</given-names></name> <name><surname>Marcotte</surname> <given-names>E. M.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Genetic dissection of the biotic stress response using a genome-scale gene network for rice</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A.</source> <volume>108</volume>, <fpage>18548</fpage>&#x02013;<lpage>18553</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1110384108</pub-id><pub-id pub-id-type="pmid">22042862</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>T.</given-names></name> <name><surname>Oh</surname> <given-names>T.</given-names></name> <name><surname>Yang</surname> <given-names>S.</given-names></name> <name><surname>Shin</surname> <given-names>J.</given-names></name> <name><surname>Hwang</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>C. Y.</given-names></name> <etal/></person-group>. (<year>2015a</year>). <article-title>RiceNet v2: an improved network prioritization server for rice genes</article-title>. <source>Nucleic Acids Res.</source> <volume>43</volume>, <fpage>W122</fpage>&#x02013;<lpage>W127</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv253</pub-id><pub-id pub-id-type="pmid">25813048</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>T.</given-names></name> <name><surname>Yang</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>E.</given-names></name> <name><surname>Ko</surname> <given-names>Y.</given-names></name> <name><surname>Hwang</surname> <given-names>S.</given-names></name> <name><surname>Shin</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2015b</year>). <article-title>AraNet v2: an improved database of co-functional gene networks for the study of <italic>Arabidopsis thaliana</italic> and 27 other nonmodel plant species</article-title>. <source>Nucleic Acids Res.</source> <volume>43</volume>, <fpage>D996</fpage>&#x02013;<lpage>D1002</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku1053</pub-id><pub-id pub-id-type="pmid">25355510</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Kierczak</surname> <given-names>M.</given-names></name> <name><surname>Shen</surname> <given-names>X.</given-names></name> <name><surname>Ahsan</surname> <given-names>M.</given-names></name> <name><surname>Carlborg</surname> <given-names>O.</given-names></name> <name><surname>Marklund</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>PASE: a novel method for functional prediction of amino acid substitutions based on physicochemical properties</article-title>. <source>Front. Genet.</source> <volume>4</volume>:<fpage>21</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2013.00021</pub-id><pub-id pub-id-type="pmid">23508070</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Pearl</surname> <given-names>S. A.</given-names></name> <name><surname>Jackson</surname> <given-names>S. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Gene networks in plant biology: approaches in reconstruction and analysis</article-title>. <source>Trends Plant Sci.</source> <volume>20</volume>, <fpage>664</fpage>&#x02013;<lpage>675</lpage>. <pub-id pub-id-type="doi">10.1016/j.tplants.2015.06.013</pub-id><pub-id pub-id-type="pmid">26440435</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Libbrecht</surname> <given-names>M. W.</given-names></name> <name><surname>Noble</surname> <given-names>W. S.</given-names></name></person-group> (<year>2015</year>). <article-title>Machine learning applications in genetics and genomics</article-title>. <source>Nat. Rev. Genet.</source> <volume>16</volume>, <fpage>321</fpage>&#x02013;<lpage>332</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3920</pub-id><pub-id pub-id-type="pmid">25948244</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lloyd</surname> <given-names>J. P.</given-names></name> <name><surname>Seddon</surname> <given-names>A. E.</given-names></name> <name><surname>Moghe</surname> <given-names>G. D.</given-names></name> <name><surname>Simenc</surname> <given-names>M. C.</given-names></name> <name><surname>Shiu</surname> <given-names>S. H.</given-names></name></person-group> (<year>2015</year>). <article-title>Characteristics of plant essential genes allow for within- and between-species prediction of lethal mutant phenotypes</article-title>. <source>Plant Cell</source> <volume>27</volume>, <fpage>2133</fpage>&#x02013;<lpage>2147</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.15.00051</pub-id><pub-id pub-id-type="pmid">26286535</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>C.</given-names></name> <name><surname>Xin</surname> <given-names>M.</given-names></name> <name><surname>Feldmann</surname> <given-names>K. A.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name></person-group> (<year>2014</year>). <article-title>Machine learning-based differential network analysis: a study of stress-responsive transcriptomes in <italic>Arabidopsis</italic></article-title>. <source>Plant Cell</source> <volume>26</volume>, <fpage>520</fpage>&#x02013;<lpage>537</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.113.121913</pub-id><pub-id pub-id-type="pmid">24520154</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mallick</surname> <given-names>P.</given-names></name> <name><surname>Schirle</surname> <given-names>M.</given-names></name> <name><surname>Chen</surname> <given-names>S. S.</given-names></name> <name><surname>Flory</surname> <given-names>M. R.</given-names></name> <name><surname>Lee</surname> <given-names>H.</given-names></name> <name><surname>Martin</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>Computational prediction of proteotypic peptides for quantitative proteomics</article-title>. <source>Nat. Biotechnol.</source> <volume>25</volume>, <fpage>125</fpage>&#x02013;<lpage>131</lpage>. <pub-id pub-id-type="doi">10.1038/nbt1275</pub-id><pub-id pub-id-type="pmid">17195840</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meek</surname> <given-names>J. L.</given-names></name> <name><surname>Rossetti</surname> <given-names>Z. L.</given-names></name></person-group> (<year>1981</year>). <article-title>Factors affecting retention and resolution of peptides in high-performance liquid chromatography</article-title>. <source>J. Chromatogr.</source> <volume>211</volume>, <fpage>15</fpage>&#x02013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1016/S0021-9673(00)81169-3</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moreau</surname> <given-names>Y.</given-names></name> <name><surname>Tranchevent</surname> <given-names>L. C.</given-names></name></person-group> (<year>2012</year>). <article-title>Computational tools for prioritizing candidate genes: boosting disease gene discovery</article-title>. <source>Nat. Rev. Genet.</source> <volume>13</volume>, <fpage>523</fpage>&#x02013;<lpage>536</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3253</pub-id><pub-id pub-id-type="pmid">22751426</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mutwil</surname> <given-names>M.</given-names></name> <name><surname>Klie</surname> <given-names>S.</given-names></name> <name><surname>Tohge</surname> <given-names>T.</given-names></name> <name><surname>Giorgi</surname> <given-names>F. M.</given-names></name> <name><surname>Wilkins</surname> <given-names>O.</given-names></name> <name><surname>Campbell</surname> <given-names>N. M.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>PlaNet: combined sequence and expression comparisons across plant networks derived from seven species</article-title>. <source>Plant Cell</source> <volume>23</volume>, <fpage>895</fpage>&#x02013;<lpage>910</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.111.083667</pub-id><pub-id pub-id-type="pmid">21441431</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nozaki</surname> <given-names>Y.</given-names></name> <name><surname>Tanford</surname> <given-names>C.</given-names></name></person-group> (<year>1971</year>). <article-title>The solubility of amino acids and two glycine peptides in aqueous ethanol and dioxane solutions. Establishment of a hydrophobicity scale</article-title>. <source>J. Biol. Chem.</source> <volume>246</volume>, <fpage>2211</fpage>&#x02013;<lpage>2217</lpage>. <pub-id pub-id-type="pmid">5555568</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perez-Iratxeta</surname> <given-names>C.</given-names></name> <name><surname>Bork</surname> <given-names>P.</given-names></name> <name><surname>Andrade</surname> <given-names>M. A.</given-names></name></person-group> (<year>2002</year>). <article-title>Association of genes to genetically inherited diseases using data mining</article-title>. <source>Nat. Genet.</source> <volume>31</volume>, <fpage>316</fpage>&#x02013;<lpage>319</lpage>. <pub-id pub-id-type="doi">10.1038/ng895</pub-id><pub-id pub-id-type="pmid">12006977</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rhee</surname> <given-names>S. Y.</given-names></name> <name><surname>Mutwil</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>Towards revealing the functions of all genes in plants</article-title>. <source>Trends Plant Sci.</source> <volume>19</volume>, <fpage>212</fpage>&#x02013;<lpage>221</lpage>. <pub-id pub-id-type="doi">10.1016/j.tplants.2013.10.006</pub-id><pub-id pub-id-type="pmid">24231067</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ruprecht</surname> <given-names>C.</given-names></name> <name><surname>Mendrinna</surname> <given-names>A.</given-names></name> <name><surname>Tohge</surname> <given-names>T.</given-names></name> <name><surname>Sampathkumar</surname> <given-names>A.</given-names></name> <name><surname>Klie</surname> <given-names>S.</given-names></name> <name><surname>Fernie</surname> <given-names>A. R.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>FamNet: a framework to identify multiplied modules driving pathway expansion in plants</article-title>. <source>Plant Physiol.</source> <volume>170</volume>, <fpage>1878</fpage>&#x02013;<lpage>1894</lpage>. <pub-id pub-id-type="doi">10.1104/pp.15.01281</pub-id><pub-id pub-id-type="pmid">26754669</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ruprecht</surname> <given-names>C.</given-names></name> <name><surname>Mutwil</surname> <given-names>M.</given-names></name> <name><surname>Saxe</surname> <given-names>F.</given-names></name> <name><surname>Eder</surname> <given-names>M.</given-names></name> <name><surname>Nikoloski</surname> <given-names>Z.</given-names></name> <name><surname>Persson</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Large-scale co-expression approach to dissect secondary cell wall formation across plant species</article-title>. <source>Front. Plant Sci.</source> <volume>2</volume>:<fpage>23</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2011.00023</pub-id><pub-id pub-id-type="pmid">22639584</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sabaghian</surname> <given-names>E.</given-names></name> <name><surname>Drebert</surname> <given-names>Z.</given-names></name> <name><surname>Inze</surname> <given-names>D.</given-names></name> <name><surname>Saeys</surname> <given-names>Y.</given-names></name></person-group> (<year>2015</year>). <article-title>An integrated network of <italic>Arabidopsis</italic> growth regulators and its use for gene prioritization</article-title>. <source>Sci. Rep.</source> <volume>5</volume>:<fpage>17617</fpage>. <pub-id pub-id-type="doi">10.1038/srep17617</pub-id><pub-id pub-id-type="pmid">26620795</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Serin</surname> <given-names>E. A.</given-names></name> <name><surname>Nijveen</surname> <given-names>H.</given-names></name> <name><surname>Hilhorst</surname> <given-names>H. W.</given-names></name> <name><surname>Ligterink</surname> <given-names>W.</given-names></name></person-group> (<year>2016</year>). <article-title>Learning from co-expression networks: possibilities and challenges</article-title>. <source>Front. Plant Sci.</source> <volume>7</volume>:<fpage>444</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2016.00444</pub-id><pub-id pub-id-type="pmid">27092161</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sircar</surname> <given-names>S.</given-names></name> <name><surname>Parekh</surname> <given-names>N.</given-names></name></person-group> (<year>2015</year>). <article-title>Functional characterization of drought-responsive modules and genes in <italic>Oryza sativa</italic>: a network-based approach</article-title>. <source>Front. Genet.</source> <volume>6</volume>:<fpage>256</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2015.00256</pub-id><pub-id pub-id-type="pmid">26284112</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sweet</surname> <given-names>R. M.</given-names></name> <name><surname>Eisenberg</surname> <given-names>D.</given-names></name></person-group> (<year>1983</year>). <article-title>Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure</article-title>. <source>J. Mol. Biol.</source> <volume>171</volume>, <fpage>479</fpage>&#x02013;<lpage>488</lpage>. <pub-id pub-id-type="doi">10.1016/0022-2836(83)90041-4</pub-id><pub-id pub-id-type="pmid">6663622</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Takuno</surname> <given-names>S.</given-names></name> <name><surname>Gaut</surname> <given-names>B. S.</given-names></name></person-group> (<year>2012</year>). <article-title>Body-methylated genes in <italic>Arabidopsis thaliana</italic> are functionally important and evolve slowly</article-title>. <source>Mol. Biol. Evol.</source> <volume>29</volume>, <fpage>219</fpage>&#x02013;<lpage>227</lpage>. <pub-id pub-id-type="doi">10.1093/molbev/msr188</pub-id><pub-id pub-id-type="pmid">21813466</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Touw</surname> <given-names>W. G.</given-names></name> <name><surname>Bayjanov</surname> <given-names>J. R.</given-names></name> <name><surname>Overmars</surname> <given-names>L.</given-names></name> <name><surname>Backus</surname> <given-names>L.</given-names></name> <name><surname>Boekhorst</surname> <given-names>J.</given-names></name> <name><surname>Wels</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Data mining in the life sciences with random forest: a walk in the park or lost in the jungle?</article-title> <source>Brief. Bioinform.</source> <volume>14</volume>, <fpage>315</fpage>&#x02013;<lpage>326</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbs034</pub-id><pub-id pub-id-type="pmid">22786785</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tranchevent</surname> <given-names>L. C.</given-names></name> <name><surname>Capdevila</surname> <given-names>F. B.</given-names></name> <name><surname>Nitsch</surname> <given-names>D.</given-names></name> <name><surname>De Moor</surname> <given-names>B.</given-names></name> <name><surname>De Causmaecker</surname> <given-names>P.</given-names></name> <name><surname>Moreau</surname> <given-names>Y.</given-names></name></person-group> (<year>2011</year>). <article-title>A guide to web tools to prioritize candidate genes</article-title>. <source>Brief. Bioinform.</source> <volume>12</volume>, <fpage>22</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbq007</pub-id><pub-id pub-id-type="pmid">21278374</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Warde-Farley</surname> <given-names>D.</given-names></name> <name><surname>Donaldson</surname> <given-names>S. L.</given-names></name> <name><surname>Comes</surname> <given-names>O.</given-names></name> <name><surname>Zuberi</surname> <given-names>K.</given-names></name> <name><surname>Badrawi</surname> <given-names>R.</given-names></name> <name><surname>Chao</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function</article-title>. <source>Nucleic Acids Res.</source> <volume>38</volume>, <fpage>W214</fpage>&#x02013;<lpage>W220</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkq537</pub-id><pub-id pub-id-type="pmid">20576703</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yaish</surname> <given-names>M. W.</given-names></name> <name><surname>Colasanti</surname> <given-names>J.</given-names></name> <name><surname>Rothstein</surname> <given-names>S. J.</given-names></name></person-group> (<year>2011</year>). <article-title>The role of epigenetic processes in controlling flowering time in plants exposed to stress</article-title>. <source>J. Exp. Bot.</source> <volume>62</volume>, <fpage>3727</fpage>&#x02013;<lpage>3735</lpage>. <pub-id pub-id-type="doi">10.1093/jxb/err177</pub-id><pub-id pub-id-type="pmid">21633082</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yun</surname> <given-names>J.</given-names></name> <name><surname>Kim</surname> <given-names>Y. S.</given-names></name> <name><surname>Jung</surname> <given-names>J. H.</given-names></name> <name><surname>Seo</surname> <given-names>P. J.</given-names></name> <name><surname>Park</surname> <given-names>C. M.</given-names></name></person-group> (<year>2012</year>). <article-title>The AT-hook motif-containing protein AHL22 regulates flowering initiation by modifying FLOWERING LOCUS T chromatin in <italic>Arabidopsis</italic></article-title>. <source>J. Biol. Chem.</source> <volume>287</volume>, <fpage>15307</fpage>&#x02013;<lpage>15316</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.M111.318477</pub-id><pub-id pub-id-type="pmid">22442143</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Gou</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>C. J.</given-names></name></person-group> (<year>2013</year>). <article-title><italic>Arabidopsis</italic> Kelch repeat F-box proteins regulate phenylpropanoid biosynthesis via controlling the turnover of phenylalanine ammonia-lyase</article-title>. <source>Plant Cell</source> <volume>25</volume>, <fpage>4994</fpage>&#x02013;<lpage>5010</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.113.119644</pub-id><pub-id pub-id-type="pmid">24363316</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>H.</given-names></name> <name><surname>Zhou</surname> <given-names>Y.</given-names></name></person-group> (<year>2004</year>). <article-title>Quantifying the effect of burial of amino acid residues on protein stability</article-title>. <source>Proteins</source> <volume>54</volume>, <fpage>315</fpage>&#x02013;<lpage>322</lpage>. <pub-id pub-id-type="doi">10.1002/prot.10584</pub-id><pub-id pub-id-type="pmid">14696193</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Yu</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Integrating rare-variant testing, function prediction, and gene network in composite resequencing-based genome-wide association studies (CR-GWAS)</article-title>. <source>G3</source> <volume>1</volume>, <fpage>233</fpage>&#x02013;<lpage>243</lpage>. <pub-id pub-id-type="doi">10.1534/g3.111.000364</pub-id><pub-id pub-id-type="pmid">22384334</pub-id></citation>
</ref>
</ref-list>
</back>
</article>