<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2021.737194</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>StAR-Related Lipid Transfer (START) Domains Across the Rice Pangenome Reveal How Ontogeny Recapitulated Selection Pressures During Rice Domestication</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Mahtha</surname> <given-names>Sanjeet Kumar</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1288794/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Purama</surname> <given-names>Ravi Kiran</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Yadav</surname> <given-names>Gitanjali</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/432174/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Computational Biology Laboratory, National Institute of Plant Genome Research</institution>, <addr-line>New Delhi</addr-line>, <country>India</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Plant Sciences, University of Cambridge</institution>, <addr-line>Cambridge</addr-line>, <country>United Kingdom</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Zefeng Yang, Yangzhou University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jie Qiu, Shanghai Normal University, China; Swarup Roy Choudhury, Indian Institute of Science Education and Research, Tirupati, India</p></fn>
<corresp id="c001">&#x002A;Correspondence: Gitanjali Yadav, <email>gy@nipgr.ac.in</email></corresp>
<fn fn-type="equal" id="fn002"><p><sup>&#x2020;</sup>These authors have contributed equally to this work</p></fn>
<fn fn-type="other" id="fn004"><p>This article was submitted to Plant Genomics, a section of the journal Frontiers in Genetics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>08</day>
<month>09</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>737194</elocation-id>
<history>
<date date-type="received">
<day>06</day>
<month>07</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>08</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Mahtha, Purama and Yadav.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Mahtha, Purama and Yadav</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>The StAR-related lipid transfer (START) domain-containing proteins, or START proteins, encoded by a plant-amplified family of evolutionarily conserved genes, play important roles in lipid binding, transport, signaling, and modulation of transcriptional activity in the plant kingdom, but there is limited information on their evolution, duplication, and associated sub- or neo-functionalization. Here we perform a comprehensive investigation of this family across the rice pangenome, using 10 wild and cultivated varieties. Conservation of START domains across all 10 rice genomes suggests low dispensability and critical functional roles for this family, further supported by chromosomal mapping, duplication and domain structure patterns. Analysis of synteny highlights a preponderance of segmental and dispersed duplication among STARTs, while transcriptomic investigation of the main cultivated variety <italic>Oryza sativa</italic> var. <italic>japonica</italic> reveals sub-functionalization amongst gene family members in terms of preferential expression across various developmental stages and anatomical parts, such as flowering. Ka/Ks ratios confirmed strong negative/purifying selection on START family evolution, implying that ontogeny recapitulated selection pressures during rice domestication. Our findings provide evidence for high conservation of START genes across rice varieties in numbers, as well as in their stringent regulation of Ka/Ks ratio, and show a strong functional dependency of plants on START proteins for their growth and reproductive development. We believe that our findings advance the limited knowledge about plant START domain diversity and evolution, and pave the way for more detailed assessment of individual structural classes of START proteins among plants and their domain-specific substrate preferences, to complement existing studies in animals and yeast.</p>
</abstract>
<kwd-group>
<kwd>genome-wide identification</kwd>
<kwd>gene duplication</kwd>
<kwd>synteny</kwd>
<kwd>START domain</kwd>
<kwd><italic>Oryza</italic> species</kwd>
<kwd>gene expression</kwd>
<kwd>homeodomains</kwd>
</kwd-group>
<contract-sponsor id="cn001">Department of Biotechnology, Ministry of Science and Technology, India<named-content content-type="fundref-id">10.13039/501100001407</named-content></contract-sponsor>
<contract-sponsor id="cn002">Department of Biotechnology, Ministry of Science and Technology, India<named-content content-type="fundref-id">10.13039/501100001407</named-content></contract-sponsor>
<counts>
<fig-count count="7"/>
<table-count count="4"/>
<equation-count count="0"/>
<ref-count count="80"/>
<page-count count="21"/>
<word-count count="14923"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="S1">
<title>Introduction</title>
<p>The steroidogenic acute regulatory protein (StAR) related lipid transfer (START) domain was initially identified in, and named after, the 30 kDa mammalian StAR protein, which binds to cholesterol (<xref ref-type="bibr" rid="B63">Stocco, 2001</xref>). START domains are evolutionarily conserved domains of approximately 200&#x2013;210 amino acids (<xref ref-type="bibr" rid="B66">Tsujishita and Hurley, 2000</xref>) and play a crucial role in the transfer of lipids/sterols, lipid signaling, and modulation of transcription activity (<xref ref-type="bibr" rid="B47">Ponting and Aravind, 1999</xref>; <xref ref-type="bibr" rid="B60">Soccio and Breslow, 2003</xref>). The presence of START domains across evolutionarily distant organisms indicates a conserved mechanism for protein-lipid/sterol interaction through hydrophobic pockets (<xref ref-type="bibr" rid="B24">Iyer et al., 2001</xref>). Interestingly, START domains are abundant in plants and are often associated with homeodomain (HD) transcription factors, a feature unique to the plant kingdom (<xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>). For instance, 21 of the 90 HD family members identified in <italic>Arabidopsis</italic> possess START domains along with putative leucine zippers (<xref ref-type="bibr" rid="B55">Riechmann, 2002</xref>). Of these 21, five belong to the class III HD-ZIP subfamily and 16 to the class IV HD-ZIP subfamily (<xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>; <xref ref-type="bibr" rid="B6">Ariel et al., 2007</xref>).</p>
<p>The five genes from the class III HD-ZIP subfamily, namely PHB (phabulosa), PHV (phavoluta), REV (revoluta), CNA (corona) and ATHB8, have multiple and partially overlapping roles in development, including vasculature, organ polarity, and embryonic patterning of the shoot meristem (<xref ref-type="bibr" rid="B48">Prigge et al., 2005</xref>). In contrast, several members of the class IV HD-ZIP subfamily have roles in layer-specific cell differentiation. ATML1 (<italic>Arabidopsis thaliana</italic> meristem layer 1) and PDF2 (protodermal factor 2) have putative roles in epidermal differentiation (<xref ref-type="bibr" rid="B36">Lu et al., 1996</xref>; <xref ref-type="bibr" rid="B1">Abe et al., 2003</xref>). GL2 (glabra 2) is required for the differentiation of epidermal cells in the shoot (<xref ref-type="bibr" rid="B54">Rerie et al., 1994</xref>), root (<xref ref-type="bibr" rid="B10">Di Cristina et al., 1996</xref>), and seed (<xref ref-type="bibr" rid="B73">Western et al., 2001</xref>). ROC1 (rice outermost cell-specific gene 1) of rice has a function similar to that of ATML1, with its expression limited to the outermost epidermal layer from early embryogenesis (<xref ref-type="bibr" rid="B23">Ito et al., 2002</xref>). OSTF1 (<italic>Oryza sativa</italic> transcription factor 1) is also preferentially expressed in the epidermis and is developmentally regulated during early embryogenesis (<xref ref-type="bibr" rid="B74">Yang et al., 2002</xref>).</p>
<p>Since HD START proteins act as transcription factors in plants, a major expectation is that START, when bound to a sterol, regulates gene expression in a manner similar to steroid hormone receptors in animals. Such a mechanism would allow cell differentiation to be linked with lipid metabolism in plants (<xref ref-type="bibr" rid="B47">Ponting and Aravind, 1999</xref>; <xref ref-type="bibr" rid="B67">Venkata and Schrick, 2006</xref>). Plant START domains were shown to be required for transcription factor activity in the class IV HD-ZIP protein &#x201C;GL2&#x201D; in <italic>Arabidopsis</italic>, and they were also found to have ligand-binding modules, similar to mammalian START domains (<xref ref-type="bibr" rid="B58">Schrick et al., 2014</xref>). Activated expression of HDG11 START domain confers drought tolerance, with reduced stomatal density and an improved root system, in <italic>Arabidopsis</italic> (<xref ref-type="bibr" rid="B76">Yu et al., 2008</xref>).</p>
<p>Although START domains are amplified in plants and appear to have diverse functions, a thorough understanding of the mechanism of amplification and gene duplication in this family is lacking. With many available varietal genomes and wide genotypic variation ranging from diploids to polyploids, <italic>Oryza</italic> has a long history of use as a model monocot food crop. Furthermore, with an evolutionary history that spans more than 15 million years, <italic>Oryza</italic> is an ideal prototype for such a study (<xref ref-type="bibr" rid="B8">Chatterjee, 1947</xref>; <xref ref-type="bibr" rid="B34">Li et al., 2014</xref>; <xref ref-type="bibr" rid="B62">Stein et al., 2018</xref>). Rice also has major social significance: it is consumed by half the global population, and the domesticated Asian rice variety alone meets an estimated 20% of human dietary calories, making it a target for improvement toward addressing the food security of a growing world population under a changing climate.</p>
<p>In this work, we focus on 10 diploid <italic>Oryza</italic> species, including three cultivated varieties with the AA genotype: <italic>O. sativa</italic> var. <italic>indica</italic> and <italic>O. sativa</italic> var. <italic>japonica</italic> (Asian cultivated varieties), along with <italic>Oryza glaberrima</italic> (African cultivated variety). Five of the seven wild <italic>Oryza</italic> species included in this analysis have the AA genotype, namely <italic>Oryza rufipogon, Oryza nivara, Oryza barthii, Oryza glumaepatula</italic>, and <italic>Oryza meridionalis</italic>, while the two others, <italic>Oryza punctata</italic> and <italic>Oryza brachyantha</italic>, have the BB and FF genotypes, respectively. Regardless of genotype, all 10 species have extensive repeat content, ranging from one fourth to half of the genome (<xref ref-type="bibr" rid="B62">Stein et al., 2018</xref>). In general, repeat regions accumulate with increasing evolutionary order, from early-evolved wild relatives such as <italic>O. brachyantha</italic> and <italic>O. meridionalis</italic> (approximately 27&#x2013;29%) to the recent cultivated varieties <italic>O. sativa</italic> var. <italic>japonica</italic> and <italic>indica</italic> (approximately 40&#x2013;50%). <italic>O. punctata</italic> is an exception: despite being an early-evolved wild species, half of its genome consists of repeats, resulting in a large repertoire of synteny within the <italic>Oryza</italic> genomes, varying from 90 to 97% (<xref ref-type="bibr" rid="B62">Stein et al., 2018</xref>). In addition, there is gene flow among AA type <italic>Oryza</italic> genomes, which needs to be thoroughly investigated to understand the specific changes that occurred in the gene families (<xref ref-type="bibr" rid="B34">Li et al., 2014</xref>; <xref ref-type="bibr" rid="B62">Stein et al., 2018</xref>).
Members of the expanded START gene family can encode single or multi-domain proteins (<xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>; <xref ref-type="bibr" rid="B3">Alpy and Tomasetto, 2005</xref>), and START has been reported to associate with several other domains such as homeodomain, MEKHLA, and PH (pleckstrin homology) domains, known for their involvement in transcription regulation, sensing and signaling, respectively (<xref ref-type="bibr" rid="B47">Ponting and Aravind, 1999</xref>; <xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>; <xref ref-type="bibr" rid="B44">Mukherjee and B&#x00FC;rglin, 2006</xref>; <xref ref-type="bibr" rid="B67">Venkata and Schrick, 2006</xref>). Among the multi-domain START proteins, ligand binding by the START domain can modulate the activity of the other domains that co-occur with it (<xref ref-type="bibr" rid="B47">Ponting and Aravind, 1999</xref>; <xref ref-type="bibr" rid="B24">Iyer et al., 2001</xref>; <xref ref-type="bibr" rid="B58">Schrick et al., 2014</xref>).</p>
<p>In this article, we aim to provide a comprehensive comparative genomic analysis of START genes across the 10 <italic>Oryza</italic> genomes, covering identification and classification, sequence homology, genome-wide mapping, and duplication analysis. Available transcriptomic data for <italic>O. sativa</italic> var. <italic>japonica</italic> were investigated to understand co-expression patterns for potential sub- or neo-functionalization among these genes. Genome-wide identification revealed a total of 249 START genes across all 10 rice species and showed that the START gene family size varies from 22 to 28 per species. Domain structure analysis (DSA) confirmed the presence of additional functional domains associated with STARTs, such as HDs, MEKHLA, PH, and DUF1336, and classified the START proteins into a total of eight unique combinations based on associated domain patterns. Phylogenetic analysis revealed the extent of divergence amongst START proteins and identified distinct clusters based on the above-mentioned domain structure patterns. Genome-wide mapping showed that these genes are distributed among 11 of the 12 chromosomes in most of the cultivated and wild rice species. Gene duplication studies indicate that START genes preferred segmental and dispersed modes of duplication for gene expansion under natural selection. Hierarchical clustering of transcriptome data revealed that many duplicated gene pairs have similar expression patterns across developmental stages and anatomical parts. In summary, this work presents a comparative genomic analysis of START genes across wild and cultivated rice and enhances our understanding of the mechanism of START gene amplification in plants.</p>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="S2.SS1">
<title>Data Collection</title>
<p>The complete genomic sequences, protein sequences, and annotation information of nine species of <italic>Oryza</italic>, including seven wild varieties <italic>O. brachyantha</italic> (<italic>Obra</italic><sub><italic>w</italic></sub>), <italic>O. punctata</italic> (<italic>Opun</italic><sub><italic>w</italic></sub>), <italic>O. meridionalis</italic> (<italic>Omer</italic><sub><italic>w</italic></sub>), <italic>O. glumaepatula</italic> (<italic>Oglu</italic><sub><italic>w</italic></sub>), <italic>O. barthii</italic> (<italic>Obar</italic><sub><italic>w</italic></sub>), <italic>O. nivara</italic> (<italic>Oniv</italic><sub><italic>w</italic></sub>), and <italic>O. rufipogon</italic> (<italic>Oruf</italic><sub><italic>w</italic></sub>) along with two cultivated varieties <italic>O. glaberrima</italic> (<italic>Ogla</italic><sub><italic>c</italic></sub>) and <italic>O. sativa</italic> var. <italic>indica</italic> (<italic>Oind</italic><sub><italic>c</italic></sub>), were downloaded from Ensembl (<xref ref-type="bibr" rid="B28">Kersey et al., 2018</xref>). In addition, similar data for the main cultivated variety, <italic>O. sativa</italic> var. <italic>japonica</italic> (<italic>Ojap</italic><sub><italic>c</italic></sub>), were downloaded from Phytozome v12, which hosts the latest version of the sequences (<xref ref-type="bibr" rid="B17">Goodstein et al., 2011</xref>). Throughout this work, these 10 species are referred to in the subscripted <bold><italic>Oabc</italic><sub><italic>x</italic></sub></bold> format, where <italic>abc</italic> represents the first three letters of the species/subspecies name, while the subscript &#x201C;<italic>x</italic>&#x201D; is <italic>c</italic> or <italic>w</italic>, representing cultivated or wild nature, respectively.</p>
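<p>The naming convention above can be illustrated with a minimal Python sketch. The function name <monospace>species_code</monospace> and the plain-text rendering of the subscript as an underscore suffix are assumptions made for illustration only and are not part of the original analysis.</p>
<preformat>
```python
def species_code(scientific_name, cultivated):
    """Build an Oabc_x style code from a scientific name (illustrative helper)."""
    tokens = scientific_name.split()
    # Use the subspecies/variety epithet when one is given (the last token),
    # otherwise the species epithet; the genus is reduced to its initial.
    epithet = tokens[-1]
    suffix = "c" if cultivated else "w"  # c = cultivated, w = wild
    return f"{tokens[0][0].upper()}{epithet[:3].lower()}_{suffix}"


if __name__ == "__main__":
    print(species_code("Oryza sativa var. japonica", cultivated=True))
    print(species_code("Oryza rufipogon", cultivated=False))
```
</preformat>
<p>With these inputs the helper yields Ojap_c and Oruf_w, matching the codes used in this work.</p>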
</sec>
<sec id="S2.SS2">
<title>Identification and Validation of START Proteins</title>
<p>Previously reviewed and characterized sequences of 109 START domain-containing proteins were collected from the InterPro consortium (<xref ref-type="bibr" rid="B26">Jones et al., 2014</xref>). The START regions in these proteins were extracted based on annotated border residues, and sequence redundancy was removed at a 95% sequence identity cut-off using CD-HIT (<xref ref-type="bibr" rid="B21">Huang et al., 2010</xref>). The resulting 84 sequences were used to construct a profile Hidden Markov Model (HMM) with HMMER 3.2.1<sup><xref ref-type="fn" rid="footnote1">1</xref></sup> (<xref ref-type="bibr" rid="B11">Eddy, 1998</xref>; <xref ref-type="bibr" rid="B15">Finn et al., 2011</xref>). The profile was searched against all 10 <italic>Oryza</italic> proteomes and short hits (sequence length &#x003C;100 residues) were discarded. Redundancy was then removed by filtering out all but the longest peptide for each gene. The identified hits were validated as START family proteins using the Conserved Domain Database (CDD) (<xref ref-type="bibr" rid="B40">Marchler-Bauer et al., 2014</xref>).</p>
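<p>The two filtering steps described above (discarding short hits, then keeping the longest peptide per gene) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the <monospace>Hit</monospace> record and its field names are hypothetical, and the hits are assumed to have been parsed from the HMM search output beforehand.</p>
<preformat>
```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene_id: str     # hypothetical field: gene encoding the peptide
    protein_id: str  # hypothetical field: peptide (isoform) identifier
    length: int      # peptide length in residues


MIN_LENGTH = 100  # hits shorter than 100 residues are discarded, as in the text


def longest_isoform_per_gene(hits):
    """Drop short hits, then keep only the longest peptide for each gene."""
    best = {}
    for hit in hits:
        if hit.length >= MIN_LENGTH:
            current = best.get(hit.gene_id)
            if current is None or hit.length > current.length:
                best[hit.gene_id] = hit
    return list(best.values())


if __name__ == "__main__":
    example = [
        Hit("gene1", "gene1.1", 250),
        Hit("gene1", "gene1.2", 310),  # longer isoform of gene1 is kept
        Hit("gene2", "gene2.1", 80),   # below the length threshold
    ]
    for kept in longest_isoform_per_gene(example):
        print(kept.gene_id, kept.protein_id, kept.length)
```
</preformat>
<p>When two isoforms of a gene have equal length, this sketch keeps the first one encountered; the text does not specify a tie-breaking rule.</p>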
</sec>
<sec id="S2.SS3">
<title>Domain Structure Analysis of START Domain Containing Proteins</title>
<p>The putative START domain-containing proteins identified as described above were subjected to domain structure analysis (DSA) to ascertain additional domains associated with START. DSA was carried out using the web-based Batch CD-Search tool, with CDD selected as the search database (<xref ref-type="bibr" rid="B41">Marchler-Bauer et al., 2011</xref>). CDD includes curated data from NCBI (National Center for Biotechnology Information) (<xref ref-type="bibr" rid="B2">Agarwala et al., 2017</xref>), SMART (Simple Modular Architecture Research Tool) (<xref ref-type="bibr" rid="B33">Letunic et al., 2015</xref>), the Pfam (protein families) database (<xref ref-type="bibr" rid="B14">Finn et al., 2014</xref>), PRK [PRotein K(c)lusters] (<xref ref-type="bibr" rid="B39">Maglott et al., 2011</xref>), COG (Clusters of Orthologous Groups of proteins) (<xref ref-type="bibr" rid="B65">Tatusov et al., 2003</xref>), and TIGRFAMs (The Institute for Genomic Research&#x2019;s database of protein families) (<xref ref-type="bibr" rid="B18">Haft et al., 2003</xref>). The additional associated domains identified in this step were used to classify rice START proteins into domain structure classes. In addition, transmembrane helical segments associated with START domains were predicted using TMHMM Server v. 2.0 (<xref ref-type="bibr" rid="B31">Krogh et al., 2001</xref>). The domain arrangement of START proteins was illustrated using IBS v.1.0 (<xref ref-type="bibr" rid="B35">Liu et al., 2015</xref>).</p>
</sec>
<sec id="S2.SS4">
<title>Gene Structure Analysis of START Coding Genes</title>
<p>Gene structure analysis (GSA) was carried out to understand the exon&#x2013;intron patterns for different classes of START-encoding genes among the 10 rice species. Gene structure was visualized using the Gene Structure Display Server (GSDS) (<xref ref-type="bibr" rid="B20">Hu et al., 2015</xref>). The gene and coding sequence (CDS) corresponding to each START protein were used as input for GSA. Visualizing the composition and position of exons and introns, together with the annotation of each conserved domain, helps in interpreting gene function and evolution. Accordingly, we highlighted the exons coding for different types of functional domains across START proteins, which further enabled us to compare exon&#x2013;intron patterns across wild and cultivated rice genomes.</p>
</sec>
<sec id="S2.SS5">
<title>Genome-Wide Mapping and Identification of Homologous and Orthologous START Coding Genes Amongst 10 <italic>Oryza</italic> Species</title>
<p>In order to map the START coding genes onto rice chromosomes, gene location data was extracted from the respective GFF annotation files (general feature format), and karyotype information was extracted based on chromosomal length. Chromosomal visualization of genes in all 10 rice species was done using Circos (<xref ref-type="bibr" rid="B32">Krzywinski et al., 2009</xref>), colored by structural class. Orthologous START genes in nine <italic>Oryza</italic> species were identified in reference to <italic>Ojap</italic><sub><italic>c</italic></sub> by local protein BLAST, based on maximum identity and similarity.</p>
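<p>The ortholog assignment described above can be sketched as a best-hit selection over BLAST tabular output. The sketch below assumes the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); ranking by percent identity with the bit score as a tie-breaker is an assumption made for illustration, since the exact ranking used in this study is not specified beyond maximum identity and similarity. The function name <monospace>best_hits</monospace> is hypothetical.</p>
<preformat>
```python
def best_hits(tabular_lines):
    """Return {query: (subject, percent_identity, bit_score)} with one top hit per query.

    Each line is expected to be a tab-separated BLAST record in the
    standard 12-column tabular layout.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        # Assumed ranking: highest identity first, bit score breaks ties.
        if query not in best or (pident, bitscore) > best[query][1:]:
            best[query] = (subject, pident, bitscore)
    return best


if __name__ == "__main__":
    # Identifiers below are made up for the example.
    rows = [
        "\t".join(["OjapA", "OindX", "98.5", "300", "4", "0", "1", "300", "1", "300", "1e-150", "560"]),
        "\t".join(["OjapA", "OindY", "99.1", "300", "3", "0", "1", "300", "1", "300", "1e-155", "571"]),
    ]
    for query, (subject, pident, bitscore) in best_hits(rows).items():
        print(query, subject, pident, bitscore)
```
</preformat>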
</sec>
<sec id="S2.SS6">
<title>Phylogenetic Analysis of Different Structural Classes of START Domain Containing Proteins</title>
<p>Phylogenetic analysis was carried out for different structural classes of START proteins across all 10 species, to explore intra- and inter-species divergence. All 249 full-length START proteins in the 10 <italic>Oryza</italic> genomes and 35 sequences from <italic>A. thaliana</italic> were included in the phylogenetic study. The available gene symbols are used for <italic>O. sativa</italic> var. <italic>japonica</italic> and <italic>A. thaliana.</italic> Multiple sequence alignment was performed using MUSCLE at default settings (<xref ref-type="bibr" rid="B38">Madeira et al., 2019</xref>). The aligned sequences were used for phylogenetic tree construction. The tree was generated with RAxML (raxmlGUI v2.0.5) (<xref ref-type="bibr" rid="B61">Stamatakis, 2014</xref>; <xref ref-type="bibr" rid="B12">Edler et al., 2021</xref>) using the maximum likelihood method with 1,000 bootstrap replicates, and was visualized using FigTree v1.4.2 (<xref ref-type="bibr" rid="B52">Rambaut, 2014</xref>).<sup><xref ref-type="fn" rid="footnote2">2</xref></sup></p>
</sec>
<sec id="S2.SS7">
<title>Gene Duplications, Collinearity, and Nucleotide Substitution Rates</title>
<p>The MCScanX software package (<xref ref-type="bibr" rid="B71">Wang et al., 2012</xref>) was used to identify various duplication modes for START genes among <italic>Oryza</italic> species. This program takes all-vs-all BLASTp results as input, and these searches were performed for all 10 rice proteomes (<xref ref-type="bibr" rid="B4">Altschul et al., 1990</xref>). The results were fed into the duplicate gene classifier, a module of MCScanX, to detect dispersed, proximal, tandem, and/or segmental duplications. The criteria used by the duplicate gene classifier for assignment of duplication modes were as follows: Initially, all genes were ranked in order of occurrence along the chromosome and labeled as singletons. Gene pairs were evaluated based on BLASTp hits, and pairs identified at a cut-off distance of 20 were re-labeled as &#x201C;dispersed duplicates.&#x201D; Gene pairs that showed a gene rank difference of less than 20 were re-labeled as &#x201C;proximal duplicates,&#x201D; while gene pairs found next to each other (i.e., gene rank difference = 1) were re-labeled as &#x201C;tandem duplicates.&#x201D; Following this, collinear blocks within the individual plant genomes were identified, and anchor genes found in collinear blocks were re-labeled as &#x201C;segmental/WGD duplicates.&#x201D; Finally, all genes were assigned to different duplication modes based on the following order of priority, i.e., whole genome duplication (WGD) / segmental &#x003E; tandem &#x003E; proximal &#x003E; dispersed. Unduplicated genes (that occur only once in the genome) retained their original classification as &#x201C;singletons&#x201D; (<xref ref-type="bibr" rid="B71">Wang et al., 2012</xref>). Collinear blocks for all proteins within individual genomes were generated by the MCScanX module (gray color links). START gene homologs within collinear blocks were highlighted using the previously described domain structure class colors.
MCScanX-transposed (<xref ref-type="bibr" rid="B70">Wang et al., 2013</xref>) was used to find START homologs in <italic>Ojap</italic><sub><italic>c</italic></sub> that had translocated from their ancestral locations to novel loci. The START gene homologs obtained from the interspecies BLASTp between <italic>Ojap</italic><sub><italic>c</italic></sub> and the other nine <italic>Oryza</italic> genomes were analyzed for non-synonymous (Ka) and synonymous (Ks) substitution rates using KaKs_Calculator 2.0 (<xref ref-type="bibr" rid="B68">Wang et al., 2010</xref>).</p>
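<p>The priority-based assignment of duplication modes described above can be sketched in Python as follows. This is a simplified illustration, not the MCScanX implementation: the function and argument names are hypothetical, collinear anchor genes are assumed to have been detected separately, and pairs located on different chromosomes or separated by a rank difference of 20 or more are treated as dispersed, which is our reading of the cut-off stated in the text.</p>
<preformat>
```python
# Higher value = higher priority when a gene qualifies for several modes.
PRIORITY = {"singleton": 0, "dispersed": 1, "proximal": 2, "tandem": 3, "segmental/WGD": 4}


def classify_duplicates(positions, blast_pairs, collinear_anchors, proximal_cutoff=20):
    """Assign one duplication mode per gene.

    positions: {gene: (chromosome, rank along the chromosome)}
    blast_pairs: iterable of (gene_a, gene_b) pairs supported by BLASTp hits
    collinear_anchors: genes found in collinear blocks (must be keys of positions)
    """
    modes = {gene: "singleton" for gene in positions}  # every gene starts as a singleton

    def promote(gene, mode):
        # Re-label only if the new mode outranks the current one.
        if PRIORITY[mode] > PRIORITY[modes[gene]]:
            modes[gene] = mode

    for gene_a, gene_b in blast_pairs:
        if gene_a == gene_b:
            continue  # ignore self hits
        chrom_a, rank_a = positions[gene_a]
        chrom_b, rank_b = positions[gene_b]
        gap = abs(rank_a - rank_b)
        if chrom_a != chrom_b or gap >= proximal_cutoff:
            mode = "dispersed"  # assumption: anything beyond the cut-off
        elif gap == 1:
            mode = "tandem"
        else:
            mode = "proximal"
        promote(gene_a, mode)
        promote(gene_b, mode)

    for gene in collinear_anchors:
        promote(gene, "segmental/WGD")
    return modes


if __name__ == "__main__":
    positions = {"g1": ("chr1", 1), "g2": ("chr1", 2), "g3": ("chr1", 10), "g4": ("chr2", 5)}
    pairs = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
    print(classify_duplicates(positions, pairs, collinear_anchors={"g4"}))
```
</preformat>
<p>Because promotion only ever moves a gene up the priority order, the result does not depend on the order in which BLASTp pairs are processed.</p>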
</sec>
<sec id="S2.SS8">
<title>Transcriptome Analysis and Hierarchical Clustering</title>
<p>Gene expression levels of the 28 START genes in the major globally cultivated rice variety <italic>Ojap</italic><sub><italic>c</italic></sub> were investigated using the RNA-seq dataset &#x201C;Os_mRNAseq_Rice_GL-0&#x201D; (MSU v7.0) on the Genevestigator platform (<xref ref-type="bibr" rid="B19">Hruz et al., 2008</xref>). The conditional search tool was used to analyze gene expression across nine developmental stages and 13 anatomical parts, and the log-transformed values were arranged by hierarchical clustering based on Pearson correlation coefficients between START genes, with optimal leaf ordering selected for both developmental stages and anatomical parts. The heatmaps were generated using MeV v4.8 (<xref ref-type="bibr" rid="B56">Saeed et al., 2003</xref>).</p>
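<p>The correlation measure underlying the clustering described above can be sketched in plain Python. The sketch computes pairwise distances as one minus the Pearson correlation coefficient between log-transformed expression profiles; using this particular distance as clustering input is a common convention and an assumption here, not a statement of how the Genevestigator platform works internally. The function names and gene labels are hypothetical.</p>
<preformat>
```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Raises ZeroDivisionError if either profile has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distances(profiles):
    """profiles: {gene: [expression value per stage]} -> {(gene1, gene2): 1 - r}."""
    genes = sorted(profiles)
    return {
        (g1, g2): 1.0 - pearson(profiles[g1], profiles[g2])
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
    }


if __name__ == "__main__":
    # Made-up log-transformed values across four developmental stages.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.1, 3.9, 6.2, 8.0],
        "geneC": [4.0, 3.0, 2.0, 1.0],
    }
    for pair, distance in correlation_distances(profiles).items():
        print(pair, round(distance, 3))
```
</preformat>
<p>Gene pairs with similar expression trends across stages receive distances near 0, while pairs with opposite trends approach 2; a hierarchical clustering routine would then merge the closest pairs first.</p>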
</sec>
</sec>
<sec sec-type="results" id="S3">
<title>Results</title>
<sec id="S3.SS1">
<title>Identification of START Genes Amongst Wild and Cultivated Varieties of Rice</title>
<p>The HMM profile, constructed from known sequences, was used to run hmmsearch against the 10 <italic>Oryza</italic> proteomes (listed in <xref ref-type="table" rid="T1">Table 1</xref>). Only hits that met the minimum length criterion and were validated for the presence of START, as described in section &#x201C;Materials and Methods,&#x201D; were retained. This led to the identification of 360 START proteins (including protein isoforms), encoded by 249 genes across the 10 species of rice. In order to remove redundancy, only the single longest protein encoded by each gene was retained for downstream analysis. The number of START coding genes was found to vary from 22 to 28 in these <italic>Oryza</italic> species, as shown in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Identification and domain structure analysis of START proteins across cultivated and wild rice species.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Name of plants and (genotype)</td>
<td valign="top" align="center">Species code</td>
<td valign="top" align="center">Total genes in each genome</td>
<td valign="top" align="center">No. of START genes</td>
<td valign="top" align="center">HZSM</td>
<td valign="top" align="center">SM</td>
<td valign="top" align="center">HZS</td>
<td valign="top" align="center">HS</td>
<td valign="top" align="center">PSD</td>
<td valign="top" align="center">PS</td>
<td valign="top" align="center">SD</td>
<td valign="top" align="center">mS</td>
<td valign="top" align="center">TM containing START proteins</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="13">Cultivated rice species</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza sativa</italic> var. <italic>japonica</italic> (AA)</td>
<td valign="top" align="center"><italic>Ojap</italic><sub><italic>c</italic></sub></td>
<td valign="top" align="center">42,189</td>
<td valign="top" align="center">28</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">2</td>
<td/>
<td valign="top" align="center">1</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza sativa</italic> var. <italic>indica</italic> (AA)</td>
<td valign="top" align="center"><italic>Oind</italic><sub><italic>c</italic></sub></td>
<td valign="top" align="center">42,031</td>
<td valign="top" align="center">27</td>
<td valign="top" align="center">5</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">3</td>
<td/>
<td valign="top" align="center">1</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza glaberrima</italic> (AA)</td>
<td valign="top" align="center"><italic>Ogla</italic><sub><italic>c</italic></sub></td>
<td valign="top" align="center">34,130</td>
<td valign="top" align="center">24</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">1</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left" colspan="13">Wild rice species</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza rufipogon</italic> (AA)</td>
<td valign="top" align="center"><italic>Oruf</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">37,912</td>
<td valign="top" align="center">25</td>
<td valign="top" align="center">5</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">2</td>
<td/>
<td/>
<td valign="top" align="center">7</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza nivara</italic> (AA)</td>
<td valign="top" align="center"><italic>Oniv</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">37,026</td>
<td valign="top" align="center">26</td>
<td valign="top" align="center">5</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">4</td>
<td/>
<td/>
<td valign="top" align="center">6</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza barthii</italic> (AA)</td>
<td valign="top" align="center"><italic>Obar</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">35,553</td>
<td valign="top" align="center">23</td>
<td valign="top" align="center">5</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">2</td>
<td/>
<td/>
<td valign="top" align="center">6</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza glumaepatula</italic> (AA)</td>
<td valign="top" align="center"><italic>Oglu</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">36,379</td>
<td valign="top" align="center">25</td>
<td valign="top" align="center">5</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">3</td>
<td/>
<td/>
<td valign="top" align="center">6</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza meridionalis</italic> (AA)</td>
<td valign="top" align="center"><italic>Omer</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">30,241</td>
<td valign="top" align="center">22</td>
<td valign="top" align="center">5</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">3</td>
<td/>
<td/>
<td valign="top" align="center">5</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza punctata</italic> (BB)</td>
<td valign="top" align="center"><italic>Opun</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">32,550</td>
<td valign="top" align="center">24</td>
<td valign="top" align="center">6</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">2</td>
<td/>
<td/>
<td valign="top" align="center">4</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza brachyantha</italic> (FF)</td>
<td valign="top" align="center"><italic>Obra</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">32,463</td>
<td valign="top" align="center">25</td>
<td valign="top" align="center">5</td>
<td/>
<td valign="top" align="center">2</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left">Total</td>
<td valign="top" align="justify"/>
<td valign="top" align="justify"/>
<td valign="top" align="center">249</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">20</td>
<td valign="top" align="center">87</td>
<td valign="top" align="center">25</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">59</td>
<td valign="top" align="center">17</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>HZSM, HD bZIP START MEKHLA; SM, START MEKHLA; HZS, HD bZIP START; HS, HD START; PSD, PH START DUF1336; PS, PH START; SD, START DUF1336; mS, minimal START; TM, transmembrane segments. The species are first categorized based on their cultivated and wild nature and then ranked in the order of evolution, from recent to oldest evolved.</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<p>As can be seen from <xref ref-type="table" rid="T1">Table 1</xref>, the most widely cultivated varieties <italic>Ojap</italic><sub><italic>c</italic></sub> and <italic>Oind</italic><sub><italic>c</italic></sub> possess the highest numbers of START coding genes (28 and 27, respectively) when compared to the early evolved African cultivated species <italic>Ogla</italic><sub><italic>c</italic></sub> (24) and the seven wild species (22&#x2013;26). The oldest AA variety <italic>Omer</italic><sub><italic>w</italic></sub> has 22 START coding genes, the lowest among all, although START numbers do not vary greatly between species and are proportional to genome size in most cases. Among the wild varieties, the earliest evolved <italic>Obra</italic><sub><italic>w</italic></sub> has the same number of START genes (25) as the most recently evolved wild variety <italic>Oruf</italic><sub><italic>w</italic></sub>. The breakdown of these START domains, in terms of potential functions based on domain combinations, is explored further in the next section, but these general numbers suggest that the increase in START domains among cultivated rices may reflect an evolving role of STARTs in stress induction or stress response. For accession IDs of START coding genes found in all 10 species, along with protein and domain information, see <xref ref-type="supplementary-material" rid="TS1">Supplementary Table 1</xref>.</p>
</sec>
<sec id="S3.SS2">
<title>Domain Structure Analysis and Classification of START Proteins</title>
<p>The DSA was carried out to identify the additional domains associated with START domains and to understand their arrangement within the START proteins. START proteins are known to exist both as minimal START domains and in association with other functional domains, and this domain organization has been used as a criterion for their classification, along with information on the specific ligands that they bind (<xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>; <xref ref-type="bibr" rid="B3">Alpy and Tomasetto, 2005</xref>). We explored the domain structure of all 249 identified START domains and classified them into eight groups, as shown in <xref ref-type="table" rid="T1">Table 1</xref>, depending on the combinatorial patterns of STARTs with additional domains such as the HD, bZIP (basic leucine zipper), MEKHLA, PH, and DUF1336 (domain of unknown function) domains. Six of these eight groups have been reported earlier (<xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>, <xref ref-type="bibr" rid="B58">2014</xref>), namely (a) mS, i.e., minimal START lacking any additional domains, (b) HS (having HD), (c) HZS (containing HD and bZIP), (d) HZSM (having HD, bZIP, and MEKHLA), (e) PSD (with PH and DUF1336), and (f) SD (START with DUF1336), while two new combinations (not reported earlier) were also seen, namely (g) SM (i.e., START with MEKHLA) and (h) PS (START with PH). Interestingly, these last two combinations are the only ones that are completely absent either from the cultivated varieties (as in the case of PS) or from the wild varieties (as in the case of SM).</p>
<p>As can be seen in <xref ref-type="table" rid="T1">Table 1</xref>, almost 24% of rice START domains (59 of 249) belong to the minimal START class (i.e., lacking any additional domains), while homeodomains constitute the largest category of domains co-occurring with STARTs. The recently evolved cultivated rices (<italic>Ojap</italic><sub><italic>c</italic></sub> or <italic>Oind</italic><sub><italic>c</italic></sub>) have a higher number of minimal STARTs than the early evolved wild species (<italic>Obra</italic><sub><italic>w</italic></sub> or <italic>Opun</italic><sub><italic>w</italic></sub>). We have previously shown that the HD associated with STARTs in plants has unique roles in plant transcription (<xref ref-type="bibr" rid="B58">Schrick et al., 2014</xref>), and this seems to be an ancient feature since all wild rice species also have the homeodomains. The HDs are always found in association with a leucine zipper in the class III and class IV HD-zip families of plant START proteins. Over 60% of the 249 identified domains in rice have these homeodomains in combination with leucine zippers, which, in turn, can be of two types: (a) class III HD ZIP START domains, with a universally conserved basic leucine zipper known as bZIP, and (b) class IV HD ZIP STARTs, with a plant-exclusive leucine zipper known as ZLZ (<xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>; <xref ref-type="bibr" rid="B6">Ariel et al., 2007</xref>). Another domain, MEKHLA, is often seen associated with the class III HD bZIP START proteins (<xref ref-type="bibr" rid="B44">Mukherjee and B&#x00FC;rglin, 2006</xref>); its pairing with START alone, without HD bZIP (SM family), is completely missing from all the wild rice species, as can be seen from <xref ref-type="table" rid="T1">Table 1</xref>. Our DSA methodology is based on CDD (<xref ref-type="bibr" rid="B40">Marchler-Bauer et al., 2014</xref>), which does not recognize the ZLZ; hence, we use the term &#x201C;HD-START&#x201D; for class IV type proteins throughout this study.</p>
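<p>The class shares quoted above can be re-derived from the &#x201C;Total&#x201D; row of <xref ref-type="table" rid="T1">Table 1</xref>. The following is a minimal sketch rather than part of the original analysis pipeline; the variable and function names are illustrative, and it assumes that the HZSM, HZS, and HS classes together make up the homeodomain plus leucine zipper group (HS being the class IV proteins whose ZLZ is not recognized by CDD).</p>
<preformat>
```python
# Class totals copied from the "Total" row of Table 1.
totals = {
    "HZSM": 49, "SM": 3, "HZS": 20, "HS": 87,
    "PSD": 25, "PS": 1, "SD": 5, "mS": 59,
}

# Sum over the eight structural classes (should equal 249).
all_starts = sum(totals.values())


def share(classes):
    """Percentage of all START proteins that fall in the given classes."""
    return 100.0 * sum(totals[c] for c in classes) / all_starts


# Minimal STARTs, i.e., no additional domain.
minimal_share = share(["mS"])

# Assumption: HD plus leucine zipper group = HZSM + HZS + HS.
hd_zip_share = share(["HZSM", "HZS", "HS"])

print(f"total STARTs: {all_starts}")
print(f"minimal START share: {minimal_share:.1f}%")
print(f"HD with leucine zipper share: {hd_zip_share:.1f}%")
```
</preformat>
<p>Under these assumptions, the sketch gives a minimal START share just under 24% and a homeodomain-containing share above 60%, in line with the figures stated in the text.</p>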
<p>Interestingly, the difference between the domain structure of wild and cultivated rice does not appear to arise from the homeodomain-containing STARTs, all of which occur in large numbers and with moderate uniformity across all rices (see <xref ref-type="table" rid="T1">Table 1</xref>). Apart from the HD-containing START domains, the other two major domains that co-occur with STARTs are the PH domain at the N-terminus and the DUF1336 domain at the C-terminus. These form unusual dual combinations that are starkly distinct between wild and cultivated rices: START DUF1336 (five), START MEKHLA (three), and PH START (one), the latter two having been observed for the first time in this work, as mentioned earlier. In fact, a START domain in combination with the PH domain alone has only been observed in the earliest evolved wild rice, namely <italic>Obra</italic><sub><italic>w</italic></sub>. Similarly, very few domains show the combination of the START domain with DUF1336 alone, but the triple combination (PSD category) is seen frequently (35% of non-HD START combinations) across all rices, suggesting that these three domains are more effective in combination than alone. PH domains are well known for their roles in intracellular signaling and as constituents of cytoskeletal proteins. This domain also binds phosphatidylinositol within biological membranes, thus playing roles in membrane recruitment, subcellular targeting, or enabling interactions with other components of the signal transduction pathways (<xref ref-type="bibr" rid="B42">Mayer et al., 1993</xref>; <xref ref-type="bibr" rid="B22">Ingley and Hemmings, 1994</xref>). 
Intriguingly, another connection to membranes is evident in minimal STARTs, about 30% of which (17 of 59) were found to have transmembrane (TM) segments (11 with two TM segments and 6 with a single TM segment) and which show strong similarity to a specific class of mammalian STARTs, namely the phosphatidylcholine transfer proteins (STARD2/PCTP) that preferentially bind to phosphatidylcholine (<xref ref-type="bibr" rid="B57">Satheesh et al., 2016</xref>). The fact that PH domains are present singly with START domains in the earliest known rice and not elsewhere, together with the presence of TM segments only in minimal STARTs, suggests that the association with other domains began at membrane interfaces, and that the addition of other, newer domains may have been critical to the evolution of START functional diversity. An illustration of the domain organization of the 28 START proteins from <italic>Ojap</italic><sub><italic>c</italic></sub> is given in <xref ref-type="fig" rid="F1">Figure 1</xref>. The detailed DSA report of all 249 START proteins, along with the domain sizes and positions, is provided in <xref ref-type="supplementary-material" rid="TS1">Supplementary Table 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Schematic overview of the domain arrangement of START proteins and the gene structure of all 28 START genes in <italic>Oryza sativa</italic> var. <italic>japonica.</italic> The index shows exonic colors for homeodomain (HD), homeodomain-associated basic leucine zipper domain (HD bZIP), START domain, PH domain, domain of unknown function (DUF1336), and MEKHLA. The exon portions that code for the protein inter-domain regions (gray colored regions in the domain arrangement) are depicted in mint shades in the gene structure. (&#x002A;Gene pairs are proximally duplicated and show higher resemblance in gene structure).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-737194-g001.tif"/>
</fig>
</sec>
<sec id="S3.SS3">
<title>Gene Structure Analysis of START Coding Genes</title>
<p>Gene structure analysis (GSA) for all 249 START domains was performed as described in section &#x201C;Materials and Methods,&#x201D; and the results are depicted in <xref ref-type="table" rid="T2">Table 2</xref>, which lists exon numbers for each functional domain within and between the eight START categories described in the previous section. The GSA for the main cultivated variety <italic>Ojap</italic><sub><italic>c</italic></sub>, along with its full-length protein domain pattern, is depicted in <xref ref-type="fig" rid="F1">Figure 1</xref>, whereas full gene structure maps and complete exon&#x2013;intron details of all START domains in all 10 rices are provided as <xref ref-type="supplementary-material" rid="TS1">Supplementary Tables 1</xref>, <xref ref-type="supplementary-material" rid="TS2">2</xref> and <xref ref-type="supplementary-material" rid="FS1">Supplementary Figures 1A&#x2013;J</xref>. In general, gene sizes vary between 0.5 and 32 kb, with some of the minimal STARTs in the oldest wild rice <italic>Obra</italic><sub><italic>w</italic></sub> encoded by a single exon, while a few PSD class STARTs in wild rices have up to 34 exons. The overall pattern (see <xref ref-type="table" rid="T2">Table 2</xref>) is that exon numbers for the START region itself are quite conserved within specific structural classes, and the variability between wild and cultivated rice stems from the exon numbers of the associated domains in these proteins. Exon numbers are also highly variable across the eight classes of START domains, with the greatest variability seen in the minimal STARTs, which contain very large intronic regions and long upstream and downstream UTRs (up to 17 kb), suggesting a potential for the addition of new domains, exon creation, and alternative splicing. 
<xref ref-type="fig" rid="F1">Figure 1</xref> reveals similarities between the gene structure of the newly observed SM class of STARTs and that of other categories: the gene structure of one of the SM genes is almost identical to that of HZSM members, apart from the loss of the HZ fragment, whereas the other SM gene has a gene structure identical to that of a member of the minimal STARTs (<xref ref-type="fig" rid="F1">Figure 1</xref>), suggesting a gain of function. The genes in each pair are also proximally duplicated; the SM genes show higher expression across both anatomical parts and developmental stages, whereas the mS genes are poorly expressed (see the section on gene expression). The two genes of each pair lie on the same chromosome, adding support to these hypotheses, as discussed in the subsequent sections on chromosomal mapping and phylogenetics.</p>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Gene structure analysis: number of exons involved in coding full-length START proteins for different structural classes amongst 10 cultivated and wild <italic>Oryza</italic> species.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Name of plants</td>
<td valign="top" align="center">HZSM</td>
<td valign="top" align="center">SM</td>
<td valign="top" align="center">HZS</td>
<td valign="top" align="center">HS</td>
<td valign="top" align="center">PSD</td>
<td valign="top" align="center">PS</td>
<td valign="top" align="center">SD</td>
<td valign="top" align="center">mS</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="9"><bold>Cultivated rice species</bold></td>
</tr>
<tr>
<td valign="top" align="left"><italic>Ojap</italic><sub><italic>c</italic></sub></td>
<td valign="top" align="center">18 or 19 (8)</td>
<td valign="top" align="center">14 (4 or 5)</td>
<td valign="top" align="center">9 (4)</td>
<td valign="top" align="center">4&#x2013;12 (1 or 4)</td>
<td valign="top" align="center">20 or 22 (7 or 8)</td>
<td/>
<td valign="top" align="center">23 (7)</td>
<td valign="top" align="center">2&#x2013;10 (2&#x2013;5)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oind</italic><sub><italic>c</italic></sub></td>
<td valign="top" align="center">15&#x2013;19 (7 or 8)</td>
<td/>
<td valign="top" align="center">8 or 9 (4)</td>
<td valign="top" align="center">4&#x2013;12 (1&#x2013;5)</td>
<td valign="top" align="center">19&#x2013;22 (7 or 8)</td>
<td/>
<td valign="top" align="center">23 (7)</td>
<td valign="top" align="center">3&#x2013;7 (2&#x2013;5)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Ogla</italic><sub><italic>c</italic></sub></td>
<td valign="top" align="center">18 (8)</td>
<td valign="top" align="center">14 (6)</td>
<td valign="top" align="center">8 or 9 (4)</td>
<td valign="top" align="center">3&#x2013;11 (1&#x2013;4)</td>
<td valign="top" align="center">21 (7)</td>
<td/>
<td valign="top" align="center">20 or 22 (7 or 8)</td>
<td valign="top" align="center">3 or 6 (2&#x2013;5)</td>
</tr>
<tr>
<td valign="top" align="left" colspan="9"><bold>Wild rice species</bold></td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oruf</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">16&#x2013;20 (7 or 8)</td>
<td/>
<td valign="top" align="center">9 (4)</td>
<td valign="top" align="center">5&#x2013;12 (1 or 4)</td>
<td valign="top" align="center">20 or 24 (5 or 7)</td>
<td/>
<td/>
<td valign="top" align="center">2&#x2013;7 (2&#x2013;5)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oniv</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">16&#x2013;21 (7 or 8)</td>
<td/>
<td valign="top" align="center">8 or 9 (4)</td>
<td valign="top" align="center">5&#x2013;13 (1 or 4)</td>
<td valign="top" align="center">20&#x2013;25 (5 or 7)</td>
<td/>
<td/>
<td valign="top" align="center">3&#x2013;11 (2&#x2013;5)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Obar</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">16&#x2013;20 (7 or 8)</td>
<td/>
<td valign="top" align="center">9 (4)</td>
<td valign="top" align="center">9&#x2013;12 (3 or 4)</td>
<td valign="top" align="center">20 or 23 (5 or 7)</td>
<td/>
<td/>
<td valign="top" align="center">4&#x2013;10 (3&#x2013;5)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oglu</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">16&#x2013;20 (7 or 8)</td>
<td/>
<td valign="top" align="center">9 or 10 (4)</td>
<td valign="top" align="center">5&#x2013;14 (1&#x2013;5)</td>
<td valign="top" align="center">19&#x2013;26 (2&#x2013;7)</td>
<td/>
<td/>
<td valign="top" align="center">2&#x2013;19 (2&#x2013;5)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Omer</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">17&#x2013;21 (7 or 8)</td>
<td/>
<td valign="top" align="center">9 or 10 (4)</td>
<td valign="top" align="center">5&#x2013;15 (1&#x2013;4)</td>
<td valign="top" align="center">19&#x2013;23 (2&#x2013;5)</td>
<td/>
<td/>
<td valign="top" align="center">6&#x2013;15 (4 or 5)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Opun</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">15&#x2013;20 (7 or 8)</td>
<td/>
<td valign="top" align="center">9 (4)</td>
<td valign="top" align="center">5&#x2013;12 (1&#x2013;4)</td>
<td valign="top" align="center">19 or 34 (6 or 8)</td>
<td/>
<td/>
<td valign="top" align="center">3&#x2013;7 (2&#x2013;5)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Obra</italic><sub><italic>w</italic></sub></td>
<td valign="top" align="center">18&#x2013;20 (7 or 8)</td>
<td/>
<td valign="top" align="center">11 (4)</td>
<td valign="top" align="center">5&#x2013;13 (2&#x2013;4)</td>
<td valign="top" align="center">20&#x2013;24 (7 or 8)</td>
<td valign="top" align="center">12 (6)</td>
<td valign="top" align="center">23 (8)</td>
<td valign="top" align="center">1&#x2013;8 (1&#x2013;5)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>The values given in parentheses are the number of exons that code for START domains regions alone. Acronyms/codes same as <xref ref-type="table" rid="T1">Table 1</xref>.</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<p>Apart from the above-mentioned differences in the number of exons, START coding genes also vary in intron length and untranslated regions (UTRs) across the cultivated and wild rices. Much remains unknown about the UTRs flanking both termini of the mRNA (the 5&#x2032;-UTR and the 3&#x2032;-UTR); although UTRs have often been implicated in the regulation of gene expression, they need to be investigated further. Almost one-third of all START coding genes reveal sections of 3&#x2032;-UTR and 5&#x2032;-UTR, some of them very long (ranging from a few nucleotides up to 17 kb), but the Asian and African cultivated rices <italic>Oind</italic><sub><italic>c</italic></sub> and <italic>Ogla</italic><sub><italic>c</italic></sub>, respectively, appear to completely lack these at both termini. Clearly, cultivated rices vary by ancestors, and this is reflected in their inherent genetic diversity, as observed between <italic>Oind</italic><sub><italic>c</italic></sub> and <italic>Ojap</italic><sub><italic>c</italic></sub>. As can be seen in <xref ref-type="supplementary-material" rid="FS1">Supplementary Figures 1A&#x2013;J</xref> and <xref ref-type="supplementary-material" rid="TS2">Supplementary Table 2</xref>, UTRs were observed to be very long in minimal START genes, along with very long introns, both features suggesting the potential for evolution <italic>via</italic> introduction of new function. Among the various classes of START genes, HZSM shows the shortest exon and intron regions, while the PSD and SD classes show a distinctive combination of several exons and longer introns, aside from long stretches of UTR. Most classes have exons flanked by long introns, but the HZSM and PSD classes have exons flanked by short introns. Cultivated rices have fewer cases of long flanking introns, which further emphasizes the greater genetic diversity in wild rices and their scope for exon creation, alternative splicing, and addition of functional features.</p>
</sec>
<sec id="S3.SS4">
<title>Ortholog Analysis and Chromosomal Distribution</title>
<p>The putative START coding genes were mapped onto chromosomes based on their gene location and karyotype information. <xref ref-type="fig" rid="F2">Figure 2</xref> depicts this for all 10 <italic>Oryza</italic> genomes, and it is clear that, despite variation in numbers, START genes show positional and structural conservation on the corresponding chromosomes, with slight variations in some genes reflecting syntenic block shuffling, which may in turn be due to (a) fragment rearrangements among chromosomes during speciation events, or (b) isolated gene relocation events due to homologous recombination or to viral or transposon-based gene relocation mechanisms.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Distribution of all eight START categories (colored dots shown internal to chromosomes in circular ribbons and lines on chromosomes) across 10 <italic>Oryza</italic> genomes (each ribbon representing a chromosome in circular form). The 10 rices are arranged in order of cultivated (1&#x2013;3) and wild (4&#x2013;10), and represented in evolutionary order from recently evolved to the oldest: 1-<italic>Oryza sativa</italic> var. <italic>japonica</italic>, 2-<italic>Oryza sativa</italic> var. <italic>indica</italic>, 3-<italic>Oryza glaberrima</italic>, 4-<italic>Oryza rufipogon</italic>, 5-<italic>Oryza nivara</italic>, 6-<italic>Oryza barthii</italic>, 7-<italic>Oryza glumaepatula</italic>, 8-<italic>Oryza meridionalis</italic>, 9-<italic>Oryza punctata</italic>, and 10-<italic>Oryza brachyantha</italic>. START types represented by circular glyphs/dots: blue, Minimal START; gray, SD; very light green, SM; yellow, PS; purple, PSD; red, HS; light orange, HZS; and green, HZSM.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-737194-g002.tif"/>
</fig>
<p>The START genes are distributed among 11 of the 12 chromosomes across all wild and cultivated rice species (<xref ref-type="fig" rid="F2">Figure 2</xref>), with the highest numbers mapping to chromosomes 8 and 10, while Chr 5 is devoid of START genes (with the single exception of one START gene in the early evolved <italic>Omer</italic><sub><italic>w</italic></sub>). The HZSM class of START genes is predominantly located on chromosomes 1, 3, 10, and 12. Surprisingly, two HZS gene orthologs are consistently present, one each on chromosomes 2 and 8, except in <italic>Oniv</italic><sub><italic>w</italic></sub>, where one HZS gene was seen on Chr 10 instead of Chr 8. It may be recalled that SM, a special class of START that was not seen in any of the wild rices, occurs on Chr 1 in the cultivated rices <italic>Ojap</italic><sub><italic>c</italic></sub> and <italic>Ogla</italic><sub><italic>c</italic></sub> and shares homology with HZSM on the same chromosome in the eight other rices, suggesting a possible loss of the HZ fragment from some members of the HZSM class. In contrast, the other SM gene, on Chr 6 of <italic>Ojap</italic><sub><italic>c</italic></sub>, showed homology with the minimal START on Chr 6 of <italic>Oruf</italic><sub><italic>w</italic></sub>, <italic>Oniv</italic><sub><italic>w</italic></sub>, <italic>Obar</italic><sub><italic>w</italic></sub>, and <italic>Oglu</italic><sub><italic>w</italic></sub> and is possibly an example of gain of function. These observations match the pairwise GSA patterns observed in the previous section and are further supported by the corresponding pairs of genes being orthologous, as shown in <xref ref-type="table" rid="T3">Table 3</xref>. 
This table lists the orthologous genes in all 10 rices, using the recently evolved cultivated variety <italic>Ojap</italic><sub><italic>c</italic></sub> as the reference for the other nine rice genomes, as described in section &#x201C;Materials and Methods.&#x201D; Interestingly, this table also shows that PS (a special class of STARTs seen only in the oldest wild rice, <italic>Obra</italic><sub><italic>w</italic></sub>) is orthologous to a member of the PSD class in the cultivated varieties. In contrast, the members of the SD class, observed only in the cultivated <italic>Oryza</italic> species and the oldest wild rice, are similar to each other but do not have any orthologs in the other rice genomes, not even in the immediate ancestors of the three cultivated varieties. Overall, the findings of this section support the idea of specialized functional roles for each of the eight START classes in plants, and we further explore this aspect in later sections.</p>
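<p>The class comparison between reference genes and their orthologs can be illustrated with a short script. This is a minimal sketch, not the method used in this study: it relies only on the convention in <xref ref-type="table" rid="T3">Table 3</xref> that each identifier ends with its structural class after the final underscore, the helper names <italic>start_class</italic> and <italic>class_changes</italic> are hypothetical, and only a subset of the orthologs of the two SM genes is included, with identifiers written without internal spaces.</p>
<preformat>
```python
# Subset of two Table 3 rows: japonica reference gene -> orthologs.
# Each identifier carries its structural class after the last underscore.
rows = {
    "LOC_Os01g10320_SM": [
        "BGIOSGA002186_HZSM",
        "ORGLA01G0060700_SM",
        "ORUFI01G06940_HZSM",
    ],
    "LOC_Os06g50510_SM": [
        "ORUFI06G29670_mS",
        "ONIVA06G30280_mS",
    ],
}


def start_class(gene_id):
    """Return the structural class suffix of a Table 3 identifier."""
    return gene_id.rsplit("_", 1)[1]


def class_changes(reference, orthologs):
    """List (ortholog, class) pairs whose class differs from the reference."""
    ref_class = start_class(reference)
    return [
        (gene, start_class(gene))
        for gene in orthologs
        if start_class(gene) != ref_class
    ]


for reference, orthologs in rows.items():
    changed = class_changes(reference, orthologs)
    print(reference, "->", changed)
```
</preformat>
<p>For the Chr 1 gene, the orthologs flagged by this sketch belong to the HZSM class, whereas for the Chr 6 gene they belong to the mS class, which corresponds to the loss-of-fragment and gain-of-function scenarios discussed above.</p>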
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>START genes in <italic>Oryza sativa</italic> var. <italic>japonica</italic> and their best orthologs amongst other nine rice species.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="center" colspan="3">Cultivated rice species<hr/></td>
<td valign="top" align="center" colspan="7">Wild rice species<hr/></td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza sativa</italic> var. <italic>japonica</italic></td>
<td valign="top" align="left"><italic>Oryza sativa</italic> var. <italic>indica</italic></td>
<td valign="top" align="left"><italic>Oryza glaberrima</italic></td>
<td valign="top" align="left"><italic>Oryza rufipogon</italic></td>
<td valign="top" align="left"><italic>Oryza nivara</italic></td>
<td valign="top" align="left"><italic>Oryza barthii</italic></td>
<td valign="top" align="left"><italic>Oryza glumaepatula</italic></td>
<td valign="top" align="left"><italic>Oryza meridionalis</italic></td>
<td valign="top" align="left"><italic>Oryza punctata</italic></td>
<td valign="top" align="left"><italic>Oryza brachyantha</italic></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">LOC_Os03g01890_HZSM</td>
<td valign="top" align="left">BGIOSGA011687_HZSM</td>
<td valign="top" align="left">ORGLA03G0005300_HZSM</td>
<td valign="top" align="left">ORUFI03G00510_HZSM</td>
<td valign="top" align="left">ONIVA11G21180_HZSM</td>
<td valign="top" align="left">OBART03G00720_HZSM</td>
<td valign="top" align="left">OGLUM03G00670_HZSM</td>
<td valign="top" align="left">OMERI03G00480_HZSM</td>
<td valign="top" align="left">OPUNC03G00630_HZSM OPUNC03G00670_HZSM</td>
<td valign="top" align="left">OB03G10760_HZSM</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os03g43930_HZSM</td>
<td valign="top" align="left">BGIOSGA013211_HZSM</td>
<td valign="top" align="left">ORGLA03G0253300_HZSM</td>
<td valign="top" align="left">ORUFI03G28280_HZSM</td>
<td valign="top" align="left">ONIVA03G28370_HZSM</td>
<td valign="top" align="left">OBART03G27200_HZSM</td>
<td valign="top" align="left">OGLUM03G27880_HZSM</td>
<td valign="top" align="left">OMERI03G00550_HZSM</td>
<td valign="top" align="left">OPUNC03G24840_HZSM</td>
<td valign="top" align="left">OB03G35020_HZSM</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os10g33960_HZSM</td>
<td valign="top" align="left">BGIOSGA033144_HZSM</td>
<td valign="top" align="left">ORGLA10G0115700_HZSM</td>
<td valign="top" align="left">ORUFI10G14170_HZSM</td>
<td valign="top" align="left">ONIVA10G14800_HZSM</td>
<td valign="top" align="left">OBART10G13430_HZSM</td>
<td valign="top" align="left">OGLUM10G13240_HZSM</td>
<td valign="top" align="left">OMERI10G10230_HZSM</td>
<td valign="top" align="left">OPUNC10G11850_HZSM</td>
<td valign="top" align="left">OB10G20600_HZSM</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os12g41860_HZSM</td>
<td valign="top" align="left">BGIOSGA035845_HZSM</td>
<td valign="top" align="left">ORGLA12G0158900_HZSM</td>
<td valign="top" align="left">ORUFI12G20830_HZSM</td>
<td valign="top" align="left">ONIVA12G17580_HZSM</td>
<td valign="top" align="left">OBART12G18630_HZSM</td>
<td valign="top" align="left">OGLUM12G20280_HZSM</td>
<td valign="top" align="left">OMERI12G14040_HZSM</td>
<td valign="top" align="left">OPUNC12G16840_HZSM</td>
<td valign="top" align="left">OB12G25330_HZSM</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os01g10320_SM</td>
<td valign="top" align="left">BGIOSGA002186_HZSM</td>
<td valign="top" align="left">ORGLA01G0060700_SM</td>
<td valign="top" align="left">ORUFI01G06940_HZSM</td>
<td valign="top" align="left">ONIVA01G07970_HZSM</td>
<td valign="top" align="left">OBART01G06400_HZSM</td>
<td valign="top" align="left">OGLUM01G07380_HZSM</td>
<td valign="top" align="left">OMERI01G06500_HZSM</td>
<td valign="top" align="left">OPUNC01G06180_HZSM</td>
<td valign="top" align="left">OB01G16410_HZSM</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os06g50510_SM</td>
<td/>
<td/>
<td valign="top" align="left">ORUFI06G29670_mS</td>
<td valign="top" align="left">ONIVA06G30280_mS</td>
<td valign="top" align="left">OBART06G27690_mS</td>
<td valign="top" align="left">OGLUM06G29150_mS</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">LOC_Os02g45250_HZS</td>
<td valign="top" align="left">BGIOSGA005852_HZS</td>
<td valign="top" align="left">ORGLA02G0238300_HZS</td>
<td valign="top" align="left">ORUFI02G29080_HZS</td>
<td valign="top" align="left">ONIVA02G30520_HZS</td>
<td valign="top" align="left">OBART02G27630_HZS</td>
<td valign="top" align="left">OGLUM02G28170_HZS</td>
<td valign="top" align="left">OMERI02G26880_HZS</td>
<td valign="top" align="left">OPUNC02G25360_HZS</td>
<td valign="top" align="left">OB02G35000_HZS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os08g19590_HZS</td>
<td valign="top" align="left">BGIOSGA028396_HZS</td>
<td valign="top" align="left">ORGLA08G0080600_HZS</td>
<td valign="top" align="left">ORUFI08G10670_HZS</td>
<td valign="top" align="left">ONIVA10G10170_HZS</td>
<td valign="top" align="left">OBART08G09450_HZS</td>
<td valign="top" align="left">OGLUM08G10180_HZS</td>
<td valign="top" align="left">OMERI08G08060_HZS</td>
<td valign="top" align="left">OPUNC08G08700_HZS</td>
<td valign="top" align="left">OB08G18460_HZS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os01g55549_HS</td>
<td valign="top" align="left">BGIOSGA000782_HS</td>
<td valign="top" align="left">ORGLA01G0257700_HS</td>
<td valign="top" align="left">ORUFI01G34950_HS</td>
<td valign="top" align="left">ONIVA01G36080_HS</td>
<td valign="top" align="left">OBART01G31750_HS</td>
<td valign="top" align="left">OGLUM01G35890_HS</td>
<td valign="top" align="left">OMERI01G28810_HS</td>
<td valign="top" align="left">OPUNC01G30780_HS</td>
<td valign="top" align="left">OB01G40650_HS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os01g57890_HS</td>
<td valign="top" align="left">BGIOSGA004594_mS BGIOSGA004593_HS</td>
<td valign="top" align="left">ORGLA01G0275300_HS</td>
<td valign="top" align="left">ORUFI01G36780_HS</td>
<td valign="top" align="left">ONIVA01G38360_HS</td>
<td valign="top" align="left">OBART01G33600_HS</td>
<td valign="top" align="left">OGLUM01G37810_HS</td>
<td valign="top" align="left">OMERI01G30450_HS</td>
<td valign="top" align="left">OPUNC01G32670_HS</td>
<td valign="top" align="left">OB01G42420_HS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os04g48070_HS</td>
<td valign="top" align="left">BGIOSGA014527_HS</td>
<td valign="top" align="left">ORGLA04G0191300_HS</td>
<td valign="top" align="left">ORUFI04G23720_HS</td>
<td valign="top" align="left">ONIVA04G20660_HS</td>
<td valign="top" align="left">OBART04G22070_HS</td>
<td valign="top" align="left">OGLUM04G22100_HS</td>
<td valign="top" align="left">OMERI04G18610_HS</td>
<td valign="top" align="left">OPUNC04G19830_HS</td>
<td valign="top" align="left">OB04G29090_HS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os04g53540_HS</td>
<td valign="top" align="left">BGIOSGA014304_HS</td>
<td valign="top" align="left">ORGLA04G0231200_HS</td>
<td valign="top" align="left">ORUFI04G27800_HS</td>
<td valign="top" align="left">ONIVA04G25060_HS</td>
<td valign="top" align="left">OBART04G26630_HS</td>
<td valign="top" align="left">OGLUM04G26010_HS</td>
<td valign="top" align="left">OMERI04G21680_mS</td>
<td valign="top" align="left">OPUNC04G23630_HS</td>
<td valign="top" align="left">OB04G32930_HS OB08G12470_HS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os06g10600_HS</td>
<td valign="top" align="left">BGIOSGA021700_HS</td>
<td valign="top" align="left">ORGLA06G0063600_HS ORGLA06G0247800_HS</td>
<td valign="top" align="left">ORUFI06G07070_HS</td>
<td valign="top" align="left">ONIVA06G08130_HS</td>
<td valign="top" align="left">OBART06G06830_HS</td>
<td valign="top" align="left">OGLUM06G07200_HS</td>
<td valign="top" align="left">OMERI06G08000_HS</td>
<td valign="top" align="left">OPUNC06G06470_HS</td>
<td valign="top" align="left">OB06G16250_HS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os08g04190_HS</td>
<td valign="top" align="left">BGIOSGA027698_HS</td>
<td valign="top" align="left">ORGLA08G0016400_HS</td>
<td valign="top" align="left">ORUFI08G02510_HS</td>
<td valign="top" align="left">ONIVA08G02450_HS</td>
<td/>
<td valign="top" align="left">OGLUM08G02240_HS</td>
<td valign="top" align="left">OMERI08G02310_HS</td>
<td valign="top" align="left">OPUNC08G02090_HS</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">LOC_Os08g08820_HS</td>
<td valign="top" align="left">BGIOSGA028102_HS</td>
<td valign="top" align="left">ORGLA08G0042800_HS</td>
<td valign="top" align="left">ORUFI08G05670_HS</td>
<td valign="top" align="left">ONIVA08G05020_HS</td>
<td valign="top" align="left">OBART08G05070_HS</td>
<td valign="top" align="left">OGLUM08G05340_HS</td>
<td/>
<td valign="top" align="left">OPUNC08G04870_HS</td>
<td valign="top" align="left">OB08G15100_mS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os09g35760_HS</td>
<td valign="top" align="left">BGIOSGA029405_HS</td>
<td/>
<td valign="top" align="left">ORUFI09G18410_HS</td>
<td valign="top" align="left">ONIVA09G18120_HS</td>
<td valign="top" align="left">OBART09G17090_HS</td>
<td valign="top" align="left">OGLUM09G17630_HS</td>
<td valign="top" align="left">OMERI09G12650_HS</td>
<td valign="top" align="left">OPUNC09G15430_HS</td>
<td valign="top" align="left">OB09G23660_HS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os10g42490_HS</td>
<td valign="top" align="left">BGIOSGA031343_HS</td>
<td valign="top" align="left">ORGLA10G0146400_HS</td>
<td valign="top" align="left">ORUFI10G20730_HS</td>
<td valign="top" align="left">ONIVA10G22060_HS</td>
<td valign="top" align="left">OBART10G19440_HS</td>
<td valign="top" align="left">OGLUM10G19540_HS</td>
<td valign="top" align="left">OMERI10G15290_HS</td>
<td valign="top" align="left">OPUNC10G17890_HS OPUNC10G17920_HS</td>
<td valign="top" align="left">OB10G26620_HS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os02g01270_PSD</td>
<td valign="top" align="left">BGIOSGA007330_PSD BGIOSGA022187_PSD</td>
<td valign="top" align="left">ORGLA02G0002000_SD</td>
<td valign="top" align="left">ORUFI02G00290_PSD</td>
<td valign="top" align="left">ONIVA02G00280_PSD ONIVA06G01480_PSD</td>
<td valign="top" align="left">OBART02G00270_PSD</td>
<td valign="top" align="left">OGLUM02G00270_PSD OGLUM06G01090_PSD</td>
<td valign="top" align="left">OMERI02G09330_PSD OMERI06G01090_PSD</td>
<td valign="top" align="left">OPUNC11G05520_PSD</td>
<td valign="top" align="left">OB02G10290_PSD OB06G11230_PS OB10G20570_PSD</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os10g31770_PSD</td>
<td valign="top" align="left">BGIOSGA033068_PSD</td>
<td valign="top" align="left">ORGLA10G0106900_PSD</td>
<td valign="top" align="left">ORUFI10G12680_PSD</td>
<td valign="top" align="left">ONIVA10G11380_PSD ONIVA10G14690_PSD</td>
<td valign="top" align="left">OBART10G12110_PSD</td>
<td valign="top" align="left">OGLUM10G11870_PSD</td>
<td valign="top" align="left">OMERI10G09100_PSD</td>
<td valign="top" align="left">OPUNC10G10390_PSD</td>
<td valign="top" align="left">OB10G19410_PSD</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os08g34060_SD</td>
<td valign="top" align="left">BGIOSGA028734_SD</td>
<td valign="top" align="left">ORGLA08G0139700_SD</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">OB08G23520_SD</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os02g03230_mS</td>
<td valign="top" align="left">BGIOSGA007209_mS</td>
<td valign="top" align="left">ORGLA02G0018400_mS</td>
<td valign="top" align="left">ORUFI02G01930_mS</td>
<td valign="top" align="left">ONIVA02G01830_mS</td>
<td valign="top" align="left">OBART02G01950_mS</td>
<td valign="top" align="left">OGLUM02G01810_mS</td>
<td valign="top" align="left">OMERI02G02610_mS</td>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">LOC_Os02g26860_mS</td>
<td valign="top" align="left">BGIOSGA008180_mS</td>
<td valign="top" align="left">ORGLA02G0141000_mS</td>
<td valign="top" align="left">ORUFI02G16760_mS</td>
<td valign="top" align="left">ONIVA02G17540_mS</td>
<td valign="top" align="left">OBART02G16280_mS</td>
<td valign="top" align="left">OGLUM02G16340_mS</td>
<td valign="top" align="left">OMERI02G15840_mS</td>
<td valign="top" align="left">OPUNC02G14500_mS</td>
<td valign="top" align="left">OB02G24430_mS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os04g02910_mS</td>
<td valign="top" align="left">BGIOSGA015687_mS</td>
<td valign="top" align="left">ORGLA04G0008500_mS</td>
<td valign="top" align="left">ORUFI04G01090_mS</td>
<td valign="top" align="left">ONIVA04G00760_mS</td>
<td valign="top" align="left">OBART04G01070_mS</td>
<td valign="top" align="left">OGLUM04G00980_mS</td>
<td valign="top" align="left">OMERI04G00950_mS</td>
<td valign="top" align="left">OPUNC04G01070_mS</td>
<td valign="top" align="left">OB04G10780_mS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os06g50560_mS</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">LOC_Os06g50724_mS</td>
<td valign="top" align="left">BGIOSGA023612_mS</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">LOC_Os07g08760_mS</td>
<td valign="top" align="left">BGIOSGA024704_mS</td>
<td valign="top" align="left">ORGLA07G0045900_mS</td>
<td valign="top" align="left">ORUFI07G05330_mS</td>
<td valign="top" align="left">ONIVA07G04320_mS</td>
<td valign="top" align="left">OBART07G05510_mS</td>
<td valign="top" align="left">OGLUM07G04970_mS</td>
<td valign="top" align="left">OMERI05G17970_mS</td>
<td valign="top" align="left">OPUNC07G05340_mS</td>
<td valign="top" align="left">OB07G14050_mS</td>
</tr>
<tr>
<td valign="top" align="left">LOC_Os07g47130_mS</td>
<td/>
<td/>
<td valign="top" align="left">ORUFI07G26520_mS</td>
<td/>
<td/>
<td valign="top" align="left">OGLUM07G25570_mS</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">LOC_Os11g14070_mS</td>
<td valign="top" align="left">BGIOSGA034213_mS</td>
<td valign="top" align="left">ORGLA11G0072100_mS</td>
<td valign="top" align="left">ORUFI11G08470_mS</td>
<td valign="top" align="left">ONIVA11G08330_mS</td>
<td valign="top" align="left">OBART11G08150_mS</td>
<td/>
<td/>
<td valign="top" align="left">OPUNC11G07930_mS</td>
<td valign="top" align="left">OB11G16480_mS</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="S3.SS5">
<title>Phylogenetic Analysis of Different Structural Classes of START Domain Containing Proteins</title>
<p>The 249 START protein sequences from all 10 <italic>Oryza</italic> species and 35 reference sequences from the model plant <italic>A. thaliana</italic> were used to construct a phylogenetic dendrogram as described in section &#x201C;Materials and Methods,&#x201D; which grouped genes with closely related evolutionary patterns (<xref ref-type="fig" rid="F3">Figure 3</xref>). The phylogenetic tree showed distinct clusters for all major structural classes of START domains, suggesting sequence conservation within each structural class. As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, the HZSM class (green) forms a single distinct cluster, while HZS and HS (light orange and red, respectively) form a single cluster, as expected, with an overlap between these two subclasses. The two minor classes, i.e., PS (shown in olive) and SD (shown in gray), form a sub-cluster together with PSD (shown in purple). The minimal START proteins (blue) form a single large cluster, but a few of them are distributed among other structural classes, which might be due to the large differences in their sequence lengths. The three SM class proteins (dark green) fall alongside HZSM and minimal START. As expected, all three cultivated rices were observed to lie adjacent to, or in the same branch as, their immediate wild ancestors.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Cladogram for START proteins from all 10 <italic>Oryza</italic> genomes along with <italic>Arabidopsis thaliana</italic> (&#x002A;). Color codes are the same as in earlier figures: red, HD START (HS); light orange, HD bZIP START (HZS); green, HD bZIP START MEKHLA (HZSM); dark green, START MEKHLA (SM); purple, PH START DUF1336 (PSD); yellow, PH START (PS); gray, START DUF1336 (SD); and blue for minimal START. Phylogeny codes for each locus ID are based on the ortholog analysis with reference to <italic>Ojap</italic><sub><italic>c</italic></sub> as described previously (see <xref ref-type="supplementary-material" rid="TS1">Supplementary Table 1</xref>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-737194-g003.tif"/>
</fig>
<p>Previous studies suggested that class III HD-ZIP proteins are evolutionarily conserved (<xref ref-type="bibr" rid="B6">Ariel et al., 2007</xref>). In this study, although this family formed a single cluster, a few intervening minimal STARTs were also found within it. The unusual START type &#x201C;START MEKHLA&#x201D; also merged with this cluster, indicating evolutionary relatedness with class III HD-ZIP proteins despite the lack of an HD-ZIP region. The HD bZIP START (HZS) and HD START (HS) classes share high sequence similarity, which causes their clusters to overlap. Similarly, PS, PSD, and SD formed a single cluster.</p>
</sec>
<sec id="S3.SS6">
<title>STARTS in Collinear Blocks &#x2013; A Spatial Pattern Conservation of START Genes Among 10 <italic>Oryza</italic> Species</title>
<p>The occurrence of several genes in a collinear block provides clues to the spatial conservation of individual genes and their proximal neighborhoods, and hence to the evolutionary significance of such gene blocks. <xref ref-type="fig" rid="F4">Figure 4</xref> depicts these blocks as maps, with START genes within one block linked by domain structural class. The proportion of genes in collinear sets varies from 12 to 20% across the genomes, the majority being close to 15% (<xref ref-type="supplementary-material" rid="TS3">Supplementary Table 3</xref>).</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Self-collinear blocks of 10 <italic>Oryza</italic> species. Outer circle: 12 chromosomes from each of the 10 <italic>Oryza</italic> genomes, represented in the default colors of Circos. Inner circle: START genes are shown as colored bar highlights. Color codes for bar highlights and labels are the same as in <xref ref-type="fig" rid="F2">Figures 2</xref>, <xref ref-type="fig" rid="F3">3</xref>. Connectors (internal to the bar highlights) link START homologs that occur as collinear blocks on different chromosomes of the same genome and follow the color codes of the START types, except that START homologs belonging to two different structural classes are linked with a black line. Gray connectors show non-START homologs within the genomes of the 10 rices.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-737194-g004.tif"/>
</fig>
<p>Six to ten START genes occur in collinear blocks, with about a one-third increase in gene number in the cultivated varieties, i.e., <italic>Ojap</italic><sub><italic>c</italic></sub> and <italic>Oind</italic><sub><italic>c</italic></sub>, as compared to the early evolved rice species <italic>Obra</italic><sub><italic>w</italic></sub>, <italic>Opun</italic><sub><italic>w</italic></sub>, or <italic>Omer</italic><sub><italic>w</italic></sub>; the same holds for total gene numbers between the AA genotype rice varieties <italic>Ojap</italic><sub><italic>c</italic></sub> (recently evolved) and <italic>Omer</italic><sub><italic>w</italic></sub> (early evolved). The lack of consistency in the number of genes in syntenic blocks hints at possible chromosomal rearrangement of fragments bearing START genes in both wild and cultivated rice species. The pattern of collinear blocks in the cultivated variety <italic>Ojap</italic><sub><italic>c</italic></sub> appears similar to that of its immediate ancestor <italic>Oruf</italic><sub><italic>w</italic></sub>, unlike <italic>Oind</italic><sub><italic>c</italic></sub>, each pair being placed next to each other in <xref ref-type="fig" rid="F4">Figure 4</xref>. In contrast, the number of START genes in cultivated varieties is reminiscent of their wilder, early evolved relatives: for instance, the cultivated variety <italic>Ojap</italic><sub><italic>c</italic></sub> has 10 START genes in collinear blocks, like its indirect/wilder ancestors <italic>Obar</italic><sub><italic>w</italic></sub> and <italic>Opun</italic><sub><italic>w</italic></sub>, while the African domesticated variety <italic>Ogla</italic><sub><italic>c</italic></sub> has seven START genes in its collinear blocks, equivalent to its wilder relative <italic>Omer</italic><sub><italic>w</italic></sub>.
<xref ref-type="supplementary-material" rid="TS3">Supplementary Table 3</xref> provides the species-wise total number of collinear blocks and the START genes that occur in these blocks, while individual Circos maps of syntenic collinear blocks are provided in <xref ref-type="supplementary-material" rid="FS2">Supplementary Figures 2A&#x2013;J</xref>.</p>
</sec>
<sec id="S3.SS7">
<title>Identification of Different Modes of START Gene Duplication</title>
<p>In plants, whole-genome duplication (WGD) leading to polyploidy is a frequent event, and gene duplication is an important evolutionary phenomenon that contributes to gene dosage, adaptation and speciation; common modes are segmental duplication (SD), dispersed duplication (DD), tandem duplication (TD), and transposed duplication (TsD). Different modes of gene duplication were analyzed for START genes across the 10 cultivated and wild <italic>Oryza</italic> species, revealing that START genes exist mostly as duplicated pairs (<xref ref-type="table" rid="T4">Table 4</xref>). As can be seen in <xref ref-type="table" rid="T4">Table 4</xref>, START genes are rarely present as singletons, and there are two major modes of gene duplication across the 10 species, namely, dispersed and segmental (arising from WGD). Interestingly, dispersed and segmental duplications are similar between pairs of cultivated rice species and their immediate ancestors (<italic>Ojap</italic><sub><italic>c</italic></sub> and <italic>Oind</italic><sub><italic>c</italic></sub> with <italic>Oruf</italic><sub><italic>w</italic></sub>; <italic>Ogla</italic><sub><italic>c</italic></sub> with <italic>Obar</italic><sub><italic>w</italic></sub>), but the proximal and tandem START genes appear to have duplicated after speciation, as the immediate ancestors do not have any. Among the wild species, proximal START duplicates are observed in only two of the early evolved species (<italic>Omer</italic><sub><italic>w</italic></sub> and <italic>Opun</italic><sub><italic>w</italic></sub>), and none carries tandem START duplicates (<xref ref-type="table" rid="T4">Table 4</xref>). <xref ref-type="fig" rid="F5">Figure 5</xref> shows a START gene dendrogram with the various modes of duplication and paralogous pairs for the main cultivated variety <italic>Ojap</italic><sub><italic>c</italic></sub>; of the eight pairs of duplicates, five pairs are segmentally duplicated (between chromosomes 2, 3, 4, 8, 9, 10, and 12), while three START genes (all on Chr 6) are proximally duplicated.
Besides these, two additional STARTs are found in newly transposed locations as compared to their ancestral gene locations (<xref ref-type="fig" rid="F5">Figure 5</xref>).</p>
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Distribution of different modes of gene duplication based on whole genome and START genes (in parentheses) amongst 10 cultivated and wild <italic>Oryza</italic> species.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Name of plants</td>
<td valign="top" align="center">Singleton</td>
<td valign="top" align="center">Dispersed</td>
<td valign="top" align="center">Proximal</td>
<td valign="top" align="center">Tandem</td>
<td valign="top" align="center">WGD or segmental</td>
<td valign="top" align="center">Total</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="7"><bold>Cultivated rice species</bold></td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza sativa</italic> var. <italic>japonica</italic></td>
<td valign="top" align="center">8989 (1)</td>
<td valign="top" align="center">19,640 (15)</td>
<td valign="top" align="center">3266 (3)</td>
<td valign="top" align="center">4035 (0)</td>
<td valign="top" align="center">6259 (9)</td>
<td valign="top" align="center">42,189 (28)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza sativa</italic> var. <italic>indica</italic></td>
<td valign="top" align="center">7745 (1)</td>
<td valign="top" align="center">18,597 (17)</td>
<td valign="top" align="center">4599 (4)</td>
<td valign="top" align="center">5230 (1)</td>
<td valign="top" align="center">5860 (4)</td>
<td valign="top" align="center">42,031 (27)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza glaberrima</italic></td>
<td valign="top" align="center">7109 (1)</td>
<td valign="top" align="center">15,808 (16)</td>
<td valign="top" align="center">1783 (0)</td>
<td valign="top" align="center">3269 (0)</td>
<td valign="top" align="center">6161 (7)</td>
<td valign="top" align="center">34,130 (24)</td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><bold>Wild rice species</bold></td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza rufipogon</italic></td>
<td valign="top" align="center">9343 (1)</td>
<td valign="top" align="center">17,102 (17)</td>
<td valign="top" align="center">2741 (0)</td>
<td valign="top" align="center">2912 (0)</td>
<td valign="top" align="center">5814 (7)</td>
<td valign="top" align="center">37,912 (25)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza nivara</italic></td>
<td valign="top" align="center">9061 (1)</td>
<td valign="top" align="center">18,192 (18)</td>
<td valign="top" align="center">2605 (0)</td>
<td valign="top" align="center">2887 (0)</td>
<td valign="top" align="center">4281 (7)</td>
<td valign="top" align="center">37,026 (26)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza barthii</italic></td>
<td valign="top" align="center">8736 (1)</td>
<td valign="top" align="center">15,613 (13)</td>
<td valign="top" align="center">2749 (0)</td>
<td valign="top" align="center">3124 (0)</td>
<td valign="top" align="center">5331 (9)</td>
<td valign="top" align="center">35,553 (23)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza glumaepatula</italic></td>
<td valign="top" align="center">8916 (0)</td>
<td valign="top" align="center">16,191 (18)</td>
<td valign="top" align="center">2671 (0)</td>
<td valign="top" align="center">2971 (0)</td>
<td valign="top" align="center">5630 (7)</td>
<td valign="top" align="center">36,379 (25)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza meridionalis</italic></td>
<td valign="top" align="center">7667 (0)</td>
<td valign="top" align="center">14,172 (14)</td>
<td valign="top" align="center">2105 (1)</td>
<td valign="top" align="center">2567 (0)</td>
<td valign="top" align="center">3730 (7)</td>
<td valign="top" align="center">30,241 (22)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza punctata</italic></td>
<td valign="top" align="center">6721 (1)</td>
<td valign="top" align="center">14,027 (10)</td>
<td valign="top" align="center">2539 (3)</td>
<td valign="top" align="center">3245 (0)</td>
<td valign="top" align="center">6018 (10)</td>
<td valign="top" align="center">32,550 (24)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Oryza brachyantha</italic></td>
<td valign="top" align="center">9330 (1)</td>
<td valign="top" align="center">13,561 (17)</td>
<td valign="top" align="center">1597 (0)</td>
<td valign="top" align="center">2690 (0)</td>
<td valign="top" align="center">5285 (7)</td>
<td valign="top" align="center">32,463 (25)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>(#) STARTs that occur in singleton, dispersed, proximal, tandem, and segmental duplication modes are mentioned in parentheses.</italic></p></fn>
</table-wrap-foot>
</table-wrap>
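<p>As a reading aid for Table 4, the following minimal Python sketch recomputes the START totals from the per-mode counts in parentheses and expresses each mode as a fraction of a species' START genes. The counts are copied from Table 4; the names <monospace>START_COUNTS</monospace>, <monospace>mode_fractions</monospace>, and <monospace>inconsistent_rows</monospace> are illustrative assumptions and are not part of the original analysis pipeline.</p>
<preformat>
```python
# Duplication modes in the column order of Table 4.
MODES = ("singleton", "dispersed", "proximal", "tandem", "segmental")

# START gene counts per mode and the reported total, copied from Table 4
# (values in parentheses). Variable and function names are hypothetical.
START_COUNTS = {
    "O. sativa japonica": ((1, 15, 3, 0, 9), 28),
    "O. sativa indica": ((1, 17, 4, 1, 4), 27),
    "O. glaberrima": ((1, 16, 0, 0, 7), 24),
    "O. rufipogon": ((1, 17, 0, 0, 7), 25),
    "O. nivara": ((1, 18, 0, 0, 7), 26),
    "O. barthii": ((1, 13, 0, 0, 9), 23),
    "O. glumaepatula": ((0, 18, 0, 0, 7), 25),
    "O. meridionalis": ((0, 14, 1, 0, 7), 22),
    "O. punctata": ((1, 10, 3, 0, 10), 24),
    "O. brachyantha": ((1, 17, 0, 0, 7), 25),
}


def mode_fractions(counts):
    """Return the share of START genes assigned to each duplication mode."""
    total = sum(counts)
    return {mode: n / total for mode, n in zip(MODES, counts)}


def inconsistent_rows(table):
    """List species whose per-mode counts do not add up to the reported total."""
    return [name for name, (counts, total) in table.items()
            if sum(counts) != total]


if __name__ == "__main__":
    print("Rows with mismatched totals:", inconsistent_rows(START_COUNTS))
    for species, (counts, _) in START_COUNTS.items():
        shares = mode_fractions(counts)
        print(species, {m: round(s, 2) for m, s in shares.items()})
```
</preformat>
<p>Under these assumptions, every row of Table 4 is internally consistent, and the reported START totals add up to the 249 sequences used for the phylogenetic analysis.</p>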
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Phylogenetic tree of the <italic>Oryza sativa</italic> var. <italic>japonica</italic> START gene family annotated with different modes of gene duplication.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-737194-g005.tif"/>
</fig>
</sec>
<sec id="S3.SS8">
<title>Nucleotide Substitution Rates and Ka/Ks Ratios</title>
<p>Ka/Ks ratios represent selection pressure on genes, with values of &#x003E;1, &#x003C;1, or 1 signifying positive, negative or neutral selection, respectively (<xref ref-type="bibr" rid="B30">Koonin and Rogozin, 2003</xref>). These calculations for START genes of the nine other <italic>Oryza</italic> genomes with respect to the recently evolved cultivated variety <italic>Ojap</italic><sub><italic>c</italic></sub>, performed as described in section &#x201C;Materials and Methods,&#x201D; are shown in <xref ref-type="fig" rid="F6">Figures 6A&#x2013;D</xref> and <xref ref-type="supplementary-material" rid="FS3">Supplementary Figures 3A&#x2013;C</xref>. With few exceptions, the START gene pairs have Ka/Ks values below one, suggesting that they are under negative selection. The unique domain category &#x201C;SM&#x201D; in <italic>Ojap</italic><sub><italic>c</italic></sub> and its orthologous pairs in <italic>Oind</italic><sub><italic>c</italic></sub>, <italic>Ogla</italic><sub><italic>c</italic></sub>, and <italic>Oniv</italic><sub><italic>w</italic></sub> showed very high Ka/Ks values, suggesting that they are under positive selection. A few other cases also showed Ka/Ks values well above 1 or close to 1, suggesting that these START genes are under positive or near-neutral selection, respectively. The analysis further confirmed a high rate of synonymous and non-synonymous substitutions for both the PSD type orthologs and a single HS homolog (present on Chr 4).</p>
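<p>The interpretation rule above can be sketched in a few lines of Python. This is only an illustration of the thresholding logic, not the software used for the analysis; the function names and the <monospace>tolerance</monospace> band around 1 that is treated as neutral are assumptions introduced here.</p>
<preformat>
```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks; the ratio is undefined when Ks is zero."""
    if ks == 0:
        raise ValueError("Ks is zero; Ka/Ks is undefined")
    return ka / ks


def classify_selection(ratio, tolerance=0.05):
    """Map a Ka/Ks ratio to a selection regime.

    Ratios within `tolerance` of 1 are called neutral; the width of this
    band is an illustrative assumption, not a value from the study.
    """
    if abs(ratio - 1.0) > tolerance:
        return "positive" if ratio > 1.0 else "negative (purifying)"
    return "neutral"


if __name__ == "__main__":
    # Example ratios: 0.762 and 0.063 are values reported in this section
    # for duplicated START gene pairs; 1.0 and 1.8 are made-up inputs.
    for ratio in (0.063, 0.762, 1.0, 1.8):
        print(ratio, classify_selection(ratio))
```
</preformat>
<p>With this rule, both the proximal pair at 0.762 and the segmental pair at 0.063 fall under negative (purifying) selection, although the former is much closer to neutrality.</p>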
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Evolutionary patterns of duplicated START gene pairs by different modes in <italic>Oryza sativa</italic> var. <italic>japonica.</italic> <bold>(A)</bold> Gene pair distribution among different modes of duplication. <bold>(B)</bold> Ka values of duplicated gene pairs. <bold>(C)</bold> Ks values of duplicated gene pairs. <bold>(D)</bold> Ka/Ks values of duplicated gene pairs.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-737194-g006.tif"/>
</fig>
<p>A recent study showed that 99% of genes derived <italic>via</italic> duplication are under negative or purifying selection in rice (<xref ref-type="bibr" rid="B50">Qiao et al., 2019</xref>). In contrast, only 0.5% (WGD), 1.2% (tandem), 1.5% (proximal), 0.2% (transposed), and 1.4% (dispersed) of gene pairs showed positive selection pressure (<xref ref-type="bibr" rid="B50">Qiao et al., 2019</xref>). Estimation of synonymous (Ks) and non-synonymous (Ka) nucleotide substitution rates gives important insight into the evolution of duplicated gene pairs. A higher synonymous substitution rate (Ks) indicates a longer evolutionary history of the respective genes in their genomes, thus highlighting the functional importance of retaining the gene copies (<xref ref-type="bibr" rid="B53">Ren et al., 2014</xref>; <xref ref-type="bibr" rid="B49">Qiao et al., 2018</xref>). The eight paralogous START gene pairs of <italic>Ojap</italic><sub><italic>c</italic></sub> that were found in segmental, transposed and proximal modes of duplication (<xref ref-type="fig" rid="F5">Figure 5</xref>) were further evaluated for Ka, Ks, and Ka/Ks. As shown in <xref ref-type="fig" rid="F6">Figure 6</xref>, all gene pairs have Ka/Ks ratios below 1, i.e., they are under negative selection, suggesting low or moderate flexibility for mutational changes. One proximal duplicate pair (LOC_Os06g50560_mS-LOC_Os06g50510_SM), with a Ka/Ks of 0.762, suggests flexibility in the mutational rate of the second gene copy, whereas for the five segmental START gene duplicates Ka/Ks varied between 0.063 and 0.166, indicating stringent constraint and highlighting their functional importance (LOC_Os10g33960_HZSM-LOC_Os03g01890_HZSM, LOC_Os12g41860_HZSM-LOC_Os03g43930_HZSM, LOC_Os08g08820_HS-LOC_Os04g53540_HS, LOC_Os04g48070_HS-LOC_Os02g45250_HZS, and LOC_Os04g48070_HS-LOC_Os09g35760_HS) (detailed analysis provided in <xref ref-type="supplementary-material" rid="TS4">Supplementary Table 4</xref>).
As presented in the next section, we also observed similar expression levels among these pairs across anatomical parts, but with slight variation between different stages of development.</p>
<p>Overall, Ka/Ks analysis of duplicated START gene pairs in <italic>Ojap</italic><sub><italic>c</italic></sub> suggests that the expanded START gene family is still evolving toward stabilization of function and expanding into new roles or sub-functionalization. The Ka/Ks results also suggest that proximally duplicated STARTs share 99% identity and have undergone few changes compared to other modes of duplication, which might be due to the recent incidence of this mode of duplication. Further, segmentally duplicated START gene pairs showed low Ka values and Ka/Ks ratios, alongside relatively high Ks values, suggesting evolution under stringent selection pressure over a long time. This is consistent with the accumulation of mutations over the evolutionary history of an organism (<xref ref-type="bibr" rid="B80">Zhu et al., 2014</xref>). In contrast, the transposed duplicates underwent intermediate negative selection pressure, whereas the segmental duplicates underwent strong negative selection pressure. Similarly, the synonymous substitution rates for transposed duplicates were higher than those for segmental START duplicates. The transposed pair LOC_Os02g26860_mS-LOC_Os04g02910_mS showed threefold higher Ka and Ks values than the segmental duplicate pair LOC_Os04g48070_HS-LOC_Os09g35760_HS, supporting evolutionary freedom for the development of sub-functionalization or neo-functionalization; the segmental pair in turn showed Ka and Ks values similar to those of the other transposed pair, LOC_Os06g10600_HS-LOC_Os07g47130_mS, indicating slight stringency in the mutational frequency of these genes.</p>
<p>The transposed pairs (LOC_Os02g26860_mS-LOC_Os04g02910_mS and LOC_Os06g10600_HS-LOC_Os07g47130_mS; Ka/Ks ratio above 0.365), in addition to the proximally duplicated pair (LOC_Os06g50560_mS-LOC_Os06g50510_SM; Ka/Ks ratio of 0.762), had the highest mean Ka/Ks ratios, indicating that they have experienced weaker purifying selection. The segmental gene pair LOC_Os10g33960_HZSM-LOC_Os03g01890_HZSM had the lowest mean Ka/Ks ratio (0.063), suggesting strong purifying selection, and the other four segmental pairs (LOC_Os12g41860_HZSM-LOC_Os03g43930_HZSM, LOC_Os08g08820_HS-LOC_Os04g53540_HS, LOC_Os04g48070_HS-LOC_Os02g45250_HZS, and LOC_Os04g48070_HS-LOC_Os09g35760_HS) had intermediate mean Ka/Ks ratios above 0.1, indicating that they have experienced intermediate to strong purifying selection. Thus, START genes appear to be under purifying selection pressure, further highlighting their functional importance and their role in the expansion of START genes among wild and cultivated rices; we explore this further through gene expression analyses.</p>
</sec>
<sec id="S3.SS9">
<title>Transcriptome Analysis of START Encoding Genes</title>
<p>The functions of many START genes, especially HD-associated START genes, have been extensively studied in plants. The class III HD-ZIP family and class IV HD-ZIP family have well-established roles in <italic>Arabidopsis</italic> and are involved in various stages of development and gene regulation (<xref ref-type="bibr" rid="B6">Ariel et al., 2007</xref>). In order to explore the potential functions of the 28 START genes found in <italic>O. sativa</italic> var. <italic>japonica</italic>, the tissue and developmental stage-specific expression patterns were investigated under non-stressed conditions as described in section &#x201C;Materials and Methods.&#x201D; As can be seen from <xref ref-type="fig" rid="F7">Figures 7A,B</xref> and <xref ref-type="supplementary-material" rid="FS4">Supplementary Figures 4A,B</xref>, the expression heat maps show that four HZSM, two HD bZIP START, and two START MEKHLA genes in <italic>O. sativa</italic> var. <italic>japonica</italic> were significantly expressed throughout the developmental stages and anatomical parts. Further, five out of nine HD-START genes are expressed constitutively through the various developmental stages, but almost all nine genes showed differential expression across various anatomical parts, suggesting tissue-specific roles for this largely amplified sub-group. The eight minimal START genes (LOC_Os02g03230_mS, LOC_Os02g26860_mS, LOC_Os04g02910_mS, LOC_Os06g50560_mS, LOC_Os06g50724_mS, LOC_Os07g08760_mS, LOC_Os07g47130_mS, and LOC_Os11g14070_mS) of <italic>Ojap</italic><sub><italic>c</italic></sub> showed a wide variation in expression patterns across all stages and tissues, and it is possible to assign them to tissue-specific roles, with only one (LOC_Os07g08760_mS) showing high expression across all anatomical parts. START genes were grouped <italic>via</italic> hierarchical clustering of expression profiles and this is depicted in <xref ref-type="fig" rid="F7">Figures 7A,B</xref>. 
Comparison of these data with the duplication analyses in the earlier section reveals that most of the duplicated START gene pairs showed similar expression patterns across both developmental stages and anatomical parts, except for the proximal duplicated START genes (detailed analysis provided in <xref ref-type="supplementary-material" rid="TS4">Supplementary Table 4</xref>). Expression patterns among the five segmental gene pairs varied, with three pairs in one cluster and two in other clusters, despite showing significant expression throughout all the developmental stages. The five segmental pairs include two pairs of HS genes (four genes), and these cluster together in expression as well. Taken together (<xref ref-type="fig" rid="F5">Figures 5</xref>, <xref ref-type="fig" rid="F7">7</xref>), duplicated START genes in segmental and transposed modes showed a unified pattern of gene expression within duplicated gene pairs across all the developmental stages as well as in all anatomical parts, which signifies the functional importance of STARTs and the necessity of retaining both copies of each gene pair. In contrast, the proximal duplicate gene pair showed an uneven expression pattern between its members, indicating sub-functionalization or neo-functionalization of the duplicated genes, as was observed for the Ka/Ks selection pressures.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p><italic>In silico</italic> expression study of 28 START genes in <italic>Oryza sativa</italic> var. <italic>japonica</italic>. <bold>(A)</bold> Hierarchical gene expression pattern of START genes at different developmental stages. <bold>(B)</bold> Hierarchical gene expression pattern of START genes in various anatomical parts.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-12-737194-g007.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="S4">
<title>Discussion</title>
<p>Genome duplication events play a significant role in environmental adaptation and speciation of organisms. Studies of post-duplication events have shed light on genome evolution and functional diversification, while studies on loss of alternate copies of duplicated loci have shown species divergence (<xref ref-type="bibr" rid="B43">Mizuta et al., 2010</xref>). Major changes following genome duplication events are copy gain or loss (which alters dosage) and domain alterations (e.g., gain, loss, or rearrangements) that regulate environmental adaptation (<xref ref-type="bibr" rid="B27">Kassahn et al., 2009</xref>; <xref ref-type="bibr" rid="B75">Yang and Bourne, 2009</xref>; <xref ref-type="bibr" rid="B45">Panchy et al., 2016</xref>; <xref ref-type="bibr" rid="B51">Qiu et al., 2017</xref>). Despite the availability of 23 wild and cultivated rice accessions representing 11 genome types (AA, BB, CC, EE, FF, GG, BBCC, CCDD, HHJJ, HHKK, and KKLL), most previous studies have been limited to phylogenetic inferences, and only for gene families involved in transcriptional activation or repression of mainstream physiological processes (<xref ref-type="bibr" rid="B37">Ma and Bennetzen, 2004</xref>; <xref ref-type="bibr" rid="B5">Ammiraju et al., 2008</xref>; <xref ref-type="bibr" rid="B25">Jacquemin et al., 2014</xref>; <xref ref-type="bibr" rid="B78">Zhang et al., 2014</xref>; <xref ref-type="bibr" rid="B79">Zhong et al., 2019</xref>). In the current study, we have explored the START gene family across seven wild and three cultivated rice varieties to understand evolutionary changes related to copy number variation (CNV) and alteration in mutational rates due to selection pressures, and have combined these with present-day functional divergence based on gene expression and domain conservation. 
Despite several studies on HD-associated START proteins in plants (<xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>, <xref ref-type="bibr" rid="B58">2014</xref>; <xref ref-type="bibr" rid="B44">Mukherjee and B&#x00FC;rglin, 2006</xref>; <xref ref-type="bibr" rid="B9">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="B46">Pandey et al., 2016</xref>), the present study is the first comprehensive comparison of this family between wild and cultivated rices.</p>
<sec id="S4.SS1">
<title>Gene Family Expansion and Copy Number Variation of START Genes in Rice</title>
<p>The START genes are well known to be amplified in plants compared to animals, but their presence in the evolutionarily distant kingdoms of bacteria and protists has led to questions about the mechanism of amplification during family evolution, and how this amplification may have affected the functional diversity of this group in present-day plants (<xref ref-type="bibr" rid="B24">Iyer et al., 2001</xref>; <xref ref-type="bibr" rid="B59">Schrick et al., 2004</xref>). Newer versions of genome assemblies have enabled us to update these numbers and we find an increase in the number of START genes in <italic>Ojap</italic><sub><italic>c</italic></sub> (based on MSU Release 7.0).</p>
<p>Gene dosage plays a significant role in metabolic and phenotypic changes, which in turn determine a species&#x2019; adaptation to environmental (abiotic) and biotic stress in many plant families. CNVs were observed to change from segmental duplicates into both dispersed and proximal START models between the evolutionarily related rice varieties, i.e., <italic>Obra</italic><sub><italic>w</italic></sub> and <italic>Opun</italic><sub><italic>w</italic></sub>; <italic>Ojap</italic><sub><italic>c</italic></sub> and <italic>Oind</italic><sub><italic>c</italic></sub>. However, START gene copies mostly occur as dispersed duplicates, which additionally determines the total number of START genes in different <italic>Oryza</italic> genomes (<xref ref-type="table" rid="T4">Table 4</xref>). Evolution of the rices has seen several genotype changes from FF (<italic>Obra</italic><sub><italic>w</italic></sub>; wild variety) to AA (<italic>Oind</italic><sub><italic>c</italic></sub> and <italic>Ojap</italic><sub><italic>c</italic></sub>; Asian cultivated varieties) in the diploid rices (<xref ref-type="bibr" rid="B77">Yu et al., 2005</xref>). Overall, our results are in concordance with rice genome evolution from the FF to the AA genotype, which affected the location of START gene homologs and the loss of a copy in some START types such as HS, PSD, and minimal START (<xref ref-type="table" rid="T1">Tables 1</xref>, <xref ref-type="table" rid="T3">3</xref> and <xref ref-type="supplementary-material" rid="TS3">Supplementary Table 3</xref>).</p>
</sec>
<sec id="S4.SS2">
<title>Sub Genomic Distributions and Syntenic Relationships Among Rice START Genes</title>
<p>We observed a significant change in the position of START genes among different chromosomes between the BB and AA genomes, which may correlate well with the reported increase in the number of chromosomal inversions. <xref ref-type="bibr" rid="B62">Stein et al. (2018)</xref> observed an AA-genome-specific inversion in <italic>Omer</italic><sub><italic>w</italic></sub> after the split from the BB genome (<italic>Opun</italic><sub><italic>w</italic></sub>), where we have also observed a drop in START gene numbers (<xref ref-type="bibr" rid="B62">Stein et al., 2018</xref>). Additionally, we have noted a shift in START gene numbers from segmentally duplicated to dispersed START genes. A drastic change in the overall genome-level collinear blocks was observed in the genomes of <italic>Omer</italic><sub><italic>w</italic></sub> and <italic>Oniv</italic><sub><italic>w</italic></sub>, with the former showing loss of chromosomal fragments resulting in a shorter genome size, which may be the root cause of its fewer START genes. Furthermore, our results showed proximal START duplicates in <italic>Ojap</italic><sub><italic>c</italic></sub> (AA), <italic>Oind</italic><sub><italic>c</italic></sub> (AA), and <italic>Opun</italic><sub><italic>w</italic></sub> (BB), indicating domestication as a cause of individual gene duplications in both <italic>Ojap</italic><sub><italic>c</italic></sub> and <italic>Oind</italic><sub><italic>c</italic></sub>, whereas the proximal duplicates in <italic>Opun</italic><sub><italic>w</italic></sub> may be ascribed to the long evolutionary history between <italic>Obra</italic><sub><italic>w</italic></sub> and <italic>Opun</italic><sub><italic>w</italic></sub> (approximately 8.24 million years ago) (<xref ref-type="bibr" rid="B62">Stein et al., 2018</xref>). 
A special kind of START, i.e., the SD-type, was seen only in <italic>Obra</italic><sub><italic>w</italic></sub> (earliest evolved), <italic>Ogla</italic><sub><italic>w</italic></sub>, <italic>Oind</italic><sub><italic>c</italic></sub>, and <italic>Ojap</italic><sub><italic>c</italic></sub> and was absent in the six other <italic>Oryza</italic> species. Several mS START homologs were found to be missing between different <italic>Oryza</italic> species, indicating species-level chromosomal rearrangements (<xref ref-type="table" rid="T3">Table 3</xref> and <xref ref-type="supplementary-material" rid="FS2">Supplementary Figures 2A&#x2013;J</xref>). Additionally, three proximal duplicates were identified on Chr 6 of <italic>Ojap</italic><sub><italic>c</italic></sub>, but these genes belong to different types (two mS and one SM, which showed similarity in exon&#x2013;intron patterns for the START domain encoding regions). The same holds for the single tandem START gene pair observed in <italic>Oind</italic><sub><italic>c</italic></sub> on Chr 1 (one HS type and one minimal mS).</p>
<p>Multiple reports support the idea of an initial polyploidization event in rice, followed by stabilization at the diploid level, after several rounds of genome rearrangements and gene loss (<xref ref-type="bibr" rid="B69">Wang et al., 2005</xref>; <xref ref-type="bibr" rid="B77">Yu et al., 2005</xref>; <xref ref-type="bibr" rid="B62">Stein et al., 2018</xref>). These patterns agree well with our observation that 95% of <italic>Ojap</italic><sub><italic>c</italic></sub> START genes are ancient duplicates. We found all possible modes of duplications, including WGD.</p>
</sec>
<sec id="S4.SS3">
<title>Selection Pressure and Evolutionary Fates of Rice START Domains</title>
<p>The START domains were classified into distinct classes based on the presence of additional functional domains and their mutual arrangements/location on the sequence. The presence/absence of additional conserved domains can often provide insights into the divergence of a gene family, or the extent and direction of sub-functionalization among its members. The domain structural classes of START domains across 10 rice genomes provided insights into the distribution of each group among START proteins, as well as in subsequent comparative analyses, such as gene structure, evolutionary conservation, and expression patterns. By comparing these patterns across wild and cultivated rice genomes, we gained further insights into the frequency of each domain structural class among closely related species, in terms of species divergence and the functional significance of these structural classes. For the duplicates within each domain structural class, we further investigated gene family expansions, which revealed stringency in selection bias and Ka/Ks values below one, indicating their functional importance in attaining species adaptability and environmental robustness (<xref ref-type="bibr" rid="B7">Bokros et al., 2019</xref>). <xref ref-type="bibr" rid="B69">Wang et al. (2005)</xref> reported that 47% of the total genes in the <italic>O. sativa</italic> var. <italic>indica</italic> genome were detected in 10 duplicated blocks among the 12 chromosomes. Of the eight START duplicate pairs, we observed two on the largest duplicated block region, between Chr 2 and Chr 4, which was in turn shown to result from large-scale duplication events that occurred c. 70 million years ago, as inferred from phylogeny (<xref ref-type="bibr" rid="B69">Wang et al., 2005</xref>). Expression levels and domain structure changes suggest that change in bZIP regions may signify loss of function for the HS class. 
Very large Ks and Ka values were recorded for the PSD STARTs, indicating long evolutionary history and functional importance. The lower stringency in Ka/Ks ratio for transposed and proximal START gene pairs indicates incomplete sub-functionalization or neo-functionalization states of these genes. Interestingly, no difference in domain structure between wild and cultivated rice was observed among homeodomain-containing STARTs, and this may reflect the importance of regulatory domains during evolution. HD-associated STARTs in plants are crucial for development from germination to maturation. The higher number and uniformity of HD-associated STARTs can also be explained by the previous observations of <xref ref-type="bibr" rid="B16">Freeling (2009)</xref>, which suggest that gene retention after duplication is biased toward duplicated genes that play important roles in plant functioning and survival (<xref ref-type="bibr" rid="B16">Freeling, 2009</xref>). Another cause for this uniformity may be localization, which in turn is associated with the conserved synteny pattern during genome duplication.</p>
</sec>
<sec id="S4.SS4">
<title>Transcriptome and Proteome Level Patterns Across Domain Structural Classes</title>
<p>We assessed the evolutionary significance of novel START domain structural classes from their expression levels in terms of anatomy and development. PH and START domain containing proteins (PSD class) showed expression in the early seed germination phase as well as in many floral and vegetative tissues, with the loss-of-function mutant developing resistance to powdery mildew (<xref ref-type="bibr" rid="B64">Tang et al., 2005</xref>). There are very few reports on proteome-level changes following genome duplication (<xref ref-type="bibr" rid="B29">Kersting et al., 2012</xref>; <xref ref-type="bibr" rid="B13">Finet et al., 2013</xref>), but these early reports support our data on the formation of novel START classes such as PS and SM, arising from domain gain/loss. Our results also suggest the possibility of two independent truncation events that may have led to the formation of SM subclades of START proteins from either the HZSM clade or the mS clade. A domain gain/loss in the PSD structural class may have led to the PS subclade, or vice versa. The report by <xref ref-type="bibr" rid="B29">Kersting et al. (2012)</xref> on domain loss in monocots strengthens the idea of a possible domain loss mechanism in certain classes of START proteins to form novel structural classes. Additionally, their data show more domain gain/loss events in the <italic>O. sativa</italic> genome than in <italic>Brachypodium distachyon</italic>. These reports support the involvement of domain-level changes in the adaptability of plants to environmental changes. <xref ref-type="bibr" rid="B62">Stein et al. (2018)</xref> have shown a total of nine evolutionarily conserved HD-bZIP containing proteins in <italic>Oryza</italic> that originated at the Magnoliophyta taxon. We have shown the presence of six to eight HD-bZIP containing START proteins among those nine in various structural forms across all wild and cultivated rices investigated here. Genome divergence played a major role in this variation. 
Divergence of the FF genome (<italic>Obra</italic><sub><italic>w</italic></sub>) to the BB genome (<italic>Opun</italic><sub><italic>w</italic></sub>) produced an extra copy of HZSM through proximal duplication on Chr 3 in <italic>Opun</italic><sub><italic>w</italic></sub>. Subsequent evolution to the AA genome was accompanied by loss of this extra HZSM gene copy, restoring the original numbers seen in <italic>Obra</italic><sub><italic>w</italic></sub>, except in two <italic>Oryza</italic> AA genomes, i.e., <italic>Ogla</italic><sub><italic>c</italic></sub> and <italic>Ojap</italic><sub><italic>c</italic></sub>, which instead showed a truncation of the HZ region, leaving the SM type (<xref ref-type="table" rid="T1">Table 1</xref>). Although <italic>Ogla</italic><sub><italic>c</italic></sub> and <italic>Ojap</italic><sub><italic>c</italic></sub> are evolutionarily more distant from each other than <italic>Ojap</italic><sub><italic>c</italic></sub> and <italic>Oind</italic><sub><italic>c</italic></sub>, our observation of SM type homologs at an identical chromosomal location in <italic>Ogla</italic><sub><italic>c</italic></sub> and <italic>Ojap</italic><sub><italic>c</italic></sub> contradicts the phylogenetic origin of the <italic>Oryza</italic> genomes. Apart from this, an additional copy of the SM type is seen on Chr 6 as a proximal duplicate in <italic>Ojap</italic><sub><italic>c</italic></sub>. Expression divergence among duplicates that arose through distinct modes of duplication has been reported earlier for <italic>Oryza</italic> and <italic>Arabidopsis</italic>, i.e., transposed duplicates &#x003E; dispersed duplicates &#x003E; proximal &#x003E; WGD/segmental duplicates = tandem duplications, where the WGD and tandem duplicate pairs are more likely to maintain their original expression pattern (<xref ref-type="bibr" rid="B72">Wang et al., 2011</xref>). We find a very similar expression divergence among the different START duplicate pairs in the rice genome. 
Overall, we find that gene gain and loss events have occurred both at individual genes and in collinear gene sets for the START genes among different cultivated <italic>Oryza</italic> genomes, as is evident from the absence of homologous gene copies from their respective ancestral genomes.</p>
<p>In summary, we hope that the current comparative genomics analysis in wild and cultivated rice varieties will pave the way for experimental validation of these homologs in <italic>Oryza</italic>, a major food source for the world population. In addition, recent developments in the commercial-scale production of rice bran oil highlight the importance of future experimental studies in establishing the roles of START proteins in plant fatty acid metabolic pathways, especially in commercial oilseed crop research. These novel domain combinations, in addition to their large gene CNVs in plants, highlight their varied functional roles.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="S5">
<title>Conclusion</title>
<p>The START domains are abundant in plants and play a crucial role in plant physiology and development. In this work, we have identified START family proteins in 10 wild and cultivated rice genomes and classified these into distinct structural classes based on functional domains. A detailed phylogenetic analysis was performed to map evolutionary divergence among these structural classes, followed by the superimposition of the data onto wild and cultivated rice varieties, revealing interesting features and patterns of evolution and ancestry of these domains within the 10 species investigated, which further helped us to understand START gene family expansion during domestication of rice. Most importantly, we find that gene duplication/ontogeny recapitulates selection pressures during domestication, revealing the indispensability and crucial roles of the START family. Patterns of gene duplication were superimposed on gene expression profiles for the most widely used rice variety across the globe, namely <italic>O. sativa</italic> var. <italic>japonica</italic>, further confirming functional aspects and divergence of this gene family in plant development and tissue-specific roles. We hope this work on the START gene family in <italic>Oryza</italic> species will pave the way for exploring the functional mechanism and substrate preference of plant START domains.</p>
</sec>
<sec sec-type="data-availability" id="S6">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found here: the complete genomic sequences, protein sequences, and annotation information of <italic>Oryza brachyantha</italic> (v1.4b), <italic>Oryza punctata</italic> (v1.2), <italic>Oryza meridionalis</italic> (v1.3), <italic>Oryza glumaepatula</italic> (v1.5), <italic>Oryza barthii</italic> (v1), <italic>Oryza nivara</italic> (v1.0), <italic>Oryza rufipogon</italic> (OR_W1943), <italic>Oryza glaberrima</italic> (v1), and <italic>Oryza sativa</italic> var. <italic>indica</italic> (ASM465v1) were downloaded from EnsemblPlants (<ext-link ext-link-type="uri" xlink:href="http://plants.ensembl.org/info/data/ftp/index.html">http://plants.ensembl.org/info/data/ftp/index.html</ext-link>). The corresponding data for the main cultivated variety, <italic>Oryza sativa</italic> var. <italic>japonica</italic> (MSU Release 7.0), were downloaded from Phytozome v12 (<ext-link ext-link-type="uri" xlink:href="https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Osativa">https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Osativa</ext-link>). All of the datasets supporting the results of this article are included within the article and its <xref ref-type="supplementary-material" rid="FS1">Supplementary Material</xref>.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>GY conceived the work. SKM and RKP performed the research work. All authors performed the data analysis, wrote the manuscript, and approved for final publication.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="S8">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="S15">
<title>Funding</title>
<p>SKM received a fellowship from the Department of Biotechnology (DBT), Government of India and the National Institute of Plant Genome Research (NIPGR) during his Ph.D. (Fellow no. DBT/2014/NIPGR/265). RKP was funded by the Department of Biotechnology, Government of India (Grant ID BT/PR22334/BID/7/786/2016). The publication charge of this article was covered by the NIPGR Core Grant. These funding bodies did not have any role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.</p>
</sec>
<ack>
<p>We acknowledge the support of National Institute of Plant Genome Research (NIPGR), New Delhi for infrastructure and DBT-eLibrary Consortium (DeLCON) for providing access to e-resources.</p>
</ack>
<sec id="S10" sec-type="supplementary material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2021.737194/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2021.737194/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="FS1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Figure 1</label>
<caption><p>Gene structure analysis of START genes among 10 rice species <bold>(A)</bold> <italic>Oryza sativa</italic> var. <italic>japonica</italic>, <bold>(B)</bold> <italic>Oryza sativa</italic> var. <italic>indica</italic>, <bold>(C)</bold> <italic>Oryza glaberrima</italic>, <bold>(D)</bold> <italic>Oryza rufipogon</italic>, <bold>(E)</bold> <italic>Oryza nivara</italic>, <bold>(F)</bold> <italic>Oryza barthii</italic>, <bold>(G)</bold> <italic>Oryza glumaepatula</italic>, <bold>(H)</bold> <italic>Oryza meridionalis</italic>, <bold>(I)</bold> <italic>Oryza punctata</italic>, and <bold>(J)</bold> <italic>Oryza brachyantha.</italic></p></caption>
</supplementary-material>
<supplementary-material xlink:href="Data_Sheet_2.PDF" id="FS2" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Figure 2</label>
<caption><p>Collinear blocks for the 10 rice genomes <bold>(A)</bold> <italic>Oryza sativa</italic> var. <italic>japonica</italic>, <bold>(B)</bold> <italic>Oryza sativa</italic> var. <italic>indica</italic>, <bold>(C)</bold> <italic>Oryza glaberrima</italic>, <bold>(D)</bold> <italic>Oryza rufipogon</italic>, <bold>(E)</bold> <italic>Oryza nivara</italic>, <bold>(F)</bold> <italic>Oryza barthii</italic>, <bold>(G)</bold> <italic>Oryza glumaepatula</italic>, <bold>(H)</bold> <italic>Oryza meridionalis</italic>, <bold>(I)</bold> <italic>Oryza punctata</italic>, and <bold>(J)</bold> <italic>Oryza brachyantha.</italic></p></caption>
</supplementary-material>
<supplementary-material xlink:href="Data_Sheet_3.PDF" id="FS3" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Figure 3</label>
<caption><p><bold>(A)</bold> Ka, <bold>(B)</bold> Ks, and <bold>(C)</bold> Ka/Ks values for START homologs of different <italic>Oryza</italic> species with respect to <italic>Oryza sativa</italic> var. <italic>japonica.</italic></p></caption>
</supplementary-material>
<supplementary-material xlink:href="Data_Sheet_4.PDF" id="FS4" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Figure 4</label>
<caption><p>Conditional gene expression pattern of START genes for <bold>(A)</bold> different developmental stages <bold>(B)</bold> various anatomical parts.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table_1.XLSX" id="TS1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Table 1</label>
<caption><p>The locus ids of START genes along with sequence analysis information and phylogenetic code.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table_2.XLSX" id="TS2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Table 2</label>
<caption><p>The detailed gene structure patterns of 10 <italic>Oryza</italic> genomes: 5&#x2032;-UTR, 3&#x2032;-UTR, exon, and intron lengths.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table_3.DOCX" id="TS3" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Table 3</label>
<caption><p>Number of collinear genes across 10 <italic>Oryza</italic> genomes.</p></caption>
</supplementary-material>
<supplementary-material xlink:href="Table_4.DOCX" id="TS4" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Supplementary Table 4</label>
<caption><p>Ka, Ks, and Ka/Ks analysis among the gene pairs that follow different modes of duplication in <italic>Oryza sativa</italic> var. <italic>japonica</italic> genome.</p></caption>
</supplementary-material>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abe</surname> <given-names>M.</given-names></name> <name><surname>Katsumata</surname> <given-names>H.</given-names></name> <name><surname>Komeda</surname> <given-names>Y.</given-names></name> <name><surname>Takahashi</surname> <given-names>T.</given-names></name></person-group> (<year>2003</year>). <article-title>Regulation of shoot epidermal cell differentiation by a pair of homeodomain proteins in <italic>Arabidopsis</italic>.</article-title> <source><italic>Development</italic></source> <volume>130</volume> <fpage>635</fpage>&#x2013;<lpage>643</lpage>. <pub-id pub-id-type="doi">10.1242/dev.00292</pub-id> <pub-id pub-id-type="pmid">12505995</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Agarwala</surname> <given-names>R.</given-names></name> <name><surname>Barrett</surname> <given-names>T.</given-names></name> <name><surname>Beck</surname> <given-names>J.</given-names></name> <name><surname>Benson</surname> <given-names>D. A.</given-names></name> <name><surname>Bollin</surname> <given-names>C.</given-names></name> <name><surname>Bolton</surname> <given-names>E.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Database resources of the National Center for Biotechnology Information.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>47</volume> <fpage>D23</fpage>&#x2013;<lpage>D28</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkw1071</pub-id> <pub-id pub-id-type="pmid">27899561</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alpy</surname> <given-names>F.</given-names></name> <name><surname>Tomasetto</surname> <given-names>C.</given-names></name></person-group> (<year>2005</year>). <article-title>Give lipids a START: the StAR-related lipid transfer (START) domain in mammals.</article-title> <source><italic>J. Cell Sci.</italic></source> <volume>118</volume> <fpage>2791</fpage>&#x2013;<lpage>2801</lpage>. <pub-id pub-id-type="doi">10.1242/jcs.02485</pub-id> <pub-id pub-id-type="pmid">15976441</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname> <given-names>S. F.</given-names></name> <name><surname>Gish</surname> <given-names>W.</given-names></name> <name><surname>Miller</surname> <given-names>W.</given-names></name> <name><surname>Myers</surname> <given-names>E. W.</given-names></name> <name><surname>Lipman</surname> <given-names>D. J.</given-names></name></person-group> (<year>1990</year>). <article-title>Basic local alignment search tool.</article-title> <source><italic>J. Mol. Biol.</italic></source> <volume>215</volume> <fpage>403</fpage>&#x2013;<lpage>410</lpage>. <pub-id pub-id-type="doi">10.1016/S0022-2836(05)80360-2</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ammiraju</surname> <given-names>J. S. S.</given-names></name> <name><surname>Lu</surname> <given-names>F.</given-names></name> <name><surname>Sanyal</surname> <given-names>A.</given-names></name> <name><surname>Yu</surname> <given-names>Y.</given-names></name> <name><surname>Song</surname> <given-names>X.</given-names></name> <name><surname>Jiang</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Dynamic evolution of <italic>Oryza</italic> genomes is revealed by comparative genomic analysis of a genus-wide vertical data set.</article-title> <source><italic>Plant Cell</italic></source> <volume>20</volume> <fpage>3191</fpage>&#x2013;<lpage>3209</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.108.063727</pub-id> <pub-id pub-id-type="pmid">19098269</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ariel</surname> <given-names>F. D.</given-names></name> <name><surname>Manavella</surname> <given-names>P. A.</given-names></name> <name><surname>Dezar</surname> <given-names>C. A.</given-names></name> <name><surname>Chan</surname> <given-names>R. L.</given-names></name></person-group> (<year>2007</year>). <article-title>The true story of the HD-Zip family.</article-title> <source><italic>Trends Plant Sci.</italic></source> <volume>12</volume> <fpage>419</fpage>&#x2013;<lpage>426</lpage>. <pub-id pub-id-type="doi">10.1016/j.tplants.2007.08.003</pub-id> <pub-id pub-id-type="pmid">17698401</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bokros</surname> <given-names>N.</given-names></name> <name><surname>Popescu</surname> <given-names>S. C.</given-names></name> <name><surname>Popescu</surname> <given-names>G. V.</given-names></name></person-group> (<year>2019</year>). <article-title>Multispecies genome-wide analysis defines the MAP3K gene family in <italic>Gossypium hirsutum</italic> and reveals conserved family expansions.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>20(Suppl. 2)</volume>:<issue>99</issue>. <pub-id pub-id-type="doi">10.1186/s12859-019-2624-9</pub-id> <pub-id pub-id-type="pmid">30871456</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chatterjee</surname> <given-names>D.</given-names></name></person-group> (<year>1947</year>). <article-title>Botany of the wild and cultivated rices.</article-title> <source><italic>Nature</italic></source> <volume>160</volume> <fpage>234</fpage>&#x2013;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1038/160234a0</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chew</surname> <given-names>W.</given-names></name> <name><surname>Hrmova</surname> <given-names>M.</given-names></name> <name><surname>Lopato</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>Role of homeodomain leucine zipper (HD-Zip) IV transcription factors in plant development and plant protection from deleterious environmental factors.</article-title> <source><italic>Int. J. Mol. Sci.</italic></source> <volume>14</volume> <fpage>8122</fpage>&#x2013;<lpage>8147</lpage>. <pub-id pub-id-type="doi">10.3390/ijms14048122</pub-id> <pub-id pub-id-type="pmid">23584027</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Di Cristina</surname> <given-names>M.</given-names></name> <name><surname>Sessa</surname> <given-names>G.</given-names></name> <name><surname>Dolan</surname> <given-names>L.</given-names></name> <name><surname>Linstead</surname> <given-names>P.</given-names></name> <name><surname>Baima</surname> <given-names>S.</given-names></name> <name><surname>Ruberti</surname> <given-names>I.</given-names></name><etal/></person-group> (<year>1996</year>). <article-title>The <italic>Arabidopsis</italic> Athb-10 (GLABRA2) is an HD-Zip protein required for regulation of root hair development.</article-title> <source><italic>Plant J.</italic></source> <volume>10</volume> <fpage>393</fpage>&#x2013;<lpage>402</lpage>. <pub-id pub-id-type="doi">10.1046/j.1365-313X.1996.10030393.x</pub-id> <pub-id pub-id-type="pmid">8811855</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eddy</surname> <given-names>S. R.</given-names></name></person-group> (<year>1998</year>). <article-title>Profile hidden Markov models.</article-title> <source><italic>Bioinformatics</italic></source> <volume>14</volume> <fpage>755</fpage>&#x2013;<lpage>763</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/14.9.755</pub-id> <pub-id pub-id-type="pmid">9918945</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Edler</surname> <given-names>D.</given-names></name> <name><surname>Klein</surname> <given-names>J.</given-names></name> <name><surname>Antonelli</surname> <given-names>A.</given-names></name> <name><surname>Silvestro</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <article-title>raxmlGUI 2.0: a graphical interface and toolkit for phylogenetic analyses using RAxML.</article-title> <source><italic>Methods Ecol. Evol.</italic></source> <volume>12</volume> <fpage>373</fpage>&#x2013;<lpage>377</lpage>. <pub-id pub-id-type="doi">10.1111/2041-210X.13512</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finet</surname> <given-names>C.</given-names></name> <name><surname>Berne-Dedieu</surname> <given-names>A.</given-names></name> <name><surname>Scutt</surname> <given-names>C. P.</given-names></name> <name><surname>Marl&#x00E9;taz</surname> <given-names>F.</given-names></name></person-group> (<year>2013</year>). <article-title>Evolution of the ARF gene family in land plants: old domains, new tricks.</article-title> <source><italic>Mol. Biol. Evol.</italic></source> <volume>30</volume> <fpage>45</fpage>&#x2013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1093/molbev/mss220</pub-id> <pub-id pub-id-type="pmid">22977118</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finn</surname> <given-names>R. D.</given-names></name> <name><surname>Bateman</surname> <given-names>A.</given-names></name> <name><surname>Clements</surname> <given-names>J.</given-names></name> <name><surname>Coggill</surname> <given-names>P.</given-names></name> <name><surname>Eberhardt</surname> <given-names>R. Y.</given-names></name> <name><surname>Eddy</surname> <given-names>S. R.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Pfam: the protein families database.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>42</volume> <fpage>D222</fpage>&#x2013;<lpage>D230</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkt1223</pub-id> <pub-id pub-id-type="pmid">24288371</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finn</surname> <given-names>R. D.</given-names></name> <name><surname>Clements</surname> <given-names>J.</given-names></name> <name><surname>Eddy</surname> <given-names>S. R.</given-names></name></person-group> (<year>2011</year>). <article-title>HMMER web server: interactive sequence similarity searching.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>39</volume> <fpage>W29</fpage>&#x2013;<lpage>W37</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkr367</pub-id> <pub-id pub-id-type="pmid">21593126</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freeling</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition.</article-title> <source><italic>Annu. Rev. Plant Biol.</italic></source> <volume>60</volume> <fpage>433</fpage>&#x2013;<lpage>453</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.arplant.043008.092122</pub-id> <pub-id pub-id-type="pmid">19575588</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodstein</surname> <given-names>D. M.</given-names></name> <name><surname>Shu</surname> <given-names>S.</given-names></name> <name><surname>Howson</surname> <given-names>R.</given-names></name> <name><surname>Neupane</surname> <given-names>R.</given-names></name> <name><surname>Hayes</surname> <given-names>R. D.</given-names></name> <name><surname>Fazo</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>Phytozome: a comparative platform for green plant genomics.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>40</volume> <fpage>D1178</fpage>&#x2013;<lpage>D1186</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkr944</pub-id> <pub-id pub-id-type="pmid">22110026</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haft</surname> <given-names>D. H.</given-names></name> <name><surname>Selengut</surname> <given-names>J. D.</given-names></name> <name><surname>White</surname> <given-names>O.</given-names></name></person-group> (<year>2003</year>). <article-title>The TIGRFAMs database of protein families.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>31</volume> <fpage>371</fpage>&#x2013;<lpage>373</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkg128</pub-id> <pub-id pub-id-type="pmid">12520025</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hruz</surname> <given-names>T.</given-names></name> <name><surname>Laule</surname> <given-names>O.</given-names></name> <name><surname>Szabo</surname> <given-names>G.</given-names></name> <name><surname>Wessendorp</surname> <given-names>F.</given-names></name> <name><surname>Bleuler</surname> <given-names>S.</given-names></name> <name><surname>Oertle</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes.</article-title> <source><italic>Adv. Bioinformatics</italic></source> <volume>2008</volume>:<issue>420747</issue>. <pub-id pub-id-type="doi">10.1155/2008/420747</pub-id> <pub-id pub-id-type="pmid">19956698</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>B.</given-names></name> <name><surname>Jin</surname> <given-names>J.</given-names></name> <name><surname>Guo</surname> <given-names>A. Y.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Luo</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). <article-title>GSDS 2.0: an upgraded gene feature visualization server.</article-title> <source><italic>Bioinformatics</italic></source> <volume>31</volume> <fpage>1296</fpage>&#x2013;<lpage>1297</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu817</pub-id> <pub-id pub-id-type="pmid">25504850</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Niu</surname> <given-names>B.</given-names></name> <name><surname>Gao</surname> <given-names>Y.</given-names></name> <name><surname>Fu</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name></person-group> (<year>2010</year>). <article-title>CD-HIT Suite: a web server for clustering and comparing biological sequences.</article-title> <source><italic>Bioinformatics</italic></source> <volume>26</volume> <fpage>680</fpage>&#x2013;<lpage>682</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq003</pub-id> <pub-id pub-id-type="pmid">20053844</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ingley</surname> <given-names>E.</given-names></name> <name><surname>Hemmings</surname> <given-names>B. A.</given-names></name></person-group> (<year>1994</year>). <article-title>Pleckstrin homology (PH) domains in signal transduction.</article-title> <source><italic>J. Cell. Biochem.</italic></source> <volume>56</volume> <fpage>436</fpage>&#x2013;<lpage>443</lpage>. <pub-id pub-id-type="doi">10.1002/jcb.240560403</pub-id> <pub-id pub-id-type="pmid">7890802</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ito</surname> <given-names>M.</given-names></name> <name><surname>Sentoku</surname> <given-names>N.</given-names></name> <name><surname>Nishimura</surname> <given-names>A.</given-names></name> <name><surname>Hong</surname> <given-names>S. K.</given-names></name> <name><surname>Sato</surname> <given-names>Y.</given-names></name> <name><surname>Matsuoka</surname> <given-names>M.</given-names></name></person-group> (<year>2002</year>). <article-title>Position dependent expression of GL2-type homeobox gene, Roc1: significance for protoderm differentiation and radial pattern formation in early rice embryogenesis.</article-title> <source><italic>Plant J.</italic></source> <volume>29</volume> <fpage>497</fpage>&#x2013;<lpage>507</lpage>. <pub-id pub-id-type="doi">10.1046/j.1365-313x.2002.01234.x</pub-id> <pub-id pub-id-type="pmid">11846882</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Adaptations of the helix-grip fold for ligand binding and catalysis in the START domain superfamily.</article-title> <source><italic>Proteins Struct. Funct. Genet.</italic></source> <volume>43</volume> <fpage>134</fpage>&#x2013;<lpage>144</lpage>. <pub-id pub-id-type="doi">10.1002/1097-0134(20010501)43:2&#x003C;134::AID-PROT1025&#x003E;3.0.CO;2-I</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jacquemin</surname> <given-names>J.</given-names></name> <name><surname>Ammiraju</surname> <given-names>J. S. S.</given-names></name> <name><surname>Haberer</surname> <given-names>G.</given-names></name> <name><surname>Billheimer</surname> <given-names>D. D.</given-names></name> <name><surname>Yu</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>L. C.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Fifteen million years of evolution in the <italic>Oryza</italic> genus shows extensive gene family expansion.</article-title> <source><italic>Mol. Plant</italic></source> <volume>7</volume> <fpage>642</fpage>&#x2013;<lpage>656</lpage>. <pub-id pub-id-type="doi">10.1093/mp/sst149</pub-id> <pub-id pub-id-type="pmid">24214894</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>P.</given-names></name> <name><surname>Binns</surname> <given-names>D.</given-names></name> <name><surname>Chang</surname> <given-names>H. Y.</given-names></name> <name><surname>Fraser</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>McAnulla</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>InterProScan 5: genome-scale protein function classification.</article-title> <source><italic>Bioinformatics</italic></source> <volume>30</volume> <fpage>1236</fpage>&#x2013;<lpage>1240</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu031</pub-id> <pub-id pub-id-type="pmid">24451626</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kassahn</surname> <given-names>K. S.</given-names></name> <name><surname>Dang</surname> <given-names>V. T.</given-names></name> <name><surname>Wilkins</surname> <given-names>S. J.</given-names></name> <name><surname>Perkins</surname> <given-names>A. C.</given-names></name> <name><surname>Ragan</surname> <given-names>M. A.</given-names></name></person-group> (<year>2009</year>). <article-title>Evolution of gene function and regulatory control after whole-genome duplication: comparative analyses in vertebrates.</article-title> <source><italic>Genome Res.</italic></source> <volume>19</volume> <fpage>1404</fpage>&#x2013;<lpage>1418</lpage>. <pub-id pub-id-type="doi">10.1101/gr.086827.108</pub-id> <pub-id pub-id-type="pmid">19439512</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kersey</surname> <given-names>P. J.</given-names></name> <name><surname>Allen</surname> <given-names>J. E.</given-names></name> <name><surname>Allot</surname> <given-names>A.</given-names></name> <name><surname>Barba</surname> <given-names>M.</given-names></name> <name><surname>Boddu</surname> <given-names>S.</given-names></name> <name><surname>Bolt</surname> <given-names>B. J.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Ensembl genomes 2018: an integrated omics infrastructure for non-vertebrate species.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>46</volume> <fpage>D802</fpage>&#x2013;<lpage>D808</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx1011</pub-id> <pub-id pub-id-type="pmid">29092050</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kersting</surname> <given-names>A. R.</given-names></name> <name><surname>Bornberg-Bauer</surname> <given-names>E.</given-names></name> <name><surname>Moore</surname> <given-names>A. D.</given-names></name> <name><surname>Grath</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). <article-title>Dynamics and adaptive benefits of protein domain emergence and arrangements during plant genome evolution.</article-title> <source><italic>Genome Biol. Evol.</italic></source> <volume>4</volume> <fpage>316</fpage>&#x2013;<lpage>329</lpage>. <pub-id pub-id-type="doi">10.1093/gbe/evs004</pub-id> <pub-id pub-id-type="pmid">22250127</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koonin</surname> <given-names>E. V.</given-names></name> <name><surname>Rogozin</surname> <given-names>I. B.</given-names></name></person-group> (<year>2003</year>). <article-title>Getting positive about selection.</article-title> <source><italic>Genome Biol.</italic></source> <volume>4</volume>:<issue>331</issue>. <pub-id pub-id-type="doi">10.1186/gb-2003-4-8-331</pub-id> <pub-id pub-id-type="pmid">12914654</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krogh</surname> <given-names>A.</given-names></name> <name><surname>Larsson</surname> <given-names>B.</given-names></name> <name><surname>Von Heijne</surname> <given-names>G.</given-names></name> <name><surname>Sonnhammer</surname> <given-names>E. L. L.</given-names></name></person-group> (<year>2001</year>). <article-title>Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.</article-title> <source><italic>J. Mol. Biol.</italic></source> <volume>305</volume> <fpage>567</fpage>&#x2013;<lpage>580</lpage>. <pub-id pub-id-type="doi">10.1006/jmbi.2000.4315</pub-id> <pub-id pub-id-type="pmid">11152613</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krzywinski</surname> <given-names>M.</given-names></name> <name><surname>Schein</surname> <given-names>J.</given-names></name> <name><surname>Birol</surname> <given-names>I.</given-names></name> <name><surname>Connors</surname> <given-names>J.</given-names></name> <name><surname>Gascoyne</surname> <given-names>R.</given-names></name> <name><surname>Horsman</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2009</year>). <article-title>Circos: an information aesthetic for comparative genomics.</article-title> <source><italic>Genome Res.</italic></source> <volume>19</volume> <fpage>1639</fpage>&#x2013;<lpage>1645</lpage>. <pub-id pub-id-type="doi">10.1101/gr.092759.109</pub-id> <pub-id pub-id-type="pmid">19541911</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Letunic</surname> <given-names>I.</given-names></name> <name><surname>Doerks</surname> <given-names>T.</given-names></name> <name><surname>Bork</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>SMART: recent updates, new developments and status in 2015.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>43</volume> <fpage>D257</fpage>&#x2013;<lpage>D260</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku949</pub-id> <pub-id pub-id-type="pmid">25300481</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J. Y.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Zeigler</surname> <given-names>R. S.</given-names></name></person-group> (<year>2014</year>). <article-title>The 3,000 rice genomes project: new opportunities and challenges for future rice research.</article-title> <source><italic>Gigascience</italic></source> <volume>3</volume>:<issue>8</issue>. <pub-id pub-id-type="doi">10.1186/2047-217X-3-8</pub-id> <pub-id pub-id-type="pmid">24872878</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Xie</surname> <given-names>Y.</given-names></name> <name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Nie</surname> <given-names>P.</given-names></name> <name><surname>Zuo</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>IBS: an illustrator for the presentation and visualization of biological sequences.</article-title> <source><italic>Bioinformatics</italic></source> <volume>31</volume> <fpage>3359</fpage>&#x2013;<lpage>3361</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv362</pub-id> <pub-id pub-id-type="pmid">26069263</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>P.</given-names></name> <name><surname>Porat</surname> <given-names>R.</given-names></name> <name><surname>Nadeau</surname> <given-names>J. A.</given-names></name> <name><surname>O&#x2019;Neill</surname> <given-names>S. D.</given-names></name></person-group> (<year>1996</year>). <article-title>Identification of a meristem L1 layer-specific gene in <italic>Arabidopsis</italic> that is expressed during embryonic pattern formation and defines a new class of homeobox genes.</article-title> <source><italic>Plant Cell</italic></source> <volume>8</volume> <fpage>2155</fpage>&#x2013;<lpage>2168</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.8.12.2155</pub-id> <pub-id pub-id-type="pmid">8989876</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Bennetzen</surname> <given-names>J. L.</given-names></name></person-group> (<year>2004</year>). <article-title>Rapid recent growth and divergence of rice nuclear genomes.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>101</volume> <fpage>12404</fpage>&#x2013;<lpage>12410</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0403715101</pub-id> <pub-id pub-id-type="pmid">15240870</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Madeira</surname> <given-names>F.</given-names></name> <name><surname>Park</surname> <given-names>Y. M.</given-names></name> <name><surname>Lee</surname> <given-names>J.</given-names></name> <name><surname>Buso</surname> <given-names>N.</given-names></name> <name><surname>Gur</surname> <given-names>T.</given-names></name> <name><surname>Madhusoodanan</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>The EMBL-EBI search and sequence analysis tools APIs in 2019.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>47</volume> <fpage>W636</fpage>&#x2013;<lpage>W641</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkz268</pub-id> <pub-id pub-id-type="pmid">30976793</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maglott</surname> <given-names>D.</given-names></name> <name><surname>Ostell</surname> <given-names>J.</given-names></name> <name><surname>Pruitt</surname> <given-names>K. D.</given-names></name> <name><surname>Tatusova</surname> <given-names>T.</given-names></name></person-group> (<year>2011</year>). <article-title>Entrez gene: gene-centered information at NCBI.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>39</volume> <fpage>D52</fpage>&#x2013;<lpage>D57</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkq1237</pub-id> <pub-id pub-id-type="pmid">21115458</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marchler-Bauer</surname> <given-names>A.</given-names></name> <name><surname>Derbyshire</surname> <given-names>M. K.</given-names></name> <name><surname>Gonzales</surname> <given-names>N. R.</given-names></name> <name><surname>Lu</surname> <given-names>S.</given-names></name> <name><surname>Chitsaz</surname> <given-names>F.</given-names></name> <name><surname>Geer</surname> <given-names>L. Y.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>CDD: NCBI&#x2019;s conserved domain database.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>43</volume> <fpage>D222</fpage>&#x2013;<lpage>D226</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku1221</pub-id> <pub-id pub-id-type="pmid">25414356</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marchler-Bauer</surname> <given-names>A.</given-names></name> <name><surname>Lu</surname> <given-names>S.</given-names></name> <name><surname>Anderson</surname> <given-names>J. B.</given-names></name> <name><surname>Chitsaz</surname> <given-names>F.</given-names></name> <name><surname>Derbyshire</surname> <given-names>M. K.</given-names></name> <name><surname>DeWeese-Scott</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>CDD: a conserved domain database for the functional annotation of proteins.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>39</volume> <fpage>D225</fpage>&#x2013;<lpage>D229</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkq1189</pub-id> <pub-id pub-id-type="pmid">21109532</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mayer</surname> <given-names>B. J.</given-names></name> <name><surname>Ren</surname> <given-names>R.</given-names></name> <name><surname>Clark</surname> <given-names>K. L.</given-names></name> <name><surname>Baltimore</surname> <given-names>D.</given-names></name></person-group> (<year>1993</year>). <article-title>A putative modular domain present in diverse signaling proteins.</article-title> <source><italic>Cell</italic></source> <volume>73</volume> <fpage>629</fpage>&#x2013;<lpage>630</lpage>. <pub-id pub-id-type="doi">10.1016/0092-8674(93)90244-K</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mizuta</surname> <given-names>Y.</given-names></name> <name><surname>Harushima</surname> <given-names>Y.</given-names></name> <name><surname>Kurata</surname> <given-names>N.</given-names></name></person-group> (<year>2010</year>). <article-title>Rice pollen hybrid incompatibility caused by reciprocal gene loss of duplicated genes.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>107</volume> <fpage>20417</fpage>&#x2013;<lpage>20422</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1003124107</pub-id> <pub-id pub-id-type="pmid">21048083</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mukherjee</surname> <given-names>K.</given-names></name> <name><surname>B&#x00FC;rglin</surname> <given-names>T. R.</given-names></name></person-group> (<year>2006</year>). <article-title>MEKHLA, a novel domain with similarity to PAS domains, is fused to plant homeodomain-leucine zipper III proteins.</article-title> <source><italic>Plant Physiol.</italic></source> <volume>140</volume> <fpage>1142</fpage>&#x2013;<lpage>1150</lpage>. <pub-id pub-id-type="doi">10.1104/pp.105.073833</pub-id> <pub-id pub-id-type="pmid">16607028</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Panchy</surname> <given-names>N.</given-names></name> <name><surname>Lehti-Shiu</surname> <given-names>M.</given-names></name> <name><surname>Shiu</surname> <given-names>S. H.</given-names></name></person-group> (<year>2016</year>). <article-title>Evolution of gene duplication in plants.</article-title> <source><italic>Plant Physiol.</italic></source> <volume>171</volume> <fpage>2294</fpage>&#x2013;<lpage>2316</lpage>. <pub-id pub-id-type="doi">10.1104/pp.16.00523</pub-id> <pub-id pub-id-type="pmid">27288366</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pandey</surname> <given-names>A.</given-names></name> <name><surname>Misra</surname> <given-names>P.</given-names></name> <name><surname>Alok</surname> <given-names>A.</given-names></name> <name><surname>Kaur</surname> <given-names>N.</given-names></name> <name><surname>Sharma</surname> <given-names>S.</given-names></name> <name><surname>Lakhwani</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Genome-wide identification and expression analysis of homeodomain leucine zipper subfamily IV (HDZ IV) gene family from <italic>Musa accuminata</italic>.</article-title> <source><italic>Front. Plant Sci.</italic></source> <volume>7</volume>:<issue>20</issue>. <pub-id pub-id-type="doi">10.3389/fpls.2016.00020</pub-id> <pub-id pub-id-type="pmid">26870050</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ponting</surname> <given-names>C. P.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>1999</year>). <article-title>START: a lipid-binding domain in StAR, HD-ZIP and signalling proteins.</article-title> <source><italic>Trends Biochem. Sci.</italic></source> <volume>24</volume> <fpage>130</fpage>&#x2013;<lpage>132</lpage>. <pub-id pub-id-type="doi">10.1016/S0968-0004(99)01362-6</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prigge</surname> <given-names>M. J.</given-names></name> <name><surname>Otsuga</surname> <given-names>D.</given-names></name> <name><surname>Alonso</surname> <given-names>J. M.</given-names></name> <name><surname>Ecker</surname> <given-names>J. R.</given-names></name> <name><surname>Drews</surname> <given-names>G. N.</given-names></name> <name><surname>Clark</surname> <given-names>S. E.</given-names></name></person-group> (<year>2005</year>). <article-title>Class III homeodomain-leucine zipper gene family members have overlapping, antagonistic, and distinct roles in <italic>Arabidopsis</italic> development.</article-title> <source><italic>Plant Cell</italic></source> <volume>17</volume> <fpage>61</fpage>&#x2013;<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.104.026161</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qiao</surname> <given-names>X.</given-names></name> <name><surname>Yin</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Different modes of gene duplication show divergent evolutionary patterns and contribute differently to the expansion of gene families involved in important fruit traits in pear (<italic>Pyrus bretschneideri</italic>).</article-title> <source><italic>Front. Plant Sci.</italic></source> <volume>9</volume>:<issue>161</issue>. <pub-id pub-id-type="doi">10.3389/fpls.2018.00161</pub-id> <pub-id pub-id-type="pmid">29487610</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qiao</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Yin</surname> <given-names>H.</given-names></name> <name><surname>Qi</surname> <given-names>K.</given-names></name> <name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants.</article-title> <source><italic>Genome Biol.</italic></source> <volume>20</volume>:<issue>38</issue>. <pub-id pub-id-type="doi">10.1186/s13059-019-1650-2</pub-id> <pub-id pub-id-type="pmid">30791939</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qiu</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>S. L.</given-names></name> <name><surname>Adams</surname> <given-names>K. L.</given-names></name></person-group> (<year>2017</year>). <article-title>Concerted divergence after gene duplication in polycomb repressive complexes.</article-title> <source><italic>Plant Physiol.</italic></source> <volume>174</volume> <fpage>1192</fpage>&#x2013;<lpage>1204</lpage>. <pub-id pub-id-type="doi">10.1104/pp.16.01983</pub-id> <pub-id pub-id-type="pmid">28455403</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rambaut</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <source><italic>FigTree v1.4.2</italic>, <italic>A Graphical Viewer of Phylogenetic Trees.</italic></source> Available online at: &#x003C;<ext-link ext-link-type="uri" xlink:href="http://tree.bio.ed.ac.uk/software/figtree/">http://tree.bio.ed.ac.uk/software/figtree/</ext-link>&#x003E; <comment>(accessed July 11, 2016)</comment>.</citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ren</surname> <given-names>L.-L.</given-names></name> <name><surname>Liu</surname> <given-names>Y.-J.</given-names></name> <name><surname>Liu</surname> <given-names>H.-J.</given-names></name> <name><surname>Qian</surname> <given-names>T.-T.</given-names></name> <name><surname>Qi</surname> <given-names>L.-W.</given-names></name> <name><surname>Wang</surname> <given-names>X.-R.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Subcellular relocalization and positive selection play key roles in the retention of duplicate genes of <italic>Populus</italic> class III peroxidase family.</article-title> <source><italic>Plant Cell</italic></source> <volume>26</volume> <fpage>2404</fpage>&#x2013;<lpage>2419</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.114.124750</pub-id> <pub-id pub-id-type="pmid">24934172</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rerie</surname> <given-names>W. G.</given-names></name> <name><surname>Feldmann</surname> <given-names>K. A.</given-names></name> <name><surname>Marks</surname> <given-names>M. D.</given-names></name></person-group> (<year>1994</year>). <article-title>The GLABRA2 gene encodes a homeo domain protein required for normal trichome development in <italic>Arabidopsis</italic>.</article-title> <source><italic>Genes Dev.</italic></source> <volume>8</volume> <fpage>1388</fpage>&#x2013;<lpage>1399</lpage>. <pub-id pub-id-type="doi">10.1101/gad.8.12.1388</pub-id> <pub-id pub-id-type="pmid">7926739</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Riechmann</surname> <given-names>J. L.</given-names></name></person-group> (<year>2002</year>). <article-title>Transcriptional regulation: a genomic overview.</article-title> <source><italic>Arabidopsis Book</italic></source> <volume>1</volume>:<issue>e0085</issue>. <pub-id pub-id-type="doi">10.1199/tab.0085</pub-id> <pub-id pub-id-type="pmid">22303220</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saeed</surname> <given-names>A. I.</given-names></name> <name><surname>Sharov</surname> <given-names>V.</given-names></name> <name><surname>White</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Liang</surname> <given-names>W.</given-names></name> <name><surname>Bhagabati</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2003</year>). <article-title>TM4: a free, open-source system for microarray data management and analysis.</article-title> <source><italic>Biotechniques</italic></source> <volume>34</volume> <fpage>374</fpage>&#x2013;<lpage>378</lpage>. <pub-id pub-id-type="doi">10.2144/03342mt01</pub-id> <pub-id pub-id-type="pmid">12613259</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satheesh</surname> <given-names>V.</given-names></name> <name><surname>Chidambaranathan</surname> <given-names>P.</given-names></name> <name><surname>Jagannadham</surname> <given-names>P. T.</given-names></name> <name><surname>Kumar</surname> <given-names>V.</given-names></name> <name><surname>Jain</surname> <given-names>P. K.</given-names></name> <name><surname>Chinnusamy</surname> <given-names>V.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Transmembrane START domain proteins: in silico identification, characterization and expression analysis under stress conditions in chickpea (<italic>Cicer arietinum</italic> L.).</article-title> <source><italic>Plant Signal. Behav.</italic></source> <volume>11</volume>:<issue>e992698</issue>. <pub-id pub-id-type="doi">10.4161/15592324.2014.992698</pub-id> <pub-id pub-id-type="pmid">26445326</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schrick</surname> <given-names>K.</given-names></name> <name><surname>Bruno</surname> <given-names>M.</given-names></name> <name><surname>Khosla</surname> <given-names>A.</given-names></name> <name><surname>Cox</surname> <given-names>P. N.</given-names></name> <name><surname>Marlatt</surname> <given-names>S. A.</given-names></name> <name><surname>Roque</surname> <given-names>R. A.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Shared functions of plant and mammalian StAR-related lipid transfer (START) domains in modulating transcription factor activity.</article-title> <source><italic>BMC Biol.</italic></source> <volume>12</volume>:<issue>70</issue>. <pub-id pub-id-type="doi">10.1186/s12915-014-0070-8</pub-id> <pub-id pub-id-type="pmid">25159688</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schrick</surname> <given-names>K.</given-names></name> <name><surname>Nguyen</surname> <given-names>D.</given-names></name> <name><surname>Karlowski</surname> <given-names>W. M.</given-names></name> <name><surname>Mayer</surname> <given-names>K. F. X.</given-names></name></person-group> (<year>2004</year>). <article-title>START lipid/sterol-binding domains are amplified in plants and are predominantly associated with homeodomain transcription factors.</article-title> <source><italic>Genome Biol.</italic></source> <volume>5</volume>:<issue>R41</issue>. <pub-id pub-id-type="doi">10.1186/gb-2004-5-6-r41</pub-id> <pub-id pub-id-type="pmid">15186492</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soccio</surname> <given-names>R. E.</given-names></name> <name><surname>Breslow</surname> <given-names>J. L.</given-names></name></person-group> (<year>2003</year>). <article-title>StAR-related lipid transfer (START) proteins: mediators of intracellular lipid metabolism.</article-title> <source><italic>J. Biol. Chem.</italic></source> <volume>278</volume> <fpage>22183</fpage>&#x2013;<lpage>22186</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.R300003200</pub-id> <pub-id pub-id-type="pmid">12724317</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stamatakis</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.</article-title> <source><italic>Bioinformatics</italic></source> <volume>30</volume> <fpage>1312</fpage>&#x2013;<lpage>1313</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu033</pub-id> <pub-id pub-id-type="pmid">24451623</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stein</surname> <given-names>J. C.</given-names></name> <name><surname>Yu</surname> <given-names>Y.</given-names></name> <name><surname>Copetti</surname> <given-names>D.</given-names></name> <name><surname>Zwickl</surname> <given-names>D. J.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus <italic>Oryza</italic>.</article-title> <source><italic>Nat. Genet.</italic></source> <volume>50</volume> <fpage>285</fpage>&#x2013;<lpage>296</lpage>. <pub-id pub-id-type="doi">10.1038/s41588-018-0040-0</pub-id> <pub-id pub-id-type="pmid">29358651</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stocco</surname> <given-names>D. M.</given-names></name></person-group> (<year>2001</year>). <article-title>StAR protein and the regulation of steroid hormone biosynthesis.</article-title> <source><italic>Annu. Rev. Physiol.</italic></source> <volume>63</volume> <fpage>193</fpage>&#x2013;<lpage>213</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.physiol.63.1.193</pub-id> <pub-id pub-id-type="pmid">11181954</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>D.</given-names></name> <name><surname>Ade</surname> <given-names>J.</given-names></name> <name><surname>Frye</surname> <given-names>C. A.</given-names></name> <name><surname>Innes</surname> <given-names>R. W.</given-names></name></person-group> (<year>2005</year>). <article-title>Regulation of plant defense responses in <italic>Arabidopsis</italic> by EDR2, a PH and START domain-containing protein.</article-title> <source><italic>Plant J.</italic></source> <volume>44</volume> <fpage>245</fpage>&#x2013;<lpage>257</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-313X.2005.02523.x</pub-id> <pub-id pub-id-type="pmid">16212604</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tatusov</surname> <given-names>R. L.</given-names></name> <name><surname>Fedorova</surname> <given-names>N. D.</given-names></name> <name><surname>Jackson</surname> <given-names>J. D.</given-names></name> <name><surname>Jacobs</surname> <given-names>A. R.</given-names></name> <name><surname>Kiryutin</surname> <given-names>B.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name><etal/></person-group> (<year>2003</year>). <article-title>The COG database: an updated version includes eukaryotes.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>4</volume>:<issue>41</issue>. <pub-id pub-id-type="doi">10.1186/1471-2105-4-41</pub-id> <pub-id pub-id-type="pmid">12969510</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsujishita</surname> <given-names>Y.</given-names></name> <name><surname>Hurley</surname> <given-names>J. H.</given-names></name></person-group> (<year>2000</year>). <article-title>Structure and lipid transport mechanism of a StAR-related domain.</article-title> <source><italic>Nat. Struct. Biol.</italic></source> <volume>7</volume> <fpage>408</fpage>&#x2013;<lpage>414</lpage>. <pub-id pub-id-type="doi">10.1038/75192</pub-id> <pub-id pub-id-type="pmid">10802740</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Venkata</surname> <given-names>B. P.</given-names></name> <name><surname>Schrick</surname> <given-names>K.</given-names></name></person-group> (<year>2006</year>). &#x201C;<article-title>START domains in lipid/sterol transfer and signaling in plants</article-title>,&#x201D; in <source><italic>Proceedings of the 17th International Symposium on Plant Lipids</italic></source>, (<publisher-loc>East Lansing, MI</publisher-loc>: <publisher-name>Michigan State University Press</publisher-name>).</citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Yu</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies.</article-title> <source><italic>Genomics Proteomics Bioinformatics</italic></source> <volume>8</volume> <fpage>77</fpage>&#x2013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1016/S1672-0229(10)60008-3</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Shi</surname> <given-names>X.</given-names></name> <name><surname>Hao</surname> <given-names>B.</given-names></name> <name><surname>Ge</surname> <given-names>S.</given-names></name> <name><surname>Luo</surname> <given-names>J.</given-names></name></person-group> (<year>2005</year>). <article-title>Duplication and DNA segmental loss in the rice genome: implications for diploidization.</article-title> <source><italic>New Phytol.</italic></source> <volume>165</volume> <fpage>937</fpage>&#x2013;<lpage>946</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-8137.2004.01293.x</pub-id> <pub-id pub-id-type="pmid">15720704</pub-id></citation></ref>
<ref id="B70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Paterson</surname> <given-names>A. H.</given-names></name></person-group> (<year>2013</year>). <article-title>MCScanX-transposed: detecting transposed gene duplications based on multiple colinearity scans.</article-title> <source><italic>Bioinformatics</italic></source> <volume>29</volume> <fpage>1458</fpage>&#x2013;<lpage>1460</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btt150</pub-id> <pub-id pub-id-type="pmid">23539305</pub-id></citation></ref>
<ref id="B71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Tang</surname> <given-names>H.</given-names></name> <name><surname>Debarry</surname> <given-names>J. D.</given-names></name> <name><surname>Tan</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>40</volume>:<issue>e49</issue>. <pub-id pub-id-type="doi">10.1093/nar/gkr1293</pub-id> <pub-id pub-id-type="pmid">22217600</pub-id></citation></ref>
<ref id="B72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Tang</surname> <given-names>H.</given-names></name> <name><surname>Tan</surname> <given-names>X.</given-names></name> <name><surname>Ficklin</surname> <given-names>S. P.</given-names></name> <name><surname>Feltus</surname> <given-names>F. A.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>Modes of gene duplication contribute differently to genetic novelty and redundancy, but show parallels across divergent angiosperms.</article-title> <source><italic>PLoS One</italic></source> <volume>6</volume>:<issue>e28150</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0028150</pub-id> <pub-id pub-id-type="pmid">22164235</pub-id></citation></ref>
<ref id="B73"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Western</surname> <given-names>T. L.</given-names></name> <name><surname>Burn</surname> <given-names>J.</given-names></name> <name><surname>Tan</surname> <given-names>W. L.</given-names></name> <name><surname>Skinner</surname> <given-names>D. J.</given-names></name> <name><surname>Martin-McCaffrey</surname> <given-names>L.</given-names></name> <name><surname>Moffatt</surname> <given-names>B. A.</given-names></name><etal/></person-group> (<year>2001</year>). <article-title>Isolation and characterization of mutants defective in seed coat mucilage secretory cell development in <italic>Arabidopsis</italic>.</article-title> <source><italic>Plant Physiol.</italic></source> <volume>127</volume> <fpage>998</fpage>&#x2013;<lpage>1011</lpage>. <pub-id pub-id-type="doi">10.1104/pp.010410</pub-id></citation></ref>
<ref id="B74"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>J. Y.</given-names></name> <name><surname>Chung</surname> <given-names>M. C.</given-names></name> <name><surname>Tu</surname> <given-names>C. Y.</given-names></name> <name><surname>Leu</surname> <given-names>W. M.</given-names></name></person-group> (<year>2002</year>). <article-title>OSTF1: a HD-GL2 family homeobox gene is developmentally regulated during early embryogenesis in rice.</article-title> <source><italic>Plant Cell Physiol.</italic></source> <volume>43</volume> <fpage>628</fpage>&#x2013;<lpage>638</lpage>. <pub-id pub-id-type="doi">10.1093/pcp/pcf076</pub-id> <pub-id pub-id-type="pmid">12091716</pub-id></citation></ref>
<ref id="B75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>S.</given-names></name> <name><surname>Bourne</surname> <given-names>P. E.</given-names></name></person-group> (<year>2009</year>). <article-title>The evolutionary history of protein domains viewed by species phylogeny.</article-title> <source><italic>PLoS One</italic></source> <volume>4</volume>:<issue>e8378</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0008378</pub-id> <pub-id pub-id-type="pmid">20041107</pub-id></citation></ref>
<ref id="B76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Hong</surname> <given-names>Y. Y.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>P.</given-names></name> <name><surname>Ke</surname> <given-names>S. D.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Activated expression of an <italic>Arabidopsis</italic> HD-START protein confers drought tolerance with improved root system and reduced stomatal density.</article-title> <source><italic>Plant Cell</italic></source> <volume>20</volume> <fpage>1134</fpage>&#x2013;<lpage>1151</lpage>. <pub-id pub-id-type="doi">10.1105/tpc.108.058263</pub-id> <pub-id pub-id-type="pmid">18451323</pub-id></citation></ref>
<ref id="B77"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Lin</surname> <given-names>W.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Zhou</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2005</year>). <article-title>The genomes of <italic>Oryza sativa</italic>: a history of duplications.</article-title> <source><italic>PLoS Biol.</italic></source> <volume>3</volume>:<issue>e38</issue>. <pub-id pub-id-type="doi">10.1371/journal.pbio.0030038</pub-id> <pub-id pub-id-type="pmid">15685292</pub-id></citation></ref>
<ref id="B78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Q. J.</given-names></name> <name><surname>Zhu</surname> <given-names>T.</given-names></name> <name><surname>Xia</surname> <given-names>E. H.</given-names></name> <name><surname>Shi</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>Y. L.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Rapid diversification of five <italic>Oryza</italic> AA genomes associated with rice adaptation.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>111</volume> <fpage>E4954</fpage>&#x2013;<lpage>E4962</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1418307111</pub-id> <pub-id pub-id-type="pmid">25368197</pub-id></citation></ref>
<ref id="B79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhong</surname> <given-names>Z.</given-names></name> <name><surname>Lin</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>M.</given-names></name> <name><surname>Lin</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Lin</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Expression divergence as an evolutionary alternative mechanism adopted by two rice subspecies against rice blast infection.</article-title> <source><italic>Rice</italic></source> <volume>12</volume>:<issue>12</issue>. <pub-id pub-id-type="doi">10.1186/s12284-019-0270-5</pub-id> <pub-id pub-id-type="pmid">30825020</pub-id></citation></ref>
<ref id="B80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>A.</given-names></name> <name><surname>Guo</surname> <given-names>W.</given-names></name> <name><surname>Jain</surname> <given-names>K.</given-names></name> <name><surname>Mower</surname> <given-names>J. P.</given-names></name></person-group> (<year>2014</year>). <article-title>Unprecedented heterogeneity in the synonymous substitution rate within a plant genome.</article-title> <source><italic>Mol. Biol. Evol.</italic></source> <volume>31</volume> <fpage>1228</fpage>&#x2013;<lpage>1236</lpage>. <pub-id pub-id-type="doi">10.1093/molbev/msu079</pub-id> <pub-id pub-id-type="pmid">24557444</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="footnote1">
<label>1</label>
<p><ext-link ext-link-type="uri" xlink:href="http://hmmer.org/">http://hmmer.org/</ext-link></p></fn>
<fn id="footnote2">
<label>2</label>
<p><ext-link ext-link-type="uri" xlink:href="http://tree.bio.ed.ac.uk/software/figtree/">http://tree.bio.ed.ac.uk/software/figtree/</ext-link></p></fn>
</fn-group>
</back>
</article>
