<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Cell. Infect. Microbiol.</journal-id>
<journal-title>Frontiers in Cellular and Infection Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Cell. Infect. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">2235-2988</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcimb.2017.00031</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Persistence of Functional Protein Domains in Mycoplasma Species and their Role in Host Specificity and Synthetic Minimal Life</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Kamminga</surname> <given-names>Tjerko</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/390751/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Koehorst</surname> <given-names>Jasper J.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Vermeij</surname> <given-names>Paul</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Slagman</surname> <given-names>Simen-Jan</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Martins dos Santos</surname> <given-names>Vitor A. P.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Bijlsma</surname> <given-names>Jetta J. E.</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Schaap</surname> <given-names>Peter J.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/198555/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research</institution> <country>Wageningen, Netherlands</country></aff>
<aff id="aff2"><sup>2</sup><institution>Bioprocess Technology and Support, MSD Animal Health</institution> <country>Boxmeer, Netherlands</country></aff>
<aff id="aff3"><sup>3</sup><institution>Discovery and Technology, MSD Animal Health</institution> <country>Boxmeer, Netherlands</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Lorenza Putignani, Bambino Ges&#x000F9; Children&#x00027;s Hospital (IRCCS), Italy</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Subramanian Dhandayuthapani, Texas Tech University Health Sciences Center, USA; Yang Zhang, University of Pennsylvania, USA</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Peter J. Schaap <email>peter.schaap&#x00040;wur.nl</email></p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>07</day>
<month>02</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>7</volume>
<elocation-id>31</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>11</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>01</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Kamminga, Koehorst, Vermeij, Slagman, Martins dos Santos, Bijlsma and Schaap.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Kamminga, Koehorst, Vermeij, Slagman, Martins dos Santos, Bijlsma and Schaap</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Mycoplasmas are the smallest self-replicating organisms and obligate parasites of a specific vertebrate host. An in-depth analysis of the functional capabilities of mycoplasma species is fundamental to understanding how some of the simplest forms of life on Earth succeeded in subverting complex hosts with highly sophisticated immune systems. In this study we present a genome-scale comparison, focused on the identification of functional protein domains, of 80 publicly available mycoplasma genomes, which were consistently re-annotated using a standardized annotation pipeline embedded in a semantic framework that keeps track of data provenance. We examined the pan- and core-domainome and studied predicted functional capability in relation to host specificity and phylogenetic distance. We show that the pan- and core-domainome of mycoplasma species is closed. A comparison with the proteome of the &#x0201C;minimal&#x0201D; synthetic bacterium JCVI-Syn3.0 allowed us to classify domains and proteins essential for minimal life. Many of those essential protein domains, essential Domains of Unknown Function (DUFs) and essential hypothetical proteins are not persistent across mycoplasma genomes, suggesting that mycoplasma species support alternative domain configurations that bypass their essentiality. Based on protein domain composition, we could separate mycoplasma species infecting blood from those infecting tissue. For selected genomes of tissue-infecting mycoplasmas, we could also predict whether the host is a ruminant, pig or human. Functionally closely related mycoplasma species, which have a highly similar protein domain repertoire but different hosts, could not be separated. This study provides a concise overview of the functional capabilities of mycoplasma species, which can be used as a basis to further understand host-pathogen interaction or to design synthetic minimal life.</p></abstract>
<kwd-group>
<kwd>mycoplasma</kwd>
<kwd>mollicutes</kwd>
<kwd>protein domains</kwd>
<kwd>genome comparison</kwd>
<kwd>host specificity</kwd>
<kwd>niche specificity</kwd>
<kwd>minimal genome</kwd>
<kwd>protein metabolism</kwd>
</kwd-group>
<contract-sponsor id="cn001">Horizon 2020 Framework Programme<named-content content-type="fundref-id">10.13039/100010661</named-content></contract-sponsor>
<counts>
<fig-count count="5"/>
<table-count count="3"/>
<equation-count count="0"/>
<ref-count count="59"/>
<page-count count="13"/>
<word-count count="9150"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Mycoplasmas have evolved from a common Gram-positive ancestor (Razin and Yogev, <xref ref-type="bibr" rid="B42">1998</xref>), and the evolutionary path of genome reduction has led to an obligate parasitic lifestyle, which presumably has selected for those bacteria that best manipulate their hosts and make optimal use of their specific niche with a minimal set of genes. The mechanisms these bacteria need to survive in a vertebrate host, however, are not completely understood (Rosengarten et al., <xref ref-type="bibr" rid="B43">2000</xref>). Research into the infection mechanisms used by mycoplasma species has focused on the identification of adhesive molecules (Rottem, <xref ref-type="bibr" rid="B44">2003</xref>), lipoproteins (Browning et al., <xref ref-type="bibr" rid="B4">2011</xref>), molecular mechanisms used to vary the composition of the bacterial membrane surface (Razin and Yogev, <xref ref-type="bibr" rid="B42">1998</xref>) and the production of toxic compounds (e.g., hydrogen peroxide and hydrogen sulfide) that damage the host (Vilei and Frey, <xref ref-type="bibr" rid="B55">2001</xref>; Gro&#x000DF;hennig et al., <xref ref-type="bibr" rid="B17">2015</xref>). While these studies provide insight into the mechanisms mycoplasmas use to infect the host, they do not explain why a mycoplasma species is specific for its host. Besides being important pathogens, mycoplasma species have also been studied extensively because their gene set is expected to be close to the minimal number of genes needed to sustain life (Gil et al., <xref ref-type="bibr" rid="B16">2004</xref>). Recently, a major milestone was reached with the publication of an engineered mycoplasma with a synthetic minimal genome of 473 genes based on the genome of <italic>Mycoplasma mycoides</italic> subsp. <italic>capri</italic> (Gibson et al., <xref ref-type="bibr" rid="B15">2010</xref>; Hutchison et al., <xref ref-type="bibr" rid="B23">2016</xref>), providing a benchmark for genome comparison studies aimed at determining gene essentiality.</p>
<p>Advances in genome sequencing techniques have made a multitude of mycoplasma genomes available. With this wealth of sequencing data, it is possible to study the complete repertoire of genes of a bacterial species, the pan-genome. Rouli et al. (<xref ref-type="bibr" rid="B45">2015</xref>) observed that bacterial species that have adopted an allopatric lifestyle in specific hosts tend to have a closed pan-genome. In contrast, recent comparative genomics studies of mycoplasma species and of haemoplasma species, a sub-group within the mycoplasma genus, reported the pan-genome to be open (Liu et al., <xref ref-type="bibr" rid="B31">2012</xref>; Guimaraes et al., <xref ref-type="bibr" rid="B19">2014</xref>). Here we present a genome-scale comparison of mycoplasma species at the functional level of protein domains. Proteins are the main working machinery of the cell and consist of functional domains, which are stable, structurally independent and genetically mobile units. A protein's function can thus be described precisely by taking into account its specific domain architecture (Koehorst et al., <xref ref-type="bibr" rid="B26">2016b</xref>). Studying protein domain presence, instead of gene sequence similarity, allows domain promiscuity, domain expansion and domain architecture variability to be compared. In a recent study, this approach was used to compare 121 <italic>Streptococcus</italic> strains based on their protein domain composition (Saccenti et al., <xref ref-type="bibr" rid="B46">2015</xref>), and the authors were able to capture metabolic flexibility within <italic>Streptococcus</italic> by identifying differences in core metabolic pathways between pathogenic and non-pathogenic strains. By analyzing functional capability based on protein domains, we gain insight into the functional flexibility of mycoplasma species, and we hypothesized that this would allow us to capture functional differences between mycoplasma species that explain adaptation to a host or niche. This strategy is supported by the recent finding that for <italic>Mycoplasma pneumoniae</italic> gene essentiality should be studied at the level of domains rather than at the level of genes (Lluch-Senar et al., <xref ref-type="bibr" rid="B33">2015</xref>). All protein domains found in a species make up its pan-domainome (Kuznetsov et al., <xref ref-type="bibr" rid="B29">2006</xref>), which comprises core, accessory and unique domains.</p>
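<p>To make these definitions concrete, the following minimal Python sketch partitions a pan-domainome into core domains (present in every genome), accessory domains (present in more than one but not all genomes) and unique domains (present in a single genome), given the set of domains detected in each genome. The genome names, the placeholder domain identifiers and the function name <monospace>classify_domains</monospace> are hypothetical and are not part of the pipeline used in this study.</p>

```python
def classify_domains(genome_domains):
    """Partition a pan-domainome into core, accessory and unique domains.

    genome_domains maps a genome identifier to the set of domain IDs detected
    in that genome (presence/absence only). With a single genome, every domain
    is both core and unique.
    """
    pan = set().union(*genome_domains.values())
    n_genomes = len(genome_domains)
    # Count in how many genomes each domain occurs.
    counts = {d: sum(d in doms for doms in genome_domains.values()) for d in pan}
    core = {d for d, c in counts.items() if c == n_genomes}
    unique = {d for d, c in counts.items() if c == 1}
    accessory = pan - core - unique
    return core, accessory, unique


# Hypothetical toy data: three genomes and five placeholder domain IDs.
genomes = {
    "genome_A": {"D1", "D2", "D3"},
    "genome_B": {"D1", "D2", "D4"},
    "genome_C": {"D1", "D4", "D5"},
}
core, accessory, unique = classify_domains(genomes)
print(sorted(core), sorted(accessory), sorted(unique))
```

<p>For these three toy genomes, D1 is core, D2 and D4 are accessory, and D3 and D5 are unique.</p>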
<p>We performed a <italic>de novo</italic> annotation of 80 publicly available mycoplasma genomes, and of the synthetic minimal genome variant of <italic>M. mycoides</italic> subsp. <italic>capri</italic>, using a standardized pipeline for prokaryotic genomes focused on the identification of protein domains. We determined the composition and size of the core- and pan-domainome of distinct mycoplasma species and of the complete mycoplasma genus. Including the synthetic minimal variant in the comparison allowed us to analyze the overlap between the protein domains in the core-domainome of mycoplasma species and those of the synthetic minimal bacterium, in order to pinpoint functions essential for minimal life.</p>
</sec>
<sec sec-type="methods" id="s2">
<title>Methods</title>
<sec>
<title>Genome retrieval and data handling</title>
<p>In total, 65 complete and 15 draft mycoplasma genomes (Table <xref ref-type="supplementary-material" rid="SM1">S1</xref>) were obtained from the NCBI database on 25 August 2015 using the &#x0201C;rsync&#x0201D; interface. The dataset contained information from 34 mycoplasma species. For 20 species a single genome sequence was available, while for the other 14 species multiple genomes were available (2&#x02013;12 genomes per species). For 6 species only a draft genome sequence was available. Genome sizes ranged from 0.58 Mbp for <italic>M. genitalium</italic> to 1.36 Mbp for <italic>M. penetrans</italic>. Genome sequences were retrieved in FASTA format and used as input for an in-house prokaryotic annotation platform (SAPP; Koehorst et al., <xref ref-type="bibr" rid="B27">2016a</xref>). <italic>Bacillus subtilis</italic> strain 168 (NC_000964) (Weisburg et al., <xref ref-type="bibr" rid="B56">1989</xref>) was used as outgroup. Briefly, the SAPP platform consists of sets of modules required for the genome annotation of prokaryotes. Different modules can be selected for analysis, and results and metadata are stored directly in a graph database using the RDF (Resource Description Framework) data model. Originally deposited genome annotations were obtained directly from the NCBI in GenBank format and converted into RDF. For three draft genomes no reference annotation was available (accession numbers: <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="NZ_ANIV00000000">NZ_ANIV00000000</ext-link>, <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="NZ_ANAB00000000">NZ_ANAB00000000</ext-link>, and <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="NZ_ANAA00000000">NZ_ANAA00000000</ext-link>).</p>
</sec>
<sec>
<title>Genome re-annotation using SAPP</title>
<p>Gene prediction was performed using Prodigal version 2.6.2 (Hyatt et al., <xref ref-type="bibr" rid="B24">2010</xref>) with codon table 4 (The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code). Proteins were analyzed using InterProScan version 5.4-47.0 (Jones et al., <xref ref-type="bibr" rid="B25">2014</xref>) with the complete set of applications enabled (TIGRFAM, PIRSF, ProDom, SMART, PROSITE Profiles and Patterns, HAMAP, Pfam-A, PRINTS, SUPERFAMILY, Coils and Gene3D). Protein domain information and other relevant information (GO terms, EC numbers) obtained from InterProScan were stored directly in the graph database. To query the results, a SPARQL endpoint was set up on a local server using Blazegraph Workbench v2.1.0. The annotated genomes were uploaded in RDF format using the Blazegraph web interface, and query results were obtained in R using the packages RCurl (Temple Lang, <xref ref-type="bibr" rid="B50">2015</xref>) and SPARQL (Van Hage, <xref ref-type="bibr" rid="B53">2013</xref>).</p>
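<p>The choice of translation table matters because in table 4 the codon TGA encodes tryptophan instead of acting as a stop codon, so a gene caller using the standard bacterial table 11 would truncate mycoplasma genes at every in-frame TGA. The short Python sketch below illustrates this effect; the helper <monospace>sense_codons_before_stop</monospace> and the toy sequence are hypothetical illustrations and do not reproduce how Prodigal scores or calls genes.</p>

```python
# Stop codons in NCBI translation table 11 (bacterial code) and table 4
# (Mycoplasma/Spiroplasma code). In table 4, TGA encodes tryptophan.
STOPS_TABLE_11 = {"TAA", "TAG", "TGA"}
STOPS_TABLE_4 = {"TAA", "TAG"}


def sense_codons_before_stop(seq, stops, start=0):
    """Count in-frame codons read from `start` before the first stop codon.

    If no stop codon is found, the count covers all complete codons to the end.
    """
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return count
        count += 1
    return count


# Hypothetical toy sequence: ATG AAA TGA GGC TGG TAA
seq = "ATGAAATGAGGCTGGTAA"
print(sense_codons_before_stop(seq, STOPS_TABLE_11))  # stops at the in-frame TGA
print(sense_codons_before_stop(seq, STOPS_TABLE_4))   # reads through TGA to TAA
```

<p>For the toy sequence, table 11 stops after 2 codons at the in-frame TGA, whereas table 4 reads through to the TAA and yields 5 sense codons.</p>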
</sec>
<sec>
<title>Phylogenetic analysis of mycoplasma genomes</title>
<p>16S rRNA sequences were obtained from the ARB-SILVA database (Quast et al., <xref ref-type="bibr" rid="B41">2013</xref>) (Table <xref ref-type="supplementary-material" rid="SM1">S2</xref>). When available, sequences from the &#x0201C;All-Species Living Tree&#x0201D; project were used. For the synthetic JCVI-Syn3.0, the 16S rRNA sequence of the parental <italic>M. mycoides</italic> subsp. <italic>capri</italic> PG3 was used. 16S rRNA sequences were aligned with Clustal Omega (version 1.2.1). MEGA (version 7.0.14) was used to construct a phylogenetic tree (maximum likelihood method with 500 bootstrap replicates). Archaeopteryx (version 0.9901) was used to visualize the tree and to root it using <italic>B. subtilis</italic> as outgroup. The phylogenetic tree was read into R and analyzed using the R package &#x0201C;ape&#x0201D; (Paradis et al., <xref ref-type="bibr" rid="B37">2004</xref>). The phylogenetic tree was compared with the protein domain tree using the R package &#x0201C;dendextend&#x0201D; (Galili, <xref ref-type="bibr" rid="B13">2015</xref>).</p>
</sec>
<sec>
<title>Analysis of core- and pan-domainome</title>
<p>The total domain composition of each genome was obtained using SPARQL queries. Only domains assigned with an e-value &#x0003C;1E<sup>&#x02212;07</sup> were taken into account. In R, a binary matrix of all genomes and their domain composition was created, meaning that only domain presence was considered in this analysis. Species were clustered based on the presence/absence matrix using the R function &#x0201C;hclust&#x0201D;, with distances calculated as the &#x0201C;Manhattan&#x0201D; distance. The R package &#x0201C;micropan&#x0201D; (Snipen et al., <xref ref-type="bibr" rid="B49">2009</xref>) was used to analyze the pan- and core-domainome of species for which five or more genomes were available, and the same approach was used to analyze the complete mycoplasma dataset. To analyze how the number of sequenced genomes affects the size of the pan- and core-domainome, random sampling from the presence/absence domain matrix was repeated 10 times for sample sizes ranging from 1 to 80 genome sequences. The range of model complexities considered (k-range) was 3&#x02013;5. Estimated core- and pan-domainome sizes were obtained using micropan; true core- and pan-domainome sizes were calculated directly from the sample set. Differences between species were further analyzed using principal component analysis (PCA). Loading scores obtained with PCA were used to identify domains that contribute strongly to group separation. To identify domains present in haemoplasma species that contribute strongly to separating this cluster from the other mycoplasma clusters, a loading score cut-off of &#x0003E;0.02 was used. To identify domains that contribute strongly to separating the pneumoniae cluster and the spiroplasma/hominis cluster, loading score cut-offs of &#x0003E;0.05 and &#x0003C;&#x02212;0.05 were used, respectively. Proteins with a metabolic function were extracted from the genome-scale metabolic model of <italic>M. pneumoniae</italic> (Wodke et al., <xref ref-type="bibr" rid="B57">2013</xref>) and extended with InterProScan domain annotations.</p>
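<p>The matrix operations described above can be sketched in Python as follows. The sketch computes the Manhattan distance between two binary domain profiles and the directly observed (&#x0201C;true&#x0201D;) core and pan sizes averaged over repeated random genome subsets; it does not reproduce micropan's model-based estimates or R's hclust. The function names and the toy matrix are hypothetical.</p>

```python
import random


def manhattan(a, b):
    """Manhattan distance between two binary presence/absence vectors.

    For 0/1 vectors this equals the number of domains present in exactly one
    of the two genomes.
    """
    return sum(abs(x - y) for x, y in zip(a, b))


def core_pan_curve(matrix, sizes, n_rep=10, seed=1):
    """Mean observed ('true') core and pan sizes for random genome subsets.

    matrix: list of binary rows, one row per genome and one column per domain.
    For every subset size, n_rep subsets are drawn without replacement.
    Returns {size: (mean_core_size, mean_pan_size)}.
    """
    rng = random.Random(seed)
    n_domains = len(matrix[0])
    curve = {}
    for size in sizes:
        core_sizes, pan_sizes = [], []
        for _ in range(n_rep):
            subset = rng.sample(matrix, size)
            # Number of sampled genomes containing each domain.
            occurrences = [sum(row[j] for row in subset) for j in range(n_domains)]
            core_sizes.append(sum(1 for c in occurrences if c == size))
            pan_sizes.append(sum(1 for c in occurrences if c > 0))
        curve[size] = (sum(core_sizes) / n_rep, sum(pan_sizes) / n_rep)
    return curve


# Hypothetical toy matrix: 4 genomes x 5 domains.
matrix = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
]
print(manhattan(matrix[0], matrix[1]))
print(core_pan_curve(matrix, sizes=range(1, 5)))
```

<p>As in the sampling scheme described above, each point is averaged over 10 random draws; as the subset size grows, the core size can only shrink and the pan size can only grow.</p>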
</sec>
<sec>
<title>Analysis of orthologous proteins</title>
<p>A SPARQL query was used to generate a protein FASTA file containing the proteins of all mycoplasma genomes (JCVI-Syn3.0 was not included). An all-against-all BLASTP (Wolf and Koonin, <xref ref-type="bibr" rid="B58">2012</xref>) of the mycoplasma proteins was performed using an e-value cut-off of 1E<sup>&#x02212;05</sup> and a maximum number of target sequences of 10<sup>5</sup>. The resulting BLAST output was used to find orthologous proteins with orthAgogue (Ekseth et al., <xref ref-type="bibr" rid="B11">2014</xref>), excluding protein pairs with an overlap below 50%. Orthologous proteins were clustered using MCL (Enright et al., <xref ref-type="bibr" rid="B12">2002</xref>) with a main inflation value of 1.5. The domain composition of all orthologous proteins was obtained with a SPARQL query based on InterPro identifiers.</p>
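<p>MCL alternates two operations on a column-stochastic matrix derived from the similarity graph: expansion (matrix squaring, which spreads flow along paths) and inflation (raising entries to the power of the inflation parameter and renormalizing, which strengthens strong flows and weakens weak ones). The pure-Python sketch below illustrates this on a small dense adjacency matrix with added self-loops. It is a toy version for illustration only and does not reproduce the sparse implementation, pruning or edge weighting of the MCL program or of the orthAgogue output; all names and the example graph are hypothetical.</p>

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=1.5, max_iter=100, tol=1e-9):
    """Toy Markov clustering of a small symmetric adjacency matrix."""
    n = len(adjacency)
    # Add self-loops and convert to a column-stochastic matrix.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths of length 2
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flows are strengthened, weak ones weakened
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if tol > change:  # converged
            break
    # Rows that still carry flow are attractors; their non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical graph: two disconnected triangles (nodes 0-2 and 3-5).
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(adjacency, inflation=1.5))
```

<p>The two triangles are returned as two clusters. On real similarity graphs, higher inflation values generally yield smaller, tighter clusters.</p>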
</sec>
<sec>
<title>Clustering of hypothetical mycoplasma proteins</title>
<p>Hypothetical proteins (domain-less proteins) were obtained from the mycoplasma genomes using a SPARQL query. Orthologous protein clusters containing these hypothetical proteins were obtained from the list of orthologous protein clusters. The persistence of these orthologous clusters was determined across the complete set of genomes used, JCVI-Syn3.0 and <italic>M. mycoides</italic> subsp. <italic>capri</italic> LC. Haemoplasma species were excluded from this analysis.</p>
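<p>The persistence analysis can be sketched as two small steps: keep the orthologous clusters that contain at least one domain-less protein, then report for each cluster the fraction of the selected genomes in which it has at least one member. This is a minimal Python illustration assuming persistence is defined as that fraction; the protein, genome and InterPro identifiers are hypothetical placeholders.</p>

```python
def hypothetical_clusters(clusters, domains_of):
    """Keep orthologous clusters containing at least one domain-less protein."""
    return [c for c in clusters if any(not domains_of.get(p) for p in c)]


def persistence(cluster, genome_of, genomes):
    """Fraction of the given genomes with at least one member in the cluster."""
    wanted = set(genomes)
    present = {genome_of[p] for p in cluster if genome_of[p] in wanted}
    return len(present) / len(wanted)


# Hypothetical toy data: protein -> genome, and protein -> set of InterPro IDs.
genome_of = {"p1": "G1", "p2": "G1", "p3": "G2", "p4": "G3", "p5": "G4"}
domains_of = {"p1": set(), "p2": set(), "p3": {"IPR_X"}, "p4": {"IPR_Y"}, "p5": {"IPR_Z"}}
clusters = [{"p1", "p2", "p3"}, {"p4", "p5"}]

genomes = ["G1", "G2", "G3", "G4"]  # e.g. the genome set after excluding haemoplasmas
for c in hypothetical_clusters(clusters, domains_of):
    print(sorted(c), persistence(c, genome_of, genomes))
```

<p>Excluding a group of genomes, such as the haemoplasma species, amounts to leaving them out of the genome list passed to <monospace>persistence</monospace>.</p>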
</sec>
<sec>
<title>Prediction of host/niche specific domains</title>
<p>K-nearest neighbor and random forest classification (Breiman, <xref ref-type="bibr" rid="B3">2001</xref>) were used to classify mycoplasma species by host or niche specificity and to identify domains important for classification. A binary domain presence/absence matrix was used as input. The R package &#x0201C;class&#x0201D; was used to perform k-nearest neighbor (k-nn) classification (Venables and Ripley, <xref ref-type="bibr" rid="B54">2002</xref>) and the R package &#x0201C;randomForest&#x0201D; (Liaw and Wiener, <xref ref-type="bibr" rid="B30">2002</xref>) was used for random forest classification. For each random forest classification, 500 trees were built. Domains important for classification were identified based on the mean decrease in node impurity (Gini index). Information from 26 mycoplasma genomes was used for the final niche classification and from 22 genomes for the final host classification (Tables <xref ref-type="supplementary-material" rid="SM1">S10</xref>, <xref ref-type="supplementary-material" rid="SM1">S11</xref>). K-nn classification for the niche dataset was done with a k-value of 5, with 19 mycoplasma species in the training set and 7 in the test set (4 infecting multiple tissue types and 3 infecting strictly respiratory tissue). For the host classification, a k-value of 4 was used, with 16 mycoplasma species in the training set and 6 in the test set (3 species infecting ruminants, 1 infecting pigs and 2 infecting humans).</p>
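<p>The k-nn step can be illustrated with a short Python sketch: each test genome receives the majority label among its k nearest training genomes, using Euclidean distance on the binary domain vectors (for 0/1 vectors, the squared distance is simply the number of differing domains). This is a simplified illustration, not the R implementation; in particular, ties are broken alphabetically here, whereas R's class::knn breaks ties at random. The labels and vectors are hypothetical.</p>

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority-vote label of the k training genomes nearest to `query`.

    Euclidean distance is used; for binary presence/absence vectors its square
    equals the number of domains that differ between two genomes.
    """
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    # Ties are broken alphabetically here; R's class::knn breaks them at random.
    return min(label for label, n in votes.items() if n == top)


# Hypothetical toy training set: 4 genomes x 4 domains with niche labels.
train_x = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
train_y = ["respiratory", "respiratory", "multiple", "multiple"]
print(knn_predict(train_x, train_y, [1, 1, 0, 1], k=3))
```

<p>With an even k, such as the k-value of 4 used for host classification, tied votes are possible, so the tie-breaking rule can affect individual predictions.</p>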
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Re-annotation of mycoplasma genomes</title>
<p>The quality of the structural and functional annotation of publicly available genomes can vary. To obtain an up-to-date set of annotated genes for all studied genomes and to minimize the risk of false discoveries resulting from methodological inconsistencies, the 80 publicly available mycoplasma genomes (Table <xref ref-type="supplementary-material" rid="SM1">S1</xref>) were re-annotated using a standardized set of algorithms (Koehorst et al., <xref ref-type="bibr" rid="B27">2016a</xref>). Mycoplasma genomes have a low GC content, and the accuracy of the various original gene prediction methods was therefore expected to be high (Tripp et al., <xref ref-type="bibr" rid="B51">2015</xref>). Nevertheless, on average 3.7% more genes were found after re-annotation with the most recent version of Prodigal (Hyatt et al., <xref ref-type="bibr" rid="B24">2010</xref>). Consequently, the total number of proteins found in the re-annotated genomes was also higher (Table <xref ref-type="supplementary-material" rid="SM1">S3</xref>).</p>
</sec>
<sec>
<title>Mycoplasma proteome and predicted pan- and core-domainome</title>
<p>Haemoplasma species, which specifically infect blood, have a higher number of predicted proteins relative to their genome size (Figure <xref ref-type="fig" rid="F1">1A</xref>), corresponding to a lower average CDS length (Guimaraes et al., <xref ref-type="bibr" rid="B20">2011</xref>). This difference is caused by a large repertoire of proteins with a relatively short CDS length that belong to paralogous gene families (do Nascimento et al., <xref ref-type="bibr" rid="B9">2012</xref>). To survive in their specific niche, haemoplasma species can express these proteins and generate variability of proteins at the cell surface to evade detection by the host immune system (Citti and Blanchard, <xref ref-type="bibr" rid="B6">2013</xref>). Besides the haemoplasma species, a high number of predicted proteins relative to genome size was also found in <italic>M. genitalium</italic> G37 and JCVI-Syn3.0. Approximately 80% of mycoplasma proteins contained functional domains (Figure <xref ref-type="fig" rid="F1">1B</xref>). This percentage is similar to the average match percentage found when the whole UniProtKB is analyzed using InterProScan (Hunter et al., <xref ref-type="bibr" rid="B22">2012</xref>). The <italic>M. mycoides</italic>-based JCVI-Syn3.0 synthetic genome contained the highest percentage of proteins with a recognizable domain (86%), &#x0007E;9% more than the parental template genome. Haemoplasma species were the notable exception: despite their normal genome size, they contained a significantly lower percentage of proteins with recognizable domains (22&#x02013;54%), because the aforementioned variable surface proteins do not contain recognizable domains. As a direct result of the re-annotation strategy, the total number of unique functional domains per species increased by 0.8% on average.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Mycoplasma proteome specification. (A)</bold> Correlation between genome size and total number of proteins. <bold>(B)</bold> Ratio of proteins covered by protein domains. Mag, <italic>M. agalactiae</italic>; Mal, <italic>M. alligatoris</italic>; Man, <italic>M. anatis</italic>; Mca, <italic>M. canis</italic>; Mcol, <italic>M. columbinum</italic>; Mge, <italic>M. genitalium</italic>; Mio, <italic>M. iowae</italic>; Mmc, <italic>M. mycoides capri</italic>; Movi, <italic>M. ovipneumoniae</italic>; Mpn, <italic>M. pneumoniae</italic>; Mar, <italic>M. arthritidis</italic>; Mbo, <italic>M. bovis</italic>; Mca, <italic>M. capricolum</italic> subsp. <italic>capricolum</italic>; Mcon, <italic>M. conjunctivae</italic>; Mcr, <italic>M. crocodyli</italic>; Mcy, <italic>M. cynos</italic>; Mfe, <italic>M. fermentans</italic>; Mga, <italic>M. gallisepticum</italic>; Mhc, <italic>M. haemocanis</italic>; Mhf, <italic>M. haemofelis</italic>; Mho, <italic>M. hominis</italic>; Mhy, <italic>M. hyopneumoniae</italic>; Mhr, <italic>M. hyorhinis</italic>; Mle, <italic>M. leachii</italic>; Mmo, <italic>M. mobile</italic>; Mmm, <italic>M. mycoides</italic> subsp. <italic>mycoides</italic>; Mov, <italic>M. ovis</italic>; Mpa, <italic>M. parvum</italic>; Mpe, <italic>M. penetrans</italic>; Mpul, <italic>M. pulmonis</italic>; Mput, <italic>M. putrefaciens</italic>; Msu, <italic>M. suis</italic>; Msy, <italic>M. synoviae</italic>; Mwe, <italic>M. wenyonii</italic>; Syn, JCVI-Syn3.0. Numbers relate to strains (Table <xref ref-type="supplementary-material" rid="SM1">S4</xref>).</p></caption>
<graphic xlink:href="fcimb-07-00031-g0001.tif"/>
</fig>
<p>The total pan-domainome consisted of 1737 domains and the core-domainome of 335 domains, giving a core-to-pan ratio of 19.3%. Analysis with micropan (2- or 3-component system, Snipen et al., <xref ref-type="bibr" rid="B49">2009</xref>) of the pan-domainome of the species for which 5 or more genomes were available (<italic>M. pneumoniae, M. gallisepticum, M. hyopneumoniae</italic>, and <italic>M. genitalium</italic>) showed that the pan-domainome was closed (alpha &#x0003E; 1). A closed pan-domainome was also observed for the genus (9-component system, alpha &#x0003E; 1) when all 80 mycoplasma genomes were taken into account (Figure <xref ref-type="fig" rid="F2">2</xref>).</p>
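<p>The alpha criterion refers to a Heaps' law model, in the form used here, in which the number of previously unseen domains contributed by the <italic>N</italic>-th added genome decays as <italic>n</italic>(<italic>N</italic>) = <italic>k</italic><italic>N</italic><sup>&#x02212;alpha</sup>; alpha &#x0003E; 1 indicates a closed and alpha &#x0003C; 1 an open pan-domainome. The Python sketch below counts new domains for one genome ordering and estimates alpha by ordinary least squares on a log-log scale. It is a simplified illustration under these assumptions, not the micropan implementation, which averages over many random genome orderings and may fit the model differently; the helper names are hypothetical.</p>

```python
import math


def new_domains_per_genome(matrix, order):
    """Number of previously unseen domains added by genomes 2..N in `order`."""
    seen, counts = set(), []
    for g in order:
        present = {j for j, v in enumerate(matrix[g]) if v}
        counts.append(len(present - seen))
        seen |= present
    return counts[1:]  # the first genome contributes its whole domainome


def heaps_fit(new_counts):
    """Least-squares fit of n(N) = k * N**(-alpha) on a log-log scale.

    new_counts[i] is the number of new domains contributed by genome N = i + 2.
    Counts must be positive (average over orderings or drop zeros first).
    Returns (alpha, k).
    """
    xs = [math.log(i + 2) for i in range(len(new_counts))]
    ys = [math.log(c) for c in new_counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope, math.exp(my - slope * mx)


# Synthetic counts generated from k = 100 and alpha = 1.5 for N = 2..10.
counts = [100 * n ** -1.5 for n in range(2, 11)]
alpha, k = heaps_fit(counts)
print(round(alpha, 3), round(k, 3), "closed" if alpha > 1 else "open")
```

<p>On counts generated exactly from the model, the fit recovers the generating parameters; on real presence/absence data, new-domain counts are noisy and frequently zero, which is why averaging over many orderings is needed.</p>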
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>The mycoplasma pan-domainome is closed</bold>. True and estimated core- and pan-domainome sizes were calculated using an iterative process (<italic>n</italic> &#x0003D; 10) in which a fixed number of genomes was randomly selected from the binary domain matrix. Estimated values were calculated using the R package micropan and true values were calculated directly from the input. Estimated pan-domainome (blue); true pan-domainome (red); estimated core-domainome (green) and true core-domainome (purple).</p></caption>
<graphic xlink:href="fcimb-07-00031-g0002.tif"/>
</fig>
</sec>
<sec>
<title>Functional classification of mycoplasma species</title>
<p>To gain insight into a possible functional differentiation of mycoplasma species as a result of co-evolution with specific hosts, we clustered mycoplasma species based on a presence/absence domain matrix and compared the domain repertoire clustering with clustering based on 16S rRNA sequences (Figure <xref ref-type="fig" rid="F3">3</xref>). In the domain-based functional tree, the monophyletic pneumoniae cluster split into three functional clusters. One of these clusters contains the haemoplasma species, which have a relatively low number of protein domains (Figure <xref ref-type="supplementary-material" rid="SM2">S1</xref> and Table <xref ref-type="supplementary-material" rid="SM1">S4</xref>). <italic>M. penetrans</italic> and <italic>M. iowae</italic> form a second functional cluster; these species have a relatively high number of functional domains compared to other species in the pneumoniae 16S-phylogenetic group. The remaining species in this 16S-phylogenetic group are closely related to the spiroplasma cluster in the functional tree. The hominis 16S-phylogenetic cluster was completely maintained in the protein domain tree, but compared to the 16S tree there were some re-arrangements, which can partly be explained by the low support for some branches of the 16S phylogenetic tree. Notable changes are that <italic>M. hominis</italic> and <italic>M. arthritidis</italic> clustered with <italic>M. columbinum</italic>, and <italic>M. pulmonis</italic> clustered with <italic>M. hyorhinis</italic>. We did not observe functional clustering based on host.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Niche-driven functional evolution</bold>. Accelerated functional evolution causes separation of haemoplasma species and several other mycoplasma species when phylogenetic clusters are compared to functional clusters. Dashed lines indicate distinct branches. <bold>Left:</bold> Standard phylogenetic tree using 16S rRNA (maximum likelihood, 500x bootstrapped, see Table <xref ref-type="supplementary-material" rid="SM1">S2</xref> for strains and sequences which were used). Only bootstrapping values &#x0003C;0.7 are shown, the phylogenetic tree with all bootstrapping values is shown in Figure <xref ref-type="supplementary-material" rid="SM2">S2</xref>. <bold>Right:</bold> functional clustering based on Manhattan distance calculated from the presence/absence matrix of domains. Groups indicated are: S, Spiroplasma; H, Hominis; P, Pneumoniae; Ha, Haemoplasma; and O, Other.</p></caption>
<graphic xlink:href="fcimb-07-00031-g0003.tif"/>
</fig>
</sec>
<sec>
<title>Functional differentiation of haemoplasma species</title>
<p>To determine which domains were important for separating haemoplasma species from mycoplasma species infecting tissue, we used principal component analysis (Figure <xref ref-type="fig" rid="F4">4</xref>). Based on the loading scores for the first and second principal components, we could assess which domains contributed to group separation. Haemoplasma species were separated from the other mycoplasma species along the first principal component. We identified 30 domains in haemoplasma species that mainly contributed to the separation of this cluster (Table <xref ref-type="table" rid="T1">1</xref> and Table <xref ref-type="supplementary-material" rid="SM1">S5</xref>) and 400 domains present in the tissue-infecting mycoplasma species that mainly contributed to the separation of this cluster from the haemoplasma cluster. Domains present in the haemoplasma species that contributed to group separation included ABC transporter domains for iron or vitamin B12. Multiple domains were related to enzymes of purine metabolism (GMP synthase, IMP dehydrogenase, adenylosuccinate synthase) or L-aspartate metabolism (fumarate lyase family domains, part of adenylosuccinate lyase), which provides a precursor for purine metabolism (Santos et al., <xref ref-type="bibr" rid="B47">2011</xref>). The presence of GMP synthase domains may enable haemoplasmas to produce all purine bases from hypoxanthine, which is present in blood (Guimaraes et al., <xref ref-type="bibr" rid="B20">2011</xref>). An alternative function of these GMP synthase domains could be the production of glutamate, which is present at a low concentration in blood (McMenamy et al., <xref ref-type="bibr" rid="B35">1957</xref>). Three domains related to superoxide dismutase activity were also found; this function could provide protection against radicals present in blood.</p>
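<p>The loading-based selection of domains can be sketched as follows: center the binary presence/absence matrix, obtain the first principal component as the leading eigenvector of the covariance matrix (here by simple power iteration), and keep the domains whose loading exceeds a cut-off in either direction. This minimal pure-Python illustration assumes that loadings are the entries of the unit-length eigenvector, as in the rotation matrix returned by R's prcomp. The sign of a principal component is arbitrary, so which group ends up with positive loadings depends on the implementation. The function names and the toy matrix are hypothetical.</p>

```python
def first_pc_loadings(matrix, n_iter=200):
    """Unit-length loadings of the first principal component (power iteration).

    matrix: list of binary rows (genomes) x columns (domains). Columns are
    centred but not scaled. Assumes a non-zero covariance matrix with a clear
    gap between the first and second eigenvalues.
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def select_by_loading(loadings, cutoff):
    """Indices of domains with loading above +cutoff and below -cutoff."""
    high = [j for j, w in enumerate(loadings) if w > cutoff]
    low = [j for j, w in enumerate(loadings) if -cutoff > w]
    return high, low


# Hypothetical toy matrix: two groups of two genomes each, three domains.
matrix = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
loadings = first_pc_loadings(matrix)
print([round(x, 3) for x in loadings], select_by_loading(loadings, 0.5))
```

<p>In the toy matrix, domains 0 and 1 co-occur in one group and domain 2 in the other, so they receive loadings of opposite sign on the first component and fall on opposite sides of the cut-off.</p>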
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Functional differentiation of mycoplasma species</bold>. Score plot of the principal component analysis of the domain presence/absence matrix of the 80 mycoplasma strains and the synthetic bacterium JCVI-Syn3.0. Main phylogenetic groups are color-coded. Note the separation of mycoplasma species infecting blood and tissue. Mag, <italic>M. agalactiae</italic>; Mal, <italic>M. alligatoris</italic>; Man, <italic>M. anatis</italic>; Mca, <italic>M. canis</italic>; Mcol, <italic>M. columbinum</italic>; M_g5, <italic>M. g5847</italic>; Mge, <italic>M. genitalium</italic>; Mio, <italic>M. iowae</italic>; Mmc, <italic>M. mycoides capri</italic>; Movi, <italic>M. ovipneumoniae</italic>; Mpn, <italic>M. pneumoniae</italic>; Mar, <italic>M. arthritidis</italic>; Mbo, <italic>M. bovis</italic>; Mca, <italic>M. capricolum</italic> subsp. <italic>capricolum</italic>; Mcon, <italic>M. conjunctivae</italic>; Mcr, <italic>M. crocodyli</italic>; Mcy, <italic>M. cynos</italic>; Mfe, <italic>M. fermentans</italic>; Mga, <italic>M. gallisepticum</italic>; Mhc, <italic>M. haemocanis</italic>; Mhf, <italic>M. haemofelis</italic>; Mho, <italic>M. hominis</italic>; Mhy, <italic>M. hyopneumoniae</italic>; Mhr, <italic>M. hyorhinis</italic>; Mle, <italic>M. leachii</italic>; Mmo, <italic>M. mobile</italic>; Mmm, <italic>M. mycoides</italic> subsp. <italic>mycoides</italic>; Mov, <italic>M. ovis</italic>; Mpa, <italic>M. parvum</italic>; Mpe, <italic>M. penetrans</italic>; Mpul, <italic>M. pulmonis</italic>; Mput, <italic>M. putrefaciens</italic>; Msu, <italic>M. suis</italic>; Msy, <italic>M. synoviae</italic>; Mwe, <italic>M. wenyonii</italic>; Syn, JCVI-Syn3.0. Numbers refer to strains (Table <xref ref-type="supplementary-material" rid="SM1">S4</xref>).</p></caption>
<graphic xlink:href="fcimb-07-00031-g0004.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Top 10 domains responsible for separation of mycoplasma functional clusters</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="center" colspan="2"><bold>Enriched in haemoplasma<xref ref-type="table-fn" rid="TN1a"><sup>a</sup></xref></bold></th>
<th valign="top" align="center" colspan="2"><bold>Enriched in hominis/spiroplasma<xref ref-type="table-fn" rid="TN1b"><sup>b</sup></xref></bold></th>
<th valign="top" align="center" colspan="2"><bold>Enriched in pneumoniae<xref ref-type="table-fn" rid="TN1c"><sup>c</sup></xref></bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>ID<xref ref-type="table-fn" rid="TN1d"><sup>d</sup></xref></bold></th>
<th valign="top" align="left"><bold>InterPro description<xref ref-type="table-fn" rid="TN1e"><sup>e</sup></xref></bold></th>
<th valign="top" align="left"><bold>ID<xref ref-type="table-fn" rid="TN1d"><sup>d</sup></xref></bold></th>
<th valign="top" align="left"><bold>InterPro description<xref ref-type="table-fn" rid="TN1e"><sup>e</sup></xref></bold></th>
<th valign="top" align="left"><bold>ID<xref ref-type="table-fn" rid="TN1d"><sup>d</sup></xref></bold></th>
<th valign="top" align="left"><bold>InterPro description<xref ref-type="table-fn" rid="TN1e"><sup>e</sup></xref></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">IPR026023</td>
<td valign="top" align="left">Ribonucleotide reductase small subunit, prokaryotic</td>
<td valign="top" align="left">IPR029048</td>
<td valign="top" align="left">Heat shock protein 70kD, C-terminal domain</td>
<td valign="top" align="left">IPR002606</td>
<td valign="top" align="left">Riboflavin kinase, bacterial</td>
</tr>
<tr>
<td valign="top" align="left">IPR029022</td>
<td valign="top" align="left">ABC transporter, BtuC-like</td>
<td valign="top" align="left">IPR013826</td>
<td valign="top" align="left">DNA topoisomerase, type IA, central region, subdomain 3</td>
<td valign="top" align="left">IPR003526</td>
<td valign="top" align="left">2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase</td>
</tr>
<tr>
<td valign="top" align="left">IPR000522</td>
<td valign="top" align="left">ABC transporter, permease protein</td>
<td valign="top" align="left">IPR004398</td>
<td valign="top" align="left">RNA methyltransferase, RsmD</td>
<td valign="top" align="left">IPR006660</td>
<td valign="top" align="left">Arsenate reductase-like</td>
</tr>
<tr>
<td valign="top" align="left">IPR001674</td>
<td valign="top" align="left">GMP synthase, C-terminal</td>
<td valign="top" align="left">IPR003442</td>
<td valign="top" align="left">tRNA threonylcarbamoyl adenosine modification protein TsaE</td>
<td valign="top" align="left">IPR011631</td>
<td valign="top" align="left">Protein of unknown function DUF1600</td>
</tr>
<tr>
<td valign="top" align="left">IPR004837</td>
<td valign="top" align="left">Sodium/calcium exchanger membrane region</td>
<td valign="top" align="left">IPR006667</td>
<td valign="top" align="left">SLC41 divalent cation transporters, integral membrane domain</td>
<td valign="top" align="left">IPR023344</td>
<td valign="top" align="left">Uncharacterized domain MG237, C-terminal</td>
</tr>
<tr>
<td valign="top" align="left">IPR001670</td>
<td valign="top" align="left">Alcohol dehydrogenase, iron-type</td>
<td valign="top" align="left">IPR006668</td>
<td valign="top" align="left">Magnesium transporter, MgtE intracellular domain</td>
<td valign="top" align="left">IPR015271</td>
<td valign="top" align="left">Protein of unknown function DUF1951</td>
</tr>
<tr>
<td valign="top" align="left">IPR001093</td>
<td valign="top" align="left">IMP dehydrogenase/GMP reductase</td>
<td valign="top" align="left">IPR016947</td>
<td valign="top" align="left">Bacteriophage gamma, gammalsu0035</td>
<td valign="top" align="left">IPR013825</td>
<td valign="top" align="left">DNA topoisomerase, type IA, central region, subdomain 2</td>
</tr>
<tr>
<td valign="top" align="left">IPR019065</td>
<td valign="top" align="left">Restriction endonuclease, type II, NgoFVII</td>
<td valign="top" align="left">IPR000748</td>
<td valign="top" align="left">Pseudouridine synthase, RsuA/RluB/E/F</td>
<td valign="top" align="left">IPR012760</td>
<td valign="top" align="left">RNA polymerase sigma factor RpoD, C-terminal</td>
</tr>
<tr>
<td valign="top" align="left">IPR020471</td>
<td valign="top" align="left">Aldo/keto reductase subgroup</td>
<td valign="top" align="left">IPR001525</td>
<td valign="top" align="left">C-5 cytosine methyltransferase</td>
<td valign="top" align="left">IPR001844</td>
<td valign="top" align="left">Chaperonin Cpn60</td>
</tr>
<tr>
<td valign="top" align="left">IPR023210</td>
<td valign="top" align="left">NADP-dependent oxidoreductase domain</td>
<td valign="top" align="left">IPR003370</td>
<td valign="top" align="left">Chromate transporter</td>
<td valign="top" align="left">IPR002423</td>
<td valign="top" align="left">Chaperonin Cpn60/TCP-1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1a">
<label>a</label>
<p><italic>Domains enriched in haemoplasma functional cluster</italic>.</p></fn>
<fn id="TN1b">
<label>b</label>
<p><italic>Domains enriched in hominis/spiroplasma functional cluster</italic>.</p></fn>
<fn id="TN1c">
<label>c</label>
<p><italic>Domains enriched in pneumoniae functional cluster</italic>.</p></fn>
<fn id="TN1d">
<label>d</label>
<p><italic>InterPro Identifier</italic>.</p></fn>
<fn id="TN1e">
<label>e</label>
<p><italic>Domain description obtained from InterProScan</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Functional differentiation between the hominis/spiroplasma and pneumoniae groups</title>
<p>Along the second principal component, the hominis and spiroplasma clusters were separated from the pneumoniae cluster. We found 43 domains in the hominis/spiroplasma clusters that mainly contributed to their separation from the pneumoniae cluster vs. 71 domains in the pneumoniae cluster that mainly contributed to its separation from the hominis/spiroplasma cluster (Table <xref ref-type="table" rid="T1">1</xref> and Table <xref ref-type="supplementary-material" rid="SM1">S5</xref>). The hominis/spiroplasma cluster showed an increased presence of domains related to the transport of magnesium and other divalent cations, as well as an increased capacity for chromate transport. Metals are important co-factors, and increased chromate transport capability may result in increased chromate resistance, as observed in <italic>B. subtilis</italic> (D&#x000ED;az-Maga&#x000F1;a et al., <xref ref-type="bibr" rid="B8">2009</xref>). Other domains important for separating the hominis/spiroplasma cluster from the pneumoniae cluster were related to DNA/RNA modification, protein/peptide degradation, and phosphopentomutase activity. The latter enzyme links nucleotide synthesis to the pentose phosphate pathway (PPP) (Pollack et al., <xref ref-type="bibr" rid="B40">1997</xref>) and enables mycoplasmas to produce nucleotides from purine/pyrimidine bases or, alternatively, to degrade nucleotides via the PPP and glycolysis. Among the domains that mainly contributed to separating the pneumoniae cluster from the hominis/spiroplasma cluster was a domain related to NAD kinase activity, which is needed for the production of NADP<sup>&#x0002B;</sup>. Another domain, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase, is linked to the non-mevalonate pathway of isoprenoid synthesis. Activity of this pathway has been shown for <italic>M. penetrans</italic> and <italic>M. gallisepticum</italic> (Eberl et al., <xref ref-type="bibr" rid="B10">2004</xref>) and might reduce the need to obtain isoprenoid precursors from the host. There was also an increased presence of a domain related to thioredoxin-disulfide reductase activity, which produces the reduced thioredoxin needed for the production of deoxyribonucleotides and is important for protection against oxidative stress (Ben-Menachem et al., <xref ref-type="bibr" rid="B2">1997</xref>). Separating mycoplasma species on the basis of protein domain composition thus provided a concise overview of the functional differences between them.</p>
</sec>
<sec>
<title>Persistence of protein domains and of orthologous proteins</title>
<p>To compare the persistence of protein domains with that of orthologous proteins, the complete set of orthologous proteins in the 80 mycoplasma genomes was determined using a standard bidirectional best hit approach (Wolf and Koonin, <xref ref-type="bibr" rid="B58">2012</xref>), followed by orthology assessment with orthAgogue (Ekseth et al., <xref ref-type="bibr" rid="B11">2014</xref>) and MCL clustering (Enright et al., <xref ref-type="bibr" rid="B12">2002</xref>). We found &#x0003E;5000 clusters of orthologous proteins and determined in how many genomes each cluster was present (Figure <xref ref-type="fig" rid="F5">5A</xref>). Only 135 orthologous proteins were conserved among all mycoplasma species, and the average persistence of orthologous proteins was 12.6%. The persistence of protein domains in the pan-domainome was much higher (average of 48.4%, Figure <xref ref-type="fig" rid="F5">5A</xref>).</p>
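<p>The bidirectional best hit criterion named above can be expressed in a few lines. This sketch assumes that pairwise similarity scores (for example bit scores from an all-vs-all search) are already available as dictionaries; it covers only the reciprocal best hit test, not the subsequent orthAgogue and MCL steps, and the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: {(query, target): score}, e.g. bit scores from an all-vs-all search.
    Ties keep the first target encountered.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores: a2 and a3 both hit b2 best, but b2 prefers a3.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 120.0, ("a3", "b2"): 130.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a3"): 128.0, ("b2", "a2"): 119.0}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a3', 'b2')]
```
</preformat>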
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>Persistence of (essential) domains and (essential) hypothetical proteins. (A)</bold> Persistence of orthologous proteins based on sequence similarity (red) and persistence of single domains (blue). <bold>(B)</bold> Persistence of domains present in JCVI-Syn3.0 but absent from the core of the mycoplasma species infecting tissue (green). Persistence of hypothetical proteins for 72 mycoplasma species including JCVI-Syn3.0 (red), <italic>M. mycoides capri</italic> LC (purple), and JCVI-Syn3.0 (blue).</p></caption>
<graphic xlink:href="fcimb-07-00031-g0005.tif"/>
</fig>
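<p>Persistence can be read as the fraction of genomes that contain a given domain or orthologous cluster, averaged over all domains or clusters in the pan-set. The sketch below assumes that reading (the exact normalization behind Figure 5 is not spelled out in this section) and uses hypothetical function names and toy data.</p>
<preformat preformat-type="code">
```python
def persistence(presence):
    """Fraction of genomes that contain each feature.

    presence: {genome: set of features}, where a feature is a domain ID or
    an orthologous-cluster ID.
    """
    n_genomes = len(presence)
    features = set().union(*presence.values())
    return {f: sum(f in feats for feats in presence.values()) / n_genomes
            for f in features}


def average_persistence(presence):
    """Mean persistence over all features in the pan-set."""
    values = list(persistence(presence).values())
    return sum(values) / len(values)


# Toy data: A is in every genome, B in half of them, C in one of four.
genomes = {"g1": {"A", "B"}, "g2": {"A"}, "g3": {"A", "C"}, "g4": {"A", "B"}}
print(persistence(genomes)["B"])               # 0.5
print(round(average_persistence(genomes), 3))  # 0.583
```
</preformat>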
</sec>
<sec>
<title>Mycoplasma pan- and core-domainome analysis in relation to JCVI-Syn3.0</title>
<p>Clustering of mycoplasma species based on the pan-domainome did not show a correlation with their specific host. To further classify protein domains, we compared the pan- and core-domainome of the mycoplasma genus with the domainome of the minimal synthetic organism JCVI-Syn3.0 (Hutchison et al., <xref ref-type="bibr" rid="B23">2016</xref>), which consisted of 869 domains (Table <xref ref-type="supplementary-material" rid="SM1">S7</xref>). For the synthetic organism, we assumed that all protein domains were essential. The core domainome consisted of 335 domains, a relatively small number. This can be explained by the inclusion of the haemoplasma species, which grow in blood and cannot be cultured <italic>ex vivo</italic>. When the core domainome was calculated for the mycoplasma species infecting tissue only, a larger core of 479 protein domains was obtained. Of these core domains, 26 were not present in JCVI-Syn3.0 (Table <xref ref-type="supplementary-material" rid="SM1">S8</xref>) and are therefore apparently not essential for growth in a laboratory environment. The remaining 453 core domains were also present in JCVI-Syn3.0 (Table <xref ref-type="supplementary-material" rid="SM1">S8</xref>), which indicates that these persistent domains are likely essential for axenic growth in (complex) growth media. Interestingly, the remaining 416 domains of JCVI-Syn3.0, which are essential for minimal life, are not persistent (Figure <xref ref-type="fig" rid="F5">5B</xref> and Figure <xref ref-type="supplementary-material" rid="SM2">S3</xref>), suggesting that the mycoplasma domain landscape contains many alternative configurations that bypass their essentiality.</p>
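<p>The comparison with JCVI-Syn3.0 is a set partition, sketched below with toy data and hypothetical function names. Mapped onto the reported counts, <monospace>core_only</monospace> corresponds to the 26 core domains absent from JCVI-Syn3.0, <monospace>shared</monospace> to the 453 overlapping domains, and <monospace>reference_only</monospace> to the 416 non-persistent JCVI-Syn3.0 domains, so that 26 + 453 = 479 and 453 + 416 = 869.</p>
<preformat preformat-type="code">
```python
def core_and_pan(presence):
    """Core = features found in every genome; pan = features found in any."""
    sets = list(presence.values())
    return set.intersection(*sets), set.union(*sets)


def compare_with_reference(core, reference):
    """Partition a core domainome against a reference domainome."""
    return {
        "core_only": core.difference(reference),       # persistent, absent from reference
        "shared": core.intersection(reference),        # persistent and in reference
        "reference_only": reference.difference(core),  # in reference, not persistent
    }


# Toy tissue genomes and a toy reference (minimal-cell) domainome.
tissue = {"g1": {"A", "B", "C", "X"}, "g2": {"A", "B", "C"}, "g3": {"A", "B", "C", "Y"}}
reference = {"B", "C", "D", "E"}
core, pan = core_and_pan(tissue)
parts = compare_with_reference(core, reference)
print(sorted(core), sorted(pan))  # ['A', 'B', 'C'] ['A', 'B', 'C', 'X', 'Y']
print({k: sorted(v) for k, v in parts.items()})
# {'core_only': ['A'], 'shared': ['B', 'C'], 'reference_only': ['D', 'E']}
```
</preformat>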
</sec>
<sec>
<title>Metabolic capability in relation to host specificity</title>
<p>To assess whether domains with a non-essential metabolic function determine host specificity, we obtained all domains with a metabolic function that are not present in the synthetic minimal organism. Domains with a metabolic function were derived from the genome-scale metabolic model of <italic>M. pneumoniae</italic> (Wodke et al., <xref ref-type="bibr" rid="B57">2013</xref>), supplemented with InterPro annotations. This model contains 145 genes coding for 145 proteins, from which 359 unique protein domains were obtained. Almost all proteins with a metabolic function (97%) were covered by domains. Overall, we found 162 domains with a metabolic function (33.8% of the total core) in the core of the tissue-infecting mycoplasma species and 197 domains with a metabolic function in the accessory domainome (pan minus core). JCVI-Syn3.0 contained 156 domains from the metabolic core domainome and 140 accessory domains with a metabolic function. Thus, 63 domains with a metabolic function were absent from JCVI-Syn3.0. To assess whether these domains could be involved in host specificity, we clustered the mycoplasma species infecting tissue based on the presence/absence of these domains, but we could not establish a correlation with host specificity (Figure <xref ref-type="supplementary-material" rid="SM2">S4</xref>).</p>
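<p>The counts in this paragraph can be cross-checked with simple arithmetic. The snippet below uses only numbers stated in the text and shows that the 63 metabolic domains absent from JCVI-Syn3.0 comprise 6 core and 57 accessory domains.</p>
<preformat preformat-type="code">
```python
# Counts stated in the paragraph above.
core_tissue = 479          # core domainome of tissue-infecting species
metabolic_core = 162       # metabolic domains in that core
metabolic_accessory = 197  # metabolic domains in the accessory domainome
syn3_core = 156            # metabolic core domains present in JCVI-Syn3.0
syn3_accessory = 140       # metabolic accessory domains present in JCVI-Syn3.0

metabolic_total = metabolic_core + metabolic_accessory
missing_core = metabolic_core - syn3_core
missing_accessory = metabolic_accessory - syn3_accessory
missing_total = missing_core + missing_accessory

print(metabolic_total)                                 # 359
print(f"{metabolic_core / core_tissue:.1%}")           # 33.8%
print(missing_core, missing_accessory, missing_total)  # 6 57 63
```
</preformat>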
</sec>
<sec>
<title>Role of hypothetical proteins in host adaptation</title>
<p>Clustering based on the pan-domainome composition or on the complement of metabolic domains absent from JCVI-Syn3.0 failed to show a direct link between specific domains and host specificity. We therefore analyzed whether the presence or absence of hypothetical proteins could explain host specificity. In our dataset, a protein was annotated as hypothetical when it did not contain a protein domain or when it contained a domain of unknown function (DUF). In total, 58 DUFs were found in the mycoplasma genus, of which only 8 were present in JCVI-Syn3.0 (Table <xref ref-type="supplementary-material" rid="SM1">S9</xref>). There were no DUFs in the core domainome of the complete genus and only 2 DUFs (DUF161 and DUF933) in the core domainome of the tissue-infecting mycoplasma species. DUF161 is part of a membrane protein of unknown function; DUF933 is suggested to be part of a nucleoprotein complex and could function as a GTP-dependent translation factor. The total number of DUFs was too low to analyze a relation with the host. For further classification of hypothetical proteins, we therefore compared the complete set of hypothetical proteins in JCVI-Syn3.0 with the complete set of hypothetical proteins in the pan-genome of the mycoplasma species infecting tissue. In total, 11,598 hypothetical proteins were found in the tissue-infecting mycoplasma species, which, based on sequence similarity, could be clustered into 1766 orthologous protein clusters. The relative persistence of the hypothetical protein clusters showed a sharp decline, with an average persistence of approximately 9% (Figure <xref ref-type="fig" rid="F5">5B</xref>). The total number of genes with completely unknown functions in the genome of JCVI-Syn3.0 was only 65 (Hutchison et al., <xref ref-type="bibr" rid="B23">2016</xref>), and we identified just 40 proteins to which no functional domains could be assigned.
The persistence of the orthologous protein clusters containing these hypothetical proteins was 14% (Figure <xref ref-type="fig" rid="F5">5B</xref>), which was higher than average. There was, however, conservation of clusters with hypothetical proteins from the spiroplasma phylogenetic group. In line with the finding that not all essential JCVI-Syn3.0 protein domains were persistent, the essential hypothetical proteins were also not persistent, suggesting that alternative solutions exist within the mycoplasma genus that substitute for these essential but currently unknown functions. We did not observe a relation with the host on the basis of the clustering of orthologous hypothetical proteins not present in JCVI-Syn3.0 (Figure <xref ref-type="supplementary-material" rid="SM2">S5</xref>).</p>
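<p>Read literally, the annotation rule above marks a protein as hypothetical when it has no domain at all or when any of its domains is a DUF. The sketch below implements that reading; detecting DUFs by matching identifiers such as "DUF285" in the InterPro description is an assumption made for illustration (an explicit list of DUF InterPro entries would be more robust), and the protein records are invented toy examples.</p>
<preformat preformat-type="code">
```python
import re

# Assumption: InterPro descriptions of DUFs contain an identifier such as "DUF285".
DUF_PATTERN = re.compile(r"\bDUF\d+\b")


def is_hypothetical(domain_descriptions):
    """Rule stated above: no domain at all, or at least one DUF domain."""
    if not domain_descriptions:
        return True
    return any(DUF_PATTERN.search(d) for d in domain_descriptions)


proteins = {
    "p1": [],                                      # no domain
    "p2": ["Protein of unknown function DUF285"],  # DUF only
    "p3": ["GMP synthase, C-terminal"],            # annotated domain
    "p4": ["Chaperonin Cpn60", "Domain of unknown function DUF1524"],
}
print([p for p in sorted(proteins) if is_hypothetical(proteins[p])])
# ['p1', 'p2', 'p4']
```
</preformat>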
</sec>
<sec>
<title>Protein domain composition in relation to host or niche</title>
<p>Clustering based on the complete mycoplasma pan-domainome, on the metabolic domains outside JCVI-Syn3.0, and on the hypothetical orthologous proteins outside JCVI-Syn3.0 did not show a relation with the specific host of a mycoplasma species. As a final effort, we applied two machine learning approaches, k-nearest neighbor (k-nn) and Random Forest (Chen and Ishwaran, <xref ref-type="bibr" rid="B5">2012</xref>), to classify the niche or host of a mycoplasma species based on its pan-domainome composition. Both methods could predict with high accuracy whether the niche of a mycoplasma species was blood or tissue, confirming the results obtained by PCA (supplementary materials and Table <xref ref-type="supplementary-material" rid="SM1">S5</xref>). When the niche was specified in more detail (Table <xref ref-type="supplementary-material" rid="SM1">S6</xref>, Niche), prediction accuracy decreased, and species with a unique niche (e.g., <italic>M. mobile</italic> and <italic>M. conjunctivae</italic>) could not be assigned. Mycoplasma species growing in blood, strictly in the respiratory tract, or in multiple tissue types including lung (Table <xref ref-type="supplementary-material" rid="SM1">S6</xref>, Niche 2) could be classified using Random Forest with 95% prediction accuracy (5% out-of-bag error rate). The domain most important for this classification was the cell division protein FtsZ domain (IPR000158). This domain was present in many mycoplasma species but absent from <italic>M. canis</italic>, <italic>M. gallisepticum</italic>, and <italic>M. hyopneumoniae</italic>, which made up a large part of the species infecting the respiratory tract in our dataset. Absence of this specific domain does not mean that a species lacks a functional FtsZ, since alternative domain configurations are possible (e.g., containing domains IPR003008 and IPR020805).
To prevent prediction bias due to differences in the number of genomes available per species, we focused on the mycoplasma species infecting tissue for which at least two genomes were available and limited the selection to two genomes per species. With this smaller selection, prediction accuracy was higher (96% using the Random Forest classifier and 71% using k-nn classification), and we again identified the FtsZ domain (Table <xref ref-type="table" rid="T2">2</xref> and Table <xref ref-type="supplementary-material" rid="SM1">S10</xref>) as important for niche classification. We also identified a putative DNA-binding domain (IPR009061) present in phenylalanine-tRNA synthetases. In our database, this domain was absent from the selected strains of <italic>M. canis</italic>, <italic>M. hyopneumoniae</italic>, and <italic>M. pneumoniae</italic>, which are all found strictly in the respiratory tract. The domain was, however, present in other mycoplasma species identified as strictly respiratory: <italic>M. cynos</italic>, <italic>M. gallisepticum</italic>, and <italic>M. mycoides</italic> subsp. <italic>mycoides</italic> SC. The type I restriction endonuclease domain IPR000055 was also important for classification; it was absent from both the <italic>M. gallisepticum</italic> and the <italic>M. mycoides</italic> subsp. <italic>mycoides</italic> SC strains in our selection. No single domain was present in all mycoplasma species infecting the respiratory tract and absent from all those infecting multiple tissue types.</p>
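<p>The k-nn classification referred to above can be illustrated with a small leave-one-out sketch on domain sets. The distance metric and the value of k used in this study are not stated here; the sketch assumes Jaccard distance and k = 3, uses hypothetical function names and toy data, and does not reproduce the Random Forest analysis.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def jaccard_distance(a, b):
    """1 minus (shared domains / all domains found in either genome)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a.intersection(b)) / len(union)


def knn_predict(query, training, k=3):
    """Majority label among the k training genomes closest to query.

    training: list of (domain_set, label) pairs.
    """
    ranked = sorted(training, key=lambda item: jaccard_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Predict each genome from all the others; return the fraction correct."""
    correct = 0
    for i, (domains, label) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        correct += knn_predict(domains, others, k) == label
    return correct / len(samples)


samples = [
    ({"A", "B", "C"}, "respiratory"),
    ({"A", "B", "D"}, "respiratory"),
    ({"A", "B", "C", "D"}, "respiratory"),
    ({"E", "F", "G"}, "multiple"),
    ({"E", "F", "H"}, "multiple"),
    ({"E", "F", "G", "H"}, "multiple"),
]
print(leave_one_out_accuracy(samples))  # 1.0
```
</preformat>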
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Top 10 domains relevant for niche classification: Strictly respiratory or multiple tissue types</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Domain information</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Abundance (%)<xref ref-type="table-fn" rid="TN2a"><sup>a</sup></xref></bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>ID<xref ref-type="table-fn" rid="TN2d"><sup>d</sup></xref></bold></th>
<th valign="top" align="left"><bold>InterPro description<xref ref-type="table-fn" rid="TN2e"><sup>e</sup></xref></bold></th>
<th valign="top" align="center"><bold>Respiratory system<xref ref-type="table-fn" rid="TN2b"><sup>b</sup></xref></bold></th>
<th valign="top" align="center"><bold>Multiple<xref ref-type="table-fn" rid="TN2c"><sup>c</sup></xref></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">IPR009061</td>
<td valign="top" align="left">DNA binding domain, putative</td>
<td valign="top" align="center">40</td>
<td valign="top" align="center">100</td>
</tr>
<tr>
<td valign="top" align="left">IPR000055</td>
<td valign="top" align="left">Restriction endonuclease, type I, HsdS</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">94</td>
</tr>
<tr>
<td valign="top" align="left">IPR022749</td>
<td valign="top" align="left">N6 adenine-specific DNA methyltransferase, N12 class, N-terminal</td>
<td valign="top" align="center">20</td>
<td valign="top" align="center">81</td>
</tr>
<tr>
<td valign="top" align="left">IPR000158</td>
<td valign="top" align="left">Cell division protein FtsZ</td>
<td valign="top" align="center">40</td>
<td valign="top" align="center">100</td>
</tr>
<tr>
<td valign="top" align="left">IPR011701</td>
<td valign="top" align="left">Major facilitator superfamily</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">88</td>
</tr>
<tr>
<td valign="top" align="left">IPR003798</td>
<td valign="top" align="left">DNA recombination RmuC</td>
<td valign="top" align="center">20</td>
<td valign="top" align="center">75</td>
</tr>
<tr>
<td valign="top" align="left">IPR008280</td>
<td valign="top" align="left">Tubulin/FtsZ, C-terminal</td>
<td valign="top" align="center">40</td>
<td valign="top" align="center">88</td>
</tr>
<tr>
<td valign="top" align="left">IPR002198</td>
<td valign="top" align="left">Short-chain dehydrogenase/reductase SDR</td>
<td valign="top" align="center">20</td>
<td valign="top" align="center">75</td>
</tr>
<tr>
<td valign="top" align="left">IPR011089</td>
<td valign="top" align="left">Domain of unknown function DUF1524</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left">IPR005864</td>
<td valign="top" align="left">ATPase, F0 complex, subunit B, bacterial</td>
<td valign="top" align="center">60</td>
<td valign="top" align="center">100</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN2a">
<label>a</label>
<p><italic>Abundance of a protein domain in the specific niche</italic>.</p></fn>
<fn id="TN2b">
<label>b</label>
<p><italic>Abundance in mycoplasma species with a strictly respiratory niche</italic>.</p></fn>
<fn id="TN2c">
<label>c</label>
<p><italic>Abundance in mycoplasma species with multiple niches, including the respiratory tract</italic>.</p></fn>
<fn id="TN2d">
<label>d</label>
<p><italic>InterPro Identifier</italic>.</p></fn>
<fn id="TN2e">
<label>e</label>
<p><italic>Domain description obtained from InterProScan</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>To identify domains important for classifying mycoplasma hosts, we first used the complete diversity of hosts listed in Table <xref ref-type="supplementary-material" rid="SM1">S6</xref> and obtained a prediction accuracy of &#x0003C;80% using Random Forest. We therefore used a more focused approach and selected only mycoplasma species growing in tissue for which we had two species per host and two genomes per species. Genomes of mycoplasmas infecting cows and goats were pooled into a ruminant group. With this grouping, we could accurately predict (83% accuracy with k-nn classification and 100% with Random Forest) whether a mycoplasma species from the selected genomes infects pigs, ruminants, or humans. The most discriminatory domains identified by the Random Forest analysis (Table <xref ref-type="table" rid="T3">3</xref> and Table <xref ref-type="supplementary-material" rid="SM1">S11</xref>) were related to peptidase functions (IPR000668, IPR005151, and IPR029045). A phosphodiesterase domain (IPR024654 and the related family IPR000979) was found to be important for host differentiation; this domain occurs only in the human pathogens included in the analysis. An RmlC-like jelly roll fold domain (IPR014710), which is related to mannose/myo-inositol metabolism, was identified in the pig and ruminant species but was absent from species that infect humans. Two domains of unknown function were found: DUF2714 and DUF285 (IPR021222 and IPR005046, respectively). The DUF285 domain has probably been exchanged between ruminant species via horizontal gene transfer (Nouvel et al., <xref ref-type="bibr" rid="B36">2010</xref>). Several domains related to proteins expressed at the bacterial surface were found (IPR011889 and IPR027593). A glycine cleavage H-protein domain (IPR002930) was also found, which was absent from the selected mycoplasma species infecting humans. By applying Random Forest classification to specific species groups, we thus identified a number of protein domains that could relate to host specificity.</p>
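<p>The balanced genome selection and host pooling described above can be sketched as follows. The record format, the <monospace>HOST_GROUP</monospace> mapping, the function name and the alphabetical choice of species and genomes are hypothetical; the text does not state how species or genomes were chosen when more than two were available.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical host labels; cows and goats are pooled into "ruminant".
HOST_GROUP = {"cow": "ruminant", "goat": "ruminant", "pig": "pig", "human": "human"}


def select_balanced(genomes, species_per_host=2, genomes_per_species=2):
    """Pick a balanced subset of (genome_id, species, host) records.

    A host group is kept only if at least species_per_host species have at
    least genomes_per_species genomes; the alphabetically first species and
    genome IDs are then taken (an arbitrary but reproducible choice).
    """
    by_group = defaultdict(lambda: defaultdict(list))
    for genome_id, species, host in genomes:
        group = HOST_GROUP.get(host)
        if group is not None:
            by_group[group][species].append(genome_id)

    selected = []
    for group, species_map in sorted(by_group.items()):
        eligible = sorted(s for s, ids in species_map.items()
                          if len(ids) >= genomes_per_species)
        if len(eligible) >= species_per_host:
            for species in eligible[:species_per_host]:
                for genome_id in sorted(species_map[species])[:genomes_per_species]:
                    selected.append((genome_id, species, group))
    return selected


records = [
    ("g1", "sp_r1", "cow"), ("g2", "sp_r1", "cow"),
    ("g3", "sp_r2", "goat"), ("g4", "sp_r2", "goat"), ("g5", "sp_r2", "goat"),
    ("g6", "sp_p1", "pig"), ("g7", "sp_p1", "pig"),
    ("g8", "sp_p2", "pig"), ("g9", "sp_p2", "pig"),
    ("h1", "sp_h1", "human"), ("h2", "sp_h1", "human"), ("h3", "sp_h2", "human"),
]
for row in select_balanced(records):
    print(row)
```
</preformat>
<p>In this toy input the human group is dropped, because only one of its species has two genomes, while the goat and cow species are combined into a single ruminant group with two genomes each.</p>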
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>Top 10 domains relevant for host classification: Ruminants, pigs or humans</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Domain information</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Abundance (%)<xref ref-type="table-fn" rid="TN3a"><sup>a</sup></xref></bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>ID<xref ref-type="table-fn" rid="TN3e"><sup>e</sup></xref></bold></th>
<th valign="top" align="left"><bold>InterPro description<xref ref-type="table-fn" rid="TN3f"><sup>f</sup></xref></bold></th>
<th valign="top" align="center"><bold>Ruminants<xref ref-type="table-fn" rid="TN3b"><sup>b</sup></xref></bold></th>
<th valign="top" align="center"><bold>Pigs<xref ref-type="table-fn" rid="TN3c"><sup>c</sup></xref></bold></th>
<th valign="top" align="center"><bold>Humans<xref ref-type="table-fn" rid="TN3d"><sup>d</sup></xref></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">IPR000668</td>
<td valign="top" align="left">Peptidase C1A, papain C-terminal</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">IPR005151</td>
<td valign="top" align="left">Tail specific protease</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">IPR000979</td>
<td valign="top" align="left">Phosphodiesterase MJ0936/Vps29</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">100</td>
</tr>
<tr>
<td valign="top" align="left">IPR014710</td>
<td valign="top" align="left">RmlC-like jelly roll fold</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">IPR021222</td>
<td valign="top" align="left">Protein of unknown function DUF2714</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">IPR005046</td>
<td valign="top" align="left">Protein of unknown function DUF285</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">IPR002931</td>
<td valign="top" align="left">Transglutaminase-like</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">IPR002930</td>
<td valign="top" align="left">Glycine cleavage H-protein</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">IPR011889</td>
<td valign="top" align="left">Bacterial surface protein 26-residue repeat</td>
<td valign="top" align="center">92</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">IPR029045</td>
<td valign="top" align="left">ClpP/crotonase-like domain</td>
<td valign="top" align="center">100</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN3a">
<label>a</label>
<p><italic>Abundance of a protein domain in the specific host</italic>.</p></fn>
<fn id="TN3b">
<label>b</label>
<p><italic>Abundance in mycoplasma species infecting ruminants</italic>.</p></fn>
<fn id="TN3c">
<label>c</label>
<p><italic>Abundance in mycoplasma species infecting pigs</italic>.</p></fn>
<fn id="TN3d">
<label>d</label>
<p><italic>Abundance in mycoplasma species infecting humans</italic>.</p></fn>
<fn id="TN3e">
<label>e</label>
<p><italic>InterPro Identifier</italic>.</p></fn>
<fn id="TN3f">
<label>f</label>
<p><italic>Domain description obtained from InterProScan</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
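<p>The abundance columns in Tables 2 and 3 are described only as the abundance of a domain in a niche or host group. Assuming that abundance means the percentage of genomes in a group that contain the domain, the values can be computed as in the sketch below; the function name and the toy genomes are hypothetical.</p>
<preformat preformat-type="code">
```python
def abundance_by_group(presence, groups):
    """Percentage of genomes in each group that contain each domain.

    presence: {genome: set of domain IDs}
    groups:   {genome: group label, e.g. "ruminant", "pig" or "human"}
    Returns {domain: {group: whole-number percentage}}.
    """
    members = {}
    for genome, group in groups.items():
        members.setdefault(group, []).append(genome)
    domains = set().union(*(presence[g] for g in groups))
    return {
        domain: {
            group: round(100 * sum(domain in presence[g] for g in gs) / len(gs))
            for group, gs in members.items()
        }
        for domain in sorted(domains)
    }


presence = {
    "r1": {"D1", "D2"}, "r2": {"D1", "D2"},
    "p1": {"D2"}, "p2": {"D2", "D3"},
    "h1": {"D3"}, "h2": {"D3"},
}
groups = {"r1": "ruminant", "r2": "ruminant", "p1": "pig",
          "p2": "pig", "h1": "human", "h2": "human"}
for domain, row in abundance_by_group(presence, groups).items():
    print(domain, row)
# D1 {'ruminant': 100, 'pig': 0, 'human': 0}
# D2 {'ruminant': 100, 'pig': 100, 'human': 0}
# D3 {'ruminant': 0, 'pig': 50, 'human': 100}
```
</preformat>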
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>All mycoplasma species have reduced genomes and could be considered minimal organisms. Arguably, the most studied minimal organism to date is <italic>M. pneumoniae</italic>, a human pathogen causing inflammation of lung tissue in humans. From this bacterium we have knowledge of the genome (Dandekar et al., <xref ref-type="bibr" rid="B7">2000</xref>), transcriptome (G&#x000FC;ell et al., <xref ref-type="bibr" rid="B18">2009</xref>), proteome (Batisse et al., <xref ref-type="bibr" rid="B1">2009</xref>), metabolome (Yus et al., <xref ref-type="bibr" rid="B59">2009</xref>; Maier et al., <xref ref-type="bibr" rid="B34">2013</xref>) and several regulatory mechanisms including the role of non-coding RNA&#x00027;s (Llor&#x000E9;ns-Rico et al., <xref ref-type="bibr" rid="B32">2016</xref>). Interactions with the host have also been extensively studied (Rottem, <xref ref-type="bibr" rid="B44">2003</xref>) but despite of the wealth of information on this minimal organism we still cannot explain why there is a preference for colonization of human lung tissue. Knowing that it is not a simple case of adhesion properties (Krause, <xref ref-type="bibr" rid="B28">1996</xref>), we hypothesized that there is a complex combination of functions that determines a bacterial host or niche. To find these functions, we clustered species based on domain presence to find direct leads and ultimately used a random forest classification algorithm on the complete mycoplasma pan-domainome to find sets of domains that predict the specific host or niche of a mycoplasma species. By considering presence or absence of proteins domains we deviate from the classical approach in which bacterial genomes are compared based on orthologous proteins. We found that the persistence of single domains is higher which indicates that conservation of the structural information in the protein domains is more important than maintaining the gene sequences in which the domains are present. 
A similar result was recently found in a comparative genomics study of 432 <italic>Pseudomonas</italic> strains (Koehorst et al., <xref ref-type="bibr" rid="B27">2016a</xref>), indicating that this could be a general trend among bacteria. Using random forest classification, we could predict with high accuracy whether a mycoplasma species infects tissue or blood, and we found metabolic properties in the haemoplasma cluster that could explain why these organisms successfully infect blood. Within functional species clusters, prediction accuracy decreases, and it is not possible to predict the host or niche of closely related species such as <italic>M. haemofelis</italic> and <italic>M. haemocanis</italic> within the haemoplasma group, or <italic>M. agalactiae</italic> and <italic>M. bovis</italic> within the hominis group. Despite the lower prediction accuracy, we could still identify differences between mycoplasma species in relation to their specific host or niche when larger clusters were used, as shown for the differentiation of mycoplasmas colonizing ruminants, pigs, or humans. Determining the specific role of a signifying protein function (e.g., one of the peptidase functions) in host-pathogen interaction would require additional laboratory studies.</p>
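<p>The domain-based comparison described above rests on two simple quantities: the persistence of a domain, i.e., the fraction of genomes in which it occurs, and a presence/absence distance between genomes that can feed a clustering step. The following minimal Python sketch computes both from a hypothetical mapping of genome names to sets of domain identifiers. The species names and Pfam-style accessions are placeholders, and the Jaccard distance is an illustrative choice that is not necessarily the metric used in this study.</p>

```python
"""Hypothetical sketch: domain persistence and Jaccard distances from
presence/absence data. Species names and domain IDs are illustrative only."""

from itertools import combinations

# Hypothetical input: each genome mapped to the set of protein domain IDs
# (e.g. Pfam accessions) detected in its predicted proteome.
domainome = {
    "species_A": {"PF00001", "PF00002", "PF00003"},
    "species_B": {"PF00001", "PF00002"},
    "species_C": {"PF00001", "PF00004"},
}


def persistence(genomes):
    """Fraction of genomes (0..1) in which each domain is present."""
    counts = {}
    for domains in genomes.values():
        for d in domains:
            counts[d] = counts.get(d, 0) + 1
    n = len(genomes)
    return {d: c / n for d, c in sorted(counts.items())}


def jaccard_distance(a, b):
    """1 - |a intersect b| / |a union b|; 0 means identical domain content."""
    union = a.union(b)
    if not union:
        return 0.0
    return 1.0 - len(a.intersection(b)) / len(union)


def distance_matrix(genomes):
    """Pairwise Jaccard distances, usable as input for hierarchical clustering."""
    return {
        (x, y): jaccard_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


if __name__ == "__main__":
    print(persistence(domainome))
    print(distance_matrix(domainome))
```

<p>The same persistence function can be applied to sets of orthologous groups instead of domains, which gives the gene-level persistence against which domain-level values can be compared.</p>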
<p>To understand in greater detail which factors determine host or niche specificity, more genomes of mycoplasma species of specific interest could be sequenced. This would provide more detailed information on the variation in domain composition within these species and increase the accuracy of host prediction. Further insight into host or niche specificity could also come from functional annotation of the proteins without a protein domain, which make up &#x0007E;20% of the total proteome of a mycoplasma species. The machine learning approaches applied here did not take domain abundance into account, because we used the binary (presence/absence) domain matrix as input to avoid overfitting. (Dual-)transcriptomics studies might provide the additional insight needed to explain the interplay between host and pathogen. For example, a recent study on the chicken pathogen <italic>M. gallisepticum</italic> (Pflaum et al., <xref ref-type="bibr" rid="B38">2015</xref>) showed temporal phase variation in the expression of <italic>vlhA</italic> genes during infection. Finally, the notion of strict host specificity for mycoplasma species can be challenged, since several mycoplasma species infect a broad range of hosts (e.g., <italic>M. bovis</italic> and <italic>M. mycoides</italic> subsp. <italic>mycoides</italic>), and mycoplasmas normally isolated from animals are sometimes found in humans and vice versa (Huang et al., <xref ref-type="bibr" rid="B21">2001</xref>; Pitcher and Nicholas, <xref ref-type="bibr" rid="B39">2005</xref>). The assumption of strict host specificity could therefore be incorrect, and mycoplasmas may be able to infect a wider range of hosts and ecosystems than previously anticipated (Citti and Blanchard, <xref ref-type="bibr" rid="B6">2013</xref>).</p>
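<p>Reducing domain copy numbers to presence/absence, as done for the classifier input, can be sketched as follows. The count table, genome names and domain identifiers are hypothetical; the point is only to show which information the binary matrix discards.</p>

```python
"""Hypothetical sketch: reducing per-genome domain counts to the binary
presence/absence matrix used as classifier input."""

# Hypothetical domain abundance table: genome -> {domain ID: copy number}.
counts = {
    "genome_1": {"PF00001": 3, "PF00002": 1},
    "genome_2": {"PF00001": 1, "PF00003": 2},
}


def binary_matrix(abundance):
    """Return (row names, column names, 0/1 rows); copy numbers are discarded."""
    genomes = sorted(abundance)
    domains = sorted({d for g in genomes for d in abundance[g]})
    rows = [
        [1 if abundance[g].get(d, 0) > 0 else 0 for d in domains]
        for g in genomes
    ]
    return genomes, domains, rows


if __name__ == "__main__":
    names, cols, X = binary_matrix(counts)
    print(cols)  # column order shared by every row
    for name, row in zip(names, X):
        print(name, row)
```

<p>Here genome_1 and genome_2 both receive a 1 for PF00001 even though their hypothetical copy numbers differ (3 versus 1), so an abundance signal of that kind is invisible to a classifier trained on this matrix.</p>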
<p>Our finding that the pan-domainome of the mycoplasma genus is closed supports the general expectation that species with an allopatric lifestyle have a lower chance of gaining genes by horizontal gene transfer (HGT). This finding, however, seems to contradict recent comparative genomics reports of an open pan-genome for mycoplasma species (Liu et al., <xref ref-type="bibr" rid="B31">2012</xref>; Guimaraes et al., <xref ref-type="bibr" rid="B19">2014</xref>). Mechanisms that have been proposed to contribute to the expansion of the pan-genome are (1) variation in the expression and structure of surface antigens, (2) HGT, (3) genetic drift, and (4) phage attack (Citti and Blanchard, <xref ref-type="bibr" rid="B6">2013</xref>). HGT events with species outside the mycoplasma genus are rare (Sirand-Pugnet et al., <xref ref-type="bibr" rid="B48">2007</xref>), and phage attacks are not common in mycoplasma species (Tu et al., <xref ref-type="bibr" rid="B52">2001</xref>). Thus, we expect that genetic drift and sequence variation in the regions coding for variable surface proteins increase the pan-genome size, but that this increase mainly involves genes encoding proteins without characterized domains, which would leave the pan-domainome closed.</p>
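<p>Whether a pan-genome or pan-domainome is open or closed is commonly assessed with a Heaps'-law fit, n = &#x003BA;N<sup>&#x02212;&#x003B1;</sup>, where n is the number of new features contributed by the N-th genome added; &#x003B1; above 1 indicates a closed set. The Python sketch below fits this model by least squares on log-transformed counts. It assumes noise-free synthetic input and skips the averaging over random genome orderings that is typically performed in practice, so it illustrates the criterion rather than reproducing the procedure used in this study.</p>

```python
"""Hypothetical sketch: classifying a pan-domainome as open or closed with a
Heaps'-law fit, n = kappa * N**(-alpha), where n is the number of new domains
contributed by the N-th genome added."""

import math


def fit_heaps(new_counts):
    """Least-squares fit of log(n) = log(kappa) - alpha * log(N).

    new_counts[i] is the number of new domains found when genome i + 2 is
    added (the first genome is skipped because everything is new). All
    counts must be positive for the log transform.
    """
    xs = [math.log(i + 2) for i in range(len(new_counts))]
    ys = [math.log(c) for c in new_counts]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    alpha = -slope
    kappa = math.exp(my - slope * mx)
    return kappa, alpha


def is_closed(alpha):
    """Heaps'-law convention: alpha above 1 means the pan-set is closed."""
    return alpha > 1.0


if __name__ == "__main__":
    # Synthetic, noise-free counts generated with kappa = 500, alpha = 1.5.
    synthetic = [500 * n ** -1.5 for n in range(2, 12)]
    kappa, alpha = fit_heaps(synthetic)
    print(round(kappa, 3), round(alpha, 3), is_closed(alpha))
```

<p>With real data, zero counts for late genomes would have to be handled (for example by averaging over permutations first) before the log transform can be applied.</p>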
<p>Because the pan-domainome of mycoplasma species is closed, sequencing additional strains will add little to the overall systems-level understanding of mycoplasma physiology, and the focus should be on further understanding the mycoplasma strains whose genome sequences are already known. In this study we incorporated the minimal JCVI-Syn3.0 cell, which is based on an <italic>M. mycoides</italic> template. We considered a protein domain essential when it was present in the minimal synthetic bacterium, meaning that the domain is needed for growth in a complex cultivation medium under laboratory conditions. Conversely, we consider it likely that the minimal synthetic bacterium lacks the domains that are specifically needed for growth in the host, since its genome has been minimized for growth outside the host. By comparing the core domainome of the mycoplasma genus with JCVI-Syn3.0, we found that almost all domains in the mycoplasma core are also present in the minimal synthetic organism and are therefore likely needed to support growth in axenic media under laboratory conditions. Still, 17% of the essential protein-coding genes in the synthetic genome have an unknown function. We found that conserved hypothetical proteins in the spiroplasma functional group are also conserved in JCVI-Syn3.0. This finding is in line with the general notion that conserved hypothetical proteins are more likely to be essential (Galperin and Koonin, <xref ref-type="bibr" rid="B14">2004</xref>), but in mycoplasmas this conservation is largely limited to the functional cluster rather than the complete genus. Both findings can guide the design of minimal bacterial synthetic genomes. We expect that when mycoplasma species from other functional groups are used as a template, alternative configurations will emerge, showing flexibility in the domain composition of minimal synthetic bacteria designed from mycoplasma ancestors.</p>
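<p>The comparison between the genus-level core domainome and JCVI-Syn3.0 reduces to set operations: the core is the intersection of the domain sets of all genomes, and its coverage is the fraction of core domains also found in the minimal cell. The sketch below uses small hypothetical domain sets rather than real annotations; the function and variable names are illustrative.</p>

```python
"""Hypothetical sketch: comparing a genus-level core domainome with the
domain content of a minimal synthetic cell."""

from functools import reduce

# Hypothetical inputs; a real analysis would use annotated proteomes.
genus = {
    "species_A": {"PF00001", "PF00002", "PF00005"},
    "species_B": {"PF00001", "PF00002", "PF00006"},
    "species_C": {"PF00001", "PF00002", "PF00005"},
}
minimal_cell = {"PF00001", "PF00005", "PF00007"}


def core_domainome(genomes):
    """Domains present in every genome (set intersection across all)."""
    return reduce(lambda acc, s: acc.intersection(s), genomes.values())


def core_coverage(core, reference):
    """Fraction of core domains that are also found in the reference cell."""
    if not core:
        return 0.0
    return len(core.intersection(reference)) / len(core)


if __name__ == "__main__":
    core = core_domainome(genus)
    print(sorted(core))
    # Core domains absent from the minimal cell.
    print(sorted(core.difference(minimal_cell)))
    print(core_coverage(core, minimal_cell))
```

<p>Core domains returned by the difference step could be candidates for functions that are conserved across the genus but dispensable for axenic growth of the minimal cell.</p>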
</sec>
<sec id="s5">
<title>Author contributions</title>
<p>All authors contributed to study design and interpretation. TK, JB, and PS drafted the manuscript. JK provided scripts and methods used in this research. All authors revised the manuscript and approved the final version. All authors take responsibility for accuracy and integrity of the work.</p>
</sec>
<sec id="s6">
<title>Funding</title>
<p>This work was financially supported by MSD Animal Health, Bioprocess Technology &#x00026; Support, Boxmeer, Netherlands. This project has received funding from the <italic>European Union&#x00027;s Horizon 2020 research and innovation programme</italic> under grant agreement No. 634942.</p>
<sec>
<title>Conflict of interest statement</title>
<p>TK, PV, SJS, and JB are employed by MSD-AH, a pharmaceutical company producing veterinary vaccines. The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack><p>We thank Pascal Sirand-Pugnet for critically reviewing the manuscript.</p>
</ack>
<sec sec-type="supplementary-material" id="s7">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://journal.frontiersin.org/article/10.3389/fcimb.2017.00031/full#supplementary-material">http://journal.frontiersin.org/article/10.3389/fcimb.2017.00031/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="DataSheet1.XLSX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet2.DOCX" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Batisse</surname> <given-names>C.</given-names></name> <name><surname>Rode</surname> <given-names>M.</given-names></name> <name><surname>Yamada</surname> <given-names>T.</given-names></name> <name><surname>Maier</surname> <given-names>T.</given-names></name> <name><surname>Bader</surname> <given-names>S.</given-names></name> <name><surname>Beltran-Alvarez</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Proteome organization in a genome-reduced bacterium</article-title>. <source>Science</source> <volume>326</volume>, <fpage>1235</fpage>&#x02013;<lpage>1240</lpage>. <pub-id pub-id-type="doi">10.1126/science.1176343</pub-id><pub-id pub-id-type="pmid">19965468</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ben-Menachem</surname> <given-names>G.</given-names></name> <name><surname>Himmelreich</surname> <given-names>R.</given-names></name> <name><surname>Herrmann</surname> <given-names>R.</given-names></name> <name><surname>Aharonowitz</surname> <given-names>Y.</given-names></name> <name><surname>Rottem</surname> <given-names>S.</given-names></name></person-group> (<year>1997</year>). <article-title>The thioredoxin reductase system of mycoplasmas</article-title>. <source>Microbiology</source> <volume>143</volume>, <fpage>1933</fpage>&#x02013;<lpage>1940</lpage>. <pub-id pub-id-type="doi">10.1099/00221287-143-6-1933</pub-id><pub-id pub-id-type="pmid">9202470</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Mach. Learn.</source> <volume>45</volume>, <fpage>5</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Browning</surname> <given-names>G. F.</given-names></name> <name><surname>Marenda</surname> <given-names>M. S.</given-names></name> <name><surname>Noormohammadi</surname> <given-names>A. H.</given-names></name> <name><surname>Markham</surname> <given-names>P. F.</given-names></name></person-group> (<year>2011</year>). <article-title>The central role of lipoproteins in the pathogenesis of mycoplasmoses</article-title>. <source>Vet. Microbiol.</source> <volume>153</volume>, <fpage>44</fpage>&#x02013;<lpage>50</lpage>. <pub-id pub-id-type="doi">10.1016/j.vetmic.2011.05.031</pub-id><pub-id pub-id-type="pmid">21684094</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Ishwaran</surname> <given-names>H.</given-names></name></person-group> (<year>2012</year>). <article-title>Random forests for genomic data analysis</article-title>. <source>Genomics</source> <volume>99</volume>, <fpage>323</fpage>&#x02013;<lpage>329</lpage>. <pub-id pub-id-type="doi">10.1016/j.ygeno.2012.04.003</pub-id><pub-id pub-id-type="pmid">22546560</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Citti</surname> <given-names>C.</given-names></name> <name><surname>Blanchard</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Mycoplasmas and their host: emerging and re-emerging minimal pathogens</article-title>. <source>Trends Microbiol.</source> <volume>21</volume>, <fpage>196</fpage>&#x02013;<lpage>203</lpage>. <pub-id pub-id-type="doi">10.1016/j.tim.2013.01.003</pub-id><pub-id pub-id-type="pmid">23419218</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dandekar</surname> <given-names>T.</given-names></name> <name><surname>Huynen</surname> <given-names>M.</given-names></name> <name><surname>Regula</surname> <given-names>J. T.</given-names></name> <name><surname>Ueberle</surname> <given-names>B.</given-names></name> <name><surname>Zimmermann</surname> <given-names>C. U.</given-names></name> <name><surname>Andrade</surname> <given-names>M. A.</given-names></name> <etal/></person-group>. (<year>2000</year>). <article-title>Re-annotating the <italic>Mycoplasma pneumoniae</italic> genome sequence: adding value, function and reading frames</article-title>. <source>Nucleic Acids Res.</source> <volume>28</volume>, <fpage>3278</fpage>&#x02013;<lpage>3288</lpage>. <pub-id pub-id-type="doi">10.1093/nar/28.17.3278</pub-id><pub-id pub-id-type="pmid">10954595</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>D&#x000ED;az-Maga&#x000F1;a</surname> <given-names>A.</given-names></name> <name><surname>Aguilar-Barajas</surname> <given-names>E.</given-names></name> <name><surname>Moreno-S&#x000E1;nchez</surname> <given-names>R.</given-names></name> <name><surname>Ram&#x000ED;rez-D&#x000ED;az</surname> <given-names>M. I.</given-names></name> <name><surname>Riveros-Rosas</surname> <given-names>H.</given-names></name> <name><surname>Vargas</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Short-chain chromate ion transporter proteins from <italic>Bacillus subtilis</italic> confer chromate resistance in <italic>Escherichia coli</italic></article-title>. <source>J. Bacteriol.</source> <volume>191</volume>, <fpage>5441</fpage>&#x02013;<lpage>5445</lpage>. <pub-id pub-id-type="doi">10.1128/JB.00625-09</pub-id><pub-id pub-id-type="pmid">19581367</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>do Nascimento</surname> <given-names>N. C.</given-names></name> <name><surname>Santos</surname> <given-names>A. P.</given-names></name> <name><surname>Guimaraes</surname> <given-names>A. M. S.</given-names></name> <name><surname>Sanmiguel</surname> <given-names>P. J.</given-names></name> <name><surname>Messick</surname> <given-names>J. B.</given-names></name></person-group> (<year>2012</year>). <article-title><italic>Mycoplasma haemocanis</italic> &#x02013; the canine hemoplasma and its feline counterpart in the genomic era</article-title>. <source>Vet. Res.</source> <volume>43</volume>:<fpage>66</fpage>. <pub-id pub-id-type="doi">10.1186/1297-9716-43-66</pub-id><pub-id pub-id-type="pmid">23020168</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eberl</surname> <given-names>M.</given-names></name> <name><surname>Hintz</surname> <given-names>M.</given-names></name> <name><surname>Jamba</surname> <given-names>Z.</given-names></name> <name><surname>Beck</surname> <given-names>E.</given-names></name> <name><surname>Jomaa</surname> <given-names>H.</given-names></name></person-group> (<year>2004</year>). <article-title><italic>Mycoplasma penetrans</italic> is capable of activating V&#x003B3;9/V&#x003B4;2 T cells while other human pathogenic mycoplasmas fail to do so</article-title>. <source>Infect. Immun.</source> <volume>72</volume>, <fpage>4881</fpage>&#x02013;<lpage>4883</lpage>. <pub-id pub-id-type="doi">10.1128/iai.72.8.4881-4883.2004</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ekseth</surname> <given-names>O. K.</given-names></name> <name><surname>Kuiper</surname> <given-names>M.</given-names></name> <name><surname>Mironov</surname> <given-names>V.</given-names></name></person-group> (<year>2014</year>). <article-title>OrthAgogue: an agile tool for the rapid prediction of orthology relations</article-title>. <source>Bioinformatics</source> <volume>30</volume>, <fpage>734</fpage>&#x02013;<lpage>736</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btt582</pub-id><pub-id pub-id-type="pmid">24115168</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Enright</surname> <given-names>A. J.</given-names></name> <name><surname>Van Dongen</surname> <given-names>S.</given-names></name> <name><surname>Ouzounis</surname> <given-names>C. A.</given-names></name></person-group> (<year>2002</year>). <article-title>An efficient algorithm for large-scale detection of protein families</article-title>. <source>Nucleic Acids Res.</source> <volume>30</volume>, <fpage>1575</fpage>&#x02013;<lpage>1584</lpage>. <pub-id pub-id-type="doi">10.1093/nar/30.7.1575</pub-id><pub-id pub-id-type="pmid">11917018</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Galili</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>Dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>3718</fpage>&#x02013;<lpage>3720</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv428</pub-id><pub-id pub-id-type="pmid">26209431</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Galperin</surname> <given-names>M. Y.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2004</year>). <article-title>&#x0201C;Conserved hypothetical&#x0201D; proteins: prioritization of targets for experimental study</article-title>. <source>Nucleic Acids Res.</source> <volume>32</volume>, <fpage>5452</fpage>&#x02013;<lpage>5463</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkh885</pub-id><pub-id pub-id-type="pmid">15479782</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gibson</surname> <given-names>D. G.</given-names></name> <name><surname>Glass</surname> <given-names>J. I.</given-names></name> <name><surname>Lartigue</surname> <given-names>C.</given-names></name> <name><surname>Noskov</surname> <given-names>V. N.</given-names></name> <name><surname>Chuang</surname> <given-names>R.-Y.</given-names></name> <name><surname>Algire</surname> <given-names>M. A.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Creation of a bacterial cell controlled by a chemically synthesized genome</article-title>. <source>Science</source> <volume>329</volume>, <fpage>52</fpage>&#x02013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1126/science.1190719</pub-id><pub-id pub-id-type="pmid">20488990</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gil</surname> <given-names>R.</given-names></name> <name><surname>Silva</surname> <given-names>F. J.</given-names></name> <name><surname>Peret&#x000F3;</surname> <given-names>J.</given-names></name> <name><surname>Moya</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>Determination of the core of a minimal bacterial gene set</article-title>. <source>Microbiol. Mol. Biol. Rev.</source> <volume>68</volume>, <fpage>518</fpage>&#x02013;<lpage>537</lpage>. <pub-id pub-id-type="doi">10.1128/MMBR.68.3.518-537.2004</pub-id><pub-id pub-id-type="pmid">15353568</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gro&#x000DF;hennig</surname> <given-names>S.</given-names></name> <name><surname>Ischebeck</surname> <given-names>T.</given-names></name> <name><surname>Gibhardt</surname> <given-names>J.</given-names></name> <name><surname>Busse</surname> <given-names>J.</given-names></name> <name><surname>Feussner</surname> <given-names>I.</given-names></name> <name><surname>St&#x000FC;lke</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Hydrogen sulfide is a novel potential virulence factor of <italic>Mycoplasma pneumoniae</italic>: characterization of the unusual cysteine desulfurase/desulfhydrase hape</article-title>. <source>Mol. Microbiol.</source> <volume>100</volume>, <fpage>42</fpage>&#x02013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1111/mmi.13300</pub-id><pub-id pub-id-type="pmid">26711628</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>G&#x000FC;ell</surname> <given-names>M.</given-names></name> <name><surname>van Noort</surname> <given-names>V.</given-names></name> <name><surname>Yus</surname> <given-names>E.</given-names></name> <name><surname>Chen</surname> <given-names>W.-H.</given-names></name> <name><surname>Leigh-Bell</surname> <given-names>J.</given-names></name> <name><surname>Michalodimitrakis</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Transcriptome complexity in a genome-reduced bacterium</article-title>. <source>Science</source> <volume>326</volume>, <fpage>1268</fpage>&#x02013;<lpage>1271</lpage>. <pub-id pub-id-type="doi">10.1126/science.1176951</pub-id><pub-id pub-id-type="pmid">19965477</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guimaraes</surname> <given-names>A. M. S.</given-names></name> <name><surname>Santos</surname> <given-names>A. P.</given-names></name> <name><surname>Do Nascimento</surname> <given-names>N. C.</given-names></name> <name><surname>Timenetsky</surname> <given-names>J.</given-names></name> <name><surname>Messick</surname> <given-names>J. B.</given-names></name></person-group> (<year>2014</year>). <article-title>Comparative genomics and phylogenomics of hemotrophic mycoplasmas</article-title>. <source>PLoS ONE</source> <volume>9</volume>:<fpage>e91445</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0091445</pub-id><pub-id pub-id-type="pmid">24642917</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guimaraes</surname> <given-names>A. M. S.</given-names></name> <name><surname>Santos</surname> <given-names>A. P.</given-names></name> <name><surname>SanMiguel</surname> <given-names>P.</given-names></name> <name><surname>Walter</surname> <given-names>T.</given-names></name> <name><surname>Timenetsky</surname> <given-names>J.</given-names></name> <name><surname>Messick</surname> <given-names>J. B.</given-names></name></person-group> (<year>2011</year>). <article-title>Complete genome sequence of <italic>Mycoplasma suis</italic> and insights into its biology and adaption to an erythrocyte niche</article-title>. <source>PLoS ONE</source> <volume>6</volume>:<fpage>e19574</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0019574</pub-id><pub-id pub-id-type="pmid">21573007</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>J. Y.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Meng</surname> <given-names>L.</given-names></name> <name><surname>Shou</surname> <given-names>C. C.</given-names></name></person-group> (<year>2001</year>). <article-title>Mycoplasma infections and different human carcinomas</article-title>. <source>World J. Gastroenterol.</source> <volume>7</volume>, <fpage>266</fpage>&#x02013;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.3748/wjg.v7.i2.266</pub-id><pub-id pub-id-type="pmid">11819772</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hunter</surname> <given-names>S.</given-names></name> <name><surname>Jones</surname> <given-names>P.</given-names></name> <name><surname>Mitchell</surname> <given-names>A.</given-names></name> <name><surname>Apweiler</surname> <given-names>R.</given-names></name> <name><surname>Attwood</surname> <given-names>T. K.</given-names></name> <name><surname>Bateman</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>InterPro in 2011: new developments in the family and domain prediction database</article-title>. <source>Nucleic Acids Res.</source> <volume>40</volume>, <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks456</pub-id><pub-id pub-id-type="pmid">22096229</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hutchison</surname> <given-names>C. A.</given-names></name> <name><surname>Chuang</surname> <given-names>R.-Y.</given-names></name> <name><surname>Noskov</surname> <given-names>V. N.</given-names></name> <name><surname>Assad-Garcia</surname> <given-names>N.</given-names></name> <name><surname>Deerinck</surname> <given-names>T. J.</given-names></name> <name><surname>Ellisman</surname> <given-names>M. H.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Design and synthesis of a minimal bacterial genome</article-title>. <source>Science</source> <volume>351</volume>:<fpage>aad6253</fpage>. <pub-id pub-id-type="doi">10.1126/science.aad6253</pub-id><pub-id pub-id-type="pmid">27013737</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hyatt</surname> <given-names>D.</given-names></name> <name><surname>Chen</surname> <given-names>G.-L.</given-names></name> <name><surname>Locascio</surname> <given-names>P. F.</given-names></name> <name><surname>Land</surname> <given-names>M. L.</given-names></name> <name><surname>Larimer</surname> <given-names>F. W.</given-names></name> <name><surname>Hauser</surname> <given-names>L. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Prodigal: prokaryotic gene recognition and translation initiation site identification</article-title>. <source>BMC Bioinformatics</source> <volume>11</volume>:<fpage>119</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-119</pub-id><pub-id pub-id-type="pmid">20211023</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>P.</given-names></name> <name><surname>Binns</surname> <given-names>D.</given-names></name> <name><surname>Chang</surname> <given-names>H.-Y.</given-names></name> <name><surname>Fraser</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>McAnulla</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>InterProScan 5: genome-scale protein function classification</article-title>. <source>Bioinformatics</source> <volume>30</volume>, <fpage>1236</fpage>&#x02013;<lpage>1240</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu031</pub-id><pub-id pub-id-type="pmid">24451626</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koehorst</surname> <given-names>J. J.</given-names></name> <name><surname>Saccenti</surname> <given-names>E.</given-names></name> <name><surname>Schaap</surname> <given-names>P. J.</given-names></name> <name><surname>Martins dos Santos</surname> <given-names>V. A. P.</given-names></name> <name><surname>Suarez-Diez</surname> <given-names>M.</given-names></name></person-group> (<year>2016b</year>). <article-title>Protein domain architectures provide a fast, efficient and scalable alternative to sequence-based methods for comparative functional genomics</article-title>. <source>F1000Research</source> <volume>5</volume>:<fpage>1987</fpage>. <pub-id pub-id-type="doi">10.12688/f1000research.9416.2</pub-id><pub-id pub-id-type="pmid">27703668</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koehorst</surname> <given-names>J. J.</given-names></name> <name><surname>van Dam</surname> <given-names>J. C. J.</given-names></name> <name><surname>van Heck</surname> <given-names>R. G. A.</given-names></name> <name><surname>Saccenti</surname> <given-names>E.</given-names></name> <name><surname>dos Santos</surname> <given-names>V. A. P. M.</given-names></name> <name><surname>Suarez-Diez</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2016a</year>). <article-title>Comparison of 432 <italic>Pseudomonas</italic> strains through integration of genomic, functional, metabolic and expression data</article-title>. <source>Sci. Rep.</source> <volume>6</volume>:<fpage>38699</fpage>. <pub-id pub-id-type="doi">10.1038/srep38699</pub-id><pub-id pub-id-type="pmid">27922098</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krause</surname> <given-names>D. C.</given-names></name></person-group> (<year>1996</year>). <article-title><italic>Mycoplasma pneumoniae</italic> cytadherence: unravelling the tie that binds</article-title>. <source>Mol. Microbiol.</source> <volume>20</volume>, <fpage>247</fpage>&#x02013;<lpage>253</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2958.1996.tb02613.x</pub-id><pub-id pub-id-type="pmid">8733224</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuznetsov</surname> <given-names>V.</given-names></name> <name><surname>Pickalov</surname> <given-names>V.</given-names></name> <name><surname>Kanapin</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <article-title>Proteome complexity measures based on counting of domain-to-protein links for replicative and non-replicative domains</article-title>. <source>Bioinform. Genome Regul. Struct. II</source> <fpage>329</fpage>&#x02013;<lpage>341</lpage>. <pub-id pub-id-type="doi">10.1007/0-387-29455-4_32</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liaw</surname> <given-names>A.</given-names></name> <name><surname>Wiener</surname> <given-names>M.</given-names></name></person-group> (<year>2002</year>). <article-title>Classification and Regression by randomForest</article-title>. <source>R News</source> <volume>2</volume>, <fpage>18</fpage>&#x02013;<lpage>22</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/doc/Rnews/">http://cran.r-project.org/doc/Rnews/</ext-link></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Fang</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Guo</surname> <given-names>S.</given-names></name> <name><surname>Luo</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Comparative genomics of <italic>Mycoplasma</italic>: analysis of conserved essential genes and diversity of the pan-genome</article-title>. <source>PLoS ONE</source> <volume>7</volume>:<fpage>e35698</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0035698</pub-id><pub-id pub-id-type="pmid">22536428</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Llor&#x000E9;ns-Rico</surname> <given-names>V.</given-names></name> <name><surname>Cano</surname> <given-names>J.</given-names></name> <name><surname>Kamminga</surname> <given-names>T.</given-names></name> <name><surname>Gil</surname> <given-names>R.</given-names></name> <name><surname>Latorre</surname> <given-names>A.</given-names></name> <name><surname>Chen</surname> <given-names>W.-H.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Bacterial antisense RNAs are mainly the product of transcriptional noise</article-title>. <source>Sci. Adv</source>. <volume>2</volume>:<fpage>e1501363</fpage>. <pub-id pub-id-type="doi">10.1126/sciadv.1501363</pub-id><pub-id pub-id-type="pmid">26973873</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lluch-Senar</surname> <given-names>M.</given-names></name> <name><surname>Delgado</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Llor&#x000E9;ns-Rico</surname> <given-names>V.</given-names></name> <name><surname>O&#x00027;Reilly</surname> <given-names>F. J.</given-names></name> <name><surname>Wodke</surname> <given-names>J. A.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Defining a minimal cell: essentiality of small ORFs and ncRNAs in a genome-reduced bacterium</article-title>. <source>Mol. Syst. Biol</source>. <volume>11</volume>:<fpage>780</fpage>. <pub-id pub-id-type="doi">10.15252/msb.20145558</pub-id><pub-id pub-id-type="pmid">25609650</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maier</surname> <given-names>T.</given-names></name> <name><surname>Marcos</surname> <given-names>J.</given-names></name> <name><surname>Wodke</surname> <given-names>J. A. H.</given-names></name> <name><surname>Paetzold</surname> <given-names>B.</given-names></name> <name><surname>Liebeke</surname> <given-names>M.</given-names></name> <name><surname>Guti&#x000E9;rrez-Gallego</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Large-scale metabolome analysis and quantitative integration with genomics and proteomics data in <italic>Mycoplasma pneumoniae</italic></article-title>. <source>Mol. Biosyst.</source> <volume>9</volume>, <fpage>1743</fpage>&#x02013;<lpage>1755</lpage>. <pub-id pub-id-type="doi">10.1039/c3mb70113a</pub-id><pub-id pub-id-type="pmid">23598864</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McMenamy</surname> <given-names>R. H.</given-names></name> <name><surname>Lund</surname> <given-names>C. C.</given-names></name> <name><surname>Oncley</surname> <given-names>J. L.</given-names></name></person-group> (<year>1957</year>). <article-title>Unbound amino acid concentrations in human blood plasmas</article-title>. <source>J. Clin. Invest.</source> <volume>36</volume>, <fpage>1672</fpage>&#x02013;<lpage>1679</lpage>. <pub-id pub-id-type="doi">10.1172/jci103568</pub-id><pub-id pub-id-type="pmid">13491698</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nouvel</surname> <given-names>L. X.</given-names></name> <name><surname>Sirand-Pugnet</surname> <given-names>P.</given-names></name> <name><surname>Marenda</surname> <given-names>M. S.</given-names></name> <name><surname>Sagn&#x000E9;</surname> <given-names>E.</given-names></name> <name><surname>Barbe</surname> <given-names>V.</given-names></name> <name><surname>Mangenot</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Comparative genomic and proteomic analyses of two <italic>Mycoplasma agalactiae</italic> strains: clues to the macro- and micro-events that are shaping mycoplasma diversity</article-title>. <source>BMC Genomics</source> <volume>11</volume>:<fpage>86</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2164-11-86</pub-id><pub-id pub-id-type="pmid">20122262</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paradis</surname> <given-names>E.</given-names></name> <name><surname>Claude</surname> <given-names>J.</given-names></name> <name><surname>Strimmer</surname> <given-names>K.</given-names></name></person-group> (<year>2004</year>). <article-title>APE: analyses of phylogenetics and evolution in R language</article-title>. <source>Bioinformatics</source> <volume>20</volume>, <fpage>289</fpage>&#x02013;<lpage>290</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btg412</pub-id><pub-id pub-id-type="pmid">14734327</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pflaum</surname> <given-names>K.</given-names></name> <name><surname>Tulman</surname> <given-names>E. R.</given-names></name> <name><surname>Beaudet</surname> <given-names>J.</given-names></name> <name><surname>Liao</surname> <given-names>X.</given-names></name> <name><surname>Geary</surname> <given-names>S. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Global changes in <italic>Mycoplasma gallisepticum</italic> phase-variable lipoprotein gene vlhA expression during <italic>in vivo</italic> infection of the natural chicken host</article-title>. <source>Infect. Immun.</source> <volume>84</volume>, <fpage>351</fpage>&#x02013;<lpage>355</lpage>. <pub-id pub-id-type="doi">10.1128/IAI.01092-15</pub-id><pub-id pub-id-type="pmid">26553465</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pitcher</surname> <given-names>D. G.</given-names></name> <name><surname>Nicholas</surname> <given-names>R. A. J.</given-names></name></person-group> (<year>2005</year>). <article-title>Mycoplasma host specificity: fact or fiction?</article-title> <source>Vet. J.</source> <volume>170</volume>, <fpage>300</fpage>&#x02013;<lpage>306</lpage>. <pub-id pub-id-type="doi">10.1016/j.tvjl.2004.08.011</pub-id><pub-id pub-id-type="pmid">16266844</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pollack</surname> <given-names>J. D.</given-names></name> <name><surname>Williams</surname> <given-names>M. V.</given-names></name> <name><surname>McElhaney</surname> <given-names>R. N.</given-names></name></person-group> (<year>1997</year>). <article-title>The comparative metabolism of the mollicutes (Mycoplasmas): the utility for taxonomic classification and the relationship of putative gene annotation and phylogeny to enzymatic function in the smallest free-living cells</article-title>. <source>Crit. Rev. Microbiol.</source> <volume>23</volume>, <fpage>269</fpage>&#x02013;<lpage>354</lpage>. <pub-id pub-id-type="doi">10.3109/10408419709115140</pub-id><pub-id pub-id-type="pmid">9439886</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quast</surname> <given-names>C.</given-names></name> <name><surname>Pruesse</surname> <given-names>E.</given-names></name> <name><surname>Yilmaz</surname> <given-names>P.</given-names></name> <name><surname>Gerken</surname> <given-names>J.</given-names></name> <name><surname>Schweer</surname> <given-names>T.</given-names></name> <name><surname>Yarza</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>The SILVA ribosomal RNA gene database project: improved data processing and web-based tools</article-title>. <source>Nucleic Acids Res.</source> <volume>41</volume>, <fpage>590</fpage>&#x02013;<lpage>596</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks1219</pub-id><pub-id pub-id-type="pmid">23193283</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Razin</surname> <given-names>S.</given-names></name> <name><surname>Yogev</surname> <given-names>D.</given-names></name></person-group> (<year>1998</year>). <article-title>Molecular biology and pathogenicity of mycoplasmas</article-title>. <source>Microbiol. Mol. Biol. Rev.</source> <volume>62</volume>, <fpage>1094</fpage>&#x02013;<lpage>1156</lpage>. <pub-id pub-id-type="pmid">9841667</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosengarten</surname> <given-names>R.</given-names></name> <name><surname>Citti</surname> <given-names>C.</given-names></name> <name><surname>Glew</surname> <given-names>M.</given-names></name> <name><surname>Lischewski</surname> <given-names>A.</given-names></name> <name><surname>Droesse</surname> <given-names>M.</given-names></name> <name><surname>Much</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2000</year>). <article-title>Host-pathogen interactions in mycoplasma pathogenesis: virulence and survival strategies of minimalist prokaryotes</article-title>. <source>Int. J. Med. Microbiol.</source> <volume>290</volume>, <fpage>15</fpage>&#x02013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.1016/S1438-4221(00)80099-5</pub-id><pub-id pub-id-type="pmid">11043978</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rottem</surname> <given-names>S.</given-names></name></person-group> (<year>2003</year>). <article-title>Interaction of mycoplasmas with host cells</article-title>. <source>Physiol. Rev.</source> <volume>83</volume>, <fpage>417</fpage>&#x02013;<lpage>432</lpage>. <pub-id pub-id-type="doi">10.1152/physrev.00030.2002</pub-id><pub-id pub-id-type="pmid">12663864</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rouli</surname> <given-names>L.</given-names></name> <name><surname>Merhej</surname> <given-names>V.</given-names></name> <name><surname>Fournier</surname> <given-names>P.-E.</given-names></name> <name><surname>Raoult</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>The bacterial pangenome as a new tool for analyzing pathogenic bacteria</article-title>. <source>New Microbes New Infect.</source> <volume>7</volume>, <fpage>72</fpage>&#x02013;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.1016/j.nmni.2015.06.005</pub-id><pub-id pub-id-type="pmid">26442149</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saccenti</surname> <given-names>E.</given-names></name> <name><surname>Nieuwenhuijse</surname> <given-names>D.</given-names></name> <name><surname>Koehorst</surname> <given-names>J. J.</given-names></name> <name><surname>Dos Santos</surname> <given-names>V. A. P. M.</given-names></name> <name><surname>Schaap</surname> <given-names>P. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Assessing the metabolic diversity of <italic>Streptococcus</italic> from a protein domain point of view</article-title>. <source>PLoS ONE</source> <volume>10</volume>:<fpage>e0137908</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0137908</pub-id><pub-id pub-id-type="pmid">26366735</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Santos</surname> <given-names>A. P.</given-names></name> <name><surname>Guimaraes</surname> <given-names>A. M. S.</given-names></name> <name><surname>do Nascimento</surname> <given-names>N. C.</given-names></name> <name><surname>Sanmiguel</surname> <given-names>P. J.</given-names></name> <name><surname>Martin</surname> <given-names>S. W.</given-names></name> <name><surname>Messick</surname> <given-names>J. B.</given-names></name></person-group> (<year>2011</year>). <article-title>Genome of <italic>Mycoplasma haemofelis</italic>, unraveling its strategies for survival and persistence</article-title>. <source>Vet. Res.</source> <volume>42</volume>:<fpage>102</fpage>. <pub-id pub-id-type="doi">10.1186/1297-9716-42-102</pub-id><pub-id pub-id-type="pmid">21936946</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sirand-Pugnet</surname> <given-names>P.</given-names></name> <name><surname>Lartigue</surname> <given-names>C.</given-names></name> <name><surname>Marenda</surname> <given-names>M.</given-names></name> <name><surname>Jacob</surname> <given-names>D.</given-names></name> <name><surname>Barr&#x000E9;</surname> <given-names>A.</given-names></name> <name><surname>Barbe</surname> <given-names>V.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>Being pathogenic, plastic, and sexual while living with a nearly minimal bacterial genome</article-title>. <source>PLoS Genet.</source> <volume>3</volume>:<fpage>e75</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pgen.0030075</pub-id><pub-id pub-id-type="pmid">17511520</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Snipen</surname> <given-names>L.</given-names></name> <name><surname>Alm&#x000F8;y</surname> <given-names>T.</given-names></name> <name><surname>Ussery</surname> <given-names>D. W.</given-names></name></person-group> (<year>2009</year>). <article-title>Microbial comparative pan-genomics using binomial mixture models</article-title>. <source>BMC Genomics</source> <volume>10</volume>:<fpage>385</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2164-10-385</pub-id><pub-id pub-id-type="pmid">19691844</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Temple Lang</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <source>RCurl: General Network (HTTP/FTP/&#x02026;) Client Interface for R</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/package=RCurl">https://cran.r-project.org/package=RCurl</ext-link></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tripp</surname> <given-names>H. J.</given-names></name> <name><surname>Sutton</surname> <given-names>G.</given-names></name> <name><surname>White</surname> <given-names>O.</given-names></name> <name><surname>Wortman</surname> <given-names>J.</given-names></name> <name><surname>Pati</surname> <given-names>A.</given-names></name> <name><surname>Mikhailova</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Toward a standard in structural genome annotation for prokaryotes</article-title>. <source>Stand. Genomic Sci.</source> <volume>10</volume>:<fpage>45</fpage>. <pub-id pub-id-type="doi">10.1186/s40793-015-0034-9</pub-id><pub-id pub-id-type="pmid">26380633</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tu</surname> <given-names>A. H.</given-names></name> <name><surname>Voelker</surname> <given-names>L. L.</given-names></name> <name><surname>Shen</surname> <given-names>X.</given-names></name> <name><surname>Dybvig</surname> <given-names>K.</given-names></name></person-group> (<year>2001</year>). <article-title>Complete nucleotide sequence of the mycoplasma virus P1 genome</article-title>. <source>Plasmid</source> <volume>45</volume>, <fpage>122</fpage>&#x02013;<lpage>126</lpage>. <pub-id pub-id-type="doi">10.1006/plas.2000.1501</pub-id><pub-id pub-id-type="pmid">11322826</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Van Hage</surname> <given-names>W. R.</given-names></name></person-group> (<year>2013</year>). <source>SPARQL: SPARQL Client</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/package=SPARQL">https://cran.r-project.org/package=SPARQL</ext-link></citation>
</ref>
<ref id="B54">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Venables</surname> <given-names>W. N.</given-names></name> <name><surname>Ripley</surname> <given-names>B. D.</given-names></name></person-group> (<year>2002</year>). <article-title>Random and mixed effects</article-title>, in <source>Modern Applied Statistics with S</source>, eds <person-group person-group-type="editor"><name><surname>Chambers</surname> <given-names>J.</given-names></name> <name><surname>Eddy</surname> <given-names>W.</given-names></name> <name><surname>H&#x000E4;rdle</surname> <given-names>W.</given-names></name> <name><surname>Sheather</surname> <given-names>S.</given-names></name> <name><surname>Tierney</surname> <given-names>L.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer New York</publisher-name>), <fpage>271</fpage>&#x02013;<lpage>300</lpage>. <pub-id pub-id-type="doi">10.1007/978-0-387-21706-2_10</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vilei</surname> <given-names>E. M.</given-names></name> <name><surname>Frey</surname> <given-names>J.</given-names></name></person-group> (<year>2001</year>). <article-title>Genetic and biochemical characterization of glycerol uptake in <italic>Mycoplasma mycoides</italic> subsp. <italic>mycoides</italic> SC: its impact on H(2)O(2) production and virulence</article-title>. <source>Clin. Diagn. Lab. Immunol</source>. <volume>8</volume>, <fpage>85</fpage>&#x02013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1128/cdli.8.1.85-92.2001</pub-id><pub-id pub-id-type="pmid">11139200</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weisburg</surname> <given-names>W. G.</given-names></name> <name><surname>Tully</surname> <given-names>J. G.</given-names></name> <name><surname>Rose</surname> <given-names>D. L.</given-names></name> <name><surname>Petzel</surname> <given-names>J. P.</given-names></name> <name><surname>Oyaizu</surname> <given-names>H.</given-names></name> <name><surname>Yang</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>1989</year>). <article-title>A phylogenetic analysis of the mycoplasmas: basis for their classification</article-title>. <source>J. Bacteriol</source>. <volume>171</volume>, <fpage>6455</fpage>&#x02013;<lpage>6467</lpage>. <pub-id pub-id-type="doi">10.1128/jb.171.12.6455-6467.1989</pub-id><pub-id pub-id-type="pmid">2592342</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wodke</surname> <given-names>J. A. H.</given-names></name> <name><surname>Pucha&#x00142;ka</surname> <given-names>J.</given-names></name> <name><surname>Lluch-Senar</surname> <given-names>M.</given-names></name> <name><surname>Marcos</surname> <given-names>J.</given-names></name> <name><surname>Yus</surname> <given-names>E.</given-names></name> <name><surname>Godinho</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Dissecting the energy metabolism in <italic>Mycoplasma pneumoniae</italic> through genome-scale metabolic modeling</article-title>. <source>Mol. Syst. Biol.</source> <volume>9</volume>:<fpage>653</fpage>. <pub-id pub-id-type="doi">10.1038/msb.2013.6</pub-id><pub-id pub-id-type="pmid">23549481</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolf</surname> <given-names>Y. I.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2012</year>). <article-title>A tight link between orthologs and bidirectional best hits in bacterial and archaeal genomes</article-title>. <source>Genome Biol. Evol.</source> <volume>4</volume>, <fpage>1286</fpage>&#x02013;<lpage>1294</lpage>. <pub-id pub-id-type="doi">10.1093/gbe/evs100</pub-id><pub-id pub-id-type="pmid">23160176</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yus</surname> <given-names>E.</given-names></name> <name><surname>Maier</surname> <given-names>T.</given-names></name> <name><surname>Michalodimitrakis</surname> <given-names>K.</given-names></name> <name><surname>van Noort</surname> <given-names>V.</given-names></name> <name><surname>Yamada</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>W.-H.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Impact of genome reduction on bacterial metabolism and its regulation</article-title>. <source>Science</source> <volume>326</volume>, <fpage>1263</fpage>&#x02013;<lpage>1268</lpage>. <pub-id pub-id-type="doi">10.1126/science.1177263</pub-id><pub-id pub-id-type="pmid">19965476</pub-id></citation>
</ref>
</ref-list>
</back>
</article>
