<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2021.677558</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Pan-Genome of the Genus <italic>Streptomyces</italic> and Prioritization of Biosynthetic Gene Clusters With Potential to Produce Antibiotic Compounds</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Caicedo-Montoya</surname>
<given-names>Carlos</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="fn3" ref-type="author-notes"><sup>&#x2020;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1258984/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Manzo-Ruiz</surname>
<given-names>Monserrat</given-names>
</name>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<xref rid="fn3" ref-type="author-notes"><sup>&#x2020;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1274712/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>R&#x00ED;os-Estepa</surname>
<given-names>Rigoberto</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/622511/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup>
<institution>Grupo de Bioprocesos, Departamento de Ingenier&#x00ED;a Qu&#x00ED;mica, Universidad de Antioquia (UdeA)</institution>, <addr-line>Medell&#x00ED;n</addr-line>, <country>Colombia</country>
</aff>
<aff id="aff2"><sup>2</sup>
<institution>Departamento de Biolog&#x00ED;a Molecular y Biotecnolog&#x00ED;a, Instituto de Investigaciones Biom&#x00E9;dicas, Universidad Nacional Aut&#x00F3;noma de M&#x00E9;xico</institution>, <addr-line>Ciudad de M&#x00E9;xico</addr-line>, <country>Mexico</country>
</aff>
<author-notes>
<fn id="fn1" fn-type="edited-by">
<p>Edited by: Narjol Gonz&#x00E1;lez-Escalona, United States Food and Drug Administration, United States</p>
</fn>
<fn id="fn2" fn-type="edited-by">
<p>Reviewed by: Omkar S. Mohite, Novo Nordisk Foundation Center for Biosustainability (DTU Biosustain), Denmark; Diogo Antonio Tschoeke, Federal University of Rio de Janeiro, Brazil</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Rigoberto R&#x00ED;os-Estepa, <email>rigoberto.rios@udea.edu.co</email>
</corresp>
<fn id="fn3" fn-type="equal">
<p><sup>&#x2020;</sup>These authors have contributed equally to this work</p>
</fn>
<fn id="fn4" fn-type="other">
<p>This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>09</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>677558</elocation-id>
<history>
<date date-type="received">
<day>07</day>
<month>03</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>08</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Caicedo-Montoya, Manzo-Ruiz and R&#x00ED;os-Estepa.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Caicedo-Montoya, Manzo-Ruiz and R&#x00ED;os-Estepa</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Species of the genus <italic>Streptomyces</italic> are known for their ability to produce multiple secondary metabolites, and their genomes have been extensively explored to discover new bioactive compounds. The wealth of genomic data currently available allows filtering for high-quality genomes, which in turn permits reliable comparative genomics studies and improved prediction of biosynthetic gene clusters (BGCs) through genome mining approaches. In this work, we used 121 genome sequences of the genus <italic>Streptomyces</italic> in a comparative genomics study aimed at estimating genomic diversity in terms of protein domain content, protein sequence similarity, and conservation of intergenic regions (IGRs). We also searched for BGCs, prioritizing those with potential antibiotic activity. Our analysis revealed that the pan-genome of the genus <italic>Streptomyces</italic> is clearly open, with a large number of unique gene families across the different species, and that IGRs are rarely conserved. We also described the phylogenetic relationships of the analyzed genomes using multiple markers, obtaining a robust tree whose relationships were further validated by average nucleotide identity (ANI) calculations. Finally, 33 BGCs were found to have potential antibiotic activity and a predicted mode of action; these might serve as a guide for designing related experimental studies.</p>
</abstract>
<kwd-group>
<kwd>pan-genome</kwd>
<kwd><italic>Streptomyces</italic></kwd>
<kwd>genome mining</kwd>
<kwd>comparative genomics</kwd>
<kwd>biosynthetic gene cluster</kwd>
</kwd-group>
<contract-num rid="cn1">80740-959-2019</contract-num>
<contract-sponsor id="cn1">MINCIENCIAS &#x2013; Colombia &#x2013; Convocatoria</contract-sponsor>
<counts>
<fig-count count="6"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="86"/>
<page-count count="17"/>
<word-count count="12252"/>
</counts>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<title>Introduction</title>
<p><italic>Streptomyces</italic> is the most prolific genus of the phylum <italic>Actinobacteria</italic> in terms of the production of secondary metabolites with high societal impact. It is estimated that members of the genus <italic>Streptomyces</italic> produce more than 50% of the bioactive compounds produced by bacteria (<xref ref-type="bibr" rid="ref13">Doroghazi and Metcalf, 2013</xref>). The bioactive compounds produced by this genus include, among others, antifungals (e.g., amphotericin B by <italic>Streptomyces nodosus</italic>), antiparasitics (e.g., avermectins by <italic>Streptomyces avermitilis</italic>), antivirals (e.g., virantmycin by <italic>Streptomyces nitrosporeus</italic>), immunosuppressants (e.g., rapamycin by <italic>Streptomyces hygroscopicus</italic> and other strains), chemotherapeutics (e.g., daunorubicin by <italic>Streptomyces peucetius</italic>), and a wide variety of antibiotics, such as tetracycline produced by <italic>Streptomyces rimosus</italic> and streptomycin produced by <italic>Streptomyces griseus</italic> (<xref ref-type="bibr" rid="ref47">Nakagawa et al., 1981</xref>; <xref ref-type="bibr" rid="ref53">Pham et al., 2019</xref>).</p>
<p>The advent of next-generation sequencing technologies unveiled the metabolic potential of bacteria as producers of secondary metabolites. <italic>Streptomyces coelicolor</italic>, for instance, produces only actinorhodin, undecylprodigiosin, calcium-dependent antibiotic, and methylenomycin under laboratory conditions, although its genome contains over 20 biosynthetic gene clusters (<xref ref-type="bibr" rid="ref6">Challis and Hopwood, 2003</xref>). Rapid progress in genome sequencing and falling sequencing costs have made a vast number of genomes available. This has led to a deeper knowledge of microorganisms capable of synthesizing bioactive compounds and to the discovery of biosynthetic gene clusters that might produce novel compounds with clinical and commercial value (<xref ref-type="bibr" rid="ref26">Kalkreuter et al., 2020</xref>).</p>
<p>Analyzing such an amount of genomic data is challenging. Nevertheless, it paves the way for comparative genomics studies, which help reveal the microbial diversity of a genus as well as genes involved in environmental adaptation, antibiotic resistance, and the ability to colonize novel niches (<xref ref-type="bibr" rid="ref70">Tettelin et al., 2008</xref>; <xref ref-type="bibr" rid="ref49">Niu, 2018</xref>). Previous comparative genomic studies in the genus <italic>Streptomyces</italic> showed its genetic variability and biosynthetic potential (<xref ref-type="bibr" rid="ref23">Jackson et al., 2018</xref>; <xref ref-type="bibr" rid="ref83">Xu et al., 2019</xref>; <xref ref-type="bibr" rid="ref3">Belknap et al., 2020</xref>; <xref ref-type="bibr" rid="ref35">Lee et al., 2020</xref>). In marine <italic>Streptomyces</italic>, these studies identified genes involved in osmotic stress defense and symbiotic interactions, among other genes for adaptation to environmental niches (<xref ref-type="bibr" rid="ref73">Tian et al., 2016</xref>; <xref ref-type="bibr" rid="ref2">Almeida et al., 2019</xref>).</p>
<p>These reports, however, include only a few <italic>Streptomyces</italic> strains or a mix of complete and incomplete genomes, which could render unreliable results. In this study, we present the first comparative genomic study of the genus <italic>Streptomyces</italic> using a large number of complete, high-quality genomes. We describe the pan-genome in terms of protein sequence similarity and protein domain content, the phylogenetic relationships of the analyzed genomes, the variability of their intergenic regions (IGRs), and their capability of producing bioactive compounds. The study prioritizes biosynthetic gene clusters (BGCs) with potential antibiotic activity and a predicted mode of action according to the co-localization of duplicated self-resistance genes.</p>
</sec>
<sec id="sec2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="sec3">
<title>Selection of Genomes</title>
<p>For the present study, genomes were selected based on their quality and completeness. All genomes of the genus <italic>Streptomyces</italic> with the status &#x201C;Complete Genomes&#x201D; were downloaded from the Reference Sequence database (RefSeq; December 2019). After manual curation, 121 high-quality genomes were retained for subsequent analyses. We further evaluated genome assembly quality by determining genome completeness with BUSCO against the lineage dataset <italic>streptomycetales_odb10</italic>, which comprises 145 species and 1,579 BUSCOs (<xref ref-type="bibr" rid="ref64">Sim&#x00E3;o et al., 2015</xref>).</p>
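<p>BUSCO reports completeness as a one-line summary. Complete BUSCOs (C) are the sum of single-copy (S) and duplicated (D) ones, and missing BUSCOs (M) are those that are neither complete nor fragmented (F). The short Python sketch below shows this arithmetic. The function names and the counts in the example are hypothetical and are not values from this study.</p>

```python
def busco_summary(single, duplicated, fragmented, total):
    """Convert BUSCO counts into the percentages of BUSCO's summary line.

    Complete (C) = single-copy (S) + duplicated (D); missing (M) is whatever
    is neither complete nor fragmented (F).
    """
    complete = single + duplicated
    if complete + fragmented > total:
        raise ValueError("counts exceed the number of BUSCOs in the lineage")
    missing = total - complete - fragmented

    def pct(count):
        # Percentage of the lineage's BUSCOs, rounded to one decimal place
        return round(100.0 * count / total, 1)

    return {"C": pct(complete), "S": pct(single), "D": pct(duplicated),
            "F": pct(fragmented), "M": pct(missing), "n": total}


def busco_line(s):
    """Format a summary dict in BUSCO's C:[S,D],F,M,n notation."""
    return (f"C:{s['C']}%[S:{s['S']}%,D:{s['D']}%],"
            f"F:{s['F']}%,M:{s['M']}%,n:{s['n']}")


# Hypothetical counts for one genome against streptomycetales_odb10 (n = 1,579)
summary = busco_summary(single=1550, duplicated=10, fragmented=9, total=1579)
print(busco_line(summary))
```

<p>With these hypothetical counts, the printed line is C:98.8%[S:98.2%,D:0.6%],F:0.6%,M:0.6%,n:1579.</p>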
</sec>
<sec id="sec4">
<title>Pan-Genome Estimation</title>
<p>Genomes were downloaded from RefSeq in <italic>genbank</italic> and <italic>faa</italic> formats. We ran Roary to calculate the pan-genome of the genus <italic>Streptomyces</italic> (<xref ref-type="bibr" rid="ref50">Page et al., 2015</xref>). Beforehand, we had compared Roary with other pan-genome software, namely BPGA (<xref ref-type="bibr" rid="ref7">Chaudhari et al., 2016</xref>), GET_HOMOLOGUES (<xref ref-type="bibr" rid="ref9">Contreras-Moreira and Vinuesa, 2013</xref>), and Micropan in <italic>blast all vs all</italic> mode (<xref ref-type="bibr" rid="ref65">Snipen and Liland, 2015</xref>). Roary was selected for three reasons: it had one of the shortest running times, it produces the output required for the analysis of intergenic region conservation, and its results were similar to those of the other tools. Roary requires genomes in GFF3 format with the nucleotide sequences appended at the end of the file. To produce such files, we converted the <italic>genbank</italic> files using the BioPerl script <italic>bp_genbank2gff3.pl</italic> (<xref ref-type="bibr" rid="ref43">McKay, 2004</xref>). The minimum percentage identity for BLASTp searches was set to 70%. Paralog splitting was disabled, as required for the subsequent determination of conserved IGRs. The maximum number of clusters was set to 170,000. An alignment of the core genes detected by Roary was created using MAFFT (<xref ref-type="bibr" rid="ref28">Katoh and Standley, 2013</xref>). This alignment was used to build a maximum likelihood phylogenetic tree with FastTree2 (<xref ref-type="bibr" rid="ref55">Price et al., 2010</xref>) on the Galaxy Europe server (<xref ref-type="bibr" rid="ref19">Goecks et al., 2010</xref>).</p>
<p>In addition, we characterized the pan-genome of the genus <italic>Streptomyces</italic> based on the domain diversity of the proteins encoded in the analyzed genomes, using the R package Micropan version 1.2 (<xref ref-type="bibr" rid="ref65">Snipen and Liland, 2015</xref>). Briefly, all amino acid sequences of the proteins encoded in the 121 genomes were annotated for their domain content with HMMER 3.3.1 (<xref ref-type="bibr" rid="ref14">Eddy, 2011</xref>) against the Pfam-A database (<xref ref-type="bibr" rid="ref17">Finn et al., 2014</xref>). Micropan clusters proteins according to their domain content; thus, proteins sharing the same domains were grouped into the same gene family or cluster. The Micropan function binomixEstimate was used to extrapolate the size of the pan-genome from the presence/absence matrices produced by both analyses. For both methodologies, we also determined the genomic fluidity and the Jaccard distances between the <italic>Streptomyces</italic> genomes using the corresponding Micropan functions.</p>
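<p>The idea of grouping proteins by shared domains can be sketched in Python as follows. This is a simplified illustration, not Micropan's code. It assumes that each protein's Pfam hits have already been parsed (for example, from HMMER output) into (start position, domain identifier) pairs, and it treats proteins with the same ordered domain architecture as one cluster. Whether domain order is taken into account, and how proteins without domain hits are handled, are assumptions of this sketch. Identifiers such as "domA" are placeholders for Pfam accessions.</p>

```python
from collections import defaultdict


def cluster_by_domains(domain_hits):
    """Group proteins that share the same ordered domain architecture.

    domain_hits maps a protein ID to a list of (start, domain_id) tuples,
    e.g. parsed beforehand from HMMER/Pfam-A output (parsing not shown).
    Proteins without any domain hit are skipped in this sketch.
    """
    clusters = defaultdict(list)
    for protein, hits in domain_hits.items():
        if not hits:
            continue
        # Sort hits by start position so the key reflects N- to C-terminal order
        architecture = tuple(domain for _, domain in sorted(hits))
        clusters[architecture].append(protein)
    return {arch: sorted(members) for arch, members in clusters.items()}


# Hypothetical annotations; "domA"/"domB" stand in for Pfam accessions
hits = {
    "genome1_p1": [(10, "domA"), (200, "domB")],
    "genome2_p7": [(180, "domB"), (5, "domA")],
    "genome3_p2": [(20, "domB"), (300, "domA")],
    "genome3_p9": [],
}
for architecture, members in cluster_by_domains(hits).items():
    print("-".join(architecture), members)
```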
<p>All genes were classified as core, soft-core, shell, or cloud genes according to their presence among the analyzed genomes. Genes present in all 121 strains were designated as core genes. Genes present in more than 95% of the strains (115&#x2013;120 strains) were classified as soft-core genes. Shell genes were those present in 15&#x2013;95% of the strains (19&#x2013;114 strains). Genes present in less than 15% of the strains (fewer than 19 genomes) were assigned as cloud genes. For both methodologies, Roary and Micropan, we extracted representative sequences of the core, soft-core, shell, and cloud genes with in-house Biopython scripts for subsequent functional annotation.</p>
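<p>The category thresholds described above can be written as a small Python function that converts genome counts into categories. This is a sketch that follows the cut-offs stated in the text for 121 genomes; it is not one of the in-house scripts used in the study. The cluster identifiers in the example are hypothetical.</p>

```python
import math
from collections import Counter


def classify(n_present, n_genomes=121):
    """Assign a gene cluster to a pan-genome category by its genome count."""
    if n_present > n_genomes or 1 > n_present:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"                                   # all 121 genomes
    if n_present >= math.ceil(0.95 * n_genomes):
        return "soft-core"                              # 115-120 genomes
    if n_present >= math.ceil(0.15 * n_genomes):
        return "shell"                                  # 19-114 genomes
    return "cloud"                                      # 1-18 genomes


def tally(presence, n_genomes=121):
    """Count clusters per category; presence maps cluster ID -> set of genome IDs."""
    return Counter(classify(len(genomes), n_genomes) for genomes in presence.values())


# Hypothetical presence/absence data for three clusters
presence = {
    "cluster_1": {f"g{i}" for i in range(121)},
    "cluster_2": {f"g{i}" for i in range(117)},
    "cluster_3": {"g5"},
}
print(tally(presence))
```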
<p>Genes resulting from Roary were translated into amino acid sequences. The pan-genome categories defined by both methodologies were then functionally described by assigning Gene Ontology (GO) terms to the selected proteins (<xref ref-type="bibr" rid="ref71">The Gene Ontology Consortium, 2019</xref>). This was performed with the InterProScan functional predictions of ORFs tool available on the Galaxy Europe server (<xref ref-type="bibr" rid="ref56">Quevillon et al., 2005</xref>). The results were summarized and plotted in WEGO 2.0 (<xref ref-type="bibr" rid="ref85">Ye et al., 2018</xref>). Additional annotations were obtained through the WebMGA server for Clusters of Orthologous Groups (COG) assignments (<xref ref-type="bibr" rid="ref82">Wu et al., 2011</xref>). Finally, the core genome phylogenetic tree, together with the habitat and the number of genes in each pan-genome category for each strain, was visualized with iTOL (<xref ref-type="bibr" rid="ref36">Letunic and Bork, 2019</xref>).</p>
</sec>
<sec id="sec5">
<title>Conservation of Intergenic Regions</title>
<p>Based on the phylogenetic tree, we defined three groups to analyze the conservation of IGRs among more closely related organisms. <italic>Streptomyces xiamenensis</italic> 318 and <italic>Streptomyces cattleya</italic> NRRL 8057 were excluded from this analysis because they showed no clear relationship with the other <italic>Streptomyces</italic>. We estimated the conservation of intergenic regions across the streptomycetes using the software Piggy (<xref ref-type="bibr" rid="ref72">Thorpe et al., 2018</xref>), with the output of the previous Roary analysis as input. The Piggy parameters <italic>nuc_id</italic> and <italic>len_id</italic> were set to 70, in accordance with the identity threshold used in Roary; default values were used for the other parameters. We then applied the same procedure to the IGRs of each predefined group of <italic>Streptomyces</italic>, using the same parameters as for the complete set of genomes. Moreover, we searched the IGRs conserved in more than 90% of the genomes of each group against the Rfam database (version 14.5; <xref ref-type="bibr" rid="ref27">Kalvari et al., 2021</xref>). We then searched these conserved IGRs for putative non-coding RNAs with the software RNAz (<xref ref-type="bibr" rid="ref20">Gruber et al., 2010</xref>). Beforehand, the IGR alignments were filtered with the command <italic>rnazSelectSeqs.pl</italic> to retain sequences with a mean pairwise identity of 70%. Only outputs with an overall RNA-class probability above 0.7 were considered putative non-coding RNAs. Their secondary structures were visualized with RNAfold (<xref ref-type="bibr" rid="ref38">Lorenz et al., 2011</xref>), and their possible targets were predicted using IntaRNA 2.0 (<xref ref-type="bibr" rid="ref40">Mann et al., 2017</xref>).</p>
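<p>The identity criterion used to filter the IGR alignments can be illustrated with a short Python sketch. It computes the mean pairwise identity of an alignment and compares it with the 70% threshold. This is not the selection procedure implemented in <italic>rnazSelectSeqs.pl</italic>. The gap handling is an assumption of the sketch: gap-gap columns are ignored and gap-residue columns count as mismatches, whereas tools differ in these conventions. The alignment is hypothetical.</p>

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity of two aligned sequences, ignoring gap-gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue                  # column is empty in both sequences
        compared += 1
        if x == y:                    # same residue; gap vs residue is a mismatch
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(alignment):
    """Average percent identity over all sequence pairs of an alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical three-sequence IGR alignment
igr_alignment = ["ACGU-A", "ACGUUA", "ACGAUA"]
mpi = mean_pairwise_identity(igr_alignment)
print(round(mpi, 2), "keep" if mpi >= 70 else "discard")
```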
</sec>
<sec id="sec6">
<title>Phylogenomic Analysis</title>
<p>The Galaxy wrapper of fastANI, with default parameters and all-<italic>versus</italic>-all genome comparisons, was used to calculate the average nucleotide identity (ANI) among the 121 selected strains (<xref ref-type="bibr" rid="ref24">Jain et al., 2018</xref>). The heat map and dendrogram of the fastANI results were generated with the Python libraries Seaborn and Matplotlib (<xref ref-type="bibr" rid="ref21">Hunter, 2007</xref>), using UPGMA as the linkage method and the Euclidean metric for pairwise distances between observations. For genomes with ANI values higher than 95% and ambiguous taxonomic affiliations, we performed global genome alignments with progressiveMauve (<xref ref-type="bibr" rid="ref11">Darling et al., 2010</xref>).</p>
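<p>Before plotting, the all-versus-all fastANI output has to be turned into a square matrix. fastANI writes one tab-separated line per reported pair: query genome, reference genome, ANI estimate, number of bidirectional fragment mappings, and total query fragments. The Python sketch below averages the two directions of each pair. Filling unreported pairs with a placeholder value is an assumption of this sketch, and the choice of fill value affects the clustering. The file names are hypothetical.</p>

```python
import csv
import io
from collections import defaultdict


def ani_matrix(fastani_tsv, missing=0.0):
    """Build a symmetric ANI matrix from all-vs-all fastANI output.

    Each line: query, reference, ANI, bidirectional fragment mappings,
    total query fragments. The two directions of a pair are averaged, and
    pairs that fastANI did not report are filled with `missing`.
    """
    values = defaultdict(list)
    genomes = set()
    for row in csv.reader(io.StringIO(fastani_tsv), delimiter="\t"):
        if not row:
            continue
        query, reference, ani = row[0], row[1], float(row[2])
        genomes.update((query, reference))
        if query != reference:
            values[frozenset((query, reference))].append(ani)
    names = sorted(genomes)
    matrix = {}
    for a in names:
        for b in names:
            if a == b:
                matrix[a, b] = 100.0
            else:
                pair = values.get(frozenset((a, b)))
                matrix[a, b] = sum(pair) / len(pair) if pair else missing
    return names, matrix


# Hypothetical fastANI output for three genomes
example = ("gA.fna\tgB.fna\t98.0\t1500\t1600\n"
           "gB.fna\tgA.fna\t97.0\t1480\t1550\n"
           "gA.fna\tgC.fna\t85.0\t900\t1600\n")
names, ani = ani_matrix(example)
for a in names:
    print(a, [round(ani[a, b], 1) for b in names])
```

<p>The resulting matrix could then be loaded into a pandas DataFrame and passed to <monospace>seaborn.clustermap</monospace> with <monospace>method="average"</monospace> (UPGMA) and <monospace>metric="euclidean"</monospace>, which correspond to the settings described above.</p>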
</sec>
<sec id="sec7">
<title>BGCs Prediction, Prioritization, and Similarity Comparison</title>
<p>All 121 genomes were analyzed using ARTS 2.0 (available at <ext-link xlink:href="https://arts.ziemertlab.com" ext-link-type="uri">https://arts.ziemertlab.com</ext-link>) with default settings; ARTS 2.0 uses antiSMASH 5.1.1 for BGC prediction (<xref ref-type="bibr" rid="ref4">Blin et al., 2019</xref>). Because there is no established method to define gene cluster boundaries, antiSMASH outputs a series of biosynthetic gene cluster regions. Each region can comprise one or more co-localized &#x201C;candidate&#x201D; clusters, each of which contains the biosynthetic machinery to produce one type of metabolite. In this work, we refer to each &#x201C;candidate&#x201D; cluster as a BGC (for more information on antiSMASH definitions, see: <ext-link xlink:href="https://docs.antismash.secondarymetabolites.org/understanding_output/" ext-link-type="uri">https://docs.antismash.secondarymetabolites.org/understanding_output/</ext-link>). After running antiSMASH, ARTS identifies BGCs co-localized with self-resistance enzymes (based on the Resfam database) and with core genes (defined by ARTS using a database of actinomycete genomes) that show predicted horizontal gene transfer (HGT; <xref ref-type="bibr" rid="ref1">Alanjary et al., 2017</xref>; <xref ref-type="bibr" rid="ref46">Mungan et al., 2020</xref>). All predicted clusters of interest were searched against the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) repository (<xref ref-type="bibr" rid="ref29">Kautsar et al., 2020</xref>; available at <ext-link xlink:href="https://mibig.secondarymetabolites.org" ext-link-type="uri">https://mibig.secondarymetabolites.org</ext-link>).</p>
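<p>The positional part of the co-localization criterion can be illustrated with a minimal Python sketch that checks which genes fall inside a BGC region. It covers only the coordinate test. ARTS also decides which genes count as resistance or core genes, using Resfam models and duplication and HGT predictions, and that step is not reproduced here. The sketch assumes 1-based inclusive coordinates on a single sequence record. Feature names and coordinates are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A genomic feature with 1-based, inclusive coordinates on one record."""
    name: str
    start: int
    end: int


def genes_within(region, genes, allow_partial=False):
    """Return names of genes lying inside a BGC region.

    By default a gene must lie entirely within the region; with
    allow_partial=True any overlap counts.
    """
    hits = []
    for gene in genes:
        if allow_partial:
            inside = gene.end >= region.start and region.end >= gene.start
        else:
            inside = gene.start >= region.start and region.end >= gene.end
        if inside:
            hits.append(gene.name)
    return hits


# Hypothetical BGC region and candidate resistance/core genes
region = Feature("region_1", 10_000, 60_000)
candidates = [
    Feature("resistance_gene_1", 12_000, 13_200),
    Feature("core_gene_dup", 59_500, 61_000),
    Feature("distant_gene", 80_000, 81_000),
]
print(genes_within(region, candidates))
print(genes_within(region, candidates, allow_partial=True))
```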
<p>The 3,750 <italic>genbank</italic> files derived from the antiSMASH analysis and the 33 <italic>genbank</italic> files corresponding to the BGCs prioritized by ARTS 2.0 were used as input for BGC similarity comparison with BiG-SCAPE 1.1.2 (<xref ref-type="bibr" rid="ref48">Navarro-Mu&#x00F1;oz et al., 2020</xref>). Analyses were run with cutoff values of 0.3, 0.5, and 0.7, both with and without the MIBiG database. The networks that included the MIBiG database were then filtered to remove comparisons between MIBiG BGCs that showed no similarity to clusters from our analysis. The resulting similarity networks were visualized using Cytoscape 3.8.2 (<xref ref-type="bibr" rid="ref63">Shannon et al., 2003</xref>).</p>
</sec>
</sec>
<sec id="sec8" sec-type="results">
<title>Results</title>
<sec id="sec9">
<title>General Features of <italic>Streptomyces</italic> Genomes</title>
<p>Only high-quality genomes were included in the present investigation. In addition to their &#x201C;Complete genomes&#x201D; status, all genomes showed high completeness and few fragmented or missing genes (<xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S1</xref>). Genome size ranged from 5.96 Mb for <italic>Streptomyces xiamenensis</italic> 318 to 12.01 Mb for <italic>S. hygroscopicus</italic> XM201; these two strains also have the smallest and largest numbers of protein-coding genes, 5,100 and 9,385, respectively. The mean G+C content is 71.77 &#x00B1; 0.81%, as expected for members of the phylum <italic>Actinobacteria</italic> (<xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S2</xref>; <xref ref-type="bibr" rid="ref12">Dhakal et al., 2017</xref>). Most strains have a single chromosome, although, notably, the strain <italic>S. hygroscopicus limoneus</italic> KCTC 1717 has two chromosomes (<xref ref-type="bibr" rid="ref34">Lee et al., 2016</xref>); all strains contain between one and four plasmids. <xref ref-type="supplementary-material" rid="SM1">Supplementary File S1</xref> contains all the collected metadata, including genome accession numbers, sequencing platform, coverage, and other genomic features such as the number of tRNAs and rRNAs in each genome.</p>
</sec>
<sec id="sec10">
<title>Comparative Genomics of the Genus <italic>Streptomyces</italic> Through Clustering of Protein Sequences by Similarity and Domains Content</title>
<p>We determined the pan-genome of the genus <italic>Streptomyces</italic> to characterize its diversity in terms of protein-coding genes, domain content, and regulatory elements located in intergenic regions. For this purpose, we used different methodologies to represent its entire gene repertoire accurately. The Roary analysis of protein-coding genes showed that the pan-genome of <italic>Streptomyces</italic> is clearly open (<italic>&#x03B1;</italic>&#x003C;1, 0&#x003C;<italic>&#x03B3;</italic>&#x003C;1), with a size of 145,462 clusters (<xref rid="fig1" ref-type="fig">Figures 1A</xref>,<xref rid="fig1" ref-type="fig">B</xref>). Using the binomixEstimate function of Micropan, the current data allowed extrapolation to a total size of 273,372 clusters. These clusters were then classified according to their conservation level among the analyzed genomes. This classification yielded 633 core genes, 1,080 soft-core genes, 6,040 shell genes, and 137,709 cloud genes. Interestingly, 81,568 of the cloud genes were unique clusters, that is, they were present in only one of the genomes considered.</p>
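<p>The openness criterion comes from fitting power laws to pan-genome growth curves. The total pan-genome size is modeled as n = &#x03BA;N<sup>&#x03B3;</sup> and the number of new genes as n = &#x03BA;N<sup>&#x2212;&#x03B1;</sup>, where N is the number of genomes. The pan-genome is considered open when &#x03B3; lies between 0 and 1 and &#x03B1; is below 1. The Python sketch below estimates such exponents by ordinary least squares on log-transformed values. It assumes that averaged growth curves (e.g., over random genome orderings) are already available. This is a simplification and not necessarily the fitting procedure used by Roary or Micropan. The synthetic curves in the example are hypothetical.</p>

```python
import math


def fit_power_law(xs, ys):
    """Fit y = k * x**b by least squares on log-log scale; return (k, b)."""
    if len(xs) != len(ys) or 2 > len(xs):
        raise ValueError("need at least two paired observations")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    k = math.exp(mean_y - b * mean_x)
    return k, b


# Hypothetical averaged growth curves for N = 2..20 genomes
n_genomes = list(range(2, 21))
pan_size = [9000 * n ** 0.45 for n in n_genomes]      # total clusters
new_genes = [3000 * n ** -0.7 for n in n_genomes]     # new clusters per added genome

_, gamma = fit_power_law(n_genomes, pan_size)
_, slope = fit_power_law(n_genomes, new_genes)
alpha = -slope
is_open = 1 > gamma > 0 and 1 > alpha
print(f"gamma={gamma:.2f}, alpha={alpha:.2f}, open={is_open}")
```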
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Pan-genome estimation for the genus <italic>Streptomyces</italic>. <bold>(A)</bold> Size of the pan-genome categories calculated using Roary and Micropan; the &#x201C;soft-core&#x201D; label includes both core and soft-core genomes. Power-law fit of the total number of genes and the number of new genes as a function of the number of genomes added to the analysis for <bold>(B)</bold> Roary and <bold>(C)</bold> Micropan.</p>
</caption>
<graphic xlink:href="fmicb-12-677558-g001.tif"/>
</fig>
<p>Micropan estimated the pan-genome size of the genus <italic>Streptomyces</italic> as 8,973 protein families or clusters. This number is much lower than that determined by Roary. Nevertheless, the power-law fit of the total pan-genome size and the number of new genes showed that the pan-genome was still open (<italic>&#x03B1;</italic>&#x003C;1, 0&#x003C;<italic>&#x03B3;</italic>&#x003C;1), albeit near the boundary of a closed pan-genome, as the gamma parameter was close to zero (<xref rid="fig1" ref-type="fig">Figure 1C</xref>). The binomixEstimate function, applied to the results of both methods, gave similar core genome sizes of 600 and 589 for Roary and Micropan, respectively. <xref rid="fig1" ref-type="fig">Figure 1</xref> summarizes the pan-genome calculations.</p>
<p>The fluidity of the pan-genome, which measures how dissimilar genomes are at the gene level (<xref ref-type="bibr" rid="ref31">Kislyuk et al., 2011</xref>), was estimated for both procedures used to assess the genomic diversity of the genus <italic>Streptomyces</italic>. The fluidity was 0.53 &#x00B1; 0.099 for Roary and 0.22 &#x00B1; 0.031 for Micropan. This indicates that <italic>Streptomyces</italic> genomes differ by 53% on average when the pan-genome is built from protein sequence similarity, and by 22% when domain distributions are considered. A related assessment of genome diversity can be performed with the distribution of Jaccard distances (<xref ref-type="bibr" rid="ref22">Jaccard, 1912</xref>). The Jaccard distance is defined as one minus the ratio of the number of genes shared by two genomes to the number of distinct genes in the two genomes; the higher the Jaccard distance, the more dissimilar the two genomes. Overall, the Jaccard distances for Roary and Micropan displayed similar distributions (<xref ref-type="supplementary-material" rid="SM6">Supplementary Figures S3A</xref>,<xref ref-type="supplementary-material" rid="SM6">B</xref>, respectively), centered at different mean values. Highly similar genomes, such as those of the same species, share nearly the same genes or domains and therefore have Jaccard distances close to zero.</p>
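<p>Both diversity measures can be computed directly from their definitions. In the Python sketch below, genomic fluidity follows Kislyuk et al. (2011). For each pair of genomes, the number of gene families found in only one of the two genomes is divided by the total number of gene families in both, and the result is averaged over all pairs. The Jaccard distance is one minus the ratio of shared to distinct families. Genomes are represented as sets of hypothetical family identifiers. The Micropan functions used in this study may differ in implementation details, for example by estimating fluidity from randomly sampled genome pairs rather than all pairs.</p>

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - shared families / distinct families, for two sets of gene families."""
    union = a.union(b)
    if not union:
        return 0.0
    return 1.0 - len(a.intersection(b)) / len(union)


def genomic_fluidity(genomes):
    """Mean over genome pairs of (unique families) / (total families)."""
    pairs = list(combinations(genomes, 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    total = 0.0
    for a, b in pairs:
        unique = len(a.difference(b)) + len(b.difference(a))
        total += unique / (len(a) + len(b))
    return total / len(pairs)


# Hypothetical gene-family content of three genomes
g1 = {"f1", "f2", "f3", "f4"}
g2 = {"f1", "f2", "f3", "f5"}
g3 = {"f1", "f6"}
print(round(genomic_fluidity([g1, g2, g3]), 4))
print([round(jaccard_distance(a, b), 2) for a, b in combinations([g1, g2, g3], 2)])
```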
</sec>
<sec id="sec11">
<title>Phylogenomic Analysis</title>
<p>One of the main outcomes of a pan-genomic study is the determination of the genes shared by all members of a given group, i.e., the core genome defined above. These core genes can be concatenated and aligned to define phylogenetic relationships among the members of a group. Because this approach has higher resolution than a single phylogenetic marker (e.g., the 16S rRNA gene), it has been suggested as the basis for bacterial phylogeny (<xref ref-type="bibr" rid="ref52">Parks et al., 2018</xref>). Furthermore, combining multiple markers has been proposed to obtain precise taxonomic affiliations (<xref ref-type="bibr" rid="ref16">Figueras et al., 2014</xref>). Such markers include the core genome phylogeny and overall genome relatedness indices, of which ANI is the most widely used. Accordingly, we used both approaches to explore the phylogeny of the genus <italic>Streptomyces</italic>.</p>
<p>The alignment of the 633 core genes identified by Roary allowed the construction of a high-confidence phylogenetic tree, with bootstrap values above 0.9 for all branches and equal to 1 for most of them (<xref rid="fig2" ref-type="fig">Figure 2</xref>). This figure also shows, for each genome, the number of genes in each pan-genome category according to each method used to determine the pan-genome. The core genome tree showed no clear relationship between the isolation source of the strains and their phylogenetic placement. Three clades were clearly distinguishable in the phylogenetic tree; for clarity, they are highlighted in <xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S4</xref>.</p>
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>Core genome phylogenetic tree constructed using genes conserved across all genomes considered. Colors indicate bootstrap support and the isolation source of each strain. The inner bars represent the number of genes in each strain belonging to the core and soft-core genome (labeled as core), shell genome, and cloud genome according to the pan-genome determined with Roary. The outer bars show the same categories according to the analysis in Micropan.</p>
</caption>
<graphic xlink:href="fmicb-12-677558-g002.tif"/>
</fig>
<p>Interestingly, <italic>S. hygroscopicus</italic> XM201 was placed in group 1, whereas the other <italic>S. hygroscopicus</italic> strains were in group 3. In addition, XM201 was closer to <italic>Streptomyces</italic> sp. 11-1-2 and <italic>S. violaceusniger</italic> Tu 4113, with ANI values of 98.5 and 95.5%, respectively. Other genomic features, such as genome size and the number of encoded proteins, were also similar among these strains. <italic>Streptomyces lydicus</italic> strains formed a paraphyletic taxon: <italic>S. lydicus</italic> A02 was closer to <italic>S. gilvosporeus</italic> F607 than to the other strains classified as <italic>S. lydicus</italic>, which was supported by ANI values in the range of 86&#x2013;89%. Similarly, <italic>S. lydicus</italic> WYEC 108 showed ANI values between 86 and 88% with the other <italic>S. lydicus</italic> strains, but 96.7% with <italic>Streptomyces</italic> sp. NEAU-S7GS2. Lastly, <italic>Streptomyces</italic> sp. MOE7 showed an ANI value of 97.8% with <italic>S. lydicus</italic> GS93/23. The pairwise ANI values among <italic>Streptomyces autolyticus</italic> CGMCC0516, <italic>Streptomyces malaysiensis</italic> DSM 4137, and <italic>Streptomyces</italic> sp. M56 were all above 98%, which could indicate that they belong to the same species. Other strains with high interspecies ANI values and close relationships in the core genome phylogenetic tree were: <italic>Streptomyces pratensis</italic> ATCC 33331 and <italic>Streptomyces</italic> sp. PAMC26508 (ANI 99.1%); <italic>Streptomyces bacillaris</italic> ATCC 15855, <italic>Streptomyces</italic> sp. DUT11, <italic>Streptomyces</italic> sp. CFMR7, and <italic>Streptomyces</italic> sp. S8 (ANI values between 95.7 and 98.9%); both strains of <italic>Streptomyces globisporus</italic> with <italic>Streptomyces</italic> sp. 6063 and <italic>Streptomyces</italic> sp. Tue6075 (ANI values greater than 95.1%); and <italic>Streptomyces fradiae</italic> NKZ-259 and <italic>Streptomyces alfalfae</italic> ACCC40021, with an ANI value above 99.9%, which might suggest that they are the same strain, although further experimental studies are needed to confirm this. The strains VK-A60T, KJ40, Fr-008, J1074, SM254, and SM17 all share pairwise ANI values greater than 95.8%.</p>
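<p>Reading these ANI values at the species level relies on the commonly used boundary of roughly 95&#x2013;96% ANI for prokaryotic species. The following Python sketch is not the authors' pipeline. It groups strains by single linkage at an assumed threshold, using only ANI values quoted in the paragraph above. The function name <monospace>species_groups</monospace> and the single-linkage rule are illustrative assumptions. Raising the threshold from 95 to 96% separates <italic>S. violaceusniger</italic> Tu 4113 (95.5% ANI with XM201) from the XM201 group, which shows how sensitive such groupings are to the chosen cutoff.</p>
<preformat><![CDATA[
```python
def species_groups(ani_pairs, threshold=95.0):
    """Group strains by single linkage: two strains share a group if a chain of
    pairs with ANI >= threshold connects them (the threshold is an assumption)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, ani in ani_pairs:
        root_a, root_b = find(a), find(b)  # also registers both strains
        if ani >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for strain in list(parent):
        groups.setdefault(find(strain), set()).add(strain)
    # Largest groups first, then alphabetical, for a stable printout.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# ANI values (%) quoted in the text; pairs not listed are unknown in this sketch.
ANI_PAIRS = [
    ("S. hygroscopicus XM201", "Streptomyces sp. 11-1-2", 98.5),
    ("S. hygroscopicus XM201", "S. violaceusniger Tu 4113", 95.5),
    ("S. pratensis ATCC 33331", "Streptomyces sp. PAMC26508", 99.1),
    ("Streptomyces sp. MOE7", "S. lydicus GS93/23", 97.8),
    ("S. lydicus WYEC 108", "Streptomyces sp. NEAU-S7GS2", 96.7),
]

for cutoff in (95.0, 96.0):
    print(cutoff, species_groups(ANI_PAIRS, cutoff))
```
]]></preformat>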
<p>An interesting clade is formed by <italic>Streptomyces</italic> sp. CB09001, the model organism <italic>Streptomyces coelicolor</italic> A3(2), and the biotechnologically important actinobacterium <italic>Streptomyces lividans</italic> TK24. A comparison of these strains showed that the genome of <italic>S. coelicolor</italic> A3(2) is almost 0.8 Mb larger than that of <italic>S. lividans</italic> TK24 and 1.2 Mb larger than that of <italic>Streptomyces</italic> sp. CB09001. We therefore performed a global alignment of these genomes to corroborate the observed relationship (<xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S5</xref>). This alignment showed that the <italic>S. lividans</italic> TK24 genome is a reduced version of the <italic>S. coelicolor</italic> A3(2) genome, which has an additional region of about 0.6 Mb near one of its telomeres. These results are consistent with a recent study reporting an ANI value of 99.0% between <italic>S. lividans</italic> and <italic>S. coelicolor</italic> (<xref ref-type="bibr" rid="ref77">Vicente et al., 2018</xref>). Notably, all the groupings observed in the core genome phylogenetic tree were supported by the corresponding ANI values among the clustered species. In addition, the present results suggest that some species warrant an in-depth analysis for possible reclassification. The phylogenetic tree is shown in <xref rid="fig2" ref-type="fig">Figure 2</xref> and <xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S4</xref>, and the ANI values are shown in <xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S6</xref>.</p>
</sec>
<sec id="sec12">
<title>Intergenic Regions Conservation</title>
<p>The number of IGRs was remarkably high and variable, and no core set of IGRs was found for the genus <italic>Streptomyces</italic>. The total number of IGRs was 378,972. Of these, 275,225 corresponded to unique IGR clusters, more than twice the number of unique gene clusters obtained with Roary. As shown in <xref rid="fig3" ref-type="fig">Figure 3A</xref>, IGRs were conserved across only a few strains. We further explored these results by analyzing the IGRs within the groups defined in the core genome phylogenetic tree described above (<xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S4</xref>). In group 1, comprising 20 species (<xref rid="fig3" ref-type="fig">Figure 3B</xref>), the number of IGRs was still high (46,301, compared with 19,333 gene families), although this group had 16 core IGRs. In group 2 (<xref rid="fig3" ref-type="fig">Figure 3C</xref>), which contained more species than group 1, only two IGRs were conserved in all 37 species. Meanwhile, in group 3 (<xref rid="fig3" ref-type="fig">Figure 3D</xref>), comprising 59 species, 131,213 clusters were unique IGRs and only two were core IGRs. Overall, the number of IGR clusters dropped sharply as the number of genomes sharing them increased. This differed from gene clusters, which were mostly either unique or core genes (<xref rid="fig3" ref-type="fig">Figure 3A</xref>). Surprisingly, no Rfam annotations were retrieved for the representative sequences of these few conserved IGRs. Of these, 10, 1, and 3 IGRs from the previously defined groups contain putative novel small or non-coding RNAs, based on the conserved RNA secondary structures detected by RNAz (<xref ref-type="supplementary-material" rid="SM2">Supplementary File S2</xref>). 
The minimum free energy (MFE) structures of these predicted small RNAs (sRNAs) are shown in <xref ref-type="supplementary-material" rid="SM6">Supplementary Figures S7</xref>&#x2013;<xref ref-type="supplementary-material" rid="SM6">S20</xref>. Additionally, we predicted the possible targets of these putative novel non-coding RNAs in the genomes from which the representative IGR sequences were extracted. Overall, we found six sequences with full complementarity to the downstream mRNA, which suggests that they may act as regulatory elements in the untranslated regions of these genes. In addition, multiple other potential targets were detected for these sRNAs. Details of these analyses are provided in <xref ref-type="supplementary-material" rid="SM2">Supplementary File S2</xref>.</p>
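<p>Frequency plots such as <xref rid="fig3" ref-type="fig">Figure 3A</xref> count how many clusters are shared by exactly <italic>k</italic> genomes. The following Python sketch shows that tally. It assumes the IGRs have already been clustered and that each cluster is represented by the set of genomes containing it. The cluster memberships are hypothetical toy data, and the function name <monospace>frequency_spectrum</monospace> is illustrative. In the result, the first entry gives the unique clusters and the last entry gives the core clusters.</p>
<preformat><![CDATA[
```python
from collections import Counter


def frequency_spectrum(clusters, n_genomes):
    """Return counts where counts[k - 1] is the number of clusters found in
    exactly k genomes.

    clusters: dict mapping a cluster id to the set of genomes containing it.
    counts[0] are unique (single-genome) clusters; counts[-1] are core clusters.
    """
    sizes = Counter(len(genomes) for genomes in clusters.values())
    return [sizes.get(k, 0) for k in range(1, n_genomes + 1)]


# Hypothetical IGR cluster memberships for four genomes (not real data).
TOY_IGRS = {
    "igr1": {"G1"},
    "igr2": {"G2"},
    "igr3": {"G3"},
    "igr4": {"G1", "G2"},
    "igr5": {"G1", "G2", "G3", "G4"},
}

spectrum = frequency_spectrum(TOY_IGRS, n_genomes=4)
print(spectrum)  # [3, 1, 0, 1]
print("unique:", spectrum[0], "core:", spectrum[-1])
```
]]></preformat>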
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Pan-genome determination based on the conservation of intergenic regions (IGRs) in the genus <italic>Streptomyces</italic>. <bold>(A)</bold> All species included in the current study. <bold>(B)</bold> Species belonging to the clade or group 1 in the core genome phylogenetic tree. <bold>(C)</bold> Species belonging to the clade or group 2 in the core genome phylogenetic tree. <bold>(D)</bold> Species belonging to the clade or group 3 in the core genome phylogenetic tree. <bold>(E)</bold> Strains of the paraphyletic group of <italic>Streptomyces lydicus</italic>. <bold>(F)</bold> <italic>Streptomyces clavuligerus</italic> ATCC 27064, F1D-5, and F613-1. <bold>(G)</bold> <italic>Streptomyces albus</italic> DSM 41398, BK3-25, and ZD11. <bold>(H)</bold> <italic>Streptomyces hygroscopicus</italic> 5008, TL01, and KCTC 1717.</p>
</caption>
<graphic xlink:href="fmicb-12-677558-g003.tif"/>
</fig>
<p>To investigate IGR conservation among more closely related <italic>Streptomyces</italic> species, we further analyzed the IGR pan-genomes of <italic>S. lydicus</italic> (<xref rid="fig3" ref-type="fig">Figure 3E</xref>), <italic>Streptomyces clavuligerus</italic> (<xref rid="fig3" ref-type="fig">Figure 3F</xref>), <italic>Streptomyces albus</italic> (<xref rid="fig3" ref-type="fig">Figure 3G</xref>), and <italic>S. hygroscopicus</italic> (<xref rid="fig3" ref-type="fig">Figure 3H</xref>), as representatives of the three groups previously defined in the <italic>Streptomyces</italic> phylogeny. In the paraphyletic group of <italic>S. lydicus</italic>, 248 core IGR clusters and 7,705 unique IGR clusters were found. In <italic>S. clavuligerus</italic>, <italic>S. hygroscopicus</italic>, and <italic>S. albus</italic>, the numbers of core IGRs were 4,284, 4,597, and 4,706, respectively, considerably higher than the numbers of unique IGRs. This pattern agrees with the number of genes shared by these genomes. However, it contrasts with the results obtained for the phylogenetic groups and for the genus as a whole. Hence, IGRs appear to be conserved only among phylogenetically closely related species.</p>
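<p>Restricting the analysis to closely related strains changes which clusters count as core, because a cluster then only needs to be present in every genome of the subset. The following Python sketch illustrates this with hypothetical memberships. Genomes L1 to L3 stand for a group of related strains and X stands for a distant one; all names are invented. In this toy set, the number of core clusters rises from one to three once X is excluded. The function <monospace>core_and_unique</monospace> is illustrative and is not part of any tool used in the study.</p>
<preformat><![CDATA[
```python
def core_and_unique(clusters, genomes):
    """Count core and unique clusters after restricting to a subset of genomes.

    A cluster is core if present in every genome of the subset and unique if
    present in exactly one of them; clusters absent from the subset are ignored.
    """
    subset = set(genomes)
    core = unique = 0
    for members in clusters.values():
        present = members & subset
        if present == subset:
            core += 1
        elif len(present) == 1:
            unique += 1
    return core, unique


# Hypothetical memberships: L1-L3 are closely related genomes, X is distant.
TOY_CLUSTERS = {
    "a": {"L1", "L2", "L3"},
    "b": {"L1", "L2", "L3", "X"},
    "c": {"L1"},
    "d": {"X"},
    "e": {"L2", "X"},
    "f": {"L1", "L2", "L3"},
}

print(core_and_unique(TOY_CLUSTERS, {"L1", "L2", "L3", "X"}))  # (1, 2)
print(core_and_unique(TOY_CLUSTERS, {"L1", "L2", "L3"}))       # (3, 2)
```
]]></preformat>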
</sec>
<sec id="sec13">
<title>Functional Description of the Pan-Genome</title>
<p>The genes of the resulting pan-genome were then functionally classified. The COG functional enrichment showed that the most conserved genes and protein families are those involved in primary metabolism and DNA processing (<xref rid="fig4" ref-type="fig">Figure 4</xref>). Interestingly, the proportion of secondary metabolism genes increases among less conserved genes, i.e., cloud genes. This trend is more evident in the Micropan analysis, in which protein domains of secondary metabolism genes represent more than 25% of all protein domains in cloud genes. Lipid metabolism, which is frequently involved in secondary metabolite production (<xref ref-type="bibr" rid="ref37">Liu et al., 2013</xref>), also predominates.</p>
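<p>Assigning gene families to pan-genome compartments and tabulating their COG categories can be sketched as follows. This Python example assumes Roary-style frequency bins: core in at least 99% of genomes, soft-core in 95&#x2013;99%, shell in 15&#x2013;95%, and cloud below 15%. We believe these are Roary's defaults, but they should be checked against the Roary documentation; Micropan uses its own definitions. The gene families and their COG letters are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter, defaultdict


def roary_compartment(n_present, n_genomes):
    """Assign a gene family to a compartment using assumed Roary-style bins:
    core >= 99%, soft-core 95-99%, shell 15-95%, cloud < 15% of genomes."""
    frac = n_present / n_genomes
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft-core"
    if frac >= 0.15:
        return "shell"
    return "cloud"


def cog_profile(families, n_genomes):
    """Fraction of each COG category within each compartment.

    families: iterable of (number of genomes carrying the family, COG letter).
    """
    counts = defaultdict(Counter)
    for n_present, cog in families:
        counts[roary_compartment(n_present, n_genomes)][cog] += 1
    return {
        comp: {cog: n / sum(c.values()) for cog, n in c.items()}
        for comp, c in counts.items()
    }


# Hypothetical families in 121 genomes. COG letters: J translation,
# E amino acid metabolism, Q secondary metabolism, I lipid metabolism,
# K transcription.
FAMILIES = [(121, "J"), (121, "J"), (120, "E"), (2, "Q"), (1, "Q"), (3, "I"), (1, "K")]
profile = cog_profile(FAMILIES, n_genomes=121)
print(profile["core"])   # J is two thirds of the core in this toy set
print(profile["cloud"])  # Q is half of the cloud in this toy set
```
]]></preformat>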
<fig position="float" id="fig4">
<label>Figure 4</label>
<caption>
<p>Functional description of the categories of the pan-genome for Roary and Micropan by means of Clusters of Orthologous Groups (COG) annotations. The inner circle represents the core and soft-core genomes, the middle circle represents the shell genome, and the outer circle represents the cloud genome.</p>
</caption>
<graphic xlink:href="fmicb-12-677558-g004.tif"/>
</fig>
<p>The analysis of GO categories yielded similar results. Primary metabolic and catalytic functions, such as organic cyclic and heterocyclic compound binding, are over-represented among the core genes identified by Roary (<xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S21</xref>). The GO enrichment of the genes analyzed with Micropan shows an abundance of genes involved in cellular and metabolic processes. It also shows an abundance of genes with catalytic activity at all levels of conservation, reflecting the broad catalytic capacity of the genus.</p>
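<p>Over-representation of a GO term in one pan-genome compartment is commonly assessed with a one-sided hypergeometric test, using the whole pan-genome as background. This section does not state which enrichment tool was used, so the following Python sketch only assumes the kind of test involved. The numbers are hypothetical. A real analysis would also correct for testing many GO terms, for example with the Benjamini&#x2013;Hochberg procedure.</p>
<preformat><![CDATA[
```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric.

    N: genes in the background (e.g., the whole pan-genome)
    K: background genes annotated with the GO term
    n: genes in the tested subset (e.g., core genes)
    k: subset genes annotated with the GO term
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical: a term on 40 of 1,000 genes, 12 of them among 100 core genes
# (about 4 would be expected by chance).
p = hypergeom_sf(k=12, n=100, K=40, N=1000)
print(f"p = {p:.2e}")
```
]]></preformat>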
</sec>
<sec id="sec14">
<title>BGCs Prediction and Prioritization</title>
<p>Genomes were analyzed using ARTS 2.0 to prioritize BGCs more likely to produce an active metabolite, based on the presence of self-resistance enzymes co-localized within BGCs, as well as the presence of duplicated core genes with evidence of HGT (<xref ref-type="bibr" rid="ref1">Alanjary et al., 2017</xref>; <xref ref-type="bibr" rid="ref46">Mungan et al., 2020</xref>).</p>
<p>The antiSMASH analysis identified 3,750 BGC regions (<xref ref-type="supplementary-material" rid="SM3">Supplementary File S3</xref>). Because several BGCs can be co-localized in the same region (up to nine BGCs in one region), we subsequently separated them for the final count. However, it is worth noting that some co-localized BGCs could act as hybrid clusters, such as the modular NRPS/T1PKS system, which is widely found across the three domains of life (<xref ref-type="bibr" rid="ref79">Wang et al., 2014</xref>).</p>
<p>Overall, 5,289 BGCs were identified in the 121 genomes analyzed, distributed across the 3,750 regions. In order of frequency, non-ribosomal peptide synthetase (NRPS), terpene, type 1 polyketide synthase (T1PKS), and siderophore clusters were the predominant BGC types, accounting for almost 50% of all predicted BGCs. Each genome contained between 23 and 83 BGCs (mean = 44, median = 42). <italic>Streptomyces griseochromogenes</italic> ATCC 14511 carried the most (83) and <italic>Streptomyces</italic> sp. CLI2509 carried the fewest (23). The biosynthetic potential of <italic>S. griseochromogenes</italic> ATCC 14511 had previously been reported by <xref ref-type="bibr" rid="ref80">Wu et al. (2017a)</xref>.</p>
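<p>Because one antiSMASH region can contain several BGCs, per-genome counts depend on how regions are split. The following Python sketch assumes that every product type listed for a region counts as one BGC. This is our reading of the splitting step, not a documented rule. From the split counts, the sketch derives the per-genome range, mean, and median and the most frequent types, as reported in the paragraph above. The regions are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter
from statistics import mean, median


def bgc_summary(regions):
    """Summarize BGC counts after splitting multi-product regions.

    regions: iterable of (genome, region_id, list of product types). Each
    product type listed for a region is counted as one BGC (an assumption).
    """
    per_genome = Counter()
    per_type = Counter()
    n_regions = 0
    for genome, _region_id, types in regions:
        n_regions += 1
        per_genome[genome] += len(types)
        per_type.update(types)
    counts = list(per_genome.values())
    return {
        "regions": n_regions,
        "bgcs": sum(counts),
        "per_genome_range": (min(counts), max(counts)),
        "per_genome_mean": mean(counts),
        "per_genome_median": median(counts),
        "most_common_types": per_type.most_common(2),
    }


# Hypothetical antiSMASH-style regions for three genomes (not real data).
TOY_REGIONS = [
    ("G1", "1.1", ["NRPS", "T1PKS"]),
    ("G1", "1.2", ["terpene"]),
    ("G2", "1.1", ["NRPS"]),
    ("G2", "1.2", ["siderophore", "terpene", "NRPS"]),
    ("G3", "1.1", ["terpene"]),
]
summary = bgc_summary(TOY_REGIONS)
print(summary)
```
]]></preformat>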
<p>The <italic>Streptomyces</italic> strains analyzed carry 41 of the 52 BGC types defined by antiSMASH. The number of BGC types per genome ranged from 10 to 26 (mean and median = 18). <italic>Streptomyces lydicus</italic> WYEC 108 displayed the highest diversity and <italic>Streptomyces koyangensis</italic> VK-A60T the lowest. NRPS, terpene, and siderophore clusters were present in all 121 genomes (<xref rid="fig5" ref-type="fig">Figure 5</xref>; <xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S4</xref>). Although T1PKS and bacteriocin clusters were present in most strains, they were absent from <italic>Streptomyces exfoliatus</italic> A1013Y and <italic>Streptomyces xiamenensis</italic> 318, respectively. Furthermore, the ribosomally synthesized and post-translationally modified peptide (RiPP) clusters bottromycin and cyanobactin were found only in <italic>Streptomyces scabiei</italic> 87.22 and <italic>S. lydicus</italic> A02, respectively.</p>
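<p>Per-genome BGC type diversity and the set of types shared by all genomes both reduce to simple set operations. In this Python sketch, the type inventories are hypothetical. They were chosen to echo the pattern described in the text: NRPS, terpene, and siderophore clusters occur in every genome, while T1PKS or bacteriocin clusters are each missing from one genome. The function name <monospace>type_diversity</monospace> is illustrative.</p>
<preformat><![CDATA[
```python
def type_diversity(genome_types):
    """Per-genome BGC type diversity plus the types found in all genomes and
    in any genome.

    genome_types: dict mapping a genome to an iterable of BGC types found in it.
    """
    sets = {genome: set(types) for genome, types in genome_types.items()}
    diversity = {genome: len(types) for genome, types in sets.items()}
    if not sets:
        return diversity, set(), set()
    universal = set.intersection(*sets.values())  # present in every genome
    observed = set.union(*sets.values())          # present in at least one genome
    return diversity, universal, observed


# Hypothetical type inventories (not real data); duplicates are allowed.
TOY_TYPES = {
    "G1": ["NRPS", "terpene", "siderophore", "T1PKS", "bacteriocin", "NRPS"],
    "G2": ["NRPS", "terpene", "siderophore", "bacteriocin"],
    "G3": ["NRPS", "terpene", "siderophore", "T1PKS", "cyanobactin"],
}
diversity, universal, observed = type_diversity(TOY_TYPES)
print(diversity)          # number of distinct types per genome
print(sorted(universal))  # types found in all genomes
print(len(observed))      # distinct types across the whole set
```
]]></preformat>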
<fig position="float" id="fig5">
<label>Figure 5</label>
<caption>
<p>Description of biosynthetic gene clusters (BGCs) according to their presence among genomes, their proximity to core and self-resistance genes, and core genes with evidence of horizontal gene transfer (HGT) and duplication. The bar graph of presence shows the number of BGCs present in the analyzed genomes. The bar graphs of core and resistance show the numbers of core and self-resistance (resistance) genes located near the BGCs. The Core/Resistance graph shows the number of BGCs co-localized with both core and self-resistance genes. The Core/Resistance/HGT graph shows the number of BGCs co-localized with self-resistance genes and core genes with evidence of HGT. The Core/Resistance/HGT/Duplication chart shows the number of BGCs co-localized with self-resistance genes and core genes with evidence of both HGT and duplication. NRPS, non-ribosomal peptide synthetase cluster; T1PKS, type I PKS (polyketide synthase); NRPS-like, NRPS-like fragment; T3PKS, type III PKS; T2PKS, type II PKS; PKS-like, other types of PKS cluster; LAP, linear azol(in)e-containing peptides; HglE-KS, heterocyst glycolipid synthase-like PKS; CDPS, tRNA-dependent cyclodipeptide synthases; Amglyccycl, aminoglycoside/aminocyclitol cluster; Blactam, &#x03B2;-lactam cluster; TfuA-related, TfuA-related RiPPs; Hserlactone, homoserine lactone cluster; Fused, pheganomycin-style protein ligase-containing cluster; Other, cluster containing a secondary metabolite-related protein that does not fit into any other category.</p>
</caption>
<graphic xlink:href="fmicb-12-677558-g005.tif"/>
</fig>
<p>Although no obvious relationship was found between the isolation source of the strains and their BGCs, there is a slight association between the frequency of BGC types and genetic proximity (<xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S4</xref>). For example, the strains of the <italic>S. hygroscopicus</italic> clade display similar frequencies of NRPS, terpene, T1PKS, and siderophore clusters. Only the variety <italic>limoneus</italic> KCTC 1717 exhibited more bacteriocins than the variety <italic>jinggangensis</italic> 5008 and the engineered <italic>jinggangensis</italic> TL01. <italic>S. lividans</italic> TK24 and <italic>S. coelicolor</italic> A3(2) display a similar frequency of BGCs, although <italic>S. lividans</italic> contains more terpene clusters. An interesting comparison is between <italic>Streptomyces</italic> sp. CNQ-509 and <italic>Streptomyces</italic> sp. WAC 06738. Both strains are in the same clade but come from different isolation sources (marine and soil, respectively), and they differ mainly in the number of NRPS and T1PKS clusters in their genomes.</p>
<p>The high variability of BGCs in the genus was further demonstrated by comparing cluster regions with BiG-SCAPE. This tool estimates distances between BGCs by combining three indices (<xref ref-type="bibr" rid="ref48">Navarro-Mu&#x00F1;oz et al., 2020</xref>). The Jaccard index measures the similarity of the protein domain content of two BGCs. The adjacency index measures the adjacent domain pairs they share. The domain sequence similarity index accounts for the sequence identity of shared domains together with differences in domain copy number. The network created by BiG-SCAPE at a cutoff of 0.3 (chosen to identify relationships between BGCs producing similar compounds) contained 2,359 nodes and 12,969 edges (<xref rid="fig6" ref-type="fig">Figure 6A</xref>). A further comparison showed that 838 of the 3,750 regions identified by antiSMASH are similar to, or match, clusters already reported in the MIBiG database (<xref rid="fig6" ref-type="fig">Figure 6B</xref>; <xref ref-type="supplementary-material" rid="SM6">Supplementary File S4</xref>). Terpene, NRPS, siderophore, and ectoine clusters show the greatest similarity within the network, whereas 1,204 cluster regions are unique within the analyzed genomes <xref ref-type="supplementary-material" rid="SM5">(Supplementary File S5)</xref>.</p>
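<p>The way BiG-SCAPE combines its three indices into a single distance can be sketched in Python. This sketch assumes the distance d = 1 &#x2212; (w<sub>J</sub>&#x00B7;J + w<sub>DSS</sub>&#x00B7;DSS + w<sub>AI</sub>&#x00B7;AI), with weights 0.2, 0.75, and 0.05. We recall these as BiG-SCAPE's weights for mixed-class comparisons, but they should be verified in its documentation. The DSS value is passed in rather than computed from domain alignments, and anchor-domain weighting is omitted. The Pfam-like domain strings are hypothetical. Pairs at or below the 0.3 cutoff used in this study become network edges.</p>
<preformat><![CDATA[
```python
def jaccard(a, b):
    """Jaccard index of two collections treated as sets (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacent_pairs(domains):
    """Unordered pairs of neighboring domains along a cluster's domain string."""
    return {frozenset(pair) for pair in zip(domains, domains[1:])}


def bgc_distance(dom_a, dom_b, dss, weights=(0.2, 0.75, 0.05)):
    """d = 1 - (wJ * Jaccard + wDSS * DSS + wAI * AdjacencyIndex).

    dss is a precomputed domain sequence similarity in [0, 1] (hypothetical
    input); the default weights are an assumption to check against BiG-SCAPE.
    """
    w_j, w_dss, w_ai = weights
    j = jaccard(dom_a, dom_b)
    ai = jaccard(adjacent_pairs(dom_a), adjacent_pairs(dom_b))
    return 1.0 - (w_j * j + w_dss * dss + w_ai * ai)


def similarity_edges(bgcs, dss, cutoff=0.3):
    """Edges (a, b, distance) for every BGC pair at or below the cutoff."""
    names = sorted(bgcs)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = bgc_distance(bgcs[a], bgcs[b], dss.get(frozenset((a, b)), 0.0))
            if d <= cutoff:
                edges.append((a, b, round(d, 3)))
    return edges


# Hypothetical domain strings and one precomputed DSS value (not real data).
BGCS = {
    "r1": ["AMP-binding", "PCP", "Condensation", "Thioesterase"],
    "r2": ["AMP-binding", "PCP", "Condensation", "Thioesterase"],
    "r3": ["Terpene_synth_C"],
}
DSS = {frozenset(("r1", "r2")): 0.9}
edges = similarity_edges(BGCS, DSS)
print(edges)
```
]]></preformat>
<p>With this toy input, only r1 and r2 are linked, at a distance of about 0.075.</p>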
<fig position="float" id="fig6">
<label>Figure 6</label>
<caption>
<p>Sequence similarity network of BGCs <bold>(A)</bold> without and <bold>(B)</bold> with reference clusters from the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) database. <bold>(C)</bold> Sequence similarity network of prioritized BGCs; node borders follow the color code of the phylogenetic tree in <xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S4</xref>. The analysis was performed using BiG-SCAPE at a cutoff of 0.3. NRPS includes NRPS-like; PKS-other includes T2PKS, T3PKS, PKS-like, and hglE-KS; RiPPs include bacteriocin, lanthipeptide, linear azol(in)e-containing peptides (LAP), lassopeptide, thiopeptide, and TfuA-related. Others include hybrid clusters other than PKS-NRPS.</p>
</caption>
<graphic xlink:href="fmicb-12-677558-g006.tif"/>
</fig>
<p>To prioritize the search for antibiotics, ARTS uses the BGC predictions from antiSMASH and reports self-resistance enzymes co-localized with each BGC. Across all 121 genomes, only 593 self-resistance genes were identified, distributed in 480 of the 3,750 regions identified by antiSMASH. On average, we identified five self-resistance genes per genome. The maximum number found in a single genome was 12, in <italic>Streptomyces alfalfae</italic> ACCC40021. <italic>Streptomyces globisporus</italic> TFH56 was the only strain in which no self-resistance enzyme was identified. Nevertheless, this strain can inhibit the growth of <italic>Botrytis cinerea</italic>, a gray mold pathogen that grows on tomato flowers (<xref ref-type="bibr" rid="ref8">Cho and Kwak, 2019</xref>). Furthermore, NRPS and T1PKS were more frequently co-localized with self-resistance genes than other BGC types (<xref rid="fig5" ref-type="fig">Figure 5</xref>).</p>
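<p>At its simplest, deciding whether a resistance gene is co-localized with a BGC is an interval-overlap test on the same replicon. The following Python sketch treats any overlap with the region boundaries as co-localization. This is an assumption, since the exact rule applied by ARTS may differ. All names and coordinates are hypothetical.</p>
<preformat><![CDATA[
```python
def colocalized(region, genes):
    """Names of genes overlapping a BGC region.

    region: (contig, start, end); genes: iterable of (name, contig, start, end).
    Coordinates are 1-based and inclusive. Treating any overlap as
    co-localization is an assumption of this sketch.
    """
    contig, r_start, r_end = region
    return [
        name
        for name, g_contig, g_start, g_end in genes
        if g_contig == contig and g_start <= r_end and g_end >= r_start
    ]


# Hypothetical coordinates for one region and four resistance-model hits.
REGION = ("chromosome", 100_000, 150_000)
HITS = [
    ("resA", "chromosome", 120_000, 121_000),  # inside the region
    ("resB", "chromosome", 149_500, 151_000),  # overlaps the right border
    ("resC", "chromosome", 10_000, 11_000),    # far upstream
    ("resD", "plasmid", 120_000, 121_000),     # same coordinates, other replicon
]
print(colocalized(REGION, HITS))  # ['resA', 'resB']
```
]]></preformat>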
<p>Another feature considered in prioritizing BGCs as possible antibiotic producers is the presence of core genes (as defined by ARTS) within a biosynthetic cluster. In total, 3,040 core genes were found across all genomes, distributed in 1,490 regions. NRPS, terpene, and T1PKS clusters contained the highest numbers of core genes (<xref rid="fig5" ref-type="fig">Figure 5</xref>). Core genes together with self-resistance genes were found in only 242 regions. NRPS, T1PKS, and T2PKS were the BGC types most frequently co-localized with both types of genes (<xref rid="fig5" ref-type="fig">Figure 5</xref>). Because some antibiotics target the products of core genes, producing bacteria tend to duplicate the target gene and express a resistant homolog to avoid suicide. Thus, the presence of duplicated core genes in a BGC could help predict the mode of action of the encoded antibiotic (<xref ref-type="bibr" rid="ref46">Mungan et al., 2020</xref>). We then applied a stricter filter to predict each antibiotic together with its corresponding target. With this filter, we found only 33 regions (distributed in 31 genomes) co-localized with self-resistance genes and with core genes showing evidence of both duplication and HGT (<xref rid="tab1" ref-type="table">Table 1</xref>); most of these clusters were NRPS (<xref rid="fig5" ref-type="fig">Figure 5</xref>).</p>
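<p>The two levels of filtering described in the paragraph above can be written as successive predicates over per-region summaries. In this Python sketch, the <monospace>RegionHits</monospace> fields are hypothetical and do not correspond to ARTS output columns. The loose filter requires both self-resistance and core genes. The strict filter additionally requires a core gene with evidence of both duplication and HGT.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionHits:
    """Hypothetical per-region summary; these are not ARTS output field names."""
    genome: str
    region: str
    resistance_hits: int  # self-resistance models co-localized with the region
    core_genes: int       # core genes inside the region
    core_dup_hgt: int     # core genes with evidence of both duplication and HGT


def prioritize(regions, strict=True):
    """Keep regions with both self-resistance and core genes; with strict=True,
    additionally require a core gene showing duplication and HGT."""
    kept = [r for r in regions if r.resistance_hits > 0 and r.core_genes > 0]
    if strict:
        kept = [r for r in kept if r.core_dup_hgt > 0]
    return kept


TOY_HITS = [
    RegionHits("G1", "r18", 1, 1, 1),
    RegionHits("G1", "r3", 1, 2, 0),
    RegionHits("G2", "r7", 0, 3, 1),
    RegionHits("G3", "r1", 2, 1, 1),
]
loose = prioritize(TOY_HITS, strict=False)
strict = prioritize(TOY_HITS)
print(len(loose), len(strict), sorted({r.genome for r in strict}))
```
]]></preformat>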
<table-wrap position="float" id="tab1">
<label>Table 1</label>
<caption>
<p>Prioritized BGCs with potential for antibiotic biosynthesis.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Organism name</th>
<th align="left" valign="top">Region no.</th>
<th align="left" valign="top">Core gene description</th>
<th align="left" valign="top">BGC type</th>
<th align="left" valign="top">Resistance model</th>
<th align="left" valign="top">MIBiG report<xref rid="tfn1" ref-type="table-fn"><sup>1</sup></xref>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="2"><italic>Streptomyces avermitilis</italic> MA-4680</td>
<td align="center" valign="top">18</td>
<td align="left" valign="top">Proteasome, beta subunit</td>
<td align="left" valign="top">T2PKS, T1PKS</td>
<td align="left" valign="top">Proteasome subunit</td>
<td align="left" valign="top">Spore pigment. Similarity with curamycin from <italic>S. cyaneus</italic>
</td>
</tr>
<tr>
<td align="center" valign="top">19</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">Terpene</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Pentalenolactone</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="2"><italic>Streptomyces bingchenggensis</italic> BCW-1</td>
<td align="center" valign="top">10</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS, furan, T1PKS, hglE-KS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="center" valign="top">20</td>
<td align="left" valign="top">Proteasome, beta subunit</td>
<td align="left" valign="top">T1PKS, NRPS</td>
<td align="left" valign="top">Proteasome subunit</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces violaceusniger</italic> Tu 4113</td>
<td align="center" valign="top">21</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS-like</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. Sv. ACTE SirexAA-E</td>
<td align="center" valign="top">1</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces cattleya</italic> NRRL 8057</td>
<td align="center" valign="top">2_13</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS-like</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces hygroscopicus jinggangensis</italic> 5008</td>
<td align="center" valign="top">15</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces hygroscopicus jinggangensis</italic> TL01</td>
<td align="center" valign="top">15</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces fulvissimus</italic> DSM 40593</td>
<td align="center" valign="top">2</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces collinus</italic> Tu 365</td>
<td align="center" valign="top">26</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">Terpene</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces cyaneogriseus noncyanogenus</italic> NMWT 1</td>
<td align="center" valign="top">24</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">Terpene</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. CdTB01</td>
<td align="center" valign="top">28</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">T1PKS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. SAT1</td>
<td align="center" valign="top">24</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">Lanthipeptide</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces lydicus</italic> 103</td>
<td align="center" valign="top">22</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS, other, T3PKS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces puniciscabiei</italic> TW1S1</td>
<td align="center" valign="top">1</td>
<td align="left" valign="top">DNA polymerase III, beta subunit</td>
<td align="left" valign="top">Terpene, T1PKS</td>
<td align="left" valign="top">DNA polymerase III, beta subunit</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces autolyticus</italic> CGMCC0516</td>
<td align="center" valign="top">5</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS-like</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces lydicus</italic> GS93/23</td>
<td align="center" valign="top">16</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">T3PKS, NRPS, other</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces niveus</italic> SCSIO 3406</td>
<td align="center" valign="top">24</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces hygroscopicus</italic> XM201</td>
<td align="center" valign="top">5</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS-like</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. MOE7</td>
<td align="center" valign="top">15</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">Other, T3PKS, NRPS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces lavendulae lavendulae</italic> CCM 3239</td>
<td align="center" valign="top">21</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">Thiopeptide, LAP</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. M56</td>
<td align="center" valign="top">43</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS-like</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. P3</td>
<td align="center" valign="top">31</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Similarity with scabichelin from <italic>S. scabiei</italic>
</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces lunaelactis</italic> MM109</td>
<td align="center" valign="top">27</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS-like</td>
<td align="left" valign="top">GAPDH_C/Metallo-&#x03B2;-lactamase</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces nigra</italic> 452</td>
<td align="center" valign="top">20</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">T3PKS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. ZFG47</td>
<td align="center" valign="top">23</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">T2PKS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Similarity with curamycin from <italic>S. cyaneus</italic>
</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. 11-1-2</td>
<td align="center" valign="top">4</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS-like</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. WAC 01438</td>
<td align="center" valign="top">18</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">T3PKS, NRPS, T2PKS</td>
<td align="left" valign="top">GAPDH_C/GAPDH_C</td>
<td align="left" valign="top">Similarity with spore pigment from <italic>S. collinus</italic>
</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. WAC 01529</td>
<td align="center" valign="top">1</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">Lassopeptide, NRPS, terpene</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. GGCR-6</td>
<td align="center" valign="top">4</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">T1PKS</td>
<td align="left" valign="top">Carboxyl transferase domain/GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. MK-45</td>
<td align="center" valign="top">4</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Similarity with isocomplestatin from <italic>S. lavendulae</italic>
</td>
</tr>
<tr>
<td align="left" valign="top">
<italic>Streptomyces</italic> sp. endophyte N2</td>
<td align="center" valign="top">3</td>
<td align="left" valign="top">GAPDH type I</td>
<td align="left" valign="top">NRPS-like, T1PKS</td>
<td align="left" valign="top">GAPDH_C</td>
<td align="left" valign="top">Not reported</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>GAPDH, glyceraldehyde 3-phosphate dehydrogenase; GAPDH_C, glyceraldehyde 3-phosphate dehydrogenase, C-terminal domain.</p>
<fn id="tfn1">
<label>1</label>
<p>Similarity found using BiG-SCAPE, including the MIBiG database, at a cutoff of 0.3.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>In these 33 regions (<xref rid="tab1" ref-type="table">Table 1</xref>), the core genes were also classified as self-resistance genes. Their functional diversity was low, comprising only three functions: glyceraldehyde-3-phosphate dehydrogenase (GAPDH) type I, proteasome beta subunit, and DNA polymerase III &#x03B2;-subunit. Two self-resistance genes within the same region were found in only three genomes. In an NRPS-like cluster of <italic>Streptomyces lunaelactis</italic> MM109, the resistance targets were the C-terminal domain of GAPDH (GAPDH_C) and a metallo-&#x03B2;-lactamase. In <italic>Streptomyces</italic> sp. WAC 01438, the T3PKS/NRPS/T2PKS cluster contained two GAPDH_C self-resistance genes. In <italic>Streptomyces</italic> sp. GGCR-6, a T1PKS cluster contained a carboxyl transferase domain and GAPDH_C as resistance targets.</p>
<p>Using this prioritization approach, we identified BGCs that contain all the elements expected for the biosynthesis of antibiotic molecules with a predicted mode of action. Some prioritized BGCs were similar to other prioritized clusters from genetically related <italic>Streptomyces</italic> strains (<xref rid="fig6" ref-type="fig">Figure 6C</xref>). For example, region 22 of <italic>S. lydicus</italic> 103 is similar to region 16 of <italic>S. lydicus</italic> GS93/23 and to region 15 of <italic>Streptomyces</italic> sp. MOE7. Also, region 5 of <italic>S. hygroscopicus</italic> XM201, region 21 of <italic>Streptomyces violaceusniger</italic> Tu 4113, and region 4 of <italic>Streptomyces</italic> sp. 11-1-2 are similar to one another. Likewise, region 5 of <italic>S. autolyticus</italic> CGMCC0516 and region 43 of <italic>Streptomyces</italic> sp. M56 share sequence similarity. Intriguingly, region 26 of <italic>Streptomyces collinus</italic> Tu 365 and region 24 of <italic>S. cyaneogriseus noncyanogenus</italic> NMWT 1 are similar, although the two strains are not closely related.</p>
<p>Of our prioritized BGCs, only region 19 of <italic>Streptomyces avermitilis</italic> MA-4680 has already been described, as the biosynthetic pathway of the antibiotic pentalenolactone. Region 18 of the same strain is reported in the MIBiG database (<xref ref-type="bibr" rid="ref29">Kautsar et al., 2020</xref>) as a spore pigment cluster, although no bioactivity assay is reported. Four other regions, along with region 18 of <italic>S. avermitilis</italic> MA-4680, are similar to a cluster reported in the MIBiG database (<xref rid="tab1" ref-type="table">Table 1</xref>). Thus, the high-throughput antibiotic screening performed with the bioinformatic tool ARTS 2.0 identified interesting clusters that could be tested experimentally.</p>
</sec>
</sec>
<sec id="sec15" sec-type="discussions">
<title>Discussion</title>
<p>A pan-genome, defined as the entire set of non-orthologous genes in a specified group of strains (<xref ref-type="bibr" rid="ref70">Tettelin et al., 2008</xref>), may reveal gene clusters of special interest, such as those related to specific niches or involved in the production of bioactive compounds (<xref ref-type="bibr" rid="ref44">Medini et al., 2005</xref>). This study aimed to determine the pan-genome, or supra-genome, of the genus <italic>Streptomyces</italic>. Of all genomes available in NCBI, only 121 complete genomes with high-quality assemblies were selected for the analysis. Two approaches to computing gene families, or clusters, were explored. The first is based on sequence similarity and uses the software Roary (<xref ref-type="bibr" rid="ref50">Page et al., 2015</xref>). The second is based on shared protein domains and uses the R package Micropan (<xref ref-type="bibr" rid="ref65">Snipen and Liland, 2015</xref>).</p>
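<p>To make the core, shell, and cloud terminology concrete, the sketch below assigns gene families to pan-genome compartments from a presence/absence table. It is a minimal Python illustration, not the Roary or Micropan implementation. The frequency cut-offs are assumed to mirror Roary's default summary categories: core in at least 99% of genomes, soft-core 95 to 99%, shell 15 to 95%, and cloud below 15%. The genome and gene family names are hypothetical.</p>
<preformat>
```python
def compartment(n_present, n_genomes):
    """Classify one gene family by the share of genomes that carry it.
    Integer arithmetic avoids floating-point edge cases at the cut-offs."""
    share_x100 = n_present * 100
    if share_x100 >= 99 * n_genomes:
        return "core"
    if share_x100 >= 95 * n_genomes:
        return "soft_core"
    if share_x100 >= 15 * n_genomes:
        return "shell"
    return "cloud"


def partition_pangenome(presence, n_genomes):
    """Count gene families per compartment from {family: set of genomes}."""
    counts = {"core": 0, "soft_core": 0, "shell": 0, "cloud": 0}
    for genomes_with_family in presence.values():
        counts[compartment(len(genomes_with_family), n_genomes)] += 1
    return counts


# Toy table: 100 hypothetical genomes, six hypothetical gene families.
genomes = [f"genome_{i}" for i in range(100)]
presence = {
    "family_1": set(genomes),        # present in all genomes
    "family_2": set(genomes[:99]),   # 99%
    "family_3": set(genomes[:96]),   # 96%
    "family_4": set(genomes[:50]),   # 50%
    "family_5": set(genomes[:14]),   # 14%
    "family_6": {genomes[0]},        # a single genome
}
counts = partition_pangenome(presence, len(genomes))
print(counts)  # {'core': 2, 'soft_core': 1, 'shell': 1, 'cloud': 2}
```
</preformat>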
<p>The analysis with Roary reveals a pan-genome of 145,462 gene families, 94.7% of which are cloud genes. This finding is consistent with the study of <xref ref-type="bibr" rid="ref83">Xu et al. (2019)</xref>, who uncovered 123,302 clusters in 87 genomes of <italic>Streptomyces</italic> derived from marine ecosystems. Those authors used genomes with completeness above 95% and a 50% identity threshold for clustering, and both choices can affect pan-genome size. In another study, 39,893 gene families were determined across the genus using a similar number of <italic>Streptomyces</italic> strains (122; <xref ref-type="bibr" rid="ref42">McDonald and Currie, 2017</xref>). To generate gene families, those authors used Proteinortho v2 (<xref ref-type="bibr" rid="ref32">Lechner et al., 2011</xref>) with default parameters. This tool uses a low percent identity threshold (25%), which may explain the difference from our results. Moreover, many of the genomes they used are fragmented, which can introduce errors in pan-genome calculations (<xref ref-type="bibr" rid="ref74">Tonkin-Hill et al., 2020</xref>). A recent paper reported a pan-genome size of 106,000 genes and 1,018 core genes, using 125 complete <italic>Streptomyces</italic> genomes and a 40% similarity threshold in BLASTp (<xref ref-type="bibr" rid="ref39">Lorenzi et al., 2021</xref>). This threshold might explain the differences from the present study, although their number of core genes is quite similar to the number of soft-core genes that we calculated. What is remarkable when comparing these two approaches is the similar value of gamma (&#x03B3;) in the power-law fit of pan-genome size (0.62 vs. 0.6 in this work). Because the strains used in the two studies also differ, this similar &#x03B3; value suggests that genome quality is the most important factor for obtaining reliable results and predictions.
Using the pan-genome size estimated by BinomixEstimate (273,372 clusters) and the value of &#x03B3;, we estimate that around 284 genomes are necessary to determine the complete reservoir of genes in the genus <italic>Streptomyces</italic>. The number of strains used may also bias the analysis. Accordingly, related pan-genomic studies in <italic>Streptomyces</italic> determined significantly fewer clusters than those found in this investigation (<xref ref-type="bibr" rid="ref30">Kim et al., 2015</xref>; <xref ref-type="bibr" rid="ref73">Tian et al., 2016</xref>; <xref ref-type="bibr" rid="ref81">Wu et al., 2017b</xref>; <xref ref-type="bibr" rid="ref23">Jackson et al., 2018</xref>; <xref ref-type="bibr" rid="ref2">Almeida et al., 2019</xref>). We also observed that the core genome is larger in studies that include few genomes, with values greater than 2,000 core gene families (<xref ref-type="bibr" rid="ref86">Zhou et al., 2012</xref>; <xref ref-type="bibr" rid="ref30">Kim et al., 2015</xref>; <xref ref-type="bibr" rid="ref73">Tian et al., 2016</xref>). This value tends to decrease as more genomes are added.</p>
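<p>The &#x03B3; values compared above come from a power-law (Heaps-type) model of pan-genome growth, <italic>n</italic> = &#x03BA;<italic>N</italic><sup>&#x03B3;</sup>. Here <italic>n</italic> is the number of gene families observed in <italic>N</italic> genomes, and under this parametrization a &#x03B3; greater than 0 indicates an open pan-genome.</p>
<p>The Python sketch below fits &#x03BA; and &#x03B3; by ordinary least squares on log-transformed values. It then inverts the model to ask how many genomes would be needed to reach a target pan-genome size. The data are synthetic (&#x03BA; = 5000 and &#x03B3; = 0.6 were chosen for illustration), and the function names are hypothetical. The exact procedure behind the 284-genome estimate is not reproduced here. Some tools parametrize the model differently, for example as the decay in new genes per added genome, so fitted exponents are only comparable under the same formulation.</p>
<preformat>
```python
import math


def fit_heaps(genome_counts, pan_sizes):
    """Fit n = kappa * N**gamma by ordinary least squares on
    log n = log kappa + gamma * log N; returns (kappa, gamma)."""
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


def genomes_needed(target_pan_size, kappa, gamma):
    """Invert the fitted model: N = (n / kappa) ** (1 / gamma)."""
    return (target_pan_size / kappa) ** (1.0 / gamma)


# Synthetic, noise-free pan-genome sizes generated with kappa=5000, gamma=0.6
# (illustrative values, not the data of this study).
n_genomes = list(range(1, 122))
pan_sizes = [5000 * n ** 0.6 for n in n_genomes]
kappa, gamma = fit_heaps(n_genomes, pan_sizes)
print(f"kappa={kappa:.1f}, gamma={gamma:.3f}")
print(round(genomes_needed(5000 * 200 ** 0.6, kappa, gamma)))  # 200
```
</preformat>
<p>Real rarefaction curves are noisy and are usually averaged over many random genome orderings before fitting, so the recovered exponent carries uncertainty that this noise-free example hides.</p>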
<p>To our knowledge, no previous characterization of the pan-genome of the genus <italic>Streptomyces</italic> has been based on protein domains. This alternative approach is robust against errors in the prediction of protein-coding genes and reduces variation in annotation between genomes (<xref ref-type="bibr" rid="ref66">Snipen and Ussery, 2013</xref>). Surprisingly, the number of clusters decreases dramatically compared with the Roary calculations, although the number of core genes remains similar. One possible explanation is that proteins without domain annotations are discarded in the Micropan analysis. This is often the case for cloud proteins, which are poorly characterized because they are found less frequently and are therefore less studied. This inference is supported by the fact that the COG annotations of the core genomes from both methods are quite consistent, whereas the proportions of COG categories in the shell and cloud genomes differ markedly. Alternatively, the identity threshold used to assign proteins to the same cluster could be too high for a genus with such enormous genetic variety. Nevertheless, some proteins can share a function, and therefore similar domains, while their sequence identity remains too low for them to be clustered in the same group even with a reduced threshold. This idea is strengthened by an additional analysis with the pipeline BPGA (<xref ref-type="bibr" rid="ref7">Chaudhari et al., 2016</xref>) using a 50% identity threshold. This analysis yielded 662 core gene families, a value very similar to those obtained with the two methods used in the present study. It also yielded a large number of unique genes (48,315; data not shown), although fewer than those found with Roary, where the threshold was 70%.
The gap between the numbers of clusters obtained with Roary and Micropan could also be attributed to falsely predicted &#x201C;genes&#x201D;. These do not align correctly with other clusters and thereby inflate the number of unique genes, or singletons (<xref ref-type="bibr" rid="ref66">Snipen and Ussery, 2013</xref>). Furthermore, overestimation of cloud genes has been reported previously for Roary and other methods not based on protein domains (<xref ref-type="bibr" rid="ref74">Tonkin-Hill et al., 2020</xref>).</p>
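<p>The contrast between the two clustering strategies can be illustrated with a toy example. In the Python sketch below, proteins are grouped in one of two ways:</p>
<list list-type="bullet">
<list-item><p>by identical ordered domain architecture, the principle behind domain-based clustering and loosely following Micropan's approach as we understand it;</p></list-item>
<list-item><p>by single-linkage joining of pairs whose percent identity reaches a threshold.</p></list-item>
</list>
<p>Single linkage is a deliberate simplification and is not how Roary or BPGA build their clusters. The protein names, identity values, and function names are hypothetical. PF00044 and PF02800 are the Pfam GAPDH NAD-binding and C-terminal domains, and PF00005 is the ABC transporter domain.</p>
<preformat>
```python
from collections import defaultdict


def cluster_by_domains(domain_arch):
    """Put proteins with an identical ordered domain architecture in the same
    cluster; proteins without any domain hit cannot be placed and are dropped."""
    clusters = defaultdict(list)
    for protein, domains in domain_arch.items():
        if domains:
            clusters[tuple(domains)].append(protein)
    return {arch: sorted(members) for arch, members in clusters.items()}


def cluster_by_identity(proteins, identity, threshold):
    """Single-linkage clustering: proteins are joined when their pairwise
    percent identity is at least the threshold."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), pid in identity.items():
        if pid >= threshold:
            parent[find(b)] = find(a)
    groups = defaultdict(list)
    for p in proteins:
        groups[find(p)].append(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: P1 and P2 are GAPDH-like proteins with low identity.
domain_arch = {
    "P1": ["PF00044", "PF02800"],
    "P2": ["PF00044", "PF02800"],
    "P3": ["PF00005"],
    "P4": [],  # no domain hit, e.g. a poorly characterized cloud protein
}
identity = {("P1", "P2"): 45.0, ("P1", "P3"): 12.0, ("P3", "P4"): 20.0}
proteins = sorted(domain_arch)

by_domain = cluster_by_domains(domain_arch)
by_identity_70 = cluster_by_identity(proteins, identity, 70)
by_identity_40 = cluster_by_identity(proteins, identity, 40)
print(by_domain, by_identity_70, by_identity_40, sep="\n")
```
</preformat>
<p>With these toy values, P1 and P2 share a cluster under the domain rule but remain separate at 70% identity, merging only when the threshold falls to 45% or below. P4, which has no domain hit, is absent from the domain-based clusters altogether.</p>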
<p>To assess the diversity of <italic>Streptomyces</italic> spp., genomic fluidity and Jaccard distance were determined for the pan-genomes produced by Roary and Micropan. These results are consistent with an open pan-genome with a large and diverse gene content. Overall, fluidity values tend to be low within species and increase with genetic distance. For example, this value has been estimated at 0.1 for <italic>Emiliania huxleyi</italic> (<xref ref-type="bibr" rid="ref57">Read et al., 2013</xref>) and at 0.17 for <italic>Burkholderia pseudomallei</italic> (<xref ref-type="bibr" rid="ref67">Spring-Pearson et al., 2015</xref>). A notable exception is <italic>Cronobacter sakazakii</italic>, with a fluidity of 0.875, which indicates a large accessory genome pool in this species (<xref ref-type="bibr" rid="ref33">Lee and Andam, 2019</xref>). At the genus level, <xref ref-type="bibr" rid="ref31">Kislyuk et al. (2011)</xref> calculated a fluidity value of around 0.9 for the genus <italic>Frankia</italic>, which belongs to the phylum <italic>Actinobacteria</italic>. In a recent study, a value of 0.12 was obtained for <italic>Streptomyces rimosus</italic> (<xref ref-type="bibr" rid="ref51">Park and Andam, 2019</xref>). We consider fluidity a reasonable measure of the diversity of the genus. For <italic>Streptomyces</italic> spp., this value reveals the enormous diversity of strains exposed to different lifestyles and habitats. Such strains are prone to acquiring genetic material through lateral transfer as they adapt to their environments. This is reflected in the wide range of genome sizes and numbers of protein-coding genes in streptomycetes (see <xref ref-type="supplementary-material" rid="SM6">Supplementary Figure S2</xref>). Some strains have almost twice as many protein-coding genes as others.</p>
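<p>Both diversity measures can be computed directly from the sets of gene families present in each genome. The Python sketch below follows the fluidity definition of <xref ref-type="bibr" rid="ref31">Kislyuk et al. (2011)</xref>: the average, over all pairs of genomes <italic>k</italic> and <italic>l</italic>, of (<italic>U</italic><sub>k</sub> + <italic>U</italic><sub>l</sub>)/(<italic>M</italic><sub>k</sub> + <italic>M</italic><sub>l</sub>). Here <italic>U</italic><sub>k</sub> is the number of gene families in genome <italic>k</italic> that are absent from genome <italic>l</italic>, and <italic>M</italic><sub>k</sub> is the number of gene families in genome <italic>k</italic>. The sketch also averages the pairwise Jaccard distance, 1 &#x2212; |A &#x2229; B|/|A &#x222A; B|. The three genomes are hypothetical, and the function names are illustrative rather than Micropan's API.</p>
<preformat>
```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Genomic fluidity (Kislyuk et al., 2011): average over all genome pairs
    of (U_k + U_l) / (M_k + M_l), where U_k counts families of genome k that
    are absent from genome l and M_k counts all families of genome k."""
    pairs = list(combinations(genomes.values(), 2))
    total = sum((len(a - b) + len(b - a)) / (len(a) + len(b)) for a, b in pairs)
    return total / len(pairs)


def mean_jaccard_distance(genomes):
    """Average pairwise Jaccard distance, 1 - |A intersect B| / |A union B|."""
    pairs = list(combinations(genomes.values(), 2))
    total = sum(1 - len(a.intersection(b)) / len(a.union(b)) for a, b in pairs)
    return total / len(pairs)


# Three hypothetical genomes described by their gene-family sets.
toy = {
    "genome_1": {"A", "B", "C", "D"},
    "genome_2": {"A", "B", "C", "E"},
    "genome_3": {"A", "B", "F", "G"},
}
phi = genomic_fluidity(toy)
jd = mean_jaccard_distance(toy)
print(f"fluidity={phi:.3f} jaccard={jd:.3f}")  # fluidity=0.417 jaccard=0.578
```
</preformat>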
<p>Micropan results are more difficult to compare because this methodology is used less often in pan-genomic studies. Nevertheless, the fluidity obtained with this software is quite low compared with that obtained with Roary. This may indicate that, in functional terms, the dissimilarity between genomes decreases by around 20%. In other words, many clusters that are separated when they are built from sequence similarity can have the same or a similar function because their sequences contain the same domains.</p>
<p>Small RNAs play an important role in the post-transcriptional control of messenger RNA expression. They regulate diverse processes, e.g., carbon metabolism, iron homeostasis, RNA polymerase function, virulence, biofilm formation, oxidation, outer membrane perturbation, cellular accumulation of sugar-phosphates, and plasmid replication (<xref ref-type="bibr" rid="ref58">Richards and Vanderpool, 2011</xref>). Trans-encoded regulatory sRNAs are located at sites distinct from those of their target genes. They are typically encoded in, and enriched within, the conserved IGRs of bacterial genomes (<xref ref-type="bibr" rid="ref76">Tsai et al., 2015</xref>). Therefore, a precise determination of IGR conservation is a crucial stage in sRNA studies, as it is typically the first step in the computational identification of these important regulators in bacteria (<xref ref-type="bibr" rid="ref61">Rossi et al., 2016</xref>; <xref ref-type="bibr" rid="ref18">Fuli et al., 2017</xref>). Some tools use this information to predict novel sRNAs in bacterial genomes, such as RNAz (<xref ref-type="bibr" rid="ref20">Gruber et al., 2010</xref>) and QRNA (<xref ref-type="bibr" rid="ref68">Sridhar and Gunasekaran, 2013</xref>). Little is known about the abundance and function of sRNAs in Gram-positive bacteria such as <italic>Streptomyces</italic> (<xref ref-type="bibr" rid="ref15">Engel et al., 2020</xref>). An accurate determination of IGR conservation, and of its dependence on phylogenetic distance, is therefore necessary for a proper estimation of the potential to encode regulatory RNAs (<xref ref-type="bibr" rid="ref76">Tsai et al., 2015</xref>). The current analysis shows that IGR conservation is low at the genus level. It remains low in smaller groups, when strains are grouped according to the three clades of the phylogenetic tree.
However, these rarely conserved IGRs may still have regulatory functions, since novel putative non-coding RNAs (ncRNAs) were detected in them. The role of these putative ncRNAs is an interesting question. Strong selection pressure would be needed to conserve these sequences in taxa as diverse as streptomycetes, which suggests that they participate in controlling multiple metabolic processes. As a first approach, we investigated the interaction of these molecules with other functional RNAs. This showed that numerous mRNAs with diverse annotations (<xref ref-type="supplementary-material" rid="SM2">Supplementary File S2</xref>) are predicted to interact with these putative regulators. On the other hand, we hypothesize that reducing the genetic distance among the species analyzed will produce more trustworthy alignments, which play a key role in RNA structure prediction, and will thus improve bioinformatic predictions. This is reinforced by the fact that IGRs are well preserved when the analysis is performed on more closely related strains, i.e., at the species level. Therefore, the current analysis lays the foundations for further studies involving computational predictions of sRNAs and their regulatory mechanisms in species with biotechnological applications, such as <italic>S. clavuligerus</italic>, <italic>S. hygroscopicus</italic>, <italic>S. lydicus</italic>, and <italic>S. albus</italic>.</p>
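<p>As an illustration of how IGRs can be delimited and their conservation scored, the Python sketch below takes gene coordinates on a linear chromosome, as is typical of <italic>Streptomyces</italic>. It reports gaps between consecutive genes that reach a minimum length. It then scores each IGR by the fraction of genomes in which its two flanking gene families are adjacent.</p>
<p>This synteny-based score is a simplifying assumption. It does not align IGR sequences, and it is not necessarily the procedure used in this study. Gene names, coordinates, family labels, and the 50-bp minimum length are hypothetical.</p>
<preformat>
```python
def intergenic_regions(genes, min_len=50):
    """IGRs on a linear chromosome as (left_gene, right_gene, start, end),
    1-based inclusive. genes: iterable of (name, start, end). Gaps shorter
    than min_len, including overlapping genes, are skipped."""
    ordered = sorted(genes, key=lambda g: g[1])
    igrs = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        start, end = left_end + 1, right_start - 1
        if end - start + 1 >= min_len:
            igrs.append((left, right, start, end))
    return igrs


def igr_conservation(genome_igrs, family_of):
    """Fraction of genomes in which an IGR occurs, where an IGR is identified
    by the unordered pair of gene families flanking it (synteny only)."""
    counts = {}
    for igrs in genome_igrs.values():
        keys = {
            tuple(sorted((family_of[left], family_of[right])))
            for left, right, _, _ in igrs
        }
        for key in keys:
            counts[key] = counts.get(key, 0) + 1
    n_genomes = len(genome_igrs)
    return {key: c / n_genomes for key, c in counts.items()}


# Hypothetical genes (name, start, end) and ortholog family labels.
genome_1 = [("g1_a", 1, 900), ("g1_b", 1000, 2000), ("g1_c", 2010, 3000)]
genome_2 = [("g2_a", 1, 900), ("g2_b", 1200, 2100), ("g2_d", 2500, 3000)]
family_of = {"g1_a": "F1", "g1_b": "F2", "g1_c": "F3",
             "g2_a": "F1", "g2_b": "F2", "g2_d": "F4"}
genome_igrs = {"genome_1": intergenic_regions(genome_1),
               "genome_2": intergenic_regions(genome_2)}
conservation = igr_conservation(genome_igrs, family_of)
print(conservation)  # {('F1', 'F2'): 1.0, ('F2', 'F4'): 0.5}
```
</preformat>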
<p>A high-confidence phylogenetic tree based on 633 markers was obtained from the core genome determined in the pan-genome analysis. Overall, it strongly resembles the earlier phylogenomic analysis of <italic>Streptomyces</italic> by <xref ref-type="bibr" rid="ref41">Mart&#x00ED;n-S&#x00E1;nchez et al. (2019)</xref>, who used 93 complete <italic>Streptomyces</italic> genomes and 575 markers. <xref ref-type="bibr" rid="ref42">McDonald and Currie (2017)</xref> also obtained similar results. However, their analysis used fewer markers (94) and included many fragmented genomes, and some of its branches had bootstrap values below 0.7, which is considered low bootstrap support. Thus, as highlighted above, selecting high-quality genomes should be the first priority for confident evolutionary analyses.</p>
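<p>Core-genome trees like the one described above are usually inferred from a concatenated alignment (supermatrix) of the marker genes. The Python sketch below shows only that concatenation step. It pads taxa missing from a marker with gaps and records partition boundaries that a partitioned substitution model could use. Tree inference and bootstrapping would be done with a dedicated phylogenetics program. Marker names, taxa, and sequences are hypothetical, and the function is not the pipeline used in this study.</p>
<preformat>
```python
def concatenate_markers(alignments, taxa):
    """Concatenate per-marker alignments into a supermatrix.

    alignments: {marker: {taxon: aligned sequence}}. Taxa missing from a
    marker are padded with gaps. Returns the supermatrix and 1-based
    (marker, first column, last column) partitions."""
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    offset = 0
    for marker, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{marker}: sequences are missing or unequal in length")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((marker, offset + 1, offset + length))
        offset += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


alignments = {
    "marker_1": {"taxon_a": "MKV-L", "taxon_b": "MKVAL", "taxon_c": "MRV-L"},
    "marker_2": {"taxon_a": "GGT", "taxon_c": "GAT"},  # taxon_b lacks this marker
}
matrix, partitions = concatenate_markers(alignments, ["taxon_a", "taxon_b", "taxon_c"])
for taxon, seq in matrix.items():
    print(f">{taxon}\n{seq}")
```
</preformat>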
<p>A striking finding of our analysis is the correspondence between strains with ANI values above 95% and their positions in the core genome tree. Like the core genome tree, ANI calculations consider only the part of the genome where alignments can be built (<xref ref-type="bibr" rid="ref60">Richter et al., 2016</xref>). Global alignments of strains with ANI values above 95% reveal differences among genomes despite the high conservation of their core genes (<xref ref-type="supplementary-material" rid="SM6">Supplementary Figures S22</xref>&#x2013;<xref ref-type="supplementary-material" rid="SM6">S31</xref>). Hence, genomic analyses, along with biochemical and physiological characterizations, are still necessary for the correct taxonomic classification of microorganisms. For example, <italic>S. coelicolor</italic> A3(2) and <italic>S. lividans</italic> TK24 have an ANI value suggesting that they are the same species, or even the same strain, but their phenotypes differ markedly. <italic>Streptomyces lividans</italic> TK24 produces small amounts of the antibiotics actinorhodin and undecylprodigiosin compared with <italic>S. coelicolor</italic> A3(2) (<xref ref-type="bibr" rid="ref62">R&#x00FC;ckert et al., 2015</xref>). As new complete genomes become available, an in-depth analysis should be performed to assess a possible taxonomic reclassification of the species shown in <xref ref-type="supplementary-material" rid="SM6">Supplementary Figures S22</xref>&#x2013;<xref ref-type="supplementary-material" rid="SM6">S31</xref> (special attention must be paid to <italic>S. hygroscopicus</italic> XM201).</p>
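<p>ANI tools may report one estimate per direction (query against reference and reference against query), and the two values can differ slightly. The Python sketch below averages reciprocal estimates and lists strain pairs that reach the commonly used 95% species boundary. The strain labels and ANI values are hypothetical, chosen only to exercise the code, and the function names are illustrative.</p>
<preformat>
```python
def symmetric_ani(directional):
    """Average the reciprocal ANI estimates for each unordered strain pair.
    directional: {(query, reference): ANI in percent}."""
    collected = {}
    for (query, reference), value in directional.items():
        key = tuple(sorted((query, reference)))
        collected.setdefault(key, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in collected.items()}


def same_species_pairs(ani, threshold=95.0):
    """Strain pairs whose ANI reaches the species-level threshold."""
    return sorted(pair for pair, value in ani.items() if value >= threshold)


# Hypothetical directional ANI values for three hypothetical strains.
directional = {
    ("strain_A", "strain_B"): 99.1, ("strain_B", "strain_A"): 98.9,
    ("strain_A", "strain_C"): 82.0, ("strain_C", "strain_A"): 82.4,
    ("strain_B", "strain_C"): 94.9, ("strain_C", "strain_B"): 95.3,
}
ani = symmetric_ani(directional)
print(same_species_pairs(ani))
```
</preformat>
<p>In this toy case, strain_B and strain_C reach the threshold only after averaging. Strain_A and strain_C remain far below it even though both pair with strain_B above 95%. Pairwise ANI alone therefore does not necessarily define clean, transitive groups.</p>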
<p>The genus <italic>Streptomyces</italic> is characterized by its metabolic capacity to produce a wide range of metabolites with high societal impact (<xref ref-type="bibr" rid="ref53">Pham et al., 2019</xref>) and is still one of the most studied genera. <italic>Streptomyces</italic> is by far the genus with the most entries in the MIBiG database (636 entries; search performed on January 31, 2021), followed by <italic>Aspergillus</italic> and <italic>Pseudomonas</italic>.</p>
<p>Previous genome mining studies have been conducted in the genus <italic>Streptomyces</italic>. Our findings correlate well with results previously reported by <xref ref-type="bibr" rid="ref3">Belknap et al. (2020)</xref>. Using antiSMASH 4.1, they predicted that NRPS, PKS1, terpene, and lantipeptide clusters were the most common BGCs. They also found that <italic>S. rhizosphaericus</italic> NRRL B-24304 (not included in our study) carried the highest number of BGCs (<italic>n</italic>=83). The slight differences between our results and theirs might be caused by improvements in BGC detection in newer versions of antiSMASH (<xref ref-type="bibr" rid="ref4">Blin et al., 2019</xref>), as well as by differences in the number and quality of the genomes analyzed.</p>
<p>In our study, the ribosomally synthesized and post-translationally modified peptide (RiPP) clusters bottromycin and cyanobactin were found only in <italic>S. scabiei</italic> 87.22 and <italic>S. lydicus</italic> A02, respectively. Surprisingly, as far as we know, there are no reports of cyanobactin expression in <italic>Streptomyces</italic> strains. Cyanobactin clusters were previously identified in <italic>S. lydicus</italic> A02 and <italic>S. venezuelae</italic> genomes using the genome mining tool BAGEL3 (<xref ref-type="bibr" rid="ref54">Poorinmohammad et al., 2019</xref>). Bottromycin, however, has already been described in <italic>S. scabiei</italic> DSM 41658 (<xref ref-type="bibr" rid="ref78">Vior et al., 2020</xref>). A recent study analyzed 1,110 <italic>Streptomyces</italic> genomes, including incomplete genomes, and identified cyanobactin and bottromycin clusters in seven and 17 genomes, respectively (<xref ref-type="bibr" rid="ref3">Belknap et al., 2020</xref>). This demonstrates that, although these BGCs were rare in the set of genomes we analyzed, they can be present in other <italic>Streptomyces</italic> strains beyond the scope of the present study.</p>
<p>Cluster similarity analysis showed that terpene clusters are highly similar across the genus, as previously reported (<xref ref-type="bibr" rid="ref41">Mart&#x00ED;n-S&#x00E1;nchez et al., 2019</xref>). Siderophore and ectoine clusters are also highly similar, probably because of their primary roles in iron acquisition and stress protection, respectively (<xref ref-type="bibr" rid="ref25">Jones et al., 2019</xref>; <xref ref-type="bibr" rid="ref59">Richter et al., 2019</xref>). Intriguingly, one third of the predicted cluster regions did not display similarity to any other predicted or reported region. Moreover, only one fifth of the prioritized antibiotic clusters are similar to a reported cluster, which highlights the capacity of the genus to produce diverse compounds.</p>
<p>It is well established that BGCs of known antibiotics produced by <italic>Streptomyces</italic> are co-localized with self-resistance enzymes. Examples are the clusters for streptomycin and cephamycin C, produced by <italic>S. griseus</italic> and <italic>S. clavuligerus</italic>, respectively (<xref ref-type="supplementary-material" rid="SM3">Supplementary File S3</xref>). ARTS successfully found the regions containing both clusters, along with 478 other regions with co-localized self-resistance enzymes. The challenge now is to create strategies that prioritize BGCs with novel antibiotic activity within the growing body of genomic data. As an approach to rationalize the search for antibiotics, <xref ref-type="bibr" rid="ref10">Culp et al. (2020)</xref> proposed identifying BGCs that have low similarity to known clusters and lack known resistance determinants. They argued that this could lead to the detection of antibiotics with novel mechanisms of action. Following this strategy, they identified two bacteriostatic glycopeptides with a previously unknown mechanism of action (<xref ref-type="bibr" rid="ref10">Culp et al., 2020</xref>). Thus, identifying BGCs near self-resistance enzymes, together with duplicated core genes with predicted HGT, seems to be a promising approach to finding BGCs that potentially produce new antibiotics with a predicted mode of action. This approach is currently used in the quest for new antibiotic clusters (<xref ref-type="bibr" rid="ref84">Yan et al., 2020</xref>) and led to the discovery of thiolactomycin in <italic>Salinispora pacifica</italic> (<xref ref-type="bibr" rid="ref69">Tang et al., 2015</xref>). ARTS is the first tool to incorporate these parameters, which can yield more confident predictions (<xref ref-type="bibr" rid="ref75">Tran et al., 2019</xref>). It is a powerful and user-friendly tool for the high-throughput identification of BGCs with potential for antibiotic biosynthesis.
Despite its ease of use and the amount of information it provides, only a few studies have incorporated ARTS into their methodologies. In this regard, we draw attention to the analysis of marine myxobacterial strains, which revealed that these strains contain a high number of self-resistance genes. For example, <italic>Enhygromyxa salina</italic> DSM 15201 contains 13 self-resistance genes (<xref ref-type="bibr" rid="ref45">Moghaddam et al., 2018</xref>). We strongly recommend incorporating bioinformatic tools such as ARTS into further studies aimed at finding new antibiotics.</p>
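<p>The prioritization logic discussed here can be sketched as a filter over per-region summaries: keep a region when at least one co-localized core gene meets the chosen criteria. The Python code below assumes a hypothetical <monospace>Region</monospace> record, which is not ARTS's output format or API, and uses hypothetical region names. The gene functions echo those found among our prioritized regions. Switching the duplication and HGT requirements off shows how a relaxed filter retains more candidate clusters.</p>
<preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical per-region summary; not ARTS's actual output format."""
    name: str
    colocalized_core: list = field(default_factory=list)  # core genes inside the BGC
    duplicated: set = field(default_factory=set)          # those with extra genomic copies
    hgt: set = field(default_factory=set)                 # those with a predicted HGT signal


def prioritize(regions, require_duplication=True, require_hgt=True):
    """Return (name, supporting genes) for regions with at least one
    co-localized core gene meeting the active criteria, most hits first."""
    ranked = []
    for region in regions:
        hits = [
            gene
            for gene in region.colocalized_core
            if (not require_duplication or gene in region.duplicated)
            and (not require_hgt or gene in region.hgt)
        ]
        if hits:
            ranked.append((region.name, hits))
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


regions = [
    Region("region_A", ["GAPDH", "DNA polymerase III beta"],
           {"GAPDH", "DNA polymerase III beta"}, {"GAPDH", "DNA polymerase III beta"}),
    Region("region_B", ["proteasome"], {"proteasome"}, set()),
    Region("region_C", ["GAPDH"], {"GAPDH"}, {"GAPDH"}),
    Region("region_D"),
]
strict = prioritize(regions)
relaxed = prioritize(regions, require_duplication=False, require_hgt=False)
print([name for name, _ in strict])   # ['region_A', 'region_C']
print([name for name, _ in relaxed])  # ['region_A', 'region_B', 'region_C']
```
</preformat>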
<p>Using ARTS, we prioritized cluster regions with a predicted mode of action. Among our predictions, we identified the pentalenolactone cluster, whose product indeed targets glyceraldehyde-3-phosphate dehydrogenase (<xref ref-type="bibr" rid="ref5">Cane and Sohng, 1989</xref>). Some of the prioritized regions are co-localized with more than one self-resistance gene, which could increase the probability of antibiotic activity. The most promising prioritized regions could be region 23 of <italic>Streptomyces</italic> sp. ZFG47 and region 18 of <italic>S. avermitilis</italic> MA-4680. Both are similar to the cluster of the antibiotic curamycin from <italic>Streptomyces cyaneus</italic> (<xref rid="fig6" ref-type="fig">Figure 6C</xref>; <xref rid="tab1" ref-type="table">Table 1</xref>).</p>
<p>Parameters such as duplication and HGT of core genes should be used carefully if the aim is to identify antibiotics of any type. Filters based on these parameters exclude the large number of clusters located near core and self-resistance genes. Together with the biosynthetic clusters of known antibiotics, these excluded clusters could potentially be used in metabolic re-engineering strategies to produce new antibiotic scaffolds. It is worth stressing that the metabolic potential of the genus <italic>Streptomyces</italic> goes beyond antibiotics, and each newly discovered species may reveal further metabolic complexity and richness. This genus is, and will likely continue to be, one of the most fascinating to study.</p>
</sec>
<sec id="sec16" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The datasets analyzed for this study can be found in the Reference Sequence (RefSeq) database (<ext-link xlink:href="https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/" ext-link-type="uri">https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/</ext-link>). The accession numbers of the genomes included can be found in <xref ref-type="supplementary-material" rid="SM1">Supplementary File S1</xref>.</p>
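<p>For readers who wish to assemble a comparable genome set, the Python sketch below filters an NCBI <monospace>assembly_summary.txt</monospace> table from the RefSeq bacteria directory above for complete <italic>Streptomyces</italic> assemblies. It assumes the table's tab-separated layout, with a commented header line naming columns such as <monospace>organism_name</monospace>, <monospace>assembly_level</monospace>, and <monospace>ftp_path</monospace>. It applies only the completeness criterion, not the additional assembly-quality checks used in this study. The demonstration rows and accessions are invented, and the download helper is defined but not called.</p>
<preformat>
```python
import urllib.request

SUMMARY_URL = "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt"


def fetch_summary(url=SUMMARY_URL):
    """Download the assembly summary (large file; requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def complete_streptomyces(summary_text):
    """Return (accession, organism, ftp_path) for complete Streptomyces assemblies."""
    header = None
    hits = []
    for line in summary_text.splitlines():
        if line.startswith("#"):
            stripped = line.lstrip("# ")
            if stripped.startswith("assembly_accession"):
                header = stripped.split("\t")  # column names from the header line
            continue
        if header is None or not line.strip():
            continue
        row = dict(zip(header, line.split("\t")))
        if (row.get("organism_name", "").startswith("Streptomyces ")
                and row.get("assembly_level") == "Complete Genome"):
            hits.append((row["assembly_accession"], row["organism_name"], row["ftp_path"]))
    return hits


# Invented rows with a reduced set of columns, for demonstration only.
demo = "\n".join([
    "#   See README_assembly_summary.txt for a description of the columns.",
    "# assembly_accession\tbioproject\torganism_name\tassembly_level\tftp_path",
    "GCF_EXAMPLE1\tPRJNA_X\tStreptomyces coelicolor A3(2)\tComplete Genome\thttps://example.org/1",
    "GCF_EXAMPLE2\tPRJNA_Y\tStreptomyces sp. X\tScaffold\thttps://example.org/2",
    "GCF_EXAMPLE3\tPRJNA_Z\tEscherichia coli\tComplete Genome\thttps://example.org/3",
])
selected = complete_streptomyces(demo)
print(selected)
```
</preformat>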
</sec>
<sec id="sec17">
<title>Author Contributions</title>
<p>CC-M and RR-E designed the study. CC-M and MM-R collected the data, performed all bioinformatics analyses, and drafted the manuscript. RR-E supervised the research work, interpreted the results, corrected and wrote the manuscript, and serves as corresponding author. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="sec18" sec-type="funding-information">
<title>Funding</title>
<p>This work was supported by a grant obtained from MINCIENCIAS &#x2013; Colombia &#x2013; Convocatoria 785 &#x2013; 2017. Grant # 80740-595-2019 to CC-M.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec19" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack>
<p>CC-M thanks MINCIENCIAS-Colombia for a scholarship. MM-R thanks Conacyt-M&#x00E9;xico for a scholarship. All authors thank the maintainers and funders of the Galaxy Europe server, which was used for some calculations in the current study.</p>
</ack>
<sec id="sec20" sec-type="supplementary-material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fmicb.2021.677558/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fmicb.2021.677558/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.XLSX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_2.XLSX" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_3.XLSX" id="SM3" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_4.XLSX" id="SM4" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_5.xlsx" id="SM5" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Presentation_1.pdf" id="SM6" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alanjary</surname> <given-names>M.</given-names></name> <name><surname>Kronmiller</surname> <given-names>B.</given-names></name> <name><surname>Adamek</surname> <given-names>M.</given-names></name> <name><surname>Blin</surname> <given-names>K.</given-names></name> <name><surname>Weber</surname> <given-names>T.</given-names></name> <name><surname>Huson</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>The antibiotic resistant target seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery</article-title>. <source>Nucleic Acids Res.</source> <volume>45</volume>, <fpage>W42</fpage>&#x2013;<lpage>W48</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkx360</pub-id>, PMID: <pub-id pub-id-type="pmid">28472505</pub-id></citation></ref>
<ref id="ref2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Almeida</surname> <given-names>E. L.</given-names></name> <name><surname>Rinc&#x00F3;n</surname> <given-names>A. F. C.</given-names></name> <name><surname>Jackson</surname> <given-names>S. A.</given-names></name> <name><surname>Dobson</surname> <given-names>A. D. W.</given-names></name></person-group> (<year>2019</year>). <article-title>Comparative genomics of marine sponge-derived <italic>Streptomyces</italic> spp. isolates SM17 and SM18 with their closest terrestrial relatives provides novel insights into environmental niche adaptations and secondary metabolite biosynthesis potential</article-title>. <source>Front. Microbiol.</source> <volume>10</volume>:<fpage>1713</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2019.01713</pub-id>, PMID: <pub-id pub-id-type="pmid">31404169</pub-id></citation></ref>
<ref id="ref3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Belknap</surname> <given-names>K. C.</given-names></name> <name><surname>Park</surname> <given-names>C. J.</given-names></name> <name><surname>Barth</surname> <given-names>B. M.</given-names></name> <name><surname>Andam</surname> <given-names>C. P.</given-names></name></person-group> (<year>2020</year>). <article-title>Genome mining of biosynthetic and chemotherapeutic gene clusters in <italic>Streptomyces</italic> bacteria</article-title>. <source>Sci. Rep.</source> <volume>10</volume>:<fpage>2003</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-020-58904-9</pub-id>, PMID: <pub-id pub-id-type="pmid">32029878</pub-id></citation></ref>
<ref id="ref4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blin</surname> <given-names>K.</given-names></name> <name><surname>Shaw</surname> <given-names>S.</given-names></name> <name><surname>Steinke</surname> <given-names>K.</given-names></name> <name><surname>Villebro</surname> <given-names>R.</given-names></name> <name><surname>Ziemert</surname> <given-names>N.</given-names></name> <name><surname>Lee</surname> <given-names>S. Y.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>AntiSMASH 5.0: updates to the secondary metabolite genome mining pipeline</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume>, <fpage>W81</fpage>&#x2013;<lpage>W87</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkz310</pub-id>, PMID: <pub-id pub-id-type="pmid">31032519</pub-id></citation></ref>
<ref id="ref5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cane</surname> <given-names>D. E.</given-names></name> <name><surname>Sohng</surname> <given-names>J. K.</given-names></name></person-group> (<year>1989</year>). <article-title>Inhibition of glyceraldehyde-3-phosphate dehydrogenase by pentalenolactone: kinetic and mechanistic studies</article-title>. <source>Arch. Biochem. Biophys.</source> <volume>270</volume>, <fpage>50</fpage>&#x2013;<lpage>61</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0003-9861(89)90006-4</pub-id>, PMID: <pub-id pub-id-type="pmid">2930199</pub-id></citation></ref>
<ref id="ref6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Challis</surname> <given-names>G. L.</given-names></name> <name><surname>Hopwood</surname> <given-names>D. A.</given-names></name></person-group> (<year>2003</year>). <article-title>Synergy and contingency as driving forces for the evolution of multiple secondary metabolite production by <italic>Streptomyces</italic> species</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>100</volume>, <fpage>14555</fpage>&#x2013;<lpage>14561</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1934677100</pub-id>, PMID: <pub-id pub-id-type="pmid">12970466</pub-id></citation></ref>
<ref id="ref7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chaudhari</surname> <given-names>N. M.</given-names></name> <name><surname>Gupta</surname> <given-names>V. K.</given-names></name> <name><surname>Dutta</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>BPGA-an ultra-fast pan-genome analysis pipeline</article-title>. <source>Sci. Rep.</source> <volume>6</volume>:<fpage>24373</fpage>. doi: <pub-id pub-id-type="doi">10.1038/srep24373</pub-id>, PMID: <pub-id pub-id-type="pmid">27071527</pub-id></citation></ref>
<ref id="ref8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cho</surname> <given-names>G.</given-names></name> <name><surname>Kwak</surname> <given-names>Y. S.</given-names></name></person-group> (<year>2019</year>). <article-title>Evolution of antibiotic synthesis gene clusters in the <italic>Streptomyces globisporus</italic> TFH56, isolated from tomato flower</article-title>. <source>G3</source> <volume>9</volume>, <fpage>1807</fpage>&#x2013;<lpage>1813</lpage>. doi: <pub-id pub-id-type="doi">10.1534/g3.119.400037</pub-id>, PMID: <pub-id pub-id-type="pmid">31018944</pub-id></citation></ref>
<ref id="ref9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Contreras-Moreira</surname> <given-names>B.</given-names></name> <name><surname>Vinuesa</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis</article-title>. <source>Appl. Environ. Microbiol.</source> <volume>79</volume>, <fpage>7696</fpage>&#x2013;<lpage>7701</lpage>. doi: <pub-id pub-id-type="doi">10.1128/AEM.02411-13</pub-id>, PMID: <pub-id pub-id-type="pmid">24096415</pub-id></citation></ref>
<ref id="ref10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Culp</surname> <given-names>E. J.</given-names></name> <name><surname>Waglechner</surname> <given-names>N.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Fiebig-Comyn</surname> <given-names>A. A.</given-names></name> <name><surname>Hsu</surname> <given-names>Y. P.</given-names></name> <name><surname>Koteva</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Evolution-guided discovery of antibiotics that inhibit peptidoglycan remodelling</article-title>. <source>Nature</source> <volume>578</volume>, <fpage>582</fpage>&#x2013;<lpage>587</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41586-020-1990-9</pub-id>, PMID: <pub-id pub-id-type="pmid">32051588</pub-id></citation></ref>
<ref id="ref11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Darling</surname> <given-names>A. E.</given-names></name> <name><surname>Mau</surname> <given-names>B.</given-names></name> <name><surname>Perna</surname> <given-names>N. T.</given-names></name></person-group> (<year>2010</year>). <article-title>progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement</article-title>. <source>PLoS One</source> <volume>5</volume>:<fpage>e11147</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0011147</pub-id>, PMID: <pub-id pub-id-type="pmid">20593022</pub-id></citation></ref>
<ref id="ref12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dhakal</surname> <given-names>D.</given-names></name> <name><surname>Pokhrel</surname> <given-names>A. R.</given-names></name> <name><surname>Shrestha</surname> <given-names>B.</given-names></name> <name><surname>Sohng</surname> <given-names>J. K.</given-names></name></person-group> (<year>2017</year>). <article-title>Marine rare actinobacteria: isolation, characterization, and strategies for harnessing bioactive compounds</article-title>. <source>Front. Microbiol.</source> <volume>8</volume>:<fpage>1106</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2017.01106</pub-id>, PMID: <pub-id pub-id-type="pmid">28663748</pub-id></citation></ref>
<ref id="ref13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doroghazi</surname> <given-names>J. R.</given-names></name> <name><surname>Metcalf</surname> <given-names>W. W.</given-names></name></person-group> (<year>2013</year>). <article-title>Comparative genomics of actinomycetes with a focus on natural product biosynthetic genes</article-title>. <source>BMC Genomics</source> <volume>14</volume>:<fpage>611</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2164-14-611</pub-id>, PMID: <pub-id pub-id-type="pmid">24020438</pub-id></citation></ref>
<ref id="ref14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eddy</surname> <given-names>S. R.</given-names></name></person-group> (<year>2011</year>). <article-title>Accelerated profile HMM searches</article-title>. <source>PLoS Comput. Biol.</source> <volume>7</volume>:<fpage>e1002195</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1002195</pub-id>, PMID: <pub-id pub-id-type="pmid">22039361</pub-id></citation></ref>
<ref id="ref15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Engel</surname> <given-names>F.</given-names></name> <name><surname>Ossipova</surname> <given-names>E.</given-names></name> <name><surname>Jakobsson</surname> <given-names>P. J.</given-names></name> <name><surname>Vockenhuber</surname> <given-names>M. P.</given-names></name> <name><surname>Suess</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>sRNA scr5239 involved in feedback loop regulation of <italic>Streptomyces coelicolor</italic> central metabolism</article-title>. <source>Front. Microbiol.</source> <volume>10</volume>:<fpage>3121</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2019.03121</pub-id>, PMID: <pub-id pub-id-type="pmid">32117084</pub-id></citation></ref>
<ref id="ref16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Figueras</surname> <given-names>M. J.</given-names></name> <name><surname>Beaz-Hidalgo</surname> <given-names>R.</given-names></name> <name><surname>Hossain</surname> <given-names>M. J.</given-names></name> <name><surname>Liles</surname> <given-names>M. R.</given-names></name></person-group> (<year>2014</year>). <article-title>Taxonomic affiliation of new genomes should be verified using average nucleotide identity and multilocus phylogenetic analysis</article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e00927-14</fpage>. doi: <pub-id pub-id-type="doi">10.1128/genomeA.00927-14</pub-id>, PMID: <pub-id pub-id-type="pmid">25477398</pub-id></citation></ref>
<ref id="ref17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Finn</surname> <given-names>R. D.</given-names></name> <name><surname>Bateman</surname> <given-names>A.</given-names></name> <name><surname>Clements</surname> <given-names>J.</given-names></name> <name><surname>Coggill</surname> <given-names>P.</given-names></name> <name><surname>Eberhardt</surname> <given-names>R. Y.</given-names></name> <name><surname>Eddy</surname> <given-names>S. R.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Pfam: the protein families database</article-title>. <source>Nucleic Acids Res.</source> <volume>42</volume>, <fpage>D222</fpage>&#x2013;<lpage>D230</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkt1223</pub-id>, PMID: <pub-id pub-id-type="pmid">24288371</pub-id></citation></ref>
<ref id="ref18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fuli</surname> <given-names>X.</given-names></name> <name><surname>Wenlong</surname> <given-names>Z.</given-names></name> <name><surname>Xiao</surname> <given-names>W.</given-names></name> <name><surname>Jing</surname> <given-names>Z.</given-names></name> <name><surname>Baohai</surname> <given-names>H.</given-names></name> <name><surname>Zhengzheng</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>A genome-wide prediction and identification of intergenic small RNAs by comparative analysis in <italic>Mesorhizobium huakuii</italic> 7653R</article-title>. <source>Front. Microbiol.</source> <volume>8</volume>:<fpage>1730</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2017.01730</pub-id>, PMID: <pub-id pub-id-type="pmid">28943874</pub-id></citation></ref>
<ref id="ref19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goecks</surname> <given-names>J.</given-names></name> <name><surname>Nekrutenko</surname> <given-names>A.</given-names></name> <name><surname>Taylor</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences</article-title>. <source>Genome Biol.</source> <volume>11</volume>:<fpage>R86</fpage>. doi: <pub-id pub-id-type="doi">10.1186/gb-2010-11-8-r86</pub-id>, PMID: <pub-id pub-id-type="pmid">20738864</pub-id></citation></ref>
<ref id="ref20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gruber</surname> <given-names>A. R.</given-names></name> <name><surname>Findei&#x00DF;</surname> <given-names>S.</given-names></name> <name><surname>Washietl</surname> <given-names>S.</given-names></name> <name><surname>Hofacker</surname> <given-names>I. L.</given-names></name> <name><surname>Stadler</surname> <given-names>P. F.</given-names></name></person-group> (<year>2010</year>). <article-title>RNAz 2.0: improved noncoding RNA detection</article-title>. <source>Pac. Symp. Biocomput.</source> <volume>15</volume>, <fpage>69</fpage>&#x2013;<lpage>79</lpage>. PMID: <pub-id pub-id-type="pmid">19908359</pub-id></citation></ref>
<ref id="ref21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hunter</surname> <given-names>J. D.</given-names></name></person-group> (<year>2007</year>). <article-title>Matplotlib: a 2D graphics environment</article-title>. <source>Comput. Sci. Eng.</source> <volume>9</volume>, <fpage>90</fpage>&#x2013;<lpage>95</lpage>. doi: <pub-id pub-id-type="doi">10.1109/MCSE.2007.55</pub-id></citation></ref>
<ref id="ref22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jaccard</surname> <given-names>P.</given-names></name></person-group> (<year>1912</year>). <article-title>The distribution of the flora in the alpine zone</article-title>. <source>New Phytol.</source> <volume>11</volume>, <fpage>37</fpage>&#x2013;<lpage>50</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1469-8137.1912.tb05611.x</pub-id></citation></ref>
<ref id="ref23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jackson</surname> <given-names>S. A.</given-names></name> <name><surname>Crossman</surname> <given-names>L.</given-names></name> <name><surname>Almeida</surname> <given-names>E. L.</given-names></name> <name><surname>Margassery</surname> <given-names>L. M.</given-names></name> <name><surname>Kennedy</surname> <given-names>J.</given-names></name> <name><surname>Dobson</surname> <given-names>A. D. W.</given-names></name></person-group> (<year>2018</year>). <article-title>Diverse and abundant secondary metabolism biosynthetic gene clusters in the genomes of marine sponge derived <italic>Streptomyces</italic> spp. isolates</article-title>. <source>Mar. Drugs</source> <volume>16</volume>:<fpage>67</fpage>. doi: <pub-id pub-id-type="doi">10.3390/md16020067</pub-id>, PMID: <pub-id pub-id-type="pmid">29461500</pub-id></citation></ref>
<ref id="ref24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jain</surname> <given-names>C.</given-names></name> <name><surname>Rodriguez-R</surname> <given-names>L. M.</given-names></name> <name><surname>Phillippy</surname> <given-names>A. M.</given-names></name> <name><surname>Konstantinidis</surname> <given-names>K. T.</given-names></name> <name><surname>Aluru</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries</article-title>. <source>Nat. Commun.</source> <volume>9</volume>:<fpage>5114</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41467-018-07641-9</pub-id>, PMID: <pub-id pub-id-type="pmid">30504855</pub-id></citation></ref>
<ref id="ref25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>S. E.</given-names></name> <name><surname>Pham</surname> <given-names>C. A.</given-names></name> <name><surname>Zambri</surname> <given-names>M. P.</given-names></name> <name><surname>McKillip</surname> <given-names>J.</given-names></name> <name><surname>Carlson</surname> <given-names>E. E.</given-names></name> <name><surname>Elliot</surname> <given-names>M. A.</given-names></name></person-group> (<year>2019</year>). <article-title><italic>Streptomyces</italic> volatile compounds influence exploration and microbial community dynamics by altering iron availability</article-title>. <source>mBio</source> <volume>10</volume>:<fpage>e00171-19</fpage>. doi: <pub-id pub-id-type="doi">10.1128/mBio.00171-19</pub-id>, PMID: <pub-id pub-id-type="pmid">30837334</pub-id></citation></ref>
<ref id="ref26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kalkreuter</surname> <given-names>E.</given-names></name> <name><surname>Pan</surname> <given-names>G.</given-names></name> <name><surname>Cepeda</surname> <given-names>A. J.</given-names></name> <name><surname>Shen</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>Targeting bacterial genomes for natural product discovery</article-title>. <source>Trends Pharmacol. Sci.</source> <volume>41</volume>, <fpage>13</fpage>&#x2013;<lpage>26</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tips.2019.11.002</pub-id>, PMID: <pub-id pub-id-type="pmid">31822352</pub-id></citation></ref>
<ref id="ref27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kalvari</surname> <given-names>I.</given-names></name> <name><surname>Nawrocki</surname> <given-names>E. P.</given-names></name> <name><surname>Ontiveros-Palacios</surname> <given-names>N.</given-names></name> <name><surname>Argasinska</surname> <given-names>J.</given-names></name> <name><surname>Lamkiewicz</surname> <given-names>K.</given-names></name> <name><surname>Marz</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Rfam 14: expanded coverage of metagenomic, viral and microRNA families</article-title>. <source>Nucleic Acids Res.</source> <volume>49</volume>, <fpage>D192</fpage>&#x2013;<lpage>D200</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkaa1047</pub-id>, PMID: <pub-id pub-id-type="pmid">33211869</pub-id></citation></ref>
<ref id="ref28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katoh</surname> <given-names>K.</given-names></name> <name><surname>Standley</surname> <given-names>D. M.</given-names></name></person-group> (<year>2013</year>). <article-title>MAFFT multiple sequence alignment software version 7: improvements in performance and usability</article-title>. <source>Mol. Biol. Evol.</source> <volume>30</volume>, <fpage>772</fpage>&#x2013;<lpage>780</lpage>. doi: <pub-id pub-id-type="doi">10.1093/molbev/mst010</pub-id>, PMID: <pub-id pub-id-type="pmid">23329690</pub-id></citation></ref>
<ref id="ref29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kautsar</surname> <given-names>S. A.</given-names></name> <name><surname>Blin</surname> <given-names>K.</given-names></name> <name><surname>Shaw</surname> <given-names>S.</given-names></name> <name><surname>Navarro-Mu&#x00F1;oz</surname> <given-names>J. C.</given-names></name> <name><surname>Terlouw</surname> <given-names>B. R.</given-names></name> <name><surname>Van Der Hooft</surname> <given-names>J. J. J.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>MIBiG 2.0: a repository for biosynthetic gene clusters of known function</article-title>. <source>Nucleic Acids Res.</source> <volume>48</volume>, <fpage>D454</fpage>&#x2013;<lpage>D458</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkz882</pub-id>, PMID: <pub-id pub-id-type="pmid">31612915</pub-id></citation></ref>
<ref id="ref30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>J. N.</given-names></name> <name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Jeong</surname> <given-names>Y.</given-names></name> <name><surname>Roe</surname> <given-names>J. H.</given-names></name> <name><surname>Kim</surname> <given-names>B. G.</given-names></name> <name><surname>Cho</surname> <given-names>B. K.</given-names></name></person-group> (<year>2015</year>). <article-title>Comparative genomics reveals the core and accessory genomes of <italic>Streptomyces</italic> species</article-title>. <source>J. Microbiol. Biotechnol.</source> <volume>25</volume>, <fpage>1599</fpage>&#x2013;<lpage>1605</lpage>. doi: <pub-id pub-id-type="doi">10.4014/jmb.1504.04008</pub-id>, PMID: <pub-id pub-id-type="pmid">26032364</pub-id></citation></ref>
<ref id="ref31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kislyuk</surname> <given-names>A. O.</given-names></name> <name><surname>Haegeman</surname> <given-names>B.</given-names></name> <name><surname>Bergman</surname> <given-names>N. H.</given-names></name> <name><surname>Weitz</surname> <given-names>J. S.</given-names></name></person-group> (<year>2011</year>). <article-title>Genomic fluidity: an integrative view of gene diversity within microbial populations</article-title>. <source>BMC Genomics</source> <volume>12</volume>:<fpage>32</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2164-12-32</pub-id>, PMID: <pub-id pub-id-type="pmid">21232151</pub-id></citation></ref>
<ref id="ref32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lechner</surname> <given-names>M.</given-names></name> <name><surname>Findei&#x00DF;</surname> <given-names>S.</given-names></name> <name><surname>Steiner</surname> <given-names>L.</given-names></name> <name><surname>Marz</surname> <given-names>M.</given-names></name> <name><surname>Stadler</surname> <given-names>P. F.</given-names></name> <name><surname>Prohaska</surname> <given-names>S. J.</given-names></name></person-group> (<year>2011</year>). <article-title>Proteinortho: detection of (co-)orthologs in large-scale analysis</article-title>. <source>BMC Bioinformatics</source> <volume>12</volume>:<fpage>124</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-12-124</pub-id>, PMID: <pub-id pub-id-type="pmid">21526987</pub-id></citation></ref>
<ref id="ref33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>I. P. A.</given-names></name> <name><surname>Andam</surname> <given-names>C. P.</given-names></name></person-group> (<year>2019</year>). <article-title>Pan-genome diversification and recombination in <italic>Cronobacter sakazakii</italic>, an opportunistic pathogen in neonates, and insights to its xerotolerant lifestyle</article-title>. <source>BMC Microbiol.</source> <volume>19</volume>:<fpage>306</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12866-019-1664-7</pub-id>, PMID: <pub-id pub-id-type="pmid">31881843</pub-id></citation></ref>
<ref id="ref34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>S. H.</given-names></name> <name><surname>Choe</surname> <given-names>H.</given-names></name> <name><surname>Bae</surname> <given-names>K. S.</given-names></name> <name><surname>Park</surname> <given-names>D. S.</given-names></name> <name><surname>Nasir</surname> <given-names>A.</given-names></name> <name><surname>Kim</surname> <given-names>K. M.</given-names></name></person-group> (<year>2016</year>). <article-title>Complete genome of <italic>Streptomyces hygroscopicus</italic> subsp. <italic>limoneus</italic> KCTC 1717 (=KCCM 11405), a soil bacterium producing validamycin and diverse secondary metabolites</article-title>. <source>J. Biotechnol.</source> <volume>219</volume>, <fpage>1</fpage>&#x2013;<lpage>2</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jbiotec.2015.12.010</pub-id>, PMID: <pub-id pub-id-type="pmid">26704727</pub-id></citation></ref>
<ref id="ref35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>N.</given-names></name> <name><surname>Kim</surname> <given-names>W.</given-names></name> <name><surname>Hwang</surname> <given-names>S.</given-names></name> <name><surname>Lee</surname> <given-names>Y.</given-names></name> <name><surname>Cho</surname> <given-names>S.</given-names></name> <name><surname>Palsson</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Thirty complete <italic>Streptomyces</italic> genome sequences for mining novel secondary metabolite biosynthetic gene clusters</article-title>. <source>Sci. Data</source> <volume>7</volume>:<fpage>55</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41597-020-0395-9</pub-id>, PMID: <pub-id pub-id-type="pmid">32054853</pub-id></citation></ref>
<ref id="ref36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Letunic</surname> <given-names>I.</given-names></name> <name><surname>Bork</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Interactive tree of life (iTOL) v4: recent updates and new developments</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume>, <fpage>W256</fpage>&#x2013;<lpage>W259</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkz239</pub-id>, PMID: <pub-id pub-id-type="pmid">30931475</pub-id></citation></ref>
<ref id="ref37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>G.</given-names></name> <name><surname>Chater</surname> <given-names>K. F.</given-names></name> <name><surname>Chandra</surname> <given-names>G.</given-names></name> <name><surname>Niu</surname> <given-names>G.</given-names></name> <name><surname>Tan</surname> <given-names>H.</given-names></name></person-group> (<year>2013</year>). <article-title>Molecular regulation of antibiotic biosynthesis in <italic>Streptomyces</italic></article-title>. <source>Microbiol. Mol. Biol. Rev.</source> <volume>77</volume>, <fpage>112</fpage>&#x2013;<lpage>143</lpage>. doi: <pub-id pub-id-type="doi">10.1128/mmbr.00054-12</pub-id>, PMID: <pub-id pub-id-type="pmid">23471619</pub-id></citation></ref>
<ref id="ref38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lorenz</surname> <given-names>R.</given-names></name> <name><surname>Bernhart</surname> <given-names>S. H.</given-names></name> <name><surname>H&#x00F6;ner zu Siederdissen</surname> <given-names>C.</given-names></name> <name><surname>Tafer</surname> <given-names>H.</given-names></name> <name><surname>Flamm</surname> <given-names>C.</given-names></name> <name><surname>Stadler</surname> <given-names>P. F.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>ViennaRNA package 2.0</article-title>. <source>Algorithms Mol. Biol.</source> <volume>6</volume>:<fpage>26</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1748-7188-6-26</pub-id>, PMID: <pub-id pub-id-type="pmid">22115189</pub-id></citation></ref>
<ref id="ref39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lorenzi</surname> <given-names>J. N.</given-names></name> <name><surname>Lespinet</surname> <given-names>O.</given-names></name> <name><surname>Leblond</surname> <given-names>P.</given-names></name> <name><surname>Thibessard</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Subtelomeres are fast-evolving regions of the <italic>Streptomyces</italic> linear chromosome</article-title>. <source>Microb. Genom.</source> <volume>7</volume>. doi: <pub-id pub-id-type="doi">10.1099/mgen.0.000525</pub-id>, PMID: <pub-id pub-id-type="pmid">33749576</pub-id></citation></ref>
<ref id="ref40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mann</surname> <given-names>M.</given-names></name> <name><surname>Wright</surname> <given-names>P. R.</given-names></name> <name><surname>Backofen</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>IntaRNA 2.0: enhanced and customizable prediction of RNA-RNA interactions</article-title>. <source>Nucleic Acids Res.</source> <volume>45</volume>, <fpage>W435</fpage>&#x2013;<lpage>W439</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkx279</pub-id>, PMID: <pub-id pub-id-type="pmid">28472523</pub-id></citation></ref>
<ref id="ref41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mart&#x00ED;n-S&#x00E1;nchez</surname> <given-names>L.</given-names></name> <name><surname>Singh</surname> <given-names>K. S.</given-names></name> <name><surname>Avalos</surname> <given-names>M.</given-names></name> <name><surname>Van Wezel</surname> <given-names>G. P.</given-names></name> <name><surname>Dickschat</surname> <given-names>J. S.</given-names></name> <name><surname>Garbeva</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Phylogenomic analyses and distribution of terpene synthases among <italic>Streptomyces</italic></article-title>. <source>Beilstein J. Org. Chem.</source> <volume>15</volume>, <fpage>1181</fpage>&#x2013;<lpage>1193</lpage>. doi: <pub-id pub-id-type="doi">10.3762/bjoc.15.115</pub-id>, PMID: <pub-id pub-id-type="pmid">31293665</pub-id></citation></ref>
<ref id="ref42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McDonald</surname> <given-names>B. R.</given-names></name> <name><surname>Currie</surname> <given-names>C. R.</given-names></name></person-group> (<year>2017</year>). <article-title>Lateral gene transfer dynamics in the ancient bacterial genus <italic>Streptomyces</italic></article-title>. <source>mBio</source> <volume>8</volume>:<fpage>e00644-17</fpage>. doi: <pub-id pub-id-type="doi">10.1128/mBio.00644-17</pub-id>, PMID: <pub-id pub-id-type="pmid">28588130</pub-id></citation></ref>
<ref id="ref43"><citation citation-type="other"><person-group person-group-type="author"><name><surname>McKay</surname> <given-names>S.</given-names></name></person-group> (<year>2004</year>). bp_genbank2gff3.pl. Available at: <ext-link xlink:href="https://metacpan.org/pod/distribution/BioPerl/bin/bp_genbank2gff3" ext-link-type="uri">https://metacpan.org/pod/distribution/BioPerl/bin/bp_genbank2gff3</ext-link> (Accessed December 12, 2019).</citation></ref>
<ref id="ref44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Medini</surname> <given-names>D.</given-names></name> <name><surname>Donati</surname> <given-names>C.</given-names></name> <name><surname>Tettelin</surname> <given-names>H.</given-names></name> <name><surname>Masignani</surname> <given-names>V.</given-names></name> <name><surname>Rappuoli</surname> <given-names>R.</given-names></name></person-group> (<year>2005</year>). <article-title>The microbial pan-genome</article-title>. <source>Curr. Opin. Genet. Dev.</source> <volume>15</volume>, <fpage>589</fpage>&#x2013;<lpage>594</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.gde.2005.09.006</pub-id>, PMID: <pub-id pub-id-type="pmid">16185861</pub-id></citation></ref>
<ref id="ref45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moghaddam</surname> <given-names>J. A.</given-names></name> <name><surname>Cr&#x00FC;semann</surname> <given-names>M.</given-names></name> <name><surname>Alanjary</surname> <given-names>M.</given-names></name> <name><surname>Harms</surname> <given-names>H.</given-names></name> <name><surname>D&#x00E1;vila-C&#x00E9;spedes</surname> <given-names>A.</given-names></name> <name><surname>Blom</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Analysis of the genome and metabolome of marine myxobacteria reveals high potential for biosynthesis of novel specialized metabolites</article-title>. <source>Sci. Rep.</source> <volume>8</volume>:<fpage>16600</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-018-34954-y</pub-id>, PMID: <pub-id pub-id-type="pmid">30413766</pub-id></citation></ref>
<ref id="ref46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mungan</surname> <given-names>M. D.</given-names></name> <name><surname>Alanjary</surname> <given-names>M.</given-names></name> <name><surname>Blin</surname> <given-names>K.</given-names></name> <name><surname>Weber</surname> <given-names>T.</given-names></name> <name><surname>Medema</surname> <given-names>M. H.</given-names></name> <name><surname>Ziemert</surname> <given-names>N.</given-names></name></person-group> (<year>2020</year>). <article-title>ARTS 2.0: feature updates and expansion of the antibiotic resistant target seeker for comparative genome mining</article-title>. <source>Nucleic Acids Res.</source> <volume>48</volume>, <fpage>W546</fpage>&#x2013;<lpage>W552</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkaa374</pub-id>, PMID: <pub-id pub-id-type="pmid">32427317</pub-id></citation></ref>
<ref id="ref47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nakagawa</surname> <given-names>A.</given-names></name> <name><surname>Iwai</surname> <given-names>Y.</given-names></name> <name><surname>Hashimoto</surname> <given-names>H.</given-names></name> <name><surname>Miyazaki</surname> <given-names>N.</given-names></name> <name><surname>Oiwa</surname> <given-names>R.</given-names></name> <name><surname>Takahashi</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>1981</year>). <article-title>Virantmycin, a new antiviral antibiotic produced by a strain of <italic>Streptomyces</italic></article-title>. <source>J. Antibiot.</source> <volume>34</volume>, <fpage>1408</fpage>&#x2013;<lpage>1415</lpage>. doi: <pub-id pub-id-type="doi">10.7164/antibiotics.34.1408</pub-id>, PMID: <pub-id pub-id-type="pmid">7319904</pub-id></citation></ref>
<ref id="ref48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Navarro-Mu&#x00F1;oz</surname> <given-names>J. C.</given-names></name> <name><surname>Selem-Mojica</surname> <given-names>N.</given-names></name> <name><surname>Mullowney</surname> <given-names>M. W.</given-names></name> <name><surname>Kautsar</surname> <given-names>S. A.</given-names></name> <name><surname>Tryon</surname> <given-names>J. H.</given-names></name> <name><surname>Parkinson</surname> <given-names>E. I.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>A computational framework to explore large-scale biosynthetic diversity</article-title>. <source>Nat. Chem. Biol.</source> <volume>16</volume>, <fpage>60</fpage>&#x2013;<lpage>68</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41589-019-0400-9</pub-id>, PMID: <pub-id pub-id-type="pmid">31768033</pub-id></citation></ref>
<ref id="ref49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Niu</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>Genomics-driven natural product discovery in actinomycetes</article-title>. <source>Trends Biotechnol.</source> <volume>36</volume>, <fpage>238</fpage>&#x2013;<lpage>241</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tibtech.2017.10.009</pub-id>, PMID: <pub-id pub-id-type="pmid">29126570</pub-id></citation></ref>
<ref id="ref50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Page</surname> <given-names>A. J.</given-names></name> <name><surname>Cummins</surname> <given-names>C. A.</given-names></name> <name><surname>Hunt</surname> <given-names>M.</given-names></name> <name><surname>Wong</surname> <given-names>V. K.</given-names></name> <name><surname>Reuter</surname> <given-names>S.</given-names></name> <name><surname>Holden</surname> <given-names>M. T. G.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Roary: rapid large-scale prokaryote pan genome analysis</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>3691</fpage>&#x2013;<lpage>3693</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btv421</pub-id>, PMID: <pub-id pub-id-type="pmid">26198102</pub-id></citation></ref>
<ref id="ref51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>C. J.</given-names></name> <name><surname>Andam</surname> <given-names>C. P.</given-names></name></person-group> (<year>2019</year>). <article-title>Within-species genomic variation and variable patterns of recombination in the tetracycline producer <italic>Streptomyces rimosus</italic></article-title>. <source>Front. Microbiol.</source> <volume>10</volume>:<fpage>552</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2019.00552</pub-id>, PMID: <pub-id pub-id-type="pmid">30949149</pub-id></citation></ref>
<ref id="ref52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parks</surname> <given-names>D. H.</given-names></name> <name><surname>Chuvochina</surname> <given-names>M.</given-names></name> <name><surname>Waite</surname> <given-names>D. W.</given-names></name> <name><surname>Rinke</surname> <given-names>C.</given-names></name> <name><surname>Skarshewski</surname> <given-names>A.</given-names></name> <name><surname>Chaumeil</surname> <given-names>P. A.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life</article-title>. <source>Nat. Biotechnol.</source> <volume>36</volume>, <fpage>996</fpage>&#x2013;<lpage>1004</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nbt.4229</pub-id>, PMID: <pub-id pub-id-type="pmid">30148503</pub-id></citation></ref>
<ref id="ref53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pham</surname> <given-names>J. V.</given-names></name> <name><surname>Yilma</surname> <given-names>M. A.</given-names></name> <name><surname>Feliz</surname> <given-names>A.</given-names></name> <name><surname>Majid</surname> <given-names>M. T.</given-names></name> <name><surname>Maffetone</surname> <given-names>N.</given-names></name> <name><surname>Walker</surname> <given-names>J. R.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>A review of the microbial production of bioactive natural products and biologics</article-title>. <source>Front. Microbiol.</source> <volume>10</volume>:<fpage>1404</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2019.01404</pub-id>, PMID: <pub-id pub-id-type="pmid">31281299</pub-id></citation></ref>
<ref id="ref54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poorinmohammad</surname> <given-names>N.</given-names></name> <name><surname>Bagheban-Shemirani</surname> <given-names>R.</given-names></name> <name><surname>Hamedi</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>Genome mining for ribosomally synthesised and post-translationally modified peptides (RiPPs) reveals undiscovered bioactive potentials of actinobacteria</article-title>. <source>Antonie Van Leeuwenhoek</source> <volume>112</volume>, <fpage>1477</fpage>&#x2013;<lpage>1499</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s10482-019-01276-6</pub-id>, PMID: <pub-id pub-id-type="pmid">31123844</pub-id></citation></ref>
<ref id="ref55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Price</surname> <given-names>M. N.</given-names></name> <name><surname>Dehal</surname> <given-names>P. S.</given-names></name> <name><surname>Arkin</surname> <given-names>A. P.</given-names></name></person-group> (<year>2010</year>). <article-title>FastTree 2&#x2014;approximately maximum-likelihood trees for large alignments</article-title>. <source>PLoS One</source> <volume>5</volume>:<fpage>e9490</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0009490</pub-id>, PMID: <pub-id pub-id-type="pmid">20224823</pub-id></citation></ref>
<ref id="ref56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quevillon</surname> <given-names>E.</given-names></name> <name><surname>Silventoinen</surname> <given-names>V.</given-names></name> <name><surname>Pillai</surname> <given-names>S.</given-names></name> <name><surname>Harte</surname> <given-names>N.</given-names></name> <name><surname>Mulder</surname> <given-names>N.</given-names></name> <name><surname>Apweiler</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2005</year>). <article-title>InterProScan: protein domains identifier</article-title>. <source>Nucleic Acids Res.</source> <volume>33</volume>, <fpage>W116</fpage>&#x2013;<lpage>W120</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gki442</pub-id>, PMID: <pub-id pub-id-type="pmid">15980438</pub-id></citation></ref>
<ref id="ref57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Read</surname> <given-names>B. A.</given-names></name> <name><surname>Kegel</surname> <given-names>J.</given-names></name> <name><surname>Klute</surname> <given-names>M. J.</given-names></name> <name><surname>Kuo</surname> <given-names>A.</given-names></name> <name><surname>Lefebvre</surname> <given-names>S. C.</given-names></name> <name><surname>Maumus</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Pan genome of the phytoplankton <italic>Emiliania</italic> underpins its global distribution</article-title>. <source>Nature</source> <volume>499</volume>, <fpage>209</fpage>&#x2013;<lpage>213</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nature12221</pub-id>, PMID: <pub-id pub-id-type="pmid">23760476</pub-id></citation></ref>
<ref id="ref58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Richards</surname> <given-names>G. R.</given-names></name> <name><surname>Vanderpool</surname> <given-names>C. K.</given-names></name></person-group> (<year>2011</year>). <article-title>Molecular call and response: the physiology of bacterial small RNAs</article-title>. <source>Biochim. Biophys. Acta</source> <volume>1809</volume>, <fpage>525</fpage>&#x2013;<lpage>531</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.bbagrm.2011.07.013</pub-id>, PMID: <pub-id pub-id-type="pmid">21843668</pub-id></citation></ref>
<ref id="ref59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Richter</surname> <given-names>A. A.</given-names></name> <name><surname>Mais</surname> <given-names>C.-N.</given-names></name> <name><surname>Czech</surname> <given-names>L.</given-names></name> <name><surname>Geyer</surname> <given-names>K.</given-names></name> <name><surname>Hoeppner</surname> <given-names>A.</given-names></name> <name><surname>Smits</surname> <given-names>S. H. J.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Biosynthesis of the stress-protectant and chemical chaperon ectoine: biochemistry of the transaminase EctB</article-title>. <source>Front. Microbiol.</source> <volume>10</volume>:<fpage>2811</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2019.02811</pub-id>, PMID: <pub-id pub-id-type="pmid">31921013</pub-id></citation></ref>
<ref id="ref60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Richter</surname> <given-names>M.</given-names></name> <name><surname>Rossell&#x00F3;-M&#x00F3;ra</surname> <given-names>R.</given-names></name> <name><surname>Gl&#x00F6;ckner</surname> <given-names>F. O.</given-names></name> <name><surname>Peplies</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>JSpeciesWS: a web server for prokaryotic species circumscription based on pairwise genome comparison</article-title>. <source>Bioinformatics</source> <volume>32</volume>, <fpage>929</fpage>&#x2013;<lpage>931</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btv681</pub-id>, PMID: <pub-id pub-id-type="pmid">26576653</pub-id></citation></ref>
<ref id="ref61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rossi</surname> <given-names>C. C.</given-names></name> <name><surname>Bosse</surname> <given-names>J. T.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Witney</surname> <given-names>A. A.</given-names></name> <name><surname>Gould</surname> <given-names>K. A.</given-names></name> <name><surname>Langford</surname> <given-names>P. R.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>A computational strategy for the search of regulatory small RNAs in <italic>Actinobacillus pleuropneumoniae</italic></article-title>. <source>RNA</source> <volume>22</volume>, <fpage>1373</fpage>&#x2013;<lpage>1385</lpage>. doi: <pub-id pub-id-type="doi">10.1261/rna.055129.115</pub-id>, PMID: <pub-id pub-id-type="pmid">27402897</pub-id></citation></ref>
<ref id="ref62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>R&#x00FC;ckert</surname> <given-names>C.</given-names></name> <name><surname>Albersmeier</surname> <given-names>A.</given-names></name> <name><surname>Busche</surname> <given-names>T.</given-names></name> <name><surname>Jaenicke</surname> <given-names>S.</given-names></name> <name><surname>Winkler</surname> <given-names>A.</given-names></name> <name><surname>Fridj&#x00F3;nsson</surname> <given-names>&#x00D3;. H.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Complete genome sequence of <italic>Streptomyces lividans</italic> TK24</article-title>. <source>J. Biotechnol.</source> <volume>199</volume>, <fpage>21</fpage>&#x2013;<lpage>22</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jbiotec.2015.02.004</pub-id>, PMID: <pub-id pub-id-type="pmid">25680930</pub-id></citation></ref>
<ref id="ref63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shannon</surname> <given-names>P.</given-names></name> <name><surname>Markiel</surname> <given-names>A.</given-names></name> <name><surname>Ozier</surname> <given-names>O.</given-names></name> <name><surname>Baliga</surname> <given-names>N. S.</given-names></name> <name><surname>Wang</surname> <given-names>J. T.</given-names></name> <name><surname>Ramage</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Cytoscape: a software environment for integrated models of biomolecular interaction networks</article-title>. <source>Genome Res.</source> <volume>13</volume>, <fpage>2498</fpage>&#x2013;<lpage>2504</lpage>. doi: <pub-id pub-id-type="doi">10.1101/gr.1239303</pub-id>, PMID: <pub-id pub-id-type="pmid">14597658</pub-id></citation></ref>
<ref id="ref64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sim&#x00E3;o</surname> <given-names>F. A.</given-names></name> <name><surname>Waterhouse</surname> <given-names>R. M.</given-names></name> <name><surname>Ioannidis</surname> <given-names>P.</given-names></name> <name><surname>Kriventseva</surname> <given-names>E. V.</given-names></name> <name><surname>Zdobnov</surname> <given-names>E. M.</given-names></name></person-group> (<year>2015</year>). <article-title>BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>3210</fpage>&#x2013;<lpage>3212</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btv351</pub-id>, PMID: <pub-id pub-id-type="pmid">26059717</pub-id></citation></ref>
<ref id="ref65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Snipen</surname> <given-names>L.</given-names></name> <name><surname>Liland</surname> <given-names>K. H.</given-names></name></person-group> (<year>2015</year>). <article-title>Micropan: an R-package for microbial pan-genomics</article-title>. <source>BMC Bioinformatics</source> <volume>16</volume>:<fpage>79</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12859-015-0517-0</pub-id>, PMID: <pub-id pub-id-type="pmid">25888166</pub-id></citation></ref>
<ref id="ref66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Snipen</surname> <given-names>L.</given-names></name> <name><surname>Ussery</surname> <given-names>D. W.</given-names></name></person-group> (<year>2013</year>). <article-title>A domain sequence approach to pangenomics: applications to <italic>Escherichia coli</italic></article-title>. <source>F1000Res</source> <volume>1</volume>:<fpage>19</fpage>. doi: <pub-id pub-id-type="doi">10.12688/f1000research.1-19.v2</pub-id>, PMID: <pub-id pub-id-type="pmid">24555018</pub-id></citation></ref>
<ref id="ref67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spring-Pearson</surname> <given-names>S. M.</given-names></name> <name><surname>Stone</surname> <given-names>J. K.</given-names></name> <name><surname>Doyle</surname> <given-names>A.</given-names></name> <name><surname>Allender</surname> <given-names>C. J.</given-names></name> <name><surname>Okinaka</surname> <given-names>R. T.</given-names></name> <name><surname>Mayo</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Pangenome analysis of <italic>Burkholderia pseudomallei</italic>: genome evolution preserves gene order despite high recombination rates</article-title>. <source>PLoS One</source> <volume>10</volume>:<fpage>e0140274</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0140274</pub-id>, PMID: <pub-id pub-id-type="pmid">26484663</pub-id></citation></ref>
<ref id="ref68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sridhar</surname> <given-names>J.</given-names></name> <name><surname>Gunasekaran</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>Computational small RNA prediction in bacteria</article-title>. <source>Bioinform. Biol. Insights</source> <volume>7</volume>, <fpage>83</fpage>&#x2013;<lpage>95</lpage>. doi: <pub-id pub-id-type="doi">10.4137/BBI.S11213</pub-id>, PMID: <pub-id pub-id-type="pmid">23516022</pub-id></citation></ref>
<ref id="ref69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Mill&#x00E1;n-Agui&#x00F1;aga</surname> <given-names>N.</given-names></name> <name><surname>Zhang</surname> <given-names>J. J.</given-names></name> <name><surname>O&#x2019;Neill</surname> <given-names>E. C.</given-names></name> <name><surname>Ugalde</surname> <given-names>J. A.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Identification of thiotetronic acid antibiotic biosynthetic pathways by target-directed genome mining</article-title>. <source>ACS Chem. Biol.</source> <volume>10</volume>, <fpage>2841</fpage>&#x2013;<lpage>2849</lpage>. doi: <pub-id pub-id-type="doi">10.1021/acschembio.5b00658</pub-id>, PMID: <pub-id pub-id-type="pmid">26458099</pub-id></citation></ref>
<ref id="ref70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tettelin</surname> <given-names>H.</given-names></name> <name><surname>Riley</surname> <given-names>D.</given-names></name> <name><surname>Cattuto</surname> <given-names>C.</given-names></name> <name><surname>Medini</surname> <given-names>D.</given-names></name></person-group> (<year>2008</year>). <article-title>Comparative genomics: the bacterial pan-genome</article-title>. <source>Curr. Opin. Microbiol.</source> <volume>11</volume>, <fpage>472</fpage>&#x2013;<lpage>477</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.mib.2008.09.006</pub-id>, PMID: <pub-id pub-id-type="pmid">19086349</pub-id></citation></ref>
<ref id="ref71"><citation citation-type="journal"><person-group person-group-type="author"><collab id="coll1">The Gene Ontology Consortium</collab></person-group> (<year>2019</year>). <article-title>The gene ontology resource: 20 years and still going strong</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume>, <fpage>D330</fpage>&#x2013;<lpage>D338</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gky1055</pub-id>, PMID: <pub-id pub-id-type="pmid">30395331</pub-id></citation></ref>
<ref id="ref72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thorpe</surname> <given-names>H. A.</given-names></name> <name><surname>Bayliss</surname> <given-names>S. C.</given-names></name> <name><surname>Sheppard</surname> <given-names>S. K.</given-names></name> <name><surname>Feil</surname> <given-names>E. J.</given-names></name></person-group> (<year>2018</year>). <article-title>Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria</article-title>. <source>Gigascience</source> <volume>7</volume>, <fpage>1</fpage>&#x2013;<lpage>11</lpage>. doi: <pub-id pub-id-type="doi">10.1093/gigascience/giy015</pub-id>, PMID: <pub-id pub-id-type="pmid">29635296</pub-id></citation></ref>
<ref id="ref73"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Yang</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Comparative genomics analysis of <italic>Streptomyces</italic> species reveals their adaptation to the marine environment and their diversity at the genomic level</article-title>. <source>Front. Microbiol.</source> <volume>7</volume>:<fpage>998</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2016.00998</pub-id>, PMID: <pub-id pub-id-type="pmid">27446038</pub-id></citation></ref>
<ref id="ref74"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tonkin-Hill</surname> <given-names>G.</given-names></name> <name><surname>MacAlasdair</surname> <given-names>N.</given-names></name> <name><surname>Ruis</surname> <given-names>C.</given-names></name> <name><surname>Weimann</surname> <given-names>A.</given-names></name> <name><surname>Horesh</surname> <given-names>G.</given-names></name> <name><surname>Lees</surname> <given-names>J. A.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Producing polished prokaryotic pangenomes with the Panaroo pipeline</article-title>. <source>Genome Biol.</source> <volume>21</volume>:<fpage>180</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s13059-020-02090-4</pub-id>, PMID: <pub-id pub-id-type="pmid">32698896</pub-id></citation></ref>
<ref id="ref75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tran</surname> <given-names>P. N.</given-names></name> <name><surname>Yen</surname> <given-names>M. R.</given-names></name> <name><surname>Chiang</surname> <given-names>C. Y.</given-names></name> <name><surname>Lin</surname> <given-names>H. C.</given-names></name> <name><surname>Chen</surname> <given-names>P. Y.</given-names></name></person-group> (<year>2019</year>). <article-title>Detecting and prioritizing biosynthetic gene clusters for bioactive compounds in bacteria and fungi</article-title>. <source>Appl. Microbiol. Biotechnol.</source> <volume>103</volume>, <fpage>3277</fpage>&#x2013;<lpage>3287</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00253-019-09708-z</pub-id>, PMID: <pub-id pub-id-type="pmid">30859257</pub-id></citation></ref>
<ref id="ref76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsai</surname> <given-names>C. H.</given-names></name> <name><surname>Liao</surname> <given-names>R.</given-names></name> <name><surname>Chou</surname> <given-names>B.</given-names></name> <name><surname>Palumbo</surname> <given-names>M.</given-names></name> <name><surname>Contreras</surname> <given-names>L. M.</given-names></name></person-group> (<year>2015</year>). <article-title>Genome-wide analyses in bacteria show small-RNA enrichment for long and conserved intergenic regions</article-title>. <source>J. Bacteriol.</source> <volume>197</volume>, <fpage>40</fpage>&#x2013;<lpage>50</lpage>. doi: <pub-id pub-id-type="doi">10.1128/JB.02359-14</pub-id>, PMID: <pub-id pub-id-type="pmid">25313390</pub-id></citation></ref>
<ref id="ref77"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vicente</surname> <given-names>C.</given-names></name> <name><surname>Thibessard</surname> <given-names>A.</given-names></name> <name><surname>Lorenzi</surname> <given-names>J.-N.</given-names></name> <name><surname>Benhadj</surname> <given-names>M.</given-names></name> <name><surname>H&#x00F4;tel</surname> <given-names>L.</given-names></name> <name><surname>Gacemi-Kirane</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Comparative genomics among closely related <italic>Streptomyces</italic> strains revealed specialized metabolite biosynthetic gene cluster diversity</article-title>. <source>Antibiotics</source> <volume>7</volume>:<fpage>86</fpage>. doi: <pub-id pub-id-type="doi">10.3390/antibiotics7040086</pub-id>, PMID: <pub-id pub-id-type="pmid">30279346</pub-id></citation></ref>
<ref id="ref78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vior</surname> <given-names>N. M.</given-names></name> <name><surname>Cea-Torrescassana</surname> <given-names>E.</given-names></name> <name><surname>Eyles</surname> <given-names>T. H.</given-names></name> <name><surname>Chandra</surname> <given-names>G.</given-names></name> <name><surname>Truman</surname> <given-names>A. W.</given-names></name></person-group> (<year>2020</year>). <article-title>Regulation of bottromycin biosynthesis involves an internal transcriptional start site and a cluster-situated modulator</article-title>. <source>Front. Microbiol.</source> <volume>11</volume>:<fpage>495</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2020.00495</pub-id>, PMID: <pub-id pub-id-type="pmid">32273872</pub-id></citation></ref>
<ref id="ref79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Fewer</surname> <given-names>D. P.</given-names></name> <name><surname>Holm</surname> <given-names>L.</given-names></name> <name><surname>Rouhiainen</surname> <given-names>L.</given-names></name> <name><surname>Sivonen</surname> <given-names>K.</given-names></name></person-group> (<year>2014</year>). <article-title>Atlas of nonribosomal peptide and polyketide biosynthetic pathways reveals common occurrence of nonmodular enzymes</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>111</volume>, <fpage>9259</fpage>&#x2013;<lpage>9264</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1401734111</pub-id>, PMID: <pub-id pub-id-type="pmid">24927540</pub-id></citation></ref>
<ref id="ref80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>G.</given-names></name> <name><surname>Feng</surname> <given-names>G.</given-names></name></person-group> (<year>2017a</year>). <article-title>Complete genome sequence of <italic>Streptomyces griseochromogenes</italic> ATCC 14511T, a producer of nucleoside compounds and diverse secondary metabolites</article-title>. <source>J. Biotechnol.</source> <volume>249</volume>, <fpage>16</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jbiotec.2017.03.017</pub-id>, PMID: <pub-id pub-id-type="pmid">28342817</pub-id></citation></ref>
<ref id="ref81"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>H.</given-names></name> <name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Shi</surname> <given-names>L.</given-names></name> <name><surname>Si</surname> <given-names>K.</given-names></name> <name><surname>Liu</surname> <given-names>T.</given-names></name> <name><surname>Dong</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2017b</year>). <article-title>Comparative genomic and regulatory analyses of natamycin production of <italic>Streptomyces lydicus</italic> A02</article-title>. <source>Sci. Rep.</source> <volume>7</volume>:<fpage>9114</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-017-09532-3</pub-id>, PMID: <pub-id pub-id-type="pmid">28831190</pub-id></citation></ref>
<ref id="ref82"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>S.</given-names></name> <name><surname>Zhu</surname> <given-names>Z.</given-names></name> <name><surname>Fu</surname> <given-names>L.</given-names></name> <name><surname>Niu</surname> <given-names>B.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name></person-group> (<year>2011</year>). <article-title>WebMGA: a customizable web server for fast metagenomic sequence analysis</article-title>. <source>BMC Genomics</source> <volume>12</volume>:<fpage>444</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2164-12-444</pub-id>, PMID: <pub-id pub-id-type="pmid">21899761</pub-id></citation></ref>
<ref id="ref83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>L.</given-names></name> <name><surname>Ye</surname> <given-names>K. X.</given-names></name> <name><surname>Dai</surname> <given-names>W. H.</given-names></name> <name><surname>Sun</surname> <given-names>C.</given-names></name> <name><surname>Xu</surname> <given-names>L. H.</given-names></name> <name><surname>Han</surname> <given-names>B. N.</given-names></name></person-group> (<year>2019</year>). <article-title>Comparative genomic insights into secondary metabolism biosynthetic gene cluster distributions of marine <italic>Streptomyces</italic></article-title>. <source>Mar. Drugs</source> <volume>17</volume>:<fpage>498</fpage>. doi: <pub-id pub-id-type="doi">10.3390/md17090498</pub-id>, PMID: <pub-id pub-id-type="pmid">31454987</pub-id></citation></ref>
<ref id="ref84"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>N.</given-names></name> <name><surname>Tang</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Recent developments in self-resistance gene directed natural product discovery</article-title>. <source>Nat. Prod. Rep.</source> <volume>37</volume>, <fpage>879</fpage>&#x2013;<lpage>892</lpage>. doi: <pub-id pub-id-type="doi">10.1039/c9np00050j</pub-id>, PMID: <pub-id pub-id-type="pmid">31912842</pub-id></citation></ref>
<ref id="ref85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ye</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Cui</surname> <given-names>H.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Cheng</surname> <given-names>Y.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>WEGO 2.0: a web tool for analyzing and plotting GO annotations, 2018 update</article-title>. <source>Nucleic Acids Res.</source> <volume>46</volume>, <fpage>W71</fpage>&#x2013;<lpage>W75</lpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gky400</pub-id>, PMID: <pub-id pub-id-type="pmid">29788377</pub-id></citation></ref>
<ref id="ref86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Z.</given-names></name> <name><surname>Gu</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>Y. Q.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name></person-group> (<year>2012</year>). <article-title>Genome plasticity and systems evolution in <italic>Streptomyces</italic></article-title>. <source>BMC Bioinformatics</source> <volume>13</volume>:<fpage>S8</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-13-S10-S8</pub-id>, PMID: <pub-id pub-id-type="pmid">22759432</pub-id></citation></ref></ref-list></back></article>