<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2021.632567</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>FiberGrowth Pipeline: A Framework Toward Predicting Fiber-Specific Growth From Human Gut Bacteroidetes Genomes</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Colnet</surname> <given-names>B&#x00E9;n&#x00E9;dicte</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1447190/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Sieber</surname> <given-names>Christian M. K.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/340813/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Perraudeau</surname> <given-names>Fanny</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1447157/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Leclerc</surname> <given-names>Marion</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/135065/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Pendulum Therapeutics</institution>, <addr-line>San Francisco, CA</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Mines Paristech</institution>, <addr-line>Paris</addr-line>, <country>France</country></aff>
<aff id="aff3"><sup>3</sup><institution>Universit&#x00E9; Paris Saclay, INRAe, AgroParisTech, Micalis Institute</institution>, <addr-line>Jouy en Josas</addr-line>, <country>France</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Thierry Giardina, Aix-Marseille Universit&#x00E9;, France</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Phil B. Pope, Norwegian University of Life Sciences, Norway; Arnaud Basle, Newcastle University, United Kingdom</p></fn>
<corresp id="c001">&#x002A;Correspondence: Marion Leclerc, <email>marion.leclerc@inrae.fr</email></corresp>
<fn fn-type="equal" id="fn002"><p><sup>&#x2020;</sup>These authors have contributed equally to this work</p></fn>
<fn fn-type="other" id="fn004"><p>This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>10</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>632567</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>11</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>06</day>
<month>09</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Colnet, Sieber, Perraudeau and Leclerc.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Colnet, Sieber, Perraudeau and Leclerc</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Dietary fibers influence colonic health through the production of short-chain fatty acids. A low-fiber diet has been linked to lower bacterial diversity, obesity, type 2 diabetes, and promotion of mucosal pathogens. Glycoside hydrolases (GHs) are important enzymes involved in the bacterial catabolism of fiber into short-chain fatty acids. However, the GHs involved in glycan breakdown (adhesion, hydrolysis, and fermentation) are organized in polysaccharide utilization loci (PULs) with complex modularity. Our goal was to explore how the capacity of strains from the Bacteroidetes phylum to grow on fiber could be predicted from their genome sequences. We designed an <italic>in silico</italic> pipeline called FiberGrowth and independently validated it for seven different fibers on 28 genomes from Bacteroidetes type strains. To do so, we compared the existing GH annotation tools and built PUL models by using published growth and gene expression data. FiberGrowth&#x2019;s prediction performance in terms of true positive rate (TPR) and false positive rate (FPR) strongly depended on available data and fiber: arabinoxylan (TPR: 0.89 and FPR: 0), inulin (0.95 and 0.33), heparin (0.8 and 0.22), laminarin (0.38 and 0.17), levan (0.3 and 0.06), mucus (0.13 and 0.38), and starch (0.73 and 0.41). Being able to better predict fiber breakdown by bacterial strains would help to understand their impact on human nutrition and health. With further gene expression experiments and discoveries from structural analyses, we hope that computational tools like FiberGrowth will help researchers prioritize and design <italic>in vitro</italic> experiments.</p>
</abstract>
<kwd-group>
<kwd>PUL</kwd>
<kwd>human gut bacteria</kwd>
<kwd>fiber</kwd>
<kwd>prebiotics</kwd>
<kwd>annotation</kwd>
<kwd>glycosyl hydrolase</kwd>
<kwd>growth prediction</kwd>
<kwd>Bacteroidetes</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="1"/>
<equation-count count="2"/>
<ref-count count="69"/>
<page-count count="13"/>
<word-count count="11199"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="S1">
<title>Introduction</title>
<p>The human large intestine supports an extremely dense and diverse microbial community&#x2014;up to 100 trillion individuals&#x2014;known as the gut microbiota (<xref ref-type="bibr" rid="B5">B&#x00E4;ckhed et al., 2005</xref>). Over the last decade, the microbiome has been shown to play an important role in human health, and numerous studies have documented the link between microbiota composition and metabolic diseases, such as type 2 diabetes (T2D) (<xref ref-type="bibr" rid="B39">Qin et al., 2012</xref>), obesity, colorectal cancer (<xref ref-type="bibr" rid="B56">Thomas et al., 2019</xref>), immune response to treatment (<xref ref-type="bibr" rid="B50">Tanoue et al., 2019</xref>), and inflammatory bowel diseases such as Crohn&#x2019;s disease (<xref ref-type="bibr" rid="B22">Henke et al., 2019</xref>), and also athletic performance (<xref ref-type="bibr" rid="B43">Scheiman et al., 2019</xref>). Microbial dysbiosis has been correlated to modern lifestyle, environmental parameters, medication, and western diet (<xref ref-type="bibr" rid="B35">Mosca et al., 2016</xref>). Indeed, one of the main drivers of microbiota composition has been shown to be diet, with long-term differences and fast responses to drastic diet changes, both in the metagenome and in the transcriptome (<xref ref-type="bibr" rid="B18">Filippo et al., 2010</xref>; <xref ref-type="bibr" rid="B10">David et al., 2013</xref>; <xref ref-type="bibr" rid="B51">Tap et al., 2015</xref>). One of the parameters playing a part in the booming number of individuals affected by metabolic disorders is the reduction of polysaccharide diversity in the day-to-day diet. As an example, migration from a non-Western country to the United States has been associated with immediate loss of gut microbiome diversity and function (<xref ref-type="bibr" rid="B58">Vangay et al., 2018</xref>). 
The mechanism of action is the depletion of dietary fibers (a nutrient category that includes a broad array of polysaccharides that are not digestible by human enzymes) in the diets of industrialized countries (<xref ref-type="bibr" rid="B7">Burkitt et al., 1972</xref>; <xref ref-type="bibr" rid="B17">Faith et al., 2011</xref>; <xref ref-type="bibr" rid="B46">Sonnenburg and Sonnenburg, 2014</xref>; <xref ref-type="bibr" rid="B68">Zmora et al., 2018</xref>).</p>
<p>However, the responses to a given diet are characterized by a large and not-yet-understood individual variability (<xref ref-type="bibr" rid="B30">Leshem et al., 2020</xref>) that complicates the design of specific diets or targeted foods and the understanding of glycan breakdown. The human microbiota produces complementary enzymes enabling the depolymerization and hydrolysis of dietary polysaccharides into sugars that can further be fermented into short-chain fatty acids (SCFAs). The enzymes completing this task, named carbohydrate-active enzymes (CAZymes), are involved in complex metabolic networks for the synthesis [glycosyltransferases (GTs)], degradation [glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), and auxiliary activity enzymes (AAs)], and recognition [carbohydrate-binding modules (CBMs)] of all the carbohydrates on Earth. CAZymes are found in all living organisms (typically 1&#x2013;3% of the gene content) and are particularly abundant (more than 3% of the gene content) in plants and microbes; for example, <italic>Bacteroides thetaiotaomicron</italic> encodes 391 CAZymes, which represent 8.2% of its gene content. In humans who ate more fiber for 5 days, the expression of GHs related to dietary fiber increased, while mucus-degrading GHs were downregulated (<xref ref-type="bibr" rid="B51">Tap et al., 2015</xref>). Another argument supporting the link between CAZyme gene presence and strains&#x2019; metabolic capacities is that CAZyme gene count and diversity follow phylogeny closely enough to act almost as a classifier of bacterial genomes (<xref ref-type="bibr" rid="B16">El Kaoutari et al., 2013</xref>). 
CAZyme gene expression was demonstrated <italic>in vitro</italic> using RNA-seq when strains were grown on specific polysaccharides (<xref ref-type="bibr" rid="B41">Rey et al., 2010</xref>; <xref ref-type="bibr" rid="B34">Martens et al., 2011</xref>; <xref ref-type="bibr" rid="B44">Scott et al., 2011</xref>). However, except for the Bacteroidetes phylum, very few enzymatic systems have been described and characterized so far. Annotating the glycosyl hydrolase genes and their loci organization/synteny is essential to characterize the capacity of the human gut microbiota to break down glycans. Today, the CAZy database of sequences and subfamilies is a reference for such genes. The database is curated by an academic laboratory through manual expert editing of the annotation (<xref ref-type="bibr" rid="B9">Cantarel et al., 2008</xref>), while an open-source annotation tool called dbCAN is also available (<xref ref-type="bibr" rid="B65">Yin et al., 2012</xref>; <xref ref-type="bibr" rid="B66">Zhang et al., 2018</xref>).</p>
<p>Annotating CAZyme genes is not an easy task because of the modularity of the gene structure (<xref ref-type="bibr" rid="B9">Cantarel et al., 2008</xref>). Beyond the difficulty of annotating CAZyme genes, how to link metabolic disorders (phenotype level) to the genomic content of the microbiome remains an open question. Several publications reported that the abundance of CAZyme genes in genomes could account for a richer metabolic network able to degrade different fibers or carbohydrates. Although links between specific GHs and specific fibers have been documented by several authors (<xref ref-type="bibr" rid="B67">Zhao et al., 2018</xref>; <xref ref-type="bibr" rid="B28">Kovatcheva-Datchary et al., 2019</xref>), a consistent and complete list of CAZyme genes associated with a specific fiber is currently unavailable to any newcomer wishing to understand the metabolic capacities of a strain from its genome sequence (as a starting point).</p>
<p>In addition to the complicated underlying link between carbohydrate breakdown and specific CAZy genes, taking into account the polysaccharide utilization loci (PULs) appears critical to understand the capacity of strains to hydrolyze carbohydrates, in particular for the Bacteroidetes (<xref ref-type="bibr" rid="B29">Lap&#x00E9;bie et al., 2019</xref>). Specifically, within the Bacteroidetes phylum, PULs are reported to be the genomic regions that encode the capacity to attach to the fiber, degrade it, and import the resulting oligomers. The term PUL was first coined by <xref ref-type="bibr" rid="B6">Bjursell et al. (2006)</xref> to describe clusters of colocalized, coregulated genes that encode functions such as detection, sequestration, enzymatic digestion, and transport of complex carbohydrates (<xref ref-type="bibr" rid="B33">Martens et al., 2009</xref>; <xref ref-type="bibr" rid="B21">Grondin et al., 2017</xref>). Indeed, the PULs encode a complement of cell surface glycan-binding proteins (SGBPs), TonB-dependent transporters (TBDTs), CAZymes (most frequently GHs and also PLs and CEs), and carbohydrate sensors/transcriptional regulators. Although less frequently reported in the literature, a similar structure (gpPULs) has also been described in Gram-positive butyrate-producing species belonging to the Firmicutes (<xref ref-type="bibr" rid="B45">Sheridan et al., 2015</xref>). Therefore, glycosyl hydrolases are encoded within operons that are not taken into account when simply annotating CAZymes. Currently, obtaining PUL annotation from a genome is not straightforward, and we found two available published resources. The PULDB database (<xref ref-type="bibr" rid="B54">Terrapon et al., 2017</xref>) of experimentally validated and predicted PULs in Bacteroidetes is built as an extension to the CAZy database. The other one is a prediction tool called PULpy, which identifies CAZymes that are co-localized with susCD gene pairs. 
The authors present their tool as a public version of the PULDB algorithm (<xref ref-type="bibr" rid="B48">Stewart et al., 2018</xref>). These current resources have drawbacks for non-experts: the first database has to be queried with either fibers or known species. Therefore, it is hardly usable for any new isolated strain. In addition, some interesting polysaccharides with prebiotic properties, such as inulin, are missing. The other tool could help since it provides an algorithm searching for hits similar to PULDB. However, the output is a prediction of a potential PUL with a number referring to the PULDB number and not specifically to a carbohydrate.</p>
<p>Taking this context into account, the goal of this manuscript is to attempt to bridge the gap between the microbiome metabolic capacities and strains&#x2019; genomes using a predictive model. Our approach is based on a simple microbiological standpoint where we consider the strains&#x2019; ability to grow on a specific carbohydrate as a measure of their metabolic capacities. This proxy provides a simple experimental measure that, we hypothesize, can be predicted from the specific genomic content. Hence, from a genome sequence, microbiologists could obtain a prediction of the metabolic capacities of a strain without having to grow it on the fiber. The proof of principle is based on a benchmark data set that documented the growth differences of strains on different carbohydrates. We show that by taking PUL structures into account, our FiberGrowth tool improves growth prediction in comparison to relying only on CAZyme annotations of single genes.</p>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="S2.SS1">
<title>Carbohydrate-Active Enzyme Annotation</title>
<p>Annotation of CAZymes was done using the open-source dbCAN2 pipeline (<xref ref-type="bibr" rid="B65">Yin et al., 2012</xref>; <xref ref-type="bibr" rid="B66">Zhang et al., 2018</xref>), which relies on three different algorithms: (1) hidden Markov models (HMMER) (<xref ref-type="bibr" rid="B15">Eddy, 2011</xref>), (2) alignment (DIAMOND) (<xref ref-type="bibr" rid="B24">Buchfink et al., 2015</xref>), and (3) peptide recognition (Hotpep) (<xref ref-type="bibr" rid="B8">Busk et al., 2017</xref>). The output lists, for each of the three algorithms, the genes assigned to a CAZy family. We chose to use the majority consensus rule: when a hit is found by at least two of the three algorithms, the gene is considered a CAZy gene. As expected, the alignment algorithm (DIAMOND) provides more hits because of the modular structure of GH genes. In this manuscript, the dbCAN2 pipeline was used with the default settings.</p>
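<p>The majority consensus rule described above can be illustrated with a minimal Python sketch. It is not part of the dbCAN2 or FiberGrowth code: the function name, the dictionary-based input layout, and the example gene identifiers are hypothetical, and the sketch assumes that the family labels reported by the supporting tools are simply pooled.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def consensus_cazymes(hits_by_tool, min_tools=2):
    """Keep genes reported by at least `min_tools` annotation tools.

    hits_by_tool maps a tool name to {gene_id: family} (hypothetical layout).
    Returns {gene_id: sorted list of families reported for that gene}.
    """
    support = defaultdict(set)   # gene_id -> tools reporting a hit
    families = defaultdict(set)  # gene_id -> families reported for the gene
    for tool, hits in hits_by_tool.items():
        for gene_id, family in hits.items():
            support[gene_id].add(tool)
            families[gene_id].add(family)
    return {
        gene_id: sorted(families[gene_id])
        for gene_id, tools in support.items()
        if len(tools) >= min_tools
    }


# Hypothetical example: gene_3 is reported by DIAMOND only and is dropped.
example = {
    "HMMER": {"gene_1": "GH43", "gene_2": "GH13"},
    "DIAMOND": {"gene_1": "GH43", "gene_3": "GH32"},
    "Hotpep": {"gene_2": "GH13"},
}
calls = consensus_cazymes(example)
```
]]></preformat>
<p>With a threshold of two tools, a gene reported only by the alignment step is excluded, which limits the extra hits that alignment tends to produce on modular genes.</p>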
<p>In addition, we obtained CAZyme annotations from the CAZy database (B. Henrissat). The annotation was done in two steps. First, a BLASTP analysis of the predicted ORFs against the full-length sequences included in the CAZy database was performed (<xref ref-type="bibr" rid="B9">Cantarel et al., 2008</xref>). Second, the remaining sequences were manually analyzed by both (i) a BLAST search against individual GH, PL, CE, CBM, and GT modules and (ii) a HMMER3 search using hidden Markov models built for each CAZy module family. Raw CAZy annotations are presented in <xref ref-type="supplementary-material" rid="DS2">Supplementary Tables S1</xref>, <xref ref-type="supplementary-material" rid="DS2">S2</xref>. Strains were selected according to the available growth dataset used in this manuscript.</p>
</sec>
<sec id="S2.SS2">
<title>Glycoside Hydrolase Annotation Comparisons</title>
<p>The comparison between CAZy and dbCAN2 annotations was performed on 54 genomes on which 87 different GH families were screened (<xref ref-type="supplementary-material" rid="DS2">Supplementary Table S3</xref>) on the family level (e.g., the subfamily GH43_1 was considered as GH43 family).</p>
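<p>As an illustration of the family-level comparison, the following Python sketch collapses subfamily labels (e.g., GH43_1) to their family (GH43) before comparing two annotation sets. The regular expression, the function names, and the example labels are assumptions made for illustration and do not reproduce the scripts used for the comparison.</p>
<preformat preformat-type="code"><![CDATA[
```python
import re

# Assumed label pattern: family prefix and number, then "_<subfamily number>".
_SUBFAMILY = re.compile(r"^([A-Z]+\d+)_\d+$")


def to_family(label):
    """Collapse a subfamily label such as 'GH43_1' to its family 'GH43'."""
    match = _SUBFAMILY.match(label)
    return match.group(1) if match else label


def compare_annotations(cazy_labels, dbcan_labels):
    """Compare two annotation sets at the family level."""
    cazy = {to_family(label) for label in cazy_labels}
    dbcan = {to_family(label) for label in dbcan_labels}
    return {
        "shared": sorted(cazy & dbcan),
        "cazy_only": sorted(cazy - dbcan),
        "dbcan_only": sorted(dbcan - cazy),
    }


# Hypothetical labels for a single genome.
result = compare_annotations(["GH43_1", "GH13"], ["GH43_12", "GH32"])
```
]]></preformat>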
</sec>
<sec id="S2.SS3">
<title>Growth Prediction Using Only Glycoside Hydrolase</title>
<p>To test how growth prediction performs with only one GH, we gathered such associations from four different publications (<xref ref-type="bibr" rid="B16">El Kaoutari et al., 2013</xref>; <xref ref-type="bibr" rid="B37">Park et al., 2018</xref>; <xref ref-type="bibr" rid="B67">Zhao et al., 2018</xref>; <xref ref-type="bibr" rid="B28">Kovatcheva-Datchary et al., 2019</xref>). The results are shown in <xref ref-type="supplementary-material" rid="DS1">Supplementary Figure S1</xref>.</p>
</sec>
<sec id="S2.SS4">
<title>Building Fiber-Specific Polysaccharide Utilization Loci Models</title>
<p>PUL models were constructed using CAZyme genes, publicly available gene expression data, and previously published data from growth experiments with RNA expression measurements (<xref ref-type="fig" rid="F1">Figure 1</xref>). In the first step, we created candidate PULs based on previously published literature showing an association between CAZymes and fiber metabolism (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figure S2</xref>; <xref ref-type="bibr" rid="B16">El Kaoutari et al., 2013</xref>; <xref ref-type="bibr" rid="B37">Park et al., 2018</xref>; <xref ref-type="bibr" rid="B67">Zhao et al., 2018</xref>; <xref ref-type="bibr" rid="B28">Kovatcheva-Datchary et al., 2019</xref>). We then refined the gene composition of the candidate PULs by analyzing available gene expression data from growth experiments with fiber-enriched media (<xref ref-type="supplementary-material" rid="DS2">Supplementary Table S4</xref>). Co-expressed neighboring genes of the candidate PULs were added, and genes without significant change in gene expression were removed. In addition, the genome sequences of 12 strains in combination with available growth data (<xref ref-type="bibr" rid="B12">Desai et al., 2016</xref>) were used to identify variations in gene composition and order for each PUL model. We then retrieved the gene family hidden Markov models (HMMs) for each gene of the PUL models from Pfam. If no Pfam annotation was available, we built a custom HMM by searching NCBI nr (<xref ref-type="bibr" rid="B42">Sayers et al., 2020</xref>) for orthologous genes. The retrieved amino acid sequences were aligned using ClustalW (default parameters) (<xref ref-type="bibr" rid="B57">Thompson et al., 1994</xref>), and HMMs were built using HMMER3 (<xref ref-type="bibr" rid="B15">Eddy, 2011</xref>). Each PUL model was tested against the genome from which it was inferred (the majority of the PULs were inferred using the data on <italic>B. 
thetaiotaomicron</italic> DSM 2079) as a positive control confirming that the complete gene cluster was found. Because of the limited available RNA data, only seven PULs were built, for arabinoxylan, inulin, heparin, laminarin, levan, mucus, and starch (<xref ref-type="table" rid="T1">Table 1</xref> and <xref ref-type="supplementary-material" rid="DS2">Supplementary Table S5</xref> for genes included in each PUL).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>FiberGrowth pipeline strategy and proof of concept. The different steps of the pipeline are shown, highlighting the iterations between computing and integration of microbiology data. The PUL models, once designed, can then be used to process genomes within minutes. See Prediction of Polysaccharide Utilization Loci Using FiberGrowth in &#x201C;Materials and Methods&#x201D; section for additional pipeline specifications.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmicb-12-632567-g001.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Performance of the FiberGrowth tool: the growth predictions for 28 genomes growing on seven fibers were compared to experimental data.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Fiber</td>
<td valign="top" align="center">True positive rate (TPR)</td>
<td valign="top" align="center">False positive rate (FPR)</td>
<td valign="top" align="center">Precision</td>
<td valign="top" align="center">Recall</td>
<td valign="top" align="center">Number of genomes (<italic>n</italic>)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Arabinoxylan</td>
<td valign="top" align="center">0.89</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">1.00</td>
<td valign="top" align="center">0.89</td>
<td valign="top" align="center">28</td>
</tr>
<tr>
<td valign="top" align="left">Heparin</td>
<td valign="top" align="center">0.80</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.67</td>
<td valign="top" align="center">0.80</td>
<td valign="top" align="center">28</td>
</tr>
<tr>
<td valign="top" align="left">Inulin</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">0.33</td>
<td valign="top" align="center">0.91</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">28</td>
</tr>
<tr>
<td valign="top" align="left">Laminarin</td>
<td valign="top" align="center">0.38</td>
<td valign="top" align="center">0.17</td>
<td valign="top" align="center">0.75</td>
<td valign="top" align="center">0.38</td>
<td valign="top" align="center">28</td>
</tr>
<tr>
<td valign="top" align="left">Levan</td>
<td valign="top" align="center">0.30</td>
<td valign="top" align="center">0.06</td>
<td valign="top" align="center">0.75</td>
<td valign="top" align="center">0.30</td>
<td valign="top" align="center">28</td>
</tr>
<tr>
<td valign="top" align="left">Mucin</td>
<td valign="top" align="center">0.13</td>
<td valign="top" align="center">0.38</td>
<td valign="top" align="center">0.29</td>
<td valign="top" align="center">0.13</td>
<td valign="top" align="center">28</td>
</tr>
<tr>
<td valign="top" align="left">Starch</td>
<td valign="top" align="center">0.73</td>
<td valign="top" align="center">0.41</td>
<td valign="top" align="center">0.53</td>
<td valign="top" align="center">0.73</td>
<td valign="top" align="center">28</td>
</tr>
</tbody>
</table></table-wrap>
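<p>The indicators reported in Table 1 can be derived from paired binary vectors of predicted and observed growth. The Python sketch below shows one possible computation, including the 0.01 cutoff on normalized growth used to binarize the experimental data in the validation. The function names and the example values are hypothetical and are not taken from the benchmark dataset.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def binarize_growth(normalized_growth, threshold=0.01):
    """Return 1 (growth) when normalized growth is above the threshold, else 0."""
    return [1 if value > threshold else 0 for value in normalized_growth]


def confusion_rates(predicted, observed):
    """Compute TPR, FPR, and precision from paired binary vectors."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p == 1 and o == 1)
    fp = sum(1 for p, o in pairs if p == 1 and o == 0)
    fn = sum(1 for p, o in pairs if p == 0 and o == 1)
    tn = sum(1 for p, o in pairs if p == 0 and o == 0)

    def ratio(numerator, denominator):
        # Undefined rates (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "TPR": ratio(tp, tp + fn),        # identical to recall
        "FPR": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
        "n": len(pairs),
    }


# Hypothetical values for six strains on one fiber.
observed = binarize_growth([0.30, 0.12, 0.05, 0.00, 0.005, 0.01])
predicted = [1, 1, 0, 0, 1, 0]
rates = confusion_rates(predicted, observed)
```
]]></preformat>
<p>Recall is by definition identical to the TPR, which is why the two columns of Table 1 coincide.</p>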
</sec>
<sec id="S2.SS5">
<title>Prediction of Polysaccharide Utilization Loci Using FiberGrowth</title>
<p>The FiberGrowth tool automatically predicts growth using fiber-specific PULs on a given bacterial genome. As input, it takes either a genome in FASTA format or a GFF file with gene locations together with the corresponding amino acid sequences in FASTA format. If only a genome is provided, gene prediction is performed using Prodigal (<xref ref-type="bibr" rid="B25">Hyatt et al., 2010</xref>). In the next step, members of each fiber-specific reference PUL are identified using hmmscan from the HMMER package (<xref ref-type="bibr" rid="B15">Eddy, 2011</xref>). Based on location and function, spatially clustered carbohydrate-active enzyme genes are determined by performing single-linkage hierarchical clustering on the gene positions using Euclidean distance. PUL candidates are retrieved by applying a 5-kb threshold to the gene distance tree, which allows unannotated genes to be part of a candidate PUL. In the last step, candidates having all required core genes are reported (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figure S0</xref>). Running the FiberGrowth tool on one bacterial genome takes about 1 min on a 2.3-GHz Intel Core i9 using one core. FiberGrowth is implemented in R (<xref ref-type="bibr" rid="B40">R Core Team, 2018</xref>) and makes use of the packages data.table (<xref ref-type="bibr" rid="B14">Dowle and Srinivasan, 2019</xref>), docopt (<xref ref-type="bibr" rid="B11">de Jonge, 2020</xref>), DT (<xref ref-type="bibr" rid="B64">Xie et al., 2020</xref>), gggenes (<xref ref-type="bibr" rid="B62">Wilkins, 2019</xref>), ggplot2 (<xref ref-type="bibr" rid="B61">Wickham, 2016</xref>), knitr (<xref ref-type="bibr" rid="B63">Xie, 2015</xref>), magrittr (<xref ref-type="bibr" rid="B4">Bache and Wickham, 2014</xref>), rhmmer (<xref ref-type="bibr" rid="B3">Arendsee, 2017</xref>), rmarkdown (<xref ref-type="bibr" rid="B1">Allaire et al., 2020</xref>), and vroom (<xref ref-type="bibr" rid="B23">Hester and Wickham, 2020</xref>). 
The FiberGrowth code and PUL models are available on GitHub at <ext-link ext-link-type="uri" xlink:href="https://github.com/wholebiome/FiberGrowth">https://github.com/wholebiome/FiberGrowth</ext-link>.</p>
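<p>The clustering and core-gene steps described above can be sketched in Python as follows. This is an illustrative simplification and not the FiberGrowth implementation, which is written in R. It assumes that each hit is represented by a single coordinate on its contig, and it uses the fact that, in one dimension, cutting a single-linkage tree at 5 kb is equivalent to splitting the sorted hits wherever two consecutive hits are more than 5 kb apart. The function names, the tuple layout, and the example core gene set are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def cluster_hits(hits, max_gap=5000):
    """Group HMM hits into spatial clusters.

    hits: iterable of (contig, position, family) tuples (assumed layout).
    Consecutive hits on the same contig stay in one cluster while the gap
    between them is at most max_gap, which matches a single-linkage cut
    at max_gap for one-dimensional positions.
    """
    clusters = []
    for hit in sorted(hits, key=lambda h: (h[0], h[1])):
        last = clusters[-1][-1] if clusters else None
        if last is not None and last[0] == hit[0] and hit[1] - last[1] <= max_gap:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters


def candidate_puls(hits, core_families, max_gap=5000):
    """Report clusters that contain every required core family."""
    core = set(core_families)
    return [
        cluster
        for cluster in cluster_hits(hits, max_gap)
        if core <= {family for _, _, family in cluster}
    ]


# Hypothetical hits; the core gene set below is for illustration only.
hits = [
    ("contig_1", 1000, "SusC"),
    ("contig_1", 4200, "SusD"),
    ("contig_1", 7900, "GH32"),
    ("contig_1", 40000, "GH32"),
    ("contig_2", 500, "SusC"),
]
candidates = candidate_puls(hits, core_families={"SusC", "SusD", "GH32"})
```
]]></preformat>
<p>Because only annotated hits are clustered, any unannotated gene located between two hits of the same cluster falls inside the candidate locus, consistent with the description above.</p>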
</sec>
<sec id="S2.SS6">
<title>Validation of FiberGrowth Pipeline With External Bacterial Growth Datasets</title>
<p>We compared the predictions of FiberGrowth to new external experimental measures of 28 bacterial strains&#x2019; abilities to degrade a wide variety of dietary and host-derived polysaccharides performed by Eric Martens (unpublished data, kindly shared to benchmark FiberGrowth performance) and from a previous work (<xref ref-type="bibr" rid="B12">Desai et al., 2016</xref>; <xref ref-type="supplementary-material" rid="DS2">Supplementary Table S6</xref>). Briefly, these authors documented the growth of 534 strains, including 28 type strains for which the genome sequence is available, on several polysaccharides and glycans as sole carbon sources (<italic>n</italic> = 2 replicate cultures per glycan substrate). Strains were grown on (i) a glucose-rich growth medium (PYG), (ii) a carbon-free minimal medium (PY), and (iii) a minimal medium with a polysaccharide as the only carbon source (PY + polysaccharide). The growth (OD<sub>600 nm</sub>) was recorded every 10 min when cultures were grown on PY + polysaccharide medium vs. PY medium only. The custom carbohydrate array was formulated according to <xref ref-type="bibr" rid="B34">Martens et al. (2011)</xref>. We transformed the normalized growth results into a binary table filled with 0&#x2019;s (no growth) and 1&#x2019;s (growth) for each combination of strain and substrate (<xref ref-type="supplementary-material" rid="DS2">Supplementary Table S6</xref>). If the normalized growth was above 0.01, the strain was considered able to metabolize the substrate. Otherwise, the strain was considered unable to degrade the substrate. The performance of the pipeline was then assessed by comparing the predicted growth with the experimental growth using two indicators, the true positive rate (TPR, also called sensitivity) and the false positive rate (FPR), defined as</p>
<disp-formula id="S2.Ex1"><mml:math id="M1" display="block">
<mml:mrow>
<mml:mrow>
<mml:mtext>TPR</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:mtext>Number of true positive predictions</mml:mtext>
</mml:mrow>
<mml:mtable rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mtext>Number of true positive predictions</mml:mtext>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>r</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>f</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mfrac>
</mml:mstyle>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S2.Ex2"><mml:math id="M2" display="block">
<mml:mrow>
<mml:mrow>
<mml:mtext>FPR</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+5pt">
<mml:mi>r</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+5pt">
<mml:mi>f</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+5pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+5pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mtable rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>r</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>f</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>r</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>f</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+3.3pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo mathvariant="italic" separator="true">&#x2003;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mfrac>
</mml:mstyle>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>The TPR and the FPR illustrate different properties of a predictor. For example, if the goal is to find at least one strain growing on a substrate, a low FPR is a good characteristic. In other words, when a predictor with a low FPR labels a strain as growing, this prediction can be trusted with good confidence. If, however, the goal is not to miss any strain that has the potential to grow on a substrate, then a predictor with a high TPR is preferred.</p>
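<p>As a minimal illustrative sketch (not part of the FiberGrowth pipeline itself), the two rates defined above can be computed from paired prediction and observation calls. The function name <monospace>rates</monospace> and the example vectors are assumptions introduced here for illustration only; the counts (28 strains, nine growing, one of them missed, no false positive) are hypothetical inputs chosen so that the result can be checked by hand.</p>
<preformat>
```python
def rates(predicted, observed):
    """Return (TPR, FPR) for paired boolean growth calls.

    predicted[i] is True when growth is predicted for strain i,
    observed[i] is True when growth was measured for strain i.
    """
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # true positives
    fn = sum(1 for p, o in pairs if not p and o)      # false negatives
    fp = sum(1 for p, o in pairs if p and not o)      # false positives
    tn = sum(1 for p, o in pairs if not p and not o)  # true negatives
    # Guard against empty classes (no growing or no non-growing strains).
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Hypothetical example: 28 strains, 9 grow, 8 of those are predicted to grow,
# and no non-growing strain is predicted to grow.
observed = [True] * 9 + [False] * 19
predicted = [True] * 8 + [False] * 20
tpr, fpr = rates(predicted, observed)
print(round(tpr, 2), fpr)  # prints: 0.89 0.0
```
</preformat>
<p>With these inputs, eight of the nine growing strains are recovered (TPR of 8/9) and none of the 19 non-growing strains is labeled as growing (FPR of 0), so a positive call from such a predictor would be highly reliable.</p>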
</sec>
</sec>
<sec sec-type="results" id="S3">
<title>Results</title>
<sec id="S3.SS1">
<title>CAZy and dbCAN Annotation Comparison</title>
<p>Since two possible strategies exist to annotate glycosyl hydrolases, the automatic open-source pipeline dbCAN and a manually curated CAZyme database, we assessed the annotation differences between them. Annotations were performed on 54 different Bacteroidetes genomes while screening for 87 different GH families. The majority of the predictions regarding GH gene counts were identical, and the diversity of the GH was evidenced by the large number of GH detected per genome, ranging from 100 to 400 (<xref ref-type="fig" rid="F2">Figure 2A</xref>). However, the two annotation tools showed discrepancies. Of a total of 2,889 genes detected across the 54 genomes, 107 were detected only by CAZy and 49 only by dbCAN (<xref ref-type="fig" rid="F2">Figure 2B</xref>). Beyond the gene count differences, it is noteworthy that, under our settings, some GH families were not detected by one of the tools. For example, CAZy detected GH24 and GH142 while dbCAN did not. Conversely, GH99 was annotated by dbCAN but not by CAZy for this dataset. A detailed visualization is available in <xref ref-type="supplementary-material" rid="DS1">Supplementary Figure S3</xref>. The details of the GH families found by only one of the methods can be found in <xref ref-type="supplementary-material" rid="DS2">Supplementary Table S3</xref>. Using only the glycosyl hydrolase count to predict growth did not lead to meaningful results, with too many false positives (see <xref ref-type="supplementary-material" rid="DS1">Supplementary Figures S3</xref>, <xref ref-type="supplementary-material" rid="DS1">S4</xref>).</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Number of common and different GH counts obtained by CAZy and dbCAN annotation methods. Annotations were performed on 54 different Bacteroidetes genomes while screening for 87 different GH families. <bold>(A)</bold> A similar plot to <xref ref-type="bibr" rid="B16">El Kaoutari et al. (2013)</xref> highlighting a trend between GH family diversity and the total GH gene count per genome. <bold>(B)</bold> A Venn diagram displaying, among all the GH predicted by dbCAN or CAZy, the number of common ones and different ones. Here, 2,733 genes were identically annotated by the two tools, while the manually curated CAZy brought 107 different annotations and dbCAN 49 others. Note that CAZy returned more GH genes than dbCAN.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmicb-12-632567-g002.tif"/>
</fig>
</sec>
<sec id="S3.SS2">
<title>FiberGrowth Prediction: Proof of Concept and Performance</title>
<p>We designed seven fiber-specific PUL models that are not specific to a certain genome and thus enable growth prediction on newly sequenced strains. The number of genes taken into account for a PUL model varies between 7 and 13, depending on the fiber (<xref ref-type="fig" rid="F3">Figure 3</xref> and <xref ref-type="supplementary-material" rid="DS2">Supplementary Table S5</xref>).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Polysaccharide utilization loci (PUL) gene models and fiber structures. Gene annotations and organization of each PUL model are shown on the left. Chemical structures of the associated fibers are drawn on the right.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmicb-12-632567-g003.tif"/>
</fig>
<p>To determine the predictive performance of our method, we compared the predictions with the experimental data for 28 strains, grown on PYG medium with or without fiber (<xref ref-type="fig" rid="F4">Figure 4</xref>). One striking characteristic of the experimental growth data was that, apart from inulin, very few strains grew on certain fibers, leading to result tables with many zeros. Furthermore, based on our preliminary 0.1 OD<sub>600 nm</sub> threshold, no growth at all was observed in the experimental data for <italic>Bacteroides intestinalis</italic>, and <italic>Bacteroides vulgatus</italic>, <italic>Bacteroides caccae</italic>, <italic>Bacteroides massiliensis</italic>, and <italic>Bacteroides fragilis</italic> only grew on one substrate. To better fit the experimental results, a 0.01 OD<sub>600 nm</sub> threshold was used. Aware of these limits originating from the experimental data, we calculated FiberGrowth performance for each PUL model (<xref ref-type="fig" rid="F4">Figure 4</xref> and <xref ref-type="table" rid="T1">Table 1</xref>).</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Comparison of FiberGrowth PUL-based predictions with experimental growth data for 28 strains from the <italic>Bacteroidales</italic> order. For each strain, the prediction and experimental result of growth (green) or absence of growth (black) on different growth media are shown. The dendrogram is calculated from the experimental growth results.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmicb-12-632567-g004.tif"/>
</fig>
<p>The best performance was measured for arabinoxylan with a TPR of 0.89 and FPR of 0. Only nine strains grew on arabinoxylan and only one false negative was detected (for <italic>Bacteroides cellulosilyticus</italic>). Our model, designed based on <xref ref-type="bibr" rid="B32">Martens et al. (2008)</xref>, takes into account a total of 10 genes from the operon, including the regulator of a two-component system (<xref ref-type="fig" rid="F3">Figure 3</xref>). Here, the information is taken from <xref ref-type="fig" rid="F2">Figure 2</xref> (expression data for genes bacova_03417, bacova_03421-36, bacova_03437-40, and bacova_03448-50) and <xref ref-type="supplementary-material" rid="DS2">Supplementary Table S2</xref> (which specifies the three necessary enzymes) with <italic>Bacteroides ovatus</italic> gene information. Note that in this reference, <italic>B. thetaiotaomicron</italic> is reported not to grow, whereas <italic>B. ovatus</italic> is reported to grow owing to expression of the PUL, which is confirmed by the prediction (see <xref ref-type="fig" rid="F4">Figure 4</xref>).</p>
<p>For heparin (TPR: 0.8; FPR: 0.22), the core genes GH88, PL15, and PL13 and a sulfatase are part of the model. Adjacent to it, the antisense PL13, which is mostly adapted to high-sulfate regions, is also part of the model. Both PL13 and PL12 (the latter is not included in the model) produce small oligosaccharides that only the exo-processive lyase PL15 is able to degrade. GH88 belongs to a family of enzymes that cleave the glycosidic linkage between the &#x0394;4,5-unsaturated UA and GlcN/GlcNAc disaccharides.</p>
<p>The model for inulin showed the highest sensitivity (TPR: 0.95) with a FPR of 0.33. This was despite growth results opposite to those for arabinoxylan, since 22/28 strains grew on inulin. The inulin PUL model (<xref ref-type="fig" rid="F3">Figure 3</xref>) was built from <xref ref-type="bibr" rid="B34">Martens et al. (2011)</xref>, using four protein-encoding genes: GH32, a fructokinase domain, a transporter, susC/susD homolog, and a susHT domain. The RNA data used came from a growth experiment with <italic>B. thetaiotaomicron</italic>.</p>
<p>Levan, with a structure similar to inulin and a PUL composition that shared GH32 but included a specific levanase, led to a low sensitivity (TPR: 0.3) but also to a low number of false positives (FPR: 0.06). The model accurately predicted growth for <italic>Parabacteroides</italic>, <italic>Odoribacteraceae</italic>, and <italic>Dysgonomonadaceae</italic> but under-predicted growth for most <italic>Bacteroides</italic> genomes.</p>
<p>Performance of the mucin model was very low (TPR: 0.13; FPR: 0.38). The PUL model, inferred from gene expression data of strains growing on mucin, comprises 11 genes with GH18, GH16, and GH92. However, typical mucus-associated GH are missing (<xref ref-type="fig" rid="F3">Figure 3</xref>). From a phylogenetic standpoint, it is worth noting that the model predictions were worse for the non-<italic>Bacteroides</italic> genomes, being wrong for all <italic>Parabacteroides</italic> genomes and <italic>Dysgonomonas mossii</italic>.</p>
<p>Laminarin (&#x03B2;1-3 and &#x03B2;1-6-glucan), found in brown algae, is a storage glycan. The prediction performance was very low (TPR: 0.38; FPR: 0.17) based on the genes selected for the PUL. As for mucin, it is noteworthy that the predictions were all wrong for the eight non-<italic>Bacteroides</italic> genomes, which all grew on this substrate.</p>
<p>Surprisingly, the starch model led to a high sensitivity (TPR: 0.75) with a FPR of 0.41 as trade-off. This PUL model was designed using a widely recognized set of genes, consistent among the literature: the amylase GH13 and GH17 and the susD-RagB, susF-SusE, susR, and TonB. The model was trained using Bt transcriptomics data. However, the visualization of the new PUL shown in <xref ref-type="fig" rid="F5">Figure 5</xref> highlighted the lack of synteny of the starch PUL between closely related genomes. The model wrongly predicted growth of <italic>Bacteroides dorei</italic>, <italic>B. intestinalis</italic>, <italic>B. massiliensis</italic>, <italic>Bacteroides plebeius</italic>, <italic>B. vulgatus</italic>, <italic>Parabacteroides goldsteinii</italic>, and <italic>Parabacteroides gordonii</italic>. Conversely, it failed to predict growth of five strains: <italic>B. fragilis</italic>, <italic>Bacteroides fluxus</italic>, <italic>Bacteroides nordii</italic>, <italic>Bacteroides salyersae</italic>, and <italic>Dysgonomonas mossii</italic>.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Amylase PUL visualization across five <italic>Bacteroides</italic> genomes. For each strain, only the most complete PUL is represented. The reference to build the starch model came from <italic>Bacteroides thetaiotaomicron</italic>. This representation highlights the lack of synteny of the starch PUL within the genomes of strains belonging to the same genus.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmicb-12-632567-g005.tif"/>
</fig>
<p>Some insights can be obtained by taking into consideration the taxonomic differences of the genomes analyzed. Of the 28 genomes belonging to the <italic>Bacteroidales</italic> order, eight were from families other than the <italic>Bacteroidaceae</italic>. For the five <italic>Parabacteroides</italic> genomes analyzed, the model accurately predicted growth on inulin for all of them and also on levan, heparin, and starch for each of the two strains growing on it. The five strains grew on laminarin and mucin, but the model missed those five positive growth results. <italic>Parabacteroides johnsonii</italic> appeared separated on the clustering dendrogram because of its results on starch and growth on heparin but no growth on inulin. Interestingly, for the less studied <italic>Dysgonomonas</italic> genus (<italic>Dysgonomonadaceae</italic>), the strains <italic>Dysgonomonas gadei</italic> and <italic>Dysgonomonas mossii</italic> grew on three and four fibers, respectively, and were accurately predicted, except again for mucin for <italic>D. gadei</italic> and for starch for <italic>D. mossii</italic>. Finally, <italic>Odoribacter splanchnicus</italic> (<italic>Odoribacteraceae</italic>) did not grow on any substrates except inulin and laminarin, and both were wrongly predicted by the model.</p>
</sec>
</sec>
<sec sec-type="discussion" id="S4">
<title>Discussion</title>
<p>In this manuscript, we designed a pipeline to predict the growth of Bacteroidetes species from the human gut on seven different polysaccharides using a combination of <italic>in silico</italic> modeling and validation with microbiology data. To our knowledge, our work provides the first integrated pipeline to use PUL to investigate the growth of human gut strains on specific polysaccharides.</p>
<p>PUL models have already been described by <xref ref-type="bibr" rid="B55">Terrapon et al. (2015)</xref>, but the approach is different, centered on specific genomes to provide a unique model based on previous biochemical characterizations of the enzymes and proteins involved. For taxonomic assignment and phylogenetic placement of existing GH or new GH, the SACCHARIS pipeline automatically annotates GH and provides accurate phylogenetic functional trees (<xref ref-type="bibr" rid="B27">Jones et al., 2018</xref>).</p>
<p>Our hypothesis was that, by using growth and transcriptomics data from the literature, new fiber-specific PUL models could be built and assessed on a distinct growth data set. Hence, the integrated pipeline, including a new automated annotation of PULs, could provide microbiologists, in a minute, with growth predictions from genome sequences. The pipeline showed very different accuracies, depending on the fiber, from excellent, up to 96%, to 38% for mucin, close to a random association.</p>
<p>Although the starch PUL model followed the broad consensus within the scientific community on the GH and associated genes involved in starch hydrolysis, and was built using available transcriptomics data, it performed poorly compared to the other models. <xref ref-type="bibr" rid="B2">Anderson and Salyers (1989)</xref> first reported in the late 1980s that the breakdown of starch by <italic>B. thetaiotaomicron</italic> involved outer membrane-bound starch-binding sites and periplasmic starch-degrading enzymes, rather than only extracellular enzymes. Since then, starch catabolism has been largely characterized, and <xref ref-type="bibr" rid="B19">Foley et al. (2016)</xref> described the Sus operon as the model system for starch uptake in Bacteroidetes. The complexity and substrates&#x2019; diversity of starch-related polysaccharides hardly fit the CAZyme database. GH13 displays a wide phylogenetic diversity (as described by <xref ref-type="bibr" rid="B47">Stam et al., 2007</xref>) that is now classified under GH13 subfamilies in CAZyme. The visualization of the genes&#x2019; organization of our PUL models for a few <italic>Bacteroides</italic> genomes is consistent with the large diversity and lack of synteny between close genomes. In the <italic>in vitro</italic> experiments we considered, the starch was potato starch (Eric Martens&#x2019; data), and the results would be different for a starch of different origin or amylose/amylopectin ratio. Furthermore, starch can be found under distinct biochemical structures, in <italic>in vitro</italic> experiments (RS2 or RS3) and in food: not only starch as found in fruits but also the different resistant starch structures, formed after cooking and cooling down starchy food, or chemically processed resistant starch (RS4). 
In humans and in pigs, the microbial community composition was found to be linked to the starch structure, emphasizing that variability can be explored and understood only through the use of starches with highly characterized structures (<xref ref-type="bibr" rid="B60">Warren et al., 2018</xref>). It is then possible to speculate that other uncharacterized operons for starch breakdown exist in Bacteroidetes genomes, and, as a matter of fact, we detected incomplete PULs, where annotations are missing. Furthermore, starch is a storage carbohydrate in plants and, on an evolutionary timescale, has only recently been consumed cooked by humans. Whether or not this large diversity of amylase systems reflects recent evolution remains to be determined. It is worth noting that in other bacterial phyla, genomic systems described for the breakdown of starch differ from the common set of genes typically reported and used in our model. It has been recently demonstrated that in <italic>Ruminococcus bromii</italic> L2-63 a cell surface amylosome and sporulation capacity exist for starch breakdown in strains from the human and rumen microbiota (<xref ref-type="bibr" rid="B36">Mukhopadhya et al., 2017</xref>).</p>
<p>Laminarin is also a storage glucan, and its growth prediction performance was low. The performance was similar to that for starch, despite a very different situation: our PUL model was inadequate since, based on the methodology and annotation, using GH3 and GH16, we might not have found an appropriate marker GH. However, a seven-gene PUL has been described for <italic>Bacteroides uniformis</italic>, which includes CBM6/GH3, GH158, and GH16.</p>
<p><xref ref-type="bibr" rid="B13">Devill&#x00E9; et al. (2007)</xref> reported that laminarin modulated the microbiome and increased propionate and butyrate in fermenters, pointing toward an effect on not only Bacteroidetes but also Firmicutes. Furthermore, it modulated mucus composition in rats&#x2019; gastrointestinal tract (<xref ref-type="bibr" rid="B13">Devill&#x00E9; et al., 2007</xref>). It was recently shown by another research group (<xref ref-type="bibr" rid="B49">Strain et al., 2019</xref>) that an addition of laminarin into digesters increased <italic>Lachnospiraceae</italic>, compared to fructooligosaccharide (FOS) and cellulose. The mucin model led to the lowest prediction performance. Interestingly, seven out of the eight non-<italic>Bacteroides</italic> genomes led to a false prediction. The model was built using PULDB and included 11 genes, but several key GH involved in mucin breakdown were not accounted for. The PUL model was based on transcriptomics data, and because of the complexity of mucin, the GH genes over-expressed at this time point were limited. Another issue is cDNA annotation. For instance, one of the main causes of misannotation of the <italic>fuc</italic> genes is their similarity to the genes for rhamnose utilization. Both the FucK and RhaB proteins belong to the FGGY family of carbohydrate kinases (Pfam: PF02782).</p>
<p>Mucin used in experiments is very frequently pig gastric mucin type III. However, mucin composition depends upon the individual and the GI tract segment, and its complexity has only recently been acknowledged. Mucin is at a crossroads between dietary substrates and human secretions. Indeed, using droplet microfluidics, we recently demonstrated and characterized a new GH enzyme, active on human gangliosides, which are similar in structure to human mucin and milk oligosaccharides, and overrepresented in IBD patients&#x2019; metagenomes (<xref ref-type="bibr" rid="B53">Tauzin et al., 2020</xref>). The mucin composition and degradation pathways still remain to be elucidated.</p>
<p>Interestingly, the link between mucin and heparin has been shown <italic>in vivo</italic>: the expression of heparin PUL in mice colonized with <italic>B. thetaiotaomicron</italic> (<italic>B. theta</italic>) could only be observed in bacteria occupying the mucosal layer of the gastrointestinal tract, suggesting that mucus could be a source of heparin <italic>in vivo</italic> (<xref ref-type="bibr" rid="B31">Li et al., 2015</xref>). Furthermore, GH88 included in our heparin PUL model plays a key role in bacteria&#x2013;mucus interaction: when mice were co-colonized with six other Bacteroidetes strains in addition to the <italic>B. theta</italic> mutant, the <italic>B. theta</italic> GH88 mutant was much lower in abundance than in mono-colonized mice, indicating that the ability to degrade heparin is under increased selection pressure for <italic>B. theta</italic> in the presence of other Bacteroidetes. Heparin is an interesting glycan: it is an anticoagulant drug, with a structure of the glycosaminoglycan family of carbohydrates, bearing similarities with mucin, since the most common disaccharide unit is composed of a 2-O-sulfated iduronic acid and 6-O-sulfated, N-sulfated glucosamine. As a matter of fact, heparins from most commercial preparations originate from bovine lung or porcine intestinal mucosa. Our model led to a good growth prediction, and interestingly, <italic>a posteriori</italic>, we had obtained a PUL structure very similar to the one built by <xref ref-type="bibr" rid="B26">Joglekar et al. (2018)</xref>. The authors documented the complex synteny for heparin PULs within the <italic>Bacteroides</italic> genus: for example, in <italic>Bacteroides eggerthii</italic> and <italic>Bacteroides gallinarum</italic> genomes, another heparin PUL exists, with PL15 but not PL13. The prediction from our model could be expected to be low for strains that do not encode a PL13 or PL15, such as <italic>B. eggerthii</italic> DSM20697, <italic>B. gallinarum</italic> DSM 18171, <italic>B. stercoris</italic> ATCC 43183, and B. YIT 12058. Because the human epithelium contains high amounts of heparan sulfate, the biologically relevant glycan for this PUL is heparan sulfate, hence with PL13 as a priority gene to target. We did not include GH95, involved in mucin degradation, because its &#x03B1;-L-fucosidase activity is only found in very few heparin PULs, in some <italic>B. thetaiotaomicron</italic>, <italic>B. ovatus</italic>, and <italic>B. finegoldii</italic> genomes.</p>
<p>Growth prediction for inulin shows good performance. FOS are short-chain oligosaccharides that are generated by hydrolysis of the polysaccharide inulin, which is composed of 2&#x2013;60 fructose monomers. We presented the results of inulin and levan next to each other because they represent two distinct glycosidic linkages (2&#x2013;6 in levan and 2&#x2013;1 in inulin) that are present in the fructan homopolymers and that are available to the gut microbiota. Inulin is found in different foods such as wheat, onion, garlic, and banana and is the most commonly used prebiotic fiber; when used in combination with probiotics, it is able to promote the growth of specific beneficial gut bacteria such as bifidobacteria (<xref ref-type="bibr" rid="B20">Gibson et al., 1995</xref>). GH91, an inulin lyase, has been described as involved in the hydrolysis of inulin. Interestingly, while GH32 appears to be always necessary, our results showed that GH91 is not. A close examination of GH91 indicates that the enzyme activity releases difructofuranose 1,2&#x2032;:2,3&#x2032; dianhydride, which seems to be kept within the cytoplasmic compartment, consistent with the absence of a signal peptide on the gene sequence (Henrissat, personal communication). The absence of fructose release and the location of the enzymatic activity seem to indicate that GH91 is not mandatory and that its role might not be in catabolism but potentially in intra-cytoplasmic metabolism or storage.</p>
<p>Recent work by the group of <xref ref-type="bibr" rid="B26">Joglekar et al. (2018)</xref> on Bt strains VPI-5482 (same strain as used in our study) and Bt-8736 contrasted levan and inulin or fructan operon with GH phylogenetic trees. They demonstrated that related genetic loci can encode diversified biochemical pathways in strains from the same <italic>B. thetaiotaomicron</italic> species. The presence of GH32, SGBP, and SusD and SusC-like domain, corresponding to outer membrane binding proteins, explained the capacity to grow. Accordingly, in our growth prediction model, <italic>B. finegoldii</italic>, in the same phylogenetic cluster as Bt VPI-5482 for the SusC and SusD, does not grow on inulin. Furthermore, it has been demonstrated that the presence of the divergent susC/susD gene alone enabled the hybrid Bt(8736-2) strain to outcompete the wild-type strain <italic>in vivo</italic> in mice fed an inulin diet. This pathway does match our model, which includes enzymes and carbohydrate-binding and import proteins with distinct substrate specificities, which could not have been predicted previously based on sequence data alone.</p>
<p>The discrepancy between databases regarding GH annotation is apparent at the domain level in the case of the levan PUL. Our results showed that taking into account three distinct GH32 gene copies and a levanase within a PUL model was sufficient to predict growth on levan, but the model over-predicted growth in <italic>Bacteroides</italic> genomes. However, the PUL was built using transcriptomics data, which raises questions about the cross-annotation of cDNA and genomes. Indeed, the Pfam annotation from the transcriptome provided an N-terminal domain for the GH32 used in this model, instead of a catalytic domain. One can expect that using the catalytic domain from Pfam would give better performance. Recent work has brought some insight into the PULs related to inulin or levan metabolism. A closer look at the inulin/levan- or fructan-associated operon recently described by <xref ref-type="bibr" rid="B26">Joglekar et al. (2018)</xref> confirmed that the specificity for the 2&#x2013;6 linkage found in levan comes from a GH32 cell surface endo-levanase and an ortholog of BT1761, a surface glycan-binding protein. The presence of the cell surface levanase of <italic>B. thetaiotaomicron</italic> VPI-5482 was critical for the ability of this strain to use levan. The authors elegantly demonstrated how structural differences in dietary polysaccharides such as fructans can result in distinct molecular mechanisms for the utilization of these polymers.</p>
<p>Despite the low performance obtained for some fibers, the overall goal of this analysis is a metabolic functional assessment of the different strains. We also tested whether an approach using GH genes only would be sufficient to obtain a growth prediction. Several teams have applied the following method: gathering enzymes into functional groups, for example GH23, GH25, and GH73 being dedicated to peptidoglycan breakdown and GH13 to starch breakdown. A first drawback in implementing this method in a pipeline is that the attribution of a function, or the link between a GH and a substrate or fiber, varies substantially across the literature. For instance, we gathered such associations from four different publications (<xref ref-type="bibr" rid="B16">El Kaoutari et al., 2013</xref>; <xref ref-type="bibr" rid="B37">Park et al., 2018</xref>; <xref ref-type="bibr" rid="B67">Zhao et al., 2018</xref>; <xref ref-type="bibr" rid="B28">Kovatcheva-Datchary et al., 2019</xref>) and found discrepancies among them. For example, GH95 is associated either with mucin degradation or with cellulose degradation, depending on the publication. Supporting our PUL approach, using GH genes only led to growth predictions that were not meaningful.</p>
<p>Several parameters play a part in the model prediction performance. Annotation quality, discrepancies between datasets, and differences in gene and enzyme terminology have all been hurdles in designing new PUL models. As shown by the comparison of GH annotation tools, accurate annotation of CAZymes is key to improving prediction performance. Most automated annotation pipelines for transcriptomic data do not accurately annotate GHs. In addition, the Pfam domain used for functional domain characterization may provide the right annotation for only one domain of the GH, as we observed for several PULs where we did not capture the GH catalytic domain. Another aspect is that our PUL models do not have a size limit, as long as the distance between neighboring genes annotated as the same PUL family is less than 5 kb. This is consistent with the large syntenies observed in PULDB. For example, for our PUL amylase model, the <italic>B. uniformis</italic> PUL is slightly over 6 kb while the <italic>B. salyersiae</italic> PUL reaches 10 kb.</p>
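<p>The gap rule described above can be illustrated with a minimal sketch. It assumes that genes are given as hypothetical (name, start, end) records on a single contig and chains neighbors whose intergenic distance is below 5 kb. The gene labels and coordinates are invented for illustration, and the function names are assumptions; this is not the FiberGrowth implementation itself.</p>
<preformat>
```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str   # hypothetical gene label
    start: int  # start coordinate on the contig (bp)
    end: int    # end coordinate on the contig (bp)


MAX_GAP_BP = 5000  # 5 kb limit between neighboring PUL genes, as stated in the text


def cluster_genes(genes: List[Gene], max_gap: int = MAX_GAP_BP) -> List[List[Gene]]:
    """Chain genes into candidate loci; a new locus starts when the gap reaches max_gap."""
    clusters: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if clusters:
            # Distance from the end of the current locus to the start of this gene.
            last_end = max(g.end for g in clusters[-1])
            if max_gap > gene.start - last_end:
                clusters[-1].append(gene)
                continue
        clusters.append([gene])
    return clusters


def span_bp(cluster: List[Gene]) -> int:
    """Total length covered by a locus, from its first start to its last end."""
    return max(g.end for g in cluster) - min(g.start for g in cluster)


# Invented coordinates, for illustration only.
genes = [
    Gene("susC-like", 1000, 4000),
    Gene("susD-like", 4100, 5700),
    Gene("GH13_a", 9000, 11000),
    Gene("GH13_b", 17000, 18500),
]
clusters = cluster_genes(genes)
for cluster in clusters:
    print([g.name for g in cluster], span_bp(cluster))
```
</preformat>
<p>Because only the distance between neighbors is constrained, the first candidate locus in this toy example spans 10 kb even though no single gap reaches 5 kb.</p>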
<p>False predictions may have several causes, including <italic>in vitro</italic> conditions and the way the model processes growth data. The PUL models were clearly sensitive to the growth/OD threshold. For arabinoxylan, the three strains that the model predicted as &#x201C;no growth&#x201D; but that grew were <italic>B. massiliensis</italic>, <italic>B. oleiciplenus</italic>, and <italic>B. merdae</italic>. Interestingly, in the four cases where the model predicted &#x201C;no growth&#x201D; and growth was counted positive based on the model threshold, the OD<sub>600 nm</sub> values recorded were below 0.1 (<italic>B. intestinalis</italic>, 0.05; <italic>B. johnsonii</italic> and <italic>B. clarus</italic>, 0.06; and <italic>B. finegoldii</italic>, 0.14). A similar phenomenon occurred for heparin, where resetting the growth threshold improved performance. The growth threshold could be adjusted for each fiber once more experimental data are available. However, the experimental growth medium itself influences growth yield in a strain-dependent manner that is difficult to predict. The results would need to be reassessed or taken with caution if strains were grown in a culture medium different from the reference. The experimental data we used to measure the pipeline performance were obtained in PY medium. However, the transcriptomics data used to build the PUL models were obtained in CM medium, so different genes may have been over-expressed at the time of sampling compared with the growth validation data obtained in PY medium. Indeed, the growth medium may change which genes are prioritized in the model. It remains to be determined whether this bias would have more impact on complex PUL models with large numbers of genes or with a complex operon structure, where genes that are not co-transcribed might not be captured in a single transcriptome time point. The extreme modularity of some PULs indicates that conclusions about growth results cannot be transferred from one species to another without caution since, like others, we detected strain-to-strain variations within the same species for inulin and levan.</p>
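<p>The sensitivity to the growth/OD threshold can be made concrete with a small sketch that reuses the four OD values quoted above. The two thresholds (0.05 and 0.10) and the function name are hypothetical, chosen only to show how the observed growth label of a strain can flip; they are not the thresholds used by the pipeline.</p>
<preformat>
```python
# OD600 values quoted in the text for the four arabinoxylan cases.
od600 = {
    "B. intestinalis": 0.05,
    "B. johnsonii": 0.06,
    "B. clarus": 0.06,
    "B. finegoldii": 0.14,
}


def call_growth(od_values, threshold):
    """Label a strain as growing (True) when its OD600 reaches the threshold."""
    return {strain: od >= threshold for strain, od in od_values.items()}


# Both thresholds are hypothetical and serve only to illustrate the effect.
lenient = call_growth(od600, threshold=0.05)
strict = call_growth(od600, threshold=0.10)

# Strains whose observed growth label depends on the chosen threshold.
flipped = sorted(s for s in od600 if lenient[s] != strict[s])
print(flipped)
```
</preformat>
<p>With these assumed values, three of the four strains change from &#x201C;growth&#x201D; to &#x201C;no growth&#x201D; when the threshold is raised, which in turn changes whether a &#x201C;no growth&#x201D; prediction counts as an error.</p>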
<p>The complexity of fiber structures makes the links between CAZyme genes and functional interpretation very uncertain. In order to determine the capacity of a strain to break down a specific fiber, we linked CAZy genes to a given fiber or prebiotic. However, this representation has limitations because (i) the complexity and diversity of substrates are poorly captured by the CAZyme databases, and several families display wide phylogenetic diversity, such as the amylase family GH13 (as described by <xref ref-type="bibr" rid="B47">Stam et al., 2007</xref>), and (ii) certain families can be involved in the breakdown of different substrates (GH32 targeting both inulin and levan, as shown by <xref ref-type="bibr" rid="B45">Sheridan et al., 2015</xref>). Different authors have reported different gene and substrate associations. As the gene content is then associated with functional capacities, this can have far-reaching consequences for the conclusions and for further applications or recommendations.</p>
<p>Which experiments could help refine the pipeline? A major challenge in quantifying the degree of redundancy of CAZymes will be to obtain more <italic>in vitro</italic> information on growth and CAZyme expression patterns. To date, there are still very few measurements on purified enzymes that pinpoint the substrate specificity of the CAZymes involved in the hydrolysis and fermentation of fiber or of human gut mucus. We also modeled Firmicutes PULs (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figure S5</xref>), but despite encouraging results, the scarcity of available transcriptomics data led us to focus on Bacteroidetes.</p>
<p>One way to circumvent these limitations is to obtain large datasets generated under the same conditions, with strains for which genomes are available. To improve the performance of the FiberGrowth pipeline, a larger number of strains should be grown in the exact same conditions. This could be done using robots in an anaerobic chamber, similar to the work by <xref ref-type="bibr" rid="B69">Zou et al. (2019)</xref>, which led to &#x003E; 6,000 isolated strains and 1,520 newly sequenced genomes. Note that if more experimental data become available, it would also be possible to adopt a machine-learning approach to infer potential PULs rather than a deterministic approach such as FiberGrowth.</p>
<p>Another way to increase our capacity to obtain large growth datasets arises from recent advances in droplet microfluidic culture, which can increase throughput and lower the cost of purified substrates when screening for growth on fiber. <xref ref-type="bibr" rid="B59">Villa et al. (2020)</xref> recently demonstrated the potential of microfluidic droplet assays for comparing the growth rates and functions of individual bacterial strains isolated from gut microbial communities. This would be very useful for improving the FiberGrowth pipeline with a large number of strains grown in the exact same conditions.</p>
<p>Interestingly, the authors also investigated how screening for GH with microfluidics could lead toward the differentiation of subjects, based on the &#x201C;fiber profile&#x201D; being metabolized by their microbiota. A limitation in characterizing the fiber-metabolizing potential of a microbiota remains the complexity and modularity of the GH and PUL operons. Predicting the overall microbiome response to a specific fiber requires accounting for the variability of GH protein structures. However, accurate annotation of PULs seems difficult in metagenomics datasets. We previously showed, using 40-kb <italic>E. coli</italic> fosmid libraries built from fecal samples or distal ileum mucosa, that glycosyl hydrolases were modular and subject to recent horizontal gene transfer, not just between phylogenetically close genomes but above the genus level (<xref ref-type="bibr" rid="B52">Tasse et al., 2010</xref>; <xref ref-type="bibr" rid="B53">Tauzin et al., 2020</xref>). Accurately annotating glycosyl hydrolase loci in metagenomic data remains a challenge and is feasible only with long assemblies that have sufficient coverage.</p>
<p>Regarding the prediction of individual responses to prebiotic or fiber intake, we previously demonstrated that the diversity of fiber-rich food items correlated with microbiome 16S rDNA diversity in young adults (<xref ref-type="bibr" rid="B51">Tap et al., 2015</xref>). Others have demonstrated that a 5-day regimen with a plant-based diet compared to an animal product-based diet (<xref ref-type="bibr" rid="B10">David et al., 2013</xref>) increased microbial diversity. Mining existing large datasets such as iHMP2 (<xref ref-type="bibr" rid="B38">Proctor et al., 2019</xref>) may be important for finding correlations with the breakdown of polymers of human origin, which have been linked to inflammatory bowel diseases. Those include not only human mucus and related compounds, such as heparin, but also human gangliosides. As a growing number of metagenomics datasets are being generated worldwide, correlations can be inferred between dietary intake and microbiome composition. However, most food questionnaires are not designed to provide information on specific fibers. Interestingly, new AI-assisted tools for analyzing food images on smartphones may provide a more accurate picture of the ingested fiber content. There is still a lack of knowledge linking food items to the set of glycosyl hydrolases necessary for their breakdown and to the subsequent production of beneficial metabolites through fermentation.</p>
</sec>
<sec sec-type="conclusion" id="S5">
<title>Conclusion</title>
<p>The diversity of the CAZy gene families involved in the breakdown of glycans and the extreme modularity of the operons that encode them, sometimes at the strain level, pose a challenge to building PUL models that are generic for a given fiber. However, we demonstrated, for the first time, that a pipeline combining automated genome screening and annotation of PULs allowed us to build growth prediction models and measure their accuracy on available Bacteroidetes growth data. Some PUL models require optimization, while for levan, heparin, inulin, and arabinoxylan, the FiberGrowth pipeline can compute growth estimations with TPR &#x003E; 0.8 and FPR &#x003C; 0.33 in 1 min on an unannotated genome.</p>
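<p>For readers less familiar with these metrics, the sketch below shows how a true positive rate (TPR) and a false positive rate (FPR) are obtained from paired predicted and observed growth calls. The eight calls are toy values invented for illustration, not results from this study, and the function name is an assumption.</p>
<preformat>
```python
def rates(predicted, observed):
    """Return (TPR, FPR) from paired boolean growth calls."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # predicted growth, grew
    fn = sum(1 for p, o in pairs if not p and o)      # predicted no growth, grew
    fp = sum(1 for p, o in pairs if p and not o)      # predicted growth, did not grow
    tn = sum(1 for p, o in pairs if not p and not o)  # predicted no growth, did not grow
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr


# Toy calls for eight hypothetical strains (True = growth).
predicted = [True, True, True, True, False, True, False, False]
observed = [True, True, True, True, True, False, False, False]

tpr, fpr = rates(predicted, observed)
print(round(tpr, 3), round(fpr, 3))
```
</preformat>
<p>In this toy case, four of the five strains that grew are predicted correctly (TPR of 0.8) and one of the three strains that did not grow is predicted to grow (FPR of about 0.33).</p>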
<p>This work also demonstrated that, despite advances in bioinformatic PUL screening and computational modeling, the lack of biochemical characterization of glycosyl hydrolases and PUL systems remains a major issue.</p>
</sec>
<sec sec-type="data-availability" id="S6">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="DS1">Supplementary Material</xref>; further inquiries can be directed to the corresponding author/s.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>BC and CS designed and built the pipeline, performed all computing, and participated in the manuscript writing. FP supervised computing and assisted in the manuscript editing. ML was involved in the pipeline design and supervised the manuscript writing. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>FP and CS work at Pendulum Therapeutics Inc. ML was a member of the scientific advisory board for Pendulum Therapeutics. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="S8">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack>
<p>We wish to warmly thank Eric Martens for sharing his experimental dataset before publication. We are grateful to Bernard Henrissat for sharing his expertise and annotation results from the manually curated CAZy database. We also thank Hakim Lakmini for his help in drawing carbohydrate chemical structures.</p>
</ack>
<sec sec-type="supplementary-material" id="S9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fmicb.2021.632567/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fmicb.2021.632567/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.zip" id="DS1" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_2.ZIP" id="DS2" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Allaire</surname> <given-names>J. J.</given-names></name> <name><surname>Xie</surname> <given-names>Y.</given-names></name> <name><surname>McPherson</surname> <given-names>J.</given-names></name> <name><surname>Luraschi</surname> <given-names>J.</given-names></name> <name><surname>Ushey</surname> <given-names>K.</given-names></name> <name><surname>Atkins</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2020</year>). <source><italic>Rmarkdown</italic><italic>: Dynamic Documents for R.</italic></source> <comment>Available online at:</comment> <ext-link ext-link-type="uri" xlink:href="https://github.com/rstudio/rmarkdown">https://github.com/rstudio/rmarkdown</ext-link> <comment>(accessed September 21, 2021)</comment>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>K.</given-names></name> <name><surname>Salyers</surname> <given-names>A.</given-names></name></person-group> (<year>1989</year>). <article-title>Biochemical evidence that starch breakdown by <italic>Bacteroides thetaiotaomicron</italic> involves outer membrane starch-binding sites and periplasmic starch-degrading enzymes.</article-title> <source><italic>J. Bacteriol.</italic></source> <volume>171</volume> <fpage>3192</fpage>&#x2013;<lpage>3198</lpage>. <pub-id pub-id-type="doi">10.1128/jb.171.6.3192-3198.1989</pub-id> <pub-id pub-id-type="pmid">2722747</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arendsee</surname> <given-names>Z.</given-names></name></person-group> (<year>2017</year>). <source><italic>Rhmmer: Utilities Parsing &#x201C;HMMER&#x201D; Results.</italic></source> <comment>Available online at:</comment> <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=rhmmer">https://CRAN.R-project.org/package=rhmmer</ext-link> <comment>(accessed September 21, 2021)</comment>.</citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bache</surname> <given-names>S. M.</given-names></name> <name><surname>Wickham</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <source><italic>Magrittr: A Forward-Pipe Operator for R.</italic></source> <comment>Available online at:</comment> <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=magrittr">https://CRAN.R-project.org/package=magrittr</ext-link> <comment>(accessed September 21, 2021)</comment>.</citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>B&#x00E4;ckhed</surname> <given-names>F.</given-names></name> <name><surname>Ley</surname> <given-names>R.</given-names></name> <name><surname>Sonnenburg</surname> <given-names>J.</given-names></name> <name><surname>Peterson</surname> <given-names>D.</given-names></name> <name><surname>Gordon</surname> <given-names>J.</given-names></name></person-group> (<year>2005</year>). <article-title>Host-bacterial mutualism in the human intestine.</article-title> <source><italic>Science (New York, N.Y.)</italic></source> <volume>307</volume> <fpage>1915</fpage>&#x2013;<lpage>1920</lpage>. <pub-id pub-id-type="doi">10.1126/science.1104816</pub-id> <pub-id pub-id-type="pmid">15790844</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bjursell</surname> <given-names>M. K.</given-names></name> <name><surname>Martens</surname> <given-names>E. C.</given-names></name> <name><surname>Gordon</surname> <given-names>J.</given-names> <suffix>I</suffix></name></person-group> (<year>2006</year>). <article-title>Functional genomic and metabolic studies of the adaptations of a prominent adult human gut symbiont, <italic>Bacteroides thetaiotaomicron</italic>, to the suckling period.</article-title> <source><italic>J. Biol. Chem.</italic></source> <volume>281</volume> <fpage>36269</fpage>&#x2013;<lpage>36279</lpage>.</citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Buchfink</surname> <given-names>B.</given-names></name> <name><surname>Xie</surname> <given-names>C.</given-names></name> <name><surname>Huson</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>Fast and sensitive protein alignment using DIAMOND.</article-title> <source><italic>Nature Methods</italic></source> <volume>12</volume> <fpage>59</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.15496/publikation-1176</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burkitt</surname> <given-names>D. P.</given-names></name> <name><surname>Walker</surname> <given-names>A. R. P.</given-names></name> <name><surname>Painter</surname> <given-names>N. S.</given-names></name></person-group> (<year>1972</year>). <article-title>Effect of dietary fibre on stools and transit-times, and its role in the causation of disease.</article-title> <source><italic>Lancet</italic></source> <volume>300</volume> <fpage>1408</fpage>&#x2013;<lpage>1411</lpage>. <pub-id pub-id-type="doi">10.1016/S0140-6736(72)92974-1</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Busk</surname> <given-names>P.</given-names></name> <name><surname>Pilgaard</surname> <given-names>B.</given-names></name> <name><surname>Lezyk</surname> <given-names>M.</given-names></name> <name><surname>Meyer</surname> <given-names>A.</given-names></name> <name><surname>Lange</surname> <given-names>L.</given-names></name></person-group> (<year>2017</year>). <article-title>Homology to peptide pattern for annotation of carbohydrate-active enzymes and prediction of function.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>18</volume>:<fpage>214</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-017-1625-9</pub-id> <pub-id pub-id-type="pmid">28403817</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cantarel</surname> <given-names>B.</given-names></name> <name><surname>Coutinho</surname> <given-names>P.</given-names></name> <name><surname>Rancurel</surname> <given-names>C.</given-names></name> <name><surname>Bernard</surname> <given-names>T.</given-names></name> <name><surname>Lombard</surname> <given-names>V.</given-names></name> <name><surname>Henrissat</surname> <given-names>B.</given-names></name></person-group> (<year>2008</year>). <article-title>The carbohydrate-active ENZYMES Database (CAZy): an expert resource for glycogenomics.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>37</volume> <fpage>D233</fpage>&#x2013;<lpage>D238</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkn663</pub-id> <pub-id pub-id-type="pmid">18838391</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>David</surname> <given-names>L.</given-names></name> <name><surname>Maurice</surname> <given-names>C.</given-names></name> <name><surname>Carmody</surname> <given-names>R.</given-names></name> <name><surname>Gootenberg</surname> <given-names>D.</given-names></name> <name><surname>Button</surname> <given-names>J.</given-names></name> <name><surname>Wolfe</surname> <given-names>B.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>Diet rapidly and reproducibly alters the gut microbiome.</article-title> <source><italic>Nature</italic></source> <volume>505</volume> <fpage>559</fpage>&#x2013;<lpage>563</lpage>. <pub-id pub-id-type="doi">10.1038/nature12820</pub-id> <pub-id pub-id-type="pmid">24336217</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Jonge</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <source><italic>Docopt: Command-Line Interface Specification Language.</italic></source> <comment>Available online at:</comment> <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=docopt">https://CRAN.R-project.org/package=docopt</ext-link> <comment>(accessed September 21, 2021)</comment>.</citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Desai</surname> <given-names>M.</given-names></name> <name><surname>Seekatz</surname> <given-names>A.</given-names></name> <name><surname>Koropatkin</surname> <given-names>N.</given-names></name> <name><surname>Kamada</surname> <given-names>N.</given-names></name> <name><surname>Hickey</surname> <given-names>C.</given-names></name> <name><surname>Wolter</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>A dietary fiber-deprived gut microbiota degrades the colonic mucus barrier and enhances pathogen susceptibility.</article-title> <source><italic>Cell</italic></source> <volume>167</volume> <fpage>1339</fpage>&#x2013;<lpage>1353.e21</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2016.10.043</pub-id> <pub-id pub-id-type="pmid">27863247</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Devill&#x00E9;</surname> <given-names>C.</given-names></name> <name><surname>Gharbi</surname> <given-names>M.</given-names></name> <name><surname>Dandrifosse</surname> <given-names>G.</given-names></name> <name><surname>Peulen</surname> <given-names>O.</given-names></name></person-group> (<year>2007</year>). <article-title>Study on the effects of laminarin, a polysaccharide from seaweed, on gut characteristics.</article-title> <source><italic>J. Sci. Food Agric.</italic></source> <volume>87</volume> <fpage>1717</fpage>&#x2013;<lpage>1725</lpage>. <pub-id pub-id-type="doi">10.1002/jsfa.2901</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dowle</surname> <given-names>M.</given-names></name> <name><surname>Srinivasan</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <source><italic>Data.Table: Extension of &#x2018;data.Frame&#x2018;.</italic></source> <comment>Available online at:</comment> <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=data.tabl">https://CRAN.R-project.org/package=data.tabl</ext-link> <comment>(accessed September 21, 2021)</comment>.</citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eddy</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Accelerated profile HMM searches.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>7</volume>:<fpage>e1002195</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1002195</pub-id> <pub-id pub-id-type="pmid">22039361</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>El Kaoutari</surname> <given-names>A.</given-names></name> <name><surname>Armougom</surname> <given-names>F.</given-names></name> <name><surname>Gordon</surname> <given-names>J.</given-names> <suffix>I</suffix></name> <name><surname>Raoult</surname> <given-names>D.</given-names></name> <name><surname>Henrissat</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>The abundance and variety of carbohydrate-active enzymes in the human gut microbiota.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>11</volume> <fpage>497</fpage>&#x2013;<lpage>504</lpage>. <pub-id pub-id-type="doi">10.1038/nrmicro3050</pub-id> <pub-id pub-id-type="pmid">23748339</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Faith</surname> <given-names>J. J.</given-names></name> <name><surname>Mcnulty</surname> <given-names>N.</given-names></name> <name><surname>Rey</surname> <given-names>F.</given-names></name> <name><surname>Gordon</surname> <given-names>J.</given-names> <suffix>I</suffix></name></person-group> (<year>2011</year>). <article-title>Predicting a human gut microbiota&#x2019;s response to diet in gnotobiotic mice.</article-title> <source><italic>Science (New York, N.Y.)</italic></source> <volume>333</volume> <fpage>101</fpage>&#x2013;<lpage>104</lpage>. <pub-id pub-id-type="doi">10.1126/science.1206025</pub-id> <pub-id pub-id-type="pmid">21596954</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Filippo</surname> <given-names>C.</given-names></name> <name><surname>Cavalieri</surname> <given-names>D.</given-names></name> <name><surname>Di Paola</surname> <given-names>M.</given-names></name> <name><surname>Ramazzotti</surname> <given-names>M.</given-names></name> <name><surname>Poullet</surname> <given-names>J.</given-names></name> <name><surname>Sebastien</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>Impact of diet in shaping gut microbiota revealed by a comparative study in children from Europe and rural Africa.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>107</volume> <fpage>14691</fpage>&#x2013;<lpage>14696</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1005963107</pub-id> <pub-id pub-id-type="pmid">20679230</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Foley</surname> <given-names>M.</given-names></name> <name><surname>Cockburn</surname> <given-names>D.</given-names></name> <name><surname>Koropatkin</surname> <given-names>N.</given-names></name></person-group> (<year>2016</year>). <article-title>The sus operon: a model system for starch uptake by the human gut Bacteroidetes.</article-title> <source><italic>Cell. Mol. Life Sci.</italic></source> <volume>73</volume> <fpage>2603</fpage>&#x2013;<lpage>2617</lpage>. <pub-id pub-id-type="doi">10.1007/s00018-016-2242-x</pub-id> <pub-id pub-id-type="pmid">27137179</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gibson</surname> <given-names>G. R.</given-names></name> <name><surname>Beatty</surname> <given-names>E. R.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Cummings</surname> <given-names>J. H.</given-names></name></person-group> (<year>1995</year>). <article-title>Selective stimulation of bifidobacteria in the human colon by oligofructose and inulin.</article-title> <source><italic>Gastroenterology</italic></source> <volume>108</volume> <fpage>975</fpage>&#x2013;<lpage>982</lpage>. <pub-id pub-id-type="doi">10.1016/0016-5085(95)90192-2</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grondin</surname> <given-names>J. M.</given-names></name> <name><surname>Tamura</surname> <given-names>K.</given-names></name> <name><surname>D&#x00E9;jean</surname> <given-names>G.</given-names></name> <name><surname>Abbott</surname> <given-names>D. W.</given-names></name> <name><surname>Brumer</surname> <given-names>H.</given-names></name></person-group> (<year>2017</year>). <article-title>Polysaccharide utilization loci: fueling microbial communities.</article-title> <source><italic>J. Bacteriol.</italic></source> <volume>199</volume>:<fpage>e00860-16</fpage>. <pub-id pub-id-type="doi">10.1128/JB.00860-16</pub-id> <pub-id pub-id-type="pmid">28138099</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Henke</surname> <given-names>M. T.</given-names></name> <name><surname>Kenny</surname> <given-names>D. J.</given-names></name> <name><surname>Cassilly</surname> <given-names>C. D.</given-names></name> <name><surname>Vlamakis</surname> <given-names>H.</given-names></name> <name><surname>Xavier</surname> <given-names>R. J.</given-names></name> <name><surname>Clardy</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title><italic>Ruminococcus gnavus</italic>, a member of the human gut microbiome associated with Crohn&#x2019;s disease, produces an inflammatory polysaccharide.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>116</volume> <fpage>12672</fpage>&#x2013;<lpage>12677</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1904099116</pub-id> <pub-id pub-id-type="pmid">31182571</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hester</surname> <given-names>J.</given-names></name> <name><surname>Wickham</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <source><italic>Vroom: Read and Write Rectangular Text Data Quickly.</italic></source> <comment>Available online at:</comment> <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=vroom">https://CRAN.R-project.org/package=vroom</ext-link> <comment>(accessed September 21, 2021)</comment>.</citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hyatt</surname> <given-names>D.</given-names></name> <name><surname>Chen</surname> <given-names>G.-L.</given-names></name> <name><surname>LoCascio</surname> <given-names>P. F.</given-names></name> <name><surname>Land</surname> <given-names>M. L.</given-names></name> <name><surname>Larimer</surname> <given-names>F. W.</given-names></name> <name><surname>Hauser</surname> <given-names>L. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Prodigal: prokaryotic gene recognition and translation initiation site identification.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>11</volume>:<fpage>119</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-119</pub-id> <pub-id pub-id-type="pmid">20211023</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joglekar</surname> <given-names>P.</given-names></name> <name><surname>Sonnenburg</surname> <given-names>E.</given-names></name> <name><surname>Higginbottom</surname> <given-names>S.</given-names></name> <name><surname>Earle</surname> <given-names>K.</given-names></name> <name><surname>Morland</surname> <given-names>C.</given-names></name> <name><surname>Shapiro-Ward</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Genetic variation of the SusC/SusD homologs from a polysaccharide utilization locus underlies divergent fructan specificities and functional adaptation in <italic>Bacteroides thetaiotaomicron</italic> strains.</article-title> <source><italic>mSphere</italic></source> <volume>3</volume>:<fpage>e00185-18</fpage>. <pub-id pub-id-type="doi">10.1128/mSphereDirect.00185-18</pub-id> <pub-id pub-id-type="pmid">29794055</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>D. R.</given-names></name> <name><surname>Thomas</surname> <given-names>D.</given-names></name> <name><surname>Alger</surname> <given-names>N.</given-names></name> <name><surname>Ghavidel</surname> <given-names>A.</given-names></name> <name><surname>Inglis</surname> <given-names>G. D.</given-names></name> <name><surname>Abbott</surname> <given-names>D. W.</given-names></name></person-group> (<year>2018</year>). <article-title>SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets.</article-title> <source><italic>Biotechnol. Biofuels</italic></source> <volume>11</volume>:<fpage>27</fpage>. <pub-id pub-id-type="doi">10.1186/s13068-018-1027-x</pub-id> <pub-id pub-id-type="pmid">29441125</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kovatcheva-Datchary</surname> <given-names>P.</given-names></name> <name><surname>Shoaie</surname> <given-names>S.</given-names></name> <name><surname>Lee</surname> <given-names>S.</given-names></name> <name><surname>Wahlstr&#x00F6;m</surname> <given-names>A.</given-names></name> <name><surname>Nookaew</surname> <given-names>I.</given-names></name> <name><surname>Hallen</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Simplified intestinal microbiota to study microbe-diet-host interactions in a mouse model.</article-title> <source><italic>Cell Rep.</italic></source> <volume>26</volume> <fpage>3772</fpage>&#x2013;<lpage>3783.e6</lpage>. <pub-id pub-id-type="doi">10.1016/j.celrep.2019.02.090</pub-id> <pub-id pub-id-type="pmid">30917328</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lap&#x00E9;bie</surname> <given-names>P.</given-names></name> <name><surname>Lombard</surname> <given-names>V.</given-names></name> <name><surname>Drula</surname> <given-names>E.</given-names></name> <name><surname>Terrapon</surname> <given-names>N.</given-names></name> <name><surname>Henrissat</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>Bacteroidetes use thousands of enzyme combinations to break down glycans.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>10</volume>:<fpage>2043</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-019-10068-5</pub-id> <pub-id pub-id-type="pmid">31053724</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leshem</surname> <given-names>A.</given-names></name> <name><surname>Segal</surname> <given-names>E.</given-names></name> <name><surname>Elinav</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>The gut microbiome and individual-specific responses to diet.</article-title> <source><italic>mSystems</italic></source> <volume>5</volume>:<fpage>e00665-20</fpage>. <pub-id pub-id-type="doi">10.1128/mSystems.00665-20</pub-id> <pub-id pub-id-type="pmid">32994289</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Limenitakis</surname> <given-names>J. P.</given-names></name> <name><surname>Fuhrer</surname> <given-names>T.</given-names></name> <name><surname>Geuking</surname> <given-names>M. B.</given-names></name> <name><surname>Lawson</surname> <given-names>M. A.</given-names></name> <name><surname>Wyss</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>The outer mucus layer hosts a distinct intestinal microbial niche.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>6</volume>:<fpage>8292</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms9292</pub-id> <pub-id pub-id-type="pmid">26392213</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martens</surname> <given-names>E. C.</given-names></name> <name><surname>Chiang</surname> <given-names>H. C.</given-names></name> <name><surname>Gordon</surname> <given-names>J. I.</given-names></name></person-group> (<year>2008</year>). <article-title>Mucosal glycan foraging enhances fitness and transmission of a saccharolytic human gut bacterial symbiont.</article-title> <source><italic>Cell Host Microbe</italic></source> <volume>4</volume> <fpage>447</fpage>&#x2013;<lpage>457</lpage>.</citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martens</surname> <given-names>E. C.</given-names></name> <name><surname>Koropatkin</surname> <given-names>N.</given-names></name> <name><surname>Smith</surname> <given-names>T.</given-names></name> <name><surname>Gordon</surname> <given-names>J. I.</given-names></name></person-group> (<year>2009</year>). <article-title>Complex glycan catabolism by the human gut microbiota: the Bacteroidetes Sus-like paradigm.</article-title> <source><italic>J. Biol. Chem.</italic></source> <volume>284</volume> <fpage>24673</fpage>&#x2013;<lpage>24677</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.R109.022848</pub-id> <pub-id pub-id-type="pmid">19553672</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martens</surname> <given-names>E. C.</given-names></name> <name><surname>Lowe</surname> <given-names>E.</given-names></name> <name><surname>Chiang</surname> <given-names>H.</given-names></name> <name><surname>Pudlo</surname> <given-names>N. A.</given-names></name> <name><surname>Wu</surname> <given-names>M.</given-names></name> <name><surname>Mcnulty</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>Recognition and degradation of plant cell wall polysaccharides by two human gut symbionts.</article-title> <source><italic>PLoS Biol.</italic></source> <volume>9</volume>:<fpage>e1001221</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pbio.1001221</pub-id> <pub-id pub-id-type="pmid">22205877</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mosca</surname> <given-names>A.</given-names></name> <name><surname>Leclerc</surname> <given-names>M.</given-names></name> <name><surname>Hugot</surname> <given-names>J. P.</given-names></name></person-group> (<year>2016</year>). <article-title>Gut microbiota diversity and human diseases: should we reintroduce key predators in our ecosystem?</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>7</volume>:<fpage>455</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2016.00455</pub-id> <pub-id pub-id-type="pmid">27065999</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mukhopadhya</surname> <given-names>I.</given-names></name> <name><surname>Mora&#x00EF;s</surname> <given-names>S.</given-names></name> <name><surname>Laverde-Gomez</surname> <given-names>J.</given-names></name> <name><surname>Sheridan</surname> <given-names>P.</given-names></name> <name><surname>Walker</surname> <given-names>A.</given-names></name> <name><surname>Kelly</surname> <given-names>W.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Sporulation capability and amylosome conservation among diverse human colonic and rumen isolates of the keystone starch-degrader <italic>Ruminococcus bromii</italic>.</article-title> <source><italic>Environ. Microbiol.</italic></source> <volume>20</volume> <fpage>324</fpage>&#x2013;<lpage>336</lpage>. <pub-id pub-id-type="doi">10.1111/1462-2920.14000</pub-id> <pub-id pub-id-type="pmid">29159997</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>Y.-J.</given-names></name> <name><surname>Jeong</surname> <given-names>Y.-U.</given-names></name> <name><surname>Kong</surname> <given-names>W.-S.</given-names></name></person-group> (<year>2018</year>). <article-title>Genome sequencing and carbohydrate-active enzyme (CAZyme) repertoire of the white rot fungus <italic>Flammulina elastica</italic>.</article-title> <source><italic>Int. J. Mol. Sci.</italic></source> <volume>19</volume>:<fpage>2379</fpage>. <pub-id pub-id-type="doi">10.3390/ijms19082379</pub-id> <pub-id pub-id-type="pmid">30104475</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Proctor</surname> <given-names>L.</given-names></name> <name><surname>Creasy</surname> <given-names>H. H.</given-names></name> <name><surname>Fettweis</surname> <given-names>J.</given-names></name> <name><surname>Lloyd-Price</surname> <given-names>J.</given-names></name> <name><surname>Mahurkar</surname> <given-names>A.</given-names></name> <name><surname>Zhou</surname> <given-names>W.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>The Integrative Human Microbiome Project.</article-title> <source><italic>Nature</italic></source> <volume>569</volume> <fpage>641</fpage>&#x2013;<lpage>648</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1238-8</pub-id> <pub-id pub-id-type="pmid">31142853</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qin</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Cai</surname> <given-names>Z.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>F.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>A metagenome-wide association study of gut microbiota in type 2 diabetes.</article-title> <source><italic>Nature</italic></source> <volume>490</volume> <fpage>55</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1038/nature11450</pub-id> <pub-id pub-id-type="pmid">23023125</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><collab>R Core Team</collab> (<year>2018</year>). <source><italic>R: A Language and Environment for Statistical Computing.</italic></source> <publisher-loc>Vienna</publisher-loc>: <publisher-name>R Foundation for Statistical Computing</publisher-name>.</citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rey</surname> <given-names>F.</given-names></name> <name><surname>Faith</surname> <given-names>J. J.</given-names></name> <name><surname>Bain</surname> <given-names>J.</given-names></name> <name><surname>Muehlbauer</surname> <given-names>M. J.</given-names></name> <name><surname>Stevens</surname> <given-names>R.</given-names></name> <name><surname>Newgard</surname> <given-names>C. B.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>Dissecting the in vivo metabolic potential of two human gut acetogens.</article-title> <source><italic>J. Biol. Chem.</italic></source> <volume>285</volume> <fpage>22082</fpage>&#x2013;<lpage>22090</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.M110.117713</pub-id> <pub-id pub-id-type="pmid">20444704</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sayers</surname> <given-names>E. W.</given-names></name> <name><surname>Beck</surname> <given-names>J.</given-names></name> <name><surname>Brister</surname> <given-names>J. R.</given-names></name> <name><surname>Bolton</surname> <given-names>E. E.</given-names></name> <name><surname>Canese</surname> <given-names>K.</given-names></name> <name><surname>Comeau</surname> <given-names>D. C.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Database resources of the National Center for Biotechnology Information.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>48</volume> <fpage>D9</fpage>&#x2013;<lpage>D16</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkz899</pub-id> <pub-id pub-id-type="pmid">31602479</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scheiman</surname> <given-names>J.</given-names></name> <name><surname>Luber</surname> <given-names>J. M.</given-names></name> <name><surname>Chavkin</surname> <given-names>T. A.</given-names></name> <name><surname>Macdonald</surname> <given-names>T.</given-names></name> <name><surname>Tung</surname> <given-names>A.</given-names></name> <name><surname>Pham</surname> <given-names>L.-D.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Meta-omics analysis of elite athletes identifies a performance-enhancing microbe that functions via lactate metabolism.</article-title> <source><italic>Nat. Med.</italic></source> <volume>25</volume> <fpage>1104</fpage>&#x2013;<lpage>1109</lpage>. <pub-id pub-id-type="doi">10.1038/s41591-019-0485-4</pub-id> <pub-id pub-id-type="pmid">31235964</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scott</surname> <given-names>K. P.</given-names></name> <name><surname>Martin</surname> <given-names>J.</given-names></name> <name><surname>Chassard</surname> <given-names>C.</given-names></name> <name><surname>Clerget</surname> <given-names>M.</given-names></name> <name><surname>Potrykus</surname> <given-names>J.</given-names></name> <name><surname>Campbell</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>Substrate-driven gene expression in <italic>Roseburia inulinivorans</italic>: importance of inducible enzymes in the utilization of inulin and starch.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>108</volume> <issue>Suppl. 1</issue> <fpage>4672</fpage>&#x2013;<lpage>4679</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1000091107</pub-id> <pub-id pub-id-type="pmid">20679207</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sheridan</surname> <given-names>P.</given-names></name> <name><surname>Martin</surname> <given-names>J.</given-names></name> <name><surname>Lawley</surname> <given-names>T. D.</given-names></name> <name><surname>Browne</surname> <given-names>H. P.</given-names></name> <name><surname>Harris</surname> <given-names>H. M. B.</given-names></name> <name><surname>Bernalier-Donadille</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Polysaccharide utilization loci and nutritional specialization in a dominant group of butyrate-producing human colonic Firmicutes.</article-title> <source><italic>Microb. Genom.</italic></source> <volume>2</volume>:<fpage>e000043</fpage>. <pub-id pub-id-type="doi">10.1099/mgen.0.000043</pub-id> <pub-id pub-id-type="pmid">28348841</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sonnenburg</surname> <given-names>E.</given-names></name> <name><surname>Sonnenburg</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Starving our microbial self: the deleterious consequences of a diet deficient in microbiota-accessible carbohydrates.</article-title> <source><italic>Cell Metab.</italic></source> <volume>20</volume> <fpage>779</fpage>&#x2013;<lpage>786</lpage>. <pub-id pub-id-type="doi">10.1016/j.cmet.2014.07.003</pub-id> <pub-id pub-id-type="pmid">25156449</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stam</surname> <given-names>M. R.</given-names></name> <name><surname>Danchin</surname> <given-names>E. G. J.</given-names></name> <name><surname>Rancurel</surname> <given-names>C.</given-names></name> <name><surname>Coutinho</surname> <given-names>P. M.</given-names></name> <name><surname>Henrissat</surname> <given-names>B.</given-names></name></person-group> (<year>2007</year>). <article-title>Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of alpha-amylase-related proteins.</article-title> <source><italic>Protein Eng. Des. Sel.</italic></source> <volume>19</volume> <fpage>555</fpage>&#x2013;<lpage>562</lpage>. <pub-id pub-id-type="doi">10.1093/protein/gzl044</pub-id> <pub-id pub-id-type="pmid">17085431</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stewart</surname> <given-names>R. D.</given-names></name> <name><surname>Auffret</surname> <given-names>M. D.</given-names></name> <name><surname>Roehe</surname> <given-names>R.</given-names></name> <name><surname>Watson</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Open prediction of polysaccharide utilisation loci (PUL) in 5414 public Bacteroidetes genomes using PULpy.</article-title> <source><italic>BioRxiv</italic> [Preprint]</source>. <pub-id pub-id-type="doi">10.1101/421024</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strain</surname> <given-names>C.</given-names></name> <name><surname>Collins</surname> <given-names>K.</given-names></name> <name><surname>Naughton</surname> <given-names>V.</given-names></name> <name><surname>McSorley</surname> <given-names>E.</given-names></name> <name><surname>Stanton</surname> <given-names>C.</given-names></name> <name><surname>Smyth</surname> <given-names>T.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Effects of a polysaccharide-rich extract derived from Irish-sourced <italic>Laminaria digitata</italic> on the composition and metabolic activity of the human gut microbiota using an in vitro colonic model.</article-title> <source><italic>Eur. J. Nutr.</italic></source> <volume>59</volume> <fpage>309</fpage>&#x2013;<lpage>325</lpage>. <pub-id pub-id-type="doi">10.1007/s00394-019-01909-6</pub-id> <pub-id pub-id-type="pmid">30805695</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tanoue</surname> <given-names>T.</given-names></name> <name><surname>Morita</surname> <given-names>S.</given-names></name> <name><surname>Plichta</surname> <given-names>D.</given-names></name> <name><surname>Skelly</surname> <given-names>A.</given-names></name> <name><surname>Suda</surname> <given-names>W.</given-names></name> <name><surname>Sugiura</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>A defined commensal consortium elicits CD8 T cells and anti-cancer immunity.</article-title> <source><italic>Nature</italic></source> <volume>565</volume> <fpage>600</fpage>&#x2013;<lpage>605</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-0878-z</pub-id> <pub-id pub-id-type="pmid">30675064</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tap</surname> <given-names>J.</given-names></name> <name><surname>Furet</surname> <given-names>J.-P.</given-names></name> <name><surname>Bensaada</surname> <given-names>M.</given-names></name> <name><surname>Catherine</surname> <given-names>P.</given-names></name> <name><surname>Roth</surname> <given-names>H.</given-names></name> <name><surname>Rabot</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Gut microbiota richness promotes its stability upon increased dietary fiber intake in healthy adults.</article-title> <source><italic>Environ. Microbiol.</italic></source> <volume>17</volume> <fpage>4954</fpage>&#x2013;<lpage>4964</lpage>. <pub-id pub-id-type="doi">10.1111/1462-2920.13006</pub-id> <pub-id pub-id-type="pmid">26235304</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tasse</surname> <given-names>L.</given-names></name> <name><surname>Bercovici</surname> <given-names>J.</given-names></name> <name><surname>Pizzut-Serin</surname> <given-names>S.</given-names></name> <name><surname>Robe</surname> <given-names>P.</given-names></name> <name><surname>Tap</surname> <given-names>J.</given-names></name> <name><surname>Klopp</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes.</article-title> <source><italic>Genome Res.</italic></source> <volume>20</volume> <fpage>1605</fpage>&#x2013;<lpage>1612</lpage>. <pub-id pub-id-type="doi">10.1101/gr.108332.110</pub-id> <pub-id pub-id-type="pmid">20841432</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tauzin</surname> <given-names>A.</given-names></name> <name><surname>Pereira</surname> <given-names>M.</given-names></name> <name><surname>van Vliet</surname> <given-names>L.</given-names></name> <name><surname>Colin</surname> <given-names>P.-Y.</given-names></name> <name><surname>Laville</surname> <given-names>E.</given-names></name> <name><surname>Esque</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Investigating host-microbiome interactions by droplet based microfluidics.</article-title> <source><italic>Microbiome</italic></source> <volume>8</volume>:<fpage>141</fpage>. <pub-id pub-id-type="doi">10.1186/s40168-020-00911-z</pub-id> <pub-id pub-id-type="pmid">33004077</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terrapon</surname> <given-names>N.</given-names></name> <name><surname>Lombard</surname> <given-names>V.</given-names></name> <name><surname>Drula</surname> <given-names>E.</given-names></name> <name><surname>Lap&#x00E9;bie</surname> <given-names>P.</given-names></name> <name><surname>Almasaudi</surname> <given-names>S.</given-names></name> <name><surname>Gilbert</surname> <given-names>H. J.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>PULDB: the expanded database of polysaccharide utilization loci.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>46</volume> <fpage>D677</fpage>&#x2013;<lpage>D683</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx1022</pub-id> <pub-id pub-id-type="pmid">29088389</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Terrapon</surname> <given-names>N.</given-names></name> <name><surname>Lombard</surname> <given-names>V.</given-names></name> <name><surname>Gilbert</surname> <given-names>H. J.</given-names></name> <name><surname>Henrissat</surname> <given-names>B.</given-names></name></person-group> (<year>2015</year>). <article-title>Automatic prediction of polysaccharide utilization loci in Bacteroidetes species.</article-title> <source><italic>Bioinformatics</italic></source> <volume>31</volume> <fpage>647</fpage>&#x2013;<lpage>655</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu716</pub-id> <pub-id pub-id-type="pmid">25355788</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thomas</surname> <given-names>A.</given-names></name> <name><surname>Manghi</surname> <given-names>P.</given-names></name> <name><surname>Asnicar</surname> <given-names>F.</given-names></name> <name><surname>Pasolli</surname> <given-names>E.</given-names></name> <name><surname>Armanini</surname> <given-names>F.</given-names></name> <name><surname>Zolfo</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation.</article-title> <source><italic>Nat. Med.</italic></source> <volume>25</volume> <fpage>667</fpage>&#x2013;<lpage>678</lpage>. <pub-id pub-id-type="doi">10.1038/s41591-019-0405-7</pub-id> <pub-id pub-id-type="pmid">30936548</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thompson</surname> <given-names>J. D.</given-names></name> <name><surname>Higgins</surname> <given-names>D. G.</given-names></name> <name><surname>Gibson</surname> <given-names>T. J.</given-names></name></person-group> (<year>1994</year>). <article-title>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>22</volume> <fpage>4673</fpage>&#x2013;<lpage>4680</lpage>.</citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vangay</surname> <given-names>P.</given-names></name> <name><surname>Johnson</surname> <given-names>A.</given-names></name> <name><surname>Ward</surname> <given-names>T.</given-names></name> <name><surname>Al-Ghalith</surname> <given-names>G.</given-names></name> <name><surname>Shields-Cutler</surname> <given-names>R.</given-names></name> <name><surname>Hillmann</surname> <given-names>B.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>US immigration westernizes the human gut microbiome.</article-title> <source><italic>Cell</italic></source> <volume>175</volume> <fpage>962</fpage>&#x2013;<lpage>972.e10</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2018.10.029</pub-id> <pub-id pub-id-type="pmid">30388453</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Villa</surname> <given-names>M. M.</given-names></name> <name><surname>Bloom</surname> <given-names>R. J.</given-names></name> <name><surname>Silverman</surname> <given-names>J. D.</given-names></name> <name><surname>Durand</surname> <given-names>H. K.</given-names></name> <name><surname>Jiang</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Interindividual variation in dietary carbohydrate metabolism by gut bacteria revealed with droplet microfluidic culture.</article-title> <source><italic>mSystems</italic></source> <volume>5</volume>:<fpage>e00864-19</fpage>. <pub-id pub-id-type="doi">10.1128/mSystems.00864-19</pub-id> <pub-id pub-id-type="pmid">32606031</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Warren</surname> <given-names>F.</given-names></name> <name><surname>Fukuma</surname> <given-names>N.</given-names></name> <name><surname>Mikkelsen</surname> <given-names>D.</given-names></name> <name><surname>Flanagan</surname> <given-names>B.</given-names></name> <name><surname>Williams</surname> <given-names>B.</given-names></name> <name><surname>Lisle</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Food starch structure impacts gut microbiome composition.</article-title> <source><italic>mSphere</italic></source> <volume>3</volume>:<fpage>e00086-18</fpage>. <pub-id pub-id-type="doi">10.1128/mSphere.00086-18</pub-id> <pub-id pub-id-type="pmid">29769378</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wickham</surname> <given-names>H.</given-names></name></person-group> (<year>2016</year>). <source><italic>Ggplot2: Elegant Graphics for Data Analysis.</italic></source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer-Verlag New York</publisher-name>.</citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wilkins</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <source><italic>Gggenes: Draw Gene Arrow Maps in &#x201C;Ggplot2&#x201D;.</italic></source> <comment>Available online at:</comment> <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=gggenes">https://CRAN.R-project.org/package=gggenes</ext-link> <comment>(accessed September 21, 2021)</comment>.</citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>Y.</given-names></name></person-group> (<year>2015</year>). <source><italic>Dynamic Documents with R and Knitr</italic></source>, <edition>2nd Edn</edition>. <publisher-loc>Boca Raton, FL</publisher-loc>: <publisher-name>Chapman and Hall/CRC</publisher-name>.</citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>Y.</given-names></name> <name><surname>Cheng</surname> <given-names>J.</given-names></name> <name><surname>Tan</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <source><italic>DT: A Wrapper of the JavaScript Library &#x201C;DataTables&#x201D;.</italic></source> <comment>Available online at:</comment> <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=DT">https://CRAN.R-project.org/package=DT</ext-link> <comment>(accessed September 21, 2021)</comment>.</citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yin</surname> <given-names>Y.</given-names></name> <name><surname>Mao</surname> <given-names>X.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Mao</surname> <given-names>F.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name></person-group> (<year>2012</year>). <article-title>dbCAN: a web resource for automated carbohydrate-active enzyme annotation.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>40</volume> <fpage>W445</fpage>&#x2013;<lpage>W451</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks479</pub-id> <pub-id pub-id-type="pmid">22645317</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Yohe</surname> <given-names>T.</given-names></name> <name><surname>Huang</surname> <given-names>L.</given-names></name> <name><surname>Entwistle</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>P.</given-names></name> <name><surname>Yang</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>dbCAN2: a meta server for automated carbohydrate-active enzyme annotation.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>46</volume> <fpage>W95</fpage>&#x2013;<lpage>W101</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky418</pub-id> <pub-id pub-id-type="pmid">29771380</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>F.</given-names></name> <name><surname>Ding</surname> <given-names>X.</given-names></name> <name><surname>Wu</surname> <given-names>G.</given-names></name> <name><surname>Lam</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Gut bacteria selectively promoted by dietary fibers alleviate type 2 diabetes.</article-title> <source><italic>Science</italic></source> <volume>359</volume> <fpage>1151</fpage>&#x2013;<lpage>1156</lpage>. <pub-id pub-id-type="doi">10.1126/science.aao5774</pub-id> <pub-id pub-id-type="pmid">29590046</pub-id></citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zmora</surname> <given-names>N.</given-names></name> <name><surname>Suez</surname> <given-names>J.</given-names></name> <name><surname>Elinav</surname> <given-names>E.</given-names></name></person-group> (<year>2018</year>). <article-title>You are what you eat: diet, health and the gut microbiota.</article-title> <source><italic>Nat. Rev. Gastroenterol. Hepatol.</italic></source> <volume>16</volume> <fpage>35</fpage>&#x2013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1038/s41575-018-0061-2</pub-id> <pub-id pub-id-type="pmid">30262901</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zou</surname> <given-names>Y.</given-names></name> <name><surname>Xue</surname> <given-names>W.</given-names></name> <name><surname>Luo</surname> <given-names>G.</given-names></name> <name><surname>Deng</surname> <given-names>Z.</given-names></name> <name><surname>Qin</surname> <given-names>P.</given-names></name> <name><surname>Ruijin</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>37</volume>:<fpage>179</fpage>. <pub-id pub-id-type="doi">10.1038/s41587-018-0008-8</pub-id> <pub-id pub-id-type="pmid">30718868</pub-id></citation></ref>
</ref-list>
</back>
</article>