<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2022.851450</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Mini Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Interfacing Machine Learning and Microbial Omics: A Promising Means to Address Environmental Challenges</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>McElhinney</surname> <given-names>James M. W. R.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/477846/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Catacutan</surname> <given-names>Mary Krystelle</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1761514/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Mawart</surname> <given-names>Aurelie</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1335723/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Hasan</surname> <given-names>Ayesha</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1762121/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Dias</surname> <given-names>Jorge</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1313780/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Applied Genomics Laboratory, Center for Membranes and Advanced Water Technology, Khalifa University</institution>, <addr-line>Abu Dhabi</addr-line>, <country>United Arab Emirates</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Biomedical Engineering, Khalifa University</institution>, <addr-line>Abu Dhabi</addr-line>, <country>United Arab Emirates</country></aff>
<aff id="aff3"><sup>3</sup><institution>EECS, Center for Autonomous Robotic Systems, Khalifa University</institution>, <addr-line>Abu Dhabi</addr-line>, <country>United Arab Emirates</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: George Tsiamis, University of Patras, Greece</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Felipe Hernandes Coutinho, Institute of Marine Sciences (CSIC), Spain</p></fn>
<corresp id="c001">&#x002A;Correspondence: James M. W. R. McElhinney, <email>james.mcelhinney@ku.ac.ae</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>04</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>851450</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>01</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>03</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2022 McElhinney, Catacutan, Mawart, Hasan and Dias.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>McElhinney, Catacutan, Mawart, Hasan and Dias</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Microbial communities are ubiquitous and carry an exceptionally broad metabolic capability. Upon environmental perturbation, microbes are also amongst the first natural responsive elements with perturbation-specific cues and markers. These communities are thereby uniquely positioned to inform on the status of environmental conditions. The advent of microbial omics has led to an unprecedented volume of complex microbiological data sets. Importantly, these data sets are rich in biological information with potential for predictive environmental classification and forecasting. However, the patterns in this information are often hidden amongst the inherent complexity of the data. There has been a continued rise in the development and adoption of machine learning (ML) and deep learning architectures for solving research challenges of this sort. Indeed, the interface between molecular microbial ecology and artificial intelligence (AI) appears to show considerable potential for significantly advancing environmental monitoring and management practices through their application. Here, we provide a primer for ML, highlight the notion of retaining biological sample information for supervised ML, discuss workflow considerations, and review the state of the art of the exciting, yet nascent, interdisciplinary field of ML-driven microbial ecology. Current limitations in this sphere of research are also addressed to frame a forward-looking perspective toward the realization of what we anticipate will become a pivotal toolkit for addressing environmental monitoring and management challenges in the years ahead.</p>
</abstract>
<kwd-group>
<kwd>machine learning</kwd>
<kwd>microbial ecology</kwd>
<kwd>metagenomics</kwd>
<kwd>environmental monitoring</kwd>
<kwd>microbiology</kwd>
<kwd>artificial intelligence</kwd>
<kwd>microbial omics</kwd>
<kwd>predictive modeling</kwd>
</kwd-group>
<contract-num rid="cn001">Competitive Internal Research Award (CIRA2019-019)</contract-num>
<contract-sponsor id="cn001">Khalifa University of Science, Technology and Research<named-content content-type="fundref-id">10.13039/501100004070</named-content></contract-sponsor>
<counts>
<fig-count count="1"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="104"/>
<page-count count="11"/>
<word-count count="8395"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>Expansion of the human population is increasing resource consumption and discharge of waste products, placing significant burdens on the biosphere (<xref ref-type="bibr" rid="B12">Burrell et al., 2020</xref>; <xref ref-type="bibr" rid="B40">Grantham et al., 2020</xref>; <xref ref-type="bibr" rid="B67">Lv et al., 2020</xref>; <xref ref-type="bibr" rid="B1">Albert et al., 2021</xref>; <xref ref-type="bibr" rid="B66">Lu et al., 2021</xref>; <xref ref-type="bibr" rid="B71">Naumann et al., 2021</xref>; <xref ref-type="bibr" rid="B73">Ortiz-Bobea et al., 2021</xref>). These activities are contributing to the multifaceted pollution of global ecological systems (<xref ref-type="bibr" rid="B49">Julinov&#x00E1; et al., 2018</xref>; <xref ref-type="bibr" rid="B80">Santos et al., 2019</xref>; <xref ref-type="bibr" rid="B92">Turan et al., 2019</xref>; <xref ref-type="bibr" rid="B93">Vardhan et al., 2019</xref>; <xref ref-type="bibr" rid="B10">Briffa et al., 2020</xref>; <xref ref-type="bibr" rid="B77">Pulster et al., 2020</xref>; <xref ref-type="bibr" rid="B84">Simul Bhuyan et al., 2021</xref>; <xref ref-type="bibr" rid="B87">Sohrabi et al., 2021</xref>; <xref ref-type="bibr" rid="B60">Li and Fantke, 2022</xref>). Consequently, we are witnessing accelerating losses of biodiversity and habitats, together with climate change (<xref ref-type="bibr" rid="B85">Sintayehu, 2018</xref>; <xref ref-type="bibr" rid="B11">Br&#x00FC;hl and Zaller, 2019</xref>). Efforts to gauge and forecast such anthropogenic environmental impacts are often limited in scope due to scale-up challenges. At large scale, this endeavor remains an inordinately complex and resource-intensive task and therefore represents a major scientific goal.</p>
<p>At 93 gigatons carbon (Gt C), microbial communities comprise approximately 20% of the total estimated global biomass and exclusively form the deep subsurface biome (estimated at 70 Gt C) (<xref ref-type="bibr" rid="B8">Bar-On et al., 2018</xref>). These communities are ubiquitously distributed across the biosphere where their activities are central in shaping the environments of our planet (<xref ref-type="bibr" rid="B34">Gibbons and Gilbert, 2015</xref>); microbial communities possess exceptionally broad metabolic capabilities, enabling their utilization of many xenobiotics (<xref ref-type="bibr" rid="B52">Katsuyama et al., 2009</xref>; <xref ref-type="bibr" rid="B50">Junghare et al., 2019</xref>). Microbes can have short generation times and are amongst the first responders with perturbation-specific cues and markers (<xref ref-type="bibr" rid="B24">De Anda et al., 2018</xref>; <xref ref-type="bibr" rid="B4">Astudillo-Garc&#x00ED;a et al., 2019</xref>). They are therefore a valuable source of biological information for establishing the status of their respective environmental niches and can serve as dynamic biosensors for monitoring and tracing environmental changes (<xref ref-type="bibr" rid="B14">Cesare et al., 2020</xref>; <xref ref-type="bibr" rid="B70">Morimura et al., 2020</xref>).</p>
<p>Omics methodologies enable rapid community-wide profiling of microbial populations across environmental perturbations. Omics data are information-rich, leading to an unprecedented volume of large multidimensional data sets with potential for predictive environmental classification and forecasting. However, the inherent complexity in these data conceals the patterns underlying the biological information, challenging manual curation and interpretation. Machine learning (ML) is well suited to address such challenges and there has been a sharp rise in its application in health-oriented microbiomics (<xref ref-type="bibr" rid="B102">Zeller et al., 2014</xref>; <xref ref-type="bibr" rid="B90">Szafra&#x0144;ski et al., 2015</xref>; <xref ref-type="bibr" rid="B55">Knight et al., 2018</xref>). ML-driven omics is now being applied to address environmental challenges (<xref ref-type="fig" rid="F1">Figure 1</xref>). Here, we discuss the state of the art in this interdisciplinary field and highlight considerations, ongoing limitations, and challenges for future work. The interface between ML and molecular microbial ecology (MME) holds great promise for significantly advancing environmental monitoring and management practices. Indeed, ML will likely become a routine toolkit for the molecular microbiologist and will be essential to manage large multidimensional environmental omics data.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>The interface of microbial omics and machine learning (ML). A generalized and simplified overview of the workflows is presented highlighting the major steps in the microbial omics and ML workflows as they relate to one another along with key outcomes obtainable from the application of ML to omics data. Microbial community responses (biological information on which learning is aimed) are summarized below the cartoon snapshot of a contaminated environment of interest. Here, HC cont., hydrocarbon contamination; PAH, polyaromatic hydrocarbons (as examples of targets in petroleum hydrocarbon scenarios); QC, quality control; ASV, amplicon sequence variant (ASVs are given here as an example of an omics classification, other examples include the often used OTU, genes, mRNA transcripts, protein categories or metabolite IDs); DL, deep learning; ANN, artificial neural networks (shallow); RF, random forest; SVM, support vector machine; GB, gradient boost; LR, logistic regression; SMOTE, synthetic minority oversampling technique; SML, supervised machine learning; and MP, model performance.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmicb-13-851450-g001.tif"/>
</fig>
</sec>
<sec id="S2">
<title>Main Body</title>
<sec id="S2.SS1">
<title>A Primer on Machine Learning</title>
<p>Machine learning approaches can be supervised (SML) or unsupervised (USML). In SML methods, data sets are converted into sets of features, which serve as the input variables for the SML model. Features are measurable and informative properties of the data (e.g., taxa abundances) and are annotated with metadata of interest (labels), which define the desired output (the target). The labeled samples are split into subsets for model training and for model testing/validation. The SML architecture then attempts to derive a model that can predict the label for new input data. SML can address regression or classification challenges. For regression, the model predicts values of a continuous variable (such as levels of environmental pollutants). For classification, the model predicts the categorical label pertaining to the sample (such as contamination status). Deep learning (DL) is a subset of ML, most often applied in a supervised setting, which employs neural networks with multiple (&#x003E;3) processing layers and typically has the highest capacity for learning. For USML, no label or target output is defined; instead, the USML architecture identifies patterns in the data without reference to labels, usually by clustering or ordination projections. USML is particularly useful for exploratory analysis of microbial omics data and includes ordination methods that are commonly applied in microbiology. Here we focus primarily on SML applications for environmentally centered microbial omics research. For more details on the underlying principles of ML for microbial ecology, readers are encouraged to see recent reviews (<xref ref-type="bibr" rid="B33">Ghannam and Techtmann, 2021</xref>; <xref ref-type="bibr" rid="B38">Goodswen et al., 2021</xref>).</p>
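<p>The supervised workflow described above (features, labels, a train/test split, and prediction of labels for unseen samples) can be sketched in a few lines of Python. This is a minimal illustration only: the data are synthetic, the taxon proportions and the class names are invented for the example, and a nearest-centroid classifier is used as a simple stand-in rather than any of the architectures reviewed here.</p>
<preformat><![CDATA[
```python
import random

def make_samples(n_per_class, seed=0):
    """Build synthetic samples: (relative abundances of 3 taxa, status label).

    The class centres are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    centres = (("clean", (0.6, 0.3, 0.1)), ("contaminated", (0.2, 0.3, 0.5)))
    samples = []
    for label, centre in centres:
        for _ in range(n_per_class):
            raw = [max(c + rng.uniform(-0.05, 0.05), 0.0) for c in centre]
            total = sum(raw)
            samples.append(([v / total for v in raw], label))
    return samples

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the labeled samples and hold out a fraction for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train):
    """Training step: compute the mean feature vector of each label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Classification step: return the label of the closest centroid."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(features, centre))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, test):
    """Fraction of held-out samples whose label is predicted correctly."""
    hits = sum(predict(centroids, f) == label for f, label in test)
    return hits / len(test)

samples = make_samples(n_per_class=20)
train, test = train_test_split(samples)
model = fit_centroids(train)
print("held-out accuracy:", accuracy(model, test))
```
]]></preformat>
<p>The two synthetic classes are deliberately well separated, so the held-out accuracy says nothing about real omics data; the point is only the order of operations, in which the model never sees the test samples during training.</p>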
</sec>
<sec id="S2.SS2">
<title>Omics Data Sets Are Rich in Learnable Biological Information</title>
<p>Anthropogenic perturbations give rise to spatiotemporal patterns in microbial communities by influencing the abundances of, interactions between, and dispersal of community members (<xref ref-type="bibr" rid="B9">Blaser et al., 2016</xref>; <xref ref-type="bibr" rid="B64">Liao et al., 2018</xref>). Community dynamics are perturbation-specific, reproducible, and predictable, affecting taxonomic diversity, differential abundances in taxa, functional gene clusters, and shifts in metabolic circuits which influence microbial interactions (<xref ref-type="fig" rid="F1">Figure 1</xref>). Microbial omics approaches are rapidly advancing our views of these complex shifts and have opened myriad avenues for the utilization of microbial data to address environmental challenges. Often these omics approaches scrutinize a single systems level (e.g., DNA or RNA), but they can provide more information when integrated with supporting omics data from other systems layers (<xref ref-type="bibr" rid="B31">Franzosa et al., 2015</xref>). Such integrative omics represents a powerful means to understand communities through cross-systems-level descriptions but is in its infancy and has yet to be widely applied in this area. A central challenge for any ML-led omics analysis is the preservation of the biological information hidden within the microbial community, throughout the workflow (<xref ref-type="fig" rid="F1">Figure 1</xref>), to allow for effective learning. There are numerous ways in which the biological information in omics samples can be compromised. These pitfalls occur at virtually all decision points in the omics workflow and begin with the experimental design phase. 
The significance of a given pitfall depends strongly on the phenomena under investigation and the aims of the study, but common pitfalls include inadequate sampling; improper preservation, sample transport conditions, or subcommunity sampling (e.g., planktonic/sessile); biases arising from sample handling (e.g., during extraction and amplification); the choice of sequencing/liquid chromatography-mass spectrometry (LC&#x2013;MS) platform and analytical methodology; classification and filtering of omics data (which can remove rare but important taxa, transcripts, or proteins); artifacts from data transformation and normalization approaches (correcting for library size is especially essential for meta-analyses); and the choice and engineering of features. A number of considerations can help in preserving the biological information for omics-led SML, and many are discussed in the sections that follow.</p>
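<p>As a minimal illustration of why correcting for library size matters, the Python sketch below converts raw read counts to relative abundances (total-sum scaling). The sample names, feature identifiers, and counts are hypothetical, and total-sum scaling is only one of several normalization options; it is shown here because it is the simplest.</p>
<preformat><![CDATA[
```python
def to_relative_abundance(counts):
    """Divide each count by the sample's library size (its total read count)."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no reads")
    return {feature: n / library_size for feature, n in counts.items()}

# Hypothetical samples with the same composition but a tenfold
# difference in sequencing depth.
shallow = {"ASV_1": 50, "ASV_2": 30, "ASV_3": 20}
deep = {"ASV_1": 500, "ASV_2": 300, "ASV_3": 200}

print(to_relative_abundance(shallow))
print(to_relative_abundance(deep))
```
]]></preformat>
<p>Without this step, a model trained on raw counts could learn sequencing depth rather than community composition, which is a particular risk when samples from different studies are pooled.</p>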
</sec>
<sec id="S2.SS3">
<title>Workflow Considerations</title>
<sec id="S2.SS3.SSS1">
<title>Microbial Omics Input</title>
<p>Microbial omics pitfalls, from sampling to the bioinformatics pipeline, can reduce or bias the information yielded (<xref ref-type="bibr" rid="B41">Gutleben et al., 2018</xref>; <xref ref-type="bibr" rid="B51">Kaster and Sobol, 2020</xref>). Typically, some trade-off must be made in the experimental design, for which options have been suggested (<xref ref-type="bibr" rid="B31">Franzosa et al., 2015</xref>). Metataxonomics usually limits resolution to the genus level, yet it is the most commonly used omics input for SML (<xref ref-type="table" rid="T1">Table 1</xref>), wherein relative operational taxonomic unit (OTU) abundances form the feature set (<xref ref-type="bibr" rid="B69">Miao et al., 2020</xref>; <xref ref-type="bibr" rid="B47">Jan&#x00DF;en et al., 2021</xref>; <xref ref-type="bibr" rid="B54">Kim and Oh, 2021</xref>). However, the use of OTUs is inherently limiting for retaining community information and can miss important taxonomic groups. Indeed, since the development of the more biologically meaningful amplicon sequence variants (ASVs; <xref ref-type="bibr" rid="B13">Callahan et al., 2017</xref>), the absence of ASVs in most metataxonomic studies is striking. As ASVs represent a more accurate basis for taxa assignment, it will be interesting to see how their application influences ML performance in the future.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Example applications of SML to microbial omics data for addressing environmental challenges.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Environment</td>
<td valign="top" align="left">Niche</td>
<td valign="top" align="left">Application</td>
<td valign="top" align="left">Omics</td>
<td valign="top" align="left">Input data</td>
<td valign="top" align="left">Feature</td>
<td valign="top" align="left">Target(s)</td>
<td valign="top" align="left">SML architectures</td>
<td valign="top" align="left">Software</td>
<td valign="top" align="left">References</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Aquatic</td>
<td valign="top" align="left">Marine (Coral Reef)</td>
<td valign="top" align="left">Prediction of environmental status</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">Eutrophication indicators and temperature</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">Caret and RF R packages</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B36">Glasl et al., 2019</xref></td>
</tr>
<tr>
<td valign="top" align="left">Industrial</td>
<td valign="top" align="left">WWTP</td>
<td valign="top" align="left">Prediction of environmental variable to identify key subpopulations</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance, PCA coordinates</td>
<td valign="top" align="left">WWTP water temperature</td>
<td valign="top" align="left">LR, RF, SVML, DT, KNN, SVMRBF</td>
<td valign="top" align="left">Scikit-Learn</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B54">Kim and Oh, 2021</xref></td>
</tr>
<tr>
<td valign="top" align="left">Terrestrial</td>
<td valign="top" align="left">Soil<xref ref-type="table-fn" rid="t1fn1"><sup>1</sup></xref></td>
<td valign="top" align="left">Prediction of carbon cycling</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">[DOC]</td>
<td valign="top" align="left">RF, ANN</td>
<td valign="top" align="left">THEANO, Scikit-Learn</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B91">Thompson et al., 2019</xref></td>
</tr>
<tr>
<td valign="top" align="left">Terrestrial</td>
<td valign="top" align="left">Compost</td>
<td valign="top" align="left">Classification of microbial biomarkers</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">Compost cycle</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">RF R package</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B103">Zhang et al., 2020</xref></td>
</tr>
<tr>
<td valign="top" align="left">Terrestrial</td>
<td valign="top" align="left">Ground water + Soil<xref ref-type="table-fn" rid="t1fn1"><sup>1</sup></xref></td>
<td valign="top" align="left">Prediction of environmental contaminants</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">[dioxane] and [CVOCs]</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left"/><td valign="top" align="left"><xref ref-type="bibr" rid="B69">Miao et al., 2020</xref></td>
</tr>
<tr>
<td valign="top" align="left">Terrestrial</td>
<td valign="top" align="left">Soil</td>
<td valign="top" align="left">Prediction of environmental quality</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">Soil physicochemical features</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">RF R package</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B45">Hermans et al., 2020</xref></td>
</tr>
<tr>
<td valign="top" align="left">Aquatic</td>
<td valign="top" align="left">Marine (coastal waters)<xref ref-type="table-fn" rid="t1fn1"><sup>1</sup></xref></td>
<td valign="top" align="left">Prediction of environmental contaminants</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance, 16S rRNA gene sequences</td>
<td valign="top" align="left">Glyphosate</td>
<td valign="top" align="left">RF, ANN</td>
<td valign="top" align="left">RF R package and DL4J</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B48">Jan&#x00DF;en et al., 2019</xref></td>
</tr>
<tr>
<td valign="top" align="left">Aquatic</td>
<td valign="top" align="left">Freshwater (river)</td>
<td valign="top" align="left">Classification of anthropogenic pathogen loads</td>
<td valign="top" align="left">metataxonomics<xref ref-type="table-fn" rid="t1fn2"><sup>2</sup></xref></td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">Fecal source</td>
<td valign="top" align="left">RF, MCMC</td>
<td valign="top" align="left">RF R package and SourceTracker</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B27">Dubinsky et al., 2016</xref></td>
</tr>
<tr>
<td valign="top" align="left">Aquatic</td>
<td valign="top" align="left">Marine and Freshwater</td>
<td valign="top" align="left">Classification of microbial biomarkers</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA and ITS OTUs</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">Plastisphere communities</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">RF R package</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B59">Li et al., 2021</xref></td>
</tr>
<tr>
<td valign="top" align="left">Aquatic</td>
<td valign="top" align="left">Marine sediment (munitions dumpsite)</td>
<td valign="top" align="left">Prediction of environmental contaminants</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">TNT</td>
<td valign="top" align="left">RF, ANN</td>
<td valign="top" align="left">Ranger R package (RF); R keras framework with TensorFlow back end (ANN)</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B47">Jan&#x00DF;en et al., 2021</xref></td>
</tr>
<tr>
<td valign="top" align="left">Aquatic</td>
<td valign="top" align="left">Freshwater (river)</td>
<td valign="top" align="left">Classification of sample origin</td>
<td valign="top" align="left">metataxonomics</td>
<td valign="top" align="left">16S rRNA OTUs</td>
<td valign="top" align="left">OTU abundance (top taxa)</td>
<td valign="top" align="left">Sample origin</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">RF R package</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B94">Wang et al., 2021</xref></td>
</tr>
<tr>
<td valign="top" align="left">Aquatic</td>
<td valign="top" align="left">Marine (oceanic waters)</td>
<td valign="top" align="left">Classification of trophic modes</td>
<td valign="top" align="left">metatranscriptomics</td>
<td valign="top" align="left">Gene expression levels</td>
<td valign="top" align="left">Expression levels of selected Pfam entries</td>
<td valign="top" align="left">Trophic mode (photo/hetero/mixo)</td>
<td valign="top" align="left">RF, DT, ANN</td>
<td valign="top" align="left">NR and XGBoost</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B56">Lambert et al., 2021</xref></td>
</tr>
<tr>
<td valign="top" align="left">Terrestrial</td>
<td valign="top" align="left">Soil</td>
<td valign="top" align="left">Prediction of crop productivity</td>
<td valign="top" align="left">metagenomics</td>
<td valign="top" align="left">Shotgun sequencing</td>
<td valign="top" align="left">OTU abundance</td>
<td valign="top" align="left">Crop productivity</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">Ranger R package</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B15">Chang et al., 2017</xref></td>
</tr>
<tr>
<td valign="top" align="left">Terrestrial</td>
<td valign="top" align="left">Soil</td>
<td valign="top" align="left">Prediction of soil phylogroups from environmental metadata</td>
<td valign="top" align="left">metagenomics</td>
<td valign="top" align="left">NR</td>
<td valign="top" align="left">NR</td>
<td valign="top" align="left"><italic>Listeria</italic> species</td>
<td valign="top" align="left">RF</td>
<td valign="top" align="left">RF R package</td>
<td valign="top" align="left"><xref ref-type="bibr" rid="B63">Liao et al., 2021</xref></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="t1fn1"><p><italic><sup>1</sup>Indirectly studied in microcosms.</italic></p></fn>
<fn id="t1fn2"><p><italic><sup>2</sup>Using PhyloChip array.</italic></p></fn>
<fn><p><italic>Here, ANN, Artificial Neural Network; CVOCs, Chlorinated Volatile Organic Compounds; DOC, Dissolved Organic Carbon; DT, Decision Tree; KNN, K-Nearest Neighbors; LR, Logistic Regression; MCMC, Markov Chain Monte Carlo; NR, Not reported; RF, Random Forest; SVML, Support Vector Machine (SVM) with a linear kernel; SVMRBF, SVM with a radial basis function kernel; TNT, trinitrotoluene; WWTP, Wastewater Treatment Plant.</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<p>Metagenomics is highly sensitive for low-abundance taxa, but is rarely applied for SML and carries additional costs which may limit sampling and options for ML (<xref ref-type="bibr" rid="B17">Chen and Tyler, 2020</xref>). Importantly, metagenomic approaches do not always confer a clear advantage over the more cost-effective metataxonomic approach (<xref ref-type="bibr" rid="B99">Xu et al., 2014</xref>). The choice between metataxonomics and metagenomics is evidently not clear-cut and should be considered in light of the expected community under study, choice of sequencing platform, and research goals. Microbial omics inputs are most often derived from closed-reference databases, leading to inevitable loss of learnable biological information in environmental samples due to unclassified/misclassified data (<xref ref-type="bibr" rid="B17">Chen and Tyler, 2020</xref>). However, the development of ML and DL tools (<xref ref-type="bibr" rid="B62">Liang et al., 2020</xref>) for enhancing taxonomic classification in metagenomic data sets could prove helpful. Alternatively, the direct use of biological sequences (from microbial omics surveys) circumvents this issue (by forgoing categorical assignment), thereby permitting the inclusion of more comprehensive feature spaces, at the cost of reducing the immediate interpretability for the user. Informative abstractions of omics data, such as the use of k-mer distributions as a feature set, have shown success in taxonomic (<xref ref-type="bibr" rid="B29">Fiannaca et al., 2018</xref>), subtyping (<xref ref-type="bibr" rid="B88">Solis-Reyes et al., 2018</xref>), and phenotypic (<xref ref-type="bibr" rid="B5">Aun et al., 2018</xref>) classification, and are transferable to environmental applications. Indeed, k-mer abstractions have shown predictive potential for classifying sample environment and host phenotype (an environmental status) that exceeds that of OTU features (<xref ref-type="bibr" rid="B3">Asgari et al., 2018</xref>). 
Environmental metatranscriptomics-led SML is currently limited. However, the approach has been shown to uncover the mixotrophic processes of protists in response to nutrient gradients in the Pacific Ocean (<xref ref-type="bibr" rid="B56">Lambert et al., 2021</xref>), thereby demonstrating that trophic modes can be readily predicted from metatranscriptomic data.</p>
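<p>To make the idea of a k-mer feature set concrete, the Python sketch below turns a nucleotide sequence into a fixed-length vector of normalized k-mer frequencies without any reference database. It is an illustrative sketch only and is not taken from the cited studies; the function name, the choice of k, and the example read are assumptions made for the example.</p>
<preformat><![CDATA[
```python
from collections import Counter
from itertools import product

def kmer_profile(sequence, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    K-mers containing characters outside the alphabet (e.g., N) are ignored.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]

read = "ACGTACGT"  # hypothetical read
profile = kmer_profile(read, k=2)
print(len(profile))  # one feature per possible 2-mer
```
]]></preformat>
<p>Because the vector length depends only on k and the alphabet, every sequence maps to the same feature space regardless of whether it can be assigned to a known taxon, which is the property that lets such abstractions bypass closed-reference classification. The trade-off noted above remains: an individual k-mer frequency is harder to interpret biologically than a taxon abundance.</p>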
</sec>
<sec id="S2.SS3.SSS2">
<title>Choice of Machine Learning Architecture</title>
<p>There is a broad selection of SML tools to choose from and each carries its own advantages and limitations (<xref ref-type="bibr" rid="B38">Goodswen et al., 2021</xref>). No single architecture performs best in all environmental application cases and users must make a trade-off in terms of interpretability, learning performance, computational costs, data requirements, and ease of implementation (<xref ref-type="bibr" rid="B33">Ghannam and Techtmann, 2021</xref>). At the outset, selecting a set of candidate architectures, rather than committing to one, can help to ensure the delivery of research goals. Random forest (RF) is a popular choice for microbial omics-driven SML for its learning capacity, straightforward implementation, and high degree of interpretability (<xref ref-type="bibr" rid="B33">Ghannam and Techtmann, 2021</xref>). For especially complex tasks, or where knowledge is limited, DL approaches (multi-layered architectures) can achieve the highest performance, as they can self-learn (i.e., do not require user extraction of) the feature set (<xref ref-type="bibr" rid="B20">Christin et al., 2019</xref>). However, DL comes with elevated computational costs and low interpretability of the underlying model (&#x201C;<italic>black box</italic>&#x201D; effect) and requires large volumes of data (thousands of samples). Consequently, though very promising, DL approaches for environmental omics are currently limited.</p>
</sec>
<sec id="S2.SS3.SSS3">
<title>Feature Engineering</title>
<p>Feature selection and engineering are crucial for generating meaningful SML-based ecological models. Reducing the feature space can help to limit overfitting, reduce computational costs, improve cross-study comparison, and improve generalized prediction performance across data sets (<xref ref-type="bibr" rid="B33">Ghannam and Techtmann, 2021</xref>). However, care is needed when reducing features for training, as biologically meaningful features can be missed if feature selection is based on abundance. This is especially so when assessing anthropogenic perturbations such as pollutants in the environment, wherein the rare microbiome (taxa representing &#x003C;0.1% of the total community) comprises a significant reservoir of gene clusters that enable the utilization and degradation of xenobiotic organic compounds (<xref ref-type="bibr" rid="B95">Wang et al., 2017</xref>). Taking embedded approaches for feature selection (that can evaluate across the full feature space) (<xref ref-type="bibr" rid="B95">Wang et al., 2017</xref>) or a biologically driven feature selection method (such as taxonomically aware hierarchical feature engineering) (<xref ref-type="bibr" rid="B74">Oudah and Henschel, 2018</xref>) may help in optimizing feature selection in metataxonomics-driven ML applications. Feature selection methods designed for functional feature sets are still notably lacking in this space.</p>
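<p>The risk of abundance-based filtering can be illustrated with a minimal sketch that lists the taxa an abundance cutoff would discard before training. The function name <italic>rare_taxa</italic>, the taxon labels, and the counts are hypothetical; the default threshold mirrors the 0.1% definition of the rare microbiome given above.</p>
<preformat preformat-type="code"><![CDATA[
```python
def rare_taxa(counts_by_taxon, threshold=0.001):
    """Return taxa whose relative abundance is below the threshold (0.1% by default)."""
    total = sum(counts_by_taxon.values())
    if total == 0:
        return []
    # These are the features an abundance filter would silently remove.
    return sorted(t for t, c in counts_by_taxon.items() if c / total < threshold)


# Hypothetical read counts for one sample, for illustration only.
sample = {"taxonA": 9000, "taxonB": 995, "taxonC": 5}
print(rare_taxa(sample))
```
]]></preformat>
<p>Inspecting this list before filtering is one simple way to check whether taxa of known ecological relevance are about to be excluded from the feature space.</p>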
<p>Conventional statistics require assumptions on the underlying data and care is needed, given the compositional nature of microbial omics data sets (<xref ref-type="bibr" rid="B37">Gloor et al., 2017</xref>). For example, conventional ecological models often assume monotonicity in relationships, which can hinder ecological explanations of community variance across study sites. By applying SML (allowing for non-monotonic feature capture), the ability to capture this variance can increase nine-fold (<xref ref-type="bibr" rid="B30">Fontaine et al., 2021</xref>). It is important to note that the goal of SML should not be to replace classical statistical modeling, but rather to complement it. Integrating these two approaches presents a promising opportunity to leverage their advantages for predictive environmental microbiology (<xref ref-type="bibr" rid="B65">Lopatkin and Collins, 2020</xref>) and monitoring. For multi-omics studies, feature selection and engineering become increasingly complex with the successive systems levels, and there is much to be done in this area. In such studies, functional data across systems levels will likely need to be empirically assessed prior to SML to identify the most informative biomarkers for learning (<xref ref-type="bibr" rid="B99">Xu et al., 2014</xref>).</p>
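<p>One widely used way to respect the compositional nature of count data before modeling is the centered log-ratio (CLR) transform. The sketch below is a minimal illustration under stated assumptions: the function name <italic>clr</italic> and the pseudocount of 0.5 (used to avoid taking the logarithm of zero) are illustrative choices, not recommendations drawn from the studies cited here.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's count vector."""
    # The pseudocount is an assumption made here to handle zero counts.
    logs = [math.log(c + pseudocount) for c in counts]
    # Subtracting the mean log equals dividing by the geometric mean.
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for four taxa in one sample.
transformed = clr([120, 30, 0, 850])
print([round(v, 3) for v in transformed])
```
]]></preformat>
<p>The transformed values of a sample sum to zero, and, when no pseudocount is needed, they are unchanged if every count is multiplied by the same factor, so differences in sequencing depth alone do not alter the features.</p>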
</sec>
<sec id="S2.SS3.SSS4">
<title>Evaluating Data Leakage</title>
<p>Data leakage is a subtle but important aspect of ML, referring to the unintended use or influence of data (that should not be available at the time of prediction) during the training process. This often occurs when the features used for training hide within themselves the result of the prediction, resulting in an overestimation of model performance during validation (<xref ref-type="bibr" rid="B18">Chiavegatto Filho et al., 2021</xref>). Due to the subtleties with which this can occur, avoiding data leakage is challenging and should be evaluated on a case-by-case basis. Important aspects for consideration here have been discussed previously (<xref ref-type="bibr" rid="B97">Wirbel et al., 2021</xref>) and include (1) data filtering that is influenced by the target label and (2) the splitting of dependent data (e.g., replicates and time-series data points) across training and validation sets. The use of an externally generated test data set (handled separately from the training set) for additional validation checks can help (<xref ref-type="bibr" rid="B75">Oyetunde et al., 2019</xref>; <xref ref-type="bibr" rid="B97">Wirbel et al., 2021</xref>), though data leakage is seldom discussed in microbial omics papers that use SML. We urge future authors in this space to consider including at least a statement on leakage assessment in studies based on SML.</p>
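<p>The second point above, keeping dependent samples together, can be sketched as a group-aware split in which all replicates or time points from one source are assigned to the same partition. This is a minimal illustration only; the function name <italic>grouped_split</italic>, the sample identifiers, and the site labels are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def grouped_split(sample_ids, groups, test_fraction=0.25, seed=0):
    """Split samples so that no group appears in both training and test sets."""
    unique_groups = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    # Hold out whole groups (at least one), never individual samples.
    n_test = max(1, round(len(unique_groups) * test_fraction))
    test_groups = set(unique_groups[:n_test])
    train = [s for s, g in zip(sample_ids, groups) if g not in test_groups]
    test = [s for s, g in zip(sample_ids, groups) if g in test_groups]
    return train, test


# Hypothetical design: two replicates from each of three sites.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
sites = ["siteA", "siteA", "siteB", "siteB", "siteC", "siteC"]
train_ids, test_ids = grouped_split(samples, sites)
print(train_ids, test_ids)
```
]]></preformat>
<p>Any label-dependent filtering or feature selection would then be fitted on the training partition only and applied unchanged to the held-out groups.</p>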
</sec>
</sec>
<sec id="S2.SS4">
<title>Applications of Molecular Microbial Ecology&#x2013;Machine Learning for Environmental Challenges</title>
<sec id="S2.SS4.SSS1">
<title>Microbes as Environmental Biosensors</title>
<p>Anthropogenic impacts are motivating the development of cost-effective and scalable environmental bioassessment methodologies (<xref ref-type="bibr" rid="B32">Fruehe et al., 2021</xref>). Microbes have long been recognized as potential <italic>in situ</italic> biosensors for following human impacts (<xref ref-type="bibr" rid="B89">Su et al., 2011</xref>), allowing for highly accurate quantitative SML predictions of the perturbation. Indeed, metataxonomic data can be valuable for the prediction of a variety of environmental contaminants (<xref ref-type="table" rid="T1">Table 1</xref>), ranging from relatively inert plastics (<xref ref-type="bibr" rid="B59">Li et al., 2021</xref>) to petroleum hydrocarbons [which elicit strong responses with detectable influences even after the pollutant is degraded and undetectable by conventional measures (<xref ref-type="bibr" rid="B86">Smith et al., 2015</xref>)]. Hydrocarbonoclastic indicator species have also been identified as key biosensors in ML-based bioprospecting of hydrocarbon seepage from subsurface reservoirs and can improve the likelihood of success in drilling for new assets (<xref ref-type="bibr" rid="B25">de Dios Miranda et al., 2019</xref>; <xref ref-type="bibr" rid="B19">Chitu et al., 2022</xref>). The same approach is also being explored as a potential early-warning indicator of leakage from hydrocarbon transport lines (<xref ref-type="bibr" rid="B83">Shaheen et al., 2011</xref>). Indeed, the SML of microbial fingerprints has even demonstrated reasonable predictions (accuracies of 72&#x2013;85%) of the future production of hydrocarbon reservoirs (using metataxonomic input) (<xref ref-type="bibr" rid="B104">Zijp et al., 2021</xref>), which can facilitate decision-making for enhanced asset management. These approaches thereby have real potential for reducing the carbon footprint and ecological impact of upstream oil and gas activities.</p>
</sec>
<sec id="S2.SS4.SSS2">
<title>Microbes as Predictors of Environmental Status</title>
<p>Microbes have proved valuable as ecological assessment indicators in multiple diverse environments (<xref ref-type="bibr" rid="B4">Astudillo-Garc&#x00ED;a et al., 2019</xref>; <xref ref-type="bibr" rid="B36">Glasl et al., 2019</xref>; <xref ref-type="bibr" rid="B45">Hermans et al., 2020</xref>; <xref ref-type="bibr" rid="B16">Chen et al., 2021</xref>). Moreover, improvements in sequencing technologies are facilitating the upscaling and deployment of omics-based ML for more ambitious environmental monitoring and mitigation applications (<xref ref-type="bibr" rid="B94">Wang et al., 2021</xref>). These indicators can reveal important relationships for land management when conventional field measurements are unhelpful (<xref ref-type="bibr" rid="B15">Chang et al., 2017</xref>). Indeed, the SML of microbial 16S rRNA abundances can directly predict soil productivity in arable land and risks posed for agriculture (<xref ref-type="bibr" rid="B101">Yuan et al., 2020</xref>). USML is routinely applied <italic>via</italic> ordination techniques to establish the organization of microbiome data in relation to their environmental parameters. However, in instances where conventional ordinations fail to determine clear relationships, SML may still yield community subpopulations that can serve as predictors for environmental parameters and processes of interest. For example, the relationship between temperature and key phosphate- and glycogen-accumulating organisms involved in the enhanced biological phosphorus removal processes of a set of wastewater treatment plants (WWTPs) in South Korea was identified using an SML approach, resulting in findings with clear implications for WWTP design and operation (<xref ref-type="bibr" rid="B72">Oh and Kim, 2021</xref>).
Additionally, the SML of metabarcoded environmental DNA (eDNA) can outperform conventional bioindicator values for environmental quality monitoring in marine aquaculture (<xref ref-type="bibr" rid="B32">Fruehe et al., 2021</xref>). Furthermore, RF learning of eDNA has been shown to outperform conventional taxonomy-based biotic index assessments (<xref ref-type="bibr" rid="B22">Cordier et al., 2018</xref>). Biodiversity in microbial communities can also be a useful proxy to assess the environmental impact of anthropogenic perturbations through changes in biotic indices (<xref ref-type="bibr" rid="B6">Aylagas et al., 2017</xref>). In these ways, SML is a useful means to improve environmental monitoring programs.</p>
</sec>
<sec id="S2.SS4.SSS3">
<title>Predicting Sample Origin With Microbiological Data</title>
<p>The predictive power of ML for monitoring environmental status also enables sample origin to be established (<xref ref-type="bibr" rid="B79">Raza et al., 2021</xref>). Microbial metrics have proved to be exceptionally sensitive indicators of human impacts on freshwater environments (<xref ref-type="bibr" rid="B64">Liao et al., 2018</xref>). Indeed, <italic>via</italic> ML modeling, the partitioning of microbes along complex anthropogenic xenobiotic gradients from urban and agricultural runoffs is sufficient to identify the origin of water samples from the 30 most abundant taxa (<xref ref-type="bibr" rid="B94">Wang et al., 2021</xref>) and is able to resolve sample origin depth and local salinity in the Baltic Sea (<xref ref-type="bibr" rid="B2">Alneberg et al., 2020</xref>). Such origin tracing carries the potential to inform public health by accurately predicting the origins of fecal contaminants in public waters (<xref ref-type="bibr" rid="B16">Chen et al., 2021</xref>; <xref ref-type="bibr" rid="B79">Raza et al., 2021</xref>) and the source of food-borne pathogen outbreaks (<xref ref-type="bibr" rid="B96">Wheeler, 2019</xref>). The ability to identify sample origins is likely to be of critical importance moving forward for tracing runoffs from agricultural and industrial entities to ensure compliance with environmentally mindful legislation. It will be interesting to see whether this sort of tracing application will lend itself to following waterbodies in other settings, or indeed, other mobile elements within the environment (forensic analysis of migratory animals under conservation management, for example). Given the perceived stability of the gut microbiome, it is possible that this approach could also be extended as a biological tagging approach for following animal populations at the center of conservation efforts.</p>
</sec>
<sec id="S2.SS4.SSS4">
<title>Supporting Environmental Meta-Analyses and Data Mining</title>
<p>The high volumes of omics data are enabling large-scale meta-analyses (<xref ref-type="bibr" rid="B102">Zeller et al., 2014</xref>) that can provide a global view of microbial roles within major environments (<xref ref-type="bibr" rid="B78">Ramirez et al., 2018</xref>; <xref ref-type="bibr" rid="B98">Wu et al., 2019</xref>; <xref ref-type="bibr" rid="B101">Yuan et al., 2020</xref>). However, several challenges arise in such studies owing to non-standardized sample collection, extraction methods, and primer choice (<xref ref-type="bibr" rid="B78">Ramirez et al., 2018</xref>). Additionally, technicalities of sequencing platforms, variable library sizes, and environmental confounders can reduce concordance across omics studies (though SML is alleviating this issue) (<xref ref-type="bibr" rid="B100">York, 2021</xref>). ML tools are well suited for uncovering patterns within these challenging data collections. For example, a meta-analysis of soil microbiomes with SML was able to reveal microbiological indicators for predicting propensity for <italic>Fusarium</italic> wilt (<xref ref-type="bibr" rid="B101">Yuan et al., 2020</xref>), an agriculturally important disease. Additionally, meta-analyses of global soil (<xref ref-type="bibr" rid="B78">Ramirez et al., 2018</xref>) and WWTP (<xref ref-type="bibr" rid="B98">Wu et al., 2019</xref>) communities provided macroecological insights into the biogeography of microbial communities and confirmed the importance of rare microbiome members as bioindicators. There remains significant scope for standardizing the workflows in both omics and SML. Such standardization is crucial to mitigating common pitfalls, enhancing reproducibility, and promoting meta-analyses and data mining. An important limiting factor here is that many data sets are unavailable, or are uploaded to repositories without raw data or metadata descriptions.
This issue has been raised before (<xref ref-type="bibr" rid="B78">Ramirez et al., 2018</xref>) and impedes otherwise valuable work. For instance, bioprospecting of biosynthetic gene clusters with SML-based omics data mining can yield proteins with biotechnological potential (<xref ref-type="bibr" rid="B23">Correia and Weimann, 2021</xref>) for bioremediation, biodegradable plastic production, and sustainable biofuels (<xref ref-type="bibr" rid="B43">Haque et al., 2020</xref>; <xref ref-type="bibr" rid="B53">Keasling et al., 2021</xref>). We therefore urge that omics data sets be uploaded in their raw form with metadata made available.</p>
</sec>
<sec id="S2.SS4.SSS5">
<title>Supervised Machine Learning of Microbial Omics Data to Address Climate Change</title>
<p>The collective effects of anthropogenic perturbations are driving the consequences of climate change (notably, losses of ecosystem function, services, biodiversity, and habitat) at unprecedented rates (<xref ref-type="bibr" rid="B35">Giuliani et al., 2017</xref>). The actions of microbial communities are implicitly tied to geochemical cycling, global water chemistries, nutrient availabilities, and soil/plant health (<xref ref-type="bibr" rid="B39">Gorbushina and Krumbein, 2000</xref>; <xref ref-type="bibr" rid="B28">Falkowski et al., 2008</xref>; <xref ref-type="bibr" rid="B61">Lian et al., 2008</xref>; <xref ref-type="bibr" rid="B26">Dong, 2010</xref>; <xref ref-type="bibr" rid="B76">Panke-Buisse et al., 2014</xref>). Microbes are thereby drivers of numerous ecosystem services on which the global population relies (<xref ref-type="bibr" rid="B68">Marco and Abram, 2019</xref>). Understanding microbe&#x2013;ecosystem interactions and functions is therefore central to their utilization in ecological models and biotechnologies for intervening in climate change. The generation of high-resolution spatiotemporal dynamics data and incorporation of different omics data sets can provide important insights into the molecular mechanisms behind climate change responses and improve the accuracy of forecasting models (<xref ref-type="bibr" rid="B46">Herold et al., 2020</xref>; <xref ref-type="bibr" rid="B58">Layton and Bradbury, 2021</xref>). Together with their ubiquitous nature, the core roles of microbial communities afford us a broad framework for potential microbiological tools with which the fundamental impacts of global climate change can be understood, monitored, predicted, and conceivably, mitigated.
The short generation times of microbial community members and their predictable changes following changing environmental parameters (<xref ref-type="bibr" rid="B57">Larsen et al., 2012</xref>) open the possibility for their use as early-warning indicators of climate change-led impacts on macroecological networks (<xref ref-type="bibr" rid="B82">Shah et al., 2022</xref>) before further biodiversity loss is observable on the macroscale. Conversely, microbial contributions to climate change <italic>via</italic> carbon cycle-climate feedback and N<sub>2</sub>O production (<xref ref-type="bibr" rid="B7">Bardgett et al., 2008</xref>) are ideal candidates for predictive SML modeling and intervention. Indeed, predictive models from microbial omics data have also shown utility across a range of climate change-linked phenomena, including browning (<xref ref-type="bibr" rid="B30">Fontaine et al., 2021</xref>), eutrophication (<xref ref-type="bibr" rid="B36">Glasl et al., 2019</xref>), harmful algal blooms (<xref ref-type="bibr" rid="B44">Hennon and Dyhrman, 2020</xref>), and arability of soils (<xref ref-type="bibr" rid="B15">Chang et al., 2017</xref>; <xref ref-type="bibr" rid="B44">Hennon and Dyhrman, 2020</xref>; <xref ref-type="bibr" rid="B101">Yuan et al., 2020</xref>). Omics in soil-plant, subsurface, and aquatic microbiomes is also central to making inroads in the development of carbon capture and sequestration (CCS) biotechnologies (<xref ref-type="bibr" rid="B81">Schweitzer et al., 2021</xref>). It will be interesting to see whether such developments benefit from SML-based modeling, which could prove useful for establishing taxa and metabolisms that predict stability and sequestration rates in CCS systems.
Therefore, SML modeling can facilitate the establishment and optimization of carbon fluxes in microbial communities (particularly for the poorly characterized deep subsurface microbiome) and may also help to bridge bioenergy production to CCS, which is considered essential for many climate change mitigation plans (<xref ref-type="bibr" rid="B42">Hanssen et al., 2020</xref>). At present, the ability of microbes to inform on, and forecast, climate change impacts <italic>via</italic> ecological monitoring programs is perhaps the most immediately applicable area for the SML of microbial omics in climate change research. In this way, microbes can assist decision-makers for sustainable policies and intervention measures to ensure food security and maintain ecosystem services before further ecological detriment occurs (<xref ref-type="bibr" rid="B21">Cordier et al., 2021</xref>; <xref ref-type="bibr" rid="B82">Shah et al., 2022</xref>). The potential future applications in this space, however, are vast and may be key for realizing goals in global-scale climate management and engineering against climate change.</p>
</sec>
</sec>
</sec>
<sec id="S3">
<title>Concluding Remarks and Future Perspectives</title>
<p>Machine learning is a powerful toolbox for drawing meaningful biological insights from large multidimensional microbial data. Here, we discussed how SML can contribute to environmental challenges by valorizing microbial community data sets. The predictive potential of interfacing omics and SML has opened exciting new avenues for managing environmental pollution and status. The ability to identify key species and functional elements can be expected to accelerate biotechnological developments with implications for environmental intervention (such as bioremediation). Through the interface of these important disciplines, we are rapidly advancing our view of the global microbiome and the ecological impacts of human activities.</p>
<p>This nascent, but fast-evolving, application area for ML has several notable opportunities that are yet to be exploited. Metataxonomics-centric ML efforts have dominated this space, but have yet to apply long-read and metagenome-assembled genomic data for feature set development in this research area. Additionally, several advanced systems-level techniques (metaproteomics, metabolomics, and in particular, integrative omics) remain at much earlier stages of development compared with DNA sequencing-based approaches and are consequently lagging in this arena. ML tools will likely become integral to pipelines for these advanced omics methodologies. We foresee SML becoming a routine complement to conventional statistics and expect that this will be key for revealing the often-overlooked rare microbiome. As omics approaches continue to advance, and sample costs reduce, we can expect to see a rise in the application of promising DL architectures at this interdisciplinary interface. DL tools will no doubt prove indispensable in data mining the ever-increasing public omics repositories and represent an exciting means to address feature engineering challenges <italic>via</italic> unsupervised feature extractions.</p>
</sec>
<sec id="S4">
<title>Author Contributions</title>
<p>JM: structure of manuscript, figure design and production, literature review, manuscript writing, population of table, and revisions. MC: initial draft of manuscript, figure design, and literature review. AM: literature review, population of table, figure design, and development of content. AH: structure of the manuscript, secured funding, manuscript review, and development of content. JD: conceptualization of the manuscript, manuscript review, and development of content. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="pudiscl1" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec id="S5" sec-type="funding-information">
<title>Funding</title>
<p>This work was funded by the Competitive Internal Research Award (CIRA2019-019) of Khalifa University.</p>
</sec>
<ack>
<p>We would like to acknowledge valuable discussions on this topic with Olivier Monga and Andreas Henschel.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Albert</surname> <given-names>J. S.</given-names></name> <name><surname>Destouni</surname> <given-names>G.</given-names></name> <name><surname>Duke-Sylvester</surname> <given-names>S. M.</given-names></name> <name><surname>Magurran</surname> <given-names>A. E.</given-names></name> <name><surname>Oberdorff</surname> <given-names>T.</given-names></name> <name><surname>Reis</surname> <given-names>R. E.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Scientists&#x2019; warning to humanity on the freshwater biodiversity crisis.</article-title> <source><italic>Ambio</italic></source> <volume>50</volume> <fpage>85</fpage>&#x2013;<lpage>94</lpage>. <pub-id pub-id-type="doi">10.1007/s13280-020-01318-8</pub-id> <pub-id pub-id-type="pmid">32040746</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alneberg</surname> <given-names>J.</given-names></name> <name><surname>Bennke</surname> <given-names>C.</given-names></name> <name><surname>Beier</surname> <given-names>S.</given-names></name> <name><surname>Bunse</surname> <given-names>C.</given-names></name> <name><surname>Quince</surname> <given-names>C.</given-names></name> <name><surname>Ininbergs</surname> <given-names>K.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Ecosystem-wide metagenomic binning enables prediction of ecological niches from genomes.</article-title> <source><italic>Comm. Biol.</italic></source> <volume>3</volume>:<fpage>119</fpage>. <pub-id pub-id-type="doi">10.1038/s42003-020-0856-x</pub-id> <pub-id pub-id-type="pmid">32170201</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Asgari</surname> <given-names>E.</given-names></name> <name><surname>Garakani</surname> <given-names>K.</given-names></name> <name><surname>McHardy</surname> <given-names>A. C.</given-names></name> <name><surname>Mofrad</surname> <given-names>M. R. K.</given-names></name></person-group> (<year>2018</year>). <article-title>MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples.</article-title> <source><italic>Bioinformatics</italic></source> <volume>34</volume> <fpage>i32</fpage>&#x2013;<lpage>i42</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty296</pub-id> <pub-id pub-id-type="pmid">29950008</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Astudillo-Garc&#x00ED;a</surname> <given-names>C.</given-names></name> <name><surname>Hermans</surname> <given-names>S. M.</given-names></name> <name><surname>Stevenson</surname> <given-names>B.</given-names></name> <name><surname>Buckley</surname> <given-names>H. L.</given-names></name> <name><surname>Lear</surname> <given-names>G.</given-names></name></person-group> (<year>2019</year>). <article-title>Microbial assemblages and bioindicators as proxies for ecosystem health status: potential and limitations.</article-title> <source><italic>Appl. Microbiol. Biotechnol.</italic></source> <volume>103</volume> <fpage>6407</fpage>&#x2013;<lpage>6421</lpage>. <pub-id pub-id-type="doi">10.1007/s00253-019-09963-0</pub-id> <pub-id pub-id-type="pmid">31243501</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aun</surname> <given-names>E.</given-names></name> <name><surname>Brauer</surname> <given-names>A.</given-names></name> <name><surname>Kisand</surname> <given-names>V.</given-names></name> <name><surname>Tenson</surname> <given-names>T.</given-names></name> <name><surname>Remm</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>A k-mer-based method for the identification of phenotype-associated genomic biomarkers and predicting phenotypes of sequenced bacteria.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>14</volume>:<fpage>e1006434</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1006434</pub-id> <pub-id pub-id-type="pmid">30346947</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aylagas</surname> <given-names>E.</given-names></name> <name><surname>Borja</surname> <given-names>&#x00C1;</given-names></name> <name><surname>Tangherlini</surname> <given-names>M.</given-names></name> <name><surname>Dell&#x2019;Anno</surname> <given-names>A.</given-names></name> <name><surname>Corinaldesi</surname> <given-names>C.</given-names></name> <name><surname>Michell</surname> <given-names>C. T.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>A bacterial community-based index to assess the ecological status of estuarine and coastal environments.</article-title> <source><italic>Mar. Poll. Bull.</italic></source> <volume>114</volume> <fpage>679</fpage>&#x2013;<lpage>688</lpage>. <pub-id pub-id-type="doi">10.1016/j.marpolbul.2016.10.050</pub-id> <pub-id pub-id-type="pmid">27784536</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bardgett</surname> <given-names>R. D.</given-names></name> <name><surname>Freeman</surname> <given-names>C.</given-names></name> <name><surname>Ostle</surname> <given-names>N. J.</given-names></name></person-group> (<year>2008</year>). <article-title>Microbial contributions to climate change through carbon cycle feedbacks.</article-title> <source><italic>ISME J.</italic></source> <volume>2</volume> <fpage>805</fpage>&#x2013;<lpage>814</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2008.58</pub-id> <pub-id pub-id-type="pmid">18615117</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bar-On</surname> <given-names>Y. M.</given-names></name> <name><surname>Phillips</surname> <given-names>R.</given-names></name> <name><surname>Milo</surname> <given-names>R.</given-names></name></person-group> (<year>2018</year>). <article-title>The biomass distribution on Earth.</article-title> <source><italic>Proc. Natl. Acad. Sci.</italic></source> <volume>115</volume> <fpage>6506</fpage>&#x2013;<lpage>6511</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1711842115</pub-id> <pub-id pub-id-type="pmid">29784790</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blaser</surname> <given-names>M. J.</given-names></name> <name><surname>Cardon</surname> <given-names>Z. G.</given-names></name> <name><surname>Cho</surname> <given-names>M. K.</given-names></name> <name><surname>Dangl</surname> <given-names>J. L.</given-names></name> <name><surname>Donohue</surname> <given-names>T. J.</given-names></name> <name><surname>Green</surname> <given-names>J. L.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Toward a Predictive Understanding of Earth&#x2019;s Microbiomes to Address 21st Century Challenges.</article-title> <source><italic>mBio</italic></source> <volume>7</volume>:<fpage>e00714</fpage>&#x2013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1128/mBio.00714-16</pub-id> <pub-id pub-id-type="pmid">27178263</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Briffa</surname> <given-names>J.</given-names></name> <name><surname>Sinagra</surname> <given-names>E.</given-names></name> <name><surname>Blundell</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <article-title>Heavy metal pollution in the environment and their toxicological effects on humans.</article-title> <source><italic>Heliyon</italic></source> <volume>6</volume>:<fpage>e04691</fpage>. <pub-id pub-id-type="doi">10.1016/j.heliyon.2020.e04691</pub-id> <pub-id pub-id-type="pmid">32964150</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Br&#x00FC;hl</surname> <given-names>C. A.</given-names></name> <name><surname>Zaller</surname> <given-names>J. G.</given-names></name></person-group> (<year>2019</year>). <article-title>Biodiversity Decline as a Consequence of an Inappropriate Environmental Risk Assessment of Pesticides.</article-title> <source><italic>Front. Environ. Sci.</italic></source> <volume>7</volume>:<fpage>177</fpage>. <pub-id pub-id-type="doi">10.3389/fenvs.2019.0017</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burrell</surname> <given-names>A. L.</given-names></name> <name><surname>Evans</surname> <given-names>J. P.</given-names></name> <name><surname>De Kauwe</surname> <given-names>M. G.</given-names></name></person-group> (<year>2020</year>). <article-title>Anthropogenic climate change has driven over 5 million km2 of drylands towards desertification.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>11</volume>:<fpage>3853</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-020-17710-7</pub-id> <pub-id pub-id-type="pmid">32737311</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Callahan</surname> <given-names>B. J.</given-names></name> <name><surname>McMurdie</surname> <given-names>P. J.</given-names></name> <name><surname>Holmes</surname> <given-names>S. P.</given-names></name></person-group> (<year>2017</year>). <article-title>Exact sequence variants should replace operational taxonomic units in marker-gene data analysis.</article-title> <source><italic>ISME J.</italic></source> <volume>11</volume> <fpage>2639</fpage>&#x2013;<lpage>2643</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2017.119</pub-id> <pub-id pub-id-type="pmid">28731476</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Di Cesare</surname> <given-names>A.</given-names></name> <name><surname>Pjevac</surname> <given-names>P.</given-names></name> <name><surname>Eckert</surname> <given-names>E.</given-names></name> <name><surname>Curkov</surname> <given-names>N.</given-names></name> <name><surname>Miko &#x0160;parica</surname> <given-names>M.</given-names></name> <name><surname>Corno</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>The role of metal contamination in shaping microbial communities in heavily polluted marine sediments.</article-title> <source><italic>Environ. Poll.</italic></source> <volume>265</volume>:<fpage>114823</fpage>. <pub-id pub-id-type="doi">10.1016/j.envpol.2020.114823</pub-id> <pub-id pub-id-type="pmid">32512474</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>H.-X.</given-names></name> <name><surname>Haudenshield</surname> <given-names>J. S.</given-names></name> <name><surname>Bowen</surname> <given-names>C. R.</given-names></name> <name><surname>Hartman</surname> <given-names>G. L.</given-names></name></person-group> (<year>2017</year>). <article-title>Metagenome-Wide Association Study and Machine Learning Prediction of Bulk Soil Microbiome and Crop Productivity.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>8</volume>:<fpage>519</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2017.00519</pub-id> <pub-id pub-id-type="pmid">28421041</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>F.</given-names></name> <name><surname>Koh</surname> <given-names>X. P.</given-names></name> <name><surname>Tang</surname> <given-names>M. L. Y.</given-names></name> <name><surname>Gan</surname> <given-names>J.</given-names></name> <name><surname>Lau</surname> <given-names>S. C. K.</given-names></name></person-group> (<year>2021</year>). <article-title>Microbiological assessment of ecological status in the Pearl River Estuary, China.</article-title> <source><italic>Ecol. Indicat.</italic></source> <volume>130</volume>:<fpage>108084</fpage>. <pub-id pub-id-type="doi">10.1016/j.ecolind.2021.108084</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J. C.-Y.</given-names></name> <name><surname>Tyler</surname> <given-names>A. D.</given-names></name></person-group> (<year>2020</year>). <article-title>Systematic evaluation of supervised machine learning for sample origin prediction using metagenomic sequencing data.</article-title> <source><italic>Biol. Dir.</italic></source> <volume>15</volume>:<fpage>29</fpage>. <pub-id pub-id-type="doi">10.1186/s13062-020-00287-y</pub-id> <pub-id pub-id-type="pmid">33302990</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chiavegatto Filho</surname> <given-names>A.</given-names></name> <name><surname>Batista</surname> <given-names>A. F. D. M.</given-names></name> <name><surname>dos Santos</surname> <given-names>H. G.</given-names></name></person-group> (<year>2021</year>). <article-title>Data Leakage in Health Outcomes Prediction With Machine Learning. Comment on &#x201C;Prediction of Incident Hypertension Within the Next Year: Prospective Study Using Statewide Electronic Health Records and Machine Learning&#x201D;.</article-title> <source><italic>J. Med. Internet Res.</italic></source> <volume>23</volume>:<fpage>e10969</fpage>. <pub-id pub-id-type="doi">10.2196/10969</pub-id> <pub-id pub-id-type="pmid">33570496</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chitu</surname> <given-names>A. G.</given-names></name> <name><surname>Zijp</surname> <given-names>M. H. A. A.</given-names></name> <name><surname>Zwaan</surname> <given-names>J.</given-names></name></person-group> (<year>2022</year>). <article-title>A novel exploration technique using the microbial fingerprint of shallow sediment to detect hydrocarbon microseepage and predict hydrocarbon charge &#x2014; An Argentinian case study.</article-title> <source><italic>Interpretation</italic></source> <volume>10</volume> <fpage>1F</fpage>&#x2013;<lpage>T211</lpage>.</citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Christin</surname> <given-names>S.</given-names></name> <name><surname>Hervet</surname> <given-names>&#x00C9;.</given-names></name> <name><surname>Lecomte</surname> <given-names>N.</given-names></name></person-group> (<year>2019</year>). <article-title>Applications for deep learning in ecology.</article-title> <source><italic>Methods Ecol. Evol.</italic></source> <volume>10</volume> <fpage>1632</fpage>&#x2013;<lpage>1644</lpage>. <pub-id pub-id-type="doi">10.1111/2041-210x.13256</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cordier</surname> <given-names>T.</given-names></name> <name><surname>Alonso-S&#x00E1;ez</surname> <given-names>L.</given-names></name> <name><surname>Apoth&#x00E9;loz-Perret-Gentil</surname> <given-names>L.</given-names></name> <name><surname>Aylagas</surname> <given-names>E.</given-names></name> <name><surname>Bohan</surname> <given-names>D. A.</given-names></name> <name><surname>Bouchez</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Ecosystems monitoring powered by environmental genomics: A review of current strategies with an implementation roadmap.</article-title> <source><italic>Mol. Ecol.</italic></source> <volume>30</volume> <fpage>2937</fpage>&#x2013;<lpage>2958</lpage>. <pub-id pub-id-type="doi">10.1111/mec.15472</pub-id> <pub-id pub-id-type="pmid">32416615</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cordier</surname> <given-names>T.</given-names></name> <name><surname>Forster</surname> <given-names>D.</given-names></name> <name><surname>Dufresne</surname> <given-names>Y.</given-names></name> <name><surname>Martins</surname> <given-names>C. I. M.</given-names></name> <name><surname>Stoeck</surname> <given-names>T.</given-names></name> <name><surname>Pawlowski</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Supervised machine learning outperforms taxonomy-based environmental DNA metabarcoding applied to biomonitoring.</article-title> <source><italic>Mol. Ecol. Res.</italic></source> <volume>18</volume> <fpage>1381</fpage>&#x2013;<lpage>1391</lpage>. <pub-id pub-id-type="doi">10.1111/1755-0998.12926</pub-id> <pub-id pub-id-type="pmid">30014577</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Correia</surname> <given-names>A.</given-names></name> <name><surname>Weimann</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Protein antibiotics: mind your language.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>19</volume>:<fpage>7</fpage>. <pub-id pub-id-type="doi">10.1038/s41579-020-00485-5</pub-id> <pub-id pub-id-type="pmid">33219332</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Anda</surname> <given-names>V.</given-names></name> <name><surname>Zapata-Pe&#x00F1;asco</surname> <given-names>I.</given-names></name> <name><surname>Blaz</surname> <given-names>J.</given-names></name> <name><surname>Poot-Hern&#x00E1;ndez</surname> <given-names>A. C.</given-names></name> <name><surname>Contreras-Moreira</surname> <given-names>B.</given-names></name> <name><surname>Gonz&#x00E1;lez-Laffitte</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Understanding the Mechanisms Behind the Response to Environmental Perturbation in Microbial Mats: A Metagenomic-Network Based Approach.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>9</volume>:<fpage>2606</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2018.02606</pub-id> <pub-id pub-id-type="pmid">30555424</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Dios Miranda</surname> <given-names>J.</given-names></name> <name><surname>Seoane</surname> <given-names>J. M.</given-names></name> <name><surname>Esteban</surname> <given-names>&#x00C1;.</given-names></name> <name><surname>Esp&#x00ED;</surname> <given-names>E.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Microbial Exploration Techniques: An Offshore Case Study</article-title>,&#x201D; in <source><italic>Oilfield Microbiology</italic></source> (<publisher-loc>Florida</publisher-loc>: <publisher-name>CRC Press</publisher-name>), <fpage>271</fpage>&#x2013;<lpage>298</lpage>.</citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>H.</given-names></name></person-group> (<year>2010</year>). <article-title>Mineral-microbe interactions: a review.</article-title> <source><italic>Front. Earth Sci. Chin.</italic></source> <volume>4</volume> <fpage>127</fpage>&#x2013;<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1007/s11707-010-0022-8</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dubinsky</surname> <given-names>E. A.</given-names></name> <name><surname>Butkus</surname> <given-names>S. R.</given-names></name> <name><surname>Andersen</surname> <given-names>G. L.</given-names></name></person-group> (<year>2016</year>). <article-title>Microbial source tracking in impaired watersheds using PhyloChip and machine-learning classification.</article-title> <source><italic>Water Res.</italic></source> <volume>105</volume> <fpage>56</fpage>&#x2013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1016/j.watres.2016.08.035</pub-id> <pub-id pub-id-type="pmid">27598696</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Falkowski</surname> <given-names>P. G.</given-names></name> <name><surname>Fenchel</surname> <given-names>T.</given-names></name> <name><surname>Delong</surname> <given-names>E. F.</given-names></name></person-group> (<year>2008</year>). <article-title>The Microbial Engines That Drive Earth&#x2019;s Biogeochemical Cycles.</article-title> <source><italic>Science</italic></source> <volume>320</volume> <fpage>1034</fpage>&#x2013;<lpage>1039</lpage>. <pub-id pub-id-type="doi">10.1126/science.1153213</pub-id> <pub-id pub-id-type="pmid">18497287</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fiannaca</surname> <given-names>A.</given-names></name> <name><surname>La Paglia</surname> <given-names>L.</given-names></name> <name><surname>La Rosa</surname> <given-names>M.</given-names></name> <name><surname>Lo Bosco</surname> <given-names>G.</given-names></name> <name><surname>Renda</surname> <given-names>G.</given-names></name> <name><surname>Rizzo</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Deep learning models for bacteria taxonomic classification of metagenomic data.</article-title> <source><italic>BMC Bioinform.</italic></source> <volume>19</volume>:<fpage>198</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-018-2182-6</pub-id> <pub-id pub-id-type="pmid">30066629</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fontaine</surname> <given-names>L.</given-names></name> <name><surname>Khomich</surname> <given-names>M.</given-names></name> <name><surname>Andersen</surname> <given-names>T.</given-names></name> <name><surname>Hessen</surname> <given-names>D. O.</given-names></name> <name><surname>Rasconi</surname> <given-names>S.</given-names></name> <name><surname>Davey</surname> <given-names>M. L.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Multiple thresholds and trajectories of microbial biodiversity predicted across browning gradients by neural networks and decision tree learning.</article-title> <source><italic>ISME Commun.</italic></source> <volume>1</volume>:<fpage>37</fpage>.</citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Franzosa</surname> <given-names>E. A.</given-names></name> <name><surname>Hsu</surname> <given-names>T.</given-names></name> <name><surname>Sirota-Madi</surname> <given-names>A.</given-names></name> <name><surname>Shafquat</surname> <given-names>A.</given-names></name> <name><surname>Abu-Ali</surname> <given-names>G.</given-names></name> <name><surname>Morgan</surname> <given-names>X. C.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Sequencing and beyond: integrating molecular &#x2018;omics&#x2019; for microbial community profiling.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>13</volume> <fpage>360</fpage>&#x2013;<lpage>372</lpage>. <pub-id pub-id-type="doi">10.1038/nrmicro3451</pub-id> <pub-id pub-id-type="pmid">25915636</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fruehe</surname> <given-names>L.</given-names></name> <name><surname>Cordier</surname> <given-names>T.</given-names></name> <name><surname>Dully</surname> <given-names>V.</given-names></name> <name><surname>Breiner</surname> <given-names>H. W.</given-names></name> <name><surname>Lentendu</surname> <given-names>G.</given-names></name> <name><surname>Pawlowski</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Supervised machine learning is superior to indicator value inference in monitoring the environmental impacts of salmon aquaculture using eDNA metabarcodes.</article-title> <source><italic>Mol. Ecol.</italic></source> <volume>30</volume> <fpage>2988</fpage>&#x2013;<lpage>3006</lpage>. <pub-id pub-id-type="doi">10.1111/mec.15434</pub-id> <pub-id pub-id-type="pmid">32285497</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghannam</surname> <given-names>R. B.</given-names></name> <name><surname>Techtmann</surname> <given-names>S. M.</given-names></name></person-group> (<year>2021</year>). <article-title>Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring.</article-title> <source><italic>Comput. Struct. Biotechnol. J.</italic></source> <volume>19</volume> <fpage>1092</fpage>&#x2013;<lpage>1107</lpage>. <pub-id pub-id-type="doi">10.1016/j.csbj.2021.01.028</pub-id> <pub-id pub-id-type="pmid">33680353</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gibbons</surname> <given-names>S. M.</given-names></name> <name><surname>Gilbert</surname> <given-names>J. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Microbial diversity&#x2013;exploration of natural ecosystems and microbiomes.</article-title> <source><italic>Curr. Opin. Genet. Dev.</italic></source> <volume>35</volume> <fpage>66</fpage>&#x2013;<lpage>72</lpage>. <pub-id pub-id-type="doi">10.1016/j.gde.2015.10.003</pub-id> <pub-id pub-id-type="pmid">26598941</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giuliani</surname> <given-names>G.</given-names></name> <name><surname>Dao</surname> <given-names>H.</given-names></name> <name><surname>De Bono</surname> <given-names>A.</given-names></name> <name><surname>Chatenoux</surname> <given-names>B.</given-names></name> <name><surname>Allenbach</surname> <given-names>K.</given-names></name> <name><surname>De Laborie</surname> <given-names>P.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Live Monitoring of Earth Surface (LiMES): A framework for monitoring environmental changes from Earth Observations.</article-title> <source><italic>Rem. Sensing Environ.</italic></source> <volume>202</volume> <fpage>222</fpage>&#x2013;<lpage>233</lpage>. <pub-id pub-id-type="doi">10.1016/j.rse.2017.05.040</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Glasl</surname> <given-names>B.</given-names></name> <name><surname>Bourne</surname> <given-names>D. G.</given-names></name> <name><surname>Frade</surname> <given-names>P. R.</given-names></name> <name><surname>Thomas</surname> <given-names>T.</given-names></name> <name><surname>Schaffelke</surname> <given-names>B.</given-names></name> <name><surname>Webster</surname> <given-names>N. S.</given-names></name></person-group> (<year>2019</year>). <article-title>Microbial indicators of environmental perturbations in coral reef ecosystems.</article-title> <source><italic>Microbiome</italic></source> <volume>7</volume>:<fpage>94</fpage>. <pub-id pub-id-type="doi">10.1186/s40168-019-0705-7</pub-id> <pub-id pub-id-type="pmid">31227022</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gloor</surname> <given-names>G. B.</given-names></name> <name><surname>Macklaim</surname> <given-names>J. M.</given-names></name> <name><surname>Pawlowsky-Glahn</surname> <given-names>V.</given-names></name> <name><surname>Egozcue</surname> <given-names>J. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Microbiome Datasets Are Compositional: And This Is Not Optional.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>8</volume>:<fpage>2224</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2017.02224</pub-id> <pub-id pub-id-type="pmid">29187837</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodswen</surname> <given-names>S. J.</given-names></name> <name><surname>Barratt</surname> <given-names>J. L. N.</given-names></name> <name><surname>Kennedy</surname> <given-names>P. J.</given-names></name> <name><surname>Kaufer</surname> <given-names>A.</given-names></name> <name><surname>Calarco</surname> <given-names>L.</given-names></name> <name><surname>Ellis</surname> <given-names>J. T.</given-names></name></person-group> (<year>2021</year>). <article-title>Machine learning and applications in microbiology.</article-title> <source><italic>FEMS Microbiol. Rev.</italic></source> <volume>45</volume>:<fpage>fuab015</fpage>.</citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gorbushina</surname> <given-names>A. A.</given-names></name> <name><surname>Krumbein</surname> <given-names>W. E.</given-names></name></person-group> (<year>2000</year>). &#x201C;<article-title>Subaerial Microbial Mats and Their Effects on Soil and Rock</article-title>,&#x201D; in <source><italic>Microbial Sediments</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Riding</surname> <given-names>R. E.</given-names></name> <name><surname>Awramik</surname> <given-names>S. M.</given-names></name></person-group> (<publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>161</fpage>&#x2013;<lpage>170</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-662-04036-2_18</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grantham</surname> <given-names>H. S.</given-names></name> <name><surname>Duncan</surname> <given-names>A.</given-names></name> <name><surname>Evans</surname> <given-names>T. D.</given-names></name> <name><surname>Jones</surname> <given-names>K. R.</given-names></name> <name><surname>Beyer</surname> <given-names>H. L.</given-names></name></person-group> (<year>2020</year>). <article-title>Anthropogenic modification of forests means only 40% of remaining forests have high ecosystem integrity.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>11</volume>:<fpage>5978</fpage>.</citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gutleben</surname> <given-names>J.</given-names></name> <name><surname>Chaib De Mares</surname> <given-names>M.</given-names></name> <name><surname>van Elsas</surname> <given-names>J. D.</given-names></name> <name><surname>Smidt</surname> <given-names>H.</given-names></name> <name><surname>Overmann</surname> <given-names>J.</given-names></name> <name><surname>Sipkema</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>The multi-omics promise in context: from sequence to microbial isolate.</article-title> <source><italic>Crit. Rev. Microbiol.</italic></source> <volume>44</volume> <fpage>212</fpage>&#x2013;<lpage>229</lpage>. <pub-id pub-id-type="doi">10.1080/1040841X.2017.1332003</pub-id> <pub-id pub-id-type="pmid">28562180</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanssen</surname> <given-names>S. V.</given-names></name> <name><surname>Daioglou</surname> <given-names>V.</given-names></name> <name><surname>Steinmann</surname> <given-names>Z. J. N.</given-names></name> <name><surname>Doelman</surname> <given-names>J. C.</given-names></name> <name><surname>Van Vuuren</surname> <given-names>D. P.</given-names></name> <name><surname>Huijbregts</surname> <given-names>M. A. J.</given-names></name></person-group> (<year>2020</year>). <article-title>The climate change mitigation potential of bioenergy with carbon capture and storage.</article-title> <source><italic>Nat. Clim. Change</italic></source> <volume>10</volume> <fpage>1023</fpage>&#x2013;<lpage>1029</lpage>. <pub-id pub-id-type="doi">10.1038/s41558-020-0885-y</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haque</surname> <given-names>R.</given-names></name> <name><surname>Paradisi</surname> <given-names>F.</given-names></name> <name><surname>Allers</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title><italic>Haloferax volcanii</italic> for biotechnology applications: challenges, current state and perspectives.</article-title> <source><italic>Appl. Microbiol. Biotechnol.</italic></source> <volume>104</volume> <fpage>1371</fpage>&#x2013;<lpage>1382</lpage>. <pub-id pub-id-type="doi">10.1007/s00253-019-10314-2</pub-id> <pub-id pub-id-type="pmid">31863144</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hennon</surname> <given-names>G. M. M.</given-names></name> <name><surname>Dyhrman</surname> <given-names>S. T.</given-names></name></person-group> (<year>2020</year>). <article-title>Progress and promise of omics for predicting the impacts of climate change on harmful algal blooms.</article-title> <source><italic>Harmful Algae</italic></source> <volume>91</volume>:<fpage>101587</fpage>. <pub-id pub-id-type="doi">10.1016/j.hal.2019.03.005</pub-id> <pub-id pub-id-type="pmid">32057337</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hermans</surname> <given-names>S. M.</given-names></name> <name><surname>Buckley</surname> <given-names>H. L.</given-names></name> <name><surname>Case</surname> <given-names>B. S.</given-names></name> <name><surname>Curran-Cournane</surname> <given-names>F.</given-names></name> <name><surname>Taylor</surname> <given-names>M.</given-names></name> <name><surname>Lear</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <article-title>Using soil bacterial communities to predict physico-chemical variables and soil quality.</article-title> <source><italic>Microbiome</italic></source> <volume>8</volume>:<fpage>79</fpage>. <pub-id pub-id-type="doi">10.1186/s40168-020-00858-1</pub-id> <pub-id pub-id-type="pmid">32487269</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Herold</surname> <given-names>M.</given-names></name> <name><surname>Mart&#x00ED;nez Arbas</surname> <given-names>S.</given-names></name> <name><surname>Narayanasamy</surname> <given-names>S.</given-names></name> <name><surname>Sheik</surname> <given-names>A. R.</given-names></name> <name><surname>Kleine-Borgmann</surname> <given-names>L. A. K.</given-names></name> <name><surname>Lebrun</surname> <given-names>L. A.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Integration of time-series meta-omics data reveals how microbial ecosystems respond to disturbance.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>11</volume>:<fpage>5281</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-020-19006-2</pub-id> <pub-id pub-id-type="pmid">33077707</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jan&#x00DF;en</surname> <given-names>R.</given-names></name> <name><surname>Beck</surname> <given-names>A. J.</given-names></name> <name><surname>Werner</surname> <given-names>J.</given-names></name> <name><surname>Dellwig</surname> <given-names>O.</given-names></name> <name><surname>Alneberg</surname> <given-names>J.</given-names></name> <name><surname>Kreikemeyer</surname> <given-names>B.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Machine Learning Predicts the Presence of 2,4,6-Trinitrotoluene in Sediments of a Baltic Sea Munitions Dumpsite Using Microbial Community Compositions.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>12</volume>:<fpage>626048</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2021.626048</pub-id> <pub-id pub-id-type="pmid">34659134</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jan&#x00DF;en</surname> <given-names>R.</given-names></name> <name><surname>Zabel</surname> <given-names>J.</given-names></name> <name><surname>von Lukas</surname> <given-names>U.</given-names></name> <name><surname>Labrenz</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>An artificial neural network and Random Forest identify glyphosate-impacted brackish communities based on 16S rRNA amplicon MiSeq read counts.</article-title> <source><italic>Mar. Poll. Bull.</italic></source> <volume>149</volume>:<fpage>110530</fpage>. <pub-id pub-id-type="doi">10.1016/j.marpolbul.2019.110530</pub-id> <pub-id pub-id-type="pmid">31454615</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Julinov&#x00E1;</surname> <given-names>M.</given-names></name> <name><surname>Va&#x0148;harov&#x00E1;</surname> <given-names>L.</given-names></name> <name><surname>Jur&#x010D;a</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Water-soluble polymeric xenobiotics &#x2013; Polyvinyl alcohol and polyvinylpyrrolidon &#x2013; And potential solutions to environmental issues: A brief review.</article-title> <source><italic>J. Environ. Manage.</italic></source> <volume>228</volume> <fpage>213</fpage>&#x2013;<lpage>222</lpage>. <pub-id pub-id-type="doi">10.1016/j.jenvman.2018.09.010</pub-id> <pub-id pub-id-type="pmid">30223180</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Junghare</surname> <given-names>M.</given-names></name> <name><surname>Spiteller</surname> <given-names>D.</given-names></name> <name><surname>Schink</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>Anaerobic degradation of xenobiotic isophthalate by the fermenting bacterium <italic>Syntrophorhabdus aromaticivorans</italic>.</article-title> <source><italic>ISME J.</italic></source> <volume>13</volume> <fpage>1252</fpage>&#x2013;<lpage>1268</lpage>. <pub-id pub-id-type="doi">10.1038/s41396-019-0348-5</pub-id> <pub-id pub-id-type="pmid">30647456</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaster</surname> <given-names>A.-K.</given-names></name> <name><surname>Sobol</surname> <given-names>M. S.</given-names></name></person-group> (<year>2020</year>). <article-title>Microbial single-cell omics: the crux of the matter.</article-title> <source><italic>Appl. Microbiol. Biotechnol.</italic></source> <volume>104</volume> <fpage>8209</fpage>&#x2013;<lpage>8220</lpage>. <pub-id pub-id-type="doi">10.1007/s00253-020-10844-0</pub-id> <pub-id pub-id-type="pmid">32845367</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katsuyama</surname> <given-names>C.</given-names></name> <name><surname>Nakaoka</surname> <given-names>S.</given-names></name> <name><surname>Takeuchi</surname> <given-names>Y.</given-names></name> <name><surname>Tago</surname> <given-names>K.</given-names></name> <name><surname>Hayatsu</surname> <given-names>M.</given-names></name> <name><surname>Kato</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>Complementary cooperation between two syntrophic bacteria in pesticide degradation.</article-title> <source><italic>J. Theor. Biol.</italic></source> <volume>256</volume> <fpage>644</fpage>&#x2013;<lpage>654</lpage>. <pub-id pub-id-type="doi">10.1016/j.jtbi.2008.10.024</pub-id> <pub-id pub-id-type="pmid">19038271</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keasling</surname> <given-names>J.</given-names></name> <name><surname>Garcia Martin</surname> <given-names>H.</given-names></name> <name><surname>Lee</surname> <given-names>T. S.</given-names></name> <name><surname>Mukhopadhyay</surname> <given-names>A.</given-names></name> <name><surname>Singer</surname> <given-names>S. W.</given-names></name> <name><surname>Sundstrom</surname> <given-names>E.</given-names></name></person-group> (<year>2021</year>). <article-title>Microbial production of advanced biofuels.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>19</volume> <fpage>701</fpage>&#x2013;<lpage>715</lpage>.</citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Oh</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Machine-learning insights into nitrate-reducing communities in a full-scale municipal wastewater treatment plant.</article-title> <source><italic>J. Environ. Manage.</italic></source> <volume>300</volume>:<fpage>113795</fpage>. <pub-id pub-id-type="doi">10.1016/j.jenvman.2021.113795</pub-id> <pub-id pub-id-type="pmid">34560468</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Knight</surname> <given-names>R.</given-names></name> <name><surname>Vrbanac</surname> <given-names>A.</given-names></name> <name><surname>Taylor</surname> <given-names>B. C.</given-names></name> <name><surname>Aksenov</surname> <given-names>A.</given-names></name> <name><surname>Callewaert</surname> <given-names>C.</given-names></name> <name><surname>Debelius</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Best practices for analysing microbiomes.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>16</volume> <fpage>410</fpage>&#x2013;<lpage>422</lpage>. <pub-id pub-id-type="doi">10.1038/s41579-018-0029-9</pub-id> <pub-id pub-id-type="pmid">29795328</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lambert</surname> <given-names>B. S.</given-names></name> <name><surname>Groussman</surname> <given-names>R. D.</given-names></name> <name><surname>Schatz</surname> <given-names>M. J.</given-names></name> <name><surname>Coesel</surname> <given-names>S. N.</given-names></name> <name><surname>Durham</surname> <given-names>B. P.</given-names></name> <name><surname>Alverson</surname> <given-names>A. J.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>The dynamic trophic architecture of open-ocean protist communities revealed through machine-guided metatranscriptomics.</article-title> <source><italic>Proc. Natl. Acad. Sci. U S A</italic></source> <volume>119</volume>:<fpage>e2100916119</fpage>. <pub-id pub-id-type="doi">10.1073/pnas.2100916119</pub-id> <pub-id pub-id-type="pmid">35145022</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Larsen</surname> <given-names>P. E.</given-names></name> <name><surname>Field</surname> <given-names>D.</given-names></name> <name><surname>Gilbert</surname> <given-names>J. A.</given-names></name></person-group> (<year>2012</year>). <article-title>Predicting bacterial community assemblages using an artificial neural network approach.</article-title> <source><italic>Nat. Methods</italic></source> <volume>9</volume> <fpage>621</fpage>&#x2013;<lpage>625</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.1975</pub-id> <pub-id pub-id-type="pmid">22504588</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Layton</surname> <given-names>K. K. S.</given-names></name> <name><surname>Bradbury</surname> <given-names>I. R.</given-names></name></person-group> (<year>2021</year>). <article-title>Harnessing the power of multi-omics data for predicting climate change response.</article-title> <source><italic>J. Anim. Ecol.</italic></source> [Epub online ahead of print]. <pub-id pub-id-type="doi">10.1111/1365-2656.13619</pub-id> <pub-id pub-id-type="pmid">34679193</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Ji</surname> <given-names>S.</given-names></name> <name><surname>Chang</surname> <given-names>M.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Gan</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>The ecology of the plastisphere: Microbial composition, function, assembly, and network in the freshwater and seawater ecosystems.</article-title> <source><italic>Water Res.</italic></source> <volume>202</volume>:<fpage>117428</fpage>. <pub-id pub-id-type="doi">10.1016/j.watres.2021.117428</pub-id> <pub-id pub-id-type="pmid">34303166</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Fantke</surname> <given-names>P.</given-names></name></person-group> (<year>2022</year>). <article-title>Toward harmonizing global pesticide regulations for surface freshwaters in support of protecting human health.</article-title> <source><italic>J. Environ. Manage.</italic></source> <volume>301</volume>:<fpage>113909</fpage>. <pub-id pub-id-type="doi">10.1016/j.jenvman.2021.113909</pub-id> <pub-id pub-id-type="pmid">34624580</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lian</surname> <given-names>B.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Zhu</surname> <given-names>L.</given-names></name> <name><surname>Yang</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Effect of Microbial Weathering on Carbonate Rocks.</article-title> <source><italic>Earth Sci. Front.</italic></source> <volume>15</volume> <fpage>90</fpage>&#x2013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1016/s1872-5791(09)60009-9</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liang</surname> <given-names>Q.</given-names></name> <name><surname>Bible</surname> <given-names>P. W.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Zou</surname> <given-names>B.</given-names></name> <name><surname>Wei</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>DeepMicrobes: taxonomic classification for metagenomics with deep learning.</article-title> <source><italic>NAR Genom. Bioinform.</italic></source> <volume>2</volume>:<fpage>lqaa009</fpage>. <pub-id pub-id-type="doi">10.1093/nargab/lqaa009</pub-id> <pub-id pub-id-type="pmid">33575556</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liao</surname> <given-names>J.</given-names></name> <name><surname>Guo</surname> <given-names>X.</given-names></name> <name><surname>Weller</surname> <given-names>D. L.</given-names></name> <name><surname>Pollak</surname> <given-names>S.</given-names></name> <name><surname>Buckley</surname> <given-names>D. H.</given-names></name> <name><surname>Wiedmann</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Nationwide genomic atlas of soil-dwelling <italic>Listeria</italic> reveals effects of selection and population ecology on pangenome evolution.</article-title> <source><italic>Nat. Microbiol.</italic></source> <volume>6</volume> <fpage>1021</fpage>&#x2013;<lpage>1030</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-021-00935-7</pub-id> <pub-id pub-id-type="pmid">34267358</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liao</surname> <given-names>K.</given-names></name> <name><surname>Bai</surname> <given-names>Y.</given-names></name> <name><surname>Huo</surname> <given-names>Y.</given-names></name> <name><surname>Jian</surname> <given-names>Z.</given-names></name> <name><surname>Hu</surname> <given-names>W.</given-names></name> <name><surname>Zhao</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Integrating microbial biomass, composition and function to discern the level of anthropogenic activity in a river ecosystem.</article-title> <source><italic>Environ. Int.</italic></source> <volume>116</volume> <fpage>147</fpage>&#x2013;<lpage>155</lpage>. <pub-id pub-id-type="doi">10.1016/j.envint.2018.04.003</pub-id> <pub-id pub-id-type="pmid">29679777</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lopatkin</surname> <given-names>A. J.</given-names></name> <name><surname>Collins</surname> <given-names>J. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Predictive biology: modelling, understanding and harnessing microbial complexity.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>18</volume> <fpage>507</fpage>&#x2013;<lpage>520</lpage>. <pub-id pub-id-type="doi">10.1038/s41579-020-0372-5</pub-id> <pub-id pub-id-type="pmid">32472051</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>X.</given-names></name> <name><surname>Ye</surname> <given-names>X.</given-names></name> <name><surname>Zhou</surname> <given-names>M.</given-names></name> <name><surname>Zhao</surname> <given-names>Y.</given-names></name> <name><surname>Weng</surname> <given-names>H.</given-names></name> <name><surname>Kong</surname> <given-names>H.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>The underappreciated role of agricultural soil nitrogen oxide emissions in ozone pollution regulation in North China.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>12</volume>:<fpage>5021</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-021-25147-9</pub-id> <pub-id pub-id-type="pmid">34408153</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lv</surname> <given-names>M.</given-names></name> <name><surname>Luan</surname> <given-names>X.</given-names></name> <name><surname>Liao</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Liu</surname> <given-names>D.</given-names></name> <name><surname>Zhang</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Human impacts on polycyclic aromatic hydrocarbon distribution in Chinese intertidal zones.</article-title> <source><italic>Nat. Sustain.</italic></source> <volume>3</volume> <fpage>878</fpage>&#x2013;<lpage>884</lpage>. <pub-id pub-id-type="doi">10.1038/s41893-020-0565-y</pub-id></citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marco</surname> <given-names>D. E.</given-names></name> <name><surname>Abram</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Editorial: Using Genomics, Metagenomics and Other &#x201C;Omics&#x201D; to Assess Valuable Microbial Ecosystem Services and Novel Biotechnological Applications.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>10</volume>:<fpage>151</fpage>. <pub-id pub-id-type="doi">10.3389/fmicb.2019.00151</pub-id> <pub-id pub-id-type="pmid">30809205</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miao</surname> <given-names>Y.</given-names></name> <name><surname>Johnson</surname> <given-names>N. W.</given-names></name> <name><surname>Phan</surname> <given-names>T.</given-names></name> <name><surname>Heck</surname> <given-names>K.</given-names></name> <name><surname>Gedalanga</surname> <given-names>P. B.</given-names></name> <name><surname>Zheng</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Monitoring, assessment, and prediction of microbial shifts in coupled catalysis and biodegradation of 1,4-dioxane and co-contaminants.</article-title> <source><italic>Water Res.</italic></source> <volume>173</volume>:<fpage>115540</fpage>. <pub-id pub-id-type="doi">10.1016/j.watres.2020.115540</pub-id> <pub-id pub-id-type="pmid">32018172</pub-id></citation></ref>
<ref id="B70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morimura</surname> <given-names>S.</given-names></name> <name><surname>Zeng</surname> <given-names>X.</given-names></name> <name><surname>Noboru</surname> <given-names>N.</given-names></name> <name><surname>Hosono</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>Changes to the microbial communities within groundwater in response to a large crustal earthquake in Kumamoto, southern Japan.</article-title> <source><italic>J. Hydrol.</italic></source> <volume>581</volume>:<fpage>124341</fpage>. <pub-id pub-id-type="doi">10.1016/j.jhydrol.2019.124341</pub-id></citation></ref>
<ref id="B71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Naumann</surname> <given-names>G.</given-names></name> <name><surname>Cammalleri</surname> <given-names>C.</given-names></name> <name><surname>Mentaschi</surname> <given-names>L.</given-names></name> <name><surname>Feyen</surname> <given-names>L.</given-names></name></person-group> (<year>2021</year>). <article-title>Increased economic drought impacts in Europe with anthropogenic warming.</article-title> <source><italic>Nat. Clim. Change</italic></source> <volume>11</volume> <fpage>485</fpage>&#x2013;<lpage>491</lpage>. <pub-id pub-id-type="doi">10.1038/s41558-021-01044-3</pub-id></citation></ref>
<ref id="B72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oh</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Machine learning application reveal dynamic interaction of polyphosphate-accumulating organism in full-scale wastewater treatment plant.</article-title> <source><italic>J. Water Proc. Eng.</italic></source> <volume>44</volume>:<fpage>102417</fpage>. <pub-id pub-id-type="doi">10.1016/j.jwpe.2021.102417</pub-id></citation></ref>
<ref id="B73"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ortiz-Bobea</surname> <given-names>A.</given-names></name> <name><surname>Ault</surname> <given-names>T. R.</given-names></name> <name><surname>Carrillo</surname> <given-names>C. M.</given-names></name> <name><surname>Chambers</surname> <given-names>R. G.</given-names></name> <name><surname>Lobell</surname> <given-names>D. B.</given-names></name></person-group> (<year>2021</year>). <article-title>Anthropogenic climate change has slowed global agricultural productivity growth.</article-title> <source><italic>Nat. Clim. Change</italic></source> <volume>11</volume> <fpage>306</fpage>&#x2013;<lpage>312</lpage>. <pub-id pub-id-type="doi">10.1038/s41558-021-01000-1</pub-id></citation></ref>
<ref id="B74"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oudah</surname> <given-names>M.</given-names></name> <name><surname>Henschel</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Taxonomy-aware feature engineering for microbiome classification.</article-title> <source><italic>BMC Bioinform.</italic></source> <volume>19</volume>:<fpage>227</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-018-2205-3</pub-id> <pub-id pub-id-type="pmid">29907097</pub-id></citation></ref>
<ref id="B75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oyetunde</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>D.</given-names></name> <name><surname>Martin</surname> <given-names>H. G.</given-names></name> <name><surname>Tang</surname> <given-names>Y. J.</given-names></name></person-group> (<year>2019</year>). <article-title>Machine learning framework for assessment of microbial factory performance.</article-title> <source><italic>PLoS One</italic></source> <volume>14</volume>:<fpage>e0210558</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0210558</pub-id> <pub-id pub-id-type="pmid">30645629</pub-id></citation></ref>
<ref id="B76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Panke-Buisse</surname> <given-names>K.</given-names></name> <name><surname>Poole</surname> <given-names>A. C.</given-names></name> <name><surname>Goodrich</surname> <given-names>J. K.</given-names></name> <name><surname>Ley</surname> <given-names>R. E.</given-names></name> <name><surname>Kao-Kniffin</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Selection on soil microbiomes reveals reproducible impacts on plant function.</article-title> <source><italic>ISME J.</italic></source> <volume>9</volume>:<fpage>980</fpage>. <pub-id pub-id-type="doi">10.1038/ismej.2014.196</pub-id> <pub-id pub-id-type="pmid">25350154</pub-id></citation></ref>
<ref id="B77"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pulster</surname> <given-names>E. L.</given-names></name> <name><surname>Gracia</surname> <given-names>A.</given-names></name> <name><surname>Armenteros</surname> <given-names>M.</given-names></name> <name><surname>Toro-Farmer</surname> <given-names>G.</given-names></name> <name><surname>Snyder</surname> <given-names>S. M.</given-names></name> <name><surname>Carr</surname> <given-names>B. E.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>A First Comprehensive Baseline of Hydrocarbon Pollution in Gulf of Mexico Fishes.</article-title> <source><italic>Sci. Rep.</italic></source> <volume>10</volume>:<fpage>6437</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-020-62944-6</pub-id> <pub-id pub-id-type="pmid">32296072</pub-id></citation></ref>
<ref id="B78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ramirez</surname> <given-names>K. S.</given-names></name> <name><surname>Knight</surname> <given-names>C. G.</given-names></name> <name><surname>de Hollander</surname> <given-names>M.</given-names></name> <name><surname>Brearley</surname> <given-names>F. Q.</given-names></name> <name><surname>Constantinides</surname> <given-names>B.</given-names></name> <name><surname>Cotton</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Detecting macroecological patterns in bacterial communities across independent studies of global soils.</article-title> <source><italic>Nat. Microbiol.</italic></source> <volume>3</volume> <fpage>189</fpage>&#x2013;<lpage>196</lpage>. <pub-id pub-id-type="doi">10.1038/s41564-017-0062-x</pub-id> <pub-id pub-id-type="pmid">29158606</pub-id></citation></ref>
<ref id="B79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Raza</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>J.</given-names></name> <name><surname>Sadowsky</surname> <given-names>M. J.</given-names></name> <name><surname>Unno</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Microbial source tracking using metagenomics and other new technologies.</article-title> <source><italic>J. Microbiol.</italic></source> <volume>59</volume> <fpage>259</fpage>&#x2013;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.1007/s12275-021-0668-9</pub-id> <pub-id pub-id-type="pmid">33565053</pub-id></citation></ref>
<ref id="B80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Santos</surname> <given-names>A.</given-names></name> <name><surname>Barbosa-P&#x00F3;voa</surname> <given-names>A.</given-names></name> <name><surname>Carvalho</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Life cycle assessment in chemical industry &#x2013; a review.</article-title> <source><italic>Curr. Opin. Chem. Eng.</italic></source> <volume>26</volume> <fpage>139</fpage>&#x2013;<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1016/j.coche.2019.09.009</pub-id></citation></ref>
<ref id="B81"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schweitzer</surname> <given-names>H.</given-names></name> <name><surname>Aalto</surname> <given-names>N. J.</given-names></name> <name><surname>Busch</surname> <given-names>W.</given-names></name> <name><surname>Chan</surname> <given-names>D. T. C.</given-names></name> <name><surname>Chiesa</surname> <given-names>M.</given-names></name> <name><surname>Elvevoll</surname> <given-names>E. O.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Innovating carbon-capture biotechnologies through ecosystem-inspired solutions.</article-title> <source><italic>One Earth</italic></source> <volume>4</volume> <fpage>49</fpage>&#x2013;<lpage>59</lpage>. <pub-id pub-id-type="doi">10.1016/j.oneear.2020.12.006</pub-id></citation></ref>
<ref id="B82"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shah</surname> <given-names>R. M.</given-names></name> <name><surname>Stephenson</surname> <given-names>S.</given-names></name> <name><surname>Crosswell</surname> <given-names>J.</given-names></name> <name><surname>Gorman</surname> <given-names>D.</given-names></name> <name><surname>Hillyer</surname> <given-names>K. E.</given-names></name> <name><surname>Palombo</surname> <given-names>E. A.</given-names></name></person-group> (<year>2022</year>). <article-title>Omics-based ecosurveillance uncovers the influence of estuarine macrophytes on sediment microbial function and metabolic redundancy in a tropical ecosystem.</article-title> <source><italic>Sci. Total Environ.</italic></source> <volume>809</volume>:<fpage>151175</fpage>. <pub-id pub-id-type="doi">10.1016/j.scitotenv.2021.151175</pub-id> <pub-id pub-id-type="pmid">34699819</pub-id></citation></ref>
<ref id="B83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shaheen</surname> <given-names>M.</given-names></name> <name><surname>Shahbaz</surname> <given-names>M.</given-names></name> <name><surname>ur Rehman</surname> <given-names>Z.</given-names></name> <name><surname>Guergachi</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>Data mining applications in hydrocarbon exploration.</article-title> <source><italic>Artif. Intell. Rev.</italic></source> <volume>35</volume> <fpage>1</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1007/s10462-010-9180-z</pub-id></citation></ref>
<ref id="B84"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simul Bhuyan</surname> <given-names>M.</given-names></name> <name><surname>Venkatramanan</surname> <given-names>S.</given-names></name> <name><surname>Selvam</surname> <given-names>S.</given-names></name> <name><surname>Szabo</surname> <given-names>S.</given-names></name> <name><surname>Hossain</surname> <given-names>M.</given-names></name> <name><surname>Rashed-Un-Nabi</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Plastics in marine ecosystem: A review of their sources and pollution conduits.</article-title> <source><italic>Reg. Stud. Mar. Sci.</italic></source> <volume>41</volume>:<fpage>101539</fpage>. <pub-id pub-id-type="doi">10.1111/gcb.14572</pub-id> <pub-id pub-id-type="pmid">30663840</pub-id></citation></ref>
<ref id="B85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sintayehu</surname> <given-names>D. W.</given-names></name></person-group> (<year>2018</year>). <article-title>Impact of climate change on biodiversity and associated key ecosystem services in Africa: a systematic review.</article-title> <source><italic>Ecosyst. Health Sustain.</italic></source> <volume>4</volume> <fpage>225</fpage>&#x2013;<lpage>239</lpage>. <pub-id pub-id-type="doi">10.1080/20964129.2018.1530054</pub-id></citation></ref>
<ref id="B86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>M. B.</given-names></name> <name><surname>Rocha</surname> <given-names>A. M.</given-names></name> <name><surname>Smillie</surname> <given-names>C. S.</given-names></name> <name><surname>Olesen</surname> <given-names>S. W.</given-names></name> <name><surname>Paradis</surname> <given-names>C.</given-names></name> <name><surname>Wu</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Natural Bacterial Communities Serve as Quantitative Geochemical Biosensors.</article-title> <source><italic>mBio</italic></source> <volume>6</volume>:<fpage>e00326-15</fpage>. <pub-id pub-id-type="doi">10.1128/mBio.00326-15</pub-id> <pub-id pub-id-type="pmid">25968645</pub-id></citation></ref>
<ref id="B87"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sohrabi</surname> <given-names>H.</given-names></name> <name><surname>Hemmati</surname> <given-names>A.</given-names></name> <name><surname>Majidi</surname> <given-names>M. R.</given-names></name> <name><surname>Eyvazi</surname> <given-names>S.</given-names></name> <name><surname>Jahanban-Esfahlan</surname> <given-names>A.</given-names></name> <name><surname>Baradaran</surname> <given-names>B.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Recent advances on portable sensing and biosensing assays applied for detection of main chemical and biological pollutant agents in water samples: A critical review.</article-title> <source><italic>Trends Anal. Chem.</italic></source> <volume>143</volume>:<fpage>116344</fpage>. <pub-id pub-id-type="doi">10.1016/j.trac.2021.116344</pub-id></citation></ref>
<ref id="B88"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Solis-Reyes</surname> <given-names>S.</given-names></name> <name><surname>Avino</surname> <given-names>M.</given-names></name> <name><surname>Poon</surname> <given-names>A.</given-names></name> <name><surname>Kari</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <article-title>An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes.</article-title> <source><italic>PLoS One</italic></source> <volume>13</volume>:<fpage>e0206409</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0206409</pub-id> <pub-id pub-id-type="pmid">30427878</pub-id></citation></ref>
<ref id="B89"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Su</surname> <given-names>L.</given-names></name> <name><surname>Jia</surname> <given-names>W.</given-names></name> <name><surname>Hou</surname> <given-names>C.</given-names></name> <name><surname>Lei</surname> <given-names>Y.</given-names></name></person-group> (<year>2011</year>). <article-title>Microbial biosensors: a review.</article-title> <source><italic>Biosens. Bioelectr.</italic></source> <volume>26</volume> <fpage>1788</fpage>&#x2013;<lpage>1799</lpage>. <pub-id pub-id-type="doi">10.1016/j.bios.2010.09.005</pub-id> <pub-id pub-id-type="pmid">20951023</pub-id></citation></ref>
<ref id="B90"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Szafra&#x0144;ski</surname> <given-names>S. P.</given-names></name> <name><surname>Deng</surname> <given-names>Z.-L.</given-names></name> <name><surname>Tomasch</surname> <given-names>J.</given-names></name> <name><surname>Jarek</surname> <given-names>M.</given-names></name> <name><surname>Bhuju</surname> <given-names>S.</given-names></name> <name><surname>Meisinger</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Functional biomarkers for chronic periodontitis and insights into the roles of <italic>Prevotella nigrescens</italic> and <italic>Fusobacterium nucleatum</italic>; a metatranscriptome analysis.</article-title> <source><italic>npj Biofilms Microbiomes</italic></source> <volume>1</volume>:<fpage>15017</fpage>. <pub-id pub-id-type="doi">10.1038/npjbiofilms.2015.17</pub-id> <pub-id pub-id-type="pmid">28721234</pub-id></citation></ref>
<ref id="B91"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thompson</surname> <given-names>J.</given-names></name> <name><surname>Johansen</surname> <given-names>R.</given-names></name> <name><surname>Dunbar</surname> <given-names>J.</given-names></name> <name><surname>Munsky</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition.</article-title> <source><italic>PLoS One</italic></source> <volume>14</volume>:<fpage>e0215502</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0215502</pub-id> <pub-id pub-id-type="pmid">31260460</pub-id></citation></ref>
<ref id="B92"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turan</surname> <given-names>N. B.</given-names></name> <name><surname>Erkan</surname> <given-names>H. S.</given-names></name> <name><surname>Engin</surname> <given-names>G. O.</given-names></name> <name><surname>Bilgili</surname> <given-names>M. S.</given-names></name></person-group> (<year>2019</year>). <article-title>Nanoparticles in the aquatic environment: Usage, properties, transformation and toxicity&#x2014;A review.</article-title> <source><italic>Proc. Safety Environ. Protect.</italic></source> <volume>130</volume> <fpage>238</fpage>&#x2013;<lpage>249</lpage>. <pub-id pub-id-type="doi">10.1016/j.psep.2019.08.014</pub-id></citation></ref>
<ref id="B93"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vardhan</surname> <given-names>K. H.</given-names></name> <name><surname>Kumar</surname> <given-names>P. S.</given-names></name> <name><surname>Panda</surname> <given-names>R. C.</given-names></name></person-group> (<year>2019</year>). <article-title>A review on heavy metal pollution, toxicity and remedial measures: Current trends and future perspectives.</article-title> <source><italic>J. Mol. Liquids</italic></source> <volume>290</volume>:<fpage>111197</fpage>. <pub-id pub-id-type="doi">10.1016/j.molliq.2019.111197</pub-id></citation></ref>
<ref id="B94"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Mao</surname> <given-names>G.</given-names></name> <name><surname>Liao</surname> <given-names>K.</given-names></name> <name><surname>Ben</surname> <given-names>W.</given-names></name> <name><surname>Qiao</surname> <given-names>M.</given-names></name> <name><surname>Bai</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Machine learning approach identifies water sample source based on microbial abundance.</article-title> <source><italic>Water Res.</italic></source> <volume>199</volume>:<fpage>117185</fpage>. <pub-id pub-id-type="doi">10.1016/j.watres.2021.117185</pub-id> <pub-id pub-id-type="pmid">33984588</pub-id></citation></ref>
<ref id="B95"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Hatt</surname> <given-names>J. K.</given-names></name> <name><surname>Tsementzi</surname> <given-names>D.</given-names></name> <name><surname>Rodriguez</surname> <given-names>R. L.</given-names></name> <name><surname>Ruiz-P&#x00E9;rez</surname> <given-names>C. A.</given-names></name> <name><surname>Weigand</surname> <given-names>M. R.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Quantifying the Importance of the Rare Biosphere for Microbial Community Response to Organic Pollutants in a Freshwater Ecosystem.</article-title> <source><italic>Appl. Environ. Microbiol.</italic></source> <volume>83</volume>:<fpage>e03321-16</fpage>. <pub-id pub-id-type="doi">10.1128/AEM.03321-16</pub-id> <pub-id pub-id-type="pmid">28258138</pub-id></citation></ref>
<ref id="B96"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wheeler</surname> <given-names>N. E.</given-names></name></person-group> (<year>2019</year>). <article-title>Tracing outbreaks with machine learning.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>17</volume>:<fpage>269</fpage>. <pub-id pub-id-type="doi">10.1038/s41579-019-0153-1</pub-id> <pub-id pub-id-type="pmid">30742026</pub-id></citation></ref>
<ref id="B97"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wirbel</surname> <given-names>J.</given-names></name> <name><surname>Zych</surname> <given-names>K.</given-names></name> <name><surname>Essex</surname> <given-names>M.</given-names></name> <name><surname>Karcher</surname> <given-names>N.</given-names></name> <name><surname>Kartal</surname> <given-names>E.</given-names></name> <name><surname>Salazar</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox.</article-title> <source><italic>Genome Biol.</italic></source> <volume>22</volume>:<fpage>93</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-021-02306-1</pub-id> <pub-id pub-id-type="pmid">33785070</pub-id></citation></ref>
<ref id="B98"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>L.</given-names></name> <name><surname>Ning</surname> <given-names>D.</given-names></name> <name><surname>Zhang</surname> <given-names>B.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>P.</given-names></name> <name><surname>Shan</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Global diversity and biogeography of bacterial communities in wastewater treatment plants.</article-title> <source><italic>Nat. Microbiol.</italic></source> <volume>4</volume> <fpage>1183</fpage>&#x2013;<lpage>1195</lpage>.</citation></ref>
<ref id="B99"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Z.</given-names></name> <name><surname>Malmer</surname> <given-names>D.</given-names></name> <name><surname>Langille</surname> <given-names>M. G. I.</given-names></name> <name><surname>Way</surname> <given-names>S. F.</given-names></name> <name><surname>Knight</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Which is more important for classifying microbial communities: who&#x2019;s there or what they can do?</article-title> <source><italic>ISME J.</italic></source> <volume>8</volume> <fpage>2357</fpage>&#x2013;<lpage>2359</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2014.157</pub-id> <pub-id pub-id-type="pmid">25171332</pub-id></citation></ref>
<ref id="B100"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>York</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Avoiding the pitfalls in microbiota studies.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>19</volume>:<fpage>2</fpage>. <pub-id pub-id-type="doi">10.1038/s41579-020-00480-w</pub-id> <pub-id pub-id-type="pmid">33154571</pub-id></citation></ref>
<ref id="B101"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>J.</given-names></name> <name><surname>Wen</surname> <given-names>T.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>M.</given-names></name> <name><surname>Penton</surname> <given-names>C. R.</given-names></name> <name><surname>Thomashow</surname> <given-names>L. S.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Predicting disease occurrence with high accuracy based on soil macroecological patterns of Fusarium wilt.</article-title> <source><italic>ISME J.</italic></source> <volume>14</volume> <fpage>2936</fpage>&#x2013;<lpage>2950</lpage>. <pub-id pub-id-type="doi">10.1038/s41396-020-0720-5</pub-id> <pub-id pub-id-type="pmid">32681158</pub-id></citation></ref>
<ref id="B102"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zeller</surname> <given-names>G.</given-names></name> <name><surname>Tap</surname> <given-names>J.</given-names></name> <name><surname>Voigt</surname> <given-names>A. Y.</given-names></name> <name><surname>Sunagawa</surname> <given-names>S.</given-names></name> <name><surname>Kultima</surname> <given-names>J. R.</given-names></name> <name><surname>Costea</surname> <given-names>P. I.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Potential of fecal microbiota for early-stage detection of colorectal cancer.</article-title> <source><italic>Mol. Syst. Biol.</italic></source> <volume>10</volume>:<fpage>766</fpage>. <pub-id pub-id-type="doi">10.15252/msb.20145645</pub-id> <pub-id pub-id-type="pmid">25432777</pub-id></citation></ref>
<ref id="B103"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Gao</surname> <given-names>Z.</given-names></name> <name><surname>Shi</surname> <given-names>W.</given-names></name> <name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Tian</surname> <given-names>R.</given-names></name> <name><surname>Huang</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>Material conversion, microbial community composition and metabolic functional succession during green soybean hull composting.</article-title> <source><italic>Biores. Technol.</italic></source> <volume>316</volume>:<fpage>123823</fpage>. <pub-id pub-id-type="doi">10.1016/j.biortech.2020.123823</pub-id> <pub-id pub-id-type="pmid">32795866</pub-id></citation></ref>
<ref id="B104"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zijp</surname> <given-names>M.</given-names></name> <name><surname>Mallinson</surname> <given-names>T.</given-names></name> <name><surname>Zwaan</surname> <given-names>J.</given-names></name> <name><surname>Chitu</surname> <given-names>A.</given-names></name> <name><surname>David</surname> <given-names>P.</given-names></name></person-group> (<year>2021</year>). &#x201C;<article-title>Eagle Ford and Bakken Productivity Prediction Using Soil Microbial Fingerprinting and Machine Learning</article-title>,&#x201D; in <source><italic>Paper Presented at the SPE/AAPG/SEG Unconventional Resources Technology Conference</italic></source> (<publisher-loc>Houston, Texas, USA</publisher-loc>).</citation></ref>
</ref-list>
</back>
</article>