<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="review-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Mol. Biosci.</journal-id>
<journal-title>Frontiers in Molecular Biosciences</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Mol. Biosci.</abbrev-journal-title>
<issn pub-type="epub">2296-889X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">634141</article-id>
<article-id pub-id-type="doi">10.3389/fmolb.2021.634141</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Molecular Biosciences</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Review of Machine Learning Methods for the Prediction and Reconstruction of Metabolic Pathways</article-title>
<alt-title alt-title-type="left-running-head">Shah et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Review of Methods for Prediction of Metabolic Pathways</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Shah</surname>
<given-names>Hayat Ali</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1155147/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Liu</surname>
<given-names>Juan</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/135159/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yang</surname>
<given-names>Zhihui</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1363955/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Feng</surname>
<given-names>Jing</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1363947/overview"/>
</contrib>
</contrib-group>
<aff>Institute of Artificial Intelligence, School of Computer Science, Wuhan University, <addr-line>Wuhan</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/562375/overview">Liang Cheng</ext-link>, Harbin Medical University, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/308912/overview">Zhou Xiong Hui</ext-link>, Huazhong Agricultural University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/981074/overview">Ying Jiang</ext-link>, Heilongjiang University of Chinese Medicine, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Juan Liu, <email>liujuan@whu.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Metabolomics, a section of the journal Frontiers in Molecular Biosciences</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>17</day>
<month>06</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>8</volume>
<elocation-id>634141</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>11</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>01</day>
<month>06</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Shah, Liu, Yang and Feng.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Shah, Liu, Yang and Feng</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Prediction and reconstruction of metabolic pathways play significant roles in many fields, such as genetic engineering, metabolic engineering, and drug discovery, and are becoming some of the most active research topics in synthetic biology. With the growth of related data and the development of machine learning techniques, many machine learning based methods have been proposed for the prediction or reconstruction of metabolic pathways. Machine learning techniques are showing state-of-the-art performance in handling the rapidly increasing volume of data in synthetic biology. To support researchers in this field, we briefly review the research progress of metabolic pathway reconstruction and prediction based on machine learning. Some challenging issues in the reconstruction of metabolic pathways are also discussed in this&#x20;paper.</p>
</abstract>
<kwd-group>
<kwd>machine learning</kwd>
<kwd>prediction</kwd>
<kwd>metabolic pathway</kwd>
<kwd>enzymes</kwd>
<kwd>biochemical reaction</kwd>
<kwd>substrate</kwd>
<kwd>metabolites</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>A metabolic pathway is a series of enzymatic reactions in a cell in which the products of one reaction serve as the substrates of subsequent reactions. The reactants, products, and intermediates of an enzymatic reaction are known as metabolites. Many metabolic pathways have been identified, characterized, and stored in public repositories according to their functions, including KEGG (<xref ref-type="bibr" rid="B68">Ogata et&#x20;al., 1998</xref>; <xref ref-type="bibr" rid="B69">Ogata et&#x20;al., 1999</xref>; <xref ref-type="bibr" rid="B70">Okuda et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B40">Kanehisa et&#x20;al., 2019</xref>), MetaCyc (<xref ref-type="bibr" rid="B46">Karp 2002b</xref>; <xref ref-type="bibr" rid="B13">Caspi 2006</xref>; <xref ref-type="bibr" rid="B12">Caspi et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B11">Caspi et&#x20;al., 2018</xref>), and BioCyc (<xref ref-type="bibr" rid="B43">Karp et&#x20;al., 2019</xref>). However, many metabolic pathways remain uncharacterized because some of their components have not been identified (<xref ref-type="bibr" rid="B76">Roche-Lima 2016</xref>). The reconstruction of metabolic pathways aims to complete pathways that are incomplete because enzymes, reactions, or relationships between reactions are missing. Some researchers reconstruct the metabolic pathways of an organism based on reference pathways, that is, by mapping the incomplete pathways onto the reference ones to identify the unknown parts. A variety of reference-based approaches have been developed to reconstruct metabolic pathways, including BlastKOALA (<xref ref-type="bibr" rid="B41">Kanehisa et&#x20;al., 2016</xref>), KAAS (<xref ref-type="bibr" rid="B61">Moriya et&#x20;al., 2007</xref>), GhostKOALA (<xref ref-type="bibr" rid="B41">Kanehisa et&#x20;al., 2016</xref>), and RAST (<xref ref-type="bibr" rid="B4">Aziz et&#x20;al., 2008</xref>).
Because many metabolic pathways have been collected and organized in public databases, such as KEGG (<xref ref-type="bibr" rid="B68">Ogata et&#x20;al., 1998</xref>; <xref ref-type="bibr" rid="B69">Ogata et&#x20;al., 1999</xref>; <xref ref-type="bibr" rid="B70">Okuda et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B40">Kanehisa et&#x20;al., 2019</xref>), MetaCyc (<xref ref-type="bibr" rid="B46">Karp 2002b</xref>; <xref ref-type="bibr" rid="B13">Caspi 2006</xref>; <xref ref-type="bibr" rid="B12">Caspi et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B11">Caspi et&#x20;al., 2018</xref>), BioCyc (<xref ref-type="bibr" rid="B43">Karp et&#x20;al., 2019</xref>), Brenda (<xref ref-type="bibr" rid="B79">Schomburg 2002</xref>; <xref ref-type="bibr" rid="B37">Jeske et&#x20;al., 2019</xref>), Rhea (<xref ref-type="bibr" rid="B59">Lombardot et&#x20;al., 2019</xref>), and EcoCyc (<xref ref-type="bibr" rid="B45">Karp 2002a</xref>), reference-based methods use the pathways in these databases as references and map the protein sequences of an organism onto them according to sequence homology (<xref ref-type="bibr" rid="B99">Herrg&#xe5;rd et&#x20;al., 2008</xref>) to reconstruct the metabolic pathways of the organism. However, if some enzymes or reactions are also missing from the reference pathways, such reference-based methods may reconstruct incorrect metabolic pathways and lead to incorrect interpretations. Furthermore, such methods cannot predict new reactions or enzymes that do not exist in the reference pathways. Other researchers reconstruct metabolic pathways by first predicting gene sequences from genome data using gene markers (<xref ref-type="bibr" rid="B7">Besemer 2001</xref>). The predicted gene sequences are assigned initial functions by a variety of computational approaches, such as clustering and similarity calculation against known sequences.
Then they are &#x201c;attached&#x201d; to pathways by choosing the templates from a metabolic pathway database that best incorporate all observed functions (<xref ref-type="bibr" rid="B71">Overbeek 2000</xref>; <xref ref-type="bibr" rid="B60">Mascher et&#x20;al., 2019</xref>); a basic functional model is then created and evaluated against known data. Such methods depend on the deduced gene sequences; however, the protein translated from a coding sequence may be incorrect because of frameshift errors, resulting in wrong pathways. For eukaryotes, the prediction of gene sequences is even more difficult because of the presence of introns.</p>
<p>To overcome the shortcomings of the above methods, strong evidence from genome context associations is needed, such as gene-gene interactions (<xref ref-type="bibr" rid="B100">Gurkun, 2012</xref>) and classification and clustering based on function and phylogenetic profiling (<xref ref-type="bibr" rid="B82">Sithambranathan et&#x20;al., 2020</xref>). Because machine learning handles large and complex data sets well, and large amounts of data have been obtained through large-scale projects, applying machine learning to the reconstruction of metabolic pathways is a natural trend. Over the past decade, many studies have focused on the modeling and reconstruction of metabolic pathways. <xref ref-type="bibr" rid="B89">Wang et&#x20;al. (2017)</xref> surveyed computational tools for the design and reconstruction of metabolic pathways. <xref ref-type="bibr" rid="B16">Cuperlovic-Culf (2018)</xref> reviewed related work on the modeling of metabolic pathways based on machine learning techniques. <xref ref-type="bibr" rid="B48">Kim et&#x20;al. (2020)</xref> summarized machine learning applications in systems metabolic engineering. However, there is a lack of reviews of machine learning applications for predicting the components of metabolic pathways. In this paper, we briefly review the machine learning approaches for the prediction of metabolic pathways and their components, including enzymes, metabolites, and reactions. This review, together with the other reviews, can provide more comprehensive knowledge of machine learning algorithms for the prediction and reconstruction of metabolic pathways.</p>
<p>The remainder of this paper is organized as follows: <italic>Prediction or Reconstruction of Metabolic Pathways</italic> describes the prediction and reconstruction of the metabolic pathways. <italic>Prediction of Missing Enzymes</italic> presents the prediction of missing enzymes. <italic>Identification of Metabolites</italic> introduces machine learning methods for predicting metabolites, followed by <italic>Prediction of Reactions</italic>, which describes prediction of reactions. <italic>Conclusion</italic> concludes this&#x20;paper.</p>
</sec>
<sec id="s2">
<title>Prediction or Reconstruction of Metabolic Pathways</title>
<p>A metabolic pathway is a linked series of chemical reactions that occur within a cell. These reactions are catalyzed by enzymes, where the product of one enzyme acts as the substrate for the next. The reactants, products, and intermediates of an enzymatic reaction are known as metabolites. In a pathway, the initial chemical (metabolite) is modified by a sequence of enzymatic reactions.</p>
<p>There are three pipelines of computational methods for analyzing metabolic pathways: prediction (<xref ref-type="bibr" rid="B5">Bagheri et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B24">Faust et&#x20;al., 2011</xref>), design or reconstruction (<xref ref-type="bibr" rid="B75">Qi et&#x20;al., 2014</xref>), and optimization (<xref ref-type="bibr" rid="B23">Ebenh&#xf6;h and Heinrich 2001</xref>; <xref ref-type="bibr" rid="B73">Planes and Beasley 2009</xref>; <xref ref-type="bibr" rid="B36">Jeanne et&#x20;al., 2016</xref>). Pathway prediction aims to predict the metabolic pathways to which a given molecule belongs, which helps to understand the metabolic mechanism of the molecule. For example, in drug discovery, predicting the metabolic pathways in which a drug compound is involved is very useful for understanding how the drug is absorbed, distributed, metabolized, and excreted. The purpose of metabolic pathway design or reconstruction is to design or find routes of enzymatic reactions that convert one metabolite (the source) into others (the products). Reconstruction of metabolic pathways is also useful for finding functional modules or building the metabolic network of an unknown organism. In metabolic engineering, designing or reconstructing the metabolic pathways that lead to a specific product can help to modify a microbial strain so that new pathways are enabled and strengthened for the efficient production of biochemicals. The optimization of metabolic pathways involves finding or generating optimal pathways according to predetermined criteria, such as maximizing the production yield of target products or minimizing the number of reactions. The optimization usually needs to satisfy some constraints, for example, the use of specific enzymes and the highest yield of target products.
Therefore, constraint-based methods are usually used, and in most cases additional metabolic flux analysis data is needed for the optimization of pathways, which is out of the scope of this review.</p>
<sec id="s2-1">
<title>Prediction of Metabolic Pathways</title>
<p>The annotated metabolic pathways have been organized into different categories according to their functions. For a new or unknown molecule, knowing which pathways, or which kinds of pathways, it belongs to helps to understand its metabolic mechanism, which is very useful for drug discovery. Therefore, metabolic pathway prediction in this paper refers to identifying the metabolic pathways in which a compound is involved. Several machine learning methods have been applied to build prediction models for pathways. For example, <xref ref-type="bibr" rid="B6">Baranwal et&#x20;al. (2019)</xref> proposed a hybrid framework of random forest (RF) and a graph convolutional neural network for predicting the classes of metabolic pathways to which a compound belongs. Their method can only identify the metabolic pathway types of compounds rather than the actual metabolic pathways. There remains a gap between predicting the type of metabolic pathways and predicting the actual metabolic pathways to which a compound belongs. To fill this gap, <xref ref-type="bibr" rid="B38">Jia et&#x20;al. (2020)</xref> proposed a similarity-based model for predicting the metabolic pathways of given compounds. They regarded every pair of compound and metabolic pathway as a sample and represented each sample by seven features extracted from seven associations of compounds. They then built a binary classification model with the RF algorithm that outputs &#x201c;yes&#x201d; or &#x201c;no&#x201d; for every pair, where &#x201c;yes&#x201d; means that the compound belongs to the pathway and &#x201c;no&#x201d; means that it does not. However, the method is only suitable for known pathways and cannot predict whether compounds belong to unknown pathways. Moreover, merely predicting the metabolic pathways to which given compounds belong is not enough to fully understand their roles in metabolism; it is therefore necessary to reconstruct or design the metabolic pathways in which the compounds are involved.</p>
</sec>
</sec>
<sec id="s3">
<title>Reconstruction of Metabolic Pathways</title>
<p>The reconstruction of a metabolic pathway connects metabolites and pairs of biochemical reactions catalyzed by enzymes, marking the routes and connecting source molecules to target molecules. Pathway reconstruction can be either knowledge-driven objective (KDO) or data-driven objective (DDO) (<xref ref-type="bibr" rid="B86">Viswanathan et&#x20;al., 2008</xref>). Because knowledge-driven pathway construction incorporates a large amount of domain knowledge, it requires a detailed pathway knowledge base for the particular domain of interest, such as a cell type, disease, or system. Such a knowledge base serves as the pathway resource that helps to reliably identify and extract the pertinent entities and interactions. For example, Karp and his collaborators developed the pathway software PathoLogic, which reconstructs metabolic pathways by mapping functional annotations onto the reactions and pathways of the MetaCyc collection (<xref ref-type="bibr" rid="B44">Karp et&#x20;al., 1999</xref>; <xref ref-type="bibr" rid="B72">Paley and Karp 2002</xref>). However, the development of domain knowledge is a tedious task. Data-driven pathway construction is used to generate relationship information for genes or proteins identified in a specific experiment. Unlike KDO, DDO starts from genes or proteins whose relationships are not well understood. To identify the relationships among the genes or proteins, reference-based or template-based methods, which map a group of gene and protein sequences of an organism to known reference pathways, have been commonly adopted (<xref ref-type="bibr" rid="B71">Overbeek 2000</xref>; <xref ref-type="bibr" rid="B99">Herrg&#xe5;rd et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B60">Mascher et&#x20;al., 2019</xref>). However, they generally cannot predict new reactions that do not exist in a reference pathway. Some researchers have proposed ab initio methods that reconstruct metabolic pathways without using reference pathways.
Most of these methods employ probabilistic inference methods such as graphical models and Bayesian networks (<xref ref-type="bibr" rid="B35">Jansen et&#x20;al., 2003</xref>; <xref ref-type="bibr" rid="B27">Friedman 2004</xref>; <xref ref-type="bibr" rid="B91">Werhli et&#x20;al., 2006</xref>; <xref ref-type="bibr" rid="B97">Zhao et&#x20;al., 2012</xref>) or ordinary differential equations (ODEs) (<xref ref-type="bibr" rid="B52">Koza et&#x20;al., 2001</xref>; <xref ref-type="bibr" rid="B78">Schmidt et&#x20;al., 2011</xref>). Ab initio reconstruction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. To address the limitations of reference-based and ab initio methods, <xref ref-type="bibr" rid="B75">Qi et&#x20;al. (2014)</xref> proposed combining existing pathway knowledge with a Bayesian probabilistic graphical model to improve both the coverage and the accuracy of metabolic pathway construction. However, a pathway built with this method may remain incompletely elucidated because of unknown enzyme genes. Therefore, besides inferring interactions or reactions, predicting the composition of the pathway from a reference database for the organism is necessary for pathway reconstruction.</p>
<sec id="s3-1">
<title>Design of Metabolic Pathways</title>
<p>In metabolic engineering, one usually needs to design or find metabolic pathways to chemicals of interest that meet certain constraints in a strain of a living organism. To expand the chemical repertoire for the production of compounds, a major effort is required to develop novel design tools that target chemical diversity through rapid and predictable protocols. Addressing that goal involves retrosynthesis approaches that explore the chemical biosynthetic space. The basic idea of a retrosynthesis approach is to iteratively break down a target molecule into simpler molecules that can be combined chemically or enzymatically to produce it, until all required compounds are either commercially available or present in the microbial strain of choice (<xref ref-type="bibr" rid="B49">Koch et&#x20;al., 2020</xref>). Several researchers have reviewed retrosynthesis efforts (<xref ref-type="bibr" rid="B74">Planson et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B89">Wang et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B56">Lin et&#x20;al., 2019</xref>). However, the complexity of the large combinatorial retrosynthesis design space has often been recognized as the main challenge hindering the approach (<xref ref-type="bibr" rid="B18">Del&#xe9;pine et&#x20;al., 2018</xref>). Pathway pruning methods (<xref ref-type="bibr" rid="B29">Gerlee et&#x20;al., 2009</xref>) or optimization-based methods (<xref ref-type="bibr" rid="B39">K&#xfc;ken and Nikoloski 2019</xref>; <xref ref-type="bibr" rid="B49">Koch et&#x20;al. 2020</xref>) are usually used to explore the chemical biosynthetic space. For example, <xref ref-type="bibr" rid="B15">Connor et&#x20;al. (2017)</xref> proposed a retrosynthesis approach based on molecular similarity; <xref ref-type="bibr" rid="B18">Del&#xe9;pine et&#x20;al. (2018)</xref> developed an automated open-source workflow for retrosynthesis based on generalized reaction rules, which performs the retrosynthesis search from chassis to target through an efficient and well-controlled protocol; <xref ref-type="bibr" rid="B49">Koch et&#x20;al. (2020)</xref> proposed exploring the bioretrosynthesis space using the Monte Carlo Tree Search reinforcement learning method, guided by chemical similarity. However, the expertise of metabolic engineers and years of lessons from industry are not yet sufficiently integrated into pathway searching and ranking, so the designed pathways may be far from optimal.</p>
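<p>The iterative breakdown that underlies a retrosynthesis search can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: the rule table RULES, the compound names, and the function retrosynthesize are hypothetical, the reverse rules are written by hand, and none of the similarity scoring, generalized reaction rules, or tree-search heuristics of the cited tools is modeled. A depth limit stands in for the pruning that real tools need to keep the combinatorial search space manageable.</p>
<preformat>
```python
from typing import Dict, List, Optional, Set

# Hypothetical reverse reaction rules: product -> alternative precursor sets.
RULES: Dict[str, List[List[str]]] = {
    "target": [["intermediate_a", "cofactor"], ["intermediate_b"]],
    "intermediate_a": [["glucose"]],
    "intermediate_b": [["unavailable_compound"]],
}


def retrosynthesize(target: str, available: Set[str],
                    max_depth: int = 5) -> Optional[List[str]]:
    """Return reaction steps that produce target, or None if no route exists."""
    if target in available:
        return []            # already present in the strain: nothing to do
    if max_depth == 0:
        return None          # depth limit prunes the search space
    for precursors in RULES.get(target, []):
        steps: List[str] = []
        for compound in precursors:
            sub_steps = retrosynthesize(compound, available, max_depth - 1)
            if sub_steps is None:
                break        # this precursor set cannot be completed
            steps.extend(sub_steps)
        else:
            # Every precursor is reachable, so record the final reaction.
            steps.append(" + ".join(precursors) + " -> " + target)
            return steps
    return None


# Compounds assumed to be present in the (hypothetical) chassis strain.
chassis = {"glucose", "cofactor"}
route = retrosynthesize("target", chassis)
for step in route or []:
    print(step)
```
</preformat>
<p>The sketch returns the first complete route it finds; ranking alternative routes, which the paragraph above identifies as the difficult part, is deliberately left out.</p>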
</sec>
<sec id="s3-2">
<title>Issues That Need to Be Addressed</title>
<p>For the reconstruction of metabolic pathways, <italic>de novo</italic> reaction prediction is still a significant challenge. Although some methods can learn enzymatic reaction likeness to predict whether a compound-compound pair can be converted by an enzymatic reaction, and can even find hidden reactions among many compounds at a time, they are insufficient to predict a multistep metabolic pathway correctly.</p>
<p>To construct metabolic pathways, more effort should be devoted to the difficulties of distinguishing the unidentified parts of pathways and structuring pathways for desired products. In particular, extracting useful information from metabolomics data is necessary to structure the pathways. Moreover, computational algorithms should consider the case in which an enzyme acts on at least two substrates at the same time to increase the production yield. Although graph-based approaches can be used to analyze flux-balanced pathways in the metabolic network (<xref ref-type="bibr" rid="B3">Arabzadeh et&#x20;al., 2018</xref>), they usually need extra post-processing steps to adjust the co-metabolites of the predicted pathway, which may be unbalanced. In addition, the prediction of the catalytic activities of enzymes has become an active research topic.</p>
</sec>
</sec>
<sec id="s4">
<title>Prediction of Missing Enzymes</title>
<sec id="s4-1">
<title>Description of the Problem</title>
<p>An enzyme is a protein catalyst that acts on substrates and converts them into molecules known as products. If no protein has been assigned the function of catalyzing a particular reaction in a pathway, the reaction is said to have a missing enzyme, also called a pathway hole (<xref ref-type="bibr" rid="B31">Green and Karp, 2004</xref>). Missing enzymes make it difficult to understand the behavior of the affected metabolic pathways. A comprehensive and accurate reconstruction of the metabolic pathways of an organism therefore includes the identification of the missing enzymes that catalyze the reactions of the pathways. Basically, the identification of missing enzymes consists of two steps: selecting candidates and evaluating candidates. Candidate selection finds a set of proteins or encoding genes that may catalyze the specific reaction, using strategies such as calculating similarities or finding correlations; candidate evaluation identifies, from among the candidates, the missing enzyme that catalyzes the reaction and thus fills the pathway&#x20;hole.</p>
</sec>
<sec id="s4-2">
<title>Identification of Candidates of Missing Enzymes</title>
<p>Traditional computational efforts to identify missing enzymes in metabolic pathways have focused on finding candidate enzymes based on sequence homology (<xref ref-type="bibr" rid="B31">Green and Karp, 2004</xref>), that is, calculating the similarity of a sequence from the organism of interest to the sequences of enzymes that catalyze the same reaction in other organisms with known metabolic pathways. However, such sequence homology methods fail to identify enzymes encoded by genes with poor sequence homology to known metabolic enzymes. To solve this problem, <xref ref-type="bibr" rid="B31">Green and Karp (2004)</xref> developed a method that efficiently combines homology and pathway-based evidence to identify candidates; <xref ref-type="bibr" rid="B93">Yamanishi et&#x20;al. (2007)</xref> used supervised network inference to select candidate enzyme-encoding genes by estimating the functional association between genes with respect to chromosomal proximity and evolutionary association; <xref ref-type="bibr" rid="B47">Kharchenko et&#x20;al. (2006)</xref> showed that several types of functional association evidence, including phylogenetic profile co-occurrence, physical clustering of genes on the chromosome, and protein interaction data, can be used to identify metabolic enzyme-encoding genes, and presented two integration methods: the direct likelihood-ratio (DLR) method and alternating decision trees (ADT) built by AdaBoost. Because such methods build their models on generally accepted biological hypotheses, the obtained candidates are more likely to fill the pathway hole. However, complicated strategies are usually needed to integrate this knowledge into the models.</p>
<p>Because a huge amount of data from multiple omics, such as transcriptomics and metabonomics, has been accumulated and many feature extraction methods are available (<xref ref-type="bibr" rid="B34">Iqbal et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B57">Liu et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B20">Du et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B58">Liu et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B28">Gao and Wu 2018</xref>; <xref ref-type="bibr" rid="B88">Wang et&#x20;al., 2020</xref>), some researchers have treated the identification of enzyme candidates as a catalytic versus non-catalytic classification problem and built models that classify protein sequences or encoding genes as either catalytic or non-catalytic using machine learning algorithms such as support vector machine (SVM), <italic>K</italic>-nearest neighbors (KNN), Bayesian methods, and RF (<xref ref-type="bibr" rid="B84">Teng et&#x20;al., 2010</xref>; <xref ref-type="bibr" rid="B32">Halperin et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B25">Ferrari and Mitchell 2014</xref>; <xref ref-type="bibr" rid="B64">Nagao et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B2">Amidi et al., 2017</xref>). The workflow for classifying protein sequences as catalytic or non-catalytic is illustrated in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>. The idea behind such methods is very simple. However, large numbers of positive (enzyme) and negative (non-enzyme) samples must be collected to build the models. Moreover, the predictions can only indicate whether a protein has a catalytic function, not whether it catalyzes a specific reaction.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Classification of catalytic and non-catalytic protein sequences.</p>
</caption>
<graphic xlink:href="fmolb-08-634141-g001.tif"/>
</fig>
</sec>
<sec id="s4-3">
<title>Evaluation of Candidates</title>
<p>The purpose of evaluating candidates is to select, from among the candidates, the missing enzymes that catalyze the specific reactions, and many approaches have been proposed for this evaluation. For example, <xref ref-type="bibr" rid="B31">Green and Karp (2004)</xref> proposed a Bayesian method that prioritizes candidates according to whether the candidate gene is located adjacent to, or in the same transcriptional unit as, known enzyme-encoding genes of related metabolic function. <xref ref-type="bibr" rid="B93">Yamanishi et&#x20;al. (2007)</xref> predicted the encoding genes of missing enzymes based on the scores of the candidates and the chemical reaction information encoded in the EC number. The chemical information, including substrates, products, and chemical reactions, can be obtained from the EC numbers using the KEGG database (<xref ref-type="bibr" rid="B70">Okuda et&#x20;al., 2008</xref>). After the encoding genes are indicated, the functional association between genes with respect to evolutionary associations and phylogenetic profiling (<xref ref-type="bibr" rid="B77">Rosetta and Method 2008</xref>; <xref ref-type="bibr" rid="B67">Nives and Dessimoz 2015</xref>; <xref ref-type="bibr" rid="B95">Zalguizuri et&#x20;al., 2019</xref>) can be estimated and the missing enzyme can be deduced. An example of phylogenetic profiling for filling pathway holes is illustrated in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>. <xref ref-type="bibr" rid="B21">Dug&#xe9; de Bernonville et&#x20;al. (2020)</xref> proposed several prioritization strategies: homology-based screening, searching for physical gene clusters, random mutagenesis, and gene co-expression analysis.
For gene clustering or co-expression analysis, several algorithms have been presented to cluster gene sequences into different functional groups (<xref ref-type="bibr" rid="B96">Zhang et&#x20;al., 2002</xref>; <xref ref-type="bibr" rid="B98">Zhong et&#x20;al., 2005</xref>; <xref ref-type="bibr" rid="B10">Bustamam et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B80">Sharma and Ali 2017</xref>).</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Schematic illustration of ML-based algorithms.</p>
</caption>
<graphic xlink:href="fmolb-08-634141-g002.tif"/>
</fig>
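<p>The use of phylogenetic profiles to prioritize candidates can be sketched as follows. Each gene is represented by a presence/absence vector over a set of genomes, and candidates are ranked by how similar their profile is to that of a known enzyme in the same pathway. The gene names, the profiles, and the choice of the Jaccard index as the similarity measure are assumptions made for illustration and are not taken from the cited studies.</p>
<preformat>
```python
from typing import Dict, List, Tuple


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0


def rank_candidates(reference: List[int],
                    candidates: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Sort candidate genes by profile similarity to the reference, best first."""
    scored = [(name, jaccard(reference, profile))
              for name, profile in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical profiles: 1 means the gene has a homolog in that genome.
known_enzyme = [1, 1, 0, 1, 0, 1]
candidate_profiles = {
    "gene_x": [1, 1, 0, 1, 0, 0],
    "gene_y": [0, 0, 1, 0, 1, 0],
    "gene_z": [1, 1, 0, 1, 0, 1],
}

ranking = rank_candidates(known_enzyme, candidate_profiles)
for name, score in ranking:
    print(name, round(score, 2))
```
</preformat>
<p>A gene whose profile co-occurs with the known pathway enzymes across genomes ranks highest and becomes the preferred candidate for the pathway hole; in practice such a score would be combined with the other evidence types discussed above.</p>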
<p>The problem of evaluating whether a candidate enzyme catalyzes a specific reaction can also be regarded as the problem of predicting substrate-enzyme-product interactions. <xref ref-type="bibr" rid="B14">Chen et&#x20;al. (2010)</xref> developed a KNN model for predicting substrate-enzyme-product triads. To measure the nearness between two triads, they defined a novel metric that weighs the similarities between substrates, products, and enzymes, which are calculated separately. On their constructed benchmark data set, they obtained an overall accuracy of 95.41%. <xref ref-type="bibr" rid="B66">Niu et&#x20;al. (2013)</xref> also proposed a KNN-based model, combined with the mRMR-IFS (Minimum Redundancy Maximum Relevance, Incremental Feature Selection) feature selection method, to predict substrate-enzyme-product triads. To represent each triad, they encoded the substrate/product and enzyme molecules with molecular descriptors and physicochemical properties, respectively, obtaining 290 features, and then selected 160 features that can be clustered into ten categories. Tested on a data set that they generated from KEGG, the model achieved an accuracy of 89.1%. Because these methods predict the triads directly, they can be used not only to predict the missing enzymes that catalyze specific reactions but also to predict reactions or metabolites. However, a large amount of labeled data is needed to ensure good performance.</p>
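<p>The nearest-neighbor idea behind triad prediction can be sketched in Python. The sketch assumes, purely for illustration, that each substrate, enzyme, and product is described by a set of feature labels, that component similarities are Jaccard indices, and that the triad similarity is their equally weighted sum; the helper names, the weights, and the toy data are hypothetical and do not reproduce the metrics or features of the cited studies.</p>
<preformat>
```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# A triad is (substrate features, enzyme features, product features).
Triad = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def triad(s: Iterable[str], e: Iterable[str], p: Iterable[str]) -> Triad:
    return (frozenset(s), frozenset(e), frozenset(p))


def set_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard index of two feature sets (assumed similarity measure)."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def triad_similarity(t1: Triad, t2: Triad,
                     weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of the separately computed component similarities."""
    return sum(w * set_similarity(x, y) for w, x, y in zip(weights, t1, t2))


def knn_predict(query: Triad, training: List[Tuple[Triad, bool]], k: int = 3) -> bool:
    """Majority vote among the k labeled triads most similar to the query."""
    neighbors = sorted(training,
                       key=lambda item: triad_similarity(query, item[0]),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes[True] > votes[False]


# Toy labeled triads: True means the enzyme converts substrate to product.
training = [
    (triad({"hydroxyl", "ring"}, {"oxidoreductase", "nad_binding"}, {"carbonyl", "ring"}), True),
    (triad({"hydroxyl", "chain"}, {"oxidoreductase", "nad_binding"}, {"carbonyl", "chain"}), True),
    (triad({"phosphate"}, {"protease"}, {"amine"}), False),
    (triad({"ring"}, {"kinase"}, {"ring"}), False),
]

query = triad({"hydroxyl", "ring"}, {"oxidoreductase"}, {"carbonyl"})
prediction = knn_predict(query, training, k=3)
print(prediction)
```
</preformat>
<p>Computing the three component similarities separately and then weighting them keeps the contribution of substrate, enzyme, and product explicit, and the weights could be tuned on a labeled benchmark.</p>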
</sec>
</sec>
<sec id="s5">
<title>Identification of Metabolites</title>
<sec id="s5-1">
<title>Description of the Problem</title>
<p>Metabolites are small molecules that are used in, or created by, the chemical reactions occurring in every cell of living organisms. The reactants, intermediates, and products in a metabolic pathway are all called metabolites. Interpreting the biochemical characteristics of metabolites is an essential part of metabolomics and extends our knowledge of biological systems. It is also key to the development of many applications in areas such as biotechnology, biomedicine, and pharmaceuticals (<xref ref-type="bibr" rid="B65">Nguyen et&#x20;al., 2019</xref>). The identification of metabolites remains a challenging task in metabolomics, with a huge number of potentially interesting but unknown metabolites. Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) hyphenated with separation techniques such as liquid chromatography (LC), gas chromatography (GC), and capillary electrophoresis (CE) are the most frequently used techniques to collect large amounts of data on complex biological mixtures or matrices (<xref ref-type="bibr" rid="B87">Wachsmuth et&#x20;al., 2013</xref>). They typically yield complicated spectra or feature-rich chromatograms containing thousands of unknown or unidentified peaks. NMR has the disadvantage of low sensitivity and therefore requires abundant and pure samples. By contrast, MS is more sensitive and specific, and requires a smaller amount of sample (<xref ref-type="bibr" rid="B65">Nguyen et&#x20;al., 2019</xref>). Therefore, most methods for identifying metabolites are based on MS (<xref ref-type="bibr" rid="B94">Yi et&#x20;al., 2018</xref>). Even so, the identification of small molecules from MS data remains a major challenge.</p>
</sec>
<sec id="s5-2">
<title>Identification of Metabolites</title>
<p>A traditional approach to identifying metabolites is to compare a query MS or MS/MS spectrum of an unknown compound against a database of reference MS or MS/MS spectra, such as METLIN (<xref ref-type="bibr" rid="B83">Smith et&#x20;al., 2005</xref>). The candidate molecules from the database are ranked by the similarity between their spectra and the query spectrum, and the best-matching candidates are returned. Though such methods are reliable, they are only helpful for unknown metabolites that have reference spectra in the database (<xref ref-type="bibr" rid="B33">Hufsky et&#x20;al., 2014</xref>). Unfortunately, reference databases are often incomplete in practice, which leads to unreliable matching results if the reference spectrum of the targeted compound is not contained in the database (<xref ref-type="bibr" rid="B65">Nguyen et&#x20;al., 2019</xref>). To alleviate this problem, many machine learning-based approaches have been proposed to predict metabolites <italic>via</italic> learning the spectral patterns of known compounds. For example, <xref ref-type="bibr" rid="B42">Kangas et&#x20;al. (2012)</xref> developed an algorithm based on Monte Carlo simulations for identifying metabolites. The algorithm has two phases, illustrated in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>. In the first phase, an artificial neural network (ANN) predicts bond cleavage energies, from which cleavage rates can be calculated. In the second phase, it generates in&#x20;silico tandem mass spectra from molecular structures and uses these spectra for the identification. Machine learning methods roughly follow two schemes (<xref ref-type="bibr" rid="B65">Nguyen et&#x20;al., 2019</xref>).
Some methods predict molecular fingerprints from MS/MS data and then search a molecular structure database for the most similar fingerprint (<xref ref-type="bibr" rid="B22">D&#xfc;hrkop et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B9">Brouard et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B8">Brouard et&#x20;al., 2019</xref>). Other methods predict MS/MS spectra for a set of candidate molecular structures and choose the candidate whose predicted spectrum is most similar to the observed MS/MS spectrum (<xref ref-type="bibr" rid="B1">Allen et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B81">Shen et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B19">Djoumbou-Feunang et&#x20;al., 2019</xref>). These approaches have achieved good identification performance. However, they are highly sensitive and generally cannot model non-linear relationships. Deep learning architectures are known to build internal representations of large non-linear data, which may lead to superior predictive performance compared with traditional machine learning algorithms. For instance, graph convolutional neural networks can be applied directly to the graph structure of small molecules, where nodes represent atoms and edges represent the bonds between them. Moreover, different variants of graph convolutional neural networks, such as spatial and spectral graph convolutional networks, can be used to optimize predictive performance.</p>
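<p>The database-matching step that underlies the traditional approach can be sketched as follows. This is a minimal illustration, not the scoring used by METLIN or by any cited tool: the fixed-width m/z binning, the cosine score, the hypothetical compound names, and the peak lists are all assumptions.</p>
<preformat>
```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins of a fixed width."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two binned spectra stored as dicts."""
    dot = sum(value * spec_b.get(key, 0.0) for key, value in spec_a.items())
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_peaks, library):
    """Return (name, score) pairs sorted from best to worst match."""
    query = bin_spectrum(query_peaks)
    scored = [
        (name, cosine_similarity(query, bin_spectrum(peaks)))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference library: name mapped to (m/z, intensity) peaks.
library = {
    "compound_A": [(85.03, 40.0), (127.04, 100.0), (145.05, 60.0)],
    "compound_B": [(60.05, 100.0), (104.11, 80.0)],
}
query_peaks = [(85.02, 35.0), (127.05, 100.0), (145.04, 50.0)]
ranking = rank_candidates(query_peaks, library)
print(ranking[0][0])
```
</preformat>
<p>The same ranking function could score in silico spectra predicted for candidate structures instead of measured reference spectra, which is the idea behind the spectrum-prediction scheme. A query whose true compound is absent from the library still receives a best match, which illustrates why incomplete databases give unreliable results.</p>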
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Prediction of metabolites using ML techniques.</p>
</caption>
<graphic xlink:href="fmolb-08-634141-g003.tif"/>
</fig>
</sec>
</sec>
<sec id="s6">
<title>Prediction of Reactions</title>
<sec id="s6-1">
<title>Description of the Problem</title>
<p>With the great developments in metabolomics and synthetic biology, a large amount of data related to metabolic pathways has been generated and organized in several databases, such as KEGG (<xref ref-type="bibr" rid="B70">Okuda et&#x20;al., 2008</xref>), BioCyc (<xref ref-type="bibr" rid="B43">Karp et&#x20;al., 2019</xref>), and MetaCyc (<xref ref-type="bibr" rid="B45">Karp, 2002a</xref>; <xref ref-type="bibr" rid="B13">Caspi, 2006</xref>). On the other hand, it is assumed that a large number of metabolic pathways remain unknown, and many reactions are still missing even in known pathways. Moreover, there is an increasing number of compounds that are known to be present in living organisms but whose synthetic/degradation pathways are unknown. The absence of one or more reactions may leave the pathway from an initial compound to the desired target in an organism incomplete. Therefore, it is necessary to identify such missing reactions during the reconstruction of metabolic pathways. In the field of biosynthesis, finding a potential connection between two known pathways by introducing a novel reaction may lead to a new pathway to the desired product.</p>
</sec>
<sec id="s6-2">
<title>Prediction of Reactions</title>
<p>Reaction prediction remains a challenging task in the investigation of metabolic pathways because resonance structures and specific products can be redundant and problematic. However, recent machine learning developments have alleviated this problem, resulting in improved performance (<xref ref-type="bibr" rid="B16">Cuperlovic-Culf, 2018</xref>). Depending on whether single compounds or pairs of compounds are used in modeling, there are two kinds of approaches to reaction prediction: those focusing on compounds (<xref ref-type="bibr" rid="B50">Kotera et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B90">Wei et&#x20;al., 2016</xref>) and those focusing on compound pairs (<xref ref-type="bibr" rid="B63">Mu et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B51">Kotera et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B26">Fooshee et&#x20;al., 2018</xref>).</p>
<p>The compound-focused methods identify products or precursors for given compounds and then generate plausible reactions. For example, <xref ref-type="bibr" rid="B50">Kotera et&#x20;al. (2008)</xref> presented a substructure-based approach to identify possible products and/or precursors for a given compound and to generate a plausible reaction. Using RF methods, they searched for compounds that were structurally related to the target compound, and then checked the structural differences to determine which of these compounds has the potential to be a product (or precursor) of the target compound in an enzyme-catalyzed reaction. <xref ref-type="bibr" rid="B90">Wei et&#x20;al. (2016)</xref> followed a similar approach. Given a set of reagents and reactants, they first built a neural network to predict the reaction type based on a reaction fingerprinting method, and then used SMARTS (SMiles ARbitrary Target Specification) transformations to predict the likely product from the reactants. The neural network workflow starts with reactant and reagent molecules and enumerates all possible electron sources and sinks within the input molecules, based on atom and bond descriptors, as shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>. The fingerprinting approach is based on specific patterns in the molecules: the whole molecular structure is searched to detect the presence or absence of each pattern. The fingerprints of the reactants and reagents are concatenated and become the input for the neural network, which predicts possible reaction&#x20;types.</p>
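<p>The presence/absence fingerprinting and the concatenation of reactant and reagent fingerprints can be sketched as follows. This is a deliberately crude illustration and not the fingerprinting method of the cited work: plain substring tests on SMILES text stand in for real substructure (SMARTS) matching, which requires a cheminformatics toolkit, and the pattern list, the bitwise-OR pooling within each group, and all function names are assumptions.</p>
<preformat>
```python
# Hypothetical pattern list. Substring tests on SMILES text are only a
# stand-in for true substructure matching and will miss many cases.
PATTERNS = ["O", "N", "(=O)", "Br", "Cl"]


def fingerprint(smiles, patterns=PATTERNS):
    """Binary vector: 1 if the pattern occurs in the SMILES string, else 0."""
    return [1 if pattern in smiles else 0 for pattern in patterns]


def combined_fingerprint(molecules, patterns=PATTERNS):
    """Bitwise OR of the fingerprints of several molecules (an assumption)."""
    bits = [0] * len(patterns)
    for smiles in molecules:
        bits = [max(a, b) for a, b in zip(bits, fingerprint(smiles, patterns))]
    return bits


def reaction_input(reactants, reagents, patterns=PATTERNS):
    """Concatenate reactant and reagent fingerprints into one input vector."""
    return (combined_fingerprint(reactants, patterns)
            + combined_fingerprint(reagents, patterns))


reactants = ["CC(=O)O", "CCO"]  # acetic acid, ethanol
reagents = ["OS(=O)(=O)O"]      # sulfuric acid
vector = reaction_input(reactants, reagents)
print(vector)
```
</preformat>
<p>A vector of this kind would then be passed to a classifier that outputs a reaction type. The classifier itself is omitted here because its architecture is not specified in the text above.</p>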
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Schematic illustration of a deep neural network method for the prediction of a reaction.</p>
</caption>
<graphic xlink:href="fmolb-08-634141-g004.tif"/>
</fig>
<p>The compound pair-focused methods aim to predict whether a given compound-compound pair is possibly reactive or not. For instance, <xref ref-type="bibr" rid="B63">Mu et&#x20;al. (2011)</xref> built SVM classifiers to discriminate between functional groups that are reactive and non-reactive. To train the classifiers, they collected positive and negative examples from the KEGG database for each SMARTS-defined substructure, and used atomic properties of atoms in putative reaction centers and molecular properties as features. <xref ref-type="bibr" rid="B51">Kotera et&#x20;al. (2013)</xref> applied a sparsity-induced classifier and SVM to learn whether the two compounds of a pair can be converted into each other by enzymatic reactions. To represent the samples, they used chemical fingerprints to define feature vectors representing the chemical transformation patterns of compound-compound pairs in enzymatic reactions. Recently, <xref ref-type="bibr" rid="B26">Fooshee et&#x20;al. (2018)</xref> presented a deep learning-based reaction prediction method that operates at the level of elementary reactions. Each elementary step involves the movement of electrons from an electron source to an electron sink, and elementary reactions can be chained together to yield the complex global reaction.</p>
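<p>One way to turn two chemical fingerprints into a feature vector for a compound-compound pair is sketched below. The encoding into shared, lost, and gained bits is an assumption chosen for illustration and is not claimed to be the representation used in the cited studies. The function name <monospace>pair_features</monospace> and the toy fingerprints are hypothetical.</p>
<preformat>
```python
def pair_features(fp_a, fp_b):
    """Describe a compound pair by bits shared, lost, and gained from a to b."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    shared = [int(a == 1 and b == 1) for a, b in zip(fp_a, fp_b)]
    lost = [int(a == 1 and b == 0) for a, b in zip(fp_a, fp_b)]
    gained = [int(a == 0 and b == 1) for a, b in zip(fp_a, fp_b)]
    return shared + lost + gained


# Toy binary fingerprints (hypothetical) for a candidate substrate and product.
substrate_fp = [1, 1, 0, 0]
product_fp = [1, 0, 1, 0]
features = pair_features(substrate_fp, product_fp)
print(features)
```
</preformat>
<p>Feature vectors of this form, labeled as reactive or non-reactive pairs, could serve as training input for a binary classifier such as an SVM. Note that this encoding is direction-dependent: swapping the two compounds exchanges the lost and gained blocks.</p>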
</sec>
</sec>
<sec sec-type="conclusion" id="s7">
<title>Conclusion</title>
<p>The prediction and construction of synthetic metabolic pathways is a significant challenge in bioinformatics. Machine learning techniques play important roles in constructing and understanding metabolic pathways and their subparts. This mini review provides an outline of the applications of machine learning approaches to the prediction and reconstruction of metabolic pathways. Some related issues that need to be addressed were also discussed. Moreover, machine learning-based methods for the identification of missing enzymes, metabolites, and reactions were introduced. This review complements existing reviews and provides a more comprehensive view of machine learning algorithms for the prediction and reconstruction of metabolic pathways.</p>
</sec>
</body>
<back>
<sec id="s8">
<title>Author Contributions</title>
<p>JL proposed the ideas. HAS wrote the manuscript. JL, HAS, ZY, and JF discussed the outline of the manuscript. JL and HAS revised the manuscript.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>This work was funded by the National Key R&#x26;D Program of China (No. 2019YFA0904303), the Major Projects of Technological Innovation in Hubei Province (2019AEA170), and the Frontier Projects of Wuhan for Application Foundation (2019010701011381). The National Key R&#x26;D Program of China (No. 2019YFA0904303) paid the open access publication&#x20;fees.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Allen</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Pon</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Greiner</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wishart</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>CFM-ID: A Web Server for Annotation, Spectrum Prediction and Metabolite Identification from Tandem Mass Spectra</article-title>. <source>Nucleic Acids Res.</source> <volume>42</volume>, <fpage>W94</fpage>&#x2013;<lpage>W99</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku436</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amidi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Amidi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vlachakis</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Paragios</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Zacharaki</surname>
<given-names>E. I.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Automatic Single- and Multi-Label Enzymatic Function Prediction by Machine Learning</article-title>. <source>PeerJ</source> <volume>5</volume> (<issue>3</issue>), <fpage>e3095</fpage>&#x2013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.7717/peerj.3095</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arabzadeh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Saheb Zamani</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sedighi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Marashi</surname>
<given-names>S.-A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A Graph-Based Approach to Analyze Flux-Balanced Pathways in Metabolic Networks</article-title>. <source>BioSystems</source> <volume>165</volume>, <fpage>40</fpage>&#x2013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1016/j.biosystems.2017.12.001</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aziz</surname>
<given-names>R. K.</given-names>
</name>
<name>
<surname>Bartels</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Best</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>DeJongh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Disz</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>R. A.</given-names>
</name>
<etal/>
</person-group> (<year>2008</year>). <article-title>The RAST Server: Rapid Annotations Using Subsystems Technology</article-title>. <source>BMC Genomics</source> <volume>9</volume>, <fpage>75</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2164-9-75</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bagheri</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Marashi</surname>
<given-names>S.-A.</given-names>
</name>
<name>
<surname>Amoozegar</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Genome-Scale Metabolic Network Reconstruction of Extremely Halophilic Bacterium Salinibacter Ruber</article-title>. <source>PLoS One</source> <volume>14</volume> (<issue>5</issue>), <fpage>e0216336</fpage>&#x2013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0216336</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baranwal</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Magner</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Elvati</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Saldinger</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Violi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hero</surname>
<given-names>A. O.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A Deep Learning Architecture for Metabolic Pathway Prediction</article-title>. <source>Bioinformatics</source> <volume>36</volume> (<issue>8</issue>), <fpage>2547</fpage>&#x2013;<lpage>2553</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz954</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Besemer</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>GeneMarkS: a Self-Training Method for Prediction of Gene Starts in Microbial Genomes. Implications for Finding Sequence Motifs in Regulatory Regions</article-title>. <source>Nucleic Acids Res.</source> <volume>29</volume> (<issue>12</issue>), <fpage>2607</fpage>&#x2013;<lpage>2618</lpage>. <pub-id pub-id-type="doi">10.1093/nar/29.12.2607</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brouard</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bass&#xe9;</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>d&#x2019;Alch&#xe9;-Buc</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rousu</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models</article-title>. <source>Metabolites</source> <volume>9</volume>, <fpage>160</fpage>. <pub-id pub-id-type="doi">10.3390/metabo9080160</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brouard</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>D&#xfc;hrkop</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>d&#x27;Alch&#xe9;-Buc</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>B&#xf6;cker</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Rousu</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Fast Metabolite Identification with Input Output Kernel Regression</article-title>. <source>Bioinformatics</source> <volume>32</volume>, <fpage>i28</fpage>&#x2013;<lpage>i36</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btw246</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bustamam</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tasman</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yuniarti</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Frisca</surname>
<given-names>
</given-names>
</name>
<name>
<surname>Mursidah</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Application of K-Means Clustering Algorithm in Grouping the DNA Sequences of Hepatitis B Virus (HBV)</article-title>. <source>AIP Conf. Proc.</source> <volume>1862</volume>, <fpage>030134</fpage>. <pub-id pub-id-type="doi">10.1063/1.4991238</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Caspi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Billington</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Fulcher</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Keseler</surname>
<given-names>I. M.</given-names>
</name>
<name>
<surname>Kothari</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Krummenacker</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>The MetaCyc Database of Metabolic Pathways and Enzymes</article-title>. <source>Nucleic Acids Res.</source> <volume>46</volume> (<issue>D1</issue>), <fpage>D633</fpage>&#x2013;<lpage>D639</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx935</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Caspi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Foerster</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Fulcher</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Kaipa</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Krummenacker</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Latendresse</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2008</year>). <article-title>The MetaCyc Database of Metabolic Pathways and Enzymes and the BioCyc Collection of Pathway/genome Databases</article-title>. <source>Nucleic Acids Res.</source> <volume>36</volume>, <fpage>D623</fpage>&#x2013;<lpage>D631</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkm900</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Caspi</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>MetaCyc: a Multiorganism Database of Metabolic Pathways and Enzymes</article-title>. <source>Nucleic Acids Res.</source> <volume>34</volume>, <fpage>D511</fpage>&#x2013;<lpage>D516</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkj128</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>K.-Y.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>Y.-D.</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>K.-C.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.-P.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Predicting the Network of Substrate-Enzyme-Product Triads by Combining Compound Similarity and Functional Domain Composition</article-title>. <source>BMC Bioinformatics</source> <volume>11</volume>, <fpage>293</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-293</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Connor</surname>
<given-names>W. C.</given-names>
</name>
<name>
<surname>Rogers</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>W. H.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Computer-Assisted Retrosynthesis Based on Molecular Similarity</article-title>. <source>ACS Cent. Sci.</source> <volume>3</volume>, <fpage>1237</fpage>&#x2013;<lpage>1245</lpage>. </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cuperlovic-Culf</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling</article-title>. <source>Metabolites</source> <volume>8</volume>, <fpage>4</fpage>. <pub-id pub-id-type="doi">10.3390/metabo8010004</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Del&#xe9;pine</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Duigou</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Carbonell</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Faulon</surname>
<given-names>J.-L.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>RetroPath2.0: A Retrosynthesis Workflow for Metabolic Engineers</article-title>. <source>Metab. Eng.</source> <volume>45</volume>, <fpage>158</fpage>&#x2013;<lpage>170</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymben.2017.12.002</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Djoumbou-Feunang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pon</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Karu</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Arndt</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>CFM-ID 3.0: Significantly Improved ESI-MS/MS Prediction and Compound Identification</article-title>. <source>Metabolites</source> <volume>9</volume>, <fpage>72</fpage>. <pub-id pub-id-type="doi">10.3390/metabo9040072</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Du</surname>
<given-names>P.-F.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Miao</surname>
<given-names>Y.-Y.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>L.-Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences</article-title>. <source>Int. J. Mol. Sci.</source> <volume>18</volume>, <fpage>2400</fpage>&#x2013;<lpage>2411</lpage>. <pub-id pub-id-type="doi">10.3390/ijms18112400</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dug&#xe9; de Bernonville</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Papon</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Clastre</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>O&#x27;Connor</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>Courdavault</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Identifying Missing Biosynthesis Enzymes of Plant Natural Products</article-title>. <source>Trends Pharmacol. Sci.</source> <volume>41</volume> (<issue>3</issue>), <fpage>142</fpage>&#x2013;<lpage>146</lpage>. <pub-id pub-id-type="doi">10.1016/j.tips.2019.12.006</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>D&#xfc;hrkop</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Meusel</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rousu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>B&#xf6;cker</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID</article-title>. <source>Proc. Natl. Acad. Sci. USA</source> <volume>112</volume>, <fpage>12580</fpage>&#x2013;<lpage>12585</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1509788112</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ebenh&#xf6;h</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Heinrich</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Evolutionary Optimization of Metabolic Pathways. Theoretical Reconstruction of the Stoichiometry of ATP and NADH Producing Systems</article-title>. <source>Bull. Math. Biol.</source> <volume>63</volume> (<issue>1</issue>), <fpage>21</fpage>&#x2013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1006/bulm.2000.0197</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Faust</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Croes</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>van Helden</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Prediction of Metabolic Pathways from Genome-Scale Metabolic Networks</article-title>. <source>BioSystems</source> <volume>105</volume> (<issue>2</issue>), <fpage>109</fpage>&#x2013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.1016/j.biosystems.2011.05.004</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ferrari</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Mitchell</surname>
<given-names>J.&#x20;B. O.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>From Sequence to Enzyme Mechanism Using Multi-Label Machine Learning</article-title>. <source>BMC Bioinformatics</source> <volume>15</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-15-150</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fooshee</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Mood</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gutman</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Tavakoli</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Urban</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Deep Learning for Chemical Reaction Prediction</article-title>. <source>Mol. Syst. Des. Eng.</source> <volume>3</volume> (<issue>3</issue>), <fpage>442</fpage>&#x2013;<lpage>452</lpage>. <pub-id pub-id-type="doi">10.1039/c7me00107j</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedman</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Inferring Cellular Networks Using Probabilistic Graphical Models</article-title>. <source>Science</source> <volume>303</volume>, <fpage>799</fpage>&#x2013;<lpage>805</lpage>. <pub-id pub-id-type="doi">10.1126/science.1094068</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>C. F.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X. Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Feature Extraction Method for Proteins Based on Markov Tripeptide by Compressive Sensing</article-title>. <source>BMC Bioinformatics</source> <volume>19</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1186/s12859-018-2235-x</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gerlee</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Lizana</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sneppen</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Pathway Identification by Network Pruning in the Metabolic Network of <italic>Escherichia coli</italic></article-title>. <source>Bioinformatics</source> <volume>25</volume> (<issue>24</issue>), <fpage>3282</fpage>&#x2013;<lpage>3288</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp575</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Green</surname>
<given-names>M. L.</given-names>
</name>
<name>
<surname>Karp</surname>
<given-names>P. D.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>A Bayesian Method for Identifying Missing Enzymes in Predicted Metabolic Pathway Databases</article-title>. <source>BMC Bioinformatics</source> <volume>5</volume>, <fpage>76</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-5-76</pub-id> </citation>
</ref>
<ref id="B100">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gurkun</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2012</year>). &#x201c;<article-title>Identifying Gene Interaction Networks</article-title>,&#x201d; in <source>Statistical Human Genetics: Methods and Protocols</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Robert</surname>
<given-names>C. E.</given-names>
</name>
<name>
<surname>Jaya</surname>
<given-names>M. S.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>S.</given-names>
</name>
</person-group> (<publisher-loc>Totowa, NJ</publisher-loc>: <publisher-name>Humana Press</publisher-name>), <fpage>483</fpage>&#x2013;<lpage>494</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-61779-555-8_26</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Halperin</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Glazer</surname>
<given-names>D. S.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>R. B.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>The FEATURE Framework for Protein Function Annotation: Modeling New Functions, Improving Performance, and Extending to Novel Applications</article-title>. <source>BMC Genomics</source> <volume>9</volume> (<issue>2</issue>), <fpage>S2</fpage>&#x2013;<lpage>S14</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2164-9-S2-S2</pub-id> </citation>
</ref>
<ref id="B99">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Herrg&#xe5;rd</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Swainston</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Dobson</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Dunn</surname>
<given-names>W. B.</given-names>
</name>
<name>
<surname>Arga</surname>
<given-names>K. Y.</given-names>
</name>
<name>
<surname>Arvas</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2008</year>). <article-title>A Consensus Yeast Metabolic Network Reconstruction Obtained from a Community Approach to Systems Biology</article-title>. <source>Nat. Biotechnol.</source> <volume>26</volume> (<issue>10</issue>), <fpage>1155</fpage>&#x2013;<lpage>1160</lpage>. <pub-id pub-id-type="doi">10.1038/nbt1492</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hufsky</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Scheubert</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>B&#xf6;cker</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Computational Mass Spectrometry for Small-Molecule Fragmentation</article-title>. <source>TrAC Trends Anal. Chem.</source> <volume>53</volume>, <fpage>41</fpage>&#x2013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.1016/j.trac.2013.09.008</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Iqbal</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Faye</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Samir</surname>
<given-names>B. B.</given-names>
</name>
<name>
<surname>Md Said</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics</article-title>. <source>Scientific World J.</source> <volume>2014</volume>, <fpage>1</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1155/2014/173869</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jansen</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Emili</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kluger</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Greenbaum</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Chung</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2003</year>). <article-title>A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data</article-title>. <source>Science</source> <volume>302</volume>, <fpage>449</fpage>&#x2013;<lpage>453</lpage>. <pub-id pub-id-type="doi">10.1126/science.1087361</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Jeanne</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Tebbani</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Goelzer</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fromion</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Dumur</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Modelling and Optimization of Metabolic Pathways in Bacteria</article-title>,&#x201d; in <conf-name>Int. Conf. Syst. Theory, Control Comput. ICSTCC 2016 - Jt. Conf. SINTES 20</conf-name>, <conf-loc>Sinaia, Romania</conf-loc>, <conf-date>13&#x2013;15 Oct. 2016</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>312</fpage>&#x2013;<lpage>317</lpage>. <pub-id pub-id-type="doi">10.1109/ICSTCC.2016.7790684</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jeske</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Placzek</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schomburg</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Schomburg</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>BRENDA in 2019: A European ELIXIR Core Data Resource</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume> (<issue>D1</issue>), <fpage>D542</fpage>&#x2013;<lpage>D549</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky1048</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jia</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Similarity-Based Machine Learning Model for Predicting the Metabolic Pathways of Compounds</article-title>. <source>IEEE Access</source> <volume>8</volume>, <fpage>130687</fpage>&#x2013;<lpage>130696</lpage>. <pub-id pub-id-type="doi">10.1109/access.2020.3009439</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>K&#xfc;ken</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Nikoloski</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Computational Approaches to Design and Test Plant Synthetic Metabolic Pathways</article-title>. <source>Plant Physiol.</source> <volume>179</volume> (<issue>3</issue>), <fpage>894</fpage>&#x2013;<lpage>906</lpage>. <pub-id pub-id-type="doi">10.1104/pp.18.01273</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Furumichi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Morishima</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Tanabe</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>New Approach for Understanding Genome Variations in KEGG</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume> (<issue>D1</issue>), <fpage>D590</fpage>&#x2013;<lpage>D595</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky962</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Morishima</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences</article-title>. <source>J.&#x20;Mol. Biol.</source> <volume>428</volume> (<issue>4</issue>), <fpage>726</fpage>&#x2013;<lpage>731</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmb.2015.11.006</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kangas</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Metz</surname>
<given-names>T. O.</given-names>
</name>
<name>
<surname>Isaac</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Schrom</surname>
<given-names>B. T.</given-names>
</name>
<name>
<surname>Ginovska-Pangovska</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>In Silico Identification Software (ISIS): A Machine Learning Approach to Tandem Mass Spectral Identification of Lipids</article-title>. <source>Bioinformatics</source> <volume>28</volume> (<issue>13</issue>), <fpage>1705</fpage>&#x2013;<lpage>1713</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts194</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karp</surname>
<given-names>P. D.</given-names>
</name>
<name>
<surname>Billington</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Caspi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Fulcher</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Latendresse</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kothari</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>The BioCyc Collection of Microbial Genomes and Metabolic Pathways</article-title>. <source>Brief. Bioinform.</source> <volume>20</volume> (<issue>4</issue>), <fpage>1085</fpage>&#x2013;<lpage>1093</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbx085</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karp</surname>
<given-names>P. D.</given-names>
</name>
<name>
<surname>Krummenacker</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Paley</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wagg</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>Integrated Pathway-Genome Databases and Their Role in Drug Discovery</article-title>. <source>Trends Biotechnol.</source> <volume>17</volume> (<issue>7</issue>), <fpage>275</fpage>&#x2013;<lpage>281</lpage>. <pub-id pub-id-type="doi">10.1016/s0167-7799(99)01316-5</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karp</surname>
<given-names>P. D.</given-names>
</name>
</person-group> (<year>2002a</year>). <article-title>The EcoCyc Database</article-title>. <source>Nucleic Acids Res.</source> <volume>30</volume> (<issue>1</issue>), <fpage>56</fpage>&#x2013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.1093/nar/30.1.56</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karp</surname>
<given-names>P. D.</given-names>
</name>
</person-group> (<year>2002b</year>). <article-title>The MetaCyc Database</article-title>. <source>Nucleic Acids Res.</source> <volume>30</volume> (<issue>1</issue>), <fpage>59</fpage>&#x2013;<lpage>61</lpage>. <pub-id pub-id-type="doi">10.1093/nar/30.1.59</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kharchenko</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Freund</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Vitkup</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>G. M.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Identifying Metabolic Enzymes with Multiple Types of Association Evidence</article-title>. <source>BMC Bioinformatics</source> <volume>7</volume>, <fpage>177</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-7-177</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>G. B.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>W. J.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>H. U.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S. Y.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Machine Learning Applications in Systems Metabolic Engineering</article-title>. <source>Curr. Opin. Biotechnol.</source> <volume>64</volume>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.copbio.2019.08.010</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koch</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Duigou</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Faulon</surname>
<given-names>J.-L.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Reinforcement Learning for Bioretrosynthesis</article-title>. <source>ACS Synth. Biol.</source> <volume>9</volume>, <fpage>157</fpage>&#x2013;<lpage>168</lpage>. <pub-id pub-id-type="doi">10.1021/acssynbio.9b00447</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kotera</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>McDonald</surname>
<given-names>A. G.</given-names>
</name>
<name>
<surname>Boyce</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tipton</surname>
<given-names>K. F.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Eliciting Possible Reaction Equations and Metabolic Pathways Involving Orphan Metabolites</article-title>. <source>J.&#x20;Chem. Inf. Model.</source> <volume>48</volume> (<issue>12</issue>), <fpage>2335</fpage>&#x2013;<lpage>2349</lpage>. <pub-id pub-id-type="doi">10.1021/ci800213g</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kotera</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tabei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yamanishi</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tokimatsu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Supervised De Novo Reconstruction of Metabolic Pathways from Metabolome-Scale Compound Sets</article-title>. <source>Bioinformatics</source> <volume>29</volume> (<issue>13</issue>), <fpage>i135</fpage>&#x2013;<lpage>i144</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btt244</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koza</surname>
<given-names>J.&#x20;R.</given-names>
</name>
<name>
<surname>Mydlowec</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Lanza</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Keane</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Reverse Engineering of Metabolic Pathways from Observed Data Using Genetic Programming</article-title>. <source>Pac. Symp. Biocomput.</source> <volume>2001</volume>, <fpage>434</fpage>&#x2013;<lpage>445</lpage>. <pub-id pub-id-type="doi">10.1142/9789814447362_0043</pub-id> </citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>G.-M.</given-names>
</name>
<name>
<surname>Warden-Rothman</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Voigt</surname>
<given-names>C. A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Retrosynthetic Design of Metabolic Pathways to Chemicals Not Found in Nature</article-title>. <source>Curr. Opin. Syst. Biol.</source> <volume>14</volume>, <fpage>82</fpage>&#x2013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1016/j.coisb.2019.04.004</pub-id> </citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>K.-C.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Pse-in-One: A Web Server for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences</article-title>. <source>Nucleic Acids Res.</source> <volume>43</volume>, <fpage>W65</fpage>&#x2013;<lpage>W71</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv458</pub-id> </citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>K.-C.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Pse-Analysis: A Python Package for DNA/RNA and Protein/Peptide Sequence Analysis Based on Pseudo Components and Kernel Methods</article-title>. <source>Oncotarget</source> <volume>8</volume> (<issue>8</issue>), <fpage>13338</fpage>&#x2013;<lpage>13343</lpage>. <pub-id pub-id-type="doi">10.18632/oncotarget.14524</pub-id> </citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lombardot</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Morgat</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Axelsen</surname>
<given-names>K. B.</given-names>
</name>
<name>
<surname>Aimo</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hyka-Nouspikel</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Niknejad</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Updates in Rhea: SPARQLing Biochemical Reaction Data</article-title>. <source>Nucleic Acids Res.</source> <volume>47</volume>, <fpage>D596</fpage>&#x2013;<lpage>D600</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gky876</pub-id> </citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mascher</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schreiber</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Scholz</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Graner</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Reif</surname>
<given-names>J.&#x20;C.</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Genebank Genomics Bridges the Gap between the Conservation of Crop Diversity and Plant Breeding</article-title>. <source>Nat. Genet.</source> <volume>51</volume>, <fpage>1076</fpage>&#x2013;<lpage>1081</lpage>. <pub-id pub-id-type="doi">10.1038/s41588-019-0443-6</pub-id> </citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moriya</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Itoh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Okuda</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yoshizawa</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>KAAS: An Automatic Genome Annotation and Pathway Reconstruction Server</article-title>. <source>Nucleic Acids Res.</source> <volume>35</volume>, <fpage>W182</fpage>&#x2013;<lpage>W185</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkm321</pub-id> </citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Unkefer</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Unkefer</surname>
<given-names>P. J.</given-names>
</name>
<name>
<surname>Hlavacek</surname>
<given-names>W. S.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Prediction of Metabolic Reactions Based on Atomic and Molecular Properties of Small-Molecule Compounds</article-title>. <source>Bioinformatics</source> <volume>27</volume> (<issue>11</issue>), <fpage>1537</fpage>&#x2013;<lpage>1545</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btr177</pub-id> </citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nagao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Nagano</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Mizuguchi</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Prediction of Detailed Enzyme Functions and Identification of Specificity Determining Residues by Random Forests</article-title>. <source>PLoS One</source> <volume>9</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0084623</pub-id> </citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nguyen</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>C. H.</given-names>
</name>
<name>
<surname>Mamitsuka</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Recent Advances and Prospects of Computational Methods for Metabolite Identification: A Review with Emphasis on Machine Learning Approaches</article-title>. <source>Brief. Bioinform.</source> <volume>20</volume> (<issue>6</issue>), <fpage>2028</fpage>&#x2013;<lpage>2043</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bby066</pub-id> </citation>
</ref>
<ref id="B66">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Niu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Prediction of Substrate-Enzyme-Product Interaction Based on Molecular Descriptors and Physicochemical Properties</article-title>. <source>Biomed. Res. Int.</source> <volume>2013</volume>, <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1155/2013/674215</pub-id> </citation>
</ref>
<ref id="B67">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>&#x160;kunca</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Dessimoz</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Phylogenetic Profiling: How Much Input Data Is Enough?</article-title> <source>PLoS One</source> <volume>10</volume>, <fpage>e0114701</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0114701</pub-id> </citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ogata</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fujibuchi</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Computation with the KEGG Pathway Database</article-title>. <source>BioSystems</source> <volume>47</volume> (<issue>1</issue>), <fpage>119</fpage>&#x2013;<lpage>128</lpage>. <pub-id pub-id-type="doi">10.1016/S0303-2647(98)00017-3</pub-id> </citation>
</ref>
<ref id="B69">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ogata</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Fujibuchi</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Bono</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>KEGG: Kyoto Encyclopedia of Genes and Genomes</article-title>. <source>Nucleic Acids Res.</source> <volume>27</volume> (<issue>1</issue>), <fpage>29</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1093/nar/27.1.29</pub-id> </citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Okuda</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yamada</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Hamajima</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Itoh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Katayama</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P.</given-names>
</name>
<etal/>
</person-group> (<year>2008</year>). <article-title>KEGG Atlas Mapping for Global Analysis of Metabolic Pathways</article-title>. <source>Nucleic Acids Res.</source> <volume>36</volume>, <fpage>W423</fpage>&#x2013;<lpage>W426</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkn282</pub-id> </citation>
</ref>
<ref id="B71">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Overbeek</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>WIT: Integrated System for High-Throughput Genome Sequence Analysis and Metabolic Reconstruction</article-title>. <source>Nucleic Acids Res.</source> <volume>28</volume> (<issue>1</issue>), <fpage>123</fpage>&#x2013;<lpage>125</lpage>. <pub-id pub-id-type="doi">10.1093/nar/28.1.123</pub-id> </citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paley</surname>
<given-names>S. M.</given-names>
</name>
<name>
<surname>Karp</surname>
<given-names>P. D.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Predictions for <italic>Helicobacter pylori</italic></article-title>. <source>Bioinformatics</source> <volume>18</volume> (<issue>5</issue>), <fpage>715</fpage>&#x2013;<lpage>724</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/18.5.715</pub-id> </citation>
</ref>
<ref id="B73">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Planes</surname>
<given-names>F. J.</given-names>
</name>
<name>
<surname>Beasley</surname>
<given-names>J.&#x20;E.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>An Optimization Model for Metabolic Pathways</article-title>. <source>Bioinformatics</source> <volume>25</volume> (<issue>20</issue>), <fpage>2723</fpage>&#x2013;<lpage>2729</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp441</pub-id> </citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Planson</surname>
<given-names>A. G.</given-names>
</name>
<name>
<surname>Carbonell</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Grigoras</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Faulon</surname>
<given-names>J.&#x20;L.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A Retrosynthetic Biology Approach to Therapeutics: From Conception to Delivery</article-title>. <source>Curr. Opin. Biotechnol.</source> <volume>23</volume>, <fpage>948</fpage>&#x2013;<lpage>956</lpage>. <pub-id pub-id-type="doi">10.1016/j.copbio.2012.03.009</pub-id> </citation>
</ref>
<ref id="B75">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qi</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Reconstruction of Metabolic Pathways by Combining Probabilistic Graphical Model-Based and Knowledge-Based Methods</article-title>. <source>BMC Proc.</source> <volume>8</volume> (<issue>6</issue>), <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1186/1753-6561-8-S6-S5</pub-id> </citation>
</ref>
<ref id="B76">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roche-Lima</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Implementation and Comparison of Kernel-Based Learning Methods to Predict Metabolic Networks</article-title>. <source>Netw. Model. Anal. Health Inform. Bioinform.</source> <volume>5</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1007/s13721-016-0134-5</pub-id> </citation>
</ref>
<ref id="B77">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Date</surname>
<given-names>S. V.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>The Rosetta Stone Method</article-title>. <source>Methods Mol. Biol.</source> <volume>453</volume>, <fpage>169</fpage>&#x2013;<lpage>180</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-60327-429-6_7</pub-id> </citation>
</ref>
<ref id="B78">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schmidt</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Vallabhajosyula</surname>
<given-names>R. R.</given-names>
</name>
<name>
<surname>Jenkins</surname>
<given-names>J.&#x20;W.</given-names>
</name>
<name>
<surname>Hood</surname>
<given-names>J.&#x20;E.</given-names>
</name>
<name>
<surname>Soni</surname>
<given-names>A. S.</given-names>
</name>
<name>
<surname>Wikswo</surname>
<given-names>J.&#x20;P.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Automated Refinement and Inference of Analytical Models for Metabolic Networks</article-title>. <source>Phys. Biol.</source> <volume>8</volume> (<issue>5</issue>), <fpage>055011</fpage>. <pub-id pub-id-type="doi">10.1088/1478-3975/8/5/055011</pub-id> </citation>
</ref>
<ref id="B79">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schomburg</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>BRENDA, Enzyme Data and Metabolic Information</article-title>. <source>Nucleic Acids Res.</source> <volume>30</volume> (<issue>1</issue>), <fpage>47</fpage>&#x2013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1093/nar/30.1.47</pub-id> </citation>
</ref>
<ref id="B80">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ali</surname>
<given-names>H. H.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Analysis of Clustering Algorithms in Biological Networks</article-title>. <source>Proc. - 2017 IEEE Int. Conf. Bioinforma. Biomed. BIBM</source> <volume>2017</volume>, <fpage>2303</fpage>&#x2013;<lpage>2305</lpage>. <pub-id pub-id-type="doi">10.1109/BIBM.2017.8218036</pub-id> </citation>
</ref>
<ref id="B81">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>D&#xfc;hrkop</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>B&#xf6;cker</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Rousu</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Metabolite Identification through Multiple Kernel Learning on Fragmentation Trees</article-title>. <source>Bioinformatics</source> <volume>30</volume> (<issue>12</issue>), <fpage>i157</fpage>&#x2013;<lpage>i164</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu275</pub-id> </citation>
</ref>
<ref id="B82">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sithambranathan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kasim</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hassan</surname>
<given-names>M. Z.</given-names>
</name>
<name>
<surname>Syafi</surname>
<given-names>N. A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Clustering of Genes Skin&#x27;s Cancer</article-title>. <source>Intelligence Comput.</source> <volume>1</volume>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.18517/ijods.1.1.51-56.2020</pub-id> </citation>
</ref>
<ref id="B83">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>O&#x2019;Maille</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Want</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Trauger</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Brandon</surname>
<given-names>T. R.</given-names>
</name>
<etal/>
</person-group> (<year>2005</year>). <article-title>METLIN: A Metabolite Mass Spectral Database</article-title>. <source>Ther. Drug Monit.</source> <volume>27</volume> (<issue>6</issue>), <fpage>747</fpage>&#x2013;<lpage>751</lpage>. <pub-id pub-id-type="doi">10.1097/01.ftd.0000179845.53213.39</pub-id> </citation>
</ref>
<ref id="B84">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Teng</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Srivastava</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Sequence Feature-Based Prediction of Protein Stability Changes upon Amino Acid Substitutions</article-title>. <source>BMC Genomics</source> <volume>11</volume> (<issue>2</issue>), <fpage>S5</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2164-11-S2-S5</pub-id> </citation>
</ref>
<ref id="B86">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Viswanathan</surname>
<given-names>G. A.</given-names>
</name>
<name>
<surname>Seto</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Patil</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nudelman</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Sealfon</surname>
<given-names>S. C.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Getting Started in Biological Pathway Construction and Analysis</article-title>. <source>PLoS Comput. Biol.</source> <volume>4</volume> (<issue>2</issue>), <fpage>e16</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.0040016</pub-id> </citation>
</ref>
<ref id="B87">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wachsmuth</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Vogl</surname>
<given-names>F. C.</given-names>
</name>
<name>
<surname>Oefner</surname>
<given-names>P. J.</given-names>
</name>
<name>
<surname>Dettmer</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Gas Chromatographic Techniques in Metabolomics</article-title>. <source>RSC Chromatogr. Monogr. Chromatogr. Methods Metabolomics</source>. <fpage>87</fpage>&#x2013;<lpage>113</lpage>. <pub-id pub-id-type="doi">10.1039/9781849737272-00087</pub-id> </citation>
</ref>
<ref id="B88">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>P. F.</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>X. Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G. P.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>Y. K.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>VisFeature: A Stand-Alone Program for Visualizing and Analyzing Statistical Features of Biological Sequences</article-title>. <source>Bioinformatics</source> <volume>36</volume> (<issue>4</issue>), <fpage>1277</fpage>&#x2013;<lpage>1278</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz689</pub-id> </citation>
</ref>
<ref id="B89">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Dash</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ng</surname>
<given-names>C. Y.</given-names>
</name>
<name>
<surname>Maranas</surname>
<given-names>C. D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A Review of Computational Tools for Design and Reconstruction of Metabolic Pathways</article-title>. <source>Synth. Syst. Biotechnol.</source> <volume>2</volume> (<issue>4</issue>), <fpage>243</fpage>&#x2013;<lpage>252</lpage>. <pub-id pub-id-type="doi">10.1016/j.synbio.2017.11.002</pub-id> </citation>
</ref>
<ref id="B90">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>J.&#x20;N.</given-names>
</name>
<name>
<surname>Duvenaud</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Aspuru-Guzik</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Neural Networks for the Prediction of Organic Chemistry Reactions</article-title>. <source>ACS Cent. Sci.</source> <volume>2</volume> (<issue>10</issue>), <fpage>725</fpage>&#x2013;<lpage>732</lpage>. <pub-id pub-id-type="doi">10.1021/acscentsci.6b00219</pub-id> </citation>
</ref>
<ref id="B91">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Werhli</surname>
<given-names>A. V.</given-names>
</name>
<name>
<surname>Grzegorczyk</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Husmeier</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Comparative Evaluation of Reverse Engineering Gene Regulatory Networks with Relevance Networks, Graphical Gaussian Models and Bayesian Networks</article-title>. <source>Bioinformatics</source> <volume>22</volume>, <fpage>2523</fpage>&#x2013;<lpage>2531</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btl391</pub-id> </citation>
</ref>
<ref id="B93">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yamanishi</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Mihara</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Osaki</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Muramatsu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Esaki</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2007</year>). <article-title>Prediction of Missing Enzyme Genes in a Bacterial Metabolic Network: Reconstruction of the Lysine-Degradation Pathway of <italic>Pseudomonas aeruginosa</italic>
</article-title>. <source>FEBS J.</source> <volume>274</volume> (<issue>9</issue>), <fpage>2262</fpage>&#x2013;<lpage>2273</lpage>. <pub-id pub-id-type="doi">10.1111/j.1742-4658.2007.05763.x</pub-id> </citation>
</ref>
<ref id="B94">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yi</surname>
<given-names>J.&#x20;J.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>W. J.</given-names>
</name>
<name>
<surname>Rhee</surname>
<given-names>J.&#x20;K.</given-names>
</name>
<name>
<surname>Son</surname>
<given-names>W. S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Spectroscopic Methods to Analyze Drug Metabolites</article-title>. <source>Arch. Pharmacal Res.</source> <volume>41</volume> (<issue>4</issue>), <fpage>355</fpage>&#x2013;<lpage>371</lpage>. <pub-id pub-id-type="doi">10.1007/s12272-018-1010-x</pub-id> </citation>
</ref>
<ref id="B95">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zalguizuri</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Caetano-Anoll&#xe9;s</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Lepek</surname>
<given-names>V. C.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Phylogenetic Profiling, an Untapped Resource for the Prediction of Secreted Proteins and its Complementation with Sequence-Based Classifiers in Bacterial Type III, IV and VI Secretion Systems</article-title>. <source>Brief. Bioinform.</source> <volume>20</volume> (<issue>4</issue>), <fpage>1395</fpage>&#x2013;<lpage>1402</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bby009</pub-id> </citation>
</ref>
<ref id="B96">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Tomb</surname>
<given-names>J.-F.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.&#x20;T. L.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Clustering and Classifying Enzymes in Metabolic Pathways: Some Preliminary Results</article-title>. <conf-name>BIOKDD&#x2019;02 Proceedings of the 2nd International Conference on Data Mining in Bioinformatics</conf-name>, <publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <conf-date>23 July 2002</conf-date>, <fpage>19</fpage>&#x2013;<lpage>24</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://dl.acm.org/citation.cfm?id=3012339">http://dl.acm.org/citation.cfm?id&#x3d;3012339</ext-link>
</comment>. </citation>
</ref>
<ref id="B97">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>M.-H.</given-names>
</name>
<name>
<surname>Pei</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Rowe</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>D. G.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>A Bayesian Approach to Pathway Analysis by Integrating Gene-Gene Functional Directions and Microarray Data</article-title>. <source>Stat. Biosciences</source> <volume>4</volume>, <fpage>105</fpage>&#x2013;<lpage>131</lpage>. <pub-id pub-id-type="doi">10.1007/s12561-011-9046-1</pub-id> </citation>
</ref>
<ref id="B98">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhong</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Altun</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Harrison</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Tai</surname>
<given-names>P. C.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Improved K-Means Clustering Algorithm for Exploring Local Protein Sequence Motifs Representing Common Structural Property</article-title>. <source>IEEE Trans. Nanobioscience</source> <volume>4</volume> (<issue>3</issue>), <fpage>255</fpage>&#x2013;<lpage>265</lpage>. <pub-id pub-id-type="doi">10.1109/TNB.2005.853667</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>