<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1062576</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2022.1062576</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>LABAMPsGCN: A framework for identifying lactic acid bacteria antimicrobial peptides based on graph convolutional neural network</article-title>
<alt-title alt-title-type="left-running-head">Sun et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fgene.2022.1062576">10.3389/fgene.2022.1062576</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Sun</surname>
<given-names>Tong-Jie</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bu</surname>
<given-names>He-Long</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yan</surname>
<given-names>Xin</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sun</surname>
<given-names>Zhi-Hong</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Zha</surname>
<given-names>Mu-Su</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Dong</surname>
<given-names>Gai-Fang</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1190979/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>College of Computer and Information Engineering</institution>, <institution>Inner Mongolia Agricultural University</institution>, <addr-line>Hohhot</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>College of Food Science and Engineering</institution>, <institution>Inner Mongolia Agricultural University</institution>, <addr-line>Hohhot</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/672166/overview">Wenzheng Bao</ext-link>, Xuzhou University of Technology, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1227154/overview">Hao Wu</ext-link>, School of Software, Shandong University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/588629/overview">William W. Guo</ext-link>, Central Queensland University, Australia</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Gai-Fang Dong, <email>donggf@imau.edu.cn</email>; Mu-Su Zha, <email>mszha1988@163.com</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>03</day>
<month>11</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>1062576</elocation-id>
<history>
<date date-type="received">
<day>06</day>
<month>10</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>10</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Sun, Bu, Yan, Sun, Zha and Dong.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Sun, Bu, Yan, Sun, Zha and Dong</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Lactic acid bacteria antimicrobial peptides (LABAMPs) are a class of active polypeptides produced during the metabolism of lactic acid bacteria that can inhibit or kill pathogenic or spoilage bacteria in food. LABAMPs have broad applications in practical fields closely related to human life, such as food production and efficient agricultural planting. However, screening antimicrobial peptides by biological experiment is time-consuming and laborious, so a computational model for predicting LABAMPs is urgently needed. In this work, we design a graph convolutional neural network framework for identifying LABAMPs. We build a heterogeneous graph from amino acids, tripeptides, and their relationships, and learn the weights of a graph convolutional network (GCN). Our GCN iteratively learns word embeddings and sequence weights on the graph under the supervision of the input sequence labels. We applied 10-fold cross-validation to two training datasets and achieved accuracies of 0.9163 and 0.9379, respectively, higher than those of other machine learning and GNN algorithms. On an independent test dataset, the accuracies on the two datasets are 0.9130 and 0.9291, which are 1.08% and 1.57% higher than the best methods among other online webservers.</p>
</abstract>
<kwd-group>
<kwd>lactic acid bacteria antimicrobial peptides</kwd>
<kwd>word embedding</kwd>
<kwd>tripeptide</kwd>
<kwd>graph convolution neural network</kwd>
<kwd>deep learning</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Lactic acid bacteria (LAB) are bacteria that can ferment carbohydrates to produce large amounts of lactic acid (<xref ref-type="bibr" rid="B17">Gu et al., 2022</xref>; <xref ref-type="bibr" rid="B24">Hu et al., 2022</xref>). Organic acids, special enzymes, and other substances produced by lactic acid bacteria through fermentation have special physiological functions. A large body of research shows that lactic acid bacteria can promote animal growth, regulate the normal gastrointestinal flora, and maintain micro-ecological balance, thereby improving gastrointestinal function, food digestibility, and biological titer, reducing serum cholesterol, controlling endotoxin, inhibiting the growth of intestinal putrefactive bacteria, and improving the immunity of the body (<xref ref-type="bibr" rid="B46">Teusink and Molenaar, 2017</xref>). Lactic acid bacteria are widely used in the food industry and poultry husbandry, and also have important academic value in genetic engineering (<xref ref-type="bibr" rid="B16">Greub et al., 2016</xref>), biochemistry (<xref ref-type="bibr" rid="B28">Kadomatsu, 2022</xref>), genetics (<xref ref-type="bibr" rid="B45">Sung Won et al., 2020</xref>) and molecular biology (<xref ref-type="bibr" rid="B39">Saibil, 2022</xref>).</p>
<p>Antimicrobial peptides of lactic acid bacteria are active peptides or proteins produced during the metabolism of lactic acid bacteria that can inhibit or kill pathogenic or spoilage bacteria in food. In recent years, several new methods have been developed for screening and developing antimicrobial peptides, including the enzyme-linked immunosorbent assay (<xref ref-type="bibr" rid="B26">Huang X et al., 2022</xref>), biological analysis of K&#x2b; ion currents (<xref ref-type="bibr" rid="B32">Lauger and Apell, 1988</xref>), the ATP-bioluminescence method (<xref ref-type="bibr" rid="B9">Crouch et al., 1993</xref>; <xref ref-type="bibr" rid="B1">Aiken et al., 2011</xref>), the Lux gene-bioluminescence method (<xref ref-type="bibr" rid="B48">Van Dyk et al., 1994</xref>), the berberine-based fluorescence analysis method (<xref ref-type="bibr" rid="B34">Liu et al., 1998</xref>; <xref ref-type="bibr" rid="B43">Song et al., 2018</xref>), and the micro-plate method (<xref ref-type="bibr" rid="B29">Kai et al., 2012</xref>). Although these wet experimental methods can identify antimicrobial peptides, they are time-consuming and expensive, so they are difficult to apply at scale. To help wet-lab researchers identify novel antimicrobial peptides, a variety of computational methods for antimicrobial peptide identification have been proposed. Many algorithms combine machine learning or statistical analysis techniques such as discriminant analysis (DA) (<xref ref-type="bibr" rid="B31">Kouw and Loog, 2021</xref>; <xref ref-type="bibr" rid="B4">Beck and Sharon, 2022</xref>), fuzzy K-nearest neighbors (<xref ref-type="bibr" rid="B53">Zhai et al., 2020</xref>), hidden Markov models (<xref ref-type="bibr" rid="B12">Fuentes-Beals et al., 2022</xref>), logistic regression (<xref ref-type="bibr" rid="B10">Fagerland and Hosmer, 2012</xref>), random forests (RF) (<xref ref-type="bibr" rid="B56">Ziegler and Koenig, 2014</xref>), and support vector machines (SVM) (<xref ref-type="bibr" rid="B3">Azar and El-Said, 2014</xref>). Although these models have made great progress in antimicrobial peptide recognition, the following challenges remain. First, many related machine-learning classification tasks suffer from small sample sizes; a model trained on few samples cannot achieve robustness and is prone to overfitting and poor generalization. Second, most existing feature extraction techniques are tailored to specific datasets and lack generality.</p>
<p>In summary, most existing machine-learning-based classification work relies mainly on manually determined features (<xref ref-type="bibr" rid="B27">Jiang et al., 2021</xref>), which makes it highly dependent on biologists. Manually determined features also have shortcomings. On the one hand, the intrinsic nonlinear information underlying the function of some peptides cannot be captured this way; on the other hand, when the research object changes, handcrafted features adapt poorly. In addition, the curse of dimensionality caused by feature engineering brings new difficulties to researchers.</p>
<p>In the past 10&#xa0;years, deep learning has developed extremely rapidly. In the field of text processing, results applying natural language processing to biological sequence prediction have been published repeatedly, and graph neural networks in particular perform excellently in text classification (<xref ref-type="bibr" rid="B50">Xie et al., 2022</xref>; <xref ref-type="bibr" rid="B55">Zhou et al., 2022</xref>). Qu (<xref ref-type="bibr" rid="B37">Qu et al., 2017</xref>) proposed a deep learning method to identify DNA-binding protein sequences, which uses a two-stage convolutional network to detect the functional domains of protein sequences and LSTM neural networks to capture context dependencies; on an independent test set, the accuracy of the model on the yeast dataset is 80%. Hamid and Friedberg (<xref ref-type="bibr" rid="B19">Hamid and Friedberg, 2018</xref>) proposed a method that uses word embeddings and an RNN to distinguish bacteriocin from non-bacteriocin sequences; the recall of the model on the two training datasets is 89.8% and 92.1%, respectively. Veltri (<xref ref-type="bibr" rid="B49">Veltri et al., 2018</xref>) proposed a deep neural network model comprising an embedding layer, a convolutional layer, and a recurrent layer; the accuracy of the model on the independent test set is 91.01%. Zeng (<xref ref-type="bibr" rid="B52">Zeng et al., 2019</xref>) proposed to identify protein sequences using node2vec, a convolutional neural network, and sampling techniques; in this framework, node2vec captures the semantic features and topology of each protein in the protein interaction network, and the convolutional layer extracts information from gene expression profiles, yielding an AUC of 82% on the training set. He (<xref ref-type="bibr" rid="B22">He et al., 2021</xref>) proposed a new meta-learning framework based on mutual information maximization. The core of the framework is ProtoNet, a classical metric-based meta-learning algorithm that learns a vector representation of each prototype; the accuracy of this model on the antifungal peptide training set was 91.3%. These five deep learning models have improved AMP prediction performance to some extent, but most of them combine convolutional and LSTM neural networks without significant architectural innovation. Recently, graph neural networks have attracted increasing research attention. Therefore, our work predicts LABAMPs based on a graph convolutional neural network.</p>
<p>In this work, we design a graph convolutional neural network framework to predict antimicrobial peptides of lactic acid bacteria. First, we construct a large heterogeneous graph from all samples, whose nodes are sequences and peptide fragments (amino acids, dipeptides, and tripeptides, which can be regarded as words in natural language processing). Then we connect the nodes as follows: an edge between two peptide fragments is created if the two fragments co-occur within a fixed range (window size) of a sequence, and an edge between a peptide fragment and a sequence is created if the fragment is a substring of that sequence. Finally, node classification on the graph is realized through the computation and propagation of information between nodes. The experimental results show that our model has clear advantages over machine learning methods, deep learning models, and other webservers.</p>
</sec>
<sec id="s2">
<title>2 Materials and methods</title>
<sec id="s2-1">
<title>2.1 Collection of datasets</title>
<p>We collected LABAMPs records from 25 databases (<xref ref-type="bibr" rid="B18">Gueguen et al., 2006</xref>; <xref ref-type="bibr" rid="B35">Mulvenna et al., 2006</xref>; <xref ref-type="bibr" rid="B11">Fjell et al., 2007</xref>; <xref ref-type="bibr" rid="B23">Henderson et al., 2007</xref>; <xref ref-type="bibr" rid="B30">Kawashima et al., 2008</xref>; <xref ref-type="bibr" rid="B20">Hammami et al., 2009</xref>; <xref ref-type="bibr" rid="B21">Hammami et al., 2010</xref>; <xref ref-type="bibr" rid="B44">Sundararajan et al., 2012</xref>; <xref ref-type="bibr" rid="B15">Gogoladze et al., 2014</xref>; <xref ref-type="bibr" rid="B47">Theolier et al., 2014</xref>; <xref ref-type="bibr" rid="B36">Pirtskhalava et al., 2021</xref>; <xref ref-type="bibr" rid="B41">Shi et al., 2022</xref>) according to the 30 genus classification of lactic acid bacteria in <xref ref-type="sec" rid="s10">Supplementary Table S1</xref>. Finally, after removing duplicate records, 1622 LABAMPs are obtained, and their lengths are from 2 to 1619.</p>
<p>Based on the positive raw dataset obtained above, we process it as follows. First, we remove records containing unnatural amino acids such as B, J, O, U, X, and Z. Second, to reduce sequence homology bias and redundancy, we use the CD-HIT program (<xref ref-type="bibr" rid="B33">Li and Godzik, 2006</xref>) to remove peptides with more than 70% and 90% pairwise similarity, respectively. Finally, we obtain 460 and 636 peptide sequences after redundancy removal, respectively.</p>
<p>Our negative raw dataset was obtained as follows:<list list-type="simple">
<list-item>
<p>1 On the UniProt website (<xref ref-type="bibr" rid="B8">Consortium, 2021</xref>), we obtained peptide sequences with lengths between 2 and 1619;</p>
</list-item>
<list-item>
<p>2 Remove sequences that contain, or are annotated with, the keywords antimicrobial, antibiotic, fungicide, defensin, AMP, membrane, toxic, secretory, defensive, anticancer, antiviral, antifungal, effector, and exacted;</p>
</list-item>
<list-item>
<p>3 Remove resulting protein sequences which include unnatural amino acids;</p>
</list-item>
<list-item>
<p>4 Remove peptide sequences with more than 70% and 90% similarity using the CD-HIT program;</p>
</list-item>
<list-item>
<p>5 Randomly select the same number of sequences as the number of positive samples.</p>
</list-item>
</list>
</p>
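The filtering steps above can be sketched in Python; this is a minimal illustration, not the authors' code. The keyword list here is a subset of step 2, and the CD-HIT redundancy removal of step 4 is an external tool not shown:

```python
# Sketch of the negative-set pre-filtering (steps 1-3 above). The keyword
# list is abbreviated for illustration; the paper lists more terms.
UNNATURAL = set("BJOUXZ")  # residues removed in step 3
EXCLUDE_KEYWORDS = {"antimicrobial", "antibiotic", "defensin", "toxic"}

def keep_sequence(seq, annotation=""):
    """Return True if a candidate negative sequence survives steps 1-3."""
    if not (2 <= len(seq) <= 1619):                   # step 1: length range
        return False
    ann = annotation.lower()
    if any(k in ann for k in EXCLUDE_KEYWORDS):       # step 2: annotation filter
        return False
    if any(res in UNNATURAL for res in seq.upper()):  # step 3: unnatural residues
        return False
    return True
```

Surviving sequences would then be deduplicated with CD-HIT and subsampled to match the positive-set size (step 5).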
<p>All positive and negative samples after CD-HIT processing at the 70% and 90% thresholds are shown in <xref ref-type="table" rid="T1">Table 1</xref>; we call these datasets DS-70% and DS-90%, respectively. The statistics of the preprocessed datasets are summarized in <xref ref-type="table" rid="T2">Table 2</xref>. Since we classify nodes on a single graph, the number of graphs is one for each of DS-70% and DS-90%. The number of sequences is the total number of positive and negative samples of DS-70% or DS-90%. The number of words is obtained after removing stop words and words whose frequency is less than 5. The number of nodes is the sum of the number of sequences and the number of words. Because ours is a binary classification task, the number of classes is two.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Raw data processed through CD-HIT program.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Attribute</th>
<th align="left">Raw data</th>
<th align="left">DS-70%</th>
<th align="left">DS-90%</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">AMPs</td>
<td align="left">1622</td>
<td align="left">460</td>
<td align="left">636</td>
</tr>
<tr>
<td align="left">nonAMPs</td>
<td align="left">1622</td>
<td align="left">460</td>
<td align="left">636</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Summary statistics results of datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Attributes</th>
<th colspan="2" align="left">Datasets</th>
</tr>
<tr>
<th align="left">DS-70%</th>
<th align="left">DS-90%</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Graphs</td>
<td align="left">1</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Sequences</td>
<td align="left">920</td>
<td align="left">1272</td>
</tr>
<tr>
<td align="left">Words</td>
<td align="left">7455</td>
<td align="left">7621</td>
</tr>
<tr>
<td align="left">Nodes</td>
<td align="left">8375</td>
<td align="left">8893</td>
</tr>
<tr>
<td align="left">Classes</td>
<td align="left">2</td>
<td align="left">2</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2-2">
<title>2.2 Model construction</title>
<p>The model construction is divided into three steps: first, establish the initial graph, then conduct the convolution operation on the graph, and finally complete the node classification through the classification function.</p>
<sec id="s2-2-1">
<title>2.2.1 Establishment of initial graph</title>
<p>Before constructing the initial graph, we preprocess all positive and negative samples. First, every sample is segmented into words of amino acids, dipeptides, or tripeptides. Second, we count word frequencies and filter out all words that occur fewer than 5 times. The remaining words form the required vocabulary.</p>
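A minimal sketch of this segmentation and frequency filtering, assuming overlapping k-mers are used as words (the exact segmentation scheme is an assumption of this illustration):

```python
from collections import Counter

def segment(seq, k=3):
    """Split a peptide sequence into overlapping k-mer 'words' (tripeptides for k=3)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_vocab(sequences, k=3, min_freq=5):
    """Count k-mer frequencies over all samples and drop rare words, as described above."""
    counts = Counter(w for s in sequences for w in segment(s, k))
    return {w for w, c in counts.items() if c >= min_freq}
```

For example, `segment("ACDEF", 3)` yields the tripeptides `ACD`, `CDE`, `DEF`.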
<p>Suppose the initial input graph is expressed as Graph <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, then the number of <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is equal to the sum of the number of sequences and the number of peptide segments, and the number of edges depends on the connections between peptide segments and the connections between peptide segments and sequences. As shown in <xref ref-type="fig" rid="F1">Figure 1A</xref>, there are two kinds of edges. One kind connects two peptide segments&#x2014;if two peptide segments co-occur within the specified range of the same sequence, their corresponding nodes are connected. The other kind connects a peptide segment and a sequence&#x2014;if a peptide segment is a substring of a sequence, the corresponding nodes are connected.</p>
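The window-based co-occurrence described above (two peptide segments are connected if they appear together within a fixed-size window of a sequence) can be counted as follows; this is a sketch, and the window size of 3 is illustrative rather than the paper's setting:

```python
from collections import Counter
from itertools import combinations

def window_cooccurrence(token_lists, window=3):
    """Slide a fixed-size window over each tokenized sequence and count:
    total windows, windows containing each word, and windows containing each pair."""
    n_windows = 0
    single = Counter()   # windows containing word i
    pair = Counter()     # windows containing both word i and word j
    for toks in token_lists:
        for s in range(max(len(toks) - window + 1, 1)):
            win = toks[s:s + window]
            n_windows += 1
            for w in set(win):
                single[w] += 1
            for a, b in combinations(sorted(set(win)), 2):
                pair[(a, b)] += 1
    return n_windows, single, pair
```

These counts are exactly the statistics that determine which peptide-segment nodes get connected and with what weight.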
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Overview of LABAMPsGCN model architecture. <bold>(A)</bold> Graph Construction. Each sequence is processed by word segmentation, and then the required graph is obtained by word co-occurrence technology. <bold>(B)</bold> Graph convolutional neural network. It mainly carries out message transmission through word-sequence relations. <bold>(C)</bold> Classification. It uses the full connection layer for classification.</p>
</caption>
<graphic xlink:href="fgene-13-1062576-g001.tif"/>
</fig>
<p>In order to propagate information along the edges of the graph, we establish the adjacency matrix <italic>A</italic> of the initial graph, that is, we assign a weight to each edge. The calculation is shown in <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>, where <inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> represents the total number of sliding windows in all sequences, and its value is a positive integer. <inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is the number of sliding windows containing peptide segment <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> in all sequences, and <inline-formula id="inf6">
<mml:math id="m6">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is the number of sliding windows containing both peptide segment <inline-formula id="inf7">
<mml:math id="m7">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and peptide segment <inline-formula id="inf8">
<mml:math id="m8">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> in all sequences. <inline-formula id="inf9">
<mml:math id="m9">
<mml:mrow>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the number of occurrences of the peptide segment <inline-formula id="inf10">
<mml:math id="m10">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> in sequence <italic>j</italic>, and <italic>N</italic> is the total number of peptide segments in all sequences. <inline-formula id="inf11">
<mml:math id="m11">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is the total number of all sequences, and <inline-formula id="inf12">
<mml:math id="m12">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is the number of sequences containing peptide segment <inline-formula id="inf13">
<mml:math id="m13">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. The reason for adding one to the denominator is that when a peptide segment does not appear in any known sequence, <inline-formula id="inf14">
<mml:math id="m14">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> will be zero. Therefore, one is added to denominator.<disp-formula id="e1">
<mml:math id="m15">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable columnalign="center">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>log</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>w</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mi mathvariant="normal">d</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>log</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>w</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mi mathvariant="normal">d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>j</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mtext>&#xa0;</mml:mtext>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">q</mml:mi>
<mml:mi mathvariant="normal">u</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">c</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
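The two non-trivial branches of Eq. 1 correspond to pointwise mutual information (word&#x2013;word edges) and TF-IDF (word&#x2013;sequence edges). A direct transcription in Python, using the symbols defined above:

```python
import math

def pmi_weight(n_windows, n_i, n_j, n_ij):
    """Word-word edge weight of Eq. 1: log(|W|*|W(i,j)| / (|W(i)|*|W(j)|)).
    Returns 0.0 when the two words never co-occur (no edge)."""
    if n_ij == 0:
        return 0.0
    return math.log(n_windows * n_ij / (n_i * n_j))

def tfidf_weight(n_ij, n_total, n_docs, docs_with_i):
    """Word-sequence edge weight of Eq. 1: (n_ij / N) * log(|D| / (|{j: i in j}| + 1)).
    The +1 in the denominator avoids division by zero, as noted in the text."""
    return (n_ij / n_total) * math.log(n_docs / (docs_with_i + 1))
```

Diagonal entries are set to 1 and all remaining entries to 0, completing the four cases of Eq. 1.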
</sec>
<sec id="s2-2-2">
<title>2.2.2 Graph convolutional network module</title>
<p>Word embedding is a method for converting a word into a vector representation. There are many word embedding methods, such as one-hot encoding, the Skip-gram model (<xref ref-type="bibr" rid="B5">Carrasco and Sicilia, 2018</xref>), the CBOW model (<xref ref-type="bibr" rid="B51">Xiong et al., 2019</xref>), and GloVe word vectors (<xref ref-type="bibr" rid="B13">Gao and Huang, 2021</xref>). In this module, we first determine the node features of the initial graph. We use one-hot encoding to embed each word and feed it into the model together with the sequences for training. Because the initial node features have little influence on the graph convolutional neural network, we set <inline-formula id="inf15">
<mml:math id="m16">
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> as the identity matrix <inline-formula id="inf16">
<mml:math id="m17">
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>Since the diagonal elements of the adjacency matrix are 0, the information of the nodes themselves is easily lost during computation, so an identity matrix is added to the adjacency matrix. To avoid changing the feature distribution, the adjacency matrix with added self-loops is normalized to obtain the processed adjacency matrix <inline-formula id="inf17">
<mml:math id="m18">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B13">Gao and Huang, 2021</xref>).</p>
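Assuming Norm(&#xb7;) denotes the standard symmetric GCN normalization D<sup>-1/2</sup>(A + I)D<sup>-1/2</sup> (an assumption consistent with the cited formulation), it can be computed as:

```python
import numpy as np

def norm_adj(A):
    """Symmetrically normalize A with self-loops: D^{-1/2} (A + I) D^{-1/2},
    where D is the degree matrix of A + I (the usual GCN normalization)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```

Adding the identity keeps each node's own features in the update; the degree scaling keeps repeated convolutions from inflating feature magnitudes.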
<p>We design a graph convolutional neural network framework that learns and propagates information between nodes on the graph under the supervision of labels, and finally achieves node classification. The graph convolutional neural network framework for lactic acid bacteria antimicrobial peptides can be expressed as <xref ref-type="disp-formula" rid="e2">Eq. 2</xref>.<disp-formula id="e2">
<mml:math id="m19">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>max</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>X</mml:mi>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>The network learning process under the supervision of sequence labels needs to calculate the loss, and we use the cross-entropy loss function (<xref ref-type="bibr" rid="B2">Aurelio et al., 2019</xref>). <xref ref-type="disp-formula" rid="e2">Eq. 2</xref> is the general model of LABAMPsGCN. <xref ref-type="fig" rid="F1">Figure 1B</xref> shows a two-layer LABAMPsGCN. In the following sections, we show that the two-layer LABAMPsGCN achieves the best performance.</p>
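For the two-layer case, Eq. 2 reduces to R = softmax(Norm(A + I) · ReLU(Norm(A + I) · X · W0) · W1). A numpy sketch of this forward pass, assuming identity stand-ins for the normalized adjacency and node features and toy layer sizes:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(N, X, W0, W1):
    """Two-layer form of Eq. 2: softmax(N @ relu(N @ X @ W0) @ W1)."""
    H = relu(N @ X @ W0)            # first graph convolution + ReLU
    return softmax(N @ H @ W1)      # second graph convolution + softmax

rng = np.random.default_rng(0)
n, h, c = 5, 4, 2                   # nodes, hidden width, classes (toy sizes)
N = np.eye(n)                       # stand-in for Norm(A + I)
X = np.eye(n)                       # identity node features, as in the text
R = gcn_forward(N, X, rng.normal(size=(n, h)), rng.normal(size=(h, c)))
```

Each row of R is a probability distribution over the two classes (LABAMP vs. non-LABAMP) for one node.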
</sec>
<sec id="s2-2-3">
<title>2.2.3 Classification module</title>
<p>We use a fully connected layer to map the feature space to the sample label space, and then use the <italic>softmax</italic> classification function to calculate the probability of a node being assigned to each category, as shown in <xref ref-type="fig" rid="F1">Figure 1C</xref>.</p>
</sec>
</sec>
<sec id="s2-3">
<title>2.3 Evaluation metrics</title>
<p>To assess the performance of LABAMPsGCN, we adopt the statistical metrics of precision, recall, accuracy and <inline-formula id="inf18">
<mml:math id="m20">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>_</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. They are defined as follows:<disp-formula id="e3">
<mml:math id="m21">
<mml:mrow>
<mml:mi>Pr</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
<disp-formula id="e4">
<mml:math id="m22">
<mml:mrow>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
<disp-formula id="e5">
<mml:math id="m23">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>y</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
<disp-formula id="e6">
<mml:math id="m24">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>_</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>Pr</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>Pr</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi mathvariant="normal">R</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
<inline-formula id="inf19">
<mml:math id="m25">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf20">
<mml:math id="m26">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf21">
<mml:math id="m27">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf22">
<mml:math id="m28">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> are the four components of the confusion matrix, abbreviating true positive, true negative, false positive and false negative, respectively. Precision is the proportion of correctly predicted positive samples among all samples predicted to be positive. Recall is the proportion of correctly predicted positive samples among all samples that are actually positive. Accuracy is the percentage of correct predictions among all samples. <inline-formula id="inf23">
<mml:math id="m29">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>_</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> denotes the harmonic mean of precision and recall.</p>
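Equations 3-6 translate directly into code. The confusion-matrix counts below are made-up numbers for illustration, not results from the paper:

```python
def metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts (Eqs. 3-6)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1_score = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1_score

# Hypothetical counts for a 200-sample test set.
p, r, acc, f1 = metrics(tp=90, tn=85, fp=10, fn=15)
```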
</sec>
<sec id="s2-4">
<title>2.4 Implementation details</title>
<p>The hyperparameters of a model have an important impact on its performance. In our LABAMPsGCN, we set the activation function, window size, first-layer convolution size, learning rate and dropout rate to ReLU, 15, 200, 0.01, and 0.5, respectively. We used the Adam optimizer (<xref ref-type="bibr" rid="B40">Shao et al., 2021</xref>) to train our model for 150 epochs.</p>
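As a sanity check on this training setup, a bare-bones Adam update with the learning rate used here (0.01) takes only a few lines. The quadratic toy loss and the starting point are illustrative assumptions, not the actual LABAMPsGCN objective:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with the lr=0.01 used for LABAMPsGCN."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize the toy loss (w - 3)^2 for 150 steps, mirroring the 150 epochs.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 151):
    g = 2 * (w - 3)                    # gradient of the toy loss
    w, m, v = adam_step(w, g, m, v, t)
```

With lr=0.01 each Adam step moves the parameter by roughly 0.01, so 150 steps carry it about halfway toward the minimum; training schedules trade off this step size against the number of epochs.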
</sec>
<sec id="s2-5">
<title>2.5 Development of the webserver</title>
<p>We constructed a webserver with our prediction model embedded at the back end of the website. When users submit LABAMPs of interest, the predicted probability is displayed by the website prediction tool (<xref ref-type="bibr" rid="B42">Sim et al., 2012</xref>). Because the weight matrices of the graph convolutional neural network change with the adjacency matrix and feature matrix of the input data, we embedded an SVM model whose accuracy is 3.77% lower than that of LABAMPsGCN.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Results</title>
<sec id="s3-1">
<title>3.1 Effects of different feature extraction methods on graph convolutional neural networks</title>
<p>We combined the single-peptide, dipeptide and tripeptide features and obtained six feature combinations: dipeptide; dipeptide + single peptide; tripeptide; tripeptide + single peptide; tripeptide + dipeptide; and tripeptide + dipeptide + single peptide. <xref ref-type="table" rid="T3">Table 3</xref> shows the model accuracy on DS-70% and DS-90%.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Accuracy of LABAMPsGCN with different feature combinations.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Features</th>
<th rowspan="2" align="left">Number of features</th>
<th colspan="2" align="left">Datasets</th>
</tr>
<tr>
<th align="left">DS-70%</th>
<th align="left">DS-90%</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">D<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="left">400</td>
<td align="left">0.8913</td>
<td align="left">0.9088</td>
</tr>
<tr>
<td align="left">D &#x2b; S<xref ref-type="table-fn" rid="Tfn2">
<sup>b</sup>
</xref>
</td>
<td align="left">420</td>
<td align="left">0.8870</td>
<td align="left">0.9010</td>
</tr>
<tr>
<td align="left">T <xref ref-type="table-fn" rid="Tfn3">
<sup>c</sup>
</xref>
</td>
<td align="left">8000</td>
<td align="left">0.9098</td>
<td align="left">0.9340</td>
</tr>
<tr>
<td align="left">T &#x2b; S<xref ref-type="table-fn" rid="Tfn4">
<sup>d</sup>
</xref>
</td>
<td align="left">
<bold>8020</bold>
</td>
<td align="left">
<bold>0.9163</bold>
</td>
<td align="left">
<bold>0.9379</bold>
</td>
</tr>
<tr>
<td align="left">T &#x2b; D<xref ref-type="table-fn" rid="Tfn5">
<sup>e</sup>
</xref>
</td>
<td align="left">8400</td>
<td align="left">0.9076</td>
<td align="left">0.9277</td>
</tr>
<tr>
<td align="left">T &#x2b; S &#x2b; D<xref ref-type="table-fn" rid="Tfn6">
<sup>f</sup>
</xref>
</td>
<td align="left">8420</td>
<td align="left">0.9065</td>
<td align="left">0.9285</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn1">
<label>
<sup>a</sup>
</label>
<p>D: Dipeptide.</p>
</fn>
<fn id="Tfn2">
<label>
<sup>b</sup>
</label>
<p>D &#x2b; S: Dipeptide &#x2b; Single peptide.</p>
</fn>
<fn id="Tfn3">
<label>
<sup>c</sup>
</label>
<p>T: Tripeptide.</p>
</fn>
<fn id="Tfn4">
<label>
<sup>d</sup>
</label>
<p>T &#x2b; S: Tripeptide &#x2b; Single peptide.</p>
</fn>
<fn id="Tfn5">
<label>
<sup>e</sup>
</label>
<p>T &#x2b; D: Tripeptide &#x2b; Dipeptide.</p>
</fn>
<fn id="Tfn6">
<label>
<sup>f</sup>
</label>
<p>T &#x2b; S &#x2b; D: Tripeptide &#x2b; Single peptide &#x2b; Dipeptide.</p>
</fn>
<fn>
<p>Note: the bold values in the table indicate the best results.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>It can be seen that the combination of tripeptide and single-peptide features is significantly better than the other combinations on both DS-70% and DS-90%. As the number of features increases, the accuracy (ACC) on the test data also rises slowly, but on both DS-70% and DS-90% it begins to decline noticeably once the number of features exceeds 8020.</p>
</sec>
<sec id="s3-2">
<title>3.2 Parameter sensitivity</title>
<sec id="s3-2-1">
<title>3.2.1 Window sizes</title>
<p>
<xref ref-type="fig" rid="F2">Figure 2A</xref> reports the accuracy for different sliding window sizes on DS-70% and DS-90% based on the tripeptide and single-peptide features. It shows that the influence of the sliding window size on the prediction accuracy follows a general pattern: accuracy rises first and then falls, with 15 as the turning point. This suggests that small windows cannot accommodate the sequence fragments that play key functional roles, while overly large windows treat irrelevant information as key information and disturb the judgment. Therefore, in this paper, the window size is set to 15.</p>
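The role of the window can be illustrated with a sliding-window co-occurrence count of the kind commonly used to build word-word edges in text graphs. The paper does not give its exact edge-construction code, so `window_cooccurrence`, the toy word list, and the small window size are assumptions for illustration:

```python
from collections import Counter
from itertools import combinations

def window_cooccurrence(words, size=15):
    """Count unordered word pairs that co-occur inside a sliding window."""
    pairs = Counter()
    # Slide the window across the word list; lists shorter than the window
    # yield a single window covering the whole list.
    for i in range(max(1, len(words) - size + 1)):
        window = set(words[i:i + size])
        for a, b in combinations(sorted(window), 2):
            pairs[(a, b)] += 1
    return pairs

# Toy 3-mer word list with a small window so the counts are easy to check.
pairs = window_cooccurrence(["ILE", "TIW", "KLK", "ILE", "TIW"], size=3)
```

A larger window lets more distant word pairs contribute edges, which matches the observation above: too small and key fragments never co-occur, too large and unrelated fragments get connected.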
</sec>
<sec id="s3-2-2">
<title>3.2.2 Graph convolutional network layer</title>
<p>We designed GCNs with different numbers of layers to obtain the features of LABAMPs. <xref ref-type="fig" rid="F2">Figure 2B</xref> indicates the effect of the number of GCN layers on the performance of our model. In this paper, we varied the number of GCN layers over {1, 2, 3, 4}. It can be seen from <xref ref-type="fig" rid="F2">Figure 2B</xref> that the 2-layer GCN achieves the best performance. Too many GCN layers cause over-smoothing, and the learned model collapses. Although there is no direct sequence-sequence edge in the graph, two GCN layers connect sequences through intermediate word nodes, thus realizing sequence-to-sequence information interaction.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Parameter analysis of LABAMPsGCN. <bold>(A)</bold> Accuracy varied by windows size. <bold>(B)</bold> Accuracy varied by numbers of layers.</p>
</caption>
<graphic xlink:href="fgene-13-1062576-g002.tif"/>
</fig>
<p>If there are too many layers, each node aggregates the features of more and more neighbors, so that the nodes become similar to each other; this increases the similarity between classes, and the classification performance naturally degrades.</p>
</sec>
</sec>
<sec id="s3-3">
<title>3.3 Compare with machine learning methods</title>
<p>To verify the performance of LABAMPsGCN, we compare it with machine learning models on the same features. All results in <xref ref-type="table" rid="T4">Table 4</xref> are obtained using 10-fold cross-validation. We used the multinomial naïve Bayes classifier (MNB), random forest (RF), support vector machine (SVM), AdaBoost (<xref ref-type="bibr" rid="B25">Huang H et al., 2022</xref>) and XGBoost (<xref ref-type="bibr" rid="B54">Zhang et al., 2022</xref>). It can be seen that LABAMPsGCN shows good performance regardless of how the features change. This is because LABAMPsGCN can obtain the information of sequence nodes through word nodes.</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Comparisons of LABAMPsGCN with machine learning and GNN models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Datasets</th>
<th rowspan="2" align="left">Models</th>
<th colspan="6" align="left">Features</th>
</tr>
<tr>
<th align="left">D<xref ref-type="table-fn" rid="Tfn7">
<sup>a</sup>
</xref>
</th>
<th align="left">D &#x2b; S<xref ref-type="table-fn" rid="Tfn8">
<sup>b</sup>
</xref>
</th>
<th align="left">T<xref ref-type="table-fn" rid="Tfn9">
<sup>c</sup>
</xref>
</th>
<th align="left">T &#x2b; S<xref ref-type="table-fn" rid="Tfn10">
<sup>d</sup>
</xref>
</th>
<th align="left">T &#x2b; D<xref ref-type="table-fn" rid="Tfn11">
<sup>e</sup>
</xref>
</th>
<th align="left">T &#x2b; S &#x2b; D<xref ref-type="table-fn" rid="Tfn12">
<sup>f</sup>
</xref>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">DS-70%</td>
<td align="left">MNB<xref ref-type="table-fn" rid="Tfn13">
<sup>g</sup>
</xref>
</td>
<td align="left">0.8457</td>
<td align="left">0.8283</td>
<td align="left">0.8391</td>
<td align="left">0.8391</td>
<td align="left">0.8391</td>
<td align="left">0.8391</td>
</tr>
<tr>
<td align="left"/>
<td align="left">RF<xref ref-type="table-fn" rid="Tfn14">
<sup>h</sup>
</xref>
</td>
<td align="left">0.8446</td>
<td align="left">0.8576</td>
<td align="left">0.7989</td>
<td align="left">0.7891</td>
<td align="left">0.7957</td>
<td align="left">0.7978</td>
</tr>
<tr>
<td align="left"/>
<td align="left">SVM<xref ref-type="table-fn" rid="Tfn15">
<sup>i</sup>
</xref>
</td>
<td align="left">0.8554</td>
<td align="left">0.8663</td>
<td align="left">0.8402</td>
<td align="left">0.8402</td>
<td align="left">0.8402</td>
<td align="left">0.8402</td>
</tr>
<tr>
<td align="left"/>
<td align="left">AdaBoost</td>
<td align="left">0.7946</td>
<td align="left">0.8196</td>
<td align="left">0.7348</td>
<td align="left">0.7348</td>
<td align="left">0.7348</td>
<td align="left">0.7348</td>
</tr>
<tr>
<td align="left"/>
<td align="left">XGBoost</td>
<td align="left">0.8489</td>
<td align="left">0.8685</td>
<td align="left">0.7793</td>
<td align="left">0.7793</td>
<td align="left">0.7793</td>
<td align="left">0.7793</td>
</tr>
<tr>
<td align="left"/>
<td align="left">GNN<xref ref-type="table-fn" rid="Tfn16">
<sup>j</sup>
</xref>
</td>
<td align="left">0.8596</td>
<td align="left">0.8549</td>
<td align="left">0.8836</td>
<td align="left">0.8916</td>
<td align="left">0.8513</td>
<td align="left">0.8499</td>
</tr>
<tr>
<td align="left"/>
<td align="left">LABAMPsGCN</td>
<td align="left">
<bold>0.8913</bold>
</td>
<td align="left">
<bold>0.8870</bold>
</td>
<td align="left">
<bold>0.9098</bold>
</td>
<td align="left">
<bold>0.9163</bold>
</td>
<td align="left">
<bold>0.9076</bold>
</td>
<td align="left">
<bold>0.9065</bold>
</td>
</tr>
<tr>
<td align="left">DS-90%</td>
<td align="left">MNB</td>
<td align="left">0.8586</td>
<td align="left">0.8461</td>
<td align="left">0.8585</td>
<td align="left">0.8585</td>
<td align="left">0.8585</td>
<td align="left">0.8585</td>
</tr>
<tr>
<td align="left"/>
<td align="left">RF</td>
<td align="left">0.8776</td>
<td align="left">0.8576</td>
<td align="left">0.8383</td>
<td align="left">0.8218</td>
<td align="left">0.8359</td>
<td align="left">0.8281</td>
</tr>
<tr>
<td align="left"/>
<td align="left">SVM</td>
<td align="left">0.8800</td>
<td align="left">0.8842</td>
<td align="left">0.9002</td>
<td align="left">0.9002</td>
<td align="left">0.8988</td>
<td align="left">0.8987</td>
</tr>
<tr>
<td align="left"/>
<td align="left">AdaBoost</td>
<td align="left">0.8334</td>
<td align="left">0.8328</td>
<td align="left">0.7558</td>
<td align="left">0.7558</td>
<td align="left">0.7558</td>
<td align="left">0.7558</td>
</tr>
<tr>
<td align="left"/>
<td align="left">XGBoost</td>
<td align="left">0.8776</td>
<td align="left">0.8791</td>
<td align="left">0.8131</td>
<td align="left">0.8131</td>
<td align="left">0.8131</td>
<td align="left">0.8131</td>
</tr>
<tr>
<td align="left"/>
<td align="left">GNN</td>
<td align="left">0.8810</td>
<td align="left">0.8897</td>
<td align="left">0.9019</td>
<td align="left">0.9146</td>
<td align="left">0.8946</td>
<td align="left">0.8943</td>
</tr>
<tr>
<td align="left"/>
<td align="left">LABAMPsGCN</td>
<td align="left">
<bold>0.9088</bold>
</td>
<td align="left">
<bold>0.9010</bold>
</td>
<td align="left">
<bold>0.9340</bold>
</td>
<td align="left">
<bold>0.9379</bold>
</td>
<td align="left">
<bold>0.9277</bold>
</td>
<td align="left">
<bold>0.9285</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn7">
<label>
<sup>a</sup>
</label>
<p>D: Dipeptide.</p>
</fn>
<fn id="Tfn8">
<label>
<sup>b</sup>
</label>
<p>D &#x2b; S: Dipeptide &#x2b; Single peptide.</p>
</fn>
<fn id="Tfn9">
<label>
<sup>c</sup>
</label>
<p>T: Tripeptide.</p>
</fn>
<fn id="Tfn10">
<label>
<sup>d</sup>
</label>
<p>T &#x2b; S: Tripeptide &#x2b; Single peptide.</p>
</fn>
<fn id="Tfn11">
<label>
<sup>e</sup>
</label>
<p>T &#x2b; D: Tripeptide &#x2b; Dipeptide.</p>
</fn>
<fn id="Tfn12">
<label>
<sup>f</sup>
</label>
<p>T &#x2b; S &#x2b; D: Tripeptide &#x2b; Single peptide &#x2b; Dipeptide.</p>
</fn>
<fn id="Tfn13">
<label>
<sup>g</sup>
</label>
<p>MNB: Multinomial na&#xef;ve Bayes.</p>
</fn>
<fn id="Tfn14">
<label>
<sup>h</sup>
</label>
<p>RF: Random Forest.</p>
</fn>
<fn id="Tfn15">
<label>
<sup>i</sup>
</label>
<p>SVM: Support Vector Machine.</p>
</fn>
<fn id="Tfn16">
<label>
<sup>j</sup>
</label>
<p>GNN: Graph Neural Network.</p>
</fn>
<fn>
<p>Note: the bold values in the table indicate the best results.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3-4">
<title>3.4 Comparison with existing AMP prediction tools</title>
<p>
<xref ref-type="table" rid="T5">Table 5</xref> compares our LABAMPsGCN model with three publicly available state-of-the-art machine learning methods for AMP recognition. <xref ref-type="table" rid="T5">Table 5</xref> shows that our LABAMPsGCN model achieves the best precision and accuracy on both datasets. On DS-70%, the recall of the AMPfun model (<xref ref-type="bibr" rid="B7">Chung et al., 2020</xref>) is the highest (3.42% higher than our model). On DS-90%, the metrics of our LABAMPsGCN model are significantly better than those of the other methods.</p>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Comparisons of LABAMPsGCN with three state-of-the-art webservers.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Datasets</th>
<th align="left">Tool</th>
<th align="left">R<xref ref-type="table-fn" rid="Tfn17">
<sup>a</sup>
</xref>
</th>
<th align="left">P<xref ref-type="table-fn" rid="Tfn18">
<sup>b</sup>
</xref>
</th>
<th align="left">ACC<xref ref-type="table-fn" rid="Tfn19">
<sup>c</sup>
</xref>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">DS-70%</td>
<td align="left">CAMP-SVM</td>
<td align="left">0.8696</td>
<td align="left">0.8889</td>
<td align="left">0.8804</td>
</tr>
<tr>
<td align="left"/>
<td align="left">iAMP-2L</td>
<td align="left">0.875</td>
<td align="left">0.9333</td>
<td align="left">0.9022</td>
</tr>
<tr>
<td align="left"/>
<td align="left">AMPfun</td>
<td align="left">
<bold>0.8913</bold>
</td>
<td align="left">0.9111</td>
<td align="left">0.9022</td>
</tr>
<tr>
<td align="left"/>
<td align="left">LABAMPsGCN</td>
<td align="left">0.8571</td>
<td align="left">
<bold>0.9556</bold>
</td>
<td align="left">
<bold>0.9130</bold>
</td>
</tr>
<tr>
<td align="left">DS-90%</td>
<td align="left">CAMP-SVM</td>
<td align="left">0.8852</td>
<td align="left">0.871</td>
<td align="left">0.8819</td>
</tr>
<tr>
<td align="left"/>
<td align="left">iAMP-2L</td>
<td align="left">0.8889</td>
<td align="left">0.9032</td>
<td align="left">0.8976</td>
</tr>
<tr>
<td align="left"/>
<td align="left">AMPfun</td>
<td align="left">0.8923</td>
<td align="left">0.9355</td>
<td align="left">0.9134</td>
</tr>
<tr>
<td align="left"/>
<td align="left">LABAMPsGCN</td>
<td align="left">
<bold>0.9077</bold>
</td>
<td align="left">
<bold>0.9516</bold>
</td>
<td align="left">
<bold>0.9291</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn17">
<label>
<sup>a</sup>
</label>
<p>R: Recall.</p>
</fn>
<fn id="Tfn18">
<label>
<sup>b</sup>
</label>
<p>P: Precision.</p>
</fn>
<fn id="Tfn19">
<label>
<sup>c</sup>
</label>
<p>ACC: accuracy.</p>
</fn>
<fn>
<p>Note: the bold values in the table indicate the best results.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3-5">
<title>3.5 Ablation study</title>
<p>To judge whether all parts of our identifier are necessary, we adopt three variants of LABAMPsGCN (LABAMPsGCN-noFC, LABAMPsGCN-cheby and LABAMPsGCN-cheby-noFC) as comparison methods. Specifically, LABAMPsGCN-noFC omits the fully connected layer after the GCN layers and uses the GCN output directly for classification. LABAMPsGCN-cheby adds Chebyshev polynomials (<xref ref-type="bibr" rid="B6">Christiansen et al., 2021</xref>), which approximate the graph convolution by a polynomial expansion, that is, a polynomial approximation of the parameterized frequency response function. LABAMPsGCN-cheby-noFC adds Chebyshev polynomials and omits the fully connected layer after the GCN output.</p>
<p>
<xref ref-type="table" rid="T6">Table 6</xref> shows the evaluation metrics obtained by training LABAMPsGCN and its variants on DS-90%. All four training runs used the tripeptide and single-peptide features. Comparing LABAMPsGCN with LABAMPsGCN-noFC, the ACC of LABAMPsGCN is significantly higher because the fully connected layer integrates the feature representations and maps them to the sample label space. Comparing LABAMPsGCN with LABAMPsGCN-cheby, LABAMPsGCN-cheby performs slightly worse because the Chebyshev polynomials make each sequence vertex fuse too much irrelevant information. Comparing LABAMPsGCN with LABAMPsGCN-cheby-noFC, the model with the fully connected layer performs significantly better than the one without it.</p>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Performance evaluation of LABAMPsGCN and its three variants.</p>
</caption>
<table>
<tbody valign="top">
<tr>
<td align="left">Methods</td>
<td align="left">R<xref ref-type="table-fn" rid="Tfn20">
<sup>a</sup>
</xref>
</td>
<td align="left">P<xref ref-type="table-fn" rid="Tfn21">
<sup>b</sup>
</xref>
</td>
<td align="left">ACC<xref ref-type="table-fn" rid="Tfn22">
<sup>c</sup>
</xref>
</td>
<td align="left">F1-score</td>
</tr>
<tr>
<td align="left">LABAMPsGCN</td>
<td align="left">
<bold>0.9492</bold>
</td>
<td align="left">
<bold>0.9032</bold>
</td>
<td align="left">
<bold>0.9291</bold>
</td>
<td align="left">
<bold>0.9256</bold>
</td>
</tr>
<tr>
<td align="left">LABAMPsGCN-noFC</td>
<td align="left">0.8906</td>
<td align="left">0.9194</td>
<td align="left">0.9055</td>
<td align="left">0.9048</td>
</tr>
<tr>
<td align="left">LABAMPsGCN-cheby</td>
<td align="left">0.803</td>
<td align="left">0.8413</td>
<td align="left">0.8189</td>
<td align="left">0.8217</td>
</tr>
<tr>
<td align="left">LABAMPsGCN-cheby-noFC</td>
<td align="left">0.7846</td>
<td align="left">0.8095</td>
<td align="left">0.7874</td>
<td align="left">0.7969</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn20">
<label>
<sup>a</sup>
</label>
<p>R: Recall.</p>
</fn>
<fn id="Tfn21">
<label>
<sup>b</sup>
</label>
<p>P: Precision.</p>
</fn>
<fn id="Tfn22">
<label>
<sup>c</sup>
</label>
<p>ACC: accuracy.</p>
</fn>
<fn>
<p>Note: the bold values in the table indicate the best results.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3-6">
<title>3.6 Visualization of words</title>
<p>LABAMPsGCN learned many word features related to the labels. To observe these words clearly, we visualized them qualitatively. <xref ref-type="fig" rid="F3">Figure 3</xref> shows the t-SNE visualization (<xref ref-type="bibr" rid="B38">Ruit et al., 2022</xref>) of the second-layer word features learned from DS-70% and DS-90%. We set the dimension of the maximum value in a word&#x2019;s feature vector as the label of the word. As can be seen from <xref ref-type="fig" rid="F3">Figure 3</xref>, words of the same color are clustered together, which means that a large number of words are closely related to certain specific classes. The red, green and orange colors in <xref ref-type="fig" rid="F3">Figure 3</xref> are used to determine whether the word embedding can learn the main information of some sequences; different colors represent different sequences. <xref ref-type="fig" rid="F3">Figure 3A</xref> and <xref ref-type="fig" rid="F3">Figure 3B</xref> are the results of DS-70% and DS-90%, respectively. In <xref ref-type="table" rid="T7">Table 7</xref>, we show the top representative words in each category, such as &#x201c;ILE,&#x201d; &#x201c;TIW,&#x201d; and &#x201c;KLK&#x201d;.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>The t-SNE visualization of the second layer features learned from DS-70% and DS-90%. <bold>(A)</bold> The second word features learned from DS-70%. <bold>(B)</bold> The second word features learned from DS-90%.</p>
</caption>
<graphic xlink:href="fgene-13-1062576-g003.tif"/>
</fig>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>Words with highest ACCs for two datasets of DS-70% and DS-90%. We used the word embedding at the last level of GCN to view the best performing words under each category.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th colspan="2" align="center">DS-70%</th>
<th colspan="2" align="center">DS-90%</th>
</tr>
<tr>
<th align="left">LABAMPs</th>
<th align="left">nonLABAMPs</th>
<th align="left">LABAMPs</th>
<th align="left">nonLABAMPs</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">ILE</td>
<td align="left">YET</td>
<td align="left">GSG</td>
<td align="left">FAD</td>
</tr>
<tr>
<td align="left">TIW</td>
<td align="left">MAV</td>
<td align="left">CIV</td>
<td align="left">EAE</td>
</tr>
<tr>
<td align="left">KLK</td>
<td align="left">RNF</td>
<td align="left">KYR</td>
<td align="left">GHH</td>
</tr>
<tr>
<td align="left">KDF</td>
<td align="left">LCH</td>
<td align="left">SAV</td>
<td align="left">KPP</td>
</tr>
<tr>
<td align="left">GDH</td>
<td align="left">RSS</td>
<td align="left">WHT</td>
<td align="left">FKF</td>
</tr>
<tr>
<td align="left">YQN</td>
<td align="left">WAL</td>
<td align="left">NAV</td>
<td align="left">FIL</td>
</tr>
<tr>
<td align="left">GTW</td>
<td align="left">FGW</td>
<td align="left">IQS</td>
<td align="left">VMM</td>
</tr>
<tr>
<td align="left">MPI</td>
<td align="left">WSG</td>
<td align="left">EYE</td>
<td align="left">PTD</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s4">
<title>4 Discussion</title>
<p>In this study, we constructed LABAMPsGCN, a novel graph-based identifier to predict LABAMPs accurately. In this identifier, we designed a graph convolutional neural network framework to automatically learn sequence features. By retrieving and reorganizing multiple AMP databases and the UniProt database, we constructed the positive and negative datasets. We organized the positive and negative samples into a large heterogeneous graph, transforming the sequence classification problem into a node classification problem. The graph convolutional neural network aggregates the information of surrounding nodes to predict the label of the central node.</p>
<p>LABAMPsGCN is superior to other predictors for two reasons. On the one hand, the graph structure can effectively represent the relationship between sequences and words: when constructing the graph, an edge is established between a word and a sequence whenever the word belongs to that sequence. On the other hand, the label information of sequences can be transferred through the edges of the graph. Because the graph structure is a many-to-many structure, the label information of sequences can propagate through the whole graph. In this way, the words corresponding to positive and negative labels can be easily distinguished. These words may be the key features that determine whether a sequence is a LABAMP.</p>
<p>For users&#x2019; convenience, we have established a publicly accessible web server (<ext-link ext-link-type="uri" xlink:href="http://www.dong-group.cn/database/dlabamp/Prediction/amplab/result/">http://www.dong-group.cn/database/dlabamp/Prediction/amplab/result/</ext-link>) that helps predict LABAMPs produced by various lactic acid bacteria. In the next step, we will investigate how to mine key fragments with antimicrobial function from whole genome sequences by combining information such as multiple sequence alignment and domain prediction. We believe LABAMPsGCN will be a competent tool for screening lactic acid bacteria strains with antimicrobial activities.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/<xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
</sec>
<sec id="s6">
<title>Author contributions</title>
<p>T-JS conducted experiments, analyzed data and wrote the manuscript. H-LB and XY collected data and made the webserver. Z-HS guided the collection of data and the construction of the webserver. G-FD and M-SZ supervised the experiment and managed the whole project.</p>
</sec>
<sec id="s7">
<title>Funding</title>
<p>This work has been supported by the High level Talent Fund Project of Inner Mongolia Agricultural University, China (No. NDYBH 2017-1, NDYB 2018-9), Inner Mongolia Natural Science Foundation Project (No.2021MS06023), the National Natural Science Foundation Project (No.31901666), Major Project of Inner Mongolia Natural Science Foundation (2020ZD12), and 2022 Basic Scientific Research Business Fee Project of Universities Directly under the Inner Mongolia Autonomous Region&#x2014;Interdisciplinary Research Fund of Inner Mongolia Agricultural University (BR22-14-01).</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s10">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2022.1062576/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2022.1062576/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.PDF" id="SM1" mimetype="application/PDF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aiken</surname>
<given-names>Z. A.</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pratten</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Evaluation of ATP bioluminescence assays for potential use in a hospital setting</article-title>. <source>Infect. Control Hosp. Epidemiol.</source> <volume>32</volume>, <fpage>507</fpage>&#x2013;<lpage>509</lpage>. <pub-id pub-id-type="doi">10.1086/659761</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aurelio</surname>
<given-names>Y. S.</given-names>
</name>
<name>
<surname>de Almeida</surname>
<given-names>G. M.</given-names>
</name>
<name>
<surname>de Castro</surname>
<given-names>C. L.</given-names>
</name>
<name>
<surname>Braga</surname>
<given-names>A. P.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Learning from imbalanced data sets with weighted cross-entropy function</article-title>. <source>Neural process. Lett.</source> <volume>50</volume>, <fpage>1937</fpage>&#x2013;<lpage>1949</lpage>. <pub-id pub-id-type="doi">10.1007/s11063-018-09977-1</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Azar</surname>
<given-names>A. T.</given-names>
</name>
<name>
<surname>El-Said</surname>
<given-names>S. A.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Performance analysis of support vector machines classifiers in breast cancer mammography recognition</article-title>. <source>Neural comput. Appl.</source> <volume>24</volume>, <fpage>1163</fpage>&#x2013;<lpage>1177</lpage>. <pub-id pub-id-type="doi">10.1007/s00521-012-1324-4</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beck</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sharon</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>New results on multi-dimensional linear discriminant analysis</article-title>. <source>Operations Res. Lett.</source> <volume>50</volume>, <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1016/j.orl.2021.11.003</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carrasco</surname>
<given-names>R. S. M.</given-names>
</name>
<name>
<surname>Sicilia</surname>
<given-names>M.-A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Unsupervised intrusion detection through skip-gram models of network behavior</article-title>. <source>Comput. Secur.</source> <volume>78</volume>, <fpage>187</fpage>&#x2013;<lpage>197</lpage>. <pub-id pub-id-type="doi">10.1016/j.cose.2018.07.003</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Christiansen</surname>
<given-names>J. S. L.</given-names>
</name>
<name>
<surname>Henriksen</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>H. L.</given-names>
</name>
<name>
<surname>Petersen</surname>
<given-names>C. L.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Filled Julia sets of Chebyshev polynomials</article-title>. <source>J. Geom. Anal.</source> <volume>31</volume>, <fpage>12250</fpage>&#x2013;<lpage>12263</lpage>. <pub-id pub-id-type="doi">10.1007/s12220-021-00716-y</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chung</surname>
<given-names>C.-R.</given-names>
</name>
<name>
<surname>Kuo</surname>
<given-names>T.-R.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>L.-C.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>T.-Y.</given-names>
</name>
<name>
<surname>Horng</surname>
<given-names>J.-T.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Characterization and identification of antimicrobial peptides with different functional activities</article-title>. <source>Brief. Bioinform.</source> <volume>21</volume>, <fpage>1098</fpage>&#x2013;<lpage>1114</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbz043</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<collab>The UniProt Consortium</collab>
</person-group> (<year>2021</year>). <article-title>UniProt: The universal protein knowledgebase in 2021</article-title>. <source>Nucleic Acids Res.</source> <volume>49</volume>, <fpage>D480</fpage>&#x2013;<lpage>D489</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkaa1100</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crouch</surname>
<given-names>S. P.</given-names>
</name>
<name>
<surname>Kozlowski</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Slater</surname>
<given-names>K. J.</given-names>
</name>
<name>
<surname>Fletcher</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1993</year>). <article-title>The use of ATP bioluminescence as a measure of cell proliferation and cytotoxicity</article-title>. <source>J. Immunol. Methods</source> <volume>160</volume>, <fpage>81</fpage>&#x2013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.1016/0022-1759(93)90011-u</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fagerland</surname>
<given-names>M. W.</given-names>
</name>
<name>
<surname>Hosmer</surname>
<given-names>D. W.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A generalized Hosmer-Lemeshow goodness-of-fit test for multinomial logistic regression models</article-title>. <source>Stata J.</source> <volume>12</volume>, <fpage>447</fpage>&#x2013;<lpage>453</lpage>. <pub-id pub-id-type="doi">10.1177/1536867x1201200307</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fjell</surname>
<given-names>C. D.</given-names>
</name>
<name>
<surname>Hancock</surname>
<given-names>R. E. W.</given-names>
</name>
<name>
<surname>Cherkasov</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>AMPer: A database and an automated discovery tool for antimicrobial peptides</article-title>. <source>Bioinformatics</source> <volume>23</volume>, <fpage>1148</fpage>&#x2013;<lpage>1155</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btm068</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fuentes-Beals</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Vald&#xE9;s-Jim&#xE9;nez</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Riadi</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Hidden Markov modeling with HMMTeacher</article-title>. <source>PLoS Comput. Biol.</source> <volume>18</volume>, <fpage>e1009703</fpage>&#x2013;<lpage>e1009709</lpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1009703</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A gating context-aware text classification model with BERT and graph convolutional networks</article-title>. <source>J. Intelligent Fuzzy Syst.</source> <volume>40</volume>, <fpage>4331</fpage>&#x2013;<lpage>4343</lpage>. <pub-id pub-id-type="doi">10.3233/jifs-201051</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garg</surname>
<given-names>S. B.</given-names>
</name>
<name>
<surname>Subrahmanyam</surname>
<given-names>V. V.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Sentiment analysis: Choosing the right word embedding for deep learning model</article-title>. <source>Lect. Notes Netw. Syst.</source> <volume>218</volume>, <fpage>417</fpage>&#x2013;<lpage>428</lpage>. <pub-id pub-id-type="doi">10.1007/978-981-16-2164-2_33</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gogoladze</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Grigolava</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Vishnepolsky</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Chubinidze</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Duroux</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Lefranc</surname>
<given-names>M.-P.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>DBAASP: Database of antimicrobial activity and structure of peptides</article-title>. <source>FEMS Microbiol. Lett.</source> <volume>357</volume>, <fpage>63</fpage>&#x2013;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1111/1574-6968.12489</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Greub</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Holliger</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sanglard</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Schrenzel</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Thiel</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Viollier</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>The Swiss society of microbiology: Small bugs, big questions and cool answers</article-title>. <source>Chimia</source> <volume>70</volume>, <fpage>874</fpage>&#x2013;<lpage>877</lpage>. <pub-id pub-id-type="doi">10.2533/chimia.2016.874</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Effect of <italic>Saccharomyces cerevisiae</italic> cell-free supernatant on the physiology, quorum sensing, and protein synthesis of lactic acid bacteria</article-title>. <source>LWT</source> <volume>165</volume>, <fpage>113732</fpage>. <pub-id pub-id-type="doi">10.1016/j.lwt.2022.113732</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gueguen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Garnier</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Robert</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lefranc</surname>
<given-names>M. P.</given-names>
</name>
<name>
<surname>Mougenot</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>de Lorgeril</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2006</year>). <article-title>PenBase, the shrimp antimicrobial peptide penaeidin database: Sequence-based classification and recommended nomenclature</article-title>. <source>Dev. Comp. Immunol.</source> <volume>30</volume>, <fpage>283</fpage>&#x2013;<lpage>288</lpage>. <pub-id pub-id-type="doi">10.1016/j.dci.2005.04.003</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hamid</surname>
<given-names>M. N.</given-names>
</name>
<name>
<surname>Friedberg</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Identifying antimicrobial peptides using word embedding with deep recurrent neural networks</article-title>. <source>Bioinformatics</source> <volume>35</volume>, <fpage>2009</fpage>&#x2013;<lpage>2016</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty937</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hammami</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Ben Hamida</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Vergoten</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Fliss</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>PhytAMP: A database dedicated to antimicrobial plant peptides</article-title>. <source>Nucleic Acids Res.</source> <volume>37</volume>, <fpage>D963</fpage>&#x2013;<lpage>D968</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkn655</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hammami</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zouhir</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lay</surname>
<given-names>C. L.</given-names>
</name>
<name>
<surname>Hamida</surname>
<given-names>J. B.</given-names>
</name>
<name>
<surname>Fliss</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Forsberg</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>BACTIBASE second release: A database and tool platform for bacteriocin characterization</article-title>. <source>BMC Microbiol.</source> <volume>10</volume>, <fpage>22</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2180-10-22</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Manavalan</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Accelerating bioactive peptide discovery via mutual information-based meta-learning</article-title>. <source>Brief. Bioinform.</source> <volume>23</volume>, <fpage>bbab499</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbab499</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Henderson</surname>
<given-names>K. A.</given-names>
</name>
<name>
<surname>Bialeschki</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>James</surname>
<given-names>P. A.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Overview of camp research</article-title>. <source>Child. Adolesc. Psychiatr. Clin. N. Am.</source> <volume>16</volume>, <fpage>755</fpage>&#x2013;<lpage>767</lpage>. <pub-id pub-id-type="doi">10.1016/j.chc.2007.05.010</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Kong</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Role of lactic acid bacteria in flavor development in traditional Chinese fermented foods: A review</article-title>. <source>Crit. Rev. Food Sci. Nutr.</source> <volume>62</volume>, <fpage>2741</fpage>&#x2013;<lpage>2755</lpage>. <pub-id pub-id-type="doi">10.1080/10408398.2020.1858269</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Preparation of trifluralin antibody and development of enzyme linked immunosorbent assay</article-title>. <source>Mod. Food Sci. Technol.</source> <volume>38</volume>, <fpage>345</fpage>&#x2013;<lpage>354</lpage>. <pub-id pub-id-type="doi">10.13982/j.mfst.1673-9078.2022.1.0470</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Fair-AdaBoost: Extending AdaBoost method to achieve fair classification</article-title>. <source>Expert Syst. Appl.</source> <volume>202</volume>, <fpage>117240</fpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2022.117240</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Nfdd: A dynamic malicious document detection method without manual feature dictionary</article-title>. <source>Lect. Notes Comput. Sci.</source> <volume>12938</volume>, <fpage>147</fpage>&#x2013;<lpage>159</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-86130-8_12</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kadomatsu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kishida</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tsubota</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>The heparin-binding growth factor midkine: The biological activities and candidate receptors</article-title>. <source>J. Biochem.</source> <volume>172</volume>, <fpage>511</fpage>&#x2013;<lpage>521</lpage>. <pub-id pub-id-type="doi">10.1093/jb/mvt035</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kai</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Puntambekar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Santiago</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S. H.</given-names>
</name>
<name>
<surname>Sehy</surname>
<given-names>D. W.</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>V.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>A novel microfluidic microplate as the next generation assay platform for enzyme linked immunoassays (ELISA)</article-title>. <source>Lab. Chip</source> <volume>12</volume>, <fpage>4257</fpage>&#x2013;<lpage>4262</lpage>. <pub-id pub-id-type="doi">10.1039/c2lc40585g</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kawashima</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pokarowski</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Pokarowska</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kolinski</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Katayama</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>AAindex: Amino acid index database, progress report 2008</article-title>. <source>Nucleic Acids Res.</source> <volume>36</volume>, <fpage>D202</fpage>&#x2013;<lpage>D205</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkm998</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kouw</surname>
<given-names>W. M.</given-names>
</name>
<name>
<surname>Loog</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Target robust discriminant analysis</article-title>. <source>Lect. Notes Comput. Sci.</source> <volume>12644</volume>, <fpage>3</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-73973-7_1</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lauger</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Apell</surname>
<given-names>H. J.</given-names>
</name>
</person-group> (<year>1988</year>). <article-title>Transient behaviour of the Na&#x2b;/K&#x2b;-pump: Microscopic analysis of nonstationary ion-translocation</article-title>. <source>Biochim. Biophys. Acta</source> <volume>944</volume>, <fpage>451</fpage>&#x2013;<lpage>464</lpage>. <pub-id pub-id-type="doi">10.1016/0005-2736(88)90516-0</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Godzik</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>CD-HIT: A fast program for clustering and comparing large sets of protein or nucleotide sequences</article-title>. <source>Bioinformatics</source> <volume>22</volume>, <fpage>1658</fpage>&#x2013;<lpage>1659</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btl158</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>W. H.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J. H.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>G. L.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>R. Q.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>An optical fiber sensor for berberine based on immobilized 1,4-bis(naphth[2,1-d]oxazole-2-yl)benzene in a new copolymer</article-title>. <source>Talanta</source> <volume>46</volume>, <fpage>679</fpage>&#x2013;<lpage>688</lpage>. <pub-id pub-id-type="doi">10.1016/s0039-9140(97)00330-5</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mulvenna</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Craik</surname>
<given-names>D. J.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>CyBase: A database of cyclic protein sequence and structure</article-title>. <source>Nucleic Acids Res.</source> <volume>34</volume>, <fpage>D192</fpage>&#x2013;<lpage>D194</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkj005</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pirtskhalava</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Amstrong</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Grigolava</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Chubinidze</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Alimbarashvili</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Vishnepolsky</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>DBAASP v3: Database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics</article-title>. <source>Nucleic Acids Res.</source> <volume>49</volume>, <fpage>D288</fpage>&#x2013;<lpage>D297</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkaa991</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qu</surname>
<given-names>Y. H.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>X. J.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>J. H.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>H. S.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach</article-title>. <source>PLoS One</source> <volume>12</volume>, <fpage>e0188129</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0188129</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van de Ruit</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Billeter</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Eisemann</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>An efficient dual-hierarchy t-SNE minimization</article-title>. <source>IEEE Trans. Vis. Comput. Graph.</source> <volume>28</volume>, <fpage>614</fpage>&#x2013;<lpage>622</lpage>. <pub-id pub-id-type="doi">10.1109/tvcg.2021.3114817</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saibil</surname>
<given-names>H. R.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Cryo-EM in molecular and cellular biology</article-title>. <source>Mol. Cell.</source> <volume>82</volume>, <fpage>274</fpage>&#x2013;<lpage>284</lpage>. <pub-id pub-id-type="doi">10.1016/j.molcel.2021.12.016</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Dietrich</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Nettelblad</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Training algorithm matters for the performance of neural network potential: A case study of Adam and the kalman filter optimizers</article-title>. <source>J. Chem. Phys.</source> <volume>155</volume>, <fpage>204108</fpage>. <pub-id pub-id-type="doi">10.1063/5.0070931</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>DRAMP 3.0: An enhanced comprehensive data repository of antimicrobial peptides</article-title>. <source>Nucleic Acids Res.</source> <volume>50</volume>, <fpage>D488</fpage>&#x2013;<lpage>D496</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkab651</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sim</surname>
<given-names>N.-L.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Henikoff</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schneider</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Ng</surname>
<given-names>P. C.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>SIFT web server: Predicting effects of amino acid substitutions on proteins</article-title>. <source>Nucleic Acids Res.</source> <volume>40</volume>, <fpage>W452</fpage>&#x2013;<lpage>W457</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks539</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Song</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lan</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Label-free fluorescent aptasensor berberine-based strategy for ultrasensitive detection of Hg2&#x2b; ion</article-title>. <source>Spectrochim. Acta. A Mol. Biomol. Spectrosc.</source> <volume>204</volume>, <fpage>301</fpage>&#x2013;<lpage>307</lpage>. <pub-id pub-id-type="doi">10.1016/j.saa.2018.06.058</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sundararajan</surname>
<given-names>V. S.</given-names>
</name>
<name>
<surname>Gabere</surname>
<given-names>M. N.</given-names>
</name>
<name>
<surname>Pretorius</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Adam</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Christoffels</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lehvaeslaiho</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Dampd: A manually curated antimicrobial peptide database</article-title>. <source>Nucleic Acids Res.</source> <volume>40</volume>, <fpage>D1108</fpage>&#x2013;<lpage>D1112</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkr1063</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cho</surname>
<given-names>S. W.</given-names>
</name>
<name>
<surname>Yim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Seo</surname>
<given-names>S. W.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Engineering tools for the development of recombinant lactic acid bacteria</article-title>. <source>Biotechnol. J.</source> <volume>15</volume>, <fpage>e1900344</fpage>. <pub-id pub-id-type="doi">10.1002/biot.201900344</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Teusink</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Molenaar</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Systems biology of lactic acid bacteria: For food and thought</article-title>. <source>Curr. Opin. Syst. Biol.</source> <volume>6</volume>, <fpage>7</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1016/j.coisb.2017.07.005</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Theolier</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fliss</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Jean</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hammami</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>MilkAMP: A comprehensive database of antimicrobial peptides of dairy origin</article-title>. <source>Dairy Sci. Technol.</source> <volume>94</volume>, <fpage>181</fpage>&#x2013;<lpage>193</lpage>. <pub-id pub-id-type="doi">10.1007/s13594-013-0153-2</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Van Dyk</surname>
<given-names>T. K.</given-names>
</name>
<name>
<surname>Majarian</surname>
<given-names>W. R.</given-names>
</name>
<name>
<surname>Konstantinov</surname>
<given-names>K. B.</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>R. M.</given-names>
</name>
<name>
<surname>Dhurjati</surname>
<given-names>P. S.</given-names>
</name>
<name>
<surname>LaRossa</surname>
<given-names>R. A.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Rapid and sensitive pollutant detection by induction of heat shock gene-bioluminescence gene fusions</article-title>. <source>Appl. Environ. Microbiol.</source> <volume>60</volume>, <fpage>1414</fpage>&#x2013;<lpage>1420</lpage>. <pub-id pub-id-type="doi">10.1128/aem.60.5.1414-1420.1994</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Veltri</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kamath</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Shehu</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Deep learning improves antimicrobial peptide recognition</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>2740</fpage>&#x2013;<lpage>2747</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty179</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Furuhata</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yamakawa</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Regmi</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Graph neural network-enabled manufacturing method classification from engineering drawings</article-title>. <source>Comput. Ind.</source> <volume>142</volume>, <fpage>103697</fpage>. <pub-id pub-id-type="doi">10.1016/j.compind.2022.103697</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiong</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>New generation model of word vector representation based on CBOW or skip-gram</article-title>. <source>Comput. Mat. Contin.</source> <volume>60</volume>, <fpage>259</fpage>&#x2013;<lpage>273</lpage>. <pub-id pub-id-type="doi">10.32604/cmc.2019.05155</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zeng</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F.-X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>DeepEP: A deep learning framework for identifying essential proteins</article-title>. <source>BMC Bioinforma.</source> <volume>20</volume>, <fpage>506</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-019-3076-y</pub-id> </citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhai</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>An instance selection algorithm for fuzzy K-nearest neighbor</article-title>. <source>J. Intell. Fuzzy Syst.</source> <volume>40</volume>, <fpage>521</fpage>&#x2013;<lpage>533</lpage>. <pub-id pub-id-type="doi">10.3233/jifs-200124</pub-id> </citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Altered dynamic functional connectivity in rectal cancer patients with and without chemotherapy: A resting-state fMRI study</article-title>. <source>Int. J. Neurosci.</source> <volume>18</volume>, <fpage>1</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1080/00207454.2022.2130295</pub-id> </citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Graph neural networks: Taxonomy, advances, and trends</article-title>. <source>ACM Trans. Intell. Syst. Technol.</source> <volume>13</volume>, <fpage>1</fpage>&#x2013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1145/3495161</pub-id> </citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ziegler</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Koenig</surname>
<given-names>I. R.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Mining data with random forests: Current options for real-world applications</article-title>. <source>WIREs. Data Min. Knowl. Discov.</source> <volume>4</volume>, <fpage>55</fpage>&#x2013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1002/widm.1114</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>