<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Physiol.</journal-id>
<journal-title>Frontiers in Physiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Physiol.</abbrev-journal-title>
<issn pub-type="epub">1664-042X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fphys.2021.658633</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Physiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Transfer Learning-Based Approach for Lysine Propionylation Prediction</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Li</surname> <given-names>Ang</given-names></name>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1167577/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Deng</surname> <given-names>Yingwei</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x2020;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Tan</surname> <given-names>Yan</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Chen</surname> <given-names>Min</given-names></name>
<xref ref-type="corresp" rid="c002"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/629707/overview"/>
</contrib>
</contrib-group>
<aff><institution>School of Computer Science and Technology, Hunan Institute of Technology</institution>, <addr-line>Hengyang</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Yu Xue, Huazhong University of Science and Technology, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Han Cheng, Zhengzhou University, China; Jian-Ding Qiu, Nanchang University, China; Yan Xu, University of Science and Technology Beijing, China</p></fn>
<corresp id="c001">&#x002A;Correspondence: Yingwei Deng, <email>dengyingwei@hnit.edu.cn</email></corresp>
<corresp id="c002">Min Chen, <email>chenmin@hnit.edu.cn</email></corresp>
<fn fn-type="other" id="fn002"><p><sup>&#x2020;</sup>These authors share first authorship</p></fn>
<fn fn-type="other" id="fn004"><p>This article was submitted to Systems Biology, a section of the journal Frontiers in Physiology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>21</day>
<month>04</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>658633</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>01</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>03</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Li, Deng, Tan and Chen.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Li, Deng, Tan and Chen</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Lysine propionylation is a recently discovered posttranslational modification (PTM) that plays a key role in cellular processes. Although proteomics techniques are capable of detecting propionylation, large-scale detection remains challenging. To bridge this gap, we present a transfer learning-based method for computationally predicting propionylation sites. A recurrent neural network-based deep learning model was first trained on malonylation data and then fine-tuned on propionylation data. The trained model served as a feature extractor that translated input protein sequences into numerical vectors, and a support vector machine was used as the final classifier. The proposed method reached a Matthews correlation coefficient (MCC) of 0.6615 in 10-fold cross validation and 0.3174 on the independent test, outperforming state-of-the-art methods. Enrichment analysis indicated that propionylation was associated with the GO terms GO:0016620, GO:0051287, GO:0003735, GO:0006096, and GO:0005737 and with metabolism. We developed a user-friendly online tool for predicting propionylation sites, which is available at <ext-link ext-link-type="uri" xlink:href="http://47.113.117.61/">http://47.113.117.61/</ext-link>.</p>
</abstract>
<kwd-group>
<kwd>propionylation</kwd>
<kwd>malonylation</kwd>
<kwd>deep learning</kwd>
<kwd>transfer learning</kwd>
<kwd>recurrent neural network</kwd>
<kwd>long short term memory</kwd>
<kwd>support vector machine</kwd>
</kwd-group>
<contract-num rid="cn001">2019JJ40064</contract-num>
<contract-sponsor id="cn001">Natural Science Foundation of Hunan Province<named-content content-type="fundref-id">10.13039/501100004735</named-content></contract-sponsor>
<counts>
<fig-count count="4"/>
<table-count count="6"/>
<equation-count count="10"/>
<ref-count count="54"/>
<page-count count="9"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1">
<title>Introduction</title>
<p>No machine is more sophisticated than the cell, which hosts many intricate mechanisms, including transcription, gene splicing, translation, and posttranslational modification (PTM). As a key mechanism, PTM not only increases the diversity of protein structures and functions but also makes regulation more sophisticated. Many studies have indicated that aberrant PTM is implicated in many human diseases, including cancer (<xref ref-type="bibr" rid="B28">Martin et al., 2011</xref>; <xref ref-type="bibr" rid="B30">Nakamura et al., 2015</xref>; <xref ref-type="bibr" rid="B19">Junqueira et al., 2019</xref>). Propionylation, one of more than 400 types of PTM, was first discovered in histones in 2007 (<xref ref-type="bibr" rid="B6">Chen et al., 2007</xref>) and later in nonhistone proteins (<xref ref-type="bibr" rid="B7">Cheng et al., 2009</xref>). Propionylation is a dynamic process in which a propionyl group is conjugated to substrate proteins by some acetyltransferases and can be removed by Sirt1 and Sirt2 (<xref ref-type="bibr" rid="B6">Chen et al., 2007</xref>; <xref ref-type="bibr" rid="B21">Leemhuis et al., 2008</xref>; <xref ref-type="bibr" rid="B53">Zhang et al., 2008</xref>; <xref ref-type="bibr" rid="B7">Cheng et al., 2009</xref>). Although lysine propionylation is known to play a regulatory role in metabolism (<xref ref-type="bibr" rid="B52">Yang et al., 2019</xref>) and to be a mark of active chromatin (<xref ref-type="bibr" rid="B20">Kebede et al., 2017</xref>), many of its functions remain unknown.</p>
<p>Identifying propionylation sites is crucial for further exploring the functions of propionylated proteins. Mass spectrometry has been used to detect propionylation sites over the past decades with great success (<xref ref-type="bibr" rid="B6">Chen et al., 2007</xref>). However, this technique is time consuming and labor intensive. An alternative is computational methods, which learn a model from known data and then make predictions for unknown data, a process similar to human learning. In the past 30 years, more than 100 computational methods or tools have been developed for predicting PTM sites (<xref ref-type="bibr" rid="B14">Huang and Zeng, 2016</xref>; <xref ref-type="bibr" rid="B54">Zhou et al., 2016</xref>; <xref ref-type="bibr" rid="B1">Ai et al., 2017</xref>; <xref ref-type="bibr" rid="B48">Wei et al., 2017</xref>, <xref ref-type="bibr" rid="B47">2019</xref>; <xref ref-type="bibr" rid="B49">Xiang et al., 2017</xref>; <xref ref-type="bibr" rid="B5">Chen et al., 2018</xref>; <xref ref-type="bibr" rid="B9">de Brevern et al., 2018</xref>; <xref ref-type="bibr" rid="B32">Ning et al., 2018</xref>, <xref ref-type="bibr" rid="B31">2019</xref>; <xref ref-type="bibr" rid="B50">Xie et al., 2018</xref>; <xref ref-type="bibr" rid="B16">Huang et al., 2019</xref>, <xref ref-type="bibr" rid="B15">2020</xref>; <xref ref-type="bibr" rid="B25">Luo et al., 2019</xref>; <xref ref-type="bibr" rid="B27">Malebary et al., 2019</xref>; <xref ref-type="bibr" rid="B45">Wang et al., 2019</xref>; <xref ref-type="bibr" rid="B26">Lv et al., 2020</xref>; <xref ref-type="bibr" rid="B39">Qian et al., 2020</xref>; <xref ref-type="bibr" rid="B42">Thapa et al., 2020</xref>). For example, <xref ref-type="bibr" rid="B27">Malebary et al. 
(2019)</xref> proposed a computational model for lysine crotonylation prediction by integrating various position- and composition-related features along with statistical moments, and reached an average accuracy of 0.9917 on the experimental dataset. <xref ref-type="bibr" rid="B5">Chen et al. (2018)</xref> presented a computational tool named ProAcePred to predict prokaryote lysine acetylation sites by extracting sequence-based, physicochemical property, and evolutionary information features. <xref ref-type="bibr" rid="B48">Wei et al. (2017</xref>, <xref ref-type="bibr" rid="B47">2019)</xref> used sequence-based information to build computational models for predicting phosphorylation sites and protein methylation sites, respectively. Although propionylation is a newly discovered PTM, two computational methods have already been developed to detect propionylation sites. One is a biased support vector machine (SVM) model (<xref ref-type="bibr" rid="B18">Ju and He, 2017</xref>), which incorporated four different sequence features into Chou&#x2019;s pseudo-amino acid composition. The other is PropSeek, also an SVM model, which exploited evolutionary information, sequence-derived information, predicted structural information, and feature annotations (<xref ref-type="bibr" rid="B46">Wang et al., 2017</xref>). Advances in deep learning techniques could accelerate the development of propionylation prediction. A well-known example is AlphaFold, a deep-learning-based method that accurately determined a protein&#x2019;s 3D shape from its amino-acid sequence (<xref ref-type="bibr" rid="B3">Callaway, 2020</xref>). Determining protein structure, especially in three dimensions, is one of biology&#x2019;s grandest challenges, and to date no better technique has addressed it. In this paper, we attempted to build a deep learning model to predict propionylation sites. 
However, the accumulated propionylation data were too few to train a deep learning model well. Lysine propionylation has <italic>in situ</italic> crosstalk with lysine malonylation. <xref ref-type="bibr" rid="B46">Wang et al. (2017)</xref> statistically compared 1,471 propionylation sites in 605 proteins with a dataset of 1,745 malonylation sites in 595 proteins and found that 600 (40.8%) of the 1,471 propionylation sites overlapped with malonylation sites. Moreover, known malonylation sites far outnumber propionylation sites. Inspired by this, we proposed a transfer learning method for predicting propionylation sites. We first constructed a recurrent neural network (RNN)-based deep learning model, which was trained on the malonylation data. The model was then fine-tuned on the propionylation data and served as a feature extractor. Finally, an SVM-based classifier was trained to discriminate propionylation from nonpropionylation sites.</p>
</sec>
<sec id="S2">
<title>Data</title>
<p>All lysine propionylation sites were collected from both the protein lysine modifications database (PLMD) (<xref ref-type="bibr" rid="B51">Xu et al., 2017</xref>) and the UniProt database (<xref ref-type="bibr" rid="B43">UniProt Consortium, 2018</xref>). The PLMD is dedicated to collecting lysine modifications and currently hosts 284,780 modification events in 53,501 proteins for 20 types of lysine modification, such as ubiquitination, methylation, and sumoylation. UniProt is a comprehensive database of protein sequences and functional annotations. We first downloaded 192 proteins containing 413 propionyllysine sites from the PLMD (<ext-link ext-link-type="uri" xlink:href="http://plmd.biocuckoo.org/download.php">http://plmd.biocuckoo.org/download.php</ext-link>). We then retrieved 18 propionylated proteins from the UniProt database. After merging the two protein sets and removing duplicates, we obtained 207 unique proteins. Protein functions, including protein modification, depend to some extent on homology. To reduce the influence of homology on the proposed method, we applied the sequence clustering software CD-HIT (<xref ref-type="bibr" rid="B23">Li and Godzik, 2006</xref>) with the sequence identity threshold set to 0.7. Finally, we obtained 189 proteins as experimental data, in which the sequence identity between any two proteins was less than 0.7. We randomly selected 4/5 of the 189 proteins (151), which contained 304 sites, as positive training samples, and used the remaining 38 proteins, which contained 104 sites, as positive testing samples. Lysine sites in general far outnumbered lysine propionylation sites, so negative samples greatly outnumbered positive ones. Such an imbalance would bias the trained model toward negative samples.
Therefore, we randomly selected lysine sites that did not undergo PTM from these proteins as negative samples, at a positive-to-negative ratio of 1:1. The training set consisted of 304 positive and 304 negative lysine sites, and the testing set consisted of 104 positive and 104 negative lysine sites. All the positive and negative sites are listed in the <xref ref-type="supplementary-material" rid="DS1">Supplementary Material</xref>.</p>
<p>We also downloaded 3,429 malonylated proteins containing 9,584 malonylation sites. Similarly, we randomly chose the same number of lysine sites that did not undergo malonylation as negative samples (nonmalonylation sites). Therefore, the malonylation set contained 9,584 malonylation sites and 9,584 nonmalonylation lysine sites.</p>
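<p>The 1:1 negative sampling described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name <italic>sample_negative_sites</italic>, the data layout, and the toy sequences are hypothetical and serve only to show one way of drawing unmodified lysines at random.</p>
<preformat><![CDATA[
```python
import random


def sample_negative_sites(proteins, modified_sites, n_negative, seed=0):
    """Randomly pick lysine sites that carry no known modification.

    proteins: dict mapping a protein id to its amino acid sequence.
    modified_sites: set of (protein id, 0-based position) pairs to exclude.
    All names here are hypothetical, not taken from the authors' code.
    """
    candidates = [
        (protein_id, position)
        for protein_id, sequence in sorted(proteins.items())
        for position, residue in enumerate(sequence)
        if residue == "K" and (protein_id, position) not in modified_sites
    ]
    if n_negative > len(candidates):
        raise ValueError("fewer unmodified lysines than requested negatives")
    return random.Random(seed).sample(candidates, n_negative)


# Toy data (assumed for illustration only).
proteins = {"P1": "MKAKK", "P2": "KGK"}
positives = {("P1", 1), ("P2", 0)}
# Draw as many negatives as positives to obtain a 1:1 ratio.
negatives = sample_negative_sites(proteins, positives, len(positives))
```
]]></preformat>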
</sec>
<sec id="S3" sec-type="materials|methods">
<title>Materials and Methods</title>
<p>As shown in <xref ref-type="fig" rid="F1">Figure 1</xref>, the proposed method consisted of three main steps (feature encoding, classifier training, and propionylation prediction) and eight modules: segmenting sequences, constructing a deep RNN model, training the deep RNN model, extracting features, constructing the SVM model, optimizing the window size and the hyperparameters of the SVM model, training the SVM model, and predicting propionylation with the trained SVM model. We used the malonylation dataset to train the RNN model and then fine-tuned the trained model on the training set of propionylation data. Propionylation sequences were fed into the fine-tuned deep RNN model, and the outputs of its penultimate layer were taken as features of the propionylation sequences. The subsequent workflow followed the standard machine learning procedure.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>The workflow of the proposed method.</p></caption>
<graphic xlink:href="fphys-12-658633-g001.tif"/>
</fig>
<sec id="S3.SS1">
<title>Segmenting Sequences</title>
<p>As shown in <xref ref-type="fig" rid="F2">Figure 2A</xref>, protein sequences were segmented into peptides centered on a lysine, with <italic>n</italic> residues on each side (upstream and downstream). If fewer than <italic>n</italic> residues were available on either side, the missing positions were filled with the character <italic>X</italic>, as shown in <xref ref-type="fig" rid="F2">Figure 2B</xref>. Each peptide was thus a window of residues of fixed size (2&#x00D7;<italic>n</italic> + 1). We obtained 816 peptides for the propionylation dataset and 9,584 + 9,584 = 19,168 peptides for the malonylation dataset described above.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Illustration of segmenting protein sequences. <bold>(A)</bold> is normal segment; <bold>(B)</bold> is segment when the number of residues is less than 8.</p></caption>
<graphic xlink:href="fphys-12-658633-g002.tif"/>
</fig>
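<p>The segmentation step can be sketched in a few lines of Python. This is a minimal illustration, assuming 0-based positions and a hypothetical helper name <italic>extract_windows</italic>; it is not the authors' implementation.</p>
<preformat><![CDATA[
```python
def extract_windows(sequence, n, pad="X"):
    """Return (position, peptide) for every lysine in the sequence.

    Each peptide has length 2 * n + 1 with the lysine at its center.
    The helper name and the 0-based positions are illustrative assumptions.
    """
    # Padding both ends with n copies of "X" handles lysines near a terminus.
    padded = pad * n + sequence + pad * n
    windows = []
    for position, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string the lysine sits at index position + n,
            # so the window spans position .. position + 2 * n inclusive.
            windows.append((position, padded[position:position + 2 * n + 1]))
    return windows


windows = extract_windows("MKTAYK", n=2)
```
]]></preformat>
<p>With the toy sequence above, both lysines lie close to a terminus, so both peptides receive padding characters.</p>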
</sec>
<sec id="S3.SS2">
<title>Deep RNN Model</title>
<p>As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, the deep RNN model was made up of one embedding layer, two long short-term memory (LSTM) layers, one gated recurrent unit (GRU) layer, one dropout layer, one flatten layer, one fully connected layer, and one output layer. The embedding layer translated integer indices of amino acid characters into embedding vectors. In natural language processing, the embedding layer is generally regarded as a bridge from text to numerical vectors. The LSTM (<xref ref-type="bibr" rid="B12">Hochreiter and Schmidhuber, 1997</xref>) is a type of RNN (<xref ref-type="bibr" rid="B36">Pearlmutter, 1989</xref>; <xref ref-type="bibr" rid="B10">Giles et al., 1994</xref>). An RNN shares network weights across steps, and its output at the current step depends not only on the input at the current step but also on the outputs at previous steps. Owing to its effectiveness and efficiency, the RNN has been widely applied in sequence analysis and time-series analysis. However, a plain RNN struggles to retain information from inputs that are far from the current input, and the LSTM is one of the better solutions to this problem. The typical LSTM includes three gates: a forget gate, an input gate, and an output gate. The forget gate discards selected past information, the input gate admits selected current information, and the output gate controls how much of the memory is exposed as output. All three gates adopt the sigmoid as the activation function, whose output ranges from 0 to 1: an output of 0 means that no information is passed, and an output of 1 means that all information is passed. The LSTM also includes a candidate memory cell, which fuses current and past memories. The GRU is a variant of the LSTM. Compared with the LSTM, the GRU includes only two gates, a reset gate and an update gate, and has no separate memory cell. The reset gate determines which past information is forgotten, and the update gate drops some past information and adds some new information.
The GRU involves fewer operations than the LSTM, so it is faster to compute. To capture semantic information in both directions, we used a bidirectional LSTM and a bidirectional GRU.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>The RNN-based deep learning model.</p></caption>
<graphic xlink:href="fphys-12-658633-g003.tif"/>
</fig>
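<p>To make the gating mechanism concrete, the following Python sketch computes a single step of a textbook LSTM cell with scalar input and state. It is a simplified illustration only: the function <italic>lstm_step</italic> and the parameter layout are assumptions, and the model in this study used vector-valued, bidirectional layers from a deep learning library.</p>
<preformat><![CDATA[
```python
import math


def sigmoid(z):
    """Logistic function; its output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell (textbook form, illustrative only).

    params maps "forget", "input", "output", and "candidate" to
    (input weight, recurrent weight, bias) triples; this layout is assumed.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))   # share of old memory to keep
    i = sigmoid(pre_activation("input"))    # share of new candidate to admit
    o = sigmoid(pre_activation("output"))   # share of memory to expose
    c_candidate = math.tanh(pre_activation("candidate"))
    c = f * c_prev + i * c_candidate        # fuse past and current memories
    h = o * math.tanh(c)
    return h, c


# With all-zero parameters every gate equals 0.5 and the candidate is 0.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```
]]></preformat>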
<p>Deep learning models are prone to overfitting and can be time consuming to train. <xref ref-type="bibr" rid="B11">Hinton et al. (2012)</xref> proposed the dropout operation to prevent neural networks from overfitting. During training, dropout deactivates randomly selected neurons at a given dropout rate so that their weights are not updated, whereas all neurons are used during testing. Since its introduction, dropout has become a prevalent technique in deep learning models (<xref ref-type="bibr" rid="B40">Srivastava et al., 2014</xref>).</p>
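<p>A minimal Python sketch of dropout is given below. It assumes the common "inverted" variant, in which surviving activations are rescaled during training so that no rescaling is needed at test time; this choice, the function name, and the dropout rate of 0.5 are assumptions for illustration and are not taken from the model in this study.</p>
<preformat><![CDATA[
```python
import random


def dropout(activations, rate, training, rng=None):
    """Apply inverted dropout to a list of activations (illustrative sketch).

    rate is the probability of dropping each neuron. During testing
    (training=False) all activations pass through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Kept activations are divided by the keep probability so that the
    # expected value of each output matches the corresponding input.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


train_out = dropout([1.0] * 1000, rate=0.5, training=True, rng=random.Random(0))
test_out = dropout([1.0, 2.0], rate=0.5, training=False)
```
]]></preformat>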
<p>The flatten layer was a bridge between the LSTM layer and the fully connected layer; its sole purpose was to reshape its input so that it could be connected to the subsequent fully connected layer. The fully connected layer corresponded to the hidden layer in a multilayer perceptron. The number of neurons in the output layer corresponded to the number of class labels.</p>
</sec>
<sec id="S3.SS3">
<title>Support Vector Machine</title>
<p>The SVM, proposed by Vapnik et al. (<xref ref-type="bibr" rid="B2">Boser et al., 1992</xref>; Cortes et al. 1995; Vapnik et al. 1998), is a statistical learning algorithm. Owing to its solid mathematical foundation, the SVM has been applied to a wide range of fields, from handwritten digit recognition (<xref ref-type="bibr" rid="B29">Matic et al., 1993</xref>), text categorization (<xref ref-type="bibr" rid="B17">Joachims, 1999</xref>), and face image detection (<xref ref-type="bibr" rid="B35">Osuna et al., 1997</xref>) to protein/gene structure or function prediction (<xref ref-type="bibr" rid="B4">Caragea et al., 2007</xref>; <xref ref-type="bibr" rid="B37">Plewczynski et al., 2008</xref>; <xref ref-type="bibr" rid="B22">Li et al., 2009</xref>; <xref ref-type="bibr" rid="B38">Pugalenthi et al., 2010</xref>; <xref ref-type="bibr" rid="B24">Li et al., 2011</xref>; <xref ref-type="bibr" rid="B41">Sun et al., 2015</xref>; <xref ref-type="bibr" rid="B32">Ning et al., 2018</xref>). Consider, for example, a binary classification problem with <italic>n</italic> training samples {(<italic>x</italic><sub><italic>i</italic></sub>,<italic>y</italic><sub><italic>i</italic></sub>)|<italic>i</italic> = 1,2,&#x2026;,<italic>n</italic>} where <italic>y</italic><sub><italic>i</italic></sub> &#x2208; {1,&#x2212;1}. The SVM aims to find a hyperplane <italic>f</italic>(<italic>x</italic>) = <italic>w</italic><italic>x</italic> + <italic>b</italic> that separates samples with the positive label 1 from those with the label &#x2212;1. That is to say, positive samples satisfy <italic>f</italic>(<italic>x</italic>) = <italic>w</italic><italic>x</italic> + <italic>b</italic> &#x003E; 0 and negative ones satisfy <italic>f</italic>(<italic>x</italic>) = <italic>w</italic><italic>x</italic> + <italic>b</italic> &#x003C; 0. In general, many hyperplanes meet this requirement, and the SVM seeks the one that maximizes the separating margin.
This problem is modeled as minimizing the following objective:</p>
<disp-formula id="S3.E1">
<label>(1)</label>
<mml:math id="M1">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo rspace="8.1pt">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>subject to the constraints:</p>
<disp-formula id="S3.E2">
<label>(2)</label>
<mml:math id="M2">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo rspace="8.1pt" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+2.8pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo>&#x2062;</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>3</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x2026;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In the real world, training samples are not always completely separable by a hyperplane; that is to say, some samples fall on the wrong side of the hyperplane. To address this issue, the SVM introduces the slack variables &#x03BE;<sub><italic>i</italic></sub>. The objective function (1) is rewritten as</p>
<disp-formula id="S3.E3">
<label>(3)</label>
<mml:math id="M3">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mo>&#x03BE;</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo rspace="8.1pt">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mpadded width="+5.6pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:msub>
<mml:mo>&#x03BE;</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>C</italic> is the penalty factor, a user-specified hyperparameter, and the constraint (2) is rewritten as</p>
<disp-formula id="S3.E4">
<label>(4)</label>
<mml:math id="M4">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo rspace="8.1pt" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mo>&#x03BE;</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="9.2pt">,</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mpadded width="+2.8pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>3</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x2026;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="5.8pt">,</mml:mo>
<mml:mrow>
<mml:mpadded width="+5.6pt">
<mml:msub>
<mml:mo>&#x03BE;</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mpadded>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>The objective function is composed of the structural risk (the first term in Eq. 3) and the empirical risk (the second term in Eq. 3). The penalty factor controls the trade-off between the two risks. Another advantage of the SVM is that it incorporates kernel functions. Samples that are not separable in a low-dimensional space may become separable in a higher-dimensional space. The SVM first exploits the kernel function to map such samples from the low-dimensional space into a high-dimensional space and then finds a high-dimensional hyperplane to separate them, which is expressed by</p>
<disp-formula id="S3.E5">
<label>(5)</label>
<mml:math id="M5">
<mml:mrow>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo rspace="5.3pt" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="5.3pt">=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi>w</mml:mi>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:mo>&#x2062;</mml:mo>
<mml:mi mathvariant="normal">&#x03D5;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where &#x03D5;(<italic>x</italic>) is the mapping defined by a kernel function. More than ten kernel functions are available, such as the linear kernel <inline-formula><mml:math id="INEQ7"><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03D5;</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo rspace="8.1pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mo>&#x2062;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula>, the polynomial kernel <inline-formula><mml:math id="INEQ8"><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03D5;</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo rspace="8.1pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>&#x2062;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mo>&#x2062;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, and the Gaussian kernel <inline-formula><mml:math id="INEQ9"><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03D5;</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo rspace="5.3pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:msup><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x2062;</mml:mo><mml:msup><mml:mi mathvariant="normal">&#x03C3;</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></inline-formula> among others. The corresponding constraints are updated as</p>
<disp-formula id="S3.E6">
<label>(6)</label>
<mml:math id="M6">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi mathvariant="normal">&#x03D5;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo rspace="8.1pt" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mi>&#x03BE;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="11.9pt">,</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mpadded width="+5.6pt">
<mml:mi>i</mml:mi>
</mml:mpadded>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>3</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x2026;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="8.6pt">,</mml:mo>
<mml:mrow>
<mml:mpadded width="+5.6pt">
<mml:msub>
<mml:mi>&#x03BE;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mpadded>
<mml:mo>&#x2265;</mml:mo>
<mml:mpadded width="+2.8pt">
<mml:mn>0</mml:mn>
</mml:mpadded>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>The SVM optimization problem can be solved through its dual formulation with the Lagrange multiplier method. Readers can refer to the relevant literature for details.</p>
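<p>As a minimal illustration of the three kernels named above, the following Python sketch evaluates them on plain lists of numbers. The function names and the default values of <italic>a</italic>, <italic>c</italic>, <italic>d</italic>, and &#x03C3; are assumptions made for this example and are not settings reported in this study.</p>
<preformat>
```python
import math


def linear_kernel(xi, xj):
    """Linear kernel: the inner product xi^T xj."""
    return sum(u * v for u, v in zip(xi, xj))


def polynomial_kernel(xi, xj, a=1.0, c=1.0, d=2):
    """Polynomial kernel: (a * xi^T xj + c) ** d (a, c, d are example values)."""
    return (a * linear_kernel(xi, xj) + c) ** d


def gaussian_kernel(xi, xj, sigma=1.0):
    """Gaussian kernel: exp(-|xi - xj|^2 / (2 * sigma^2)) (sigma is an example value)."""
    squared_distance = sum((u - v) ** 2 for u, v in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```
</preformat>
<p>For two identical vectors the Gaussian kernel returns 1, and it decays toward 0 as the vectors move apart, which is why it is often read as a similarity measure.</p>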
</sec>
</sec>
<sec id="S4">
<title>Cross-Validation and Metrics</title>
<p>For regression or classification problems, there are generally four types of validation: hold-out validation, <italic>k</italic>-fold cross-validation, leave-one-out cross-validation, and independent testing. In hold-out validation, the training set is split into two parts: one for training and the other for validation. In <italic>k</italic>-fold cross-validation, the training set is divided into <italic>k</italic> parts, and each part is tested by the model trained on the other <italic>k</italic> &#x2212; 1 parts. Leave-one-out is the extreme case of cross-validation in which <italic>k</italic> equals the number of samples. We used 10-fold cross-validation and an independent test to evaluate the proposed method.</p>
<p>To compare the methods quantitatively, we used the following metrics: sensitivity (SN), specificity (SP), accuracy (ACC), and the Matthews correlation coefficient (MCC), which were computed as</p>
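<p>The <italic>k</italic>-fold procedure can be sketched in a few lines of Python. The helper names, the shuffling step, and the fixed random seed are assumptions of this sketch; the article does not describe how its folds were drawn.</p>
<preformat>
```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test part exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for held_out, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test
```
</preformat>
<p>Calling <monospace>train_test_splits(n, k=n)</monospace> gives leave-one-out validation, because every test part then holds a single sample.</p>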
<disp-formula id="S4.Ex1">
<mml:math id="M7">
<mml:mrow>
<mml:mtext>SN</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mtext>TP</mml:mtext>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S4.Ex2">
<mml:math id="M8">
<mml:mrow>
<mml:mtext>SP</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mtext>TN</mml:mtext>
<mml:mrow>
<mml:mtext>FP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>TN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S4.Ex3">
<mml:math id="M9">
<mml:mrow>
<mml:mtext>ACC</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>TN</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>FN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>FP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>TN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S4.Ex4">
<mml:math id="M10">
<mml:mrow>
<mml:mtext>MCC</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mpadded width="+5.6pt">
<mml:mi>TP</mml:mi>
</mml:mpadded>
<mml:mo>&#x00D7;</mml:mo>
<mml:mtext>TN</mml:mtext>
</mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mpadded width="+5.6pt">
<mml:mi>FP</mml:mi>
</mml:mpadded>
<mml:mo>&#x00D7;</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
</mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mtext>TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>FP</mml:mtext>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mtext>TN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>FN</mml:mtext>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mtext>TN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>FP</mml:mtext>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In the equations above, TP is the number of true-positive samples, TN the number of true-negative samples, FN the number of false-negative samples, and FP the number of false-positive samples. SN, SP, and ACC range from 0 to 1, with 0 meaning completely wrong and 1 completely correct. For example, SN = 0 implies that all positive samples were predicted as negative. MCC ranges from &#x2212;1 to 1, with 1 meaning perfect prediction, 0 random prediction, and &#x2212;1 prediction completely opposite to the truth.</p>
<p>The receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate under various thresholds, was used to depict performance. The area under the ROC curve (AUC) was used to quantitatively assess performance. The AUC ranges from 0 to 1, with 0.5 corresponding to random guessing and 1 to perfect performance.</p>
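<p>The four formulas translate directly into code. In the sketch below, the function name and the confusion-matrix counts are hypothetical: the article does not report the counts, and the ones used here were merely chosen for a balanced set of 104 positive and 104 negative samples.</p>
<preformat>
```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return SN, SP, ACC, and MCC from the four confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (fp + tn)
    acc = (tp + tn) / (tp + fn + fp + tn)
    denominator = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    # Assumed convention: report MCC as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, not taken from the article.
metrics = classification_metrics(tp=70, tn=67, fp=37, fn=34)
print({name: round(value, 4) for name, value in metrics.items()})
```
</preformat>
<p>With these illustrative counts the rounded values are SN = 0.6731, SP = 0.6442, ACC = 0.6587, and MCC = 0.3174.</p>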
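<p>The construction of the ROC curve and its area can be illustrated as follows. This is a sketch under stated assumptions: labels are coded as 1 (positive) and 0 (negative), both classes are present, and the function names are chosen for this example only.</p>
<preformat>
```python
from itertools import groupby


def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    # Samples with the same score share one threshold, hence one ROC point.
    for _, tied in groupby(ranked, key=lambda pair: pair[0]):
        for _, label in tied:
            if label == 1:
                tp += 1
            else:
                fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```
</preformat>
<p>If every sample receives the same score, the curve reduces to the diagonal from (0, 0) to (1, 1) and the area is 0.5, the random-guess value.</p>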
</sec>
<sec id="S5">
<title>Results</title>
<sec id="S5.SS1">
<title>Parameter Optimization</title>
<p>The peptide window size is generally set to a value in the interval [21, 41]. We conducted 10-fold cross-validation over the training set to search for the best window size. The performance for each window size is listed in <xref ref-type="table" rid="T1">Table 1</xref>. A window size of 29 gave the best SN, ACC, and MCC. Therefore, we set the window size to 29 in the subsequent experiments. We also optimized the hyperparameters of the SVM classifier, i.e., <italic>C</italic>, kernel, and gamma. We searched the combination space of <italic>C</italic> = [0.5, 1, 1.5, 2, 2.5, 3, 10, 100, 1,000], kernel = [&#x201C;linear&#x201D;, &#x201C;poly&#x201D;, &#x201C;rbf&#x201D;], and gamma = [&#x201C;scale&#x201D;, &#x201C;auto&#x201D;]. <xref ref-type="table" rid="T2">Table 2</xref> shows the 15 best combinations. The best cross-validation performance was SN = 0.8454, SP = 0.8158, ACC = 0.8306, and MCC = 0.6615, slightly better than the previous result for a window size of 29 (<xref ref-type="table" rid="T1">Table 1</xref>), and the corresponding parameters were <italic>C</italic> = 1, kernel = rbf, and gamma = scale. On the testing set, the method achieved an SN of 0.6731, an SP of 0.6442, an ACC of 0.6587, and an MCC of 0.3174.</p>
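<p>An exhaustive search over such a combination space can be sketched as below. The grid is the one listed in the text; the function <monospace>grid_search</monospace> and the callable <monospace>evaluate</monospace> are hypothetical names introduced for illustration, with <monospace>evaluate</monospace> standing in for whatever routine returns the mean 10-fold accuracy of an SVM trained with the given settings.</p>
<preformat>
```python
from itertools import product

# Search space as listed in the text.
C_VALUES = [0.5, 1, 1.5, 2, 2.5, 3, 10, 100, 1000]
KERNELS = ["linear", "poly", "rbf"]
GAMMAS = ["scale", "auto"]


def grid_search(evaluate):
    """Score every (C, kernel, gamma) combination and sort from best to worst.

    `evaluate` is a hypothetical callable that returns the mean
    cross-validation accuracy for one combination.
    """
    results = [
        {"C": c, "kernel": kernel, "gamma": gamma,
         "accuracy": evaluate(c, kernel, gamma)}
        for c, kernel, gamma in product(C_VALUES, KERNELS, GAMMAS)
    ]
    return sorted(results, key=lambda row: row["accuracy"], reverse=True)
```
</preformat>
<p>With scikit-learn, for instance, <monospace>evaluate</monospace> could wrap <monospace>sklearn.svm.SVC(C=c, kernel=kernel, gamma=gamma)</monospace> and <monospace>cross_val_score</monospace> with <monospace>cv=10</monospace>; whether that library was used here is an assumption, as this section does not name the implementation.</p>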
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Performance for various window sizes in 10-fold cross-validation.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Size</td>
<td valign="top" align="center">SN</td>
<td valign="top" align="center">SP</td>
<td valign="top" align="center">ACC</td>
<td valign="top" align="center">MCC</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">21</td>
<td valign="top" align="center">0.6579</td>
<td valign="top" align="center">0.7862</td>
<td valign="top" align="center">0.7220</td>
<td valign="top" align="center">0.4478</td>
</tr>
<tr>
<td valign="top" align="left">23</td>
<td valign="top" align="center">0.7631</td>
<td valign="top" align="center">0.8421</td>
<td valign="top" align="center">0.8026</td>
<td valign="top" align="center">0.6072</td>
</tr>
<tr>
<td valign="top" align="left">25</td>
<td valign="top" align="center">0.7697</td>
<td valign="top" align="center"><bold>0.8553</bold></td>
<td valign="top" align="center">0.8125</td>
<td valign="top" align="center">0.6273</td>
</tr>
<tr>
<td valign="top" align="left">27</td>
<td valign="top" align="center">0.7533</td>
<td valign="top" align="center">0.7763</td>
<td valign="top" align="center">0.7648</td>
<td valign="top" align="center">0.5297</td>
</tr>
<tr>
<td valign="top" align="left">29</td>
<td valign="top" align="center"><bold>0.8355</bold></td>
<td valign="top" align="center">0.8158</td>
<td valign="top" align="center"><bold>0.8257</bold></td>
<td valign="top" align="center"><bold>0.6514</bold></td>
</tr>
<tr>
<td valign="top" align="left">31</td>
<td valign="top" align="center">0.7697</td>
<td valign="top" align="center">0.8059</td>
<td valign="top" align="center">0.7878</td>
<td valign="top" align="center">0.5760</td>
</tr>
<tr>
<td valign="top" align="left">33</td>
<td valign="top" align="center">0.7928</td>
<td valign="top" align="center"><bold>0.8553</bold></td>
<td valign="top" align="center">0.8240</td>
<td valign="top" align="center">0.6493</td>
</tr>
<tr>
<td valign="top" align="left">35</td>
<td valign="top" align="center">0.7664</td>
<td valign="top" align="center">0.7796</td>
<td valign="top" align="center">0.7730</td>
<td valign="top" align="center">0.5461</td>
</tr>
<tr>
<td valign="top" align="left">37</td>
<td valign="top" align="center">0.7500</td>
<td valign="top" align="center">0.7697</td>
<td valign="top" align="center">0.7599</td>
<td valign="top" align="center">0.5198</td>
</tr>
<tr>
<td valign="top" align="left">39</td>
<td valign="top" align="center">0.7467</td>
<td valign="top" align="center">0.7336</td>
<td valign="top" align="center">0.7401</td>
<td valign="top" align="center">0.4803</td>
</tr>
<tr>
<td valign="top" align="left">41</td>
<td valign="top" align="center">0.7697</td>
<td valign="top" align="center">0.7434</td>
<td valign="top" align="center">0.7566</td>
<td valign="top" align="center">0.5133</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>Bold values indicate the best in each column.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>The 15 best combinations in the search space.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"><italic>C</italic></td>
<td valign="top" align="center">Gamma</td>
<td valign="top" align="center">Kernel</td>
<td valign="top" align="center">Average accuracy</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">Scale</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8389</td>
</tr>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">Auto</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8389</td>
</tr>
<tr>
<td valign="top" align="left">0.5</td>
<td valign="top" align="center">Scale</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8356</td>
</tr>
<tr>
<td valign="top" align="left">0.5</td>
<td valign="top" align="center">Auto</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8356</td>
</tr>
<tr>
<td valign="top" align="left">1.5</td>
<td valign="top" align="center">Scale</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8307</td>
</tr>
<tr>
<td valign="top" align="left">1.5</td>
<td valign="top" align="center">Auto</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8307</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">Scale</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8258</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">Auto</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8241</td>
</tr>
<tr>
<td valign="top" align="left">2.5</td>
<td valign="top" align="center">Scale</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8143</td>
</tr>
<tr>
<td valign="top" align="left">2.5</td>
<td valign="top" align="center">Auto</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8143</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">Auto</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8093</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">Scale</td>
<td valign="top" align="center">rbf</td>
<td valign="top" align="center">0.8093</td>
</tr>
<tr>
<td valign="top" align="left">0.5</td>
<td valign="top" align="center">Auto</td>
<td valign="top" align="center">Sigmoid</td>
<td valign="top" align="center">0.7960</td>
</tr>
<tr>
<td valign="top" align="left">0.5</td>
<td valign="top" align="center">Scale</td>
<td valign="top" align="center">Sigmoid</td>
<td valign="top" align="center">0.7960</td>
</tr>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">Auto</td>
<td valign="top" align="center">Sigmoid</td>
<td valign="top" align="center">0.7664</td>
</tr>
</tbody>
</table></table-wrap>
</sec>
<sec id="S5.SS2">
<title>Comparison With Other Methods</title>
<p>To the best of our knowledge, two computational methods exist for propionylation prediction: PropPred (<xref ref-type="bibr" rid="B18">Ju and He, 2017</xref>) and PropSeek (<xref ref-type="bibr" rid="B46">Wang et al., 2017</xref>). However, neither web server is currently in service. In 10-fold cross-validation, PropPred with 250 optimal features and a window size of 25 residues achieved an SN of 0.7003, an SP of 0.7561, an ACC of 0.7502, and an MCC of 0.3085, inferior to the proposed method. On the testing set, PropPred achieved an SN of 0.6604, an SP of 0.7504, an ACC of 0.7431, and an MCC of 0.2495, inferior to the proposed method in terms of SN and MCC. It must be pointed out that the training and testing sets used by the two methods were different. To perform a fair comparison, we implemented PropPred with the 250 optimal features and a window size of 25 residues. Its performance in 10-fold cross-validation on the training set and in the independent test on the testing set is listed in <xref ref-type="table" rid="T3">Table 3</xref>. The proposed method outperformed PropPred. We also compared the presented method with the deep RNN model. On the testing set, the deep RNN model obtained an SN of 0.5962, an SP of 0.6731, an ACC of 0.6346, and an MCC of 0.2700. The presented method outperformed the deep RNN model in terms of SN, ACC, and MCC.</p>
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Performance of the PropPred method in 10-fold cross-validation and in the independent test.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center">SN</td>
<td valign="top" align="center">SP</td>
<td valign="top" align="center">ACC</td>
<td valign="top" align="center">MCC</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">10-fold</td>
<td valign="top" align="center">0.7928</td>
<td valign="top" align="center">0.7599</td>
<td valign="top" align="center">0.7763</td>
<td valign="top" align="center">0.5529</td>
</tr>
<tr>
<td valign="top" align="left">Independent</td>
<td valign="top" align="center">0.4904</td>
<td valign="top" align="center">0.6442</td>
<td valign="top" align="center">0.5673</td>
<td valign="top" align="center">0.1362</td>
</tr>
</tbody>
</table></table-wrap>
<p><xref ref-type="fig" rid="F4">Figure 4A</xref> shows the ROC curves of 10-fold cross-validation for the presented method and PropPred. Although the AUC of the presented method was lower than that of PropPred, its best operating point, i.e., the point closest to the upper-left corner, was better than that of PropPred. In the independent test (<xref ref-type="fig" rid="F4">Figure 4B</xref>), the presented method outperformed both PropPred and the deep RNN method. The presented method takes advantage of deep learning and avoids manual feature design.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Receiver operating characteristic curves of <bold>(A)</bold> 10-fold cross-validation and <bold>(B)</bold> the independent test.</p></caption>
<graphic xlink:href="fphys-12-658633-g004.tif"/>
</fig>
</sec>
<sec id="S5.SS3">
<title>Functional Analysis</title>
<p>For functional analysis, we used the DAVID web application (<xref ref-type="bibr" rid="B13">Huang da et al., 2009</xref>), which includes a comprehensive set of functional annotation tools for uncovering the biological meaning behind the studied genes. First, we used the gene functional classification tool in DAVID to cluster 183 proteins from <italic>Thermus thermophilus</italic> HB8. As shown in <xref ref-type="table" rid="T4">Table 4</xref>, only 29 proteins clustered into four groups of similar function, while the other proteins showed no functional similarity. Leucyl-tRNA synthetase (leuS) and histidyl-tRNA synthetase (hisS) each appeared in two groups. We also used the functional annotation tool in DAVID to perform GO and KEGG pathway enrichment analysis. Because 183 of the 207 proteins were from <italic>Thermus thermophilus</italic> HB8, the genes of <italic>Thermus thermophilus</italic> HB8 were used as the background. With an EASE score (the modified Fisher exact <italic>p</italic>-value used by DAVID) less than or equal to 0.01, the enriched GO terms of molecular function were GO:0016620 (oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor), GO:0051287 (NAD binding), and GO:0003735 (structural constituent of ribosome). The enriched GO terms of biological process and cellular component were GO:0006096 (glycolytic process) and GO:0005737 (cytoplasm), respectively, as shown in <xref ref-type="table" rid="T5">Table 5</xref>. The enriched pathways are listed in <xref ref-type="table" rid="T6">Table 6</xref>. Of the nine enriched pathways, four were related to metabolism and two to biosynthesis, suggesting that propionylation is involved in metabolism. Previous studies have also reported that lysine propionylation is involved in metabolism (<xref ref-type="bibr" rid="B33">Okanishi et al., 2014</xref>, <xref ref-type="bibr" rid="B34">2017</xref>; <xref ref-type="bibr" rid="B52">Yang et al., 2019</xref>).</p>
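<p>The statistical idea behind such an enrichment test can be sketched with an ordinary one-sided hypergeometric (Fisher exact) <italic>p</italic>-value. This is only an illustration of the principle: DAVID reports a modified, more conservative score, and the function and argument names below are assumptions of this sketch rather than part of DAVID.</p>
<preformat>
```python
from math import comb


def hypergeometric_upper_tail(list_hits, list_total, pop_hits, pop_total):
    """Probability of seeing at least list_hits annotated genes by chance.

    list_total genes are drawn from a background of pop_total genes,
    of which pop_hits carry the annotation of interest.
    """
    most_possible = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, most_possible + 1)
    )
    return favourable / comb(pop_total, list_total)
```
</preformat>
<p>A small value indicates that the annotation occurs in the gene list more often than expected from the background alone.</p>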
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Functional groups of proteins.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">UNIPROT_ACCESSION</td>
<td valign="top" align="left">Gene name</td>
<td valign="top" align="center">Enrichment score</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Q5SIR5</td>
<td valign="top" align="left">Ribose-5-phosphate isomerase A (TTHA1299)</td>
<td valign="top" align="center">3.8325</td>
</tr>
<tr>
<td valign="top" align="left">Q5SIC8</td>
<td valign="top" align="left">Fructose 1,6-bisphosphatase II (glpX)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SM35</td>
<td valign="top" align="left">Transketolase (TTHA0108)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SHF7</td>
<td valign="top" align="left">Fructose-1,6-bisphosphate aldolase (TTHA1773)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SM37</td>
<td valign="top" align="left">Ribulose-phosphate 3-epimerase (TTHA0106)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SLJ4</td>
<td valign="top" align="left">Glucokinase (TTHA0299)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SJM8</td>
<td valign="top" align="left">Hypothetical protein (TTHA0980)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>P56194</bold></td>
<td valign="top" align="left">Histidyl-tRNA synthetase (hisS)</td>
<td valign="top" align="center">3.2378</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Q5SLY2</bold></td>
<td valign="top" align="left">Leucyl-tRNA synthetase (leuS)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SJX7</td>
<td valign="top" align="left">Seryl-tRNA synthetase (TTHA0875)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">P56881</td>
<td valign="top" align="left">Threonyl-tRNA synthetase (thrS)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">P56206</td>
<td valign="top" align="left">Glycyl-tRNA synthetase (TTHA0543)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">P56690</td>
<td valign="top" align="left">Isoleucyl-tRNA synthetase (ileS)</td>
<td valign="top" align="center">2.5835</td>
</tr>
<tr>
<td valign="top" align="left">P23395</td>
<td valign="top" align="left">Methionyl-tRNA synthetase (TTHA1298)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>P56194</bold></td>
<td valign="top" align="left">Histidyl-tRNA synthetase (hisS)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Q5SLY2</bold></td>
<td valign="top" align="left">Leucyl-tRNA synthetase (leuS)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SJ45</td>
<td valign="top" align="left">Valyl-tRNA synthetase (valS)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SIH0</td>
<td valign="top" align="left">Tyrosyl-tRNA synthetase (TTHA1399)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">P80380</td>
<td valign="top" align="left">30S ribosomal protein S20 (rpsT)</td>
<td valign="top" align="center">1.8414</td>
</tr>
<tr>
<td valign="top" align="left">Q5SHQ2</td>
<td valign="top" align="left">30S ribosomal protein S8 (rpsH)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SHP6</td>
<td valign="top" align="left">50S ribosomal protein L29 (TTHA1684)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SHQ5</td>
<td valign="top" align="left">30S ribosomal protein S5 (rpsE)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SLP7</td>
<td valign="top" align="left">50S ribosomal protein L1 (rplA)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SHQ0</td>
<td valign="top" align="left">50S ribosomal protein L5 (rplE)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">P80377</td>
<td valign="top" align="left">30S ribosomal protein S13 (rpsM)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SHN3</td>
<td valign="top" align="left">30S ribosomal protein S12 (rpsL)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">P35871</td>
<td valign="top" align="left">50S ribosomal protein L33 (rpmG)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q8VVE2</td>
<td valign="top" align="left">50S ribosomal protein L7/L12 (rplL)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q5SLY1</td>
<td valign="top" align="left">30S ribosomal protein S1 (rpsA)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">P17291</td>
<td valign="top" align="left">30S ribosomal protein S7 (TTHA1696)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Q9Z9H5</td>
<td valign="top" align="left">50S ribosomal protein L17 (rplQ)</td>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>Bold accessions appear in more than one group.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T5">
<label>TABLE 5</label>
<caption><p>Significantly enriched GO terms.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Category</td>
<td valign="top" align="left">Term</td>
<td valign="top" align="center">Count</td>
<td valign="top" align="center"><italic>P</italic> value</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GOTERM_CC_DIRECT</td>
<td valign="top" align="left">GO:0005737 cytoplasm</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center">1.07E-05</td>
</tr>
<tr>
<td valign="top" align="left">GOTERM_MF_DIRECT</td>
<td valign="top" align="left">GO:0016620 oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">1.91E-03</td>
</tr>
<tr>
<td valign="top" align="left">GOTERM_BP_DIRECT</td>
<td valign="top" align="left">GO:0006096 glycolytic process</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">3.07E-03</td>
</tr>
<tr>
<td valign="top" align="left">GOTERM_MF_DIRECT</td>
<td valign="top" align="left">GO:0051287 NAD binding</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">3.09E-03</td>
</tr>
<tr>
<td valign="top" align="left">GOTERM_MF_DIRECT</td>
<td valign="top" align="left">GO:0003735 structural constituent of ribosome</td>
<td valign="top" align="center">13</td>
<td valign="top" align="center">9.83E-03</td>
</tr>
</tbody>
</table></table-wrap>
<table-wrap position="float" id="T6">
<label>TABLE 6</label>
<caption><p>Significant KEGG pathways.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Term</td>
<td valign="top" align="center">Count</td>
<td valign="top" align="center"><italic>P</italic> value</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">ttj01200:Carbon metabolism</td>
<td valign="top" align="center">35</td>
<td valign="top" align="center">2.18E-09</td>
</tr>
<tr>
<td valign="top" align="left">ttj01120:Microbial metabolism in diverse environments</td>
<td valign="top" align="center">44</td>
<td valign="top" align="center">1.49E-07</td>
</tr>
<tr>
<td valign="top" align="left">ttj01130:Biosynthesis of antibiotics</td>
<td valign="top" align="center">43</td>
<td valign="top" align="center">4.16E-06</td>
</tr>
<tr>
<td valign="top" align="left">ttj00010:Glycolysis/gluconeogenesis</td>
<td valign="top" align="center">15</td>
<td valign="top" align="center">3.92E-05</td>
</tr>
<tr>
<td valign="top" align="left">ttj00020:Citrate cycle (TCA cycle)</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">1.52E-04</td>
</tr>
<tr>
<td valign="top" align="left">ttj00620:Pyruvate metabolism</td>
<td valign="top" align="center">14</td>
<td valign="top" align="center">5.84E-04</td>
</tr>
<tr>
<td valign="top" align="left">ttj00710:Carbon fixation in photosynthetic organisms</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">5.95E-04</td>
</tr>
<tr>
<td valign="top" align="left">ttj01110:Biosynthesis of secondary metabolites</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">7.43E-04</td>
</tr>
<tr>
<td valign="top" align="left">ttj01100:Metabolic pathways</td>
<td valign="top" align="center">85</td>
<td valign="top" align="center">8.13E-04</td>
</tr>
</tbody>
</table></table-wrap>
</sec>
</sec>
<sec id="S6">
<title>Conclusion</title>
<p>We presented a transfer learning-based method and an online web server<sup><xref ref-type="fn" rid="footnote1">1</xref></sup> for computationally predicting propionylation. The method takes advantage of the crosstalk between propionylation and malonylation. An advantage of the method is that it avoids manual feature design. Statistical enrichment analysis implied that propionylation is associated with metabolism.</p>
</sec>
<sec id="S7">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="DS1">Supplementary Material</xref>; further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec id="S8">
<title>Author Contributions</title>
<p>AL and YD: conceptualization, funding acquisition, and writing &#x2013; original draft. YT: data curation, formal analysis, and software. AL, MC, and YD: methodology, validation, and writing &#x2013; review and editing. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This work was supported in part by the Natural Science Foundation of Hunan Province, China under Grant 2019JJ40064 and by the Scientific Research Project of the Education Department of Hunan Province under Grants 19B142, 19A125, and 18A253.</p>
</fn>
</fn-group>
<sec id="S10" sec-type="supplementary material"><title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fphys.2021.658633/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fphys.2021.658633/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.CSV" id="DS1" mimetype="text/csv" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_2.CSV" id="DS2" mimetype="text/csv" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_3.CSV" id="DS3" mimetype="text/csv" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_4.CSV" id="DS4" mimetype="text/csv" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ai</surname> <given-names>H.</given-names></name> <name><surname>Wu</surname> <given-names>R.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Wu</surname> <given-names>X.</given-names></name> <name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>H.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>pSuc-PseRat: predicting lysine succinylation in proteins by exploiting the ratios of sequence coupling and properties.</article-title> <source><italic>J. Comput. Biol.</italic></source> <volume>24</volume> <fpage>1050</fpage>&#x2013;<lpage>1059</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2016.0206</pub-id> <pub-id pub-id-type="pmid">28682641</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boser</surname> <given-names>B. E.</given-names></name> <name><surname>Guyon</surname> <given-names>I. M.</given-names></name> <name><surname>Vapnik</surname> <given-names>V. N.</given-names></name></person-group> (<year>1992</year>). &#x201C;<article-title>A training algorithm for optimal margin classifiers</article-title>,&#x201D; in <source><italic>Proceedings of the 5th Annual Workshop on Computational Learning Theory</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>144</fpage>&#x2013;<lpage>152</lpage>.</citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Callaway</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x2018;It will change everything&#x2019;: DeepMind&#x2019;s AI makes gigantic leap in solving protein structures.</article-title> <source><italic>Nature</italic></source> <volume>588</volume> <fpage>203</fpage>&#x2013;<lpage>204</lpage>. <pub-id pub-id-type="doi">10.1038/d41586-020-03348-4</pub-id> <pub-id pub-id-type="pmid">33257889</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caragea</surname> <given-names>C.</given-names></name> <name><surname>Sinapov</surname> <given-names>J.</given-names></name> <name><surname>Silvescu</surname> <given-names>A.</given-names></name> <name><surname>Dobbs</surname> <given-names>D.</given-names></name> <name><surname>Honavar</surname> <given-names>V.</given-names></name></person-group> (<year>2007</year>). <article-title>Glycosylation site prediction using ensembles of Support Vector Machine classifiers.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>8</volume>:<issue>438</issue>. <pub-id pub-id-type="doi">10.1186/1471-2105-8-438</pub-id> <pub-id pub-id-type="pmid">17996106</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>G.</given-names></name> <name><surname>Cao</surname> <given-names>M.</given-names></name> <name><surname>Luo</surname> <given-names>K.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Wen</surname> <given-names>P.</given-names></name> <name><surname>Shi</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>ProAcePred: prokaryote lysine acetylation sites prediction based on elastic net feature optimization.</article-title> <source><italic>Bioinformatics</italic></source> <volume>34</volume> <fpage>3999</fpage>&#x2013;<lpage>4006</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty444</pub-id> <pub-id pub-id-type="pmid">29868863</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Sprung</surname> <given-names>R.</given-names></name> <name><surname>Tang</surname> <given-names>Y.</given-names></name> <name><surname>Ball</surname> <given-names>H.</given-names></name> <name><surname>Sangras</surname> <given-names>B.</given-names></name> <name><surname>Kim</surname> <given-names>S. C.</given-names></name><etal/></person-group> (<year>2007</year>). <article-title>Lysine propionylation and butyrylation are novel post-translational modifications in histones.</article-title> <source><italic>Mol. Cell. Proteomics</italic></source> <volume>6</volume> <fpage>812</fpage>&#x2013;<lpage>819</lpage>. <pub-id pub-id-type="doi">10.1074/mcp.m700021-mcp200</pub-id> <pub-id pub-id-type="pmid">17267393</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Z.</given-names></name> <name><surname>Tang</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>S. S. C.</given-names></name><etal/></person-group> (<year>2009</year>). <article-title>Molecular characterization of propionyllysines in non-histone proteins.</article-title> <source><italic>Mol. Cell. Proteomics</italic></source> <volume>8</volume> <fpage>45</fpage>&#x2013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.1074/mcp.m800224-mcp200</pub-id> <pub-id pub-id-type="pmid">18753126</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cortes</surname> <given-names>C.</given-names></name> <name><surname>Vapnik</surname> <given-names>V.</given-names></name></person-group> (<year>1995</year>). <article-title>Support-vector networks.</article-title> <source><italic>Mach. Learn.</italic></source> <volume>20</volume> <fpage>273</fpage>&#x2013;<lpage>297</lpage>.</citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Brevern</surname> <given-names>A. G.</given-names></name> <name><surname>Hasan</surname> <given-names>M. M.</given-names></name> <name><surname>Kurata</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>GPSuc: Global Prediction of Generic and Species-specific Succinylation Sites by aggregating multiple sequence features.</article-title> <source><italic>PLoS One</italic></source> <volume>13</volume>:<issue>e0200283</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0200283</pub-id> <pub-id pub-id-type="pmid">30312302</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giles</surname> <given-names>C. L.</given-names></name> <name><surname>Kuhn</surname> <given-names>G. M.</given-names></name> <name><surname>Williams</surname> <given-names>R. J.</given-names></name></person-group> (<year>1994</year>). <article-title>Dynamic recurrent neural networks: theory and applications.</article-title> <source><italic>IEEE Trans. Neural Netw.</italic></source> <volume>5</volume> <fpage>153</fpage>&#x2013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1109/tnn.1994.8753425</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Srivastava</surname> <given-names>N.</given-names></name> <name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R. R.</given-names></name></person-group> (<year>2012</year>). <article-title>Improving neural networks by preventing co-adaptation of feature detectors.</article-title> <source><italic>arXiv</italic> [preprint]</source> <comment>arXiv:1207.0580</comment></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hochreiter</surname> <given-names>S.</given-names></name> <name><surname>Schmidhuber</surname> <given-names>J.</given-names></name></person-group> (<year>1997</year>). <article-title>Long short-term memory.</article-title> <source><italic>Neural Comput.</italic></source> <volume>9</volume> <fpage>1735</fpage>&#x2013;<lpage>1780</lpage>.</citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>D. W.</given-names></name> <name><surname>Sherman</surname> <given-names>B. T.</given-names></name> <name><surname>Lempicki</surname> <given-names>R. A.</given-names></name></person-group> (<year>2009</year>). <article-title>Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.</article-title> <source><italic>Nat. Protoc.</italic></source> <volume>4</volume> <fpage>44</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1038/nprot.2008.211</pub-id> <pub-id pub-id-type="pmid">19131956</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>G.</given-names></name> <name><surname>Zeng</surname> <given-names>W.</given-names></name></person-group> (<year>2016</year>). <article-title>A discrete hidden Markov model for detecting histone crotonyllysine sites.</article-title> <source><italic>MATCH Commun. Math. Comput. Chem.</italic></source> <volume>75</volume> <fpage>717</fpage>&#x2013;<lpage>730</lpage>.</citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>G.</given-names></name> <name><surname>Zheng</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>Y.-Q.</given-names></name> <name><surname>Yu</surname> <given-names>Z.-G.</given-names></name></person-group> (<year>2020</year>). <article-title>An information entropy-based approach for computationally identifying histone lysine butyrylation.</article-title> <source><italic>Front. Genet.</italic></source> <volume>10</volume>:<issue>1325</issue>. <pub-id pub-id-type="doi">10.3389/fgene.2019.01325</pub-id> <pub-id pub-id-type="pmid">32117407</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>K.-Y.</given-names></name> <name><surname>Hsu</surname> <given-names>J. B.-K.</given-names></name> <name><surname>Lee</surname> <given-names>T.-Y.</given-names></name></person-group> (<year>2019</year>). <article-title>Characterization and identification of lysine succinylation sites based on deep learning method.</article-title> <source><italic>Sci. Rep.</italic></source> <volume>9</volume>:<issue>16175</issue>.</citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Joachims</surname> <given-names>T.</given-names></name></person-group> (<year>1999</year>). &#x201C;<article-title>Transductive inference for text classification using support vector machines</article-title>,&#x201D; in <source><italic>Paper Presented at International Conference on Machine Learning; 6/27/1999, Bled.</italic></source></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ju</surname> <given-names>Z.</given-names></name> <name><surname>He</surname> <given-names>J. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Prediction of lysine propionylation sites using biased SVM and incorporating four different sequence features into Chou&#x2019;s PseAAC.</article-title> <source><italic>J. Mol. Graph. Model.</italic></source> <volume>76</volume> <fpage>356</fpage>&#x2013;<lpage>363</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmgm.2017.07.022</pub-id> <pub-id pub-id-type="pmid">28763688</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Junqueira</surname> <given-names>S. C.</given-names></name> <name><surname>Centeno</surname> <given-names>E. G. Z.</given-names></name> <name><surname>Wilkinson</surname> <given-names>K. A.</given-names></name> <name><surname>Cimarosti</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>Post-translational modifications of Parkinson&#x2019;s disease-related proteins: phosphorylation, SUMOylation and ubiquitination.</article-title> <source><italic>Biochim. Biophys. Acta</italic></source> <volume>1865</volume> <fpage>2001</fpage>&#x2013;<lpage>2007</lpage>. <pub-id pub-id-type="doi">10.1016/j.bbadis.2018.10.025</pub-id> <pub-id pub-id-type="pmid">30412791</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kebede</surname> <given-names>A. F.</given-names></name> <name><surname>Nieborak</surname> <given-names>A.</given-names></name> <name><surname>Shahidian</surname> <given-names>L. Z.</given-names></name> <name><surname>Le Gras</surname> <given-names>S.</given-names></name> <name><surname>Richter</surname> <given-names>F.</given-names></name> <name><surname>G&#x00F3;mez</surname> <given-names>D. A.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Histone propionylation is a mark of active chromatin.</article-title> <source><italic>Nat. Struct. Mol. Biol.</italic></source> <volume>24</volume> <fpage>1048</fpage>&#x2013;<lpage>1056</lpage>. <pub-id pub-id-type="doi">10.1038/nsmb.3490</pub-id> <pub-id pub-id-type="pmid">29058708</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leemhuis</surname> <given-names>H.</given-names></name> <name><surname>Packman</surname> <given-names>L. C.</given-names></name> <name><surname>Nightingale</surname> <given-names>K. P.</given-names></name> <name><surname>Hollfelder</surname> <given-names>F.</given-names></name></person-group> (<year>2008</year>). <article-title>The human histone acetyltransferase P/CAF is a promiscuous histone propionyltransferase.</article-title> <source><italic>Chembiochem</italic></source> <volume>9</volume> <fpage>499</fpage>&#x2013;<lpage>503</lpage>. <pub-id pub-id-type="doi">10.1002/cbic.200700556</pub-id> <pub-id pub-id-type="pmid">18247445</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>M.</given-names></name> <name><surname>Shyr</surname> <given-names>Y.</given-names></name> <name><surname>Xie</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name></person-group> (<year>2009</year>). <article-title>Improved prediction of lysine acetylation by support vector machines.</article-title> <source><italic>Protein Pept. Lett.</italic></source> <volume>16</volume> <fpage>977</fpage>&#x2013;<lpage>983</lpage>. <pub-id pub-id-type="doi">10.2174/092986609788923338</pub-id> <pub-id pub-id-type="pmid">19689425</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Godzik</surname> <given-names>A.</given-names></name></person-group> (<year>2006</year>). <article-title>Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.</article-title> <source><italic>Bioinformatics</italic></source> <volume>22</volume> <fpage>1658</fpage>&#x2013;<lpage>1659</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btl158</pub-id> <pub-id pub-id-type="pmid">16731699</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y. X.</given-names></name> <name><surname>Shao</surname> <given-names>Y. H.</given-names></name> <name><surname>Jing</surname> <given-names>L.</given-names></name> <name><surname>Deng</surname> <given-names>N. Y.</given-names></name></person-group> (<year>2011</year>). <article-title>An efficient support vector machine approach for identifying protein S-nitrosylation sites.</article-title> <source><italic>Protein Pept. Lett.</italic></source> <volume>18</volume> <fpage>573</fpage>&#x2013;<lpage>587</lpage>. <pub-id pub-id-type="doi">10.2174/092986611795222731</pub-id> <pub-id pub-id-type="pmid">21271979</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Zhao</surname> <given-names>X. M.</given-names></name> <name><surname>Li</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>DeepPhos: prediction of protein phosphorylation sites with deep learning.</article-title> <source><italic>Bioinformatics</italic></source> <volume>35</volume> <fpage>2766</fpage>&#x2013;<lpage>2773</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty1051</pub-id> <pub-id pub-id-type="pmid">30601936</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lv</surname> <given-names>H.</given-names></name> <name><surname>Dao</surname> <given-names>F. Y.</given-names></name> <name><surname>Guan</surname> <given-names>Z. X.</given-names></name> <name><surname>Yang</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>Y. W.</given-names></name> <name><surname>Lin</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Deep-Kcr: accurate detection of lysine crotonylation sites using deep learning method.</article-title> <source><italic>Brief Bioinform.</italic></source> bbaa255. <pub-id pub-id-type="doi">10.1093/bib/bbaa255</pub-id> <pub-id pub-id-type="pmid">33099604</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Malebary</surname> <given-names>S. J.</given-names></name> <name><surname>Rehman</surname> <given-names>M. S. U.</given-names></name> <name><surname>Khan</surname> <given-names>Y. D.</given-names></name></person-group> (<year>2019</year>). <article-title>iCrotoK-PseAAC: identify lysine crotonylation sites by blending position relative statistical features according to the Chou&#x2019;s 5-step rule.</article-title> <source><italic>PLoS One</italic></source> <volume>14</volume>:<issue>e0223993</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0223993</pub-id> <pub-id pub-id-type="pmid">31751380</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martin</surname> <given-names>L.</given-names></name> <name><surname>Latypova</surname> <given-names>X.</given-names></name> <name><surname>Terro</surname> <given-names>F.</given-names></name></person-group> (<year>2011</year>). <article-title>Post-translational modifications of tau protein: implications for Alzheimer&#x2019;s disease.</article-title> <source><italic>Neurochem. Int.</italic></source> <volume>58</volume> <fpage>458</fpage>&#x2013;<lpage>471</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuint.2010.12.023</pub-id> <pub-id pub-id-type="pmid">21215781</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Matic</surname> <given-names>N.</given-names></name> <name><surname>Guyon</surname> <given-names>I.</given-names></name> <name><surname>Denker</surname> <given-names>J.</given-names></name> <name><surname>Vapnik</surname> <given-names>V.</given-names></name></person-group> (<year>1993</year>). &#x201C;<article-title>Writer-adaptation for on-line handwritten character recognition</article-title>,&#x201D; in <source><italic>Paper Presented at the 2nd International Conference on Document Analysis and Recognition; 10/20/1993, Tsukuba.</italic></source></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nakamura</surname> <given-names>T.</given-names></name> <name><surname>Prikhodko</surname> <given-names>O. A.</given-names></name> <name><surname>Pirie</surname> <given-names>E.</given-names></name> <name><surname>Nagar</surname> <given-names>S.</given-names></name> <name><surname>Akhtar</surname> <given-names>M. W.</given-names></name> <name><surname>Oh</surname> <given-names>C. K.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Aberrant protein S-nitrosylation contributes to the pathophysiology of neurodegenerative diseases.</article-title> <source><italic>Neurobiol. Dis.</italic></source> <volume>84</volume> <fpage>99</fpage>&#x2013;<lpage>108</lpage>. <pub-id pub-id-type="doi">10.1016/j.nbd.2015.03.017</pub-id> <pub-id pub-id-type="pmid">25796565</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ning</surname> <given-names>Q.</given-names></name> <name><surname>Yu</surname> <given-names>M.</given-names></name> <name><surname>Ji</surname> <given-names>J.</given-names></name> <name><surname>Ma</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>Analysis and prediction of human acetylation using a cascade classifier based on support vector machine.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>20</volume>:<issue>346</issue>. <pub-id pub-id-type="doi">10.1186/s12859-019-2938-7</pub-id> <pub-id pub-id-type="pmid">31208321</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ning</surname> <given-names>Q.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name> <name><surname>Bao</surname> <given-names>L.</given-names></name> <name><surname>Ma</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>Detecting Succinylation sites from protein sequences using ensemble support vector machine.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>19</volume>:<issue>237</issue>. <pub-id pub-id-type="doi">10.1186/s12859-018-2249-4</pub-id> <pub-id pub-id-type="pmid">29940836</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okanishi</surname> <given-names>H.</given-names></name> <name><surname>Kim</surname> <given-names>K.</given-names></name> <name><surname>Masui</surname> <given-names>R.</given-names></name> <name><surname>Kuramitsu</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Lysine propionylation is a prevalent post-translational modification in <italic>Thermus thermophilus</italic>.</article-title> <source><italic>Mol. Cell. Proteomics</italic></source> <volume>13</volume> <fpage>2382</fpage>&#x2013;<lpage>2398</lpage>. <pub-id pub-id-type="doi">10.1074/mcp.m113.035659</pub-id> <pub-id pub-id-type="pmid">24938286</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okanishi</surname> <given-names>H.</given-names></name> <name><surname>Kim</surname> <given-names>K.</given-names></name> <name><surname>Masui</surname> <given-names>R.</given-names></name> <name><surname>Kuramitsu</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Proteome-wide identification of lysine propionylation in thermophilic and mesophilic bacteria: <italic>Geobacillus kaustophilus</italic>, <italic>Thermus thermophilus</italic>, <italic>Escherichia coli</italic>, <italic>Bacillus subtilis</italic>, and <italic>Rhodothermus marinus</italic>.</article-title> <source><italic>Extremophiles</italic></source> <volume>21</volume> <fpage>283</fpage>&#x2013;<lpage>296</lpage>. <pub-id pub-id-type="doi">10.1007/s00792-016-0901-3</pub-id> <pub-id pub-id-type="pmid">27928680</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Osuna</surname> <given-names>E.</given-names></name> <name><surname>Freund</surname> <given-names>R.</given-names></name> <name><surname>Girosi</surname> <given-names>F.</given-names></name></person-group> (<year>1997</year>). &#x201C;<article-title>Training support vector machines: an application to face detection</article-title>,&#x201D; in <source><italic>Paper Presented at Computer Vision and Pattern Recognition; 6/17/1997, Los Alamitos.</italic></source></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pearlmutter</surname> <given-names>B. A.</given-names></name></person-group> (<year>1989</year>). <article-title>Learning state space trajectories in recurrent neural networks.</article-title> <source><italic>Neural Comput.</italic></source> <volume>1</volume> <fpage>263</fpage>&#x2013;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1989.1.2.263</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Plewczynski</surname> <given-names>D.</given-names></name> <name><surname>Tkacz</surname> <given-names>A.</given-names></name> <name><surname>Wyrwicz</surname> <given-names>L. S.</given-names></name> <name><surname>Rychlewski</surname> <given-names>L.</given-names></name> <name><surname>Ginalski</surname> <given-names>K.</given-names></name></person-group> (<year>2008</year>). <article-title>AutoMotif Server for prediction of phosphorylation sites in proteins using support vector machine: 2007 update.</article-title> <source><italic>J. Mol. Model.</italic></source> <volume>14</volume> <fpage>69</fpage>&#x2013;<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1007/s00894-007-0250-3</pub-id> <pub-id pub-id-type="pmid">17994256</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pugalenthi</surname> <given-names>G.</given-names></name> <name><surname>Kandaswamy</surname> <given-names>K. K.</given-names></name> <name><surname>Suganthan</surname> <given-names>P. N.</given-names></name> <name><surname>Sowdhamini</surname> <given-names>R.</given-names></name> <name><surname>Martinetz</surname> <given-names>T.</given-names></name> <name><surname>Kolatkar</surname> <given-names>P. R.</given-names></name></person-group> (<year>2010</year>). <article-title>SMpred: a support vector machine approach to identify structural motifs in protein structure without using evolutionary information.</article-title> <source><italic>J. Biomol. Struct. Dyn.</italic></source> <volume>28</volume> <fpage>405</fpage>&#x2013;<lpage>414</lpage>. <pub-id pub-id-type="doi">10.1080/07391102.2010.10507369</pub-id> <pub-id pub-id-type="pmid">20919755</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qian</surname> <given-names>Y.</given-names></name> <name><surname>Ye</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>SUMO-Forest: a Cascade Forest based method for the prediction of SUMOylation sites on imbalanced data.</article-title> <source><italic>Gene</italic></source> <volume>741</volume>:<issue>144536</issue>. <pub-id pub-id-type="doi">10.1016/j.gene.2020.144536</pub-id> <pub-id pub-id-type="pmid">32160959</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srivastava</surname> <given-names>N.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name> <name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Dropout: a simple way to prevent neural networks from overfitting.</article-title> <source><italic>J. Mach. Learn. Res.</italic></source> <volume>15</volume> <fpage>1929</fpage>&#x2013;<lpage>1958</lpage>.</citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Meng</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>lncRScan-SVM: a tool for predicting long non-coding RNAs using support vector machine.</article-title> <source><italic>PLoS One</italic></source> <volume>10</volume>:<issue>e0139654</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0139654</pub-id> <pub-id pub-id-type="pmid">26437338</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thapa</surname> <given-names>N.</given-names></name> <name><surname>Chaudhari</surname> <given-names>M.</given-names></name> <name><surname>McManus</surname> <given-names>S.</given-names></name> <name><surname>Roy</surname> <given-names>K.</given-names></name> <name><surname>Newman</surname> <given-names>R. H.</given-names></name> <name><surname>Saigo</surname> <given-names>H.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>DeepSuccinylSite: a deep learning based approach for protein succinylation site prediction.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>21(Suppl 3)</volume>:<issue>63</issue>. <pub-id pub-id-type="doi">10.1186/s12859-020-3342-z</pub-id> <pub-id pub-id-type="pmid">32321437</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>UniProt Consortium</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>UniProt: the universal protein knowledgebase.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>46</volume>:<issue>2699</issue>. <pub-id pub-id-type="doi">10.1093/nar/gky092</pub-id> <pub-id pub-id-type="pmid">29425356</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vapnik</surname> <given-names>V. N.</given-names></name></person-group> (<year>1998</year>). <source><italic>Statistical Learning Theory.</italic></source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Wiley</publisher-name>.</citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Liang</surname> <given-names>Y.</given-names></name> <name><surname>Xu</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Capsule network for protein post-translational modification site prediction.</article-title> <source><italic>Bioinformatics</italic></source> <volume>35</volume> <fpage>2386</fpage>&#x2013;<lpage>2394</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty977</pub-id> <pub-id pub-id-type="pmid">30520972</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>L. N.</given-names></name> <name><surname>Shi</surname> <given-names>S. P.</given-names></name> <name><surname>Wen</surname> <given-names>P. P.</given-names></name> <name><surname>Zhou</surname> <given-names>Z. Y.</given-names></name> <name><surname>Qiu</surname> <given-names>J. D.</given-names></name></person-group> (<year>2017</year>). <article-title>Computing prediction and functional analysis of prokaryotic propionylation.</article-title> <source><italic>J. Chem. Inf. Model.</italic></source> <volume>57</volume> <fpage>2896</fpage>&#x2013;<lpage>2904</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.7b00482</pub-id> <pub-id pub-id-type="pmid">29059524</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>L.</given-names></name> <name><surname>Xing</surname> <given-names>P.</given-names></name> <name><surname>Shi</surname> <given-names>G.</given-names></name> <name><surname>Ji</surname> <given-names>Z.</given-names></name> <name><surname>Zou</surname> <given-names>Q.</given-names></name></person-group> (<year>2019</year>). <article-title>Fast prediction of protein methylation sites using a sequence-based feature selection technique.</article-title> <source><italic>IEEE ACM Trans. Comput. Biol. Bioinform.</italic></source> <volume>16</volume> <fpage>1264</fpage>&#x2013;<lpage>1273</lpage>. <pub-id pub-id-type="doi">10.1109/tcbb.2017.2670558</pub-id> <pub-id pub-id-type="pmid">28222000</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>L.</given-names></name> <name><surname>Xing</surname> <given-names>P.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Zou</surname> <given-names>Q.</given-names></name></person-group> (<year>2017</year>). <article-title>PhosPred-RF: a novel sequence-based predictor for phosphorylation sites using sequential information only.</article-title> <source><italic>IEEE Trans. Nanobiosci.</italic></source> <volume>16</volume> <fpage>240</fpage>&#x2013;<lpage>247</lpage>. <pub-id pub-id-type="doi">10.1109/tnb.2017.2661756</pub-id> <pub-id pub-id-type="pmid">28166503</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xiang</surname> <given-names>Q.</given-names></name> <name><surname>Feng</surname> <given-names>K.</given-names></name> <name><surname>Liao</surname> <given-names>B.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Prediction of lysine malonylation sites based on pseudo amino acid.</article-title> <source><italic>Comb. Chem. High Throughput Screen.</italic></source> <volume>20</volume> <fpage>622</fpage>&#x2013;<lpage>628</lpage>.</citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>Y.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Ma</surname> <given-names>W.</given-names></name> <name><surname>Huang</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>DeepNitro: prediction of protein nitration and nitrosylation sites by deep learning.</article-title> <source><italic>Genomics Proteomics Bioinformatics</italic></source> <volume>16</volume> <fpage>294</fpage>&#x2013;<lpage>306</lpage>. <pub-id pub-id-type="doi">10.1016/j.gpb.2018.04.007</pub-id> <pub-id pub-id-type="pmid">30268931</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>H.</given-names></name> <name><surname>Zhou</surname> <given-names>J.</given-names></name> <name><surname>Lin</surname> <given-names>S.</given-names></name> <name><surname>Deng</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Xue</surname> <given-names>Y.</given-names></name></person-group> (<year>2017</year>). <article-title>PLMD: an updated data resource of protein lysine modifications.</article-title> <source><italic>J. Genet. Genomics</italic></source> <volume>44</volume> <fpage>243</fpage>&#x2013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1016/j.jgg.2017.03.007</pub-id> <pub-id pub-id-type="pmid">28529077</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>M.</given-names></name> <name><surname>Huang</surname> <given-names>H.</given-names></name> <name><surname>Ge</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Lysine propionylation is a widespread post-translational modification involved in regulation of photosynthesis and metabolism in <italic>Cyanobacteria</italic>.</article-title> <source><italic>Int. J. Mol. Sci.</italic></source> <volume>20</volume>:<elocation-id>4792</elocation-id>. <pub-id pub-id-type="doi">10.3390/ijms20194792</pub-id> <pub-id pub-id-type="pmid">31561603</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>K.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>Y.</given-names></name></person-group> (<year>2008</year>). <article-title>Identification and verification of lysine propionylation and butyrylation in yeast core histones using PTMap software.</article-title> <source><italic>J. Proteome Res.</italic></source> <volume>8</volume> <fpage>900</fpage>&#x2013;<lpage>906</lpage>. <pub-id pub-id-type="doi">10.1021/pr8005155</pub-id> <pub-id pub-id-type="pmid">19113941</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>T.</given-names></name> <name><surname>Huang</surname> <given-names>G.</given-names></name> <name><surname>Zhang</surname> <given-names>N.</given-names></name> <name><surname>Kong</surname> <given-names>X.</given-names></name> <name><surname>Cai</surname> <given-names>Y.-D.</given-names></name></person-group> (<year>2016</year>). <article-title>Prediction of protein N-formylation and comparison with N-acetylation based on a feature selection method.</article-title> <source><italic>Neurocomputing</italic></source> <volume>217</volume> <fpage>53</fpage>&#x2013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2015.10.148</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="footnote1">
<label>1</label>
<p><ext-link ext-link-type="uri" xlink:href="http://47.113.117.61/">http://47.113.117.61/</ext-link></p></fn>
</fn-group>
</back>
</article>