<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2019.00090</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Chen</surname> <given-names>Zhan-Heng</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/625777/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Li</surname> <given-names>Li-Ping</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>He</surname> <given-names>Zhou</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhou</surname> <given-names>Ji-Ren</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Li</surname> <given-names>Yangming</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/376644/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wong</surname> <given-names>Leon</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib></contrib-group>
<aff id="aff1"><sup>1</sup><institution>The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences</institution>, <addr-line>&#x00DC;r&#x00FC;mqi</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>University of Chinese Academy of Sciences</institution>, <addr-line>Beijing</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>College of Engineering and Applied Science, University of Colorado Boulder</institution>, <addr-line>Boulder, CO</addr-line>, <country>United States</country></aff>
<aff id="aff4"><sup>4</sup><institution>ECTET, Rochester Institute of Technology</institution>, <addr-line>Rochester, NY</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Xing Chen, China University of Mining and Technology, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Pengwei Hu, Hong Kong Polytechnic University, Hong Kong; Jie Gui, University of Michigan, United States</p></fn>
<corresp id="c001">&#x002A;Correspondence: Li-Ping Li, <email>Lipingli@ms.xjb.ac.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>01</day>
<month>03</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>10</volume>
<elocation-id>90</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>10</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>01</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2019 Chen, Li, He, Zhou, Li and Wong.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Chen, Li, He, Zhou, Li and Wong</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Self-interacting proteins (SIPs), in which two or more copies of the same protein interact with each other, play significant roles in cellular processes and cell functions. Although a number of experimental methods have been designed to detect SIPs, they remain extremely time-consuming, expensive, and challenging. Therefore, there is an urgent need for computational methods that predict SIPs. In this study, we propose a deep forest based predictor for accurate prediction of SIPs using protein sequence information. More specifically, we introduce a novel feature representation method that integrates the position-specific scoring matrix (PSSM) with the wavelet transform. To evaluate the performance of the proposed method, cross-validation tests are performed on two widely used benchmark datasets. The experimental results show that the proposed model achieved high accuracies of 95.43 and 93.65% on the <italic>human</italic> and <italic>yeast</italic> datasets, respectively. The corresponding AUC values for the <italic>yeast</italic> and <italic>human</italic> datasets are 0.9203 and 0.9586, respectively. To further show the advantage of the proposed method, it is compared with several existing methods. The results demonstrate that the proposed model outperforms other SIP prediction methods. This work can offer biologists an effective architecture for detecting new SIPs.</p>
</abstract>
<kwd-group>
<kwd>self-interacting proteins</kwd>
<kwd>disease</kwd>
<kwd>position-specific scoring matrix</kwd>
<kwd>deep learning</kwd>
<kwd>wavelet transform</kwd>
</kwd-group>
<contract-num rid="cn001">61373086</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<counts>
<fig-count count="7"/>
<table-count count="3"/>
<equation-count count="15"/>
<ref-count count="65"/>
<page-count count="10"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec><title>Introduction</title>
<p>Proteins are highly complex substances and a main component of all living organisms; they are the material basis of life. Individual proteins rarely work in isolation. Most proteins work together with molecular partners or other proteins through protein-protein interactions (PPIs) (<xref ref-type="bibr" rid="B14">Chou and Cai, 2006</xref>; <xref ref-type="bibr" rid="B59">You et al., 2014b</xref>,<xref ref-type="bibr" rid="B60">c</xref>; <xref ref-type="bibr" rid="B33">Li et al., 2017</xref>). One special case of PPIs is self-interacting proteins (SIPs), in which two or more copies of the same protein interact with each other to form a homodimer, homotrimer, or higher homo-oligomer (<xref ref-type="bibr" rid="B40">Marianayagam et al., 2004</xref>), and which play key roles in cellular processes and cell functions. These interactions have received increasing attention in recent years. <xref ref-type="bibr" rid="B28">Ispolatov et al. (2005)</xref> reported that SIPs are more than twice as numerous as other proteins in the protein interaction network (PIN) (<xref ref-type="bibr" rid="B55">You et al., 2010a</xref>, <xref ref-type="bibr" rid="B58">2014a</xref>, <xref ref-type="bibr" rid="B51">2015b</xref>, <xref ref-type="bibr" rid="B54">2017c</xref>; <xref ref-type="bibr" rid="B38">Liu et al., 2013</xref>; <xref ref-type="bibr" rid="B26">Huang et al., 2016a</xref>; <xref ref-type="bibr" rid="B35">Li et al., 2016</xref>), which points to the importance of SIPs for cellular systems and for understanding disease mechanisms. <xref ref-type="bibr" rid="B43">P&#x00E9;rez-Bercoff et al. (2010)</xref> found that the genes of SIPs may have higher duplicability than others, in a study conducted at the whole-genome level rather than at small scale. <xref ref-type="bibr" rid="B23">Hashimoto et al. (2011)</xref> presented several molecular mechanisms of self-interaction, mainly including ligand-induced interaction, domain swapping, insertions, and deletions. Overall, most previous work focuses on individual SIPs at the level of structure and function. To our knowledge, many computational techniques based on machine learning and deep learning (<xref ref-type="bibr" rid="B22">Gui et al., 2009</xref>; <xref ref-type="bibr" rid="B56">You et al., 2010b</xref>, <xref ref-type="bibr" rid="B50">2015a</xref>, <xref ref-type="bibr" rid="B52">2017a</xref>,<xref ref-type="bibr" rid="B53">b</xref>; <xref ref-type="bibr" rid="B39">Lu et al., 2013</xref>; <xref ref-type="bibr" rid="B41">Mi et al., 2013</xref>; <xref ref-type="bibr" rid="B25">Huang et al., 2015</xref>; <xref ref-type="bibr" rid="B13">Chen et al., 2016</xref>, <xref ref-type="bibr" rid="B9">2018a</xref>,<xref ref-type="bibr" rid="B10">b</xref>,<xref ref-type="bibr" rid="B11">c</xref>; <xref ref-type="bibr" rid="B21">Gui et al., 2016</xref>; <xref ref-type="bibr" rid="B27">Huang et al., 2016b</xref>; <xref ref-type="bibr" rid="B34">Li et al., 2018</xref>) have been applied in bioinformatics and genomics, but few of them address the detection of protein interactions.</p>
<p>Recently, <xref ref-type="bibr" rid="B64">Zhou et al. (2012)</xref> developed a model for PPI prediction that feeds codon pair frequency differences into a support vector machine (SVM) predictor. <xref ref-type="bibr" rid="B57">You et al. (2013)</xref> presented a novel method that combined principal component analysis (PCA) with an ensemble extreme learning machine model to predict PPIs from amino acid sequence information. Because the proposed feature extraction method has high discriminative power and captures most of the information in protein sequences, it was very successful for PPI detection. <xref ref-type="bibr" rid="B62">Zahiri et al. (2013)</xref> introduced the PPIevo algorithm, based on evolutionary features extracted from the position-specific scoring matrix (PSSM) of known protein sequences. <xref ref-type="bibr" rid="B17">Du et al. (2014)</xref> designed a predictor for SIPs by applying random forest with an ensemble coding method that integrated many biochemical properties and useful features. <xref ref-type="bibr" rid="B63">Zhang et al. (2018)</xref> predicted PPIs by using an ensemble of deep neural networks (DNN) based on various representations of protein sequences. <xref ref-type="bibr" rid="B33">Li et al. (2017)</xref> detected SIPs based on evolutionary information and amino acid sequences by using an ensemble learning method. Although these methods are relatively mature for PPI prediction, there are few machine learning and deep learning methods for predicting SIPs.</p>
<p>Given this potential, in this study we present a novel approach for SIP prediction that combines deep forest with the wavelet transform (WT), based on the PSSM of protein sequences. First, we collected the golden standard <italic>human</italic> and <italic>yeast</italic> datasets from commonly used databases, which can be integrated for discriminating SIPs. Second, the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST) was used to convert each protein sequence into a PSSM. Third, the WT approach was applied to calculate the feature values that are fed into deep forest, and the SIP prediction model was then constructed. Finally, we carried out experiments on the two golden standard datasets and compared the presented model with the SVM method and other existing methods. Experimental results suggest that our proposed model works very well for SIP prediction and can provide clues for understanding protein functions. The workflow is summarized in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>The flowchart of our work.</p></caption>
<graphic xlink:href="fgene-10-00090-g001.tif"/>
</fig>
</sec>
<sec id="s1" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec><title>Datasets Preparation</title>
<p>In the experiment, we derived 20,199 curated <italic>human</italic> protein sequences from the UniProt database (<xref ref-type="bibr" rid="B15">Consortium, 2014</xref>). Then, PPI-related information was integrated from a range of resources, including PDB (<xref ref-type="bibr" rid="B3">Berman et al., 2000</xref>), DIP (<xref ref-type="bibr" rid="B45">Salwinski et al., 2004</xref>), MINT (<xref ref-type="bibr" rid="B36">Licata et al., 2011</xref>), InnateDB (<xref ref-type="bibr" rid="B5">Breuer et al., 2012</xref>), IntAct (<xref ref-type="bibr" rid="B42">Orchard et al., 2013</xref>), BioGRID (<xref ref-type="bibr" rid="B7">Chatr-Aryamontri et al., 2017</xref>), and MatrixDB (<xref ref-type="bibr" rid="B31">Launay et al., 2014</xref>). The high-quality data in these resources were sufficient for the creation of PPI prediction models. Here, we considered only those PPIs whose interaction type was labeled as &#x201C;direct interaction&#x201D; and for which the two interaction partners were identical. In this way, we gathered 2,994 <italic>human</italic> self-interacting protein instances.</p>
<p>To measure the performance of the prediction model, we constructed the golden standard datasets from the 2,994 <italic>human</italic> SIPs mentioned above. The procedure mainly includes the following steps (<xref ref-type="bibr" rid="B37">Liu et al., 2016</xref>): (1) We removed protein sequences shorter than 50 residues or longer than 5,000 residues from the whole <italic>human</italic> proteome, because they may be fragments; (2) To construct the <italic>human</italic> golden standard positive dataset and ensure that the SIPs are of high quality, each SIP had to meet at least one of the following requirements: <inline-graphic xlink:href="fgene-10-00090-i001.jpg"/> the protein has been annotated as a homo-oligomer (including homodimer and homotrimer) in UniProt; <inline-graphic xlink:href="fgene-10-00090-i002.jpg"/> the self-interaction has been detected by more than one small-scale experiment or two large-scale experiments; <inline-graphic xlink:href="fgene-10-00090-i003.jpg"/> the self-interaction has been reported by more than two publications; (3) To construct the <italic>human</italic> golden standard negative dataset, we removed all kinds of SIPs from the whole <italic>human</italic> proteome (including proteins characterized as &#x201C;direct interaction&#x201D; and the more wide-ranging &#x201C;physical association&#x201D;) as well as the SIPs annotated in the UniProt database. As a result, the final <italic>human</italic> golden standard dataset consisted of 1,441 SIPs and 15,938 non-SIPs, for a total of 17,379 proteins.</p>
<p>Following the same procedure, we also built the <italic>yeast</italic> golden standard dataset to further measure the cross-species capacity of our proposed model. The final <italic>yeast</italic> dataset contained 710 SIPs as positives and 5,511 non-SIPs as negatives, for a total of 6,221 proteins.</p>
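<p>The length filter in step (1) and the split into positives and negatives can be sketched as follows. This is a minimal illustration only: the record layout (identifier, sequence, label) and the function names are assumptions made for the example and are not taken from the original pipeline.</p>
<preformat><![CDATA[
```python
# Hypothetical records: (protein_id, sequence, is_self_interacting).

def length_filter(records, min_len=50, max_len=5000):
    """Keep proteins whose sequence length lies in [min_len, max_len]."""
    return [r for r in records if min_len <= len(r[1]) <= max_len]


def split_by_label(records):
    """Separate SIPs (positives) from non-SIPs (negatives)."""
    positives = [r for r in records if r[2]]
    negatives = [r for r in records if not r[2]]
    return positives, negatives


toy = [
    ("P1", "M" * 49, True),     # shorter than 50 residues, dropped
    ("P2", "M" * 50, True),     # kept, positive
    ("P3", "M" * 5000, False),  # kept, negative
    ("P4", "M" * 5001, False),  # longer than 5000 residues, dropped
]
kept = length_filter(toy)
positives, negatives = split_by_label(kept)
```
]]></preformat>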
</sec>
<sec><title>Position Specific Scoring Matrix</title>
<p>The position-specific scoring matrix (PSSM) is helpful for detecting distantly related proteins (<xref ref-type="bibr" rid="B20">Gribskov et al., 1987</xref>; <xref ref-type="bibr" rid="B18">Gao et al., 2016</xref>; <xref ref-type="bibr" rid="B47">Wang L. et al., 2017</xref>; <xref ref-type="bibr" rid="B48">Wang Y.-B et al., 2017</xref>; <xref ref-type="bibr" rid="B49">Wang Y. et al., 2017</xref>). Accordingly, each protein sequence was converted into a PSSM by employing position-specific iterated BLAST (PSI-BLAST) (<xref ref-type="bibr" rid="B2">Altschul and Koonin, 1998</xref>). A given protein sequence can thus be transformed into an <italic>H &#x00D7; 20</italic> PSSM, which can be written as follows:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x007B;</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi mathvariant='normal'>&#x03B1;&#x03B2;</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:mi mathvariant='normal'>&#x03B1;</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x22EF;</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant='normal'>&#x03B2;</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x22EF;</mml:mo><mml:mn>20</mml:mn><mml:mo>&#x007D;</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the number of rows <italic>H</italic> is the length of the protein sequence, and the <italic>20</italic> columns correspond to the <italic>20</italic> types of amino acids from which proteins are built. For the query protein sequence, the score <italic>C</italic><sub>&#x03B1;&#x03B2;</sub> assigned by the PSSM to the &#x03B2;-<italic>th</italic> amino acid at position <italic>&#x03B1;</italic> can be defined as:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi mathvariant='normal'>&#x03B1;&#x03B2;</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi mathvariant='normal'>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>20</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi mathvariant='normal'>&#x03B1;</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:mi>q</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi mathvariant='normal'>&#x03B2;</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>where <italic>p(&#x03B1;,k)</italic> denotes the frequency with which the <italic>k-th</italic> amino acid appears at position <italic>&#x03B1;</italic> of the probe, and <italic>q(&#x03B2;,k)</italic> is the value of Dayhoff&#x2019;s mutation matrix between the &#x03B2;-<italic>th</italic> and <italic>k-th</italic> amino acids. Different scores thus reflect different degrees of conservation: a strongly conserved position receives a higher score, whereas a lower score denotes a weakly conserved position.</p>
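<p>The sum in Equation 2 can be illustrated with a small sketch. To keep it readable, the example assumes a toy alphabet of three residue types rather than 20, and the frequency and substitution values are invented for illustration; only the summation itself follows the equation.</p>
<preformat><![CDATA[
```python
def pssm_scores(p, q):
    """Return C with C[a][b] = sum over k of p[a][k] * q[b][k] (Equation 2).

    p: H rows of per-position residue frequencies (H x n)
    q: n x n substitution matrix (Dayhoff's mutation matrix in the text)
    """
    n = len(q)
    return [[sum(p_row[k] * q[b][k] for k in range(n)) for b in range(n)]
            for p_row in p]


# Toy alphabet of 3 residue types instead of 20 (illustrative values only).
q = [[2, -1, 0],
     [-1, 3, 1],
     [0, 1, 4]]
p = [[1.0, 0.0, 0.0],   # position 1: always residue 0
     [0.0, 0.5, 0.5]]   # position 2: residues 1 and 2 equally often
C = pssm_scores(p, q)
```
]]></preformat>
<p>In this toy case the first position, which is fully conserved, simply reproduces the substitution scores of its residue, while the second position averages the scores of the two residues observed there.</p>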
<p>In summary, the PSSM has become essential to much research on predicting SIPs. The PSSM of each protein sequence was generated by the PSI-BLAST algorithm. To obtain a wide range of highly homologous sequences, the <italic>E</italic>-value threshold of PSI-BLAST was set to 0.001 and three iterations were used; the <italic>E</italic>-value reported for a hit is the number of alignments with an equal or better score that would be expected by chance. As a result, each PSSM is a matrix of <italic>H &#x00D7; 20</italic> elements, where the <italic>H</italic> rows correspond to the residues of the protein and the <italic>20</italic> columns correspond to the <italic>20</italic> amino acids.</p>
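<p>For readers who wish to reproduce this step, the settings above could be passed to a PSI-BLAST run roughly as sketched below. The sketch assumes the NCBI BLAST+ <monospace>psiblast</monospace> executable; the file names and database name are placeholders, and mapping the stated <italic>E</italic>-value of 0.001 to the <monospace>-evalue</monospace> option (rather than to the profile inclusion threshold) is an assumption, since the text does not specify which threshold was meant.</p>
<preformat><![CDATA[
```python
def psiblast_command(fasta_path, db_name, pssm_path,
                     evalue=0.001, iterations=3):
    """Build an argument list for the BLAST+ `psiblast` program."""
    return [
        "psiblast",
        "-query", fasta_path,              # single protein sequence (FASTA)
        "-db", db_name,                    # pre-formatted protein database
        "-evalue", str(evalue),            # E-value threshold from the text
        "-num_iterations", str(iterations),
        "-out_ascii_pssm", pssm_path,      # write the PSSM as plain text
    ]


# Placeholder names, chosen only for illustration.
cmd = psiblast_command("protein.fasta", "reference_db", "protein.pssm")
# To execute (requires BLAST+ and a formatted database):
#   import subprocess; subprocess.run(cmd, check=True)
```
]]></preformat>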
</sec>
<sec><title>Wavelet Transform</title>
<p>In signal processing, the WT (<xref ref-type="bibr" rid="B16">Daubechies, 1990</xref>) is a powerful tool for time-frequency analysis. Its main strength is that the transformation can highlight particular aspects of a problem and focus on arbitrary details of a signal. It thereby addresses a key limitation of the Fourier transform, which provides no localization in time, and is regarded as a major methodological breakthrough since the Fourier transform.</p>
<p>In mathematics, wavelet theory is a relatively new branch that draws on functional analysis, Fourier analysis, harmonic analysis, and numerical analysis. A wavelet series is a representation of a square-integrable function by a certain orthonormal series generated by a wavelet. The WT (<xref ref-type="bibr" rid="B32">Lewis and Knowles, 1992</xref>) has been applied to image decomposition. It has also been employed in many other fields, such as signal processing (<xref ref-type="bibr" rid="B44">Sahambi et al., 1997</xref>), speech processing (<xref ref-type="bibr" rid="B1">Agbinya, 1996</xref>), and non-linear science (<xref ref-type="bibr" rid="B46">Staszewski, 1998</xref>). In each case, its appeal is that selected characteristics of the problem can be highlighted by the transformation, so that the analysis can focus on any detail of interest.</p>
<p>The integral WT can be defined as follows:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mrow><mml:mi>W</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi mathvariant='normal'>&#x03C6;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>q</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msqrt><mml:mrow><mml:mo>&#x007C;</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x007C;</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x222B;</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x221E;</mml:mi></mml:mrow><mml:mi>&#x221E;</mml:mi></mml:msubsup><mml:mrow><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>q</mml:mi></mml:mrow><mml:mi>p</mml:mi></mml:mfrac><mml:mo stretchy='false'>)</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>d</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>where the binary dilation is <italic>p</italic> = 2<sup>-i</sup> and the dyadic position is <italic>q</italic> = 2<sup>-i</sup> <italic>j</italic>. The wavelet coefficients are given by</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi mathvariant='normal'>ij</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>W</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi mathvariant='normal'>&#x03C6;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant='normal'>i</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant='normal'>i</mml:mi></mml:mrow></mml:msup><mml:mi>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>An orthonormal wavelet is a function <italic>&#x03C6; &#x2208; L<sup>2</sup>(R)</italic>, where <italic>L<sup>2</sup>(R)</italic> is the Hilbert space of square-integrable functions, from which a Hilbert basis can be built as the family of functions:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mrow><mml:msub><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>ij</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mfrac><mml:mi mathvariant='normal'>i</mml:mi><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mi mathvariant='normal'>i</mml:mi></mml:msup><mml:mi mathvariant='normal'>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant='normal'>j</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>which together form the family</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:mrow><mml:mo>&#x007B;</mml:mo><mml:msub><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>ij</mml:mi></mml:mrow></mml:msub><mml:mo>:</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>&#x2124;</mml:mi><mml:mo>&#x007D;</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Under the standard inner product on <italic>L<sup>2</sup>(R)</italic>,</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M7"><mml:mrow><mml:mrow><mml:mo>&#x2329;</mml:mo> <mml:mrow><mml:mi>f</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi></mml:mrow> <mml:mo>&#x232A;</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x222B;</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x221E;</mml:mi></mml:mrow><mml:mi>&#x221E;</mml:mi></mml:msubsup><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mover accent='true'><mml:mrow><mml:mi>g</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo stretchy='true'>&#x00AF;</mml:mo></mml:mover><mml:mi>d</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>this family is an orthonormal system, that is:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M8"><mml:mrow><mml:mrow><mml:mo>&#x2329;</mml:mo> <mml:mrow><mml:msub><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>ij</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>mn</mml:mi></mml:mrow></mml:msub></mml:mrow> <mml:mo>&#x232A;</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x222B;</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x221E;</mml:mi></mml:mrow><mml:mi>&#x221E;</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>ij</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mover accent='true'><mml:mrow><mml:msub><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>mn</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo stretchy='true'>&#x00AF;</mml:mo></mml:mover><mml:mi>d</mml:mi><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant='normal'>&#x03B4;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>im</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi mathvariant='normal'>&#x03B4;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>jn</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>where &#x03B4;<sub>im</sub> and &#x03B4;<sub>jn</sub> are Kronecker deltas.</p>
<p>Completeness requires that every function <italic>f &#x2208; L<sup>2</sup>(R)</italic> can be expanded in the basis as</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M9"><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi mathvariant='normal'>i</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant='normal'>j</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>&#x221E;</mml:mi></mml:mrow><mml:mi>&#x221E;</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi mathvariant='normal'>ij</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi mathvariant='normal'>&#x03C6;</mml:mi><mml:mrow><mml:mi mathvariant='normal'>ij</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>with convergence of the series understood to be convergence in norm.</p>
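<p>The orthonormality condition of Equation 8 can be checked numerically for a concrete wavelet. The sketch below assumes the Haar wavelet as the mother wavelet, which is a choice made here for illustration because the text does not name a wavelet family, and it approximates the inner product of Equation 7 with a midpoint rule. Since the Haar functions are real-valued, the complex conjugate is omitted.</p>
<preformat><![CDATA[
```python
def haar(x):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def phi(i, j, x):
    """Dilated and translated wavelet of Equation 5."""
    return 2.0 ** (i / 2.0) * haar(2.0 ** i * x - j)


def inner(f, g, lo=-4.0, hi=4.0, steps=8192):
    """Midpoint-rule approximation of the real inner product (Equation 7)."""
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) * g(lo + (k + 0.5) * h)
               for k in range(steps)) * h


def gram_entry(i, j, m, n):
    """Inner product of phi_ij and phi_mn, expected to be delta_im * delta_jn."""
    return inner(lambda x: phi(i, j, x), lambda x: phi(m, n, x))
```
]]></preformat>
<p>For example, <monospace>gram_entry(0, 0, 0, 0)</monospace> evaluates to approximately 1, while <monospace>gram_entry(0, 0, 1, 0)</monospace> evaluates to approximately 0, in line with the Kronecker deltas of Equation 8.</p>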
<p>Designing feature extraction for machine learning methods is nevertheless a challenging task. In bioinformatics and genomics, an amino acid sequence can be treated as a series of digital signals, to which the WT method can be applied (<xref ref-type="bibr" rid="B29">Jia et al., 2016</xref>). However, protein sequences contain different numbers of amino acids, so transforming a PSSM directly would yield feature vectors of different lengths. Hence, we multiplied the transpose of the PSSM by the PSSM to obtain a <italic>20 &#x00D7; 20</italic> matrix, and applied the WT feature extraction method to this matrix. The resulting feature values of each protein sequence form a <italic>400</italic>-dimensional vector. In this way, each protein sequence in the <italic>yeast</italic> and <italic>human</italic> datasets was converted into a <italic>400</italic>-dimensional vector.</p>
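<p>The length-independent feature construction can be sketched as follows. The product of the transposed PSSM with the PSSM follows the description above; the use of a single-level two-dimensional Haar transform is an assumption made for illustration, because the wavelet family and decomposition level are not stated, and <monospace>toy_pssm</monospace> is a hypothetical stand-in for real PSI-BLAST output.</p>
<preformat><![CDATA[
```python
import math


def gram_matrix(pssm):
    """Return PSSM^T x PSSM, a 20 x 20 matrix for any sequence length H."""
    cols = len(pssm[0])
    return [[sum(row[a] * row[b] for row in pssm) for b in range(cols)]
            for a in range(cols)]


def haar_step(v):
    """One level of the orthonormal Haar transform of an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    half = len(v) // 2
    approx = [(v[2 * k] + v[2 * k + 1]) * s for k in range(half)]
    detail = [(v[2 * k] - v[2 * k + 1]) * s for k in range(half)]
    return approx + detail


def haar_2d(matrix):
    """Apply haar_step to every row, then to every column."""
    rows = [haar_step(r) for r in matrix]
    cols = [haar_step(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def wt_features(pssm):
    """Flatten the transformed 20 x 20 matrix into a 400-dimensional vector."""
    return [x for row in haar_2d(gram_matrix(pssm)) for x in row]


def toy_pssm(h):
    """Deterministic stand-in for a real H x 20 PSSM (illustrative only)."""
    return [[((3 * r + 7 * c) % 11) - 5 for c in range(20)] for r in range(h)]
```
]]></preformat>
<p>Whatever the sequence length, <monospace>wt_features</monospace> returns 400 values, which is the property the method relies on.</p>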
<p>To reduce the influence of unimportant information and increase the prediction accuracy, we used PCA to remove noisy features from the <italic>yeast</italic> and <italic>human</italic> datasets, reducing the dimension of both datasets from 400 to 300. Representing the main information with fewer features also speeds up computation.</p>
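<p>The dimensionality reduction step can be illustrated with a small, dependency-free PCA. This is a sketch under stated assumptions: it uses power iteration with deflation on the sample covariance matrix, which is one of several ways to compute principal components and is not necessarily the implementation used in this study, and it is demonstrated on toy three-dimensional data rather than on the 400-dimensional features.</p>
<preformat><![CDATA[
```python
import math


def pca(rows, k, iters=100):
    """Project rows onto their top-k principal components.

    Uses power iteration with deflation on the sample covariance matrix.
    Returns (projected_rows, eigenvalues).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(d)]
    x = [[r[c] - means[c] for c in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(d)]
           for a in range(d)]
    components, eigenvalues = [], []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:   # no variance left along this start vector
                break
            v = [t / norm for t in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                  for a in range(d))
        components.append(v)
        eigenvalues.append(lam)
        # Deflate: remove the variance explained by this component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    projected = [[sum(r[c] * comp[c] for c in range(d)) for comp in components]
                 for r in x]
    return projected, eigenvalues


# Toy data lying on one line in 3-D, so a single component captures it all.
data = [[t, 2.0 * t, -t] for t in range(5)]
projected, eigenvalues = pca(data, k=1)
```
]]></preformat>
<p>For the setting described above, the analogous call would request 300 components from the 400-dimensional feature vectors; in practice an optimized linear algebra library would be preferable to this pure-Python loop.</p>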
</sec>
<sec><title>Deep Forest</title>
<p>DNNs have been successfully applied in various fields, such as visual and speech processing (<xref ref-type="bibr" rid="B24">Hinton et al., 2012</xref>; <xref ref-type="bibr" rid="B30">Krizhevsky et al., 2012</xref>), driving the current wave of deep learning (<xref ref-type="bibr" rid="B19">Goodfellow et al., 2016</xref>; <xref ref-type="bibr" rid="B8">Chen and Huang, 2017</xref>; <xref ref-type="bibr" rid="B12">Chen et al., 2017</xref>). <xref ref-type="bibr" rid="B65">Zhou and Feng (2017)</xref> proposed deep forest, also termed GCForest (multi-Grained Cascade Forest), a novel decision tree ensemble approach. It performs representation learning, which can discover better features through end-to-end training. GCForest has been reported to be highly competitive with DNNs.</p>
<p>The GCForest model can deal with a wide variety of data from different domains, and its training process has high computational efficiency and strong extensibility. In our experiment, the GCForest model was divided into two main parts. The first part is the cascade forest, as illustrated in <xref ref-type="fig" rid="F2">Figure 2</xref>; the second part is multi-grained scanning, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Cascade forest structure.</p></caption>
<graphic xlink:href="fgene-10-00090-g002.tif"/>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Flow chart of the multi-grained scanning approach.</p></caption>
<graphic xlink:href="fgene-10-00090-g003.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, the input to the cascade forest is the feature vector obtained by the multi-grained scanning approach. GCForest employs a cascade structure, and each level of the cascade forest includes two random forests and two complete-random tree forests (<xref ref-type="bibr" rid="B4">Breiman, 2001</xref>). Each random forest contains 500 trees; at each node, <italic>&#x221A;d</italic> features are randomly chosen as candidates (where <italic>d</italic> is the number of input features), and the candidate with the best <italic>gini</italic> value is selected for the split. Each complete-random tree forest contains 500 complete-random trees; each tree is generated by randomly choosing a feature for the split at each node, and the tree grows until each leaf node contains only instances of the same class or no more than 10 instances. The number of trees in each forest is a hyper-parameter. Because our experiment is a binary classification problem, the output of each forest is a two-dimensional class vector, which is concatenated with the input features to form the input of the next level. To reduce the risk of over-fitting, the class vector produced by each forest is generated by k-fold cross validation.</p>
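<p>The way one cascade level passes information to the next can be illustrated with a small sketch. The forests themselves are not trained here; the class vectors are made-up probabilities, the input length of 300 is a placeholder (in the full model the cascade input comes from multi-grained scanning), and the function name is hypothetical.</p>
<preformat>
```python
def next_level_input(features, class_vectors):
    """Concatenate the input features with the class vector of every forest."""
    augmented = list(features)
    for vector in class_vectors:
        augmented.extend(vector)
    return augmented

features = [0.0] * 300                     # placeholder input feature vector
class_vectors = [[0.9, 0.1], [0.8, 0.2],   # two random forests (made-up values)
                 [0.7, 0.3], [0.6, 0.4]]   # two complete-random tree forests
level_input = next_level_input(features, class_vectors)
print(len(level_input))
```
</preformat>
<p>With four forests and two classes, each level appends eight values to the feature vector it received.</p>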
<p>As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, the multi-grained scanning approach was applied to enhance the cascade forest. This method uses a sliding window to scan the raw input features, which were extracted from the <italic>human</italic> and <italic>yeast</italic> datasets by the WT method, and generates instances that are fed into forests; the resulting class vectors are merged into new feature vectors. In our experiment, there are two classes, the raw input features have 300 dimensions, and the sliding window has 100 dimensions.</p>
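<p>The scanning step can be sketched as a plain sliding window over a feature vector. The stride is not given in the text, so a stride of 1 is assumed; the function name and the placeholder vector are hypothetical.</p>
<preformat>
```python
def sliding_windows(vector, window, stride=1):
    """Return every contiguous slice of length `window`, moving by `stride`."""
    return [vector[start:start + window]
            for start in range(0, len(vector) - window + 1, stride)]

raw = list(range(300))                  # placeholder 300-dimensional feature vector
instances = sliding_windows(raw, 100)   # stride of 1 is an assumption
print(len(instances), len(instances[0]))
```
</preformat>
<p>Under this assumption, a 300-dimensional input and a window of 100 give 300 - 100 + 1 = 201 instances per protein, each of which a forest maps to a two-dimensional class vector.</p>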
</sec>
<sec><title>Model Assessment</title>
<p>To assess the effectiveness and stability of our proposed model, we calculated the following measures: accuracy (Accu), specificity [Spec, also called true negative rate (TNR)], precision [Prec, also named positive predictive value (PPV)], recall [also known as sensitivity or true positive rate (TPR)], F1_score (the harmonic mean of precision and recall), and Matthews correlation coefficient (MCC). These measures are defined as follows:</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M10"><mml:mrow><mml:mi>A</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mi>u</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E11"><label>(11)</label><mml:math id="M11"><mml:mrow><mml:mi>S</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E12"><label>(12)</label><mml:math id="M12"><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E13"><label>(13)</label><mml:math id="M13"><mml:mrow><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E14"><label>(14)</label><mml:math id="M14"><mml:mrow><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x005F;</mml:mo><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mo>+</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="E15"><label>(15)</label><mml:math id="M15"><mml:mrow><mml:mi>M</mml:mi><mml:mi>C</mml:mi><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>where <italic>TP</italic> represents the number of true positives, that is, the count of true interacting pairs correctly predicted. <italic>FP</italic> represents the number of false positives, defined as the count of true non-interacting pairs falsely predicted to be interacting. <italic>TN</italic> represents the number of true negatives, that is, the count of true non-interacting pairs predicted correctly. <italic>FN</italic> represents the number of false negatives, that is, the count of true interacting pairs falsely predicted to be non-interacting. In addition, we plotted the receiver operating characteristic (ROC) curve to assess the predictive ability of our proposed model, and computed the area under the curve (AUC) to evaluate the quality of the classifier.</p>
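<p>The measures above can be computed directly from the four confusion-matrix counts, as in the following sketch. Recall is computed as TP / (TP + FN), the standard definition of the true positive rate, which is also what the closed form of the F1_score in Eq. (14) implies. The counts in the example are made up for illustration and are not taken from our experiments; the function name is hypothetical.</p>
<preformat>
```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Compute the six measures from confusion-matrix counts.

    Assumes every denominator is non-zero (no guard for degenerate cases).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tn / (fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return {"accuracy": accuracy, "specificity": specificity,
            "precision": precision, "recall": recall, "f1": f1, "mcc": mcc}

# Made-up counts for an imbalanced test set of 1000 samples.
metrics = classification_metrics(tp=50, fp=10, tn=900, fn=40)
for name, value in metrics.items():
    print(name, round(value, 4))
```
</preformat>
<p>On imbalanced data such as this example, accuracy and specificity can be high while recall remains modest, which is why the F1_score and MCC are reported alongside accuracy.</p>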
</sec>
</sec>
<sec><title>Results and Discussion</title>
<sec><title>Performance of GCForest on <italic>Human</italic> and <italic>Yeast</italic> Datasets</title>
<p>To evaluate the proposed model as comprehensively as possible, we tested it on the <italic>human</italic> and <italic>yeast</italic> SIP datasets, which were collected from multiple publicly available resources. In the experiment, we used cross validation to obtain a reliable and stable model. Taking the <italic>human</italic> dataset, from which noisy features had been removed by PCA, as an example, the whole dataset was divided into five non-overlapping parts; four parts were randomly selected as the training set, and the remaining part was taken as the independent test set. The model was then built on the training set and its performance was evaluated on the independent test set.</p>
<p>Based on the constructed datasets, we predicted the SIPs by using the proposed model. To ensure the fairness and objectivity of the experiment, the parameters of the proposed model were kept the same on the <italic>human</italic> and <italic>yeast</italic> datasets. The GCForest model has few hyper-parameters, and its performance is not very sensitive to their settings; in other words, the GCForest model is robust to the hyper-parameter settings. Nevertheless, some parameters still need to be set. In the experiment, we set shape_1X = 100 [shape of a single sample element (100, 100)], window = 100 (list of window sizes to use during Multi-Grain Scanning), and tolerance = 5.0 (accuracy tolerance for the cascade growth).</p>
<p>Afterward, we applied the proposed model to the <italic>human</italic> and <italic>yeast</italic> datasets. The prediction results are shown in <xref ref-type="table" rid="T1">Table 1</xref>. Under cross-validation, the prediction accuracy of GCForest reached 95.43 and 93.65% on the <italic>human</italic> and <italic>yeast</italic> datasets, respectively.</p>
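<p>The five-part split described above can be sketched as follows. This is an illustrative sketch only: the shuffling seed, the function name, and the sample count in the example are assumptions, and the actual partitioning code used in the experiments is not shown in the text.</p>
<preformat>
```python
import random

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train, test

splits = list(five_fold_splits(23))     # 23 samples is a made-up example size
for train, test in splits:
    print(len(train), len(test))
```
</preformat>
<p>Each sample appears in exactly one test part, so the five test sets together cover the whole dataset without overlap.</p>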
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Performance of the proposed model on the <italic>human</italic> and <italic>yeast</italic> datasets.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"></th>
<th valign="top" align="center">Accu</th>
<th valign="top" align="center">Spec</th>
<th valign="top" align="center">Prec</th>
<th valign="top" align="center">Recall</th>
<th valign="top" align="center">F1_score</th>
<th valign="top" align="center">MCC</th>
</tr>
<tr>
<th valign="top" align="left">Datasets</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>human</italic></td>
<td valign="top" align="center">95.43</td>
<td valign="top" align="center">99.09</td>
<td valign="top" align="center">84.07</td>
<td valign="top" align="center">54.06</td>
<td valign="top" align="center">65.81</td>
<td valign="top" align="center">65.26</td></tr>
<tr>
<td valign="top" align="left"><italic>yeast</italic></td>
<td valign="top" align="center">93.65</td>
<td valign="top" align="center">99.28</td>
<td valign="top" align="center">88.73</td>
<td valign="top" align="center">47.01</td>
<td valign="top" align="center">61.46</td>
<td valign="top" align="center">61.87</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="T1">Table 1</xref>, the proposed model achieved an accuracy of more than 93% for predicting SIPs on the two integrated datasets. We conclude that a suitable classifier and feature extraction method are both important for SIP prediction, for the following reasons: (1) The PSSM, generated by PSI-BLAST, greatly improved the prediction performance. It not only describes the protein sequence in numerical form but also retains as much useful information as possible. Accordingly, a PSSM provides most of the major information of a single protein sequence needed to detect SIPs. (2) The WT feature extraction method can capture more useful information from the protein sequences and thereby improve the performance of the prediction model. (3) GCForest is an appropriate classifier, and it performs well when combined with the WT feature extraction method.</p>
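<p>As a consistency check on <xref ref-type="table" rid="T1">Table 1</xref>, the F1_score can be recomputed from the reported precision and recall using the harmonic-mean definition in Eq. (14). The short sketch below does this; the function name is hypothetical and only the Table 1 values are used.</p>
<preformat>
```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (both in the same units)."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %) as reported in Table 1.
table1 = {"human": (84.07, 54.06), "yeast": (88.73, 47.01)}
for name, (prec, rec) in table1.items():
    print(name, round(f1_from_precision_recall(prec, rec), 2))
```
</preformat>
<p>The recomputed values, 65.81 for <italic>human</italic> and 61.46 for <italic>yeast</italic>, agree with the F1_score column of Table 1.</p>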
</sec>
<sec><title>Comparison of GCForest and SVM Method</title>
<p>As shown in section &#x201C;Datasets Preparation,&#x201D; our proposed model obtained good performance on both the <italic>human</italic> and <italic>yeast</italic> integrated datasets. However, it is still necessary to further verify the effectiveness of the algorithm. SVM is a widely used supervised learning algorithm for classification, which has been applied in many scientific research fields. Therefore, we compared the performance of GCForest with that of an SVM classifier for detecting SIPs, employing the same features extracted from the two integrated datasets described above. In the experiment, the LIBSVM package (<xref ref-type="bibr" rid="B6">Chang and Lin, 2011</xref>) was used for classification. Before training, certain SVM parameters had to be set. A radial basis function (RBF) was selected as the kernel function, and a grid search approach was used to adjust the parameters <italic>c</italic> and <italic>g</italic>, which were set to <italic>c = 0.3</italic> and <italic>g = 1000</italic>.</p>
<p>The performance statistics reported in <xref ref-type="fig" rid="F4">Figures 4</xref>, <xref ref-type="fig" rid="F5">5</xref> compare the proposed model and the SVM-based model on the <italic>human</italic> and <italic>yeast</italic> datasets, respectively. As shown in <xref ref-type="fig" rid="F4">Figure 4</xref>, on the <italic>human</italic> dataset, the prediction accuracy of both the GCForest and SVM classifiers was greater than 92%; the precision was 84.07% (GCForest) and 100% (SVM); the recall was 54.06% (GCForest) and 14.87% (SVM); and the MCC was 65.26% (GCForest) and 37.13% (SVM). As shown in <xref ref-type="fig" rid="F5">Figure 5</xref>, the accuracy, precision, recall, and MCC of the SVM classifier were 89.14, 100.00, 5.88, and 22.83% on the <italic>yeast</italic> dataset, whereas the GCForest classifier achieved 93.65% accuracy, 88.73% precision, 47.01% recall, and 61.87% MCC. Although the SVM classifier reached higher precision, its recall and MCC were much lower, so these results suggest that our proposed model is superior to the SVM-based approach for SIPs prediction.</p>
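<p>For reference, the RBF kernel has the form K(x, y) = exp(-g ||x - y||^2), which is the form LIBSVM uses for its gamma parameter. The sketch below evaluates it in plain Python; it assumes that <italic>g</italic> in the text denotes this gamma, and the input vectors are made up for illustration.</p>
<preformat>
```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

same = rbf_kernel([0.2, 0.5], [0.2, 0.5], gamma=1000)   # identical points
near = rbf_kernel([0.0], [0.1], gamma=1000)             # points 0.1 apart
print(same, near)
```
</preformat>
<p>With a large gamma the kernel value drops quickly as two samples move apart, so the resulting decision function depends mostly on very close training samples.</p>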
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Performance comparison between GCForest and SVM on the <italic>human</italic> dataset.</p></caption>
<graphic xlink:href="fgene-10-00090-g004.tif"/>
</fig>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Performance comparison between GCForest and SVM on the <italic>yeast</italic> dataset.</p></caption>
<graphic xlink:href="fgene-10-00090-g005.tif"/>
</fig>
</sec>
<sec><title>Compare GCForest With Other Existing Methods</title>
<p>To further assess the prediction ability of our GCForest model, we also compared its performance with that of other existing methods on the <italic>human</italic> and <italic>yeast</italic> datasets. As shown in <xref ref-type="table" rid="T2">Tables 2</xref>, <xref ref-type="table" rid="T3">3</xref>, the accuracy of the GCForest model was higher than that of the other existing methods on the two integrated datasets (mentioned in section &#x201C;Materials and Methods&#x201D;). The Spec, MCC, and F1 score values were also higher than those of most other methods; the exception is random forest, which reached a Spec of 100.00% on both datasets and a higher MCC on the <italic>yeast</italic> dataset. However, the recall (also named sensitivity or true positive rate), which measures the percentage of true positives that are correctly identified, was lower for the proposed model than for several of the existing methods. A possible reason is that traditional PPI predictors, which rely on correlation information between two proteins such as co-localization, co-expression and co-evolution, do not work well for predicting SIPs. Overall, these results on the <italic>human</italic> and <italic>yeast</italic> datasets indicate that our proposed model is an effective deep learning method for detecting SIPs.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Performance of GCForest and other methods on the <italic>human</italic> dataset.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"></th>
<th valign="top" align="center">Accu</th>
<th valign="top" align="center">Spec</th>
<th valign="top" align="center">Recall</th>
<th valign="top" align="center">MCC</th>
<th valign="top" align="center">F1 Score</th>
</tr>
<tr>
<th valign="top" align="left">Model</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SLIPPER (<xref ref-type="bibr" rid="B6">Chang and Lin, 2011</xref>)</td>
<td valign="top" align="center">91.10</td>
<td valign="top" align="center">95.06</td>
<td valign="top" align="center">47.26</td>
<td valign="top" align="center">41.97</td>
<td valign="top" align="center">46.82</td>
</tr>
<tr>
<td valign="top" align="left">DXECPPI (<xref ref-type="bibr" rid="B17">Du et al., 2014</xref>)</td>
<td valign="top" align="center">30.90</td>
<td valign="top" align="center">25.83</td>
<td valign="top" align="center">87.08</td>
<td valign="top" align="center">8.25</td>
<td valign="top" align="center">17.28</td>
</tr>
<tr>
<td valign="top" align="left">PPIevo (<xref ref-type="bibr" rid="B62">Zahiri et al., 2013</xref>)</td>
<td valign="top" align="center">78.04</td>
<td valign="top" align="center">25.82</td>
<td valign="top" align="center">87.83</td>
<td valign="top" align="center">20.82</td>
<td valign="top" align="center">27.73</td>
</tr>
<tr>
<td valign="top" align="left">LocFuse (<xref ref-type="bibr" rid="B61">Zahiri et al., 2014</xref>)</td>
<td valign="top" align="center">80.66</td>
<td valign="top" align="center">80.50</td>
<td valign="top" align="center">50.83</td>
<td valign="top" align="center">20.26</td>
<td valign="top" align="center">27.65</td>
</tr>
<tr>
<td valign="top" align="left">CRS (<xref ref-type="bibr" rid="B37">Liu et al., 2016</xref>)</td>
<td valign="top" align="center">91.54</td>
<td valign="top" align="center">96.72</td>
<td valign="top" align="center">34.17</td>
<td valign="top" align="center">36.33</td>
<td valign="top" align="center">36.83</td>
</tr>
<tr>
<td valign="top" align="left">SPAR (<xref ref-type="bibr" rid="B37">Liu et al., 2016</xref>)</td>
<td valign="top" align="center">92.09</td>
<td valign="top" align="center">97.40</td>
<td valign="top" align="center">33.33</td>
<td valign="top" align="center">38.36</td>
<td valign="top" align="center">41.13</td>
</tr>
<tr>
<td valign="top" align="left">Random forest</td>
<td valign="top" align="center">94.33</td>
<td valign="top" align="center">100.00</td>
<td valign="top" align="center">29.14</td>
<td valign="top" align="center">52.39</td>
<td valign="top" align="center">45.13</td></tr>
<tr>
<td valign="top" align="left"><bold>Proposed method</bold></td>
<td valign="top" align="center"><bold>95.43</bold></td>
<td valign="top" align="center"><bold>99.09</bold></td>
<td valign="top" align="center"><bold>54.06</bold></td>
<td valign="top" align="center"><bold>65.26</bold></td>
<td valign="top" align="center"><bold>65.81</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Performance of GCForest and other methods on the <italic>yeast</italic> dataset.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"></th>
<th valign="top" align="center">Accu</th>
<th valign="top" align="center">Spec</th>
<th valign="top" align="center">Recall</th>
<th valign="top" align="center">MCC</th>
<th valign="top" align="center">F1 Score</th>
</tr>
<tr>
<th valign="top" align="left">Model</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
<th valign="top" align="center">(%)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SLIPPER (<xref ref-type="bibr" rid="B6">Chang and Lin, 2011</xref>)</td>
<td valign="top" align="center">71.90</td>
<td valign="top" align="center">72.18</td>
<td valign="top" align="center">69.72</td>
<td valign="top" align="center">28.42</td>
<td valign="top" align="center">36.16</td>
</tr>
<tr>
<td valign="top" align="left">DXECPPI (<xref ref-type="bibr" rid="B17">Du et al., 2014</xref>)</td>
<td valign="top" align="center">87.46</td>
<td valign="top" align="center">94.93</td>
<td valign="top" align="center">29.44</td>
<td valign="top" align="center">28.25</td>
<td valign="top" align="center">34.89</td>
</tr>
<tr>
<td valign="top" align="left">PPIevo (<xref ref-type="bibr" rid="B62">Zahiri et al., 2013</xref>)</td>
<td valign="top" align="center">66.28</td>
<td valign="top" align="center">87.46</td>
<td valign="top" align="center">60.14</td>
<td valign="top" align="center">18.01</td>
<td valign="top" align="center">28.92</td>
</tr>
<tr>
<td valign="top" align="left">LocFuse (<xref ref-type="bibr" rid="B61">Zahiri et al., 2014</xref>)</td>
<td valign="top" align="center">66.66</td>
<td valign="top" align="center">68.10</td>
<td valign="top" align="center">55.49</td>
<td valign="top" align="center">15.77</td>
<td valign="top" align="center">27.53</td>
</tr>
<tr>
<td valign="top" align="left">CRS (<xref ref-type="bibr" rid="B37">Liu et al., 2016</xref>)</td>
<td valign="top" align="center">72.69</td>
<td valign="top" align="center">74.37</td>
<td valign="top" align="center">59.58</td>
<td valign="top" align="center">23.68</td>
<td valign="top" align="center">33.05</td>
</tr>
<tr>
<td valign="top" align="left">SPAR (<xref ref-type="bibr" rid="B37">Liu et al., 2016</xref>)</td>
<td valign="top" align="center">76.96</td>
<td valign="top" align="center">80.02</td>
<td valign="top" align="center">53.24</td>
<td valign="top" align="center">24.84</td>
<td valign="top" align="center">34.54</td>
</tr>
<tr>
<td valign="top" align="left">Random forest</td>
<td valign="top" align="center">92.77</td>
<td valign="top" align="center">100.00</td>
<td valign="top" align="center">44.10</td>
<td valign="top" align="center">63.81</td>
<td valign="top" align="center">61.21</td></tr>
<tr>
<td valign="top" align="left"><bold>Proposed method</bold></td>
<td valign="top" align="center"><bold>93.65</bold></td>
<td valign="top" align="center"><bold>99.28</bold></td>
<td valign="top" align="center"><bold>47.01</bold></td>
<td valign="top" align="center"><bold>61.87</bold></td>
<td valign="top" align="center"><bold>61.46</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec><title>Receiver Operating Characteristic (ROC) Curve</title>
<p>The ROC curve, also called the sensitivity curve, is widely used in many fields, such as medicine, bioinformatics, natural hazard forecasting, and model performance assessment. It is a comprehensive index that reveals the relationship between sensitivity and specificity as the decision threshold varies. The curve is plotted with the false positive rate (FPR, equal to 1-specificity) as the abscissa and the true positive rate (TPR, also called sensitivity) as the ordinate. We also used the ROC curve to analyze the performance of the prediction model.</p>
<p><xref ref-type="fig" rid="F6">Figure 6</xref> shows the ROC curve of the proposed model on the <italic>human</italic> SIPs dataset, with an AUC of 0.9586. The ROC curve of the proposed model on the <italic>yeast</italic> SIPs dataset is shown in <xref ref-type="fig" rid="F7">Figure 7</xref>, with an AUC of 0.9203. These AUC values indicate that the proposed model is effective for SIPs detection.</p>
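<p>How an ROC curve and its AUC are obtained from predicted scores can be sketched as follows, with FPR = FP / (FP + TN), i.e., 1-specificity, on the horizontal axis and TPR = TP / (TP + FN), i.e., sensitivity, on the vertical axis. The labels and scores are made up for illustration, and the function names are hypothetical; this is not the plotting code used for the figures.</p>
<preformat>
```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample sharing this score (ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]                 # 1 = positive, 0 = negative (made up)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # made-up classifier scores
curve = roc_points(labels, scores)
print(auc(curve))
```
</preformat>
<p>The AUC equals the fraction of positive-negative pairs in which the positive sample receives the higher score (8 of 9 pairs in this example), so a value close to 1 means that positives are ranked above negatives almost everywhere.</p>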
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>ROC curve of GCForest based on the results of <italic>human</italic> SIPs dataset.</p></caption>
<graphic xlink:href="fgene-10-00090-g006.tif"/>
</fig>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>ROC curve of GCForest based on the results of <italic>yeast</italic> SIPs dataset.</p></caption>
<graphic xlink:href="fgene-10-00090-g007.tif"/>
</fig>
</sec>
</sec>
<sec><title>Conclusion</title>
<p>In this study, we developed an improved deep learning-based model to predict whether a given protein is likely to self-interact. Specifically, we first converted the PSSM derived from each protein sequence into a 400-dimensional feature vector by employing the WT feature extraction method; then, to decrease the influence of noise and remove redundant information, we reduced the dimension of the feature vector to 300 by using PCA; finally, we performed classification on the <italic>human</italic> and <italic>yeast</italic> datasets by applying the GCForest model. The proposed model achieved an accuracy of 95.43 and 93.65% on the <italic>human</italic> and <italic>yeast</italic> gold standard datasets, respectively, which shows that our model is suitable for detecting SIPs and performs well. We also compared it with an SVM-based method and other popular existing methods, and the empirical results show that the proposed model is superior to the SVM-based method and to most previous methods in accuracy. It is anticipated that our proposed model can serve as a useful tool in SIPs prediction research.</p>
</sec>
<sec><title>Data Availability</title>
<p>The datasets for this manuscript are not publicly available because the data is too big to share. Requests to access the datasets should be directed to <email>chenzhanheng17@mails.ucas.ac.cn</email>.</p>
</sec>
<sec><title>Author Contributions</title>
<p>Z-HC and L-PL conceived the algorithm, carried out the analyses and experiments, prepared the data sets, and wrote the manuscript. J-RZ, ZH, and LW designed, performed and analyzed experiments, and wrote the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec><title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This work was supported in part by the National Science Foundation of China, under Grant 61373086.</p>
</fn>
</fn-group>
<ack>
<p>The authors would like to thank the guest editors and reviewers for their constructive advice.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Agbinya</surname> <given-names>J. I.</given-names></name></person-group> (<year>1996</year>). <article-title>&#x201C;Discrete wavelet transform techniques in speech processing,&#x201D; in</article-title> <source><italic>Proceedings of the TENCON&#x2019;96</italic> IEEE TENCON. Digital Signal Processing Applications</source>, (<publisher-loc>Piscataway, NJ</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>514</fpage>&#x2013;<lpage>519</lpage>. <pub-id pub-id-type="doi">10.1109/TENCON.1996.608394</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname> <given-names>S. F.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>1998</year>). <article-title>Iterated profile searches with PSI-BLAST&#x2014;a tool for discovery in protein databases.</article-title> <source><italic>Trends Biochem. Sci.</italic></source> <volume>23</volume> <fpage>444</fpage>&#x2013;<lpage>447</lpage>. <pub-id pub-id-type="doi">10.1016/S0968-0004(98)01298-5</pub-id> <pub-id pub-id-type="pmid">9852764</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berman</surname> <given-names>H. M.</given-names></name> <name><surname>Westbrook</surname> <given-names>J.</given-names></name> <name><surname>Feng</surname> <given-names>Z.</given-names></name> <name><surname>Gilliland</surname> <given-names>G.</given-names></name> <name><surname>Bhat</surname> <given-names>T. N.</given-names></name> <name><surname>Weissig</surname> <given-names>H.</given-names></name><etal/></person-group> (<year>2000</year>). <article-title>The protein data bank.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>28</volume> <fpage>235</fpage>&#x2013;<lpage>242</lpage>. <pub-id pub-id-type="doi">10.1093/nar/28.1.235</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Random forests.</article-title> <source><italic>Mach. Learn.</italic></source> <volume>45</volume> <fpage>5</fpage>&#x2013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breuer</surname> <given-names>K.</given-names></name> <name><surname>Foroushani</surname> <given-names>A. K.</given-names></name> <name><surname>Laird</surname> <given-names>M. R.</given-names></name> <name><surname>Chen</surname> <given-names>C.</given-names></name> <name><surname>Sribnaia</surname> <given-names>A.</given-names></name> <name><surname>Lo</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>InnateDB: systems biology of innate immunity and beyond&#x2014;recent updates and continuing curation.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>41</volume> <fpage>D1228</fpage>&#x2013;<lpage>D1233</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks1147</pub-id> <pub-id pub-id-type="pmid">23180781</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>C.-C.</given-names></name> <name><surname>Lin</surname> <given-names>C.-J.</given-names></name></person-group> (<year>2011</year>). <article-title>LIBSVM: a library for support vector machines.</article-title> <source><italic>ACM Trans. Intell. Syst. Technol.</italic></source> <volume>2</volume> <fpage>1</fpage>&#x2013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1145/1961189.1961199</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chatr-Aryamontri</surname> <given-names>A.</given-names></name> <name><surname>Oughtred</surname> <given-names>R.</given-names></name> <name><surname>Boucher</surname> <given-names>L.</given-names></name> <name><surname>Rust</surname> <given-names>J.</given-names></name> <name><surname>Chang</surname> <given-names>C.</given-names></name> <name><surname>Kolas</surname> <given-names>N. K.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>The BioGRID interaction database: 2017 update.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>45</volume> <fpage>D369</fpage>&#x2013;<lpage>D379</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkw1102</pub-id> <pub-id pub-id-type="pmid">27980099</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Huang</surname> <given-names>L.</given-names></name></person-group> (<year>2017</year>). <article-title>LRSSLMDA: Laplacian regularized sparse subspace learning for miRNA-disease association prediction.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>13</volume>:<issue>e1005912</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005912</pub-id> <pub-id pub-id-type="pmid">29253885</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Qu</surname> <given-names>J.</given-names></name> <name><surname>Guan</surname> <given-names>N.-N.</given-names></name> <name><surname>Li</surname> <given-names>J.-Q.</given-names></name> <name><surname>Berger</surname> <given-names>B.</given-names></name></person-group> (<year>2018a</year>). <article-title>Predicting miRNA-disease association based on inductive matrix completion.</article-title> <source><italic>Bioinformatics</italic></source> <volume>34</volume> <fpage>4256</fpage>&#x2013;<lpage>4265</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty503</pub-id> <pub-id pub-id-type="pmid">29939227</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Xie</surname> <given-names>D.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Zhao</surname> <given-names>Q.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name></person-group> (<year>2018b</year>). <article-title>BNPMDA: bipartite network projection for MiRNA&#x2013;disease association prediction.</article-title> <source><italic>Bioinformatics</italic></source> <volume>34</volume> <fpage>3178</fpage>&#x2013;<lpage>3186</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty333</pub-id> <pub-id pub-id-type="pmid">29701758</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Yin</surname> <given-names>J.</given-names></name> <name><surname>Qu</surname> <given-names>J.</given-names></name> <name><surname>Huang</surname> <given-names>L.</given-names></name></person-group> (<year>2018c</year>). <article-title>MDHGI: matrix decomposition and heterogeneous graph inference for miRNA-disease association prediction.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>14</volume>:<issue>e1006418</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1006418</pub-id> <pub-id pub-id-type="pmid">30142158</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Xie</surname> <given-names>D.</given-names></name> <name><surname>Zhao</surname> <given-names>Q.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name></person-group> (<year>2017</year>). <article-title>MicroRNAs and complex diseases: from experimental results to computational models.</article-title> <source><italic>Brief. Bioinform.</italic></source> <pub-id pub-id-type="doi">10.1093/bib/bbx130</pub-id> [Epub ahead of print]. <pub-id pub-id-type="pmid">29045685</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Yan</surname> <given-names>C. C.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name></person-group> (<year>2016</year>). <article-title>Long non-coding RNAs and complex diseases: from experimental results to computational models.</article-title> <source><italic>Brief. Bioinform.</italic></source> <volume>18</volume> <fpage>558</fpage>&#x2013;<lpage>576</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbw060</pub-id> <pub-id pub-id-type="pmid">27345524</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chou</surname> <given-names>K.-C.</given-names></name> <name><surname>Cai</surname> <given-names>Y.-D.</given-names></name></person-group> (<year>2006</year>). <article-title>Predicting protein-protein interactions from sequences in a hybridization space.</article-title> <source><italic>J. Proteome Res.</italic></source> <volume>5</volume> <fpage>316</fpage>&#x2013;<lpage>322</lpage>. <pub-id pub-id-type="doi">10.1021/pr050331g</pub-id> <pub-id pub-id-type="pmid">16457597</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><collab>UniProt Consortium</collab></person-group> (<year>2014</year>). <article-title>UniProt: a hub for protein information.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>43</volume> <fpage>D204</fpage>&#x2013;<lpage>D212</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku989</pub-id> <pub-id pub-id-type="pmid">25348405</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Daubechies</surname> <given-names>I.</given-names></name></person-group> (<year>1990</year>). <article-title>The wavelet transform, time-frequency localization and signal analysis.</article-title> <source><italic>IEEE Trans. Inf. Theory</italic></source> <volume>36</volume> <fpage>961</fpage>&#x2013;<lpage>1005</lpage>. <pub-id pub-id-type="doi">10.1109/18.57199</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>X.</given-names></name> <name><surname>Cheng</surname> <given-names>J.</given-names></name> <name><surname>Zheng</surname> <given-names>T.</given-names></name> <name><surname>Duan</surname> <given-names>Z.</given-names></name> <name><surname>Qian</surname> <given-names>F.</given-names></name></person-group> (<year>2014</year>). <article-title>A novel feature extraction scheme with ensemble coding for protein&#x2013;protein interaction prediction.</article-title> <source><italic>Int. J. Mol. Sci.</italic></source> <volume>15</volume> <fpage>12731</fpage>&#x2013;<lpage>12749</lpage>. <pub-id pub-id-type="doi">10.3390/ijms150712731</pub-id> <pub-id pub-id-type="pmid">25046746</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>Z.-G.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Xia</surname> <given-names>S.-X.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name> <name><surname>Yan</surname> <given-names>X.</given-names></name> <name><surname>Zhou</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>Ens-PPI: a novel ensemble classifier for predicting the interactions of proteins using autocovariance transformation from PSSM.</article-title> <source><italic>Biomed Res. Int.</italic></source> <volume>2016</volume> <fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1155/2016/4563524</pub-id> <pub-id pub-id-type="pmid">27437399</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodfellow</surname> <given-names>I.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Courville</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <source><italic>Deep Learning.</italic></source> <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gribskov</surname> <given-names>M.</given-names></name> <name><surname>McLachlan</surname> <given-names>A. D.</given-names></name> <name><surname>Eisenberg</surname> <given-names>D.</given-names></name></person-group> (<year>1987</year>). <article-title>Profile analysis: detection of distantly related proteins.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>84</volume> <fpage>4355</fpage>&#x2013;<lpage>4358</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.84.13.4355</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gui</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>T.</given-names></name> <name><surname>Tao</surname> <given-names>D.</given-names></name> <name><surname>Sun</surname> <given-names>Z.</given-names></name> <name><surname>Tan</surname> <given-names>T.</given-names></name></person-group> (<year>2016</year>). <article-title>Representative vector machines: a unified framework for classical classifiers.</article-title> <source><italic>IEEE Trans. Cybern.</italic></source> <volume>46</volume> <fpage>1877</fpage>&#x2013;<lpage>1888</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2015.2457234</pub-id> <pub-id pub-id-type="pmid">26285229</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gui</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Zhu</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x201C;Locality preserving discriminant projections,&#x201D; in</article-title> <source><italic>Proceedings of the International Conference on Intelligent Computing</italic></source>, (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>566</fpage>&#x2013;<lpage>572</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-04020-7_60</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hashimoto</surname> <given-names>K.</given-names></name> <name><surname>Nishi</surname> <given-names>H.</given-names></name> <name><surname>Bryant</surname> <given-names>S.</given-names></name> <name><surname>Panchenko</surname> <given-names>A. R.</given-names></name></person-group> (<year>2011</year>). <article-title>Caught in self-interaction: evolutionary and functional mechanisms of protein homooligomerization.</article-title> <source><italic>Phys. Biol.</italic></source> <volume>8</volume>:<issue>035007</issue>. <pub-id pub-id-type="doi">10.1088/1478-3975/8/3/035007</pub-id> <pub-id pub-id-type="pmid">21572178</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname> <given-names>G.</given-names></name> <name><surname>Deng</surname> <given-names>L.</given-names></name> <name><surname>Yu</surname> <given-names>D.</given-names></name> <name><surname>Dahl</surname> <given-names>G. E.</given-names></name> <name><surname>Mohamed</surname> <given-names>A. R.</given-names></name> <name><surname>Jaitly</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups.</article-title> <source><italic>IEEE Signal Process. Mag.</italic></source> <volume>29</volume> <fpage>82</fpage>&#x2013;<lpage>97</lpage>. <pub-id pub-id-type="doi">10.1109/MSP.2012.2205597</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>Q.</given-names></name> <name><surname>You</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Zhou</surname> <given-names>Y.</given-names></name></person-group> (<year>2015</year>). <article-title>Prediction of protein&#x2013;protein interactions with clustered amino acids and weighted sparse representation.</article-title> <source><italic>Int. J. Mol. Sci.</italic></source> <volume>16</volume> <fpage>10855</fpage>&#x2013;<lpage>10869</lpage>. <pub-id pub-id-type="doi">10.3390/ijms160510855</pub-id> <pub-id pub-id-type="pmid">25984606</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>Y.-A.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Chan</surname> <given-names>K.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name></person-group> (<year>2016a</year>). <article-title>Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>17</volume>:<issue>184</issue>. <pub-id pub-id-type="doi">10.1186/s12859-016-1035-4</pub-id> <pub-id pub-id-type="pmid">27112932</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>Y.-A.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Yan</surname> <given-names>G.-Y.</given-names></name></person-group> (<year>2016b</year>). <article-title>Improved protein-protein interactions prediction via weighted sparse representation model combining continuous wavelet descriptor and PseAA composition.</article-title> <source><italic>BMC Syst. Biol.</italic></source> <volume>10</volume>:<issue>120</issue>. <pub-id pub-id-type="doi">10.1186/s12918-016-0360-6</pub-id> <pub-id pub-id-type="pmid">28155718</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ispolatov</surname> <given-names>I.</given-names></name> <name><surname>Yuryev</surname> <given-names>A.</given-names></name> <name><surname>Mazo</surname> <given-names>I.</given-names></name> <name><surname>Maslov</surname> <given-names>S.</given-names></name></person-group> (<year>2005</year>). <article-title>Binding properties and evolution of homodimers in protein&#x2013;protein interaction networks.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>33</volume> <fpage>3629</fpage>&#x2013;<lpage>3635</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gki678</pub-id> <pub-id pub-id-type="pmid">15983135</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jia</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Xiao</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>B.</given-names></name> <name><surname>Chou</surname> <given-names>K.-C.</given-names></name></person-group> (<year>2016</year>). <article-title>Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition.</article-title> <source><italic>J. Biomol. Struct. Dyn.</italic></source> <volume>34</volume> <fpage>1946</fpage>&#x2013;<lpage>1961</lpage>. <pub-id pub-id-type="doi">10.1080/07391102.2015.1095116</pub-id> <pub-id pub-id-type="pmid">26375780</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2012</year>). <article-title>ImageNet classification with deep convolutional neural networks.</article-title> <source><italic>Adv. Neural Inform. Process. Syst.</italic></source> <volume>25</volume> <fpage>1097</fpage>&#x2013;<lpage>1105</lpage>.</citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Launay</surname> <given-names>G.</given-names></name> <name><surname>Salza</surname> <given-names>R.</given-names></name> <name><surname>Multedo</surname> <given-names>D.</given-names></name> <name><surname>Thierry-Mieg</surname> <given-names>N.</given-names></name> <name><surname>Ricard-Blum</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>MatrixDB, the extracellular matrix interaction database: updated content, a new navigator and expanded functionalities.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>43</volume> <fpage>D321</fpage>&#x2013;<lpage>D327</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku1091</pub-id> <pub-id pub-id-type="pmid">25378329</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewis</surname> <given-names>A. S.</given-names></name> <name><surname>Knowles</surname> <given-names>G.</given-names></name></person-group> (<year>1992</year>). <article-title>Image compression using the 2-D wavelet transform.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>1</volume> <fpage>244</fpage>&#x2013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.1109/83.136601</pub-id> <pub-id pub-id-type="pmid">18296159</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.-Q.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Ming</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name></person-group> (<year>2017</year>). <article-title>PSPEL: in silico prediction of self-interacting proteins from amino acids sequences using ensemble learning.</article-title> <source><italic>IEEE/ACM Trans. Comput. Biol. Bioinform.</italic></source> <volume>14</volume> <fpage>1165</fpage>&#x2013;<lpage>1172</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2017.2649529</pub-id> <pub-id pub-id-type="pmid">28092572</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>L.-P.</given-names></name> <name><surname>Wang</surname> <given-names>Y.-B.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>An</surname> <given-names>J.-Y.</given-names></name></person-group> (<year>2018</year>). <article-title>PCLPred: a bioinformatics method for predicting protein&#x2013;protein interactions by combining relevance vector machine model with low-rank matrix approximation.</article-title> <source><italic>Int. J. Mol. Sci.</italic></source> <volume>19</volume>:<issue>1029</issue>. <pub-id pub-id-type="doi">10.3390/ijms19041029</pub-id> <pub-id pub-id-type="pmid">29596363</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Z.-W.</given-names></name> <name><surname>You</surname> <given-names>Z.-H.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Gui</surname> <given-names>J.</given-names></name> <name><surname>Nie</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>Highly accurate prediction of protein-protein interactions via incorporating evolutionary information and physicochemical characteristics.</article-title> <source><italic>Int. J. Mol. Sci.</italic></source> <volume>17</volume>:<issue>1396</issue>. <pub-id pub-id-type="doi">10.3390/ijms17091396</pub-id> <pub-id pub-id-type="pmid">27571061</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Licata</surname> <given-names>L.</given-names></name> <name><surname>Briganti</surname> <given-names>L.</given-names></name> <name><surname>Peluso</surname> <given-names>D.</given-names></name> <name><surname>Perfetto</surname> <given-names>L.</given-names></name> <name><surname>Iannuccelli</surname> <given-names>M.</given-names></name> <name><surname>Galeota</surname> <given-names>E.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>MINT, the molecular interaction database: 2012 update.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>40</volume> <fpage>D857</fpage>&#x2013;<lpage>D861</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkr930</pub-id> <pub-id pub-id-type="pmid">22096227</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Yang</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Song</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>SPAR: a random forest-based predictor for self-interacting proteins with fine-grained domain information.</article-title> <source><italic>Amino Acids</italic></source> <volume>48</volume> <fpage>1655</fpage>&#x2013;<lpage>1665</lpage>. <pub-id pub-id-type="doi">10.1007/s00726-016-2226-z</pub-id> <pub-id pub-id-type="pmid">27074717</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Guo</surname> <given-names>F.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Lu</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>Proteome-wide prediction of self-interacting proteins based on multiple properties.</article-title> <source><italic>Mol. Cell. Proteomics</italic></source> <volume>12</volume> <fpage>1689</fpage>&#x2013;<lpage>1700</lpage>. <pub-id pub-id-type="doi">10.1074/mcp.M112.021790</pub-id> <pub-id pub-id-type="pmid">23422585</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>C.-Y.</given-names></name> <name><surname>Min</surname> <given-names>H.</given-names></name> <name><surname>Gui</surname> <given-names>J.</given-names></name> <name><surname>Zhu</surname> <given-names>L.</given-names></name> <name><surname>Lei</surname> <given-names>Y.-K.</given-names></name></person-group> (<year>2013</year>). <article-title>Face recognition via weighted sparse representation.</article-title> <source><italic>J. Vis. Commun. Image Represent.</italic></source> <volume>24</volume> <fpage>111</fpage>&#x2013;<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2017.2681841</pub-id> <pub-id pub-id-type="pmid">28320663</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marianayagam</surname> <given-names>N. J.</given-names></name> <name><surname>Sunde</surname> <given-names>M.</given-names></name> <name><surname>Matthews</surname> <given-names>J. M.</given-names></name></person-group> (<year>2004</year>). <article-title>The power of two: protein dimerization in biology.</article-title> <source><italic>Trends Biochem. Sci.</italic></source> <volume>29</volume> <fpage>618</fpage>&#x2013;<lpage>625</lpage>. <pub-id pub-id-type="doi">10.1016/j.tibs.2004.09.006</pub-id> <pub-id pub-id-type="pmid">15501681</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mi</surname> <given-names>J.-X.</given-names></name> <name><surname>Lei</surname> <given-names>D.</given-names></name> <name><surname>Gui</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>A novel method for recognizing face with partial occlusion via sparse representation.</article-title> <source><italic>Optik</italic></source> <volume>124</volume> <fpage>6786</fpage>&#x2013;<lpage>6789</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijleo.2013.05.099</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Orchard</surname> <given-names>S.</given-names></name> <name><surname>Ammari</surname> <given-names>M.</given-names></name> <name><surname>Aranda</surname> <given-names>B.</given-names></name> <name><surname>Breuza</surname> <given-names>L.</given-names></name> <name><surname>Briganti</surname> <given-names>L.</given-names></name> <name><surname>Broackes-Carter</surname> <given-names>F.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>The MIntAct project&#x2014;IntAct as a common curation platform for 11 molecular interaction databases.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>42</volume> <fpage>D358</fpage>&#x2013;<lpage>D363</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkt1115</pub-id> <pub-id pub-id-type="pmid">24234451</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>P&#x00E9;rez-Bercoff</surname> <given-names>&#x00C5;.</given-names></name> <name><surname>Makino</surname> <given-names>T.</given-names></name> <name><surname>McLysaght</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). <article-title>Duplicability of self-interacting human genes.</article-title> <source><italic>BMC Evol. Biol.</italic></source> <volume>10</volume>:<issue>160</issue>. <pub-id pub-id-type="doi">10.1186/1471-2148-10-160</pub-id> <pub-id pub-id-type="pmid">20509897</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sahambi</surname> <given-names>J.</given-names></name> <name><surname>Tandon</surname> <given-names>S.</given-names></name> <name><surname>Bhatt</surname> <given-names>R.</given-names></name></person-group> (<year>1997</year>). <article-title>Using wavelet transforms for ECG characterization. An on-line digital signal processing system.</article-title> <source><italic>IEEE Eng. Med. Biol. Mag.</italic></source> <volume>16</volume> <fpage>77</fpage>&#x2013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1109/51.566158</pub-id> <pub-id pub-id-type="pmid">9058586</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salwinski</surname> <given-names>L.</given-names></name> <name><surname>Miller</surname> <given-names>C. S.</given-names></name> <name><surname>Smith</surname> <given-names>A. J.</given-names></name> <name><surname>Pettit</surname> <given-names>F. K.</given-names></name> <name><surname>Bowie</surname> <given-names>J. U.</given-names></name> <name><surname>Eisenberg</surname> <given-names>D.</given-names></name></person-group> (<year>2004</year>). <article-title>The database of interacting proteins: 2004 update.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>32</volume>(Suppl. 1), <fpage>D449</fpage>&#x2013;<lpage>D451</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkh086</pub-id> <pub-id pub-id-type="pmid">14681454</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Staszewski</surname> <given-names>W.</given-names></name></person-group> (<year>1998</year>). <article-title>Identification of non-linear systems using multi-scale ridges and skeletons of the wavelet transform.</article-title> <source><italic>J. Sound Vib.</italic></source> <volume>214</volume> <fpage>639</fpage>&#x2013;<lpage>658</lpage>. <pub-id pub-id-type="doi">10.1006/jsvi.1998.1616</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Xia</surname> <given-names>S. X.</given-names></name> <name><surname>Liu</surname> <given-names>F.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Yan</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier.</article-title> <source><italic>J. Theor. Biol.</italic></source> <volume>418</volume> <fpage>105</fpage>&#x2013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.1016/j.jtbi.2017.01.003</pub-id> <pub-id pub-id-type="pmid">28088356</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.-B.</given-names></name> <name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Jiang</surname> <given-names>T. H.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Zhou</surname> <given-names>X.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Predicting protein&#x2013;protein interactions from protein sequences by a stacked sparse autoencoder deep neural network.</article-title> <source><italic>Mol. Biosyst.</italic></source> <volume>13</volume> <fpage>1336</fpage>&#x2013;<lpage>1344</lpage>. <pub-id pub-id-type="doi">10.1039/c7mb00188f</pub-id> <pub-id pub-id-type="pmid">28604872</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>You</surname> <given-names>Z.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Jiang</surname> <given-names>T.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>PCVMZM: using the probabilistic classification vector machines model combined with a zernike moments descriptor to predict protein&#x2013;protein interactions from protein sequences.</article-title> <source><italic>Int. J. Mol. Sci.</italic></source> <volume>18</volume>:<issue>1029</issue>. <pub-id pub-id-type="doi">10.3390/ijms18051029</pub-id> <pub-id pub-id-type="pmid">28492483</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Chan</surname> <given-names>K. C.</given-names></name> <name><surname>Hu</surname> <given-names>P.</given-names></name></person-group> (<year>2015a</year>). <article-title>Predicting protein-protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest.</article-title> <source><italic>PLoS One</italic></source> <volume>10</volume>:<issue>e0125811</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0125811</pub-id> <pub-id pub-id-type="pmid">25946106</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>X.</given-names></name> <name><surname>He</surname> <given-names>Z.</given-names></name> <name><surname>Zhu</surname> <given-names>L.</given-names></name> <name><surname>Lei</surname> <given-names>Y. K.</given-names></name><etal/></person-group> (<year>2015b</year>). <article-title>Detecting protein-protein interactions with a novel matrix-based protein sequence representation and support vector machines.</article-title> <source><italic>Biomed Res. Int.</italic></source> <volume>2015</volume> <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1155/2015/867516</pub-id> <pub-id pub-id-type="pmid">26000305</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Huang</surname> <given-names>Z.-A.</given-names></name> <name><surname>Zhu</surname> <given-names>Z.</given-names></name> <name><surname>Yan</surname> <given-names>G.-Y.</given-names></name> <name><surname>Li</surname> <given-names>Z. W.</given-names></name> <name><surname>Wen</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2017a</year>). <article-title>PBMDA: a novel and effective path-based computational model for miRNA-disease association prediction.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>13</volume>:<issue>e1005455</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005455</pub-id> <pub-id pub-id-type="pmid">28339468</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Chan</surname> <given-names>K. C.</given-names></name></person-group> (<year>2017b</year>). <article-title>An improved sequence-based prediction protocol for protein-protein interactions using amino acids substitution matrix and rotation forest ensemble classifiers.</article-title> <source><italic>Neurocomputing</italic></source> <volume>228</volume> <fpage>277</fpage>&#x2013;<lpage>282</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2016.10.042</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Zhou</surname> <given-names>M.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name></person-group> (<year>2017c</year>). <article-title>Highly efficient framework for predicting interactions between proteins.</article-title> <source><italic>IEEE Trans. Cybernet.</italic></source> <volume>47</volume> <fpage>731</fpage>&#x2013;<lpage>743</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2016.2524994</pub-id> <pub-id pub-id-type="pmid">28113829</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z.-H.</given-names></name> <name><surname>Lei</surname> <given-names>Y.-K.</given-names></name> <name><surname>Gui</surname> <given-names>J.</given-names></name> <name><surname>Huang</surname> <given-names>D.-S.</given-names></name> <name><surname>Zhou</surname> <given-names>X.</given-names></name></person-group> (<year>2010a</year>). <article-title>Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data.</article-title> <source><italic>Bioinformatics</italic></source> <volume>26</volume> <fpage>2744</fpage>&#x2013;<lpage>2751</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq510</pub-id> <pub-id pub-id-type="pmid">20817744</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Yin</surname> <given-names>Z.</given-names></name> <name><surname>Han</surname> <given-names>K.</given-names></name> <name><surname>Huang</surname> <given-names>D. S.</given-names></name> <name><surname>Zhou</surname> <given-names>X.</given-names></name></person-group> (<year>2010b</year>). <article-title>A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>11</volume>:<issue>343</issue>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-343</pub-id> <pub-id pub-id-type="pmid">20573270</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Lei</surname> <given-names>Y.-K.</given-names></name> <name><surname>Zhu</surname> <given-names>L.</given-names></name> <name><surname>Xia</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>14</volume>(Suppl. 8):<issue>S10</issue>. <pub-id pub-id-type="doi">10.1186/1471-2105-14-S8-S10</pub-id> <pub-id pub-id-type="pmid">23815620</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Gao</surname> <given-names>X.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name> <name><surname>Ji</surname> <given-names>Z.</given-names></name></person-group> (<year>2014a</year>). <article-title>Large-scale protein-protein interactions detection by integrating big biosensing data with computational model.</article-title> <source><italic>Biomed Res. Int.</italic></source> <volume>2014</volume>:<issue>598129</issue>. <pub-id pub-id-type="doi">10.1155/2014/598129</pub-id> <pub-id pub-id-type="pmid">25215285</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Yu</surname> <given-names>J. Z.</given-names></name> <name><surname>Zhu</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Wen</surname> <given-names>Z. K.</given-names></name></person-group> (<year>2014b</year>). <article-title>A MapReduce based parallel SVM for large-scale predicting protein&#x2013;protein interactions.</article-title> <source><italic>Neurocomputing</italic></source> <volume>145</volume> <fpage>37</fpage>&#x2013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2014.05.072</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>You</surname> <given-names>Z. H.</given-names></name> <name><surname>Zhu</surname> <given-names>L.</given-names></name> <name><surname>Zheng</surname> <given-names>C. H.</given-names></name> <name><surname>Yu</surname> <given-names>H.-J.</given-names></name> <name><surname>Deng</surname> <given-names>S.-P.</given-names></name> <name><surname>Ji</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2014c</year>). <article-title>Prediction of protein-protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>15</volume>(Suppl. 15):<issue>S9</issue>. <pub-id pub-id-type="doi">10.1186/1471-2105-15-S15-S9</pub-id> <pub-id pub-id-type="pmid">25474679</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zahiri</surname> <given-names>J.</given-names></name> <name><surname>Mohammad-Noori</surname> <given-names>M.</given-names></name> <name><surname>Ebrahimpour</surname> <given-names>R.</given-names></name> <name><surname>Saadat</surname> <given-names>S.</given-names></name> <name><surname>Bozorgmehr</surname> <given-names>J. H.</given-names></name> <name><surname>Goldberg</surname> <given-names>T.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>LocFuse: human protein&#x2013;protein interaction prediction via classifier fusion using protein localization information.</article-title> <source><italic>Genomics</italic></source> <volume>104</volume> <fpage>496</fpage>&#x2013;<lpage>503</lpage>. <pub-id pub-id-type="doi">10.1016/j.ygeno.2014.10.006</pub-id> <pub-id pub-id-type="pmid">25458812</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zahiri</surname> <given-names>J.</given-names></name> <name><surname>Yaghoubi</surname> <given-names>O.</given-names></name> <name><surname>Mohammad-Noori</surname> <given-names>M.</given-names></name> <name><surname>Ebrahimpour</surname> <given-names>R.</given-names></name> <name><surname>Masoudi-Nejad</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>PPIevo: protein&#x2013;protein interaction prediction from PSSM based evolutionary information.</article-title> <source><italic>Genomics</italic></source> <volume>102</volume> <fpage>237</fpage>&#x2013;<lpage>242</lpage>. <pub-id pub-id-type="doi">10.1016/j.ygeno.2013.05.006</pub-id> <pub-id pub-id-type="pmid">23747746</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Yu</surname> <given-names>G.</given-names></name> <name><surname>Xia</surname> <given-names>D.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Protein&#x2013;protein interactions prediction based on ensemble deep neural networks.</article-title> <source><italic>Neurocomputing</italic></source> <volume>324</volume> <fpage>10</fpage>&#x2013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2018.02.097</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Zhou</surname> <given-names>Y. S.</given-names></name> <name><surname>He</surname> <given-names>F.</given-names></name> <name><surname>Song</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name></person-group> (<year>2012</year>). <article-title>Can simple codon pair usage predict protein&#x2013;protein interaction?</article-title> <source><italic>Mol. Biosyst.</italic></source> <volume>8</volume> <fpage>1396</fpage>&#x2013;<lpage>1404</lpage>. <pub-id pub-id-type="doi">10.1039/c2mb05427b</pub-id> <pub-id pub-id-type="pmid">22392100</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Z. H.</given-names></name> <name><surname>Feng</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x201C;Deep forest: towards an alternative to deep neural networks,&#x201D; in</article-title> <source><italic>Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence</italic></source>, (<publisher-loc>Melbourne</publisher-loc>: <publisher-name>IJCAI</publisher-name>), <fpage>3553</fpage>&#x2013;<lpage>3559</lpage>. <pub-id pub-id-type="doi">10.24963/ijcai.2017/497</pub-id></citation></ref>
</ref-list>
</back>
</article>