<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">792265</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2021.792265</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A New Method for Recognizing Protein Complexes Based on Protein Interaction Networks and GO Terms</article-title>
<alt-title alt-title-type="left-running-head">Wang et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">NNP</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Xiaoting</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1513734/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Nan</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhao</surname>
<given-names>Yulan</given-names>
</name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Wang</surname>
<given-names>Juan</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/560593/overview"/>
</contrib>
</contrib-group>
<aff>School of Computer Science, Inner Mongolia University, and Ecological Big Data Engineering Research Center of the Ministry of Education, <addr-line>Hohhot</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/44919/overview">Jijun Tang</ext-link>, University of South Carolina, United&#x20;States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/744052/overview">Zhanchao Li</ext-link>, Guangdong Pharmaceutical University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1308946/overview">Ruofan Xia</ext-link>, Facebook (United&#x20;States), United&#x20;States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Juan Wang, <email>wangjuan@imu.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Statistical Genetics and Methodology, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>12</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>792265</elocation-id>
<history>
<date date-type="received">
<day>10</day>
<month>10</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>11</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Wang, Zhang, Zhao and Wang.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Wang, Zhang, Zhao and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>
<bold>Motivation:</bold> A protein complex is a group of proteins that interact with one another. Protein&#x2013;protein interaction (PPI) networks are composed of multiple protein complexes. Recognizing protein complexes from PPI data is very difficult because the data are noisy.</p>
<p>
<bold>Results:</bold> We propose a new method, called Topology and Semantic Similarity Network (TSSN), which constructs a weighted PPI network from topological structure characteristics and biological characteristics. Experiments show that TSSN can filter the noise in PPI data. We also propose a new algorithm, called Neighbor Nodes of Proteins (NNP), which recognizes protein complexes by considering their topology information. Experiments show that the algorithm identifies more protein complexes and identifies them more accurately. The recognition of protein complexes is vital in research on evolution analysis.</p>
<p>
<bold>Availability and implementation:</bold> <ext-link ext-link-type="uri" xlink:href="https://github.com/bioinformatical-code/NNP">https://github.com/bioinformatical-code/NNP</ext-link>.</p>
</abstract>
<kwd-group>
<kwd>protein interaction network</kwd>
<kwd>protein complex</kwd>
<kwd>GO terms</kwd>
<kwd>NNP</kwd>
<kwd>function of proteins</kwd>
</kwd-group>
<contract-num rid="cn001">62002181 62061035</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Recognizing protein complexes from PPI networks has become one of the most important avenues of current research. Detecting protein complexes in PPI networks is important for understanding biological processes, and it is also of great significance for studying mechanisms and developing new drugs. Researchers have put forward a variety of effective methods to recognize protein complexes. The MCODE algorithm chooses the vertex with the maximum weight as the initial cluster and then recursively adds vertices that meet a threshold value to the cluster (<xref ref-type="bibr" rid="B3">Bader and Hogue, 2003</xref>). DPClus is a modified algorithm that iteratively chooses the vertices with high connectivity to the present cluster (<xref ref-type="bibr" rid="B2">Altaf-Ul-Amin et&#x20;al., 2006</xref>). Jerarca uses hierarchical clustering to partition the complexes based on the distances among proteins (<xref ref-type="bibr" rid="B1">Aldecoa and Mar&#xed;n, 2010</xref>). RNSC divides the complexes by means of a cost function (<xref ref-type="bibr" rid="B9">King et&#x20;al., 2004</xref>). MCL (<xref ref-type="bibr" rid="B5">Enright et&#x20;al., 2002</xref>) simulates network flow by constructing a similarity matrix, alternately performs expansion and inflation operations, and achieves a clustering after multiple iterations. However, the method has difficulty identifying complexes with little overlap. Subsequently, an improved method was proposed that measures the reliability of PPIs based on the annotations of protein function (<xref ref-type="bibr" rid="B4">Cho et&#x20;al., 2007</xref>). SCI-BN and ClusterM combine the topology of the PPI network and the biological information of sequences to identify complexes (<xref ref-type="bibr" rid="B22">Qi et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B25">Wang et&#x20;al., 2020</xref>).</p>
<p>Although these methods can effectively identify functional modules of proteins, they all ignore the internal structure of the modules. A protein complex is basically composed of a nucleus and all its subordinate proteins (<xref ref-type="bibr" rid="B7">Gavin et&#x20;al., 2006</xref>). A protein complex can therefore be regarded as a subgraph consisting of a nucleus and the subordinate proteins that assist the nucleus in playing a specific role. COACH (<xref ref-type="bibr" rid="B26">Wu et&#x20;al., 2009</xref>) and CORE (<xref ref-type="bibr" rid="B11">Leung et&#x20;al., 2009</xref>) were proposed based on this idea. The F-MCL algorithm combines the firefly algorithm and MCL (<xref ref-type="bibr" rid="B10">Lei et&#x20;al., 2016</xref>). ClusterONE is a clustering algorithm guided by cohesion that can identify subgraphs with dense substructure (<xref ref-type="bibr" rid="B15">Nepusz et&#x20;al., 2012</xref>). However, the cohesion formula may lead to deviation in the clustering process. EA (<xref ref-type="bibr" rid="B8">Halim et&#x20;al., 2015</xref>) uses a multi-population evolutionary algorithm to cluster the probability map. MNC is a novel clustering model based on multiple networks that combines the shared clustering structure in PPI and domain&#x2013;domain interaction (DDI) networks in order to improve the accuracy of identification (<xref ref-type="bibr" rid="B18">Ou-Yang et&#x20;al., 2017</xref>). IdenPC-CAP recognizes protein complexes from interaction networks consisting of RNA&#x2013;RNA interactions, RNA&#x2013;protein interactions, and PPIs (<xref ref-type="bibr" rid="B27">Wu et al., 2021</xref>). CSC uses both topological and biological characteristics to identify protein complexes (<xref ref-type="bibr" rid="B13">Liu et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B23">Sharma et&#x20;al., 2018</xref>). 
DPCMNE detects protein complexes <italic>via</italic> multilevel network embedding (<xref ref-type="bibr" rid="B14">Meng et al., 2021</xref>). PC2P formalizes protein complexes as biclique spanned subgraphs and converts the problem of detecting protein complexes into one of coherent partition (<xref ref-type="bibr" rid="B17">Omranian et&#x20;al., 2021</xref>). A semi-supervised model based on non-negative matrix tri-factorization is also used to detect protein complexes (<xref ref-type="bibr" rid="B12">Liu et&#x20;al., 2021</xref>). In FCAN-PCI, the semantic similarity of proteins and the topology of the PPI network are integrated into a fuzzy clustering model (<xref ref-type="bibr" rid="B19">Pan et&#x20;al., 2021</xref>). GECA proposes a model based on gene expression and core-attachment (<xref ref-type="bibr" rid="B16">Noori et&#x20;al., 2021</xref>). The idenPC-MIIP method modifies the weights of the original network by defining mutually important neighbors on the weighted network and then identifies protein complexes using a greedy algorithm (<xref ref-type="bibr" rid="B27">Wu et&#x20;al., 2021</xref>).</p>
</sec>
<sec sec-type="methods" id="s2">
<title>Methods</title>
<p>For a PPI network <italic>N</italic>, TSSN computes the edge aggregation coefficient as the topology characteristics of <italic>N</italic>, makes use of the GO annotation as the biological characteristics of <italic>N</italic>, and then constructs a weighted network. NNP identifies protein complexes based on this weighted network.</p>
<sec id="s2-1">
<title>TSSN</title>
<p>A PPI network can be seen as an undirected graph <italic>G</italic> &#x3d; (<italic>V</italic>, <italic>E</italic>), in which each protein is a node in <italic>V</italic>. Two proteins interact with each other if and only if there is an edge between the two nodes representing them. To describe the structural similarity among proteins in the PPI network, the Jaccard coefficient between two nodes <italic>u</italic> and <italic>v</italic> in <italic>G</italic> is defined as follows:<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x222a;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <italic>N</italic>(<italic>u</italic>) [or <italic>N</italic>(<italic>v</italic>)] represents the set of all neighbor nodes of protein <italic>u</italic> (or <italic>v</italic>) in the network.</p>
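<p>As an illustration of Eq. 1, the following minimal Python sketch computes the Jaccard coefficient on a toy network. The adjacency dictionary, the node names, and the convention of returning 0 for two isolated nodes are assumptions made for this example and are not taken from the NNP implementation.</p>
<preformat><![CDATA[
```python
def jaccard(adj, u, v):
    """Jaccard coefficient of the neighbor sets of u and v (Eq. 1)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    if not union:
        return 0.0  # assumed convention when both nodes have no neighbors
    return len(nu & nv) / len(union)


# Toy undirected network stored as node -> set of neighbors (hypothetical data).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

# N(a) and N(b) share one node (c) out of four distinct neighbors.
print(jaccard(adj, "a", "b"))
```
]]></preformat>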
<p>We adopted the simGIC method (<xref ref-type="bibr" rid="B24">Tian and Guo, 2017</xref>), an improvement on GIC (<xref ref-type="bibr" rid="B20">Pesquita et&#x20;al., 2007</xref>), to calculate the semantic similarity between proteins. Assuming that proteins <italic>u</italic> and <italic>v</italic> are annotated by the term sets <italic>A</italic> &#x3d; {<italic>T</italic><sub>1</sub>, <italic>T</italic><sub>2</sub>, &#x22ef;, <italic>T</italic><sub><italic>m</italic></sub>} and <italic>B</italic> &#x3d; {<italic>S</italic><sub>1</sub>, <italic>S</italic><sub>2</sub>, &#x22ef;, <italic>S</italic><sub><italic>n</italic></sub>}, respectively, the semantic similarity between <italic>u</italic> and <italic>v</italic> is defined as follows:<disp-formula id="e2">
<mml:math id="m2">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>B</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <italic>IC</italic>(<italic>A</italic>) is the set {&#x2212;log <italic>p</italic>(<italic>T</italic><sub>1</sub>), &#x2212;log <italic>p</italic>(<italic>T</italic><sub>2</sub>), &#x2026;, &#x2212;log <italic>p</italic>(<italic>T</italic><sub><italic>m</italic></sub>)}, and <italic>p</italic>(<italic>T</italic><sub><italic>i</italic></sub>) represents how often the GO term <italic>T</italic><sub><italic>i</italic></sub> (or single protein function) appears in the specified term&#x20;data.</p>
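<p>A minimal Python sketch of Eq. 2 is given below. It rests on two assumptions that the text does not state explicitly: <italic>IC</italic> of a term set is taken to be the sum of its &#x2212;log <italic>p</italic> values, so that the maximum in the denominator is a maximum of two numbers, and <italic>p</italic> is supplied as a relative frequency between 0 and 1. The term labels and frequencies are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def information_content(terms, p):
    """Assumed reading of IC: the sum of -log p(T) over the term set."""
    return sum(-math.log(p[t]) for t in terms)


def semantic_similarity(terms_u, terms_v, p):
    """se(u, v): IC of the shared terms over max(IC(A), IC(B)) (Eq. 2)."""
    a, b = set(terms_u), set(terms_v)
    denom = max(information_content(a, p), information_content(b, p))
    if denom == 0:
        return 0.0  # assumed convention when neither set carries information
    return information_content(a & b, p) / denom


# Hypothetical relative frequencies of three terms.
p = {"T1": 0.5, "T2": 0.25, "T3": 0.125}

print(semantic_similarity({"T1", "T2"}, {"T2", "T3"}, p))
```
]]></preformat>
<p>With these toy frequencies the shared term contributes 2 log 2 and the larger of the two totals is 5 log 2, so the score is 0.4 up to rounding.</p>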
<p>Here, the similarity between two proteins <italic>u</italic> and <italic>v</italic> is defined as the average between their topological similarity and semantic similarity, that is,<disp-formula id="e3">
<mml:math id="m3">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where the value of <italic>s</italic>(<italic>u</italic>, <italic>v</italic>) lies in the interval&#x20;[0, 1].</p>
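<p>The averaging described in the prose can be sketched as follows for a single pair of scores. This is only an illustration under the assumption that both inputs already lie in [0, 1]; it does not reproduce the summation over neighbor pairs that appears in Eq. 3, and the function name is hypothetical.</p>
<preformat><![CDATA[
```python
def combined_similarity(topological, semantic):
    """Average of a topological score and a semantic score, both in [0, 1]."""
    for name, value in (("topological", topological), ("semantic", semantic)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} similarity must lie in [0, 1], got {value}")
    return (topological + semantic) / 2


print(combined_similarity(0.25, 0.75))
```
]]></preformat>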
</sec>
<sec id="s2-2">
<title>NNP</title>
<p>Consider a weighted network <italic>G</italic> &#x3d; (<italic>V</italic>, <italic>E</italic>, <italic>W</italic>), where <italic>V</italic> &#x3d; {<italic>v</italic><sub>1</sub>, <italic>v</italic><sub>2</sub>, &#x22ef;, <italic>v</italic><sub><italic>m</italic></sub>}, <italic>E</italic> &#x3d; {<italic>e</italic><sub>1</sub>, <italic>e</italic><sub>2</sub>, &#x22ef;, <italic>e</italic><sub><italic>n</italic></sub>}, <italic>W</italic> &#x3d; {<italic>w</italic>(<italic>e</italic><sub>1</sub>), <italic>w</italic>(<italic>e</italic><sub>2</sub>), &#x22ef;, <italic>w</italic>(<italic>e</italic><sub><italic>n</italic></sub>)}, and <italic>w</italic>(<italic>e</italic><sub><italic>i</italic></sub>) represents the weight of the edge <italic>e</italic><sub><italic>i</italic></sub>. The distance between two nodes <italic>v</italic><sub><italic>i</italic></sub> and <italic>v</italic><sub><italic>j</italic></sub> is the length of the shortest path between them. <italic>V</italic><sub><italic>j</italic></sub> denotes the set of nodes at distance 2 from <italic>v</italic><sub><italic>j</italic></sub>, which is referred to as the&#x20;set of second-order neighbor nodes of <italic>v</italic><sub><italic>j</italic></sub>. The network <italic>G</italic><sub><italic>j</italic></sub> &#x3d; (<italic>V</italic><sub><italic>j</italic></sub>, <italic>E</italic><sub><italic>j</italic></sub>, <italic>W</italic><sub><italic>j</italic></sub>) is derived from <italic>V</italic><sub><italic>j</italic></sub>. The weighted degree of <italic>v</italic><sub><italic>j</italic></sub> in <italic>G</italic> is defined as follows:<disp-formula id="e4">
<mml:math id="m4">
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where (<italic>v</italic><sub><italic>j</italic></sub>, <italic>v</italic><sub><italic>i</italic></sub>) <inline-formula id="inf1"><mml:math id="m5"><mml:mo>&#x2208;</mml:mo></mml:math></inline-formula> <italic>E</italic> and <italic>w</italic>(<italic>v</italic><sub><italic>j</italic></sub>, <italic>v</italic><sub><italic>i</italic></sub>) indicates the weight of the edge between nodes <italic>v</italic><sub><italic>j</italic></sub> and <italic>v</italic><sub><italic>i</italic></sub>. The average weighted degree of <italic>v</italic><sub><italic>j</italic></sub> in <italic>G</italic> is computed by the following equation:<disp-formula id="e5">
<mml:math id="m6">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>W</mml:mi>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>/</mml:mo>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
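<p>Eqs 4 and 5 can be illustrated with a short Python sketch. The edge-weight dictionary keyed by unordered node pairs and the toy weights are assumptions made for this example, not the data structures of the NNP code.</p>
<preformat><![CDATA[
```python
def weighted_degree(weights, node):
    """WD: sum of the weights of all edges incident to node (Eq. 4)."""
    return sum(w for edge, w in weights.items() if node in edge)


def average_weighted_degree(weights, node, num_nodes):
    """AWD: weighted degree of node divided by |V| (Eq. 5)."""
    return weighted_degree(weights, node) / num_nodes


# Toy weighted network; each key is an unordered pair of nodes (hypothetical data).
weights = {
    frozenset(("a", "b")): 0.5,
    frozenset(("a", "c")): 0.25,
    frozenset(("b", "c")): 0.75,
}

print(weighted_degree(weights, "a"))             # 0.5 + 0.25
print(average_weighted_degree(weights, "a", 3))  # divided by |V| = 3
```
]]></preformat>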
<p>The weighted neighbor ratio is defined as follows:<disp-formula id="e6">
<mml:math id="m7">
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>W</mml:mi>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
</p>
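<p>The ratio in Eq. 6 can be sketched as below. Because the construction of <italic>G</italic><sub><italic>j</italic></sub> is not reproduced here, the function simply takes the two weighted degrees as inputs; the function name and the convention of returning 0 when both degrees are 0 are assumptions of this example.</p>
<preformat><![CDATA[
```python
def weighted_neighbor_ratio(wd_in_g, wd_in_gj):
    """WN: WD(v, G) / (WD(v, G) + WD(v, G_j)) (Eq. 6)."""
    total = wd_in_g + wd_in_gj
    if total == 0:
        return 0.0  # assumed convention for a node with no weighted edges
    return wd_in_g / total


print(weighted_neighbor_ratio(0.75, 0.25))
```
]]></preformat>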
<p>In order to assess complexes, we compute the tightness degree of a complex <italic>G&#x3d;</italic> (<italic>V, E, W</italic>) as follows:<disp-formula id="e7">
<mml:math id="m8">
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>G</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>
</p>
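<p>A minimal sketch of the tightness degree in Eq. 7 follows. The inputs (a node count and a list of edge weights) and the value returned for complexes with fewer than two nodes are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
def tightness(num_nodes, edge_weights):
    """WDt: 2 * (sum of edge weights) / (|V| * (|V| - 1)) (Eq. 7)."""
    if num_nodes < 2:
        return 0.0  # assumed convention; Eq. 7 is undefined for |V| < 2
    return 2 * sum(edge_weights) / (num_nodes * (num_nodes - 1))


# A triangle whose three edges all have weight 1 is maximally tight.
print(tightness(3, [1.0, 1.0, 1.0]))
```
]]></preformat>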
<p>For two complexes <italic>C</italic><sub>1</sub> and <italic>C</italic><sub>2</sub>, the overlap ratio (OL) between them is defined as follows:<disp-formula id="e8">
<mml:math id="m9">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#xb7;</mml:mo>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
</p>
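<p>The overlap ratio in Eq. 8 can be computed as in the following sketch, which treats each complex as a set of protein identifiers. The identifiers are hypothetical, and returning 0 for an empty complex is a convention chosen for this example.</p>
<preformat><![CDATA[
```python
def overlap_ratio(c1, c2):
    """OL: |C1 & C2|^2 / (|C1| * |C2|) (Eq. 8)."""
    c1, c2 = set(c1), set(c2)
    if not c1 or not c2:
        return 0.0  # assumed convention for an empty complex
    shared = len(c1 & c2)
    return shared * shared / (len(c1) * len(c2))


# Two shared proteins between a complex of four and a complex of three.
print(overlap_ratio({"a", "b", "c", "d"}, {"c", "d", "e"}))
```
]]></preformat>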
<p>NNP identifies complexes in four main steps. First, NNP uses the TSSN method to compute the similarity among proteins and then builds a weighted PPI network and the neighbor networks. Second, it calculates a conditional threshold in order to reduce the noise; the network is then transformed into a matrix whose nodes are arranged in descending order of average weighted degree (AWD) to form a seed list. Third, it iteratively selects nodes from the seed list as initial complexes to cluster, and then removes or retains each node according to the weighted neighbor ratio (WN) until all nodes in the seed list have been processed. Finally, it calculates the OL among protein complexes and decides whether complexes are retained or discarded according to the network tightness (WDt), which yields the final complex set. <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> shows the workflow of NNP. The pseudocode is given in the Algorithm.</p>
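<p>The ordering part of the second step can be sketched as follows. This is not the NNP implementation: the thresholding and matrix construction are omitted, and the edge dictionary and node names are hypothetical. The sketch only shows how a seed list could be formed by sorting nodes in descending order of AWD.</p>
<preformat><![CDATA[
```python
def seed_list(weights, nodes):
    """Return the nodes sorted by average weighted degree, highest first."""
    awd = {v: 0.0 for v in nodes}
    for (u, v), w in weights.items():
        # Each edge contributes w / |V| to the AWD of both endpoints.
        awd[u] += w / len(nodes)
        awd[v] += w / len(nodes)
    return sorted(nodes, key=lambda v: awd[v], reverse=True)


# Toy weighted network keyed by (node, node) pairs (hypothetical data).
nodes = ["a", "b", "c"]
weights = {("a", "b"): 0.5, ("a", "c"): 0.25, ("b", "c"): 0.75}

print(seed_list(weights, nodes))
```
]]></preformat>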
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Workflow of the NNP.</p>
</caption>
<graphic xlink:href="fgene-12-792265-g001.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="results|discussion" id="s3">
<title>Results and Discussion</title>
<p>To assess the TSSN method, we compare the protein complexes identified by three classical methods (ClusterONE, MCODE, and MCL) on the PPI networks weighted by TSSN with those identified on the unweighted PPI networks. We also compare the protein complexes predicted by the CFinder, ClusterONE, MCODE, MCL, EA, and NNP methods.</p>
<sec id="s3-1">
<title>Datasets</title>
<p>In all experiments, we use the yeast PPI data downloaded from the DIP database (<ext-link ext-link-type="uri" xlink:href="https://dip.doe-mbi.ucla.edu/dip/Download.cgi?SM=7&amp;TX=4932">https://dip.doe-mbi.ucla.edu/dip/Download.cgi?SM&#x3d;7&#x26;TX&#x3d;4932</ext-link>), version 20170205. To reduce the noise in the data, we delete repeated interactions and self-interactions (edges from a node to itself). The resulting PPI network contains 5,115 nodes and 22,552 edges. The GO annotations and ontology data of yeast are downloaded from the Gene Ontology website (<ext-link ext-link-type="uri" xlink:href="http://www.geneontology.org/">http://www.geneontology.org/</ext-link>).</p>
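<p>The cleaning step described above can be sketched as follows. The sketch assumes the interactions have already been read into a list of protein pairs; parsing of the DIP download format is not shown, and the protein labels are hypothetical.</p>
<preformat><![CDATA[
```python
def clean_interactions(pairs):
    """Drop self-interactions and repeated interactions from (a, b) pairs."""
    seen = set()
    cleaned = []
    for a, b in pairs:
        if a == b:
            continue  # self-interaction
        key = frozenset((a, b))  # undirected: (a, b) and (b, a) are the same edge
        if key in seen:
            continue  # repeated interaction
        seen.add(key)
        cleaned.append((a, b))
    return cleaned


raw = [("P1", "P2"), ("P2", "P1"), ("P3", "P3"), ("P1", "P3")]
print(clean_interactions(raw))
```
]]></preformat>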
</sec>
<sec id="s3-2">
<title>Reference Sets</title>
<p>Two standard sets, CYC2008 (<xref ref-type="bibr" rid="B21">Pu et&#x20;al., 2009</xref>) and NewMIPS (<xref ref-type="bibr" rid="B6">Friedel et&#x20;al., 2008</xref>), are used in the experiments. CYC2008 is downloaded from <ext-link ext-link-type="uri" xlink:href="http://wodaklab.org/cyc2008/downloads">http://wodaklab.org/cyc2008/downloads</ext-link>. These data were predicted by biological methods and include 408 complexes and 1,628 proteins. NewMIPS is a set of protein complexes that includes 428 complexes and 1,171 proteins.</p>
</sec>
<sec id="s3-3">
<title>Metrics</title>
<p>The effectiveness of a prediction algorithm is measured by four indexes: recall, precision, F1, and overlap ratio. The recall <italic>R</italic> is the number of identified complexes that match a complex in the standard set divided by the number of complexes in the standard set. The precision <italic>P</italic> is the number of identified complexes that match a complex in the standard set divided by the number of all complexes identified by the algorithm. F1 is the harmonic mean of <italic>P</italic> and <italic>R</italic>, that is,<disp-formula id="e9">
<mml:math id="m10">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
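<p>Eq. 9 can be checked with a few lines of Python. The function name and the convention of returning 0 when both inputs are 0 are assumptions of this sketch; the two calls use the recall and precision values reported for the unweighted ClusterONE and MCODE runs in Table&#x20;1.</p>
<preformat><![CDATA[
```python
def f1_score(recall, precision):
    """F1: harmonic mean of recall and precision (Eq. 9)."""
    if recall + precision == 0:
        return 0.0  # assumed convention when nothing is matched
    return 2 * recall * precision / (recall + precision)


print(round(f1_score(0.32, 0.415), 3))
print(round(f1_score(0.21, 0.49), 3))
```
]]></preformat>
<p>Rounded to three decimals, these calls give 0.361 and 0.294, which agree with the F1 column of Table&#x20;1 for those two rows.</p>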
<p>To judge the biological significance of complexes, a functional enrichment analysis based on the gene annotation information in the GO database is used to compute a <italic>p</italic>-value as follows:<disp-formula id="e10">
<mml:math id="m11">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>e</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi>i</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mstyle>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>where <italic>m</italic> is the number of identified complexes that are the same as those in the standard data set, <italic>F</italic> denotes the complexes in the standard data set, <italic>V</italic> is the number of proteins contained in the PPI network, and <italic>C</italic> is the number of identified complexes. A complex is regarded as biologically significant if its <italic>p</italic>-value is less than 0.01.</p>
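<p>The hypergeometric tail in Eq. 10 can be evaluated as in the sketch below. The arguments <italic>v</italic>, <italic>f</italic>, <italic>c</italic>, and <italic>m</italic> stand for |<italic>V</italic>|, |<italic>F</italic>|, |<italic>C</italic>|, and <italic>m</italic> in the equation; the function name and the toy numbers are hypothetical. Exact integer arithmetic is used before converting to a float so that very small <italic>p</italic>-values are not lost to rounding in the subtraction.</p>
<preformat><![CDATA[
```python
from fractions import Fraction
from math import comb


def enrichment_p_value(v, f, c, m):
    """p = 1 - sum_{i=0}^{m-1} C(f, i) * C(v - f, c - i) / C(v, c) (Eq. 10)."""
    total = comb(v, c)
    # Terms with i > c would need a negative lower index, so the sum stops at c.
    below = sum(comb(f, i) * comb(v - f, c - i) for i in range(min(m, c + 1)))
    return float(1 - Fraction(below, total))


# Toy numbers: |V| = 10, |F| = 4, |C| = 3, m = 2.
print(enrichment_p_value(10, 4, 3, 2))
```
]]></preformat>
<p>For the toy numbers the sum covers 80 of the 120 possible selections, so the result is one third; a value computed this way would then be compared with the 0.01 threshold.</p>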
</sec>
</sec>
<sec sec-type="results" id="s4">
<title>Results</title>
<p>In all reported experimental results, we use CYC2008 as the standard set and set the OL threshold to 0.2, where OL is the overlap ratio between two complexes. That is, an identified complex is considered correct when its OL with a standard complex reaches&#x20;0.2.</p>
<table-wrap id="t02">
<label>Algorithm</label>
<caption>
<p>Detecting protein complexes.</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td>
<inline-graphic xlink:href="fgene-12-792265-fx1.tif"/>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<xref ref-type="table" rid="T1">Table&#x20;1</xref> shows the results. For each method in <xref ref-type="table" rid="T1">Table&#x20;1</xref>, the suffix &#x201c;u&#x201d; denotes the method applied to the unweighted network, and the suffix &#x201c;T&#x201d; denotes the method applied to the network weighted by the TSSN. For all three methods, the recall, precision, and F1 values on the TSSN-weighted networks are higher than those on the unweighted networks. So the TSSN method is effective for computing the weights of networks.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Results of methods applied to the unweighted networks and to the networks weighted by the TSSN.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Metrics<break/>Method</th>
<th align="center">
<italic>R</italic>
</th>
<th align="center">
<italic>P</italic>
</th>
<th align="center">F1</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">ClusterONE-u</td>
<td align="char" char=".">0.32</td>
<td align="char" char=".">0.415</td>
<td align="char" char=".">0.361</td>
</tr>
<tr>
<td align="left">ClusterONE-T</td>
<td align="char" char=".">
<bold>0.34</bold>
</td>
<td align="char" char=".">
<bold>0.43</bold>
</td>
<td align="char" char=".">
<bold>0.38</bold>
</td>
</tr>
<tr>
<td align="left">MCODE-u</td>
<td align="char" char=".">0.21</td>
<td align="char" char=".">0.49</td>
<td align="char" char=".">0.294</td>
</tr>
<tr>
<td align="left">MCODE-T</td>
<td align="char" char=".">
<bold>0.23</bold>
</td>
<td align="char" char=".">
<bold>0.51</bold>
</td>
<td align="char" char=".">
<bold>0.317</bold>
</td>
</tr>
<tr>
<td align="left">MCL-u</td>
<td align="char" char=".">0.58</td>
<td align="char" char=".">0.21</td>
<td align="char" char=".">0.308</td>
</tr>
<tr>
<td align="left">MCL-T</td>
<td align="char" char=".">
<bold>0.605</bold>
</td>
<td align="char" char=".">
<bold>0.228</bold>
</td>
<td align="char" char=".">
<bold>0.331</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold values represent the experimental results of ClusterONE, MCODE, and MCL on the networks weighted by the TSSN method.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>The precision of the NNP algorithm depends on the threshold <italic>t</italic> of the weighted neighbor ratio (WNT). <xref ref-type="table" rid="T2">Table&#x20;2</xref> shows that F1 gradually increases as <italic>t</italic> increases from 0 to 0.2 and, on the whole, decreases as <italic>t</italic> increases beyond 0.2. So F1 reaches its maximum of 0.42 when <italic>t</italic> lies in the range (0.2, 0.25). <xref ref-type="table" rid="T3">Table&#x20;3</xref> shows the precision of NNP for thresholds in this range. When <italic>t</italic> is 0.22, the precision is 0.5, which is slightly higher than at the other five values. Therefore, it is reasonable for the NNP algorithm to set the WNT threshold to&#x20;0.22.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>F1 values of NNP on different thresholds of WNT.</p>
</caption>
<table>
<tbody valign="top">
<tr>
<td align="left">
<italic>t</italic>
</td>
<td align="center">0</td>
<td align="center">0.1</td>
<td align="center">0.2</td>
<td align="center">0.3</td>
<td align="center">0.4</td>
<td align="center">0.5</td>
<td align="center">0.6</td>
<td align="center">0.7</td>
<td align="center">0.8</td>
<td align="center">0.9</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">F1</td>
<td align="char" char=".">0.4</td>
<td align="char" char=".">0.41</td>
<td align="char" char=".">
<bold>0.42</bold>
</td>
<td align="char" char=".">0.41</td>
<td align="char" char=".">0.4</td>
<td align="char" char=".">0.39</td>
<td align="char" char=".">0.395</td>
<td align="char" char=".">0.37</td>
<td align="char" char=".">0.3</td>
<td align="char" char=".">0.2</td>
<td align="char" char=".">0.13</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The bold value shows that F1 reaches its maximum of 0.42 when the threshold t is 0.2.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Precision values of NNP on different thresholds of WNT.</p>
</caption>
<table>
<tbody valign="top">
<tr>
<td align="left">t</td>
<td align="center">0.2</td>
<td align="center">0.21</td>
<td align="center">0.22</td>
<td align="center">0.23</td>
<td align="center">0.24</td>
<td align="center">0.25</td>
</tr>
<tr>
<td align="left">Precision</td>
<td align="char" char=".">0.491</td>
<td align="char" char=".">0.492</td>
<td align="char" char=".">
<bold>0.5</bold>
</td>
<td align="char" char=".">0.495</td>
<td align="char" char=".">0.493</td>
<td align="char" char=".">0.493</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The bold value shows that the precision reaches its maximum of 0.5 when the threshold t is 0.22.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>
<xref ref-type="table" rid="T4">Table&#x20;4</xref> lists the cluster information identified by the seven algorithms alongside that of CYC2008. CYC2008 is selected as the benchmark, and its average complex size is 4.71; the closer the average size of the clusters identified by a method is to 4.71, the more accurate the method is considered to be. Among the seven algorithms, the average size of the clusters identified by NNP is 4.54, which is the closest to the average size in the standard data. So the recognition results of NNP have high theoretical reliability.</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Cluster information identified by each algorithm.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">No.</th>
<th align="center">Algorithm</th>
<th align="center">Number</th>
<th align="center">Average</th>
<th align="center">Coverage</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1</td>
<td align="left">CYC2008</td>
<td align="char" char=".">408</td>
<td align="char" char=".">4.71</td>
<td align="char" char=".">1,628</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">CFinder</td>
<td align="char" char=".">178</td>
<td align="char" char=".">11.31</td>
<td align="char" char=".">2,147</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">ClusterONE</td>
<td align="char" char=".">413</td>
<td align="char" char=".">5</td>
<td align="char" char=".">1,898</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">MCODE</td>
<td align="char" char=".">110</td>
<td align="char" char=".">6.5</td>
<td align="char" char=".">1,299</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">NNP</td>
<td align="char" char=".">538</td>
<td align="char" char=".">4.54</td>
<td align="char" char=".">1,937</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">MCL</td>
<td align="char" char=".">623</td>
<td align="char" char=".">6.57</td>
<td align="char" char=".">4,096</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">EA</td>
<td align="char" char=".">398</td>
<td align="char" char=".">13.5</td>
<td align="char" char=".">2,661</td>
</tr>
<tr>
<td align="left">8</td>
<td align="left">PC2P</td>
<td align="char" char=".">434</td>
<td align="char" char=".">4.50</td>
<td align="char" char=".">1,953</td>
</tr>
</tbody>
</table>
</table-wrap>
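<p>The size comparison can be reproduced with a few lines of Python. The average sizes are copied from <xref ref-type="table" rid="T4">Table&#x20;4</xref>; the variable names are illustrative only.</p>
<preformat>
```python
# Average cluster sizes copied from Table 4.
REFERENCE_AVERAGE = 4.71  # CYC2008 benchmark
average_size = {
    "CFinder": 11.31,
    "ClusterONE": 5.0,
    "MCODE": 6.5,
    "NNP": 4.54,
    "MCL": 6.57,
    "EA": 13.5,
    "PC2P": 4.50,
}

# Absolute distance of each method's average size from the benchmark average.
gap = {name: abs(size - REFERENCE_AVERAGE) for name, size in average_size.items()}

# Methods ordered from the closest to the farthest from the benchmark.
ranking = sorted(gap, key=gap.get)
```
</preformat>
<p>With these values, NNP has the smallest gap (0.17), followed by PC2P (0.21) and ClusterONE (0.29).</p>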
<p>
<xref ref-type="table" rid="T5">Table&#x20;5</xref> shows the results of the CFinder, ClusterONE, MCODE, MCL, EA, NNP, and PC2P methods for three complexes randomly selected from DIP. CFI is the mRNA cleavage factor complex with size 5, NEC is the nuclear exosome complex with size 12, and DRC is the DNA-directed RNA polymerase II complex. For CFI, six methods recognize the same proteins as CYC2008, that is, their OL is 100%, whereas the OL of PC2P is 83.3%. For NEC, the OL of both NNP and MCL is 100%. The OL of EA and that of MCODE are the same, 91.7%, ranking second, with one missed protein: YHR081W. CFinder has two missed proteins, and its OL is 83.3%. The OL of PC2P is also 83.3%. The OL of ClusterONE is only 64.1%, so its accuracy on NEC is low. For DRC, NNP and ClusterONE perform better, with OL values of 91.7% and 100%, respectively, while the OL of EA is 83.3%. CFinder, MCODE, MCL, and PC2P miss many proteins and include many wrong ones: the OL of CFinder is 56.3%, and that of PC2P is only&#x20;53.3%.</p>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>OL values of three complexes from the DIP identified by each method.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Algorithm<break/>Protein complex</th>
<th align="center">CFinder (%)</th>
<th align="center">ClusterONE (%)</th>
<th align="center">MCODE (%)</th>
<th align="center">NNP (%)</th>
<th align="center">MCL (%)</th>
<th align="center">EA (%)</th>
<th align="center">PC2P (%)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">CFI</td>
<td align="char" char=".">100</td>
<td align="char" char=".">100</td>
<td align="char" char=".">100</td>
<td align="char" char=".">100</td>
<td align="char" char=".">100</td>
<td align="char" char=".">100</td>
<td align="char" char=".">83.3</td>
</tr>
<tr>
<td align="left">NEC</td>
<td align="char" char=".">83.3</td>
<td align="char" char=".">64.1</td>
<td align="char" char=".">91.7</td>
<td align="char" char=".">100</td>
<td align="char" char=".">100</td>
<td align="char" char=".">91.7</td>
<td align="char" char=".">83.3</td>
</tr>
<tr>
<td align="left">DRC</td>
<td align="char" char=".">56.3</td>
<td align="char" char=".">100</td>
<td align="char" char=".">61.4</td>
<td align="char" char=".">91.7</td>
<td align="char" char=".">67.5</td>
<td align="char" char=".">83.3</td>
<td align="char" char=".">53.3</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<xref ref-type="table" rid="T6">Table&#x20;6</xref> shows the results of the seven methods. In terms of precision, the value of PC2P is the lowest, at only 19.35%, followed by CFinder at 26.98%, and the value of NNP is the highest, reaching 51.07%. The precision of MCODE ranks second, reaching 50.1%. Although the precision of MCODE is high, its recall is low, which leads to a low F1 value. The table shows that the F1 of NNP is the highest among all methods. So NNP identifies protein complexes more accurately than the other methods.</p>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Results of protein complexes recognized by algorithms.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Metrics<break/>Method</th>
<th align="center">
<italic>R</italic>
</th>
<th align="center">
<italic>P</italic>
</th>
<th align="center">F1</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">CFinder</td>
<td align="char" char=".">0.3408</td>
<td align="char" char=".">0.2698</td>
<td align="char" char=".">0.3012</td>
</tr>
<tr>
<td align="left">ClusterONE</td>
<td align="char" char=".">0.4068</td>
<td align="char" char=".">0.3554</td>
<td align="char" char=".">0.3794</td>
</tr>
<tr>
<td align="left">MCODE</td>
<td align="char" char=".">0.2293</td>
<td align="char" char=".">0.501</td>
<td align="char" char=".">0.3146</td>
</tr>
<tr>
<td align="left">NNP</td>
<td align="char" char=".">
<bold>0.3515</bold>
</td>
<td align="char" char=".">
<bold>0.5107</bold>
</td>
<td align="char" char=".">
<bold>0.4164</bold>
</td>
</tr>
<tr>
<td align="left">MCL</td>
<td align="char" char=".">0.3326</td>
<td align="char" char=".">0.4093</td>
<td align="char" char=".">0.367</td>
</tr>
<tr>
<td align="left">EA</td>
<td align="char" char=".">0.34</td>
<td align="char" char=".">0.383</td>
<td align="char" char=".">0.3602</td>
</tr>
<tr>
<td align="left">PC2P</td>
<td align="char" char=".">0.4340</td>
<td align="char" char=".">0.1935</td>
<td align="char" char=".">0.2677</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold values mark the results of the NNP method, which achieves the highest precision and F1.</p>
</fn>
</table-wrap-foot>
</table-wrap>
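<p>As a consistency check, the F1 values in <xref ref-type="table" rid="T6">Table&#x20;6</xref> can be recomputed from the reported recall and precision, assuming that F1 is the harmonic mean of the two. The numbers below are copied from the table; the names <monospace>table6</monospace> and <monospace>f1_score</monospace> are illustrative.</p>
<preformat>
```python
# (recall, precision, reported F1) copied from Table 6.
table6 = {
    "CFinder": (0.3408, 0.2698, 0.3012),
    "ClusterONE": (0.4068, 0.3554, 0.3794),
    "MCODE": (0.2293, 0.501, 0.3146),
    "NNP": (0.3515, 0.5107, 0.4164),
    "MCL": (0.3326, 0.4093, 0.367),
    "EA": (0.34, 0.383, 0.3602),
    "PC2P": (0.4340, 0.1935, 0.2677),
}


def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every method, rounded to the four decimals used in the table.
recomputed = {
    name: round(f1_score(recall, precision), 4)
    for name, (recall, precision, _) in table6.items()
}

# Method with the highest recomputed F1.
best = max(recomputed, key=recomputed.get)
```
</preformat>
<p>Under this assumption, the recomputed values agree with the reported F1 column to four decimal places, and NNP has the highest F1.</p>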
<p>
<xref ref-type="table" rid="T7">Table&#x20;7</xref> lists the number of protein complexes identified by CFinder, ClusterONE, MCODE, MCL, EA, NNP, and PC2P on the DIP data set that perfectly match complexes in CYC2008. As shown in <xref ref-type="table" rid="T7">Table&#x20;7</xref>, NNP perfectly matches 17 protein complexes on the DIP data set, whereas MCODE perfectly matches only six and PC2P perfectly matches none. Therefore, compared with the other algorithms, the NNP algorithm perfectly matches more protein complexes on the DIP data&#x20;set.</p>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>Numbers of protein complexes perfectly matched by each algorithm for DIP data&#x20;set.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Algorithm</th>
<th align="center">Perfect matching</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">CFinder</td>
<td align="char" char=".">11</td>
</tr>
<tr>
<td align="left">ClusterONE</td>
<td align="char" char=".">10</td>
</tr>
<tr>
<td align="left">MCODE</td>
<td align="char" char=".">6</td>
</tr>
<tr>
<td align="left">NNP</td>
<td align="char" char=".">
<bold>17</bold>
</td>
</tr>
<tr>
<td align="left">MCL</td>
<td align="char" char=".">15</td>
</tr>
<tr>
<td align="left">EA</td>
<td align="char" char=".">14</td>
</tr>
<tr>
<td align="left">PC2P</td>
<td align="char" char=".">0</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold values show that the experimental results of the NNP method are optimal.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>
<xref ref-type="table" rid="T8">Table&#x20;8</xref> lists some protein complexes with low <italic>p</italic>-values identified by the NNP algorithm on the DIP, which shows that the protein complexes identified by the NNP algorithm are biologically significant. <xref ref-type="table" rid="T9">Table&#x20;9</xref> lists three perfectly matched protein complexes identified by the NNP method on DIP and NewMIPS.</p>
<table-wrap id="T8" position="float">
<label>TABLE 8</label>
<caption>
<p>Protein complexes with low <italic>p</italic>-values identified by the NNP algorithm on the DIP.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">GO term</th>
<th align="center">OL (%)</th>
<th align="center">
<italic>p</italic>-value</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">mRNA processing</td>
<td align="char" char=".">96</td>
<td align="center">1.54E-36</td>
</tr>
<tr>
<td align="left">Small nuclear ribonucleoprotein complex</td>
<td align="char" char=".">86.1</td>
<td align="center">2.73E-58</td>
</tr>
<tr>
<td align="left">mRNA splicing, <italic>via</italic> spliceosome</td>
<td align="char" char=".">95.7</td>
<td align="center">4.48E-38</td>
</tr>
<tr>
<td align="left">Transferase activity, transferring glycosyl groups</td>
<td align="char" char=".">89.59</td>
<td align="center">1.81E-76</td>
</tr>
<tr>
<td align="left">Ribosomal small subunit biogenesis</td>
<td align="char" char=".">88.2</td>
<td align="center">2.45E-48</td>
</tr>
<tr>
<td align="left">Transporter activity</td>
<td align="char" char=".">94.38</td>
<td align="center">6.84E-100</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T9" position="float">
<label>TABLE 9</label>
<caption>
<p>Protein complexes perfectly matched by the NNP algorithm on the DIP.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">GO term</th>
<th align="center">OL (%)</th>
<th align="center">
<italic>p</italic>-value</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">mRNA metabolic process</td>
<td align="char" char=".">100</td>
<td align="center">7.37E-27</td>
</tr>
<tr>
<td align="left">Anaphase-promoting complex&#x2013;dependent catabolic process</td>
<td align="char" char=".">100</td>
<td align="center">4.68E-24</td>
</tr>
<tr>
<td align="left">Polyadenylation-dependent snoRNA 3&#x2032;-end processing</td>
<td align="char" char=".">100</td>
<td align="center">1.45E-32</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec sec-type="conclusion" id="s5">
<title>Conclusion</title>
<p>In addition to the topological structure of the PPI network, this work introduces gene ontology (GO) information as biological knowledge. We propose the TSSN method for computing the weights of a protein interaction network and the NNP algorithm for recognizing protein complexes on the weighted network. The TSSN method, which is based on topological features and GO term similarity, can filter noise and thus reduce the impact of noisy data, and the NNP algorithm identifies protein complexes on the weighted network. The experimental results show that NNP is superior to the other classical algorithms compared.</p>
<p>In the future, we will adopt new technologies to detect false-positive edges and predict false-negative edges in the PPI network, thus improving the quality of the PPI network. Machine learning methods will be used to detect protein complexes based on their biological characteristics. Finally, since static PPI networks only contain the interactions between proteins and cannot reflect the dynamic characteristics of protein interactions over time, we will study how to build a dynamic PPI network and identify protein complexes in the dynamic network.</p>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>XW, NZ, and JW proposed and designed the method. XW and NZ performed the experiments. All authors wrote the manuscript.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This work has been supported by the National Natural Science Foundation of China (62002181, 62061035) and the Self-topic/Open Project of Ecological Big Data Engineering Research Center of the Ministry of Education.</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aldecoa</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Mar&#xed;n</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Jerarca: Efficient Analysis of Complex Networks Using Hierarchical Clustering</article-title>. <source>PLoS ONE</source> <volume>5</volume> (<issue>7</issue>), <fpage>e11585</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0011585</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altaf-Ul-Amin</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Shinbo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Mihara</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kurokawa</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kanaya</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Development and Implementation of an Algorithm for Detection of Protein Complexes in Large Interaction Networks</article-title>. <source>BMC Bioinformatics</source> <volume>7</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-7-207</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bader</surname>
<given-names>G. D.</given-names>
</name>
<name>
<surname>Hogue</surname>
<given-names>C. W.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>An Automated Method for Finding Molecular Complexes in Large Protein Interaction Networks</article-title>. <source>BMC Bioinformatics</source> <volume>4</volume> (<issue>1</issue>), <fpage>2</fpage>&#x2013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-4-2</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cho</surname>
<given-names>Y.-R.</given-names>
</name>
<name>
<surname>Hwang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Ramanathan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Semantic Integration to Identify Overlapping Functional Modules in Protein Interaction Networks</article-title>. <source>BMC Bioinformatics</source> <volume>8</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-8-265</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Enright</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Van Dongen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ouzounis</surname>
<given-names>C. A.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>An Efficient Algorithm for Large-Scale Detection of Protein Families</article-title>. <source>Nucleic Acids Res.</source> <volume>30</volume> (<issue>7</issue>), <fpage>1575</fpage>&#x2013;<lpage>1584</lpage>. <pub-id pub-id-type="doi">10.1093/nar/30.7.1575</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedel</surname>
<given-names>C. C.</given-names>
</name>
<name>
<surname>Krumsiek</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zimmer</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2009</year>). &#x201c;<article-title>Bootstrapping the Interactome: Unsupervised Identification of Protein Complexes in Yeast</article-title>,&#x201d; in <source>Annual International Conference on Research in Computational Molecular Biology</source>. <source>J.&#x20;Comput. Biol.</source> <volume>16</volume>, <fpage>971</fpage>&#x2013;<lpage>987</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2009.0023</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gavin</surname>
<given-names>A.-C.</given-names>
</name>
<name>
<surname>Aloy</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Grandi</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Krause</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Boesche</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Marzioch</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2006</year>). <article-title>Proteome Survey Reveals Modularity of the Yeast Cell&#x20;Machinery</article-title>. <source>Nature</source> <volume>440</volume> (<issue>7084</issue>), <fpage>631</fpage>&#x2013;<lpage>636</lpage>. <pub-id pub-id-type="doi">10.1038/nature04532</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Halim</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Waqas</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hussain</surname>
<given-names>S. F.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Clustering Large Probabilistic Graphs Using Multi-Population Evolutionary Algorithm</article-title>. <source>Inf. Sci.</source> <volume>317</volume>, <fpage>78</fpage>&#x2013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2015.04.043</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>King</surname>
<given-names>A. D.</given-names>
</name>
<name>
<surname>Przulj</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Jurisica</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Protein Complex Prediction via Cost-Based Clustering</article-title>. <source>Bioinformatics</source> <volume>20</volume> (<issue>17</issue>), <fpage>3013</fpage>&#x2013;<lpage>3020</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bth351</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lei</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F.-X.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Pedrycz</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Protein Complex Identification through Markov Clustering with Firefly Algorithm on Dynamic Protein-Protein Interaction Networks</article-title>. <source>Inf. Sci.</source> <volume>329</volume>, <fpage>303</fpage>&#x2013;<lpage>316</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2015.09.028</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leung</surname>
<given-names>H. C. M.</given-names>
</name>
<name>
<surname>Xiang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Yiu</surname>
<given-names>S. M.</given-names>
</name>
<name>
<surname>Chin</surname>
<given-names>F. Y. L.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Predicting Protein Complexes from PPI Data: A Core-Attachment Approach</article-title>. <source>J.&#x20;Comput. Biol.</source> <volume>16</volume> (<issue>2</issue>), <fpage>133</fpage>&#x2013;<lpage>144</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2008.01TT</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Identifying Protein Complexes with Clear Module Structure Using Pairwise Constraints in Protein Interaction Networks</article-title>. <source>Front. Genet.</source> <volume>12</volume>, <fpage>786</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2021.664786</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Jeon</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A Network Hierarchy-Based Method for Functional Module Detection in Protein-Protein Interaction Networks</article-title>. <source>J.&#x20;Theor. Biol.</source> <volume>455</volume>, <fpage>26</fpage>&#x2013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1016/j.jtbi.2018.06.026</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meng</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xiang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>DPCMNE: Detecting Protein Complexes from Protein-Protein Interaction Networks via Multi-Level Network Embedding</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinf.</source>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2021.3050102</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nepusz</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Paccanaro</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Detecting Overlapping Protein Complexes in Protein-Protein Interaction Networks</article-title>. <source>Nat. Methods</source> <volume>9</volume> (<issue>5</issue>), <fpage>471</fpage>&#x2013;<lpage>472</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.1938</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Noori</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Al-A&#x2019;Araji</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Al-Shamery</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Identifying Protein&#x20;Complexes from Protein-Protein Interaction Networks Based on the Gene Expression Profile and Core-Attachment Approach</article-title>. <source>J.&#x20;Bioinform. Comput. Biol.</source> <volume>19</volume> (<issue>3</issue>), <fpage>2150009</fpage>. <pub-id pub-id-type="doi">10.1142/S0219720021500098</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Omranian</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Angeleska</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Nikoloski</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>PC2P: Parameter-free Network-Based Prediction of Protein Complexes</article-title>. <source>Bioinformatics</source> <volume>37</volume> (<issue>1</issue>), <fpage>73</fpage>&#x2013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btaa1089</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ou-Yang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.-F.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A Multi-Network Clustering Method for Detecting Protein Complexes from Multiple Heterogeneous Networks</article-title>. <source>BMC Bioinformatics</source> <volume>18</volume> (<issue>13</issue>), <fpage>23</fpage>&#x2013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1186/s12859-017-1877-4</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>Z.-H.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Identifying Protein Complexes from Protein-Protein Interaction Networks Based on Fuzzy Clustering and GO Semantic Information</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinf.</source> <volume>14</volume> (<issue>8</issue>), <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2021.3095947</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pesquita</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Faria</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bastos</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Falcao</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Couto</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Evaluating GO-Based Semantic Similarity Measures</article-title>. <source>Proc. 10th Annu. Bio-Ontologies Meet.</source> <volume>37</volume> (<issue>40</issue>), <fpage>38</fpage>. </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Cho</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Wodak</surname>
<given-names>S. J.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Up-to-date Catalogues of Yeast Protein Complexes</article-title>. <source>Nucleic Acids Res.</source> <volume>37</volume> (<issue>3</issue>), <fpage>825</fpage>&#x2013;<lpage>831</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkn1005</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qi</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Balem</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Klein-Seetharaman</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bar-Joseph</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Protein Complex Identification by Supervised Graph Local Clustering</article-title>. <source>Bioinformatics</source> <volume>24</volume> (<issue>13</issue>), <fpage>i250</fpage>&#x2013;<lpage>i258</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn164</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Bhattacharyya</surname>
<given-names>D. K.</given-names>
</name>
<name>
<surname>Kalita</surname>
<given-names>J.&#x20;K.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Detecting Protein&#x20;Complexes Based on a Combination of Topological and&#x20;Biological Properties in Protein-Protein Interaction Network</article-title>.&#x20;<source>J.&#x20;Genet. Eng. Biotechnol.</source> <volume>16</volume> (<issue>1</issue>), <fpage>217</fpage>&#x2013;<lpage>226</lpage>. <pub-id pub-id-type="doi">10.1016/j.jgeb.2017.11.005</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tian</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>M. Z.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>An Improved Method for Measuring the Functional Similarity of Genes</article-title>. <source>Intell. Comp. Appl.</source> <volume>7</volume> (<issue>5</issue>), <fpage>123</fpage>&#x2013;<lpage>126</lpage>. <pub-id pub-id-type="doi">10.3969/j.issn.2095-2163.2017.05.034</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Jeong</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yoon</surname>
<given-names>B.-J.</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>ClusterM: a Scalable Algorithm for Computational Prediction of Conserved Protein Complexes across Multiple Protein Interaction Networks</article-title>. <source>BMC Genomics</source> <volume>21</volume> (<issue>10</issue>), <fpage>1</fpage>&#x2013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1186/s12864-020-07010-1</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Kwoh</surname>
<given-names>C.-K.</given-names>
</name>
<name>
<surname>Ng</surname>
<given-names>S.-K.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>A Core-Attachment Based Method to Detect Protein Complexes in PPI Networks</article-title>. <source>BMC Bioinformatics</source> <volume>10</volume> (<issue>1</issue>), <fpage>1</fpage>&#x2013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-10-169</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>idenPC-CAP: Identify Protein Complexes from Weighted RNA-Protein Heterogeneous Interaction Networks Using Co-assemble Partner Relation</article-title>. <source>Brief. Bioinform.</source> <volume>22</volume> (<issue>4</issue>), <fpage>bbaa372</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa372</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>idenPC-MIIP: Identify Protein Complexes from Weighted PPI Networks Using Mutual Important Interacting Partner Relation</article-title>. <source>Brief. Bioinform.</source> <volume>22</volume> (<issue>2</issue>), <fpage>1972</fpage>&#x2013;<lpage>1983</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa016</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>