<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2014.00302</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research Article</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Proteins comparison through probabilistic optimal structure local alignment</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Micale</surname> <given-names>Giovanni</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/157967"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Pulvirenti</surname> <given-names>Alfredo</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/89738"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Giugno</surname> <given-names>Rosalba</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/71882"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Ferro</surname> <given-names>Alfredo</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://community.frontiersin.org/people/u/31052"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Computer Science, University of Pisa</institution> <country>Pisa, Italy</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Clinical and Molecular Biomedicine, University of Catania</institution> <country>Catania, Italy</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Alfredo Benso, Politecnico di Torino, Italy</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Lennart Martens, Ghent University and VIB, Belgium; Yu Xue, Huazhong University of Science and Technology, China</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Alfredo Pulvirenti, Department of Clinical and Molecular Biomedicine, University of Catania, Via Andrea Doria 6, 95037 Catania, Italy e-mail: <email>apulvirenti&#x00040;dmi.unict.it</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>09</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="collection">
<year>2014</year>
</pub-date>
<volume>5</volume>
<elocation-id>302</elocation-id>
<history>
<date date-type="received">
<day>31</day>
<month>05</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>08</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2014 Micale, Pulvirenti, Giugno and Ferro.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract><p>Multiple local structure comparison helps to identify common structural motifs or conserved binding sites in 3D structures in distantly related proteins. Since there is no best way to compare structures and evaluate the alignment, a wide variety of techniques and different similarity scoring schemes have been proposed. Existing algorithms usually compute the best superposition of two structures or attempt to solve it as an optimization problem in a simpler setting (e.g., considering contact maps or distance matrices). Here, we present PROPOSAL (PROteins comparison through Probabilistic Optimal Structure local ALignment), a stochastic algorithm based on iterative sampling for multiple local alignment of protein structures. Our method can efficiently find conserved motifs across a set of protein structures. Only the distances between all pairs of residues in the structures are computed. To show the accuracy and the effectiveness of PROPOSAL we tested it on a few families of protein structures. We also compared PROPOSAL with two state-of-the-art tools for pairwise local alignment on a dataset of manually annotated motifs. PROPOSAL is available as a Java 2D standalone application or a command line program at <ext-link ext-link-type="uri" xlink:href="http://ferrolab.dmi.unict.it/proposal/proposal.html">http://ferrolab.dmi.unict.it/proposal/proposal.html</ext-link>.</p></abstract>
<kwd-group>
<kwd>structure comparison</kwd>
<kwd>protein comparison</kwd>
<kwd>local alignment</kwd>
<kwd>protein families</kwd>
<kwd>motifs identification</kwd>
<kwd>binding sites identification</kwd>
</kwd-group>
<counts>
<fig-count count="12"/>
<table-count count="3"/>
<equation-count count="9"/>
<ref-count count="47"/>
<page-count count="11"/>
<word-count count="6933"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="s1">
<title>1. Introduction</title>
<p>Protein function is commonly deduced by sequence analysis. On the other hand, most protein interactions, such as catalytic activity or gene regulation (transcription, maturation, etc.), depend on sub-regions of their 3D structures, called structural or binding motifs. Havranek and Baker (<xref ref-type="bibr" rid="B12">2009</xref>) show that the identification of protein-DNA interactions can help discover placements for the protein backbone. This in turn helps to identify the desired position and interaction of the side-chain atoms, which are responsible for protein function.</p>
<p>Since the structure of many proteins is still unknown and proteins with similar structural motifs often exhibit similar biological properties even when they are distantly related, 3D structure comparison can help characterize the role of many proteins. As stated in Eidhammer et al. (<xref ref-type="bibr" rid="B8">2000</xref>), there is no best way to make the comparison or to evaluate the alignments. Since no notion of common ancestor exists, there is a huge variety of plausible relatedness models. Furthermore, from an algorithmic standpoint, 3D structure comparison is an NP-hard problem (Goldman et al., <xref ref-type="bibr" rid="B11">1999</xref>). Structural comparison is usually performed by local alignments since these are more sensitive than the global ones. Indeed, proteins with dissimilar folds may share common binding sites or interfaces. Some local alignment methods start from a specified motif (called template) in a query protein structure and search for similarities in a reference set of 3D structures.</p>
<p>MolLoc (Angaran et al., <xref ref-type="bibr" rid="B1">2009</xref>) is a web server for comparing known binding sites, cavities or user-defined sets of residues of two or more molecular surfaces. The algorithm builds a structural alignment maximizing the extension of surface superposition. MultiBind (Shatsky et al., <xref ref-type="bibr" rid="B35">2006</xref>; Peleg et al., <xref ref-type="bibr" rid="B33">2008</xref>) recognizes common spatial chemical binding patterns in a set of proteins by solving a 3D k-partite matching problem through efficient geometric hashing techniques. MAPPIS (Peleg et al., <xref ref-type="bibr" rid="B32">2007</xref>, <xref ref-type="bibr" rid="B33">2008</xref>) relies on a similar algorithm and performs multiple alignment of protein-protein interfaces, predicting hot spot residues that contribute to the conserved patterns of the interactions. LabelHash (Moll et al., <xref ref-type="bibr" rid="B28">2010</xref>), in a preprocessing phase, builds reference hash sets to guarantee instant lookup of partial motif matches. These partial matches are then expanded using a variant of the match augmentation algorithm (Chen et al., <xref ref-type="bibr" rid="B5">2007</xref>). 
In general, the matching task can be performed with a few algorithmic techniques, such as linear programming (Lancia et al., <xref ref-type="bibr" rid="B24">2001</xref>; Wohlers et al., <xref ref-type="bibr" rid="B44">2009</xref>), dynamic programming (Orengo and Taylor, <xref ref-type="bibr" rid="B31">1996</xref>; Jung and Lee, <xref ref-type="bibr" rid="B19">2000</xref>; Ye and Godzik, <xref ref-type="bibr" rid="B47">2003</xref>), depth-first searching (Stark and Russell, <xref ref-type="bibr" rid="B38">2003</xref>; Ausiello et al., <xref ref-type="bibr" rid="B2">2005</xref>; Chen et al., <xref ref-type="bibr" rid="B5">2007</xref>), graph theory (Jambon et al., <xref ref-type="bibr" rid="B18">2003</xref>; Spriggs et al., <xref ref-type="bibr" rid="B37">2003</xref>; Hofbauer et al., <xref ref-type="bibr" rid="B13">2004</xref>; Huan et al., <xref ref-type="bibr" rid="B17">2006</xref>; Weskamp et al., <xref ref-type="bibr" rid="B43">2007</xref>; Najmanovich et al., <xref ref-type="bibr" rid="B30">2008</xref>; Konc and Janezic, <xref ref-type="bibr" rid="B22">2010</xref>), geometric hashing (Bachar et al., <xref ref-type="bibr" rid="B3">1993</xref>; Wallace et al., <xref ref-type="bibr" rid="B41">1997</xref>; Shatsky et al., <xref ref-type="bibr" rid="B35">2006</xref>; Moll et al., <xref ref-type="bibr" rid="B28">2010</xref>), Markov chains and Monte Carlo methods (Holm and Sander, <xref ref-type="bibr" rid="B16">1993</xref>; Kawabata, <xref ref-type="bibr" rid="B21">2003</xref>) and combinatorial optimization (Shindyalov and Bourne, <xref ref-type="bibr" rid="B36">1998</xref>; Bertolazzi et al., <xref ref-type="bibr" rid="B4">2010</xref>).</p>
<p>Other approaches align two protein structures with no information about the location of potentially conserved binding sites. Among these we have ProBiS (Konc and Janezic, <xref ref-type="bibr" rid="B22">2010</xref>, <xref ref-type="bibr" rid="B23">2012</xref>) which solves the problem by making use of a maximum clique algorithm; SMAP (Xie and Bourne, <xref ref-type="bibr" rid="B45">2008</xref>; Xie et al., <xref ref-type="bibr" rid="B46">2009</xref>), a software package which includes a method to characterize protein structures using geometric potential, and a sequence order independent profile-profile alignment tool (SOIPPA); DaliLite (Holm and Park, <xref ref-type="bibr" rid="B15">2000</xref>; Holm et al., <xref ref-type="bibr" rid="B14">2008</xref>) which computes optimal and suboptimal structural alignments, by optimizing a scoring function given by the weighted sum of similarities of intramolecular distances.</p>
<p>Several similarity scoring schemes exist to assess alignment quality. The most widely used are the Root Mean Square Deviation (RMSD) of the optimal rigid-body superposition (Kabsch, <xref ref-type="bibr" rid="B20">1976</xref>), the distance map similarity (Holm and Sander, <xref ref-type="bibr" rid="B16">1993</xref>) and the Contact Map Overlap (CMO) (Lancia et al., <xref ref-type="bibr" rid="B24">2001</xref>; Di Lena et al., <xref ref-type="bibr" rid="B7">2010</xref>).</p>
<p>In this paper, we present PROPOSAL (PROteins comparison through Probabilistic Optimal Structure local ALignment), a stochastic algorithm for local alignment of 3D protein structures. PROPOSAL relies on Markov Chain Monte Carlo in conjunction with a Gibbs sampling strategy, which has previously been applied to solve the multiple local sequence alignment problem (Lawrence et al., <xref ref-type="bibr" rid="B25">1993</xref>) as well as the multiple protein-protein interaction network alignment problem (Micale et al., <xref ref-type="bibr" rid="B27">2014</xref>).</p>
<p>We tested PROPOSAL on the J. Skolnick benchmark (Lancia et al., <xref ref-type="bibr" rid="B24">2001</xref>) and a set of known manually curated motifs, taken from the Catalytic Site Atlas (CSA) (Furnham et al., <xref ref-type="bibr" rid="B9">2013</xref>). Results clearly show that the algorithm is accurate and identifies many highly conserved substructures and known functional binding sites across many proteins. Given its non-deterministic nature, it is very fast even on a large number of structures. We also compared PROPOSAL with two state-of-the-art systems, ProBiS (Konc and Janezic, <xref ref-type="bibr" rid="B22">2010</xref>, <xref ref-type="bibr" rid="B23">2012</xref>) and SMAP (Xie and Bourne, <xref ref-type="bibr" rid="B45">2008</xref>; Xie et al., <xref ref-type="bibr" rid="B46">2009</xref>) in solving a pairwise local alignment problem. The results clearly show that PROPOSAL can align proteins with different degrees of sequence similarity in reasonable time, with the highest precision.</p>
<p>A Java 2D standalone application, which integrates JMol for 3D visualization of alignments, is freely available for download at <ext-link ext-link-type="uri" xlink:href="http://ferrolab.dmi.unict.it/proposal/proposal.html">http://ferrolab.dmi.unict.it/proposal/proposal.html</ext-link>, along with a command line version of PROPOSAL and complete user documentation.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and methods</title>
<p>Let <italic>P</italic> &#x0003D; {<italic>P</italic><sub>1</sub>, <italic>P</italic><sub>2</sub>,&#x02026;, <italic>P<sub>N</sub></italic>} be a set of <italic>N</italic> 3D protein structures and let <italic>w</italic> be a positive integer, with <italic>w</italic> &#x02265; 3. The goal of local protein structure alignment is to find <italic>N</italic> substructures of <italic>w</italic> residues, one for each protein, such that structure similarity is locally maximized. We call <italic>w</italic> the size of the local alignment.</p>
<p>PROPOSAL finds approximate solutions to the problem through a greedy and stochastic technique, using a Markov Chain Monte Carlo (MCMC) method in conjunction with Gibbs sampling (Geman and Geman, <xref ref-type="bibr" rid="B10">1984</xref>).</p>
<p>PROPOSAL is an iterative method. In each iteration it tries to find an optimal local alignment of size <italic>w</italic>, starting from a predefined triplet of amino acids (e.g., AAC), called fingerprint. Since the fingerprint changes at every iteration and there are 20 amino acids, the maximum number of iterations performed by PROPOSAL has been set to 20<sup>3</sup> &#x0003D; 8000.</p>
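<p>The iteration bound can be checked with a minimal sketch that enumerates every ordered triplet over the 20 standard amino acid symbols. The names <monospace>AMINO_ACIDS</monospace> and <monospace>fingerprints</monospace> are hypothetical, and the order in which fingerprints are visited is an assumption, since it is not specified here.</p>
<preformat><![CDATA[
```python
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fingerprints():
    """Yield every ordered triplet of amino acid symbols, e.g. 'AAC'.

    The alphabetical visiting order is an assumption made for illustration.
    """
    for triple in product(AMINO_ACIDS, repeat=3):
        yield "".join(triple)


all_fingerprints = list(fingerprints())
print(len(all_fingerprints))  # 20 ** 3 = 8000
```
]]></preformat>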
<p>A single iteration consists of three phases. In the first one, called the <italic>bootstrap phase</italic>, Gibbs sampling is used to find a local alignment of <italic>N</italic> substructures (one for each protein), composed of 3 residues each. These substructures, called <italic>seeds</italic> of the alignment, represent small potential conserved motifs shared by the <italic>N</italic> 3D protein structures.</p>
<p>The quality of the seeds alignment is quantified according to a proper scoring scheme based on the average Root Mean Square Deviation (RMSD) between the aligned substructures, considering all possible pairs of proteins. The best alignments will have the lowest average RMSD.</p>
<p>Let <italic>C</italic> &#x0003D; {<italic>C</italic><sub>1</sub>, <italic>C</italic><sub>2</sub>,&#x02026;, <italic>C<sub>k</sub></italic>} and <italic>D</italic> &#x0003D; {<italic>D</italic><sub>1</sub>, <italic>D</italic><sub>2</sub>,&#x02026;, <italic>D<sub>k</sub></italic>} be two sets of residues. The RMSD between <italic>C</italic> and <italic>D</italic> is given by the root mean-square deviation of the C&#x003B1; atomic coordinates of residues, after performing an optimal rigid body superposition. The RMSD is defined as follows:
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>D</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>k</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>z</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow>
</mml:msqrt></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>C<sub>ix</sub></italic>, <italic>C<sub>iy</sub></italic>, <italic>C<sub>iz</sub></italic> and <italic>D<sub>ix</sub></italic>, <italic>D<sub>iy</sub></italic>, <italic>D<sub>iz</sub></italic> are the 3D coordinates of residues <italic>C<sub>i</sub></italic> and <italic>D<sub>i</sub></italic>, respectively, after the superposition.</p>
<p>We computed RMSDs using QCP (Liu et al., <xref ref-type="bibr" rid="B26">2010</xref>), a recently proposed algorithm that finds the optimal alignment by using a Newton-Raphson quaternion-based method.</p>
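<p>As an illustration of the scoring scheme, the following minimal sketch evaluates the RMSD of two residue sets and the average RMSD over all pairs of aligned substructures. It assumes that the coordinates passed to <monospace>rmsd</monospace> have already been optimally superposed; the superposition step itself (performed with QCP in PROPOSAL) is not implemented here. The function names are hypothetical, and a superposition-aware function can be supplied through the <monospace>rmsd_fn</monospace> argument.</p>
<preformat><![CDATA[
```python
from itertools import combinations
from math import sqrt


def rmsd(c, d):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumption: the two coordinate sets are already superposed.
    """
    if len(c) != len(d) or not c:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    total = sum(
        (cx - dx) ** 2 + (cy - dy) ** 2 + (cz - dz) ** 2
        for (cx, cy, cz), (dx, dy, dz) in zip(c, d)
    )
    return sqrt(total / len(c))


def average_pairwise_rmsd(substructures, rmsd_fn=rmsd):
    """Average of rmsd_fn over all unordered pairs of aligned substructures."""
    pairs = list(combinations(substructures, 2))
    if not pairs:
        raise ValueError("at least two substructures are required")
    return sum(rmsd_fn(a, b) for a, b in pairs) / len(pairs)
```
]]></preformat>
<p>With this score, lower values indicate better alignments, so a seed alignment would be kept only if its average falls below the chosen threshold.</p>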
<p>Each seed alignment with an average RMSD &#x02264;1 &#x000C5; is extended by adding one residue at a time, until we reach an alignment of <italic>N</italic> motifs, each having <italic>w</italic> residues. This <italic>extension phase</italic> is performed stochastically through Gibbs sampling.</p>
<p>Finally, in the third phase, the alignment is refined, by iteratively removing and adding single nodes to each aligned motif. This <italic>refinement phase</italic> produces the final local alignment (see Figure <xref ref-type="fig" rid="F1">1</xref>). The set of local alignments is then filtered by removing highly overlapping alignments.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Outline of PROPOSAL</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0001.tif"/>
</fig>
<sec>
<title>2.1. Bootstrap phase</title>
<p>The goal of the bootstrap phase is to find an optimal alignment of small substructures of 3 nodes, called seeds. A seed is represented by a triple of residues <italic>A</italic> &#x0003D; (<italic>A</italic><sub>1</sub>, <italic>A</italic><sub>2</sub>, <italic>A</italic><sub>3</sub>).</p>
<p>The set of possible candidates for the initial alignment consists of all seeds satisfying the following conditions:</p>
<list list-type="alpha-lower">
<list-item><p>All pairs of residues within the seed are less than 10 &#x000C5; apart;</p></list-item>
<list-item><p>The residue symbols in the triple must match the fingerprint of the corresponding iteration of PROPOSAL.</p></list-item>
</list>
<p>Feasible candidates are seeds satisfying both (a) and (b). If one or more proteins contain no feasible candidates, the search stops and a new iteration of PROPOSAL begins.</p>
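<p>Conditions (a) and (b) can be sketched as a brute-force filter over residue triples. This is only an illustration under stated assumptions: each residue is represented as a hypothetical <monospace>(symbol, (x, y, z))</monospace> pair, the fingerprint is matched position by position against ordered triples, and no attempt is made to avoid the cubic cost of the enumeration.</p>
<preformat><![CDATA[
```python
from itertools import permutations
from math import dist

MAX_SEED_DISTANCE = 10.0  # angstroms, threshold of condition (a)


def feasible_seeds(residues, fingerprint):
    """Return index triples of residues that form feasible seeds.

    residues: list of (symbol, (x, y, z)) tuples (hypothetical representation).
    fingerprint: three-letter string such as 'AAC'.
    Assumption: symbols are matched to the fingerprint in order.
    """
    seeds = []
    for triple in permutations(range(len(residues)), 3):
        # Condition (b): residue symbols must spell the fingerprint.
        symbols = "".join(residues[i][0] for i in triple)
        if symbols != fingerprint:
            continue
        # Condition (a): every pair of residues is closer than the threshold.
        coords = [residues[i][1] for i in triple]
        if all(
            dist(coords[a], coords[b]) < MAX_SEED_DISTANCE
            for a, b in ((0, 1), (0, 2), (1, 2))
        ):
            seeds.append(triple)
    return seeds
```
]]></preformat>
<p>If this function returns an empty list for any protein, the current iteration would be abandoned and the next fingerprint tried.</p>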
<p>Once a set of suitable candidates is generated, PROPOSAL tries to construct an optimal initial alignment through Gibbs sampling on top of a Markov Chain Monte Carlo (MCMC) method. In the Markov chain, each state represents an alignment of <italic>N</italic> seeds, one from each protein structure.</p>
<p>Starting from a random initial state (i.e., a random initial alignment), the sampling method iteratively performs a transition from a state of the chain to another, by replacing a randomly chosen seed of the current alignment with a feasible candidate of the same protein, according to a properly defined transition probability distribution. When Gibbs sampling stops, the last current alignment is returned. If the sampling procedure is iterated a sufficient number of times, it converges to a local optimum solution.</p>
<p>A critical task is to establish when Gibbs sampling can be stopped. The procedure ends when the alignment of seeds does not change. Let <inline-formula><mml:math id="M10"><mml:mrow><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> be the probability that a protein structure is never selected in <italic>i</italic> consecutive iterations of Gibbs sampling. The number of iterations of Gibbs sampling is determined by the following parameter <italic>k</italic>:
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mi>max</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mi>k</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>:</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:msup><mml:mi>k</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup></mml:msup><mml:mo>&#x0003E;</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
where &#x003B1; is a user-defined probability threshold. If the alignment does not change for <italic>k</italic> consecutive iterations, Gibbs sampling is stopped. The lower &#x003B1; is, the more precise and the slower the sampling procedure becomes. Therefore, &#x003B1; controls the trade-off between the accuracy and the speed of PROPOSAL.</p>
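<p>The stopping parameter of Equation (2) can be computed with a short loop. The sketch below is illustrative; the function name <monospace>stopping_k</monospace> and the input checks are assumptions rather than part of PROPOSAL.</p>
<preformat><![CDATA[
```python
def stopping_k(n_proteins, alpha):
    """Largest k such that ((N - 1) / N) ** k > alpha, as in Equation (2)."""
    if n_proteins < 2:
        raise ValueError("at least two protein structures are required")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ratio = (n_proteins - 1) / n_proteins
    k = 0
    prob = 1.0  # equals ratio ** k throughout the loop
    while prob * ratio > alpha:
        prob *= ratio
        k += 1
    return k


print(stopping_k(4, 0.05))  # 10, since 0.75**10 > 0.05 > 0.75**11
```
]]></preformat>
<p>Lowering &#x003B1; increases <italic>k</italic>, so the sampler must observe a longer run of unchanged alignments before it stops.</p>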
<p>The transition probability is defined on top of a similarity score, based on the distances between the residues of the seeds. Let <italic>Dist(R</italic><sub>1</sub>, <italic>R</italic><sub>2</sub>) be the Euclidean distance between the two residues <italic>R</italic><sub>1</sub> and <italic>R</italic><sub>2</sub> of a 3D structure. Given two seeds <italic>A</italic> &#x0003D; (<italic>A</italic><sub>1</sub>, <italic>A</italic><sub>2</sub>, <italic>A</italic><sub>3</sub>) and <italic>B</italic> &#x0003D; (<italic>B</italic><sub>1</sub>, <italic>B</italic><sub>2</sub>, <italic>B</italic><sub>3</sub>), we define the pairwise distance between <italic>A</italic> and <italic>B</italic> as:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mrow><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mn>3</mml:mn></mml:munderover><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>Now, let <italic>S</italic> &#x0003D; {<italic>S</italic><sub>1</sub>, <italic>S</italic><sub>2</sub>, &#x02026;, <italic>S<sub>N</sub></italic>} be the alignment of seeds at the <italic>i</italic>-th iteration of Gibbs sampling and suppose we have to replace <italic>S<sub>j</sub></italic> by a feasible candidate <italic>X</italic> of the same protein. The similarity score of <italic>X</italic> is defined as the inverse of the product of all pair distances between <italic>X</italic> and the seeds of the current alignment (except <italic>S<sub>j</sub></italic>):</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mrow><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mrow><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>The transition probability is then computed by normalizing such similarity scores in [0,1].</p>
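<p>Equations (3) and (4) and a single Gibbs transition can be sketched as follows. Several choices here are assumptions made for illustration: seeds are represented as triples of (x, y, z) coordinates, the normalization divides each score by the sum of all scores so that the probabilities add up to 1, and a small constant <monospace>EPSILON</monospace> is added to the denominator to avoid division by zero when two seeds have identical internal distances. None of the function names are taken from PROPOSAL.</p>
<preformat><![CDATA[
```python
import random
from math import dist, prod

EPSILON = 1e-6  # assumption: guards against a zero denominator


def pair_dist(a, b):
    """Equation (3): product over i < j of |Dist(A_i, A_j) - Dist(B_i, B_j)|."""
    pairs = ((0, 1), (0, 2), (1, 2))
    return prod(abs(dist(a[i], a[j]) - dist(b[i], b[j])) for i, j in pairs)


def similarity(x, alignment, j):
    """Equation (4): inverse product of pair distances to all seeds except S_j."""
    denom = prod(pair_dist(x, s) for i, s in enumerate(alignment) if i != j)
    return 1.0 / (denom + EPSILON)


def transition_probabilities(candidates, alignment, j):
    """Normalize the similarity scores so that they sum to 1 (an assumption)."""
    scores = [similarity(x, alignment, j) for x in candidates]
    total = sum(scores)
    return [s / total for s in scores]


def gibbs_step(alignment, candidates_per_protein, rng=random):
    """Replace the seed of one randomly chosen protein by a sampled candidate."""
    j = rng.randrange(len(alignment))
    candidates = candidates_per_protein[j]
    probs = transition_probabilities(candidates, alignment, j)
    new_alignment = list(alignment)
    new_alignment[j] = rng.choices(candidates, weights=probs, k=1)[0]
    return new_alignment
```
]]></preformat>
<p>Candidates whose internal distances closely match those of the other seeds receive a small pair distance and therefore a high probability of being selected.</p>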
</sec>
<sec>
<title>2.2. Extension of the alignment</title>
<p>In the extension phase, the alignment of residues is extended up to size <italic>w</italic> by iteratively adding <italic>N</italic> residues to the current alignment, one from each protein.</p>
<p>Suppose that we start from a substructure alignment of size <italic>w</italic>&#x02032;&#x0003C;<italic>w</italic>. The goal is to find an optimal alignment of <italic>N</italic> residues <italic>R</italic><sub>1</sub>, <italic>R</italic><sub>2</sub>,&#x02026;, <italic>R<sub>N</sub></italic>, one for each protein, and add such residues to the substructure alignment. Each <italic>R<sub>i</sub></italic> must be at a distance of at most 10 &#x000C5; from the residues in the corresponding current aligned substructure. At the end of this process, the alignment size will be <italic>w</italic>&#x02032;&#x0002B;1.</p>
<p>Each extension step is performed through a Gibbs sampling strategy similar to the one used during the bootstrap phase. In the extension phase the similarity score takes into account:</p>
<list list-type="alpha-lower">
<list-item><p>The symbol of a candidate residue;</p></list-item>
<list-item><p>The distances between the candidate residue and the aligned residues of the same structure.</p></list-item>
</list>
<p>Let <italic>SA</italic> &#x0003D; {<italic>SA</italic><sub>1</sub>, <italic>SA</italic><sub>2</sub>, &#x02026;, <italic>SA<sub>N</sub></italic>} be the current alignment of size <italic>w</italic>&#x02032;, where each <italic>SA<sub>i</sub></italic> &#x0003D; {<italic>R</italic><sub><italic>i</italic>,1</sub>, <italic>R</italic><sub><italic>i</italic>,2</sub>,&#x02026;, <italic>R</italic><sub><italic>i,w</italic>&#x02032;</sub>} is a set of residues, and let <italic>A<sup>m</sup></italic> &#x0003D; {<italic>A<sup>m</sup></italic><sub>1</sub>, <italic>A<sup>m</sup></italic><sub>2</sub>,&#x02026;, <italic>A<sup>m</sup><sub>N</sub></italic>} be the alignment of candidate residues at the generic <italic>m</italic>-th iteration of Gibbs sampling.</p>
<p>Suppose we replace <italic>A<sup>m</sup><sub>j</sub></italic> with a candidate residue <italic>X</italic>. First, we define a similarity score, <italic>SimSymb(X)</italic> which evaluates the similarity between the symbol of <italic>X</italic> and the symbols of residues in <italic>A<sup>m</sup></italic> (except <italic>A<sup>m</sup><sub>j</sub></italic>):
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mrow><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>S</mml:mi><mml:mi>y</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mtext>&#x000A0;</mml:mtext></mml:mstyle><mml:mtext>SIMMATRIX</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mi>k</mml:mi><mml:mi>m</mml:mi></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
where S<sc>im</sc>M<sc>atrix</sc>(<italic>X,A<sup>m</sup><sub>k</sub></italic>) is a BLOSUM similarity score between <italic>X</italic> and <italic>A<sup>m</sup><sub>k</sub></italic>.</p>
<p>Then, we define another similarity function, <italic>SimDist(X)</italic>:
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:mrow><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mrow><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mi>k</mml:mi><mml:mi>m</mml:mi></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
where <italic>PairDist(X, A<sup>m</sup><sub>k</sub>)</italic> is defined as follows:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M7"><mml:mrow><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mi>A</mml:mi><mml:mi>k</mml:mi><mml:mi>m</mml:mi></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup></mml:munderover><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>A</mml:mi><mml:mi>k</mml:mi><mml:mi>m</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>Finally, the similarity score of <italic>X</italic>, <italic>Sim(X)</italic>, is the product of <italic>SimSymb(X)</italic> and <italic>SimDist(X)</italic>. Again, the transition probability of <italic>X</italic> is the normalization of <italic>Sim(X)</italic> in [0,1].</p>
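<p>As a minimal sketch of how Equations (5)&#x02013;(7) can be combined into a transition probability, the following Python fragment works on toy inputs. The function names, the dictionary-based substitution table, the assumption that all scores and distance differences are strictly positive, and the choice of dividing each <italic>Sim(X)</italic> by the sum over all candidates are illustrative assumptions, not details of the PROPOSAL implementation.</p>
<preformat><![CDATA[
```python
import math

# Hypothetical helper names; inputs are toy data, not PROPOSAL's data structures.

def sim_symb(x_label, aligned_labels, sim_matrix):
    """Eq. (5): product of substitution scores between X and each aligned residue."""
    return math.prod(sim_matrix[(x_label, a)] for a in aligned_labels)

def pair_dist(x_dists, a_dists):
    """Eq. (7): product over h of |Dist(X, R_jh) - Dist(A_k, R_kh)|."""
    return math.prod(abs(dx - da) for dx, da in zip(x_dists, a_dists))

def sim_dist(x_dists, aligned_dists):
    """Eq. (6): reciprocal of the product of PairDist terms (assumed non-zero)."""
    return 1.0 / math.prod(pair_dist(x_dists, a) for a in aligned_dists)

def transition_probabilities(sim_scores):
    """Normalize Sim(X) values so that they sum to 1 (assumed normalization)."""
    total = sum(sim_scores.values())
    return {x: s / total for x, s in sim_scores.items()}

# Toy example: two candidate residues, two already aligned residues.
matrix = {("A", "A"): 4.0, ("G", "A"): 1.0}      # assumed positive scores
aligned_labels = ["A", "A"]
aligned_dists = [[4.0, 6.0], [6.0, 8.0]]         # Dist(A_k, R_kh) for h = 1, 2
candidates = {"X1": ("A", [5.0, 7.0]), "X2": ("G", [7.0, 9.0])}

sim = {
    name: sim_symb(label, aligned_labels, matrix) * sim_dist(dists, aligned_dists)
    for name, (label, dists) in candidates.items()
}
probs = transition_probabilities(sim)
```
]]></preformat>
<p>In this toy case the candidate whose label and distances agree better with the aligned residues receives almost all of the probability mass. A real implementation would also need a guard for zero distance differences, which the sketch leaves out.</p>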
</sec>
<sec>
<title>2.3. Refinement phase</title>
<p>The goal of the refinement phase is to increase the quality of the discovered alignment. An alignment of residues is iteratively removed from the current alignment of substructures and replaced with a new one. The number of iterations is bounded by a user-defined parameter called <italic>IterRefine</italic>. According to our experimental results (Section 3.1), good accuracy can be achieved with relatively small values of this parameter (e.g., 10).</p>
<p>The replaced alignment is chosen according to a <italic>Badness</italic> function defined below.</p>
<p>Let <italic>SA</italic> &#x0003D; {<italic>SA</italic><sub>1</sub>, <italic>SA</italic><sub>2</sub>, &#x02026;, <italic>SA<sub>N</sub></italic>} be the final alignment of size <italic>w</italic>, where each <italic>SA<sub>i</sub></italic> &#x0003D; {<italic>R</italic><sub><italic>i</italic>,1</sub>, <italic>R</italic><sub><italic>i</italic>,2</sub>,&#x02026;, <italic>R<sub>i,w</sub></italic>} is a set of residues. We can view the alignment <italic>SA</italic> as a matrix <italic>R[N,w]</italic>, where each column represents an alignment of residues and <italic>R[i,j]</italic> is the <italic>j</italic>-th aligned residue of the <italic>i</italic>-th substructure. Our final goal is to compute a <italic>Badness</italic> score for each column of <italic>SA</italic> and remove the column that maximizes the <italic>Badness</italic> score function from <italic>SA</italic>.</p>
<p>First, given two aligned residues <italic>R[i,k]</italic> and <italic>R[j,k]</italic>, we define the function <italic>PairDistAligned</italic> as follows:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M8"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>A</mml:mi><mml:mi>l</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mi>h</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mi>w</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mrow><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The <italic>Badness</italic> of a generic column <italic>k</italic> is:</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M9"><mml:mrow><mml:mi>B</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mi>P</mml:mi></mml:mstyle><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>A</mml:mi><mml:mi>l</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy='false'>]</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Once the column with the highest <italic>Badness</italic> score is removed, a new single extension step is performed through the Gibbs sampling procedure described in Section 2.2.</p>
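<p>The column selection of Equations (8) and (9) can be sketched as follows. This is an illustrative Python fragment, not the authors&#x00027; code: it assumes that each aligned residue is represented by a single 3D point (for instance a representative atom) and that <italic>Dist</italic> is the Euclidean distance between such points. The function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from itertools import combinations

def pair_dist_aligned(coords, i, j, k):
    """Eq. (8): product over columns h != k of
    |Dist(R[i,k], R[i,h]) - Dist(R[j,k], R[j,h])|."""
    w = len(coords[i])
    return math.prod(
        abs(math.dist(coords[i][k], coords[i][h])
            - math.dist(coords[j][k], coords[j][h]))
        for h in range(w) if h != k
    )

def badness(coords, k):
    """Eq. (9): sum of PairDistAligned over all pairs i < j of substructures."""
    n = len(coords)
    return sum(pair_dist_aligned(coords, i, j, k)
               for i, j in combinations(range(n), 2))

def worst_column(coords):
    """Index of the column with the highest Badness score."""
    w = len(coords[0])
    return max(range(w), key=lambda k: badness(coords, k))

# Toy alignment: N = 2 substructures, w = 3 columns, points on a line.
coords = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
column_to_replace = worst_column(coords)
```
]]></preformat>
<p>Here the third column (index 2) is the one whose intra-structure distances disagree most between the two substructures, so it would be the column removed before the next extension step.</p>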
</sec>
<sec>
<title>2.4. Filtering overlapping alignments</title>
<p>The alignments produced by PROPOSAL are sorted according to the average RMSD across all possible pairs of structures. This sorted list is finally post-processed to filter highly overlapping alignments. Let <italic>SA<sup>i</sup></italic> &#x0003D; {<italic>SA<sup>i</sup></italic><sub>1</sub>, <italic>SA<sup>i</sup></italic><sub>2</sub>,&#x02026;, <italic>SA<sup>i</sup><sub>N</sub></italic>} be the local alignment of rank <italic>i</italic> in the sorted list. We define <italic>Perc(SA<sup>i</sup><sub>k</sub>)</italic> as the percentage of residues in the substructure <italic>SA<sup>i</sup><sub>k</sub></italic> observed in the previous <italic>i</italic> &#x02212; 1 alignments, and <italic>Perc(SA<sup>i</sup>)</italic> as the average value of <italic>Perc(SA<sup>i</sup><sub>k</sub>)</italic> across all the aligned substructures. If <italic>Perc(SA<sup>i</sup>)</italic> is above the user-defined threshold <italic>AvgOverlap</italic>, the alignment is discarded.</p>
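<p>A possible reading of this filtering step is sketched below in Python. The representation of an alignment as a list of <italic>N</italic> sets of residue identifiers and the function name are assumptions made for illustration. The sketch also takes the text literally and counts residues from all previous alignments in the sorted list, including discarded ones; whether PROPOSAL counts only retained alignments is not specified here.</p>
<preformat><![CDATA[
```python
def filter_overlapping(sorted_alignments, avg_overlap):
    """Keep alignments whose average overlap (in percent) with the residues
    seen in all previous alignments does not exceed avg_overlap.

    sorted_alignments: list of alignments sorted by average RMSD; each
    alignment is a list of N sets of residue identifiers (one per structure).
    """
    if not sorted_alignments:
        return []
    n = len(sorted_alignments[0])
    seen = [set() for _ in range(n)]   # residues seen so far, per structure
    kept = []
    for alignment in sorted_alignments:
        perc = [100.0 * len(sub & seen[k]) / len(sub)
                for k, sub in enumerate(alignment)]
        if sum(perc) / n <= avg_overlap:
            kept.append(alignment)
        for k, sub in enumerate(alignment):
            seen[k] |= sub
    return kept

# Toy example with N = 2 structures.
a1 = [{1, 2, 3, 4}, {10, 11, 12, 13}]
a2 = [{1, 2, 3, 5}, {10, 11, 12, 14}]   # 75% overlap with a1 on both structures
a3 = [{6, 7, 8, 9}, {15, 16, 17, 13}]   # 0% and 25% overlap
filtered = filter_overlapping([a1, a2, a3], avg_overlap=50.0)
unfiltered = filter_overlapping([a1, a2, a3], avg_overlap=100.0)
```
]]></preformat>
<p>With a threshold of 100% no alignment can be discarded, which matches the use of this value to disable the filter.</p>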
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>3. Results</title>
<p>Three different case studies have been investigated. In the first one we analyzed the performance of our method and the effects of input parameters, using the 33 structures of Skolnick&#x00027;s dataset benchmark (Lancia et al., <xref ref-type="bibr" rid="B24">2001</xref>), a set of large protein domains which has been used in several recent studies related to structural comparison of proteins (Pulim et al., <xref ref-type="bibr" rid="B34">2008</xref>; Di Lena et al., <xref ref-type="bibr" rid="B7">2010</xref>).</p>
<p>In the second case study, we compared PROPOSAL to SMAP (Xie and Bourne, <xref ref-type="bibr" rid="B45">2008</xref>; Xie et al., <xref ref-type="bibr" rid="B46">2009</xref>) and ProBiS (Konc and Janezic, <xref ref-type="bibr" rid="B22">2010</xref>, <xref ref-type="bibr" rid="B23">2012</xref>), two algorithms for local pairwise structural alignment, on a dataset of known motifs derived from the literature and taken from the Catalytic Site Atlas (CSA) (Furnham et al., <xref ref-type="bibr" rid="B9">2013</xref>).</p>
<p>In the last case study, following the work of Moll et al. (<xref ref-type="bibr" rid="B29">2011</xref>), we used a subset of these CSA motifs to test PROPOSAL as a local multiple aligner.</p>
<p>PROPOSAL has been implemented in Java 7 and all tests have been performed on an Intel Core i7-2670 2.2 GHz CPU with 8 GB of RAM.</p>
<p>PROPOSAL needs a few parameters to be set:</p>
<list list-type="bullet">
<list-item><p><italic>w</italic>: the size of the final alignments;</p></list-item>
<list-item><p>&#x003B1;: the probability which determines the number of Gibbs Sampling iterations in the bootstrap and extension phases;</p></list-item>
<list-item><p><italic>IterRefine</italic>: the number of iterations during the refinement phase;</p></list-item>
<list-item><p><italic>AvgOverlap</italic>: a threshold bounding the average overlapping percentage of alignments.</p></list-item>
</list>
<p>The default values of parameters have been experimentally established as follows:</p>
<list list-type="bullet">
<list-item><p>&#x003B1; &#x0003D; 0.05;</p></list-item>
<list-item><p><italic>IterRefine</italic> &#x0003D; 10.</p></list-item>
</list>
<p>Both &#x003B1; and <italic>IterRefine</italic> parameters have been chosen to guarantee an optimal trade-off between speed and accuracy.</p>
<sec>
<title>3.1. Tests on Skolnick dataset</title>
<p>The dataset is divided into four families, which differ in degree of similarity and sequence length. Table <xref ref-type="table" rid="T1">1</xref> summarizes the features of each family with respect to the number of proteins, the average sequence length and the average similarity.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Skolnick&#x00027;s dataset families</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top"><bold>Family</bold></th>
<th align="center" valign="top"><bold>Proteins</bold></th>
<th align="center" valign="top"><bold>Avg_seq_length</bold></th>
<th align="center" valign="top"><bold>Avg_similarity (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Flavodoxin-like fold CheY-related</td>
<td align="center" valign="top">8</td>
<td align="center" valign="top">124</td>
<td align="center" valign="top">15&#x02013;30</td>
</tr>
<tr>
<td align="left" valign="top">Ferritin</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">170</td>
<td align="center" valign="top">7&#x02013;70</td>
</tr>
<tr>
<td align="left" valign="top">Plastocyanin</td>
<td align="center" valign="top">8</td>
<td align="center" valign="top">99</td>
<td align="center" valign="top">35&#x02013;90</td>
</tr>
<tr>
<td align="left" valign="top">TIM Barrel</td>
<td align="center" valign="top">11</td>
<td align="center" valign="top">250</td>
<td align="center" valign="top">30&#x02013;90</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To evaluate the reliability of PROPOSAL we considered different values of <italic>w</italic>, depending on protein sequence similarity. We chose <italic>w</italic> &#x0003D; 10 for the CheY-related protein family, <italic>w</italic> &#x0003D; 12 for the Ferritin family, <italic>w</italic> &#x0003D; 15 for the Plastocyanin proteins, and <italic>w</italic> &#x0003D; 20 for the TIM Barrel family. In all experiments, we set <italic>AvgOverlap &#x0003D; 50%</italic> to reduce the final set of alignments. Table <xref ref-type="table" rid="T2">2</xref> gives the running time of PROPOSAL and the RMSD of the best alignments.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Running time and LRMSD of the best alignments on Skolnick&#x00027;s dataset</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top"><bold>Family</bold></th>
<th align="center" valign="top"><bold>W</bold></th>
<th align="center" valign="top"><bold>Running_time (s)</bold></th>
<th align="center" valign="top"><bold>Best_RMSD (&#x000C5;)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Flavodoxin-like fold CheY-related</td>
<td align="center" valign="top">10</td>
<td align="center" valign="top">33.95</td>
<td align="center" valign="top">1.539</td>
</tr>
<tr>
<td align="left" valign="top">Ferritin</td>
<td align="center" valign="top">12</td>
<td align="center" valign="top">46.102</td>
<td align="center" valign="top">0.428</td>
</tr>
<tr>
<td align="left" valign="top">Plastocyanin</td>
<td align="center" valign="top">15</td>
<td align="center" valign="top">135.936</td>
<td align="center" valign="top">0.575</td>
</tr>
<tr>
<td align="left" valign="top">TIM Barrel</td>
<td align="center" valign="top">20</td>
<td align="center" valign="top">1542.929</td>
<td align="center" valign="top">0.428</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The best alignments have been generated through the 2D alignment of their contact maps. A protein contact map is a 2D matrix storing the distances between all possible amino acid pairs of a 3D protein structure. It can be represented as a graph where nodes are amino acids and edges connect nodes whose distance is below a fixed cut-off, usually 7&#x02013;12 &#x000C5;. A contact map is a signature of a protein structure with respect to its 3D coordinates (Vassura et al., <xref ref-type="bibr" rid="B40">2008</xref>).</p>
<p>Figures <xref ref-type="fig" rid="F2">2</xref>&#x02013;<xref ref-type="fig" rid="F5">5</xref> show the 10 &#x000C5; cut-off contact map alignments. It can be seen that a good structural correspondence between proteins is preserved even when the value of <italic>w</italic> increases. In most cases the absence of a few edges or the presence of new links between nodes is due to pairs of residues whose distance is very close to the cut-off.</p>
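<p>The graph view of a contact map described above can be illustrated with a short Python sketch. It assumes one representative 3D coordinate per residue and the Euclidean distance; both the function name and the toy coordinates are hypothetical.</p>
<preformat><![CDATA[
```python
import math

def contact_map_edges(coords, cutoff=10.0):
    """Return the edges (i, j), i < j, of the contact graph: pairs of
    residues whose distance is strictly below the cut-off (in angstroms)."""
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(coords[i], coords[j]) < cutoff
    }

# Toy structure with three residues placed on a line.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges_10 = contact_map_edges(coords, cutoff=10.0)
edges_16 = contact_map_edges(coords, cutoff=16.0)
```
]]></preformat>
<p>Raising the cut-off adds the edge between the second and third residue, which shows how pairs lying close to the threshold can appear or disappear between otherwise similar maps.</p>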
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Best alignment of the 8 Flavodoxin-like fold CheY-related contact maps with <italic>W</italic> &#x0003D; 10</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0002.tif"/>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Best alignment of the 6 Ferritin contact maps with <italic>W</italic> &#x0003D; 12</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0003.tif"/>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Best alignment of the 8 Plastocyanin contact maps with <italic>W</italic> &#x0003D; 15</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0004.tif"/>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>Best alignment of the 11 TIM Barrel contact maps with <italic>W</italic> &#x0003D; 20</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0005.tif"/>
</fig>
<p>We analyzed label similarity of the four best alignments, by building the sequence logos (Crooks et al., <xref ref-type="bibr" rid="B6">2004</xref>) of mapped residues (Figures <xref ref-type="fig" rid="F6">6</xref>&#x02013;<xref ref-type="fig" rid="F9">9</xref>). Each position contains a graphical representation of the frequencies of residues in that position within the final mapping. Amino acids are represented with different colors, depending on their chemical properties: basic residues (K, R, H) are colored in blue, the acidic ones (D, E) in purple, the neutral ones (Q, N, P, S, C) in green, the hydrophobic ones (V, L, I, W, F, M, Y) in orange, and the remaining ones (G, T, A) in red.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>Sequence logo of mapped residues in the best alignment of the 8 Flavodoxin-like fold CheY-related proteins</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0006.tif"/>
</fig>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p><bold>Sequence logo of mapped residues in the best alignment of the 6 Ferritin proteins</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0007.tif"/>
</fig>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p><bold>Sequence logo of mapped residues in the best alignment of the 8 Plastocyanin proteins</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0008.tif"/>
</fig>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p><bold>Sequence logo of mapped residues in the best alignment of the 11 TIM Barrel proteins</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0009.tif"/>
</fig>
<p>Sequence logos reflect the average sequence similarity of proteins within each family: Plastocyanin and TIM Barrel proteins show the best label correspondence. The alignment of Ferritin proteins is quite interesting, since the structural similarity is high, the average LRMSD is very low (0.428 &#x000C5;, Table <xref ref-type="table" rid="T2">2</xref>), but the corresponding sequence logo shows remarkable dissimilarities between mapped residues. This is an example confirming that protein structural similarity and protein sequence similarity are not always related.</p>
<p>Next, we investigated the effects of varying PROPOSAL parameters. The default values are <italic>N</italic> &#x0003D; 6, <italic>w</italic> &#x0003D; 15, &#x003B1; &#x0003D; 0.05, and <italic>IterRefine &#x0003D; 10</italic>. First, we analyzed how parameters influence the running time (Figure <xref ref-type="fig" rid="F10">10</xref>) by varying one parameter and leaving the rest unchanged. Figure <xref ref-type="fig" rid="F10">10A</xref> depicts the running time as the number <italic>N</italic> of structures varies. Figure <xref ref-type="fig" rid="F10">10B</xref> shows the effect of varying <italic>w</italic> from 1 to 20. Figure <xref ref-type="fig" rid="F10">10C</xref> reports the behavior of PROPOSAL with &#x003B1; ranging from 0.01 to 0.30. Finally, in Figure <xref ref-type="fig" rid="F10">10D</xref> different values of <italic>IterRefine</italic> (from 1 to 30) are considered. As expected, when <italic>N</italic> and <italic>w</italic> grow and &#x003B1; decreases, the running time goes up. This trend is even more evident in the TIM Barrel family, which has the highest average protein sequence length and similarity.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p><bold>Running time of PROPOSAL as a function of (A) number of proteins (N); (B) w; (C) &#x003B1;; (D) IterRefine</bold>. Default values: <italic>N</italic> &#x0003D; 6, <italic>w</italic> &#x0003D; 15, &#x003B1; &#x0003D; 0.05, <italic>IterRefine &#x0003D; 10</italic>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0010.tif"/>
</fig>
<p>Figure <xref ref-type="fig" rid="F11">11</xref> shows the influence of &#x003B1; and <italic>IterRefine</italic> on the global accuracy of PROPOSAL. We measured the average RMSD over all the computed alignments. In Figure <xref ref-type="fig" rid="F11">11A</xref> &#x003B1; varies from 0.01 to 0.30 and <italic>IterRefine</italic> is set to 10, while in Figure <xref ref-type="fig" rid="F11">11B</xref> <italic>IterRefine</italic> varies from 1 to 30 and &#x003B1; is set to 0.05. Default values (<italic>w</italic> &#x0003D; 15 and <italic>N</italic> &#x0003D; 6) were assigned. As expected, the best performance of our method is obtained with low values of &#x003B1; and high values of <italic>IterRefine</italic>. However, if we also consider the influence of these parameters on running time (in particular the <italic>IterRefine</italic> parameter), the best trade-off between speed and accuracy can be achieved with 0.01 &#x02264; &#x003B1; &#x02264; 0.1 and <italic>IterRefine</italic> &#x0003D; 10.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p><bold>Average LRMSD of the alignments returned by PROPOSAL on varying (A) &#x003B1; and (B) IterRefine</bold>. Default values: <italic>N</italic> &#x0003D; 6, <italic>w</italic> &#x0003D; 15, &#x003B1; &#x0003D; 0.05, <italic>IterRefine &#x0003D; 10</italic>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0011.tif"/>
</fig>
</sec>
<sec>
<title>3.2. Tests on pairwise alignments</title>
<p>To the best of our knowledge, PROPOSAL is the first algorithm proposed for multiple local alignment of protein structures. On the other hand, a few existing tools can solve the pairwise local structure alignment problem (Holm and Park, <xref ref-type="bibr" rid="B15">2000</xref>; Xie and Bourne, <xref ref-type="bibr" rid="B45">2008</xref>; Konc and Janezic, <xref ref-type="bibr" rid="B22">2010</xref>). According to the experimental results reported in Konc and Janezic (<xref ref-type="bibr" rid="B22">2010</xref>) and Moll et al. (<xref ref-type="bibr" rid="B29">2011</xref>), ProBiS and SMAP seem to be the best existing pairwise local structure alignment methods.</p>
<p>In order to compare PROPOSAL with ProBiS and SMAP, we ran all the algorithms on a properly defined dataset of pairwise alignments.</p>
<p>First of all, we collected a set of 346 non-redundant, literature-derived small query motifs (having 4&#x02013;6 residues), taken from CSA (Catalytic Site Atlas) (Furnham et al., <xref ref-type="bibr" rid="B9">2013</xref>). CSA is a database of hand-annotated entries, containing enzyme active sites (i.e., sets of residues thought to be directly involved in the reaction catalyzed by an enzyme). The complete list of these motifs and the corresponding PDB structures is available in the supplementary material Table <xref ref-type="supplementary-material" rid="SM1">S1</xref>.</p>
<p>Then, we used LabelHash, which is the state-of-the-art tool for substructure matching, to search for a match between each query motif and the rest of the dataset. Finally we selected all matches with <italic>RMSD</italic> &#x02264; 1.5 &#x000C5;. This resulted in a final reference dataset of 6380 pairwise alignments (the dataset is available in the supplementary material Table <xref ref-type="supplementary-material" rid="SM2">S2</xref>).</p>
<p>The dataset has many highly dissimilar pairs of proteins. In order to analyze the sequence similarity between the 6380 pairs of proteins with the lowest RMSD alignments, we ran BLAST and considered the percentage of residues with positive matches in the shorter sequence. We call this measure <italic>PPos</italic>. Among the 6380 pairs, 3835 (&#x02243; 60%) have <italic>PPos</italic> &#x0003C; 5% and 6173 (&#x02243; 97%) have <italic>PPos</italic> &#x0003C; 15%.</p>
<p>For each pair, we ran PROPOSAL with no overlapping filter (<italic>AvgOverlap</italic> &#x0003D; 100%) and <italic>w</italic> equal to the number of residues of the query motif. We ran SMAP and ProBiS with default parameter values.</p>
<p>We analyzed the performance of the three methods on the 6380 pairwise alignments, by taking into account three measures:</p>
<list list-type="bullet">
<list-item><p>Query motif coverage (QMC): the highest percentage of residues of the query motif which are present in an alignment returned by each algorithm;</p></list-item>
<list-item><p>RMSD of the alignment with highest QMC;</p></list-item>
<list-item><p>Running time.</p></list-item>
</list>
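<p>The QMC measure defined in the list above can be computed as in the following Python sketch. Residues are assumed to be identified by their sequence numbers in the query protein, and the function name and toy data are hypothetical.</p>
<preformat><![CDATA[
```python
def query_motif_coverage(motif, alignments):
    """Highest percentage of query-motif residues that appear in any single
    alignment returned by an algorithm (0.0 if no alignment is returned)."""
    motif = set(motif)
    return max(
        (100.0 * len(motif & set(a)) / len(motif) for a in alignments),
        default=0.0,
    )

# Toy example: a 4-residue motif and two returned alignments, each given
# as the set of query-protein residues it contains.
motif = [10, 20, 30, 40]
returned = [{10, 20, 55, 56}, {10, 20, 30, 77}]
qmc = query_motif_coverage(motif, returned)
```
]]></preformat>
<p>In this example the second alignment contains three of the four motif residues, so the QMC is 75%.</p>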
<p>We analyzed the average values of these measures by considering different ranges of <italic>PPos</italic> similarities. All results are plotted in Figure <xref ref-type="fig" rid="F12">12</xref>.</p>
<fig id="F12" position="float">
<label>Figure 12</label>
<caption><p><bold>Average (A) highest QMC, (B) corresponding alignment RMSD and (C) running time of PROPOSAL, ProBiS, and SMAP for different ranges of <italic>PPos</italic> similarity values</bold>.</p></caption>
<graphic xlink:href="fgene-05-00302-g0012.tif"/>
</fig>
<p>PROPOSAL exhibits the highest QMC for highly dissimilar proteins, while for medium and high <italic>PPos</italic> similarities ProBiS is the best method (Figure <xref ref-type="fig" rid="F12">12A</xref>). However, in all the tested instances PROPOSAL yields a lower average RMSD than both ProBiS and SMAP. Furthermore, the difference between RMSDs tends to increase as <italic>PPos</italic> decreases (Figure <xref ref-type="fig" rid="F12">12B</xref>). We also notice that the average QMC and RMSD of PROPOSAL alignments are approximately constant for all values of <italic>PPos</italic>, while ProBiS and SMAP seem to be quite sensitive to protein similarity.</p>
<p>Finally, ProBiS is by far the fastest algorithm for all possible ranges of <italic>PPos</italic> similarity values (Figure <xref ref-type="fig" rid="F12">12C</xref>), while PROPOSAL and SMAP have similar running times (except for 80% &#x02264; <italic>PPos</italic> &#x02264; 100%, where SMAP is faster). It is worth noting that our method has been designed for solving the multiple alignment problem, while ProBiS and SMAP have been efficiently implemented for comparing pairs of protein structures. Moreover, PROPOSAL and SMAP have been implemented in Java, while ProBiS has been written in C&#x0002B;&#x0002B;. Interestingly, our method runs fastest when <italic>PPos</italic> ranges from 10 to 30%. When proteins are very dissimilar, the convergence of Gibbs sampling in the bootstrap phase may be slower. On the other hand, when proteins are very similar, PROPOSAL performs more extension and refinement phases, producing more feasible alignments. A similar trend holds for ProBiS, whose best performance is obtained when <italic>PPos</italic> ranges from 15 to 60%.</p>
</sec>
<sec>
<title>3.3. Tests on multiple alignments</title>
<p>In the last case study, we ran PROPOSAL on a different set of 172 motifs taken from CSA to test the capability of our method to detect known conserved binding sites in the multiple case (see supplementary material Table <xref ref-type="supplementary-material" rid="SM3">S3</xref>).</p>
<p>The dataset has been built by selecting literature-derived motifs of proteins belonging to fully qualified EC classes with at most 25 elements. This resulted in a final set of 172 motifs, spanning 162 distinct EC classes.</p>
<p>An EC class (Webb, <xref ref-type="bibr" rid="B42">1993</xref>) is a code having the format &#x0201C;EC&#x0201D; followed by four numbers separated by periods. It denotes the type of reaction catalyzed by an enzyme. An EC class is fully qualified if all four numbers are specified (e.g., 1.1.1.149 is fully qualified, while 1.1.1 or 1.1 are not).</p>
<p>For each EC family, we ran PROPOSAL on the set of protein structures belonging to that family. We set <italic>w</italic> equal to the number of residues in the corresponding motif and <italic>AvgOverlap</italic> &#x0003D; 100% (i.e., no overlapping filter). The remaining parameters were set to their default values.</p>
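<p>The notion of a fully qualified EC class can be checked mechanically. The following Python sketch uses a regular expression that accepts an optional &#x0201C;EC&#x0201D; prefix followed by exactly four dot-separated numbers; the function name and the decision to make the prefix optional are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
import re

# Optional "EC" prefix, then exactly four dot-separated integers.
_FULL_EC = re.compile(r"(?:EC\s*)?\d+\.\d+\.\d+\.\d+")

def is_fully_qualified(ec_code):
    """True if all four numbers of the EC code are specified."""
    return _FULL_EC.fullmatch(ec_code.strip()) is not None

examples = ["1.1.1.149", "EC 3.4.23.25", "1.1.1", "1.1", "1.1.1.-"]
flags = [is_fully_qualified(code) for code in examples]
```
]]></preformat>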
<p>We filtered out all alignments with average RMSD above 1 &#x000C5;, taking for each query motif the local alignment with maximum QMC. In case of ties on QMC, the alignment with minimum average RMSD was chosen. PROPOSAL successfully completed all the alignments in about 29 h, with an average QMC of 50.08% and an average running time of 10 min. In Table <xref ref-type="table" rid="T3">3</xref> we report the motifs with the highest QMC and the RMSD of the corresponding alignment (see supplementary material Table <xref ref-type="supplementary-material" rid="SM3">S3</xref> for the complete list of results). The results show that PROPOSAL can identify known motifs from scratch. Out of 172 motifs, 24 have QMC &#x02265; 75% and 126 have QMC &#x02265; 50%.</p>
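<p>The selection rule described above (discard alignments with average RMSD above 1 &#x000C5;, then take the maximum QMC and break ties by minimum average RMSD) can be sketched in Python as follows. The representation of each candidate as a pair of numbers and the function name are hypothetical, and the values in the example are invented.</p>
<preformat><![CDATA[
```python
def select_best(candidates, max_rmsd=1.0):
    """Pick the alignment reported for one query motif.

    candidates: list of (qmc_percent, avg_rmsd) pairs, one per alignment.
    Returns the pair with the highest QMC among those with avg_rmsd not
    above max_rmsd, breaking ties by the lowest avg_rmsd, or None if no
    candidate passes the RMSD filter.
    """
    admissible = [c for c in candidates if c[1] <= max_rmsd]
    if not admissible:
        return None
    return max(admissible, key=lambda c: (c[0], -c[1]))

# Invented values: the 100% candidate is rejected by the RMSD filter,
# and the tie at 75% is resolved in favor of the lower RMSD.
toy = [(75.0, 0.4), (75.0, 0.2), (100.0, 1.5), (50.0, 0.1)]
best = select_best(toy)
```
]]></preformat>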
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>CSA motifs with QMC &#x02265; 75%</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top"><bold>Protein</bold></th>
<th align="left" valign="top"><bold>EC_class</bold></th>
<th align="center" valign="top"><bold>Motif</bold></th>
<th align="center" valign="top"><bold>QMC (%)</bold></th>
<th align="left" valign="top"><bold>Avg_RMSD</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1YBV</td>
<td align="left" valign="top">1.1.1.252</td>
<td align="left" valign="top">[138, 182, 164, 178]</td>
<td align="center" valign="top">100</td>
<td align="left" valign="top">0.06814209</td>
</tr>
<tr>
<td align="left" valign="top">1QRR</td>
<td align="left" valign="top">3.13.1.1</td>
<td align="left" valign="top">[183, 186, 145, 182]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.032653827</td>
</tr>
<tr>
<td align="left" valign="top">1MRQ</td>
<td align="left" valign="top">1.1.1.149</td>
<td align="left" valign="top">[50, 117, 84, 55]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.063025678</td>
</tr>
<tr>
<td align="left" valign="top">1GQ8</td>
<td align="left" valign="top">3.1.1.11</td>
<td align="left" valign="top">[136, 157, 113, 135]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.075239285</td>
</tr>
<tr>
<td align="left" valign="top">2JXR</td>
<td align="left" valign="top">3.4.23.25</td>
<td align="left" valign="top">[215, 32, 218, 33]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.088753575</td>
</tr>
<tr>
<td align="left" valign="top">1RK2</td>
<td align="left" valign="top">2.7.1.15</td>
<td align="left" valign="top">[252, 253, 255, 254]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.092735469</td>
</tr>
<tr>
<td align="left" valign="top">2PGD</td>
<td align="left" valign="top">1.1.1.44</td>
<td align="left" valign="top">[187, 190, 130, 183]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.119390475</td>
</tr>
<tr>
<td align="left" valign="top">1VAS</td>
<td align="left" valign="top">3.1.25.1</td>
<td align="left" valign="top">[22, 26, 23, 2]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.126902935</td>
</tr>
<tr>
<td align="left" valign="top">1CZF</td>
<td align="left" valign="top">3.2.1.15</td>
<td align="left" valign="top">[180, 201, 202, 223]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.15027138</td>
</tr>
<tr>
<td align="left" valign="top">1PJB</td>
<td align="left" valign="top">1.4.1.1</td>
<td align="left" valign="top">[269, 117, 95, 74]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.178174017</td>
</tr>
<tr>
<td align="left" valign="top">1RPX</td>
<td align="left" valign="top">5.1.3.1</td>
<td align="left" valign="top">[185, 43, 41, 74]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.222043962</td>
</tr>
<tr>
<td align="left" valign="top">1L1L</td>
<td align="left" valign="top">1.17.4.2</td>
<td align="left" valign="top">[119, 408, 419, 410]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.226235418</td>
</tr>
<tr>
<td align="left" valign="top">1DB3</td>
<td align="left" valign="top">4.2.1.47</td>
<td align="left" valign="top">[134, 160, 132, 156]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.252066199</td>
</tr>
<tr>
<td align="left" valign="top">1IM5</td>
<td align="left" valign="top">3.5.1.19</td>
<td align="left" valign="top">[129, 10, 133, 94]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.294630848</td>
</tr>
<tr>
<td align="left" valign="top">1ODT</td>
<td align="left" valign="top">3.1.1.41</td>
<td align="left" valign="top">[181, 269, 182, 298]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.40228864</td>
</tr>
<tr>
<td align="left" valign="top">1PVD</td>
<td align="left" valign="top">4.1.1.1</td>
<td align="left" valign="top">[28, 477, 114, 115]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.454688806</td>
</tr>
<tr>
<td align="left" valign="top">1U5U</td>
<td align="left" valign="top">4.2.1.92</td>
<td align="left" valign="top">[137, 67, 66, 193]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.613085033</td>
</tr>
<tr>
<td align="left" valign="top">1E94</td>
<td align="left" valign="top">3.4.25.2</td>
<td align="left" valign="top">[45, 33, 124, 1]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.617527201</td>
</tr>
<tr>
<td align="left" valign="top">1Z9H</td>
<td align="left" valign="top">5.3.99.3</td>
<td align="left" valign="top">[110, 113, 112, 107]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.677534589</td>
</tr>
<tr>
<td align="left" valign="top">1B66</td>
<td align="left" valign="top">4.2.3.12</td>
<td align="left" valign="top">[88, 42, 133, 89]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.7033398</td>
</tr>
<tr>
<td align="left" valign="top">2NAC</td>
<td align="left" valign="top">1.2.1.2</td>
<td align="left" valign="top">[284, 146, 313, 332]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.78435381</td>
</tr>
<tr>
<td align="left" valign="top">1QTN</td>
<td align="left" valign="top">3.4.22.61</td>
<td align="left" valign="top">[258, 360, 350, 317]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.798793943</td>
</tr>
<tr>
<td align="left" valign="top">1P4R</td>
<td align="left" valign="top">2.1.2.3</td>
<td align="left" valign="top">[431, 267, 592, 266]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.936798151</td>
</tr>
<tr>
<td align="left" valign="top">1BWZ</td>
<td align="left" valign="top">5.1.1.7</td>
<td align="left" valign="top">[217, 73, 208, 159]</td>
<td align="center" valign="top">75</td>
<td align="left" valign="top">0.941959894</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Each motif is represented as a list of residue IDs in the corresponding reference protein</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>Torrance et al. (<xref ref-type="bibr" rid="B39">2005</xref>) and Moll et al. (<xref ref-type="bibr" rid="B28">2010</xref>) observed that the EC-class coverage of a motif was not considered in the design of CSA. Consequently, some motifs may not be conserved across all proteins in an EC class. This may explain the failures of PROPOSAL on the alignment tasks with QMC &#x0003C;50%. In some cases, CSA motifs may contain one or more residues with few global matches. Moreover, two motifs may match mutually exclusive sets of proteins within the corresponding EC class. These cases may cause a drastic increase in the average RMSD for that specific motif. Examples of such CSA motifs are reported in Moll et al. (<xref ref-type="bibr" rid="B28">2010</xref>). To overcome these problems, methods such as Geometric Sieving (Chen et al., <xref ref-type="bibr" rid="B5">2007</xref>) can be applied to refine a given motif and increase sensitivity while keeping specificity high.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>4. Discussion</title>
<p>PROPOSAL is a stochastic algorithm for local alignment of 3D protein structures that relies on Markov Chain Monte Carlo combined with a Gibbs Sampling strategy. PROPOSAL is a parameter-based algorithm. In our experimental analysis on Skolnick&#x00027;s dataset (see Section 3.1), we showed that the most critical parameters are &#x003B1; and <italic>IterRefine</italic>, because they influence both speed and accuracy. The best trade-off is achieved with &#x003B1; ranging from 0.01 to 0.1 and <italic>IterRefine</italic> set to 10. Therefore, the default values of the algorithm are &#x003B1; &#x0003D; 0.05 and <italic>IterRefine</italic> &#x0003D; 10. The running time of PROPOSAL on Skolnick&#x00027;s dataset was sublinear (with respect to the number of proteins, <italic>w</italic>, &#x003B1; and <italic>IterRefine</italic>) for protein families with low and medium similarity (CheY-related, Ferritin and Plastocyanin) and linear for highly similar and long proteins (TIM Barrel).</p>
<p>Since PROPOSAL is the first multiple structure local alignment method, we compared it with two pairwise local alignment algorithms (ProBiS and SMAP) on a dataset of pairs of query motifs and target proteins (see Section 3.2). The accuracy of each algorithm is measured by the highest percentage of residues of the query motif that are present in an alignment returned by that algorithm (query motif coverage), together with the quality of the alignment (RMSD score).</p>
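<p>As an illustration only, the two measures above can be sketched in Python. This is a minimal sketch under stated assumptions, not the evaluation code used in this work: the function names <italic>query_motif_coverage</italic> and <italic>rmsd</italic> are hypothetical, each alignment is assumed to be given as the list of query-protein residue ids it contains, and the RMSD is assumed to be computed on paired coordinates that have already been superposed. The motif is the 1QRR entry of the table above, whereas the candidate alignments are invented for the example.</p>
<preformat>
```python
import math


def query_motif_coverage(motif_residues, alignments):
    """Highest percentage of motif residues found in any single alignment.

    Assumption: each alignment is a collection of query-protein residue ids.
    """
    motif = set(motif_residues)
    if not motif:
        raise ValueError("motif must contain at least one residue")
    # Best overlap between the motif and any one returned alignment.
    best = max((len(motif.intersection(a)) for a in alignments), default=0)
    return 100.0 * best / len(motif)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D points.

    Assumption: the two point lists are already optimally superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Motif residue ids of 1QRR (from the table); the alignments are made up.
motif = [183, 186, 145, 182]
alignments = [[183, 186, 182, 90], [145, 12]]
print(query_motif_coverage(motif, alignments))  # prints 75.0
```
</preformat>
<p>In this toy case the first alignment contains three of the four motif residues, so the coverage is 75%. Taking the maximum over alignments, rather than the union, mirrors the definition as the highest percentage found in a single returned alignment.</p>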
<p>PROPOSAL strongly outperforms the other methods on the quality of the alignments, regardless of protein similarity. Its coverage remains stable across levels of protein similarity, whereas SMAP and ProBiS have low coverage for dissimilar proteins. However, ProBiS is 5 times faster than PROPOSAL and SMAP.</p>
<p>Finally, we ran PROPOSAL as a multiple aligner on a subset of the above query motifs (see Section 3.3). Once again, PROPOSAL yields high-quality alignments with coverage scores comparable to those obtained in the pairwise local case. The experiments also show that PROPOSAL is a valuable alternative for both identifying new motifs and refining existing ones.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack>
<p>We wish to thank the anonymous reviewers for their helpful suggestions and comments. We also wish to thank Dario Veneziano for reviewing the English of the final version of the article.</p>
</ack>
<sec sec-type="supplementary-material" id="s5">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://www.frontiersin.org/journal/10.3389/fgene.2014.00302/abstract">http://www.frontiersin.org/journal/10.3389/fgene.2014.00302/abstract</ext-link></p>
<supplementary-material xlink:href="Table1.XLSX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table2.XLSX" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table3.XLSX" id="SM3" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Angaran</surname> <given-names>S.</given-names></name> <name><surname>Bock</surname> <given-names>M. E.</given-names></name> <name><surname>Garutti</surname> <given-names>C.</given-names></name> <name><surname>Guerra</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Molloc: a web tool for the local structural alignment of molecular surfaces</article-title>. <source>Nucleic Acids Res</source>. <volume>37</volume>, <fpage>W565</fpage>&#x02013;<lpage>W570</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkp405</pub-id><pub-id pub-id-type="pmid">19465382</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ausiello</surname> <given-names>G.</given-names></name> <name><surname>Via</surname> <given-names>A.</given-names></name> <name><surname>Helmer-Citterich</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Query3d: a new method for high-throughput analysis of functional residues in protein structures</article-title>. <source>BMC Bioinformatics</source> <volume>6</volume>(<supplement>Suppl. 4</supplement>):<fpage>S5</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-6-S4-S5</pub-id><pub-id pub-id-type="pmid">16351754</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bachar</surname> <given-names>O.</given-names></name> <name><surname>Fischer</surname> <given-names>D.</given-names></name> <name><surname>Nussinov</surname> <given-names>R.</given-names></name> <name><surname>Wolfson</surname> <given-names>H.</given-names></name></person-group> (<year>1993</year>). <article-title>A computer vision based technique for 3-d sequence-independent structural comparison of proteins</article-title>. <source>Protein Eng. Design Select</source>. <volume>6</volume>, <fpage>279</fpage>&#x02013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1093/protein/6.3.279</pub-id><pub-id pub-id-type="pmid">8506262</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bertolazzi</surname> <given-names>P.</given-names></name> <name><surname>Guerra</surname> <given-names>C.</given-names></name> <name><surname>Liuzzi</surname> <given-names>G.</given-names></name></person-group> (<year>2010</year>). <article-title>A global optimization algorithm for protein surface alignment</article-title>. <source>BMC Bioinformatics</source> <volume>11</volume>:<fpage>488</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-488</pub-id><pub-id pub-id-type="pmid">20920230</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>B. Y.</given-names></name> <name><surname>Fofanov</surname> <given-names>V. Y.</given-names></name> <name><surname>Bryant</surname> <given-names>D. H.</given-names></name> <name><surname>Dodson</surname> <given-names>B. D.</given-names></name> <name><surname>Kristensen</surname> <given-names>D. M.</given-names></name> <name><surname>Lisewski</surname> <given-names>A. M.</given-names></name> <etal/></person-group>. (<year>2007</year>). <article-title>The mash pipeline for protein function prediction and an algorithm for the geometric refinement of 3d motifs</article-title>. <source>J. Comput. Biol</source>. <volume>14</volume>, <fpage>791</fpage>&#x02013;<lpage>816</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2007.R017</pub-id><pub-id pub-id-type="pmid">17691895</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crooks</surname> <given-names>G. E.</given-names></name> <name><surname>Hon</surname> <given-names>G.</given-names></name> <name><surname>Chandonia</surname> <given-names>J.</given-names></name> <name><surname>Brenner</surname> <given-names>S. E.</given-names></name></person-group> (<year>2004</year>). <article-title>Weblogo: a sequence logo generator</article-title>. <source>Genome Res</source>. <volume>14</volume>, <fpage>1188</fpage>&#x02013;<lpage>1190</lpage>. <pub-id pub-id-type="doi">10.1101/gr.849004</pub-id><pub-id pub-id-type="pmid">15173120</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Di Lena</surname> <given-names>P.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Margara</surname> <given-names>L.</given-names></name> <name><surname>Vassura</surname> <given-names>M.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2010</year>). <article-title>Fast overlapping of protein contact maps by alignment of eigenvectors</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>2250</fpage>&#x02013;<lpage>2258</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq402</pub-id><pub-id pub-id-type="pmid">20610612</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eidhammer</surname> <given-names>I.</given-names></name> <name><surname>Jonassen</surname> <given-names>I.</given-names></name> <name><surname>Taylor</surname> <given-names>W. R.</given-names></name></person-group> (<year>2000</year>). <article-title>Structure comparison and structure patterns</article-title>. <source>J. Comput. Biol</source>. <volume>7</volume>, <fpage>685</fpage>&#x02013;<lpage>716</lpage>. <pub-id pub-id-type="doi">10.1089/106652701446152</pub-id><pub-id pub-id-type="pmid">11153094</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Furnham</surname> <given-names>N.</given-names></name> <name><surname>Holliday</surname> <given-names>G. L.</given-names></name> <name><surname>De Beer</surname> <given-names>T. A. P.</given-names></name> <name><surname>Jacobsen</surname> <given-names>J. O. B.</given-names></name> <name><surname>Pearson</surname> <given-names>W. R.</given-names></name> <name><surname>Thornton</surname> <given-names>J. M.</given-names></name></person-group> (<year>2013</year>). <article-title>The catalytic site atlas 2.0: cataloging catalytic sites and residues identified in enzymes</article-title>. <source>Nucleic Acids Res</source>. <volume>42</volume>, <fpage>D485</fpage>&#x02013;<lpage>D489</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkt1243</pub-id><pub-id pub-id-type="pmid">24319146</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Geman</surname> <given-names>S.</given-names></name> <name><surname>Geman</surname> <given-names>D.</given-names></name></person-group> (<year>1984</year>). <article-title>Stochastic relaxation, gibbs distributions, and the bayesian restoration of images</article-title>. <source>IEEE Trans. Patt. Anal. Mach. Intell</source>. <volume>6</volume>, <fpage>721</fpage>&#x02013;<lpage>741</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.1984.4767596</pub-id><pub-id pub-id-type="pmid">22499653</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Goldman</surname> <given-names>D.</given-names></name> <name><surname>Istrail</surname> <given-names>S.</given-names></name> <name><surname>Papadimitriou</surname> <given-names>C. H.</given-names></name></person-group> (<year>1999</year>). <article-title>Algorithmic aspects of protein structure similarity</article-title>, in <source>Foundations of Computer Science, 1999. 40th Annual Symposium on</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>512</fpage>&#x02013;<lpage>521</lpage>. <pub-id pub-id-type="doi">10.1109/SFFCS.1999.814624</pub-id><pub-id pub-id-type="pmid">18052775</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Havranek</surname> <given-names>J.</given-names></name> <name><surname>Baker</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>Motif-directed flexible backbone design of functional interactions</article-title>. <source>Protein Sci</source>. <volume>18</volume>, <fpage>1293</fpage>&#x02013;<lpage>1305</lpage>. <pub-id pub-id-type="doi">10.1002/pro.142</pub-id><pub-id pub-id-type="pmid">19472357</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hofbauer</surname> <given-names>C.</given-names></name> <name><surname>Lohninger</surname> <given-names>H.</given-names></name> <name><surname>Aszodi</surname> <given-names>A.</given-names></name></person-group> (<year>2004</year>). <article-title>Surfcomp: a novel graph-based approach to molecular surface comparison</article-title>. <source>J. Chem. Inform. Comput. Sci</source>. <volume>44</volume>, <fpage>837</fpage>&#x02013;<lpage>847</lpage>. <pub-id pub-id-type="doi">10.1021/ci0342371</pub-id><pub-id pub-id-type="pmid">15154748</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holm</surname> <given-names>L.</given-names></name> <name><surname>Kaariainen</surname> <given-names>S.</given-names></name> <name><surname>Rosenstrom</surname> <given-names>P.</given-names></name> <name><surname>Schenkel</surname> <given-names>A.</given-names></name></person-group> (<year>2008</year>). <article-title>Searching protein structure databases with dalilite v.3</article-title>. <source>Bioinformatics</source> <volume>24</volume>, <fpage>2780</fpage>&#x02013;<lpage>2781</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn507</pub-id><pub-id pub-id-type="pmid">18818215</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holm</surname> <given-names>L.</given-names></name> <name><surname>Park</surname> <given-names>J.</given-names></name></person-group> (<year>2000</year>). <article-title>Dalilite workbench for protein structure comparison</article-title>. <source>Bioinformatics</source> <volume>16</volume>, <fpage>566</fpage>&#x02013;<lpage>567</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/16.6.566</pub-id><pub-id pub-id-type="pmid">10980157</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holm</surname> <given-names>L.</given-names></name> <name><surname>Sander</surname> <given-names>C.</given-names></name></person-group> (<year>1993</year>). <article-title>Protein structure comparison by alignment of distance matrices</article-title>. <source>J. Mol. Biol</source>. <volume>233</volume>, <fpage>123</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1006/jmbi.1993.1489</pub-id><pub-id pub-id-type="pmid">8377180</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Huan</surname> <given-names>J.</given-names></name> <name><surname>Bandyopadhyay</surname> <given-names>D.</given-names></name> <name><surname>Prins</surname> <given-names>J.</given-names></name> <name><surname>Snoeyink</surname> <given-names>J.</given-names></name> <name><surname>Tropsha</surname> <given-names>A.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name></person-group> (<year>2006</year>). <article-title>Distance-based identification of structure motifs in proteins using constrained frequent subgraph mining</article-title>, in <source>Computational systems Bioinformatics/Life Sciences Society. Computational Systems Bioinformatics Conference</source>, <fpage>227</fpage>&#x02013;<lpage>238</lpage>. <pub-id pub-id-type="pmid">17369641</pub-id></citation> 
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jambon</surname> <given-names>M.</given-names></name> <name><surname>Imberty</surname> <given-names>A.</given-names></name> <name><surname>Del&#x000E9;age</surname> <given-names>G.</given-names></name> <name><surname>Geourjon</surname> <given-names>C.</given-names></name></person-group> (<year>2003</year>). <article-title>A new bioinformatic approach to detect common 3d sites in protein structures</article-title>. <source>Proteins</source> <volume>52</volume>, <fpage>137</fpage>&#x02013;<lpage>145</lpage>. <pub-id pub-id-type="doi">10.1002/prot.10339</pub-id><pub-id pub-id-type="pmid">12833538</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jung</surname> <given-names>J.</given-names></name> <name><surname>Lee</surname> <given-names>B.</given-names></name></person-group> (<year>2000</year>). <article-title>Protein structure alignment using environmental profiles</article-title>. <source>Protein Eng. Design Select</source>. <volume>13</volume>, <fpage>535</fpage>&#x02013;<lpage>543</lpage>. <pub-id pub-id-type="doi">10.1093/protein/13.8.535</pub-id><pub-id pub-id-type="pmid">10964982</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kabsch</surname> <given-names>W.</given-names></name></person-group> (<year>1976</year>). <article-title>A solution for the best rotation to relate two sets of vectors</article-title>. <source>Acta Cryst</source>. <volume>32</volume>, <fpage>922</fpage>&#x02013;<lpage>923</lpage>. <pub-id pub-id-type="doi">10.1107/S0567739476001873</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kawabata</surname> <given-names>T.</given-names></name></person-group> (<year>2003</year>). <article-title>Matras: a program for protein 3d structure comparison</article-title>. <source>Nucleic Acids Res</source>. <volume>31</volume>, <fpage>3367</fpage>&#x02013;<lpage>3369</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkg581</pub-id><pub-id pub-id-type="pmid">12824329</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Konc</surname> <given-names>J.</given-names></name> <name><surname>Janezic</surname> <given-names>D.</given-names></name></person-group> (<year>2010</year>). <article-title>Probis algorithm for detection of structurally similar protein binding sites by local structural alignment</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>1160</fpage>&#x02013;<lpage>1168</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq100</pub-id><pub-id pub-id-type="pmid">20305268</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Konc</surname> <given-names>J.</given-names></name> <name><surname>Janezic</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Probis-2012: web server and web services for detection of structurally similar binding sites in proteins</article-title>. <source>Nucleic Acids Res</source>. <volume>40</volume>, <fpage>W214</fpage>&#x02013;<lpage>W221</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks435</pub-id><pub-id pub-id-type="pmid">22600737</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lancia</surname> <given-names>G.</given-names></name> <name><surname>Carr</surname> <given-names>R.</given-names></name> <name><surname>Walenz</surname> <given-names>B.</given-names></name> <name><surname>Istrail</surname> <given-names>S.</given-names></name></person-group> (<year>2001</year>). <article-title>101 optimal pdb structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem</article-title>, in <source>Proceedings of the Fifth Annual International Conference on Computational Biology</source>, RECOMB &#x00027;01 (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>193</fpage>&#x02013;<lpage>202</lpage>. <pub-id pub-id-type="doi">10.1145/369133.369199</pub-id></citation> 
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lawrence</surname> <given-names>C.</given-names></name> <name><surname>Altschul</surname> <given-names>S.</given-names></name> <name><surname>Boguski</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Neuwald</surname> <given-names>A.</given-names></name> <name><surname>Wootton</surname> <given-names>J.</given-names></name></person-group> (<year>1993</year>). <article-title>Detecting subtle sequence signals: a gibbs sampling strategy for multiple alignment</article-title>. <source>Science</source> <volume>262</volume>, <fpage>208</fpage>&#x02013;<lpage>214</lpage>. <pub-id pub-id-type="doi">10.1126/science.8211139</pub-id><pub-id pub-id-type="pmid">8211139</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>P.</given-names></name> <name><surname>Agrafiotis</surname> <given-names>D.</given-names></name> <name><surname>Theobald</surname> <given-names>D.</given-names></name></person-group> (<year>2010</year>). <article-title>Fast determination of the optimal rotational matrix for macromolecular superpositions</article-title>. <source>J. Comput. Chem</source>. <volume>31</volume>, <fpage>1561</fpage>&#x02013;<lpage>1563</lpage>. <pub-id pub-id-type="doi">10.1002/jcc.21439</pub-id><pub-id pub-id-type="pmid">20017124</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Micale</surname> <given-names>G.</given-names></name> <name><surname>Pulvirenti</surname> <given-names>A.</given-names></name> <name><surname>Giugno</surname> <given-names>R.</given-names></name> <name><surname>Ferro</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Gasoline: a greedy and stochastic algorithm for optimal local multiple alignment of interaction networks</article-title>. <source>PLoS ONE</source> <volume>9</volume>:<fpage>e98750</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0098750</pub-id><pub-id pub-id-type="pmid">24911103</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moll</surname> <given-names>M.</given-names></name> <name><surname>Bryant</surname> <given-names>D. H.</given-names></name> <name><surname>Kavraki</surname> <given-names>L. E.</given-names></name></person-group> (<year>2010</year>). <article-title>The labelhash algorithm for substructure matching</article-title>. <source>BMC Bioinformatics</source> <volume>11</volume>:<fpage>555</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-555</pub-id><pub-id pub-id-type="pmid">21070651</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moll</surname> <given-names>M.</given-names></name> <name><surname>Bryant</surname> <given-names>D. H.</given-names></name> <name><surname>Kavraki</surname> <given-names>L. E.</given-names></name></person-group> (<year>2011</year>). <article-title>The labelhash server and tools for substructure-based functional annotation</article-title>. <source>Bioinformatics</source> <volume>27</volume>, <fpage>2161</fpage>&#x02013;<lpage>2162</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btr343</pub-id><pub-id pub-id-type="pmid">21659320</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Najmanovich</surname> <given-names>R.</given-names></name> <name><surname>Kurbatova</surname> <given-names>N.</given-names></name> <name><surname>Thornton</surname> <given-names>J.</given-names></name></person-group> (<year>2008</year>). <article-title>Detection of 3d atomic similarities and their use in the discrimination of small molecule protein-binding sites</article-title>. <source>Bioinformatics</source> <volume>24</volume>, <fpage>i105</fpage>&#x02013;<lpage>i111</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn263</pub-id><pub-id pub-id-type="pmid">18689810</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Orengo</surname> <given-names>C. A.</given-names></name> <name><surname>Taylor</surname> <given-names>W. R.</given-names></name></person-group> (<year>1996</year>). <article-title>Ssap: sequential structure alignment program for protein structure comparison</article-title>. <source>Methods Enzymol</source>. <volume>266</volume>, <fpage>617</fpage>&#x02013;<lpage>635</lpage>. <pub-id pub-id-type="doi">10.1016/S0076-6879(96)66038-8</pub-id><pub-id pub-id-type="pmid">8743709</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peleg</surname> <given-names>A. S.</given-names></name> <name><surname>Shatsky</surname> <given-names>M.</given-names></name> <name><surname>Nussinov</surname> <given-names>R.</given-names></name> <name><surname>Wolfson</surname> <given-names>H. J.</given-names></name></person-group> (<year>2007</year>). <article-title>Spatial chemical conservation of hot spot interactions in protein-protein complexes</article-title>. <source>BMC Biol</source>. <volume>5</volume>:<fpage>43</fpage>. <pub-id pub-id-type="doi">10.1186/1741-7007-5-43</pub-id><pub-id pub-id-type="pmid">17925020</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peleg</surname> <given-names>A. S.</given-names></name> <name><surname>Shatsky</surname> <given-names>M.</given-names></name> <name><surname>Nussinov</surname> <given-names>R.</given-names></name> <name><surname>Wolfson</surname> <given-names>H. J.</given-names></name></person-group> (<year>2008</year>). <article-title>Multibind and mappis: webservers for multiple alignment of protein 3d-binding sites and their interactions</article-title>. <source>Nucleic Acids Res</source>. <volume>36</volume>, <fpage>W260</fpage>&#x02013;<lpage>W264</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkn185</pub-id><pub-id pub-id-type="pmid">18467424</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pulim</surname> <given-names>V.</given-names></name> <name><surname>Berger</surname> <given-names>B.</given-names></name> <name><surname>Bienkowska</surname> <given-names>J.</given-names></name></person-group> (<year>2008</year>). <article-title>Optimal contact map alignment of protein-protein interfaces</article-title>. <source>Bioinformatics</source> <volume>24</volume>, <fpage>2324</fpage>&#x02013;<lpage>2328</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn432</pub-id><pub-id pub-id-type="pmid">18710876</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shatsky</surname> <given-names>M.</given-names></name> <name><surname>Peleg</surname> <given-names>A. S.</given-names></name> <name><surname>Nussinov</surname> <given-names>R.</given-names></name> <name><surname>Wolfson</surname> <given-names>H. J.</given-names></name></person-group> (<year>2006</year>). <article-title>The multiple common point set problem and its application to molecule binding pattern detection</article-title>. <source>J. Comput. Biol</source>. <volume>13</volume>, <fpage>407</fpage>&#x02013;<lpage>428</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2006.13.407</pub-id><pub-id pub-id-type="pmid">16597249</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shindyalov</surname> <given-names>I. N.</given-names></name> <name><surname>Bourne</surname> <given-names>P. E.</given-names></name></person-group> (<year>1998</year>). <article-title>Protein structure alignment by incremental combinatorial extension (CE) of the optimal path</article-title>. <source>Protein Eng. Design Select</source>. <volume>11</volume>, <fpage>739</fpage>&#x02013;<lpage>747</lpage>. <pub-id pub-id-type="doi">10.1093/protein/11.9.739</pub-id><pub-id pub-id-type="pmid">9796821</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spriggs</surname> <given-names>R. V.</given-names></name> <name><surname>Artymiuk</surname> <given-names>P. J.</given-names></name> <name><surname>Willett</surname> <given-names>P.</given-names></name></person-group> (<year>2003</year>). <article-title>Searching for patterns of amino acids in 3D protein structures</article-title>. <source>J. Chem. Inf. Comput. Sci</source>. <volume>43</volume>, <fpage>412</fpage>&#x02013;<lpage>421</lpage>. <pub-id pub-id-type="doi">10.1021/ci0255984</pub-id><pub-id pub-id-type="pmid">12653503</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stark</surname> <given-names>A.</given-names></name> <name><surname>Russell</surname> <given-names>R. B.</given-names></name></person-group> (<year>2003</year>). <article-title>Annotation in three dimensions. PINTS: patterns in non-homologous tertiary structures</article-title>. <source>Nucleic Acids Res</source>. <volume>31</volume>, <fpage>3341</fpage>&#x02013;<lpage>3344</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkg506</pub-id><pub-id pub-id-type="pmid">12824322</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Torrance</surname> <given-names>J. W.</given-names></name> <name><surname>Bartlett</surname> <given-names>G. J.</given-names></name> <name><surname>Porter</surname> <given-names>C. T.</given-names></name> <name><surname>Thornton</surname> <given-names>J. M.</given-names></name></person-group> (<year>2005</year>). <article-title>Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families</article-title>. <source>J. Mol. Biol</source>. <volume>347</volume>, <fpage>565</fpage>&#x02013;<lpage>581</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmb.2005.01.044</pub-id><pub-id pub-id-type="pmid">15755451</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vassura</surname> <given-names>M.</given-names></name> <name><surname>Margara</surname> <given-names>L.</given-names></name> <name><surname>Di Lena</surname> <given-names>P.</given-names></name> <name><surname>Medri</surname> <given-names>F.</given-names></name> <name><surname>Fariselli</surname> <given-names>P.</given-names></name> <name><surname>Casadio</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Reconstruction of 3D structures from protein contact maps</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinform</source>. <volume>5</volume>, <fpage>357</fpage>&#x02013;<lpage>367</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2008.27</pub-id><pub-id pub-id-type="pmid">18670040</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wallace</surname> <given-names>A. C.</given-names></name> <name><surname>Borkakoti</surname> <given-names>N.</given-names></name> <name><surname>Thornton</surname> <given-names>J. M.</given-names></name></person-group> (<year>1997</year>). <article-title>TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites</article-title>. <source>Protein Sci</source>. <volume>6</volume>, <fpage>2308</fpage>&#x02013;<lpage>2323</lpage>. <pub-id pub-id-type="doi">10.1002/pro.5560061104</pub-id><pub-id pub-id-type="pmid">9385633</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname> <given-names>E. C.</given-names></name></person-group> (<year>1993</year>). <article-title>Enzyme nomenclature: recommendations 1992 of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology</article-title>. <source>Biochem. Educ</source>. <volume>21</volume>, <fpage>102</fpage>. <pub-id pub-id-type="doi">10.1016/0307-4412(93)90058-8</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weskamp</surname> <given-names>N.</given-names></name> <name><surname>H&#x000FC;llermeier</surname> <given-names>E.</given-names></name> <name><surname>Kuhn</surname> <given-names>D.</given-names></name> <name><surname>Klebe</surname> <given-names>G.</given-names></name></person-group> (<year>2007</year>). <article-title>Multiple graph alignment for the structural analysis of protein active sites</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinform</source>. <volume>4</volume>, <fpage>310</fpage>&#x02013;<lpage>320</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2007.358301</pub-id><pub-id pub-id-type="pmid">17473323</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wohlers</surname> <given-names>I.</given-names></name> <name><surname>Petzold</surname> <given-names>L.</given-names></name> <name><surname>Domingues</surname> <given-names>F. S.</given-names></name> <name><surname>Klau</surname> <given-names>G. W.</given-names></name></person-group> (<year>2009</year>). <article-title>PAUL: protein structural alignment using integer linear programming and Lagrangian relaxation</article-title>. <source>BMC Bioinformatics</source> <volume>10</volume>(<supplement>Suppl. 13</supplement>):<fpage>P2</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-10-S13-P2</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>L.</given-names></name> <name><surname>Bourne</surname> <given-names>P. E.</given-names></name></person-group> (<year>2008</year>). <article-title>Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>105</volume>, <fpage>5441</fpage>&#x02013;<lpage>5446</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0704422105</pub-id><pub-id pub-id-type="pmid">18385384</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>L.</given-names></name> <name><surname>Xie</surname> <given-names>L.</given-names></name> <name><surname>Bourne</surname> <given-names>P. E.</given-names></name></person-group> (<year>2009</year>). <article-title>A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery</article-title>. <source>Bioinformatics</source> <volume>25</volume>, <fpage>i305</fpage>&#x02013;<lpage>i312</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp220</pub-id><pub-id pub-id-type="pmid">19478004</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ye</surname> <given-names>Y.</given-names></name> <name><surname>Godzik</surname> <given-names>A.</given-names></name></person-group> (<year>2003</year>). <article-title>Flexible structure alignment by chaining aligned fragment pairs allowing twists</article-title>. <source>Bioinformatics</source> <volume>19</volume>, <fpage>ii246</fpage>&#x02013;<lpage>ii255</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btg1086</pub-id><pub-id pub-id-type="pmid">14534198</pub-id></citation>
</ref>
</ref-list>
</back>
</article>
