<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Mol. Biosci.</journal-id>
<journal-title>Frontiers in Molecular Biosciences</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Mol. Biosci.</abbrev-journal-title>
<issn pub-type="epub">2296-889X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">647915</article-id>
<article-id pub-id-type="doi">10.3389/fmolb.2021.647915</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Molecular Biosciences</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Protein Docking Model Evaluation by Graph Neural Networks</article-title>
<alt-title alt-title-type="left-running-head">Wang et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Docking Model Assessment with GNNs</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Xiao</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1185978/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Flannery</surname>
<given-names>Sean T.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Kihara</surname>
<given-names>Daisuke</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/211526/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Department of Computer Science, Purdue University, <addr-line>West Lafayette</addr-line>, <addr-line>IN</addr-line>, <country>United&#x20;States</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Department of Biological Sciences, Purdue University, <addr-line>West Lafayette</addr-line>, <addr-line>IN</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/689768/overview">Elif Ozkirimli</ext-link>, Roche, Switzerland</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1189260/overview">Manon R&#xe9;au</ext-link>, Utrecht University, Netherlands</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1189516/overview">Ilya Vakser</ext-link>, University of Kansas, United&#x20;States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/754964/overview">Martin Zacharias</ext-link>, Technical University of Munich, Germany</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Daisuke Kihara, <email>dkihara@purdue.edu</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Biological Modeling and Simulation, a section of the journal Frontiers in Molecular Biosciences</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>05</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>8</volume>
<elocation-id>647915</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>12</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>04</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Wang, Flannery and Kihara.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Wang, Flannery and Kihara</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Physical interactions of proteins play key functional roles in many important cellular processes. To understand molecular mechanisms of such functions, it is crucial to determine the structure of protein complexes. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed for predicting the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning&#x2013;based approach named Graph Neural Network&#x2013;based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of nodes and edges in the graph, respectively. GNN-DOVE was trained, validated, and tested on docking models in the Dockground database and further tested on a combined dataset of Dockground and ZDOCK benchmark as well as a CAPRI scoring dataset. GNN-DOVE performed better than existing methods, including DOVE, which is our previous development that uses a convolutional neural network on voxelized structure models.</p>
</abstract>
<kwd-group>
<kwd>protein docking</kwd>
<kwd>docking model evaluation</kwd>
<kwd>graph neural networks</kwd>
<kwd>deep learning</kwd>
<kwd>protein structure prediction</kwd>
</kwd-group>
<contract-num rid="cn001">R01GM123055</contract-num>
<contract-num rid="cn002">DMS1614777 CMMI1825941 MCB1925643 DBI2003635</contract-num>
<contract-sponsor id="cn001">National Institutes of Health<named-content content-type="fundref-id">10.13039/100000002</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">National Science Foundation<named-content content-type="fundref-id">10.13039/100000001</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Experimentally determined protein structures provide fundamental information about the physicochemical nature of the biological function of protein complexes. With the recent advances in cryo-electron microscopy, the number of experimentally determined protein complex structures has been increasing rapidly. However, experimental methods are costly in terms of money and time. To aid the experimental efforts, computational modeling approaches for protein complex structures, often referred to as protein docking (<xref ref-type="bibr" rid="B1">Aderinwale et&#x20;al., 2020</xref>), have been extensively studied over the past two decades.</p>
<p>Protein docking methods aim to build the overall quaternary structure of a protein complex from the tertiary structure information of individual chains. Similar to other protein structure modeling methods, protein docking can be divided into two main categories: template-based methods (<xref ref-type="bibr" rid="B55">Tuncbag et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B4">Anishchenko et&#x20;al., 2015</xref>), which use a known structure as a scaffold for modeling, and <italic>ab initio</italic> methods, which assemble the individual structures and score the generated models to choose the most plausible ones. In <italic>ab initio</italic> methods, various approaches have been explored for molecular structure representation (<xref ref-type="bibr" rid="B57">Venkatraman et&#x20;al., 2009</xref>; <xref ref-type="bibr" rid="B47">Pierce et&#x20;al., 2011</xref>) and for docking conformational searches, such as fast Fourier transform (<xref ref-type="bibr" rid="B23">Katchalski-Katzir et&#x20;al., 1992</xref>; <xref ref-type="bibr" rid="B41">Padhorny et&#x20;al., 2016</xref>), geometric hashing (<xref ref-type="bibr" rid="B10">Fischer et&#x20;al., 1995</xref>; <xref ref-type="bibr" rid="B57">Venkatraman et&#x20;al., 2009</xref>), and particle swarm optimization (<xref ref-type="bibr" rid="B36">Moal and Bates, 2010</xref>), as well as for modeling protein flexibility (<xref ref-type="bibr" rid="B16">Gray et&#x20;al., 2003</xref>; <xref ref-type="bibr" rid="B40">Oliwa and Shen, 2015</xref>). 
The development of new methods aims to extend and surpass the capabilities of simple pairwise docking, such as multichain docking (<xref ref-type="bibr" rid="B50">Schneidman&#x2010;Duhovny et&#x20;al., 2005</xref>; <xref ref-type="bibr" rid="B8">Esquivel&#x2010;Rodr&#xed;guez et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B48">Ritchie and Grudinin, 2016</xref>), peptide&#x2013;protein docking (<xref ref-type="bibr" rid="B28">Kurcinski et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B3">Alam et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B29">Kurcinski et&#x20;al., 2020</xref>), docking with disordered proteins (<xref ref-type="bibr" rid="B42">Peterson et&#x20;al., 2017</xref>), docking order prediction (<xref ref-type="bibr" rid="B44">Peterson et&#x20;al., 2018a</xref>; <xref ref-type="bibr" rid="B43">Peterson et&#x20;al., 2018b</xref>), and docking for cryo-EM maps (<xref ref-type="bibr" rid="B7">Esquivel-Rodr&#xed;guez and Kihara, 2012</xref>; <xref ref-type="bibr" rid="B56">van Zundert et&#x20;al., 2015</xref>). Researchers have also applied recent advances in deep learning to further boost docking performance (<xref ref-type="bibr" rid="B2">Akbal-Delibas et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B5">Degiacomi, 2019</xref>; <xref ref-type="bibr" rid="B11">Gainza et&#x20;al., 2020</xref>).</p>
<p>Although substantial improvements have been made in <italic>ab initio</italic> protein docking, selecting near-native (i.e.,&#x20;correct) models out of a large number of produced models, which are often called decoys, is still challenging. The difficulty is partly due to a substantial imbalance between the number of near-native models and the number of incorrect decoys in a generated decoy pool. The accuracy of decoy scoring largely determines the overall performance of protein docking, and thus, there is active development of scoring functions (<xref ref-type="bibr" rid="B37">Moal et&#x20;al., 2013</xref>) for docking models. Recognizing the importance of scoring, the Critical Assessment of PRediction of Interactions (CAPRI) (<xref ref-type="bibr" rid="B31">Lensink et&#x20;al., 2018</xref>), a community-wide protein docking prediction experiment, has established a dedicated category for evaluating scoring methods, where participants are asked to select 10 plausible decoys from thousands of decoys provided by the organizers. Over the last two decades, various approaches have been developed for scoring decoys. 
The main categories include physics-based potentials (<xref ref-type="bibr" rid="B2">Akbal-Delibas et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B5">Degiacomi, 2019</xref>; <xref ref-type="bibr" rid="B11">Gainza et&#x20;al., 2020</xref>), scoring based on interface shape (<xref ref-type="bibr" rid="B2">Akbal-Delibas et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B27">Kingsley et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B5">Degiacomi, 2019</xref>; <xref ref-type="bibr" rid="B11">Gainza et&#x20;al., 2020</xref>), knowledge-based statistical potentials (<xref ref-type="bibr" rid="B34">Lu et&#x20;al., 2003</xref>; <xref ref-type="bibr" rid="B18">Huang and Zou, 2008</xref>), machine learning methods (<xref ref-type="bibr" rid="B9">Fink et&#x20;al., 2011</xref>), evolutionary profiles of interface residues (<xref ref-type="bibr" rid="B38">Nadaradjane et&#x20;al., 2018</xref>), and deep learning methods using interface structures (<xref ref-type="bibr" rid="B59">Wang et&#x20;al., 2019</xref>).</p>
<p>In our previous work, we developed DOVE (<xref ref-type="bibr" rid="B59">Wang et&#x20;al., 2019</xref>), a model selection method for protein docking that uses a deep convolutional neural network (CNN) as the core of its architecture. DOVE captures atoms and interaction energies of atoms located at the interface of a docking model using a cube of 20<sup>3</sup> or 40<sup>3</sup>&#xa0;&#xc5;<sup>3</sup> and judges whether the model is correct or incorrect according to the CAPRI criteria (<xref ref-type="bibr" rid="B20">Janin et&#x20;al., 2003</xref>). We showed that DOVE performed better than existing methods. However, DOVE has a critical limitation&#x2014;since it captures an interface with a fixed-size cube, only a part of the interface is captured when the interface region is too large, which often caused erroneous predictions. In addition, a 3D grid representation of an interface often includes voxels of void space that contain no atoms, which is inefficient in memory usage and may even be detrimental to accurate prediction. In this work, we address this limitation of DOVE by applying a graph neural network (GNN) (<xref ref-type="bibr" rid="B49">Scarselli et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B61">Wu et&#x20;al., 2020</xref>), which has previously been successful in representing molecular properties (<xref ref-type="bibr" rid="B6">Duvenaud et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B51">Smith et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B32">Lim et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B65">Zubatyuk et&#x20;al., 2019</xref>). A GNN allows all atoms at an interface of any size to be captured in a flexible manner. The GNN representation of the interface is also rotationally invariant, meaning that arbitrary rotations of a candidate model are accounted for when training and predicting docking scores. 
To the&#x20;best of our knowledge, this is the first method that applies GNNs&#x20;to the protein docking problem. Compared to DOVE and other existing methods, GNN-DOVE demonstrated substantial improvement in a benchmark&#x20;study.</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>Materials and Methods</title>
<p>We first introduce the datasets used for training and testing GNN-DOVE. We then describe the graph neural network architecture and the training process of GNN-DOVE.</p>
<sec id="s2-1">
<title>Docking Decoy Datasets</title>
<p>To train and test GNN-DOVE, we first used the Dockground dataset 1.0 (available at <ext-link ext-link-type="uri" xlink:href="http://dockground.compbio.ku.edu/downloads/unbound/decoy/decoys1.0.zip">http://dockground.compbio.ku.edu/downloads/unbound/decoy/decoys1.0.zip</ext-link>) (<xref ref-type="bibr" rid="B33">Liu et&#x20;al., 2008</xref>). Docking decoys in this dataset were built by GRAMM-X (<xref ref-type="bibr" rid="B54">Tovchigrechko and Vakser, 2005</xref>). The dataset includes 58 target complexes, each with an average of 9.83 correct and 98.5 incorrect decoys. A decoy was considered correct following the CAPRI criteria (<xref ref-type="bibr" rid="B31">Lensink et&#x20;al., 2018</xref>), which consider the interface root mean square deviation (iRMSD), the ligand RMSD (lRMSD), and the fraction of native contacts (fnat). The iRMSD is the C&#x3b1; RMSD of interface residues with respect to the native structure. Interface residues in a complex are defined as all the residues within 10.0&#xa0;&#xc5; of any residue of the other subunit. lRMSD is the C&#x3b1; RMSD of the ligand when the receptors are superimposed, and fnat is the fraction of native contacting residue pairs, that is, residue pairs with any heavy atom pair within 5.0&#xa0;&#xc5;, that are reproduced in the model.</p>
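The contact-based part of these criteria can be sketched in Python. The 5.0&#xa0;&#xc5; heavy-atom contact definition follows the text; the function names and the dictionary-based coordinate layout are hypothetical illustrations, not the authors' implementation:

```python
import numpy as np

def residue_contacts(rec_atoms, lig_atoms, cutoff=5.0):
    """Residue pairs (i, j) whose closest heavy-atom pair is within `cutoff` A.
    rec_atoms / lig_atoms map a residue id to an (n_atoms, 3) coordinate array."""
    contacts = set()
    for ri, ra in rec_atoms.items():
        for lj, la in lig_atoms.items():
            # pairwise distances between all heavy atoms of the two residues
            d = np.linalg.norm(ra[:, None, :] - la[None, :, :], axis=-1)
            if d.min() <= cutoff:
                contacts.add((ri, lj))
    return contacts

def fnat(model_contacts, native_contacts):
    """Fraction of native residue-residue contacts reproduced by the model."""
    if not native_contacts:
        return 0.0
    return len(model_contacts & native_contacts) / len(native_contacts)

# Toy example: one receptor residue, two ligand residues 4 A and 20 A away.
rec = {1: np.array([[0.0, 0.0, 0.0]])}
lig = {'A': np.array([[4.0, 0.0, 0.0]]), 'B': np.array([[20.0, 0.0, 0.0]])}
model_contacts = residue_contacts(rec, lig)  # only (1, 'A') is within 5 A
```

If the native structure has contacts {(1, 'A'), (1, 'B')} and the model reproduces only (1, 'A'), fnat is 0.5.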
<p>To remove redundancy, we grouped the 58 complexes using sequence alignment and TM-align (<xref ref-type="bibr" rid="B64">Zhang and Skolnick, 2004</xref>). Two complexes were assigned to the same group if at least one pair of proteins from the two complexes had a TM-score of over 0.5 and a sequence identity of 30% or higher. This resulted in 29 groups (<xref ref-type="table" rid="T1">Table&#x20;1</xref>). In <xref ref-type="table" rid="T1">Table&#x20;1</xref>, the PDB IDs of complexes in the same group are shown in lower case in parentheses, following the PDB ID of the representative. These groups were split into four subsets to perform four-fold cross-validation, where three subsets were used for training and the remaining subset for testing the accuracy of the model. Thus, by cross-validation, we obtained four models tested on four independent testing sets. Within the training set, we used 80% of the complexes (i.e.,&#x20;unique dimers) for training a model and the remaining 20% as a validation set for determining the best hyper-parameter set. In the results, we report the accuracy for each target computed when it was in the testing set. For a fair comparison with DOVE (<xref ref-type="bibr" rid="B59">Wang et&#x20;al., 2019</xref>), DOVE was also newly trained and tested using this protocol.</p>
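This redundancy-removal step amounts to single-linkage grouping over a pairwise similarity relation. A minimal sketch, assuming a user-supplied `similar(a, b)` predicate that wraps TM-align and sequence alignment (both external tools, not implemented here):

```python
def group_complexes(ids, similar):
    """Single-linkage grouping with union-find: two complexes end up in the
    same group whenever a chain of pairwise-similar complexes links them."""
    parent = {i: i for i in ids}

    def find(x):  # find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in ids:
        for b in ids:
            if a < b and similar(a, b):
                parent[find(a)] = find(b)   # union the two groups

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

With similar pairs ('1A', '1B') and ('1B', '1C'), the four complexes ['1A', '1B', '1C', '2X'] collapse into two groups of sizes 3 and 1, matching the transitive-grouping behavior described above.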
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Dockground dataset splits for training and testing GNN.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Fold</th>
<th align="center">PDB ID</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1</td>
<td align="left">1A2K, 1E96 (1he1, 1he8, 1wq1), 1F6M, 1MA9 (2btf), 1G20, 1KU6, 1T6G, 1UGH, 1YVB, 2CKH, 3PRO</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">1AKJ (1p7q, 2bnq), 1DFJ, 1NBF (1r4m, 1xd3, 2bkr), 1GPW, 1HXY, 1U7F, 1UEX, 1ZY8, 2GOO, 1EWY</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">1AVW (1bth, 1bui, 1cho, 1ezu, 1ook, 1oph, 1ppf, 1tx6, 1xx9, 2fi4, 2kai, 1r0r, 2sni, 3sic)</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">1BVN (1tmq), 1F51, 1FM9, 1A2Y (1g6v, 1gpq, 1jps, 1wej, 1l9b, 1s6v), 1W1I, 2A5T, 3FAP</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>There are in total 29 representative targets, shown in upper case; targets in lower case in parentheses belong to the same group as the preceding representative.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Subsequently, we further trained and validated the GNN-DOVE network with a combined dataset of Dockground (ver 1.0) and ZDOCK (ver 4.0) (<xref ref-type="bibr" rid="B19">Hwang et&#x20;al., 2010</xref>), which includes 58 target complexes from Dockground and 120 target complexes from ZDOCK. The ZDOCK benchmark contains 110 additional targets, which were discarded because either GOAP (<xref ref-type="bibr" rid="B64">Zhou and Skolnick, 2011</xref>) or ITScore (<xref ref-type="bibr" rid="B18">Huang and Zou, 2008</xref>) failed to process them, or fnat could not be computed because the sequences of the structures provided in the ZDOCK dataset were inconsistent with the native complex structures in the PDB. The same criteria mentioned above were used to group the targets into 71 groups. Among them, we used 45 groups for training, 11 groups for validation, and 15 groups (19 complexes) for testing. Since a decoy set for each target in ZDOCK is much larger (around 54,000) than in Dockground, we reduced the number of ZDOCK decoys for a target to 400. Up to 200 correct decoys (i.e.,&#x20;decoys with an acceptable or higher CAPRI quality) were selected if available, including at most 50&#x20;high-quality decoys and at most 50&#x20;medium-quality decoys, with the rest selected from acceptable-quality decoys. The remaining slots of the 400 decoys were then filled with negative (incorrect) decoys. One-third of the negative decoys were selected from those with an iRMSD less than 7&#xa0;&#xc5;, another third came from those with an iRMSD between 7 and 10&#xa0;&#xc5;, and the rest came from those with an iRMSD over 10&#xa0;&#xc5;.</p>
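The per-target subsampling scheme can be sketched as follows. The quotas (400 total, up to 200 correct with at most 50 high- and 50 medium-quality, negatives split across three iRMSD bins) come from the text; the data layout (per-decoy 'quality' labels and 'irmsd' values) and the exactly even three-way split of negatives are simplifying assumptions:

```python
import random

def sample_decoys(decoys, n_total=400, seed=0):
    """Subsample a ZDOCK-style decoy set: up to 200 correct decoys
    (<=50 high, <=50 medium, rest acceptable), then fill with incorrect
    decoys drawn evenly from three iRMSD bins (<7 A, 7-10 A, >10 A)."""
    rng = random.Random(seed)
    by_q = {q: [d for d in decoys if d['quality'] == q]
            for q in ('high', 'medium', 'acceptable', 'incorrect')}
    correct = by_q['high'][:50] + by_q['medium'][:50]
    correct += by_q['acceptable'][:200 - len(correct)]
    correct = correct[:200]

    n_neg = n_total - len(correct)
    bins = ([d for d in by_q['incorrect'] if d['irmsd'] < 7],
            [d for d in by_q['incorrect'] if 7 <= d['irmsd'] <= 10],
            [d for d in by_q['incorrect'] if d['irmsd'] > 10])
    negatives = []
    for b in bins:
        rng.shuffle(b)
        negatives += b[:n_neg // 3]  # integer division may leave a few slots unfilled
    return correct + negatives
```

For a synthetic target with 60 high, 60 medium, 300 acceptable, and 100 incorrect decoys in each iRMSD bin, this yields 50 + 50 + 100 correct decoys plus 66 negatives per bin.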
<p>Finally, we tested GNN-DOVE on decoy sets of 13 targets in the CAPRI Score_set (<xref ref-type="bibr" rid="B30">Lensink and Wodak, 2014</xref>), which consists of 13 scoring targets from the CAPRI round 13 to round 26 (<xref ref-type="bibr" rid="B21">Janin, 2010</xref>; <xref ref-type="bibr" rid="B22">Janin, 2013</xref>). Each decoy set included 500 to 2,000 models generated using different methods by CAPRI participants.</p>
</sec>
<sec id="s2-2">
<title>The GNN-DOVE Algorithm</title>
<p>In this section, we describe the GNN-DOVE algorithm, which is built on a graph neural network. GNN-DOVE is inspired by a recent work on drug&#x2013;target interactions (<xref ref-type="bibr" rid="B32">Lim et&#x20;al., 2019</xref>), which designed a two-graph representation to capture intermolecular interactions in protein&#x2013;ligand binding. We first explain how the 3D structural information of a protein-complex interface is embedded as a graph. Then, we describe how we used a graph attention mechanism to focus on the intermolecular interactions between a receptor and a ligand protein. The overall protocol is illustrated in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>. For an input protein docking decoy, the interface region is identified as the set of residues located within 10.0&#xa0;&#xc5; of any residue of the other protein. A residue&#x2013;residue distance is defined as the shortest distance between any heavy atom pair across the two residues. Using the extracted interface region, two graphs are built, representing two types of interactions: the graph <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> describes the heavy atoms at the interface and considers only covalent bonds between atoms of interface residues within each subunit as edges. The other graph <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> connects both covalent (thus includes <inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>) and non-covalent interactions as edges, where a non-covalent atom pair is defined as two atoms within 10.0&#xa0;&#xc5; of each other. Both graphs are processed by a graph neural network (GNN) to output a score, which is the probability that the docking decoy has a CAPRI acceptable quality (thus, higher scores are better).</p>
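The interface-extraction step (the 10.0&#xa0;&#xc5; shortest heavy-atom distance rule stated above) can be sketched in a few lines; the function name and the dictionary-based coordinate layout are hypothetical:

```python
import numpy as np

def interface_residues(rec, lig, cutoff=10.0):
    """A residue belongs to the interface if its shortest heavy-atom distance
    to any residue of the other protein is within `cutoff` A.
    rec / lig map a residue id to an (n_atoms, 3) coordinate array."""
    rec_if, lig_if = set(), set()
    for ri, ra in rec.items():
        for lj, la in lig.items():
            # shortest heavy-atom distance between the two residues
            d = np.linalg.norm(ra[:, None, :] - la[None, :, :], axis=-1)
            if d.min() <= cutoff:
                rec_if.add(ri)
                lig_if.add(lj)
    return rec_if, lig_if
```

Only the atoms of these interface residues would then be used as graph nodes; all other residues of the decoy are discarded before graph construction.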
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Framework of GNN-DOVE. GNN-DOVE extracts the interface region of a protein complex, constructs graphs with and without intermolecular interactions as input, and outputs a probability that indicates whether the input structure is acceptable. <bold>(A)</bold> Overall logical steps of the pipeline. <bold>(B)</bold> Architecture of the GNN network with the gated graph attention mechanism.</p>
</caption>
<graphic xlink:href="fmolb-08-647915-g001.tif"/>
</fig>
</sec>
<sec id="s2-3">
<title>Building Graphs</title>
<p>A key feature of this work is the graph representation of an interface region of a complex model. A graph <italic>G</italic> is defined by <italic>G &#x3d;</italic> (<italic>V</italic>, <italic>E</italic>, <italic>A</italic>), where <italic>V</italic> denotes the node set, <italic>E</italic> is the set of edges, and <italic>A</italic> is the adjacency matrix, which numerically represents the connectivity of the graph. For a graph <italic>G</italic> with <italic>N</italic> nodes, the adjacency matrix <italic>A</italic> has a dimension of <italic>N</italic> &#xd7; <italic>N</italic>, where <inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> if the <italic>i-</italic>th node and the <italic>j-</italic>th node are connected, and <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> otherwise. The adjacency matrix <inline-formula id="inf6">
<mml:math id="m6">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> for graph <inline-formula id="inf7">
<mml:math id="m7">
<mml:mrow>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> describes covalent bonds at the interface and is thus defined as follows:<disp-formula id="e1">
<mml:math id="m8">
<mml:mrow>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#xa0;</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mi>y</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi mathvariant="italic">cov</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>&#xa0;</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>The matrix <inline-formula id="inf8">
<mml:math id="m9">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> for <inline-formula id="inf9">
<mml:math id="m10">
<mml:mrow>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> describes both covalent bonds and non-covalent interactions between atoms within 10.0&#xa0;&#xc5; of each other. It is defined as follows:<disp-formula id="e2">
<mml:math id="m11">
<mml:mrow>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>10</mml:mn>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#xa0;</mml:mo>
<mml:mi mathvariant="normal">&#xc5;</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>f</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>10</mml:mn>
<mml:mo>&#xa0;</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="normal">&#xc5;</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>w</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf10">
<mml:math id="m12">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> denotes the distance between the <italic>i-</italic>th and the <italic>j-</italic>th atoms. <inline-formula id="inf11">
<mml:math id="m13">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> and <inline-formula id="inf12">
<mml:math id="m14">
<mml:mi>&#x3c3;</mml:mi>
</mml:math>
</inline-formula> are learnable parameters, whose initial values are 0.0 and 1.0, respectively. The formula <inline-formula id="inf13">
<mml:math id="m15">
<mml:mrow>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> decays as the distance increases between&#x20;atoms.</p>
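As a concrete illustration of the edge construction, the two adjacency matrices could be built as follows. This is a minimal NumPy sketch, not the authors' PyTorch code: the function name and bond-list format are assumptions, and μ and σ appear as plain floats here although they are learnable parameters in the actual network.

```python
import numpy as np

def adjacency_matrices(coords, is_ligand, bonds, mu=0.0, sigma=1.0, cutoff=10.0):
    """Sketch of the two graph adjacency matrices.

    A1 encodes the covalent structure (bonds plus self-loops);
    A2 adds a Gaussian decay exp(-(d_ij - mu)^2 / sigma) for
    receptor-ligand atom pairs within the distance cutoff.
    """
    n = len(coords)
    a1 = np.eye(n)
    for i, j in bonds:                      # covalent bonds within each chain
        a1[i, j] = a1[j, i] = 1.0
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cross = is_ligand[:, None] != is_ligand[None, :]   # receptor-ligand pairs
    decay = np.exp(-((dist - mu) ** 2) / sigma)
    a2 = a1 + np.where(cross & (dist <= cutoff), decay, 0.0)
    return a1, a2
```

The decay term shrinks toward zero as the interatomic distance grows, mirroring the behavior described above.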
<p>Compared with the voxel representation used previously in DOVE, the graph representation encodes distance information more flexibly and naturally. The representation is rotationally invariant, and interaction regions of any size can be analyzed. Memory usage is also more efficient because void space need not be represented, as it is in the voxel representation.</p>
<p>For the node features in the graph, we considered the physicochemical properties of atoms. We used the same features as in previous works (<xref ref-type="bibr" rid="B32">Lim et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B53">Torng and Altman, 2019</xref>), as shown in <xref ref-type="table" rid="T2">Table&#x20;2</xref>. Thus, the length of the feature vector of a node from <xref ref-type="table" rid="T2">Table&#x20;2</xref> was 23 (= 5 + 6 + 5 + 6 + 1), which was embedded by a one-layer fully connected (FC) network into 140 features.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Atom features.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Features</th>
<th align="center">Representation</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Atom type</td>
<td align="center">C, N, O, S, H (one hot)</td>
</tr>
<tr>
<td align="left">The degree (connections) of atom</td>
<td align="center">0, 1, 2, 3, 4, 5 (one hot)</td>
</tr>
<tr>
<td align="left">The number of connected hydrogen atoms</td>
<td align="center">0, 1, 2, 3, 4 (one hot)</td>
</tr>
<tr>
<td align="left">The number of implicit valence electrons</td>
<td align="center">0, 1, 2, 3, 4, 5 (one hot)</td>
</tr>
<tr>
<td align="left">Aromatic</td>
<td align="center">0 or 1</td>
</tr>
</tbody>
</table>
</table-wrap>
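The 23-dimensional feature vector of Table 2 can be assembled as a concatenation of one-hot encodings. This is a minimal sketch; the function name and argument order are assumptions, and in practice these atomic properties would come from a cheminformatics toolkit.

```python
def atom_features(symbol, degree, num_h, implicit_valence, is_aromatic):
    """23-dimensional atom feature vector following Table 2:
    5 (atom type) + 6 (degree) + 5 (attached H) + 6 (implicit valence) + 1 (aromatic)."""
    def one_hot(value, choices):
        return [1.0 if value == c else 0.0 for c in choices]
    return (one_hot(symbol, ["C", "N", "O", "S", "H"])
            + one_hot(degree, [0, 1, 2, 3, 4, 5])
            + one_hot(num_h, [0, 1, 2, 3, 4])
            + one_hot(implicit_valence, [0, 1, 2, 3, 4, 5])
            + [1.0 if is_aromatic else 0.0])
```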
</sec>
<sec id="s2-4">
<title>Attention and Gate-Augmented Mechanism</title>
<p>The constructed graphs are used as the input to the GNN. More formally, graphs are the adjacency matrix A<sup>1</sup> and A<sup>2</sup>, and the node features, <inline-formula id="inf14">
<mml:math id="m16">
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#xa0;</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> with <inline-formula id="inf15">
<mml:math id="m17">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>F</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, where F is the dimension of the node feature.</p>
<p>We first explain the attention mechanism of our GNN. With the input graph of <inline-formula id="inf16">
<mml:math id="m18">
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, the pure graph attention coefficient is defined in <xref ref-type="disp-formula" rid="e3">Eq. 3</xref>, which denotes the relative importance between the <italic>i-</italic>th and the <italic>j-</italic>th node:<disp-formula id="e3">
<mml:math id="m19">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
<mml:mi mathvariant="italic">&#x3a4;</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>E</mml:mi>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
<mml:mi mathvariant="italic">&#x3a4;</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>E</mml:mi>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:math>
<label>(3)</label>
</disp-formula>where <inline-formula id="inf17">
<mml:math id="m20">
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf18">
<mml:math id="m21">
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> are the transformed feature representations defined by <inline-formula id="inf19">
<mml:math id="m22">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>W</mml:mi>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf20">
<mml:math id="m23">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>W</mml:mi>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. <inline-formula id="inf21">
<mml:math id="m24">
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>F</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> are learnable matrices in the GNN. <inline-formula id="inf22">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf23">
<mml:math id="m26">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> become identical to satisfy the symmetrical property of the graph by adding <inline-formula id="inf24">
<mml:math id="m27">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
<mml:mi mathvariant="italic">&#x3a4;</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>E</mml:mi>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
<mml:mi mathvariant="italic">&#x3a4;</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf25">
<mml:math id="m28">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
<mml:mi mathvariant="italic">&#x3a4;</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>E</mml:mi>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. The coefficient will only be computed for <italic>i</italic> and <italic>j</italic> where <inline-formula id="inf26">
<mml:math id="m29">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>Attention coefficients will also be computed for elements in the adjacency matrices. They are formulated in the following form for the element (<italic>i</italic>, <italic>j</italic>):<disp-formula id="e4">
<mml:math id="m30">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf27">
<mml:math id="m31">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the normalized attention coefficient for the <italic>i-</italic>th and the <italic>j-</italic>th node pair, <inline-formula id="inf28">
<mml:math id="m32">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the symmetrical graph attention coefficient computed in <xref ref-type="disp-formula" rid="e3">Eq. 3</xref>, and <inline-formula id="inf29">
<mml:math id="m33">
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the set of neighbors of the <italic>i-</italic>th node that includes interacting nodes <italic>j</italic> where <inline-formula id="inf30">
<mml:math id="m34">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. The purpose of <xref ref-type="disp-formula" rid="e4">Eq. 4</xref> is to define the attention by considering both the physical structure of the interaction, <italic>A</italic><sub><italic>ij</italic></sub>, and the attention coefficient, e<sub>ij</sub>, computed in <xref ref-type="disp-formula" rid="e3">Eq. 3</xref>.</p>
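The symmetric coefficients and their neighbor-restricted normalization can be sketched together as follows. This is a NumPy illustration with a hypothetical function name; in the actual network, W and E are learnable matrices updated during training.

```python
import numpy as np

def attention_coefficients(x_in, A, W, E):
    """Sketch of the attention computation: symmetric coefficients
    e_ij = x_i'^T E x_j' + x_j'^T E x_i' over transformed features x' = W x,
    softmax-normalized over each node's neighbors (A_ij > 0),
    then re-weighted by the adjacency entry A_ij."""
    x_t = x_in @ W.T                                 # x'_i = W x_i (rows)
    e = x_t @ E @ x_t.T
    e = e + e.T                                      # enforce e_ij == e_ji
    mask = A > 0
    e_masked = np.where(mask, e, -np.inf)            # restrict to neighbors
    e_max = e_masked.max(axis=1, keepdims=True)      # stabilize the softmax
    exp_e = np.where(mask, np.exp(e_masked - e_max), 0.0)
    a = exp_e / exp_e.sum(axis=1, keepdims=True)     # row-wise softmax
    return a * A                                     # include the physical structure
```

With a binary adjacency matrix, each row of the result sums to one over the node's neighbors.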
<p>Based on the attention mechanism, the new node feature of each node is updated by considering its neighboring nodes, which is a linear combination of the neighboring node features with the final attention coefficient <inline-formula id="inf31">
<mml:math id="m35">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>:<disp-formula id="e5">
<mml:math id="m36">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:math>
<label>(5)</label>
</disp-formula>Furthermore, a gate mechanism is applied to update the node feature, since it is known to significantly boost the performance of GNNs (<xref ref-type="bibr" rid="B63">Zhang et&#x20;al., 2018</xref>). The basic idea is similar to that of ResNet (<xref ref-type="bibr" rid="B17">He et&#x20;al., 2016</xref>), where a residual connection from the input helps to avoid information loss and alleviates the vanishing-gradient problem of conventional backpropagation. The gated graph attention can be viewed as a linear combination of <inline-formula id="inf32">
<mml:math id="m37">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf33">
<mml:math id="m38">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, as defined in <xref ref-type="disp-formula" rid="e6">Eq. 6</xref>:<disp-formula id="e6">
<mml:math id="m39">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:math>
<label>(6)</label>
</disp-formula>where <inline-formula id="inf34">
<mml:math id="m40">
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf35">
<mml:math id="m41">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>F</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is a weight vector that is multiplied (dot product) with the vector <inline-formula id="inf36">
<mml:math id="m42">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf37">
<mml:math id="m43">
<mml:mi>b</mml:mi>
</mml:math>
</inline-formula> is a constant value. Both D and b are learnable parameters and are shared among different nodes. <inline-formula id="inf38">
<mml:math id="m44">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> denotes the concatenation vector of <inline-formula id="inf39">
<mml:math id="m45">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
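The gated update amounts to a per-node scalar gate blending the input feature with the attention-aggregated feature. A minimal sketch follows; the function and variable names are assumptions, and D and b would be learnable parameters shared across nodes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_update(x, x_att, D, b):
    """Sketch of the gate: given the attention-aggregated features x'' (x_att),
    blend each node's input feature x_i with x''_i using the scalar gate
    c_i = sigmoid(D . (x_i || x''_i) + b)."""
    concat = np.concatenate([x, x_att], axis=1)      # x_i || x''_i, shape (N, 2F)
    c = sigmoid(concat @ D + b)[:, None]             # one scalar gate per node
    return c * x + (1.0 - c) * x_att                 # gated linear combination
```

When the gate saturates near one, the layer simply passes the input through, which is the residual behavior described above.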
<p>We refer to this combined attention and gating mechanism as the gate-augmented graph attention layer (GAT). Then, we can simply denote <inline-formula id="inf40">
<mml:math id="m46">
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The node embedding can be iteratively updated by <inline-formula id="inf41">
<mml:math id="m47">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, which aggregates information from neighboring&#x20;nodes.</p>
</sec>
<sec id="s2-5">
<title>Graph Neural Network Architecture of GNN-DOVE</title>
<p>Using the <inline-formula id="inf42">
<mml:math id="m48">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> mechanism described before, we adopted four layers of <inline-formula id="inf43">
<mml:math id="m49">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> in GNN-DOVE to process the node embedding information from neighbors and to output the updated node embedding (<xref ref-type="fig" rid="F1">Figure&#x20;1B</xref>). For the two adjacency matrices <inline-formula id="inf44">
<mml:math id="m50">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf45">
<mml:math id="m51">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, we used a shared GAT. The initial input of the network is atom features. With two matrices, <inline-formula id="inf46">
<mml:math id="m52">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf47">
<mml:math id="m53">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, we have <inline-formula id="inf48">
<mml:math id="m54">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf49">
<mml:math id="m55">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. To focus only on the intermolecular interactions within an input protein complex model, we subtracted the embeddings of the two graphs to obtain the final node embedding. By subtracting the updated embedding <inline-formula id="inf50">
<mml:math id="m56">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> from <inline-formula id="inf51">
<mml:math id="m57">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, we can capture the aggregated information that comes only from the intermolecular interactions with other nodes in the protein complex model. Thus, the output node feature is defined as<disp-formula id="e7">
<mml:math id="m58">
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>Then, the updated <inline-formula id="inf52">
<mml:math id="m59">
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> will become <inline-formula id="inf53">
<mml:math id="m60">
<mml:mrow>
<mml:msup>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> to iteratively augment the information through the three following <inline-formula id="inf54">
<mml:math id="m61">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> layers. After the node embeddings were updated by the four <inline-formula id="inf55">
<mml:math id="m62">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> layers, the node embedding of the whole graph was summed up as the entire graph representation, which is considered as the overall intermolecular interaction representation of the protein complex model:<disp-formula id="e8">
<mml:math id="m63">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mstyle>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>Finally, FC layers were applied to <inline-formula id="inf56">
<mml:math id="m64">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> to classify whether the protein complex model is correct or incorrect. In total, four FC layers were applied. The first layer takes the 140 feature values from <xref ref-type="disp-formula" rid="e8">Eq. 8</xref>, and the three subsequent layers have a dimension of 128. ReLU activation functions were used between the FC layers, and a sigmoid function was applied to the output of the last layer to produce a probability value.</p>
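The final pooling step reduces to a subtraction followed by a sum over nodes. A minimal sketch with a hypothetical function name (the FC classification head that follows is omitted):

```python
import numpy as np

def graph_representation(x1, x2):
    """Sketch of the pooling: keep only the intermolecular signal by
    subtracting the covalent-graph embedding x1 from the combined-graph
    embedding x2, then sum-pool over nodes to obtain one vector for the
    whole complex model."""
    x_out = x2 - x1                  # per-node intermolecular embedding
    return x_out.sum(axis=0)         # graph-level representation
```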
<p>The source code of GNN-DOVE is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/kiharalab/GNN_DOVE">https://github.com/kiharalab/GNN_DOVE</ext-link>.</p>
</sec>
<sec id="s2-6">
<title>Training Networks</title>
<p>Since the dataset was highly imbalanced, with many more incorrect decoys than acceptable ones, we balanced the training data by sampling the same number of acceptable and incorrect decoys in each batch. To achieve this, a positive (i.e., correct) decoy may be sampled multiple times in one epoch of training.</p>
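The balancing scheme can be sketched as a batch generator. This is a hypothetical implementation, not the authors' code; it oversamples the minority positive class with replacement so that each batch is half positive and half negative.

```python
import random

def balanced_batches(positives, negatives, batch_size, seed=0):
    """Sketch of class-balanced batching: each batch draws the same number
    of acceptable (positive) and incorrect (negative) decoys; the minority
    class is drawn with replacement, so a positive decoy may appear more
    than once per epoch."""
    rng = random.Random(seed)
    half = batch_size // 2
    n_batches = len(negatives) // half            # one pass over the majority class
    rng.shuffle(negatives)
    for b in range(n_batches):
        neg = negatives[b * half:(b + 1) * half]
        pos = [rng.choice(positives) for _ in range(half)]   # with replacement
        yield pos + neg
```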
<p>For training, cross-entropy loss (<xref ref-type="bibr" rid="B15">Goodfellow et&#x20;al., 2016</xref>) was used as the loss function, and the Adam optimizer (<xref ref-type="bibr" rid="B26">Kingma and Ba, 2015</xref>) was used for parameter optimization. To avoid overfitting, a dropout rate (<xref ref-type="bibr" rid="B52">Srivastava et&#x20;al., 2014</xref>) of 0.3 was applied to every layer except the last FC layer. Models were trained for 100 epochs with a batch size of 32. Weights of every layer were initialized using the zero-centered Glorot uniform scheme (<xref ref-type="bibr" rid="B13">Glorot and Bengio, 2010</xref>), and biases were initialized to 0 for all layers.</p>
<p>First, we performed four-fold cross-validation on the Dockground dataset (<xref ref-type="table" rid="T1">Table&#x20;1</xref>). For fold 1, where we used the fold 1 subset for testing and the other three subsets for training and validation, hyper-parameter combinations of learning rates of 0.2, 0.02, 0.002, and 0.0002 and weight decays in Adam of 0, 1e-1, 1e-2, 1e-3, 1e-4, and 1e-5 were tested. Among these combinations, we found that a learning rate of 0.002 with a weight decay of 0 achieved the highest accuracy on the validation set. We used this parameter combination for the other three folds in the cross-validation. The training process generally converged after approximately 30 epochs.</p>
<p>Next, we used the combined dataset of Dockground and ZDOCK for further training. We adopted transfer learning on this dataset by starting from the models pretrained on the Dockground dataset. The training was performed in two stages: In the first stage, nine hyper-parameter combinations with learning rates of 0.002, 0.0002, and 0.00002 and weight decays of 1e-4, 1e-5, and 0 were tested on the fold 1 model. We found that a combination of a learning rate of 0.0002 and a weight decay of 0 performed the best when evaluated on its validation set. We used this hyper-parameter combination to train the fold 2, 3, and 4 models and selected the fold 1 model for further training because it showed the highest accuracy on the validation set. In the second stage, we used a smaller learning rate of 0.00002 and a weight decay of 0 to further fine-tune the fold 1 model for another 30 epochs. The resulting model was evaluated on the testing set of the combined Dockground and ZDOCK dataset. Further, we applied the model to the dataset of CAPRI scoring targets.</p>
</sec>
<sec id="s2-7">
<title>DOVE</title>
<p>We compared the performance of GNN-DOVE with its predecessor, DOVE. Here, we briefly describe the DOVE algorithm. DOVE is a CNN-based method for evaluating protein docking models. It first extracts the interface region of an input protein complex model and places the region into a 40 &#xd7; 40 &#xd7; 40&#xa0;&#xc5;<sup>3</sup> cube as input. A seven-layer CNN, which consists of three convolutional layers, two pooling layers, and two fully connected layers, was adopted to process the voxel input. The output of DOVE is a probability indicating whether the input model is acceptable or not. For input features, DOVE took atom types as well as atom-based interaction energy values from GOAP (<xref ref-type="bibr" rid="B64">Zhou and Skolnick, 2011</xref>) and ITScore (<xref ref-type="bibr" rid="B18">Huang and Zou, 2008</xref>). Since the voxelized structure input is not rotationally invariant, DOVE needed to augment the training data by rotations.</p>
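The voxel input of DOVE contrasts with the graph input of GNN-DOVE. A minimal sketch of such an occupancy-grid construction follows (hypothetical helper, assuming interface-centered coordinates and 1 Å resolution; DOVE's actual input channels also include atom types and energy terms):

```python
import numpy as np

def voxelize_interface(coords, box=40, resolution=1.0):
    """Sketch of a DOVE-style input grid: place interface-region atom
    coordinates (already centered on the interface) into a box^3 occupancy
    grid; atoms falling outside the cube are dropped."""
    grid = np.zeros((box, box, box), dtype=np.float32)
    idx = np.floor(coords / resolution).astype(int) + box // 2   # center the cube
    for i, j, k in idx:
        if 0 <= i < box and 0 <= j < box and 0 <= k < box:
            grid[i, j, k] += 1.0                                  # atom count per voxel
    return grid
```

Because the grid changes under rotation of the input coordinates, such a representation requires rotational data augmentation, unlike the graph representation.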
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec id="s3-1">
<title>Performance on the Dockground Dataset</title>
<p>We evaluated the performance of GNN-DOVE on the Dockground dataset. GNN-DOVE was compared with DOVE and five other existing scoring methods for structure models: GOAP (<xref ref-type="bibr" rid="B62">Zhou and Skolnick, 2011</xref>), ITScore (<xref ref-type="bibr" rid="B18">Huang and Zou, 2008</xref>), ZRANK (<xref ref-type="bibr" rid="B45">Pierce and Weng, 2007</xref>), ZRANK2 (<xref ref-type="bibr" rid="B46">Pierce and Weng, 2008</xref>), and IRAD (<xref ref-type="bibr" rid="B58">Vreven et&#x20;al., 2011</xref>). Test set results were reported for GNN-DOVE and DOVE. Both GOAP and ITScore were run in two different ways. First, as originally designed, the entire complex structure model was input. The other way was to input only the interface residues that are within 10&#xa0;&#xc5; of the interacting protein (denoted as GOAP-Interface and ITScore-Interface). Thus, GNN-DOVE was compared with a total of eight methods. As for DOVE, we used a cube size of 40<sup>3</sup>&#xa0;&#xc5;<sup>3</sup> and heavy atom distributions as input features because this setting performed the best among the settings tested on the Dockground dataset in the original paper (<xref ref-type="bibr" rid="B60">Wang et&#x20;al., 2020</xref>) (<xref ref-type="fig" rid="F4">Figure&#x20;4</xref> in that paper, where the setting was named DOVE-Atom 40). For this work, DOVE was newly retrained using the same four-fold cross-validation as GNN-DOVE.</p>
<p>
<xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows the hit rate of GNN-DOVE in comparison with the other methods. The hit rate of a method is the fraction of target complexes for which the method ranked at least one acceptable model (based on the CAPRI criteria) within each top rank cutoff. Targets were evaluated when they were in the held-out testing set of the four-fold cross-testing we performed. <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> has three panels. Panel A shows the fraction of targets for which a method had at least one hit within each rank cutoff. Panel B shows hit rates that were first averaged within each of the 29 groups of complexes and then re-averaged over the groups. Panel C shows the hit rate when targets with similar interface structures were grouped.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Performance on the Dockground dataset. GNN-DOVE was compared with DOVE and seven other scoring methods. <bold>(A)</bold> The panel shows the fraction of target complexes among the 58 complexes in the benchmark set for which a method selected at least one acceptable model (within top <italic>x</italic> scored models). <bold>(B)</bold> Considering the complexes are grouped into 29 groups, we also compared the hit rate of different methods based on the group classification. The hit rates for complexes in each group were averaged and then re-averaged over the 29 groups. <bold>(C)</bold> Results when 46 complex groups were considered that were formed with interface similarity. The hit rates for complexes in each group were averaged and then re-averaged over the 46 groups.</p>
</caption>
<graphic xlink:href="fmolb-08-647915-g002.tif"/>
</fig>
<p>
<xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows that GNN-DOVE (dotted line in light green) performed better than the other methods. GNN-DOVE was able to rank correct models at earlier ranks for many target complexes. Within the top 10 ranks, GNN-DOVE achieved a hit rate of 89.7%, while the next best method, DOVE, achieved 81.0%, and the third best method, GOAP, obtained 70.7% (<xref ref-type="fig" rid="F2">Figure&#x20;2A</xref>). When we further compared the hit rates considering the target groups (<xref ref-type="fig" rid="F2">Figure&#x20;2B</xref>), GNN-DOVE consistently outperformed the other methods, and the gap between GNN-DOVE and DOVE and the other existing methods widened. Among the other seven existing methods, GOAP showed the highest hit rate at the 5th rank, followed by ZRANK2 in both panels, while ITScore-Interface had the lowest hit rates on this dataset. In <xref ref-type="fig" rid="F2">Figure&#x20;2C</xref>, we evaluated the methods&#x2019; performance when target complexes were grouped by the similarity of their docking interface regions, which was evaluated by TM-score. For a complex, an interface was defined as the residues that are closer than 10&#xa0;&#xc5; to any residue of the docking partner. To run TM-align to obtain the TM-score for two interfaces, we prepared two versions of PDB files for each interface: one with residues from the receptor first followed by residues from the ligand, and the other with the opposite order. Then, we computed the TM-score for the four combinations of the files from the two interfaces and selected the largest TM-score among them. A pair of interfaces was grouped together if one of the computed TM-score values of the interface regions was 0.5 or higher. This process formed 46 groups. The hit rate was computed for each complex first, then averaged within each group, and finally re-averaged across the 46 groups. GNN-DOVE still showed the highest hit rate among the methods compared within the top 10 ranks.</p>
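The hit-rate metric itself is straightforward to compute. A minimal sketch with a hypothetical function name, which takes, for each target, the decoys' acceptable/unacceptable labels sorted by predicted score:

```python
def hit_rate(ranked_decoys_per_target, top_k):
    """Sketch of the hit-rate metric: the fraction of targets for which at
    least one acceptable decoy (label True) appears within the top_k
    score-ranked decoys."""
    hits = sum(any(labels[:top_k]) for labels in ranked_decoys_per_target)
    return hits / len(ranked_decoys_per_target)
```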
<p>In <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>, we show results on each test set from the four-fold cross validation. GNN-DOVE showed the highest hit rate in early&#x20;ranks.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>The hit rate is shown for each fold in the cross validation on the Dockground dataset. Protein complexes in the test set of each fold are listed in <xref ref-type="table" rid="T1">Table&#x20;1</xref>. In the same way as <xref ref-type="fig" rid="F2">Figure&#x20;2A</xref>, a hit rate was computed for individual complexes separately and averaged over the complexes. <bold>(A)</bold> The hit rate of the fold 1 test set. The model was trained on the fold 2, 3, and 4 subsets. <bold>(B)</bold> The fold 2 test set. <bold>(C)</bold> The fold 3 test set. <bold>(D)</bold> The fold 4 test&#x20;set.</p>
</caption>
<graphic xlink:href="fmolb-08-647915-g003.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, we compared the iRMSD, lRMSD, and fnat values of the methods. These metrics are used for defining the quality levels in CAPRI. The best value among the top 10 ranked decoys was plotted. For the majority of the cases (49 out of the 58 targets), GNN-DOVE selected a decoy within an iRMSD of 4&#xa0;&#xc5; (one of the criteria for the acceptable quality level in CAPRI). This is in sharp contrast to the other methods (<xref ref-type="fig" rid="F4">Figure&#x20;4A</xref>), for which the iRMSDs of the selected decoys were larger (worse) than those of GNN-DOVE for many targets. In terms of iRMSD, the second best method was DOVE, which had 44 targets within an iRMSD of 4&#xa0;&#xc5;. A similar trend was observed for lRMSD: GNN-DOVE selected a decoy within an lRMSD of 10&#xa0;&#xc5; (one of the criteria for the acceptable quality level in CAPRI) for 50 targets, while the second best method, DOVE, did so for 45 targets. In terms of fnat (larger being more accurate), GNN-DOVE missed only 5 targets in selecting at least one model with an fnat over 0.1 (one of the criteria for the acceptable quality level in CAPRI). The plot shows that GNN-DOVE had a larger fnat value than the other existing methods for most of the targets, as indicated by the many data points below the diagonal&#x20;line.</p>
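<p>The three CAPRI thresholds cited in this paragraph can be combined into a single quality check. The function below is our simplified paraphrase of the published CAPRI criteria, not code from this study:</p>

```python
def capri_quality(irmsd, lrmsd, fnat):
    """Classify a docking decoy using (a simplified reading of) the CAPRI
    criteria: fnat combined with iRMSD/lRMSD thresholds in angstroms."""
    if fnat >= 0.5 and (irmsd <= 1.0 or lrmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (irmsd <= 2.0 or lrmsd <= 5.0):
        return "medium"
    if fnat >= 0.1 and (irmsd <= 4.0 or lrmsd <= 10.0):
        return "acceptable"
    return "incorrect"

# A decoy with iRMSD 2.54 A, lRMSD 2.93 A, fnat 0.551:
print(capri_quality(2.54, 2.93, 0.551))  # → medium
```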
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Comparison of iRMSD, lRMSD, and fnat. For each method, the best value among the top 10 scored decoys was plotted. <bold>(A)</bold> Comparison against all eight methods. <bold>(B)</bold> Comparison against DOVE.</p>
</caption>
<graphic xlink:href="fmolb-08-647915-g004.tif"/>
</fig>
<p>
<xref ref-type="fig" rid="F4">Figure&#x20;4B</xref> compares GNN-DOVE against DOVE. In terms of iRMSD, lRMSD, and fnat, GNN-DOVE outperformed DOVE for 26 targets (22 ties), 27 targets (20 ties), and 27 targets (17 ties), respectively. Overall, GNN-DOVE outperformed the eight existing methods on all three metrics.</p>
</sec>
<sec id="s3-2">
<title>T-SNE Analysis</title>
<p>To illustrate how GNN-DOVE classified decoys, we used t-SNE (<xref ref-type="bibr" rid="B35">Maaten and Hinton, 2008</xref>) to visualize GNN-DOVE&#x2019;s encoding of decoys in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>. t-SNE is a dimension-reduction method for visualizing similarities among high-dimensional data points. Since we employed four-fold cross validation, a plot is provided for each of the four testing sets. In all the plots, and particularly in Fold 3 and Fold 4, most of the acceptable decoys (black circles) were distinguished from incorrect ones (gray stars), which indicates the good representation and generalization ability of the graph neural network for this problem.</p>
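<p>A projection of this kind can be reproduced in outline as follows (a sketch using random stand-in vectors; the real inputs are the 128-dimensional decoy embeddings taken from the last fully connected layer of GNN-DOVE):</p>

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-ins for the 128-dimensional embeddings of correct/incorrect decoys.
correct = rng.normal(1.0, 0.3, size=(30, 128))
incorrect = rng.normal(-1.0, 0.3, size=(60, 128))
embeddings = np.vstack([correct, incorrect])

# Project to 2D for plotting; perplexity must stay below the sample count.
xy = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
print(xy.shape)  # (90, 2)
```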
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>t-SNE plots of decoy selection. Decoys from all the testing target complexes in the four folds of the cross validation are plotted, which in total include 580 correct decoys (black circles) and 5,591 incorrect decoys (gray stars). The encoded features of these decoys are taken from the output of the last fully connected layer of the GNN, which is a vector of 128 elements. To visualize the embeddings, we used t-SNE to project them into a 2D space. The four panels correspond to the embeddings of models in the four-fold testing&#x20;sets.</p>
</caption>
<graphic xlink:href="fmolb-08-647915-g005.tif"/>
</fig>
</sec>
<sec id="s3-3">
<title>Examples of Decoys for Comparison With DOVE</title>
<p>We mentioned above that a limitation of DOVE is its use of a fixed-size cube of 40<sup>3</sup>&#xa0;&#xc5;<sup>3</sup>, which cannot capture the entire interface region if the interface is too large to fit in the cube. Here, we show two examples of such cases, which led to misclassification by DOVE but correct classification by GNN-DOVE. In <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>, the interface region of a decoy is shown in blue and green, and the atoms that did not fit in the cube are shown in a sphere representation in&#x20;red.</p>
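<p>The geometry of this limitation is easy to quantify: given interface-atom coordinates, one can count how many fall outside a 40&#xa0;&#xc5; cube centered on the interface. The sketch below illustrates the idea only and is not DOVE&#x2019;s actual preprocessing:</p>

```python
import numpy as np

def atoms_outside_cube(coords, edge=40.0):
    """Count atoms outside an axis-aligned cube of the given edge length
    centered at the centroid of the interface atoms."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    inside = np.all(np.abs(coords - center) <= edge / 2.0, axis=1)
    return int((~inside).sum())

# Two atoms 10 A apart fit easily; two atoms 50 A apart both fall outside.
print(atoms_outside_cube([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]))  # → 0
print(atoms_outside_cube([[0.0, 0.0, 0.0], [50.0, 0.0, 0.0]]))  # → 2
```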
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Examples of decoys with an acceptable quality that were not selected within the top 10 by DOVE. The two docked subunits are shown in cyan and light brown, and the interface regions of the two subunits are presented in the stick representation in blue and green, respectively. Atoms missed by the input cube of DOVE are highlighted as red spheres. <bold>(A)</bold> A medium-quality decoy for 1bui. iRMSD: 2.54&#xa0;&#xc5;, lRMSD: 2.93&#xa0;&#xc5;, fnat: 0.551. <bold>(B)</bold> A medium-quality decoy for 1g20. iRMSD: 2.14&#xa0;&#xc5;, lRMSD: 3.86&#xa0;&#xc5;, fnat: 0.453.</p>
</caption>
<graphic xlink:href="fmolb-08-647915-g006.tif"/>
</fig>
<p>The first example (<xref ref-type="fig" rid="F6">Figure&#x20;6A</xref>) shows a decoy of a protein complex of plasminogen and staphylokinase (PDB ID: 1bui), which has an acceptable quality by the CAPRI criteria. For this decoy, 59 atoms (in red) out of the 1,022 atoms at the interface were not included in the cube. Because of this, it was ranked 65th out of 110 decoys by DOVE, while it was ranked 15th by GNN-DOVE. For this target, GNN-DOVE ranked five hits within the top 10 scoring decoys and eight hits within the top 20. In contrast, DOVE could not rank any hit within the top 20; its first hit appeared at the 35th&#x20;rank.</p>
<p>The second example (<xref ref-type="fig" rid="F6">Figure&#x20;6B</xref>) is an acceptable model for the nitrogenase complex (PDB ID: 1g20). As shown, many interface atoms, 497 out of 1,843, were outside the cube. DOVE ranked this decoy 28th, while GNN-DOVE ranked it 10th. DOVE had no hits within the top 10 and only one hit within the top 20. In contrast, GNN-DOVE was very successful for this target: all of its top 10 selections were correct models.</p>
</sec>
<sec id="s3-4">
<title>Performance on the Combined Dockground and ZDOCK Dataset</title>
<p>Next, we examined the performance of GNN-DOVE on the 19 complexes in the test set of the combined Dockground and ZDOCK dataset. In <xref ref-type="table" rid="T3">Table&#x20;3</xref>, we show the total number of hits within the top 10 ranks by GNN-DOVE and the same five existing methods, GOAP, ITScore, ZRANK, ZRANK2, and IRAD, that were used in <xref ref-type="fig" rid="F2">Figures 2</xref>&#x2013;<xref ref-type="fig" rid="F4">4</xref>. GNN-DOVE achieved the highest hit rate of 0.842, followed by ZRANK with 0.789. GNN-DOVE also consistently ranked at the top among the methods when the group hit rate was considered. We note that some of the existing methods performed perfectly for specific complexes, selecting 10 hits within the top 10, yet failed to select any hits for other target complexes. In contrast, GNN-DOVE showed the most stable performance across the different complexes.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Performance on the Dockground&#x2b;ZDOCK testing dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">ID</th>
<th align="center">GNN-DOVE</th>
<th align="center">GOAP</th>
<th align="center">ITScore</th>
<th align="center">ZRANK</th>
<th align="center">ZRANK2</th>
<th align="center">IRAD</th>
<th align="center">Total</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1AK4</td>
<td align="center">1</td>
<td align="center">10</td>
<td align="center">1</td>
<td align="center">1</td>
<td align="center">7</td>
<td align="center">0</td>
<td align="center">179</td>
</tr>
<tr>
<td align="left">1AY7</td>
<td align="center">8</td>
<td align="center">0</td>
<td align="center">3</td>
<td align="center">9</td>
<td align="center">8</td>
<td align="center">8</td>
<td align="center">176</td>
</tr>
<tr>
<td align="left">1EER</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">3</td>
<td align="center">0</td>
<td align="center">41</td>
</tr>
<tr>
<td align="left">1GLA</td>
<td align="center">5</td>
<td align="center">1</td>
<td align="center">0</td>
<td align="center">8</td>
<td align="center">4</td>
<td align="center">8</td>
<td align="center">165</td>
</tr>
<tr>
<td align="left">1HCF</td>
<td align="center">9</td>
<td align="center">0</td>
<td align="center">8</td>
<td align="center">3</td>
<td align="center">3</td>
<td align="center">7</td>
<td align="center">183</td>
</tr>
<tr>
<td align="left">1JIW</td>
<td align="center">3</td>
<td align="center">0</td>
<td align="center">2</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">2</td>
<td align="center">106</td>
</tr>
<tr>
<td align="left">1JTG</td>
<td align="center">8</td>
<td align="center">0</td>
<td align="center">10</td>
<td align="center">10</td>
<td align="center">0</td>
<td align="center">10</td>
<td align="center">177</td>
</tr>
<tr>
<td align="left">1KAC</td>
<td align="center">7</td>
<td align="center">0</td>
<td align="center">5</td>
<td align="center">8</td>
<td align="center">2</td>
<td align="center">6</td>
<td align="center">183</td>
</tr>
<tr>
<td align="left">1KTZ</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">3</td>
<td align="center">0</td>
<td align="center">77</td>
</tr>
<tr>
<td align="left">1MAH</td>
<td align="center">9</td>
<td align="center">0</td>
<td align="center">8</td>
<td align="center">9</td>
<td align="center">0</td>
<td align="center">9</td>
<td align="center">179</td>
</tr>
<tr>
<td align="left">2MTA</td>
<td align="center">7</td>
<td align="center">0</td>
<td align="center">4</td>
<td align="center">9</td>
<td align="center">0</td>
<td align="center">9</td>
<td align="center">186</td>
</tr>
<tr>
<td align="left">2VDB</td>
<td align="center">9</td>
<td align="center">1</td>
<td align="center">9</td>
<td align="center">7</td>
<td align="center">2</td>
<td align="center">6</td>
<td align="center">173</td>
</tr>
<tr>
<td align="left">3D5S</td>
<td align="center">7</td>
<td align="center">0</td>
<td align="center">10</td>
<td align="center">6</td>
<td align="center">1</td>
<td align="center">5</td>
<td align="center">156</td>
</tr>
<tr>
<td align="left">1BUH (1)</td>
<td align="center">3</td>
<td align="center">8</td>
<td align="center">9</td>
<td align="center">6</td>
<td align="center">4</td>
<td align="center">9</td>
<td align="center">183</td>
</tr>
<tr>
<td align="left">1FQ1 (1)</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">20</td>
</tr>
<tr>
<td align="left">1JWH (1)</td>
<td align="center">6</td>
<td align="center">6</td>
<td align="center">7</td>
<td align="center">6</td>
<td align="center">2</td>
<td align="center">8</td>
<td align="center">171</td>
</tr>
<tr>
<td align="left">2OZA (1)</td>
<td align="center">1</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">19</td>
</tr>
<tr>
<td align="left">1EFN (2)</td>
<td align="center">1</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">4</td>
<td align="center">3</td>
<td align="center">4</td>
<td align="center">130</td>
</tr>
<tr>
<td align="left">1GCQ (2)</td>
<td align="center">2</td>
<td align="center">9</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">8</td>
<td align="center">4</td>
<td align="center">142</td>
</tr>
<tr>
<td align="left">Hit rate</td>
<td align="center">0.842</td>
<td align="center">0.368</td>
<td align="center">0.684</td>
<td align="center">0.789</td>
<td align="center">0.737</td>
<td align="center">0.737</td>
<td align="center">&#x2014;</td>
</tr>
<tr>
<td align="left">Group HR</td>
<td align="center">0.867</td>
<td align="center">0.333</td>
<td align="center">0.717</td>
<td align="center">0.833</td>
<td align="center">0.767</td>
<td align="center">0.767</td>
<td align="center">&#x2014;</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>In the ID column, the number in parentheses indicates the similarity group to which the target belongs. Thus, four complexes belong to one similarity group, and two others belong to another group; the rest of the complexes form single-entry groups. Group HR indicates the group hit rate: the fraction of complexes within each group that have at least one hit (acceptable model) within the top 10 ranks was computed first and then averaged across all the groups. The Total column indicates the total number of acceptable docking models for a given target.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3-5">
<title>Performance on the CAPRI Scoring Dataset</title>
<p>Finally, we evaluated GNN-DOVE on another independent dataset, the CAPRI Score_set. This dataset was chosen because it allows GNN-DOVE to be compared with a larger number of existing methods, namely, those which participated in the corresponding CAPRI rounds. In <xref ref-type="table" rid="T4">Table&#x20;4</xref>, we show detailed results of GNN-DOVE and the other five methods for each target. For each method, the numbers of decoys within the quality categories of acceptable, medium, and high (in this order) among the top 10 models are listed.</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Performance on the CAPRI scoring dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">ID</th>
<th align="center">GNN-DOVE</th>
<th align="center">GOAP</th>
<th align="center">ITScore</th>
<th align="center">ZRANK</th>
<th align="center">ZRANK2</th>
<th align="center">IRAD</th>
<th align="center">Total</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">(T29)</td>
<td align="center">2/0/0</td>
<td align="center">1/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">2/2/0</td>
<td align="center">1/1/0</td>
<td align="center">167/78/2</td>
</tr>
<tr>
<td align="left">(T30)</td>
<td align="center">1/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">2/0/0</td>
</tr>
<tr>
<td align="left">T32</td>
<td align="center">0/0/0</td>
<td align="center">1/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">15/3/0</td>
</tr>
<tr>
<td align="left">T35</td>
<td align="center">1/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">3/0/0</td>
</tr>
<tr>
<td align="left">(T37)</td>
<td align="center">0/0/0</td>
<td align="center">1/0/0</td>
<td align="center">3/0/1</td>
<td align="center">1/0/0</td>
<td align="center">4/1/0</td>
<td align="center">4/1/0</td>
<td align="center">99/46/11</td>
</tr>
<tr>
<td align="left">T39</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">4/3/0</td>
</tr>
<tr>
<td align="left">(T40)</td>
<td align="center">4/4/0</td>
<td align="center">1/0/1</td>
<td align="center">7/3/4</td>
<td align="center">1/1/0</td>
<td align="center">9/8/1</td>
<td align="center">3/3/0</td>
<td align="center">588/206/193</td>
</tr>
<tr>
<td align="left">T41</td>
<td align="center">5/0/0</td>
<td align="center">4/2/2</td>
<td align="center">1/1/0</td>
<td align="center">4/0/0</td>
<td align="center">2/0/0</td>
<td align="center">3/0/0</td>
<td align="center">371/120/2</td>
</tr>
<tr>
<td align="left">T46</td>
<td align="center">1/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">5/0/0</td>
<td align="center">6/0/0</td>
<td align="center">6/0/0</td>
<td align="center">24/0/0</td>
</tr>
<tr>
<td align="left">T47</td>
<td align="center">9/4/5</td>
<td align="center">10/0/10</td>
<td align="center">2/1/0</td>
<td align="center">9/5/4</td>
<td align="center">9/3/5</td>
<td align="center">10/2/7</td>
<td align="center">611/307/278</td>
</tr>
<tr>
<td align="left">T50</td>
<td align="center">6/0/0</td>
<td align="center">0/0/0</td>
<td align="center">4/1/0</td>
<td align="center">0/0/0</td>
<td align="center">2/0/0</td>
<td align="center">2/0/0</td>
<td align="center">133/36/0</td>
</tr>
<tr>
<td align="left">T53</td>
<td align="center">2/2/0</td>
<td align="center">7/6/0</td>
<td align="center">3/0/0</td>
<td align="center">1/0/0</td>
<td align="center">7/3/0</td>
<td align="center">4/2/0</td>
<td align="center">130/17/0</td>
</tr>
<tr>
<td align="left">(T54)</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">0/0/0</td>
<td align="center">19/1/0</td>
</tr>
<tr>
<td align="left">Hit</td>
<td align="center">9/3/1</td>
<td align="center">7/2/3</td>
<td align="center">6/4/2</td>
<td align="center">6/2/1</td>
<td align="center">8/5/2</td>
<td align="center">8/5/1</td>
<td align="center">13/10/5</td>
</tr>
<tr>
<td align="left">Hit-NR</td>
<td align="center">6/2/1</td>
<td align="center">4/2/2</td>
<td align="center">4/3/0</td>
<td align="center">4/1/1</td>
<td align="center">5/2/1</td>
<td align="center">5/2/1</td>
<td align="center">8/7/2</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The IDs in parentheses are those which have structure or sequence similarity to one of the complexes used in training. Results for a complex by a method are given as three numbers separated by /. The first number is the number of decoys of acceptable or better quality selected within the top 10 ranked models. The second and third numbers are the number of models of medium or higher quality and the number of high-quality models, respectively. The numbers in the Total column indicate the total number of decoys in the three quality classifications in the decoy set of each target. The last two rows summarize the performance: the three numbers are the number of targets for which the method identified at least one acceptable- or higher-quality model, at least one medium- or higher-quality model, and at least one high-quality model, respectively. The Hit row lists the results when all 13 targets were considered; Hit-NR considers only the targets that are not in parentheses.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>GNN-DOVE had hits for the largest number of targets, nine, when decoys of acceptable or higher quality were considered. When decoys of medium or higher quality were considered, ITScore, ZRANK2, and IRAD had hits for five targets, while GNN-DOVE had hits for three targets. It is worth noting that GNN-DOVE successfully identified correct models in two difficult targets, T30 and T35, whose decoy sets contained only two and three acceptable models, respectively, while all the other methods failed to select any correct decoys among the top&#x20;10.</p>
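<p>The summary rows of <xref ref-type="table" rid="T4">Table&#x20;4</xref> follow mechanically from the per-target triples; for example, the GNN-DOVE column reduces as follows (counts transcribed from the table):</p>

```python
# Per-target (acceptable+, medium+, high) counts for GNN-DOVE in Table 4.
gnn_dove = [(2, 0, 0), (1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 0),
            (0, 0, 0), (4, 4, 0), (5, 0, 0), (1, 0, 0), (9, 4, 5),
            (6, 0, 0), (2, 2, 0), (0, 0, 0)]

# Number of targets with at least one model in each quality class.
summary = tuple(sum(1 for t in gnn_dove if t[i] > 0) for i in range(3))
print(summary)  # (9, 3, 1), matching the "Hit" row
```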
<p>In <xref ref-type="table" rid="T5">Table&#x20;5</xref>, we further compared GNN-DOVE with the top groups that participated in the model scoring task for the 13 CAPRI scoring targets. The results were taken from <xref ref-type="table" rid="T2">Table&#x20;2</xref> of the article by <xref ref-type="bibr" rid="B12">Geng et&#x20;al. (2020)</xref>. In total, 37 scoring groups submitted their scores during this challenge; among them, we list only the groups with five or more submitted targets. In addition to the CAPRI participants, the table also includes the latest protein docking evaluation approaches, iScore (<xref ref-type="bibr" rid="B12">Geng et&#x20;al., 2020</xref>) and GraphRank (<xref ref-type="bibr" rid="B12">Geng et&#x20;al., 2020</xref>).</p>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Ranking of GNN-DOVE among other scorer groups on the CAPRI scoring dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Group</th>
<th colspan="2" align="center">Performance</th>
<th rowspan="2" align="center">&#x23; Submitted targets</th>
</tr>
<tr>
<th align="center">All</th>
<th align="center">Nonredundant</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">iScore</td>
<td align="center">9/6/2</td>
<td align="center">6/5/1</td>
<td align="center">13 (8)</td>
</tr>
<tr>
<td align="left">GNN-DOVE</td>
<td align="center">9/3/1</td>
<td align="center">6/2/1</td>
<td align="center">13 (8)</td>
</tr>
<tr>
<td align="left">GraphRank</td>
<td align="center">8/4/1</td>
<td align="center">5/3/1</td>
<td align="center">13 (8)</td>
</tr>
<tr>
<td align="left">Bates</td>
<td align="center">8/4/1</td>
<td align="center">5/2/0</td>
<td align="center">10 (5)</td>
</tr>
<tr>
<td align="left">Bonvin</td>
<td align="center">8/3/2</td>
<td align="center">5/2/1</td>
<td align="center">9 (5)</td>
</tr>
<tr>
<td align="left">Weng</td>
<td align="center">8/2/3</td>
<td align="center">5/2/1</td>
<td align="center">9 (6)</td>
</tr>
<tr>
<td align="left">Zou</td>
<td align="center">7/1/4</td>
<td align="center">5/1/2</td>
<td align="center">9 (6)</td>
</tr>
<tr>
<td align="left">Wang</td>
<td align="center">6/3/2</td>
<td align="center">4/2/1</td>
<td align="center">6 (4)</td>
</tr>
<tr>
<td align="left">Fernandez-Recio</td>
<td align="center">5/3/2</td>
<td align="center">4/4/1</td>
<td align="center">8 (7)</td>
</tr>
<tr>
<td align="left">Elber</td>
<td align="center">5/1/1</td>
<td align="center">4/1/0</td>
<td align="center">5 (4)</td>
</tr>
<tr>
<td align="left">Wolfson</td>
<td align="center">4/0/1</td>
<td align="center">1/0/0</td>
<td align="center">5 (2)</td>
</tr>
<tr>
<td align="left">Camacho</td>
<td align="center">3/1/2</td>
<td align="center">1/1/1</td>
<td align="center">5 (2)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Results of the existing methods were taken from <xref ref-type="table" rid="T2">Table&#x20;2</xref> of the article by <xref ref-type="bibr" rid="B12">Geng et&#x20;al. (2020)</xref>. The numbers in the Nonredundant column consider only the targets in <xref ref-type="table" rid="T4">Table&#x20;4</xref> that are not in parentheses. The last column shows the number of targets, among the 13 listed in <xref ref-type="table" rid="T4">Table&#x20;4</xref>, for which each group submitted predictions. The numbers in parentheses report the number of submitted targets that do not have similarity to the training set we used (i.e.,&#x20;discarding the targets in parentheses in <xref ref-type="table" rid="T4">Table&#x20;4</xref>).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>GNN-DOVE tied with iScore when decoys of acceptable or higher quality were considered and performed second to iScore when medium- or higher-quality decoys were considered. In this list, except for GNN-DOVE, iScore, and GraphRank, all the other entries were human groups, which may have applied manual intervention based on expert knowledge. Thus, the results show that GNN-DOVE is highly competitive even against human experts.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>In this work, we developed GNN-DOVE, a graph neural network (GNN) for protein docking decoy selection. We used the gate-augmented attention mechanism to capture the atom interaction patterns at the interface region of protein docking models. The benchmark on the Dockground dataset demonstrated that GNN-DOVE outperformed DOVE as well as the other existing scoring functions compared. We further trained GNN-DOVE on a larger dataset and evaluated it on two more datasets, including the CAPRI Score_set, which confirmed the superior performance of GNN-DOVE over existing methods.</p>
<p>For assessing the quality of structure models, considering multi-body (atom or residue) interactions (<xref ref-type="bibr" rid="B14">Gniewek et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B24">Kim and Kihara, 2014</xref>; <xref ref-type="bibr" rid="B25">Kim and Kihara, 2016</xref>; <xref ref-type="bibr" rid="B39">Olechnovic and Venclovas, 2017</xref>) has proven to be an effective approach. GNNs consider patterns of multiatom interactions by representing the interactions as a graph structure. Since a graph is a natural representation of molecular structures, GNNs may be applied to various problems in structural bioinformatics and cheminformatics.</p>
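<p>As a minimal illustration of this graph representation (our sketch, not GNN-DOVE&#x2019;s actual featurization): atoms become nodes, and pairs of atoms within a distance cutoff become edges.</p>

```python
import numpy as np

def contact_graph(coords, cutoff=4.5):
    """Build an adjacency matrix in which atoms are nodes and two atoms
    are connected if they lie within `cutoff` angstroms of each other."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)

# Three atoms: the first two are in contact, the third is far away.
adj = contact_graph([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
print(adj.astype(int))
```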
<p>The performance of GNN-DOVE would likely be improved by considering other physicochemical properties of atoms, such as atom-wise binding energies, as well as the sequence conservation of residues, which can be computed from a multiple sequence alignment of homologous proteins. Application to multichain complexes remains a potential direction for future&#x20;work.</p>
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The datasets presented in this study can be found in online repositories. The Dockground docking dataset was downloaded from the Dockground database (<ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://dockground.compbio.ku.edu">http://dockground.compbio.ku.edu</ext-link>) at the link <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://dockground.compbio.ku.edu/downloads/unbound/decoy/decoys1.0.zip">http://dockground.compbio.ku.edu/downloads/unbound/decoy/decoys1.0.zip</ext-link>. The ZDOCK dataset was downloaded from the ZDOCK decoy sets (<ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://zlab.umassmed.edu/zdock/decoys.shtml">https://zlab.umassmed.edu/zdock/decoys.shtml</ext-link>) at the link <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://zlab.umassmed.edu/zdock/decoys_bm4_zd3.0.2_6deg.tar.gz">https://zlab.umassmed.edu/zdock/decoys_bm4_zd3.0.2_6deg.tar.gz</ext-link>. The CAPRI score set was downloaded from <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://cb.iri.univ-lille1.fr/Users/lensink/Score_set">http://cb.iri.univ-lille1.fr/Users/lensink/Score_set</ext-link>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>XW and STF conceived the initial version of the study. XW and DK designed this work in the current form. XW developed the codes in communication with STF. XW performed the computation, and XW and DK analyzed the results. XW wrote the initial draft of the manuscript, and DK critically edited&#x20;it. XW, STF, and DK edited the manuscript in the revision.</p>
</sec>
<sec id="s7">
<title>Funding</title>
<p>We declare that all the sources of funding received for this&#x20;research have been submitted. This work was partly supported by the National Institutes of Health (R01GM133840 and R01GM123055) and the National Science Foundation (DMS1614777, CMMI1825941, MCB1925643, and DBI2003635).</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>The authors are grateful to Jacob Verburgt for proofreading the manuscript, and Sai Raghavendra Maddhuri Venkata Subramaniya and Aashish Jain for testing the GNN-DOVE code on GitHub.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aderinwale</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Christoffer</surname>
<given-names>C. W.</given-names>
</name>
<name>
<surname>Sarkar</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Alnabati</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Computational Structure Modeling for Diverse Categories of Macromolecular Interactions</article-title>. <source>Curr. Opin. Struct. Biol.</source> <volume>64</volume>, <fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1016/j.sbi.2020.05.017</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akbal-Delibas</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Farhoodi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Pomplun</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Haspel</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Accurate Refinement of Docked Protein Complexes Using Evolutionary Information and Deep Learning</article-title>. <source>J.&#x20;Bioinform. Comput. Biol.</source> <volume>14</volume>, <fpage>1642002</fpage>. <pub-id pub-id-type="doi">10.1142/s0219720016420026</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alam</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Goldstein</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Porter</surname>
<given-names>K. A.</given-names>
</name>
<name>
<surname>Kozakov</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Schueler-Furman</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>High-resolution Global Peptide-Protein Docking Using Fragments-Based PIPER-FlexPepDock</article-title>. <source>PLoS Comput. Biol.</source> <volume>13</volume>, <fpage>e1005905</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005905</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Anishchenko</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Kundrotas</surname>
<given-names>P. J.</given-names>
</name>
<name>
<surname>Tuzikov</surname>
<given-names>A. V.</given-names>
</name>
<name>
<surname>Vakser</surname>
<given-names>I. A.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Structural Templates for Comparative Protein Docking</article-title>. <source>Proteins</source> <volume>83</volume>, <fpage>1563</fpage>&#x2013;<lpage>1570</lpage>. <pub-id pub-id-type="doi">10.1002/prot.24736</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Degiacomi</surname>
<given-names>M. T.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Coupling Molecular Dynamics and Deep Learning to Mine Protein Conformational Space</article-title>. <source>Structure</source> <volume>27</volume>, <fpage>1034</fpage>&#x2013;<lpage>1040</lpage>. <comment>e1033</comment>. <pub-id pub-id-type="doi">10.1016/j.str.2019.03.018</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Duvenaud</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Maclaurin</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Aguilera-Iparraguirre</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>G&#x00F3;mez-Bombarelli</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Hirzel</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Aspuru-Guzik</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). &#x201c;<article-title>Convolutional Networks on Graphs for Learning Molecular Fingerprints</article-title>,&#x201d; in <source>Advances in Neural Information Processing Systems</source>, <fpage>2224</fpage>&#x2013;<lpage>2232</lpage>.</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Esquivel&#x2010;Rodr&#xed;guez</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y. D.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Multi-LZerD: Multiple Protein Docking for Asymmetric Complexes</article-title>. <source>Proteins</source> <volume>80</volume>, <fpage>1818</fpage>&#x2013;<lpage>1833</lpage>. <pub-id pub-id-type="doi">10.1002/prot.24079</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Esquivel-Rodr&#xed;guez</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Fitting Multimeric Protein Complexes into Electron Microscopy Maps Using 3D Zernike Descriptors</article-title>. <source>J.&#x20;Phys. Chem. B</source>. <volume>116</volume>, <fpage>6854</fpage>&#x2013;<lpage>6861</lpage>. <pub-id pub-id-type="doi">10.1021/jp212612t</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fink</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hochrein</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wolowski</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Merkl</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Gronwald</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>PROCOS: Computational Analysis of Protein-Protein Complexes</article-title>. <source>J.&#x20;Comput. Chem.</source> <volume>32</volume>, <fpage>2575</fpage>&#x2013;<lpage>2586</lpage>. <pub-id pub-id-type="doi">10.1002/jcc.21837</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fischer</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>S. L.</given-names>
</name>
<name>
<surname>Wolfson</surname>
<given-names>H. J.</given-names>
</name>
<name>
<surname>Nussinov</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>A Geometry-Based Suite of Molecular Docking Processes</article-title>. <source>J.&#x20;Mol. Biol.</source> <volume>248</volume>, <fpage>459</fpage>&#x2013;<lpage>477</lpage>. <pub-id pub-id-type="doi">10.1016/s0022-2836(95)80063-8</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gainza</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Sverrisson</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Monti</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rodol&#xe0;</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Boscaini</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bronstein</surname>
<given-names>M. M.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Deciphering Interaction Fingerprints from Protein Molecular Surfaces Using Geometric Deep Learning</article-title>. <source>Nat. Methods</source> <volume>17</volume>, <fpage>184</fpage>&#x2013;<lpage>192</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-019-0666-6</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Geng</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Jung</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Renaud</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Honavar</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Bonvin</surname>
<given-names>A. M. J.&#x20;J.</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>L. C.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>iScore: a Novel Graph Kernel-Based Function for Scoring Protein-Protein Docking Models</article-title>. <source>Bioinformatics</source> <volume>36</volume>, <fpage>112</fpage>&#x2013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz496</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Glorot</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2010</year>). &#x201c;<article-title>Understanding the Difficulty of Training Deep Feedforward Neural Networks</article-title>,&#x201d; in <source>Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics</source>, <fpage>249</fpage>&#x2013;<lpage>256</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gniewek</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Leelananda</surname>
<given-names>S. P.</given-names>
</name>
<name>
<surname>Kolinski</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Jernigan</surname>
<given-names>R. L.</given-names>
</name>
<name>
<surname>Kloczkowski</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Multibody Coarse-Grained Potentials for Native Structure Recognition and Quality Assessment of Protein Models</article-title>. <source>Proteins</source> <volume>79</volume>, <fpage>1923</fpage>&#x2013;<lpage>1929</lpage>. <pub-id pub-id-type="doi">10.1002/prot.23015</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Courville</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Deep Learning</source>. <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gray</surname>
<given-names>J.&#x20;J.</given-names>
</name>
<name>
<surname>Moughon</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Schueler-Furman</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Kuhlman</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Rohl</surname>
<given-names>C. A.</given-names>
</name>
<etal/>
</person-group> (<year>2003</year>). <article-title>Protein-Protein Docking with Simultaneous Optimization of Rigid-Body Displacement and Side-Chain Conformations</article-title>. <source>J.&#x20;Mol. Biol.</source> <volume>331</volume>, <fpage>281</fpage>&#x2013;<lpage>299</lpage>. <pub-id pub-id-type="doi">10.1016/s0022-2836(03)00670-3</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Deep Residual Learning for Image Recognition</article-title>,&#x201d; in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>, <fpage>770</fpage>&#x2013;<lpage>778</lpage>. <pub-id pub-id-type="doi">10.1109/cvpr.2016.90</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>S.-Y.</given-names>
</name>
<name>
<surname>Zou</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>An Iterative Knowledge-Based Scoring Function for Protein-Protein Recognition</article-title>. <source>Proteins</source> <volume>72</volume>, <fpage>557</fpage>&#x2013;<lpage>579</lpage>. <pub-id pub-id-type="doi">10.1002/prot.21949</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hwang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Vreven</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Janin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Protein-protein Docking Benchmark Version 4.0</article-title>. <source>Proteins</source> <volume>78</volume>, <fpage>3111</fpage>&#x2013;<lpage>3114</lpage>. <pub-id pub-id-type="doi">10.1002/prot.22830</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Janin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Henrick</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Moult</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Eyck</surname>
<given-names>L. T.</given-names>
</name>
<name>
<surname>Sternberg</surname>
<given-names>M. J.&#x20;E.</given-names>
</name>
<name>
<surname>Vajda</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2003</year>). <article-title>CAPRI: a Critical Assessment of Predicted Interactions</article-title>. <source>Proteins</source> <volume>52</volume>, <fpage>2</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1002/prot.10381</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Janin</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>The Targets of CAPRI Rounds 13-19</article-title>. <source>Proteins</source> <volume>78</volume>, <fpage>3067</fpage>&#x2013;<lpage>3072</lpage>. <pub-id pub-id-type="doi">10.1002/prot.22774</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Janin</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>The Targets of CAPRI Rounds 20-27</article-title>. <source>Proteins</source> <volume>81</volume>, <fpage>2075</fpage>&#x2013;<lpage>2081</lpage>. <pub-id pub-id-type="doi">10.1002/prot.24375</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Katchalski-Katzir</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Shariv</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Eisenstein</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Friesem</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Aflalo</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Vakser</surname>
<given-names>I. A.</given-names>
</name>
</person-group> (<year>1992</year>). <article-title>Molecular Surface Recognition: Determination of Geometric Fit between Proteins and Their Ligands by Correlation Techniques</article-title>. <source>Proc. Natl. Acad. Sci. USA</source> <volume>89</volume>, <fpage>2195</fpage>&#x2013;<lpage>2199</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.89.6.2195</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Detecting Local Residue Environment Similarity for Recognizing Near-Native Structure Models</article-title>. <source>Proteins</source> <volume>82</volume>, <fpage>3255</fpage>&#x2013;<lpage>3272</lpage>. <pub-id pub-id-type="doi">10.1002/prot.24658</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Protein Structure Prediction Using Residue- and Fragment-Environment Potentials in CASP11</article-title>. <source>Proteins</source> <volume>84</volume> (<issue>Suppl. 1</issue>), <fpage>105</fpage>&#x2013;<lpage>117</lpage>. <pub-id pub-id-type="doi">10.1002/prot.24920</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kingma</surname>
<given-names>D. P.</given-names>
</name>
<name>
<surname>Ba</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Adam: A Method for Stochastic Optimization</article-title>,&#x201d; in <source>Proceedings of the International Conference on Learning Representations</source>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kingsley</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Esquivel-Rodr&#xed;guez</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lill</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Ranking Protein-Protein Docking Results Using Steered Molecular Dynamics and Potential of Mean Force Calculations</article-title>. <source>J.&#x20;Comput. Chem.</source> <volume>37</volume>, <fpage>1861</fpage>&#x2013;<lpage>1865</lpage>. <pub-id pub-id-type="doi">10.1002/jcc.24412</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kurcinski</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Badaczewska&#x2010;Dawid</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kolinski</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kolinski</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kmiecik</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Flexible Docking of Peptides to Proteins Using CABS&#x2010;dock</article-title>. <source>Protein Sci.</source> <volume>29</volume>, <fpage>211</fpage>&#x2013;<lpage>222</lpage>. <pub-id pub-id-type="doi">10.1002/pro.3771</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kurcinski</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Jamroz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Blaszczyk</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kolinski</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kmiecik</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>CABS-dock Web Server for the Flexible Docking of Peptides to Proteins without Prior Knowledge of the Binding Site</article-title>. <source>Nucleic Acids Res.</source> <volume>43</volume>, <fpage>W419</fpage>&#x2013;<lpage>W424</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv456</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lensink</surname>
<given-names>M. F.</given-names>
</name>
<name>
<surname>Velankar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Baek</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Heo</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Seok</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wodak</surname>
<given-names>S. J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>The Challenge of Modeling Protein Assemblies: the CASP12-CAPRI Experiment</article-title>. <source>Proteins</source> <volume>86</volume>, <fpage>257</fpage>&#x2013;<lpage>273</lpage>. <pub-id pub-id-type="doi">10.1002/prot.25419</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lensink</surname>
<given-names>M. F.</given-names>
</name>
<name>
<surname>Wodak</surname>
<given-names>S. J.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Score_set: a CAPRI Benchmark for Scoring Protein Complexes</article-title>. <source>Proteins</source> <volume>82</volume>, <fpage>3163</fpage>&#x2013;<lpage>3169</lpage>. <pub-id pub-id-type="doi">10.1002/prot.24678</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ryu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Choe</surname>
<given-names>Y. J.</given-names>
</name>
<name>
<surname>Ham</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>W. Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Predicting Drug-Target Interaction Using a Novel Graph Neural Network with 3D Structure-Embedded Graph Representation</article-title>. <source>J.&#x20;Chem. Inf. Model.</source> <volume>59</volume>, <fpage>3981</fpage>&#x2013;<lpage>3988</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.9b00387</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Vakser</surname>
<given-names>I. A.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>DOCKGROUND Protein-Protein Docking Decoy Set</article-title>. <source>Bioinformatics</source> <volume>24</volume>, <fpage>2634</fpage>&#x2013;<lpage>2635</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btn497</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Skolnick</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Development of Unified Statistical Potentials Describing Protein-Protein Interactions</article-title>. <source>Biophysical J.</source> <volume>84</volume>, <fpage>1895</fpage>&#x2013;<lpage>1901</lpage>. <pub-id pub-id-type="doi">10.1016/s0006-3495(03)74997-2</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van der Maaten</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Visualizing Data Using t-SNE</article-title>. <source>J.&#x20;Machine Learn. Res.</source> <volume>9</volume>, <fpage>2579</fpage>&#x2013;<lpage>2605</lpage>. </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moal</surname>
<given-names>I. H.</given-names>
</name>
<name>
<surname>Bates</surname>
<given-names>P. A.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>SwarmDock and the Use of Normal Modes in Protein-Protein Docking</article-title>. <source>Int. J. Mol. Sci.</source> <volume>11</volume>, <fpage>3623</fpage>&#x2013;<lpage>3648</lpage>. <pub-id pub-id-type="doi">10.3390/ijms11103623</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moal</surname>
<given-names>I. H.</given-names>
</name>
<name>
<surname>Torchala</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bates</surname>
<given-names>P. A.</given-names>
</name>
<name>
<surname>Fern&#xe1;ndez-Recio</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>The Scoring of Poses in Protein-Protein Docking: Current Capabilities and Future Directions</article-title>. <source>BMC Bioinformatics</source> <volume>14</volume>, <fpage>286</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-14-286</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Nadaradjane</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Guerois</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Andreani</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Protein-Protein Docking Using Evolutionary Information</article-title>,&#x201d; in <source>Protein Complex Assembly</source> (<publisher-name>Springer</publisher-name>), <fpage>429</fpage>&#x2013;<lpage>447</lpage>. <pub-id pub-id-type="doi">10.1007/978-1-4939-7759-8_28</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Olechnovic</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Venclovas</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>VoroMQA: Assessment of Protein Structure Quality Using Interatomic Contact Areas</article-title>. <source>Proteins</source> <volume>85</volume>, <fpage>1131</fpage>&#x2013;<lpage>1145</lpage>. <pub-id pub-id-type="doi">10.1002/prot.25278</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oliwa</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>cNMA: a Framework of Encounter Complex-Based Normal Mode Analysis to Model Conformational Changes in Protein Interactions</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>i151</fpage>&#x2013;<lpage>i160</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv252</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Padhorny</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kazennov</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Zerbe</surname>
<given-names>B. S.</given-names>
</name>
<name>
<surname>Porter</surname>
<given-names>K. A.</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Mottarella</surname>
<given-names>S. E.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Protein-protein Docking by Fast Generalized Fourier Transforms on 5D Rotational Manifolds</article-title>. <source>Proc. Natl. Acad. Sci. USA</source> <volume>113</volume>, <fpage>E4286</fpage>&#x2013;<lpage>E4293</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1603929113</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peterson</surname>
<given-names>L. X.</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>W.-H.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2018a</year>). <article-title>Improved Performance in CAPRI Round 37 Using LZerD Docking and Template-Based Modeling with Combined Scoring Functions</article-title>. <source>Proteins</source> <volume>86</volume>, <fpage>311</fpage>&#x2013;<lpage>320</lpage>. <pub-id pub-id-type="doi">10.1002/prot.25376</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peterson</surname>
<given-names>L. X.</given-names>
</name>
<name>
<surname>Togawa</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Esquivel-Rodriguez</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Terashi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Christoffer</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Roy</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2018b</year>). <article-title>Modeling the Assembly Order of Multimeric Heteroprotein Complexes</article-title>. <source>PLoS Comput. Biol.</source> <volume>14</volume>, <fpage>e1005937</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005937</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peterson</surname>
<given-names>L. X.</given-names>
</name>
<name>
<surname>Roy</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Christoffer</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Terashi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Modeling Disordered Protein Interactions from Biophysical Principles</article-title>. <source>PLoS Comput. Biol.</source> <volume>13</volume>, <fpage>e1005485</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005485</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pierce</surname>
<given-names>B. G.</given-names>
</name>
<name>
<surname>Hourai</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Accelerating Protein Docking in ZDOCK Using an Advanced 3D Convolution Library</article-title>. <source>PloS one</source> <volume>6</volume>, <fpage>e24657</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0024657</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pierce</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>A Combination of Rescoring and Refinement Significantly Improves Protein Docking Performance</article-title>. <source>Proteins</source> <volume>72</volume>, <fpage>270</fpage>&#x2013;<lpage>279</lpage>. <pub-id pub-id-type="doi">10.1002/prot.21920</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pierce</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>ZRANK: Reranking Protein Docking Predictions with an Optimized Energy Function</article-title>. <source>Proteins</source> <volume>67</volume>, <fpage>1078</fpage>&#x2013;<lpage>1086</lpage>. <pub-id pub-id-type="doi">10.1002/prot.21373</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ritchie</surname>
<given-names>D. W.</given-names>
</name>
<name>
<surname>Grudinin</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Spherical Polar Fourier Assembly of Protein Complexes with Arbitrary Point Group Symmetry</article-title>. <source>J.&#x20;Appl. Cryst.</source> <volume>49</volume>, <fpage>158</fpage>&#x2013;<lpage>167</lpage>. <pub-id pub-id-type="doi">10.1107/s1600576715022931</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Scarselli</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Gori</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tsoi</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Hagenbuchner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Monfardini</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>The Graph Neural Network Model</article-title>. <source>IEEE Trans. Neural Netw.</source> <volume>20</volume>, <fpage>61</fpage>&#x2013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1109/TNN.2008.2005605</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schneidman&#x2010;Duhovny</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Inbar</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Nussinov</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wolfson</surname>
<given-names>H. J.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Geometry-Based Flexible and Symmetric Protein Docking</article-title>. <source>Proteins</source> <volume>60</volume>, <fpage>224</fpage>&#x2013;<lpage>231</lpage>. <pub-id pub-id-type="doi">10.1002/prot.20562</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>J.&#x20;S.</given-names>
</name>
<name>
<surname>Isayev</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Roitberg</surname>
<given-names>A. E.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>ANI-1: an Extensible Neural Network Potential with DFT Accuracy at Force Field Computational Cost</article-title>. <source>Chem. Sci.</source> <volume>8</volume>, <fpage>3192</fpage>&#x2013;<lpage>3203</lpage>. <pub-id pub-id-type="doi">10.1039/c6sc05720a</pub-id> </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Srivastava</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Krizhevsky</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Salakhutdinov</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Dropout: a Simple Way to Prevent Neural Networks from Overfitting</article-title>. <source>J.&#x20;Machine Learn. Res.</source> <volume>15</volume>, <fpage>1929</fpage>&#x2013;<lpage>1958</lpage>. </citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Torng</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>R. B.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Graph Convolutional Neural Networks for Predicting Drug-Target Interactions</article-title>. <source>J.&#x20;Chem. Inf. Model.</source> <volume>59</volume>, <fpage>4131</fpage>&#x2013;<lpage>4149</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.9b00628</pub-id> </citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tovchigrechko</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vakser</surname>
<given-names>I. A.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Development and Testing of an Automated Approach to Protein Docking</article-title>. <source>Proteins</source> <volume>60</volume>, <fpage>296</fpage>&#x2013;<lpage>301</lpage>. <pub-id pub-id-type="doi">10.1002/prot.20573</pub-id> </citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tuncbag</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Gursoy</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Nussinov</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Keskin</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Predicting Protein-Protein Interactions on a Proteome Scale by Matching Evolutionary and Structural Similarities at Interfaces Using PRISM</article-title>. <source>Nat. Protoc.</source> <volume>6</volume>, <fpage>1341</fpage>&#x2013;<lpage>1354</lpage>. <pub-id pub-id-type="doi">10.1038/nprot.2011.367</pub-id> </citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Zundert</surname>
<given-names>G. C. P.</given-names>
</name>
<name>
<surname>Melquiond</surname>
<given-names>A. S. J.</given-names>
</name>
<name>
<surname>Bonvin</surname>
<given-names>A. M. J.&#x20;J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Integrative Modeling of Biomolecular Complexes: HADDOCKing with Cryo-Electron Microscopy Data</article-title>. <source>Structure</source> <volume>23</volume>, <fpage>949</fpage>&#x2013;<lpage>960</lpage>. <pub-id pub-id-type="doi">10.1016/j.str.2015.03.014</pub-id> </citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Venkatraman</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y. D.</given-names>
</name>
<name>
<surname>Sael</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Protein-protein Docking Using Region-Based 3D Zernike Descriptors</article-title>. <source>BMC Bioinformatics</source> <volume>10</volume>, <fpage>407</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-10-407</pub-id> </citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vreven</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Hwang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Integrating Atom-Based and Residue-Based Scoring Functions for Protein-Protein Docking</article-title>. <source>Protein Sci.</source> <volume>20</volume>, <fpage>1576</fpage>&#x2013;<lpage>1586</lpage>. <pub-id pub-id-type="doi">10.1002/pro.687</pub-id> </citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Terashi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Christoffer</surname>
<given-names>C. W.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Protein Docking Model Evaluation by 3D Deep Convolutional Neural Networks</article-title>. <source>Bioinformatics</source> <volume>36</volume>, <fpage>2113</fpage>&#x2013;<lpage>2118</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz870</pub-id> </citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Terashi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Christoffer</surname>
<given-names>C. W.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Protein Docking Model Evaluation by 3D Deep Convolutional Neural Networks</article-title>. <source>Bioinformatics</source> <volume>36</volume>, <fpage>2113</fpage>&#x2013;<lpage>2118</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz870</pub-id> </citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Long</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>P. S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Comprehensive Survey on Graph Neural Networks</article-title>. <source>IEEE Trans. Neural Networks Learn. Syst.</source> <pub-id pub-id-type="doi">10.1109/TNNLS.2020.2978386</pub-id> </citation>
</ref>
<ref id="B63">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>King</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Yeung</surname>
<given-names>D.-Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs</article-title>. <conf-name>34th Conference on Uncertainty in Artificial Intelligence 2018</conf-name>, <conf-loc>Monterey, CA</conf-loc>. <publisher-loc>Arlington, VA</publisher-loc>: <publisher-name>AUAI Press</publisher-name>.</citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Skolnick</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Scoring Function for Automated Assessment of Protein Structure Template Quality</article-title>. <source>Proteins</source> <volume>57</volume>, <fpage>702</fpage>&#x2013;<lpage>710</lpage>. <pub-id pub-id-type="doi">10.1002/prot.20264</pub-id> </citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Skolnick</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>GOAP: a Generalized Orientation-dependent, All-Atom Statistical Potential for Protein Structure Prediction</article-title>. <source>Biophys. J.</source> <volume>101</volume>, <fpage>2043</fpage>&#x2013;<lpage>2052</lpage>. <pub-id pub-id-type="doi">10.1016/j.bpj.2011.09.012</pub-id> </citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zubatyuk</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>J.&#x20;S.</given-names>
</name>
<name>
<surname>Leszczynski</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Isayev</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Accurate and Transferable Multitask Prediction of Chemical Properties with an Atoms-In-Molecules Neural Network</article-title>. <source>Sci. Adv.</source> <volume>5</volume>, <fpage>eaav6490</fpage>. <pub-id pub-id-type="doi">10.1126/sciadv.aav6490</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>