<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Immunol.</journal-id>
<journal-title>Frontiers in Immunology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Immunol.</abbrev-journal-title>
<issn pub-type="epub">1664-3224</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fimmu.2023.1108303</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Immunology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>PepSim: T-cell cross-reactivity prediction via comparison of peptide sequence and peptide-HLA structure</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Hall-Swan</surname>
<given-names>Sarah</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1885455"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Slone</surname>
<given-names>Jared</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rigo</surname>
<given-names>Mauricio M.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1307290"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Antunes</surname>
<given-names>Dinler A.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/420724"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Liz&#xe9;e</surname>
<given-names>Gregory</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/473525"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Kavraki</surname>
<given-names>Lydia E.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/136640"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Computer Science, Rice University</institution>, <addr-line>Houston, TX</addr-line>, <country>United States</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Biology and Biochemistry, University of Houston</institution>, <addr-line>Houston, TX</addr-line>, <country>United States</country>
</aff>
<aff id="aff3">
<sup>3</sup>
<institution>Department of Melanoma Medical Oncology, University of Texas MD Anderson Cancer Center</institution>, <addr-line>Houston, TX</addr-line>, <country>United States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Lawrence J. Stern, University of Massachusetts Medical School, United States</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Gustavo Fioravanti Vieira, Universidade La Salle Canoas, Brazil; Thomas Schmitt, Fred Hutchinson Cancer Research Center, United States</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Lydia E. Kavraki, <email xlink:href="mailto:kavraki@rice.edu">kavraki@rice.edu</email>
</p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to T Cell Biology, a section of the journal Frontiers in Immunology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>04</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1108303</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>11</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>04</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Hall-Swan, Slone, Rigo, Antunes, Liz&#xe9;e and Kavraki</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Hall-Swan, Slone, Rigo, Antunes, Liz&#xe9;e and Kavraki</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>Peptide-HLA class I (pHLA) complexes on the surface of tumor cells can be targeted by cytotoxic T-cells to eliminate tumors, and this is one of the bases for T-cell-based immunotherapies. However, there exist cases where therapeutic T-cells directed towards tumor pHLA complexes may also recognize pHLAs from healthy normal cells. The process where the same T-cell clone recognizes more than one pHLA is referred to as T-cell cross-reactivity and this process is driven mainly by features that make pHLAs similar to each other. T-cell cross-reactivity prediction is critical for designing T-cell-based cancer immunotherapies that are both effective and safe.</p>
</sec>
<sec>
<title>Methods</title>
<p>Here we present PepSim, a novel score to predict T-cell cross-reactivity based on the structural and biochemical similarity of pHLAs.</p>
</sec>
<sec>
<title>Results and discussion</title>
<p>We show that our method can accurately separate cross-reactive from non-cross-reactive pHLAs in a diverse set of datasets including cancer, viral, and self-peptides. PepSim can be generalized to work on any dataset of class I peptide-HLAs and is freely available as a web server at pepsim.kavrakilab.org.</p>
</sec>
</abstract>
<kwd-group>
<kwd>T-cell cross-reactivity</kwd>
<kwd>peptide-HLA</kwd>
<kwd>immunotherapy</kwd>
<kwd>structure comparison</kwd>
<kwd>sequence similarity</kwd>
</kwd-group>
<contract-sponsor id="cn001">U.S. National Library of Medicine<named-content content-type="fundref-id">10.13039/100000092</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">National Institutes of Health<named-content content-type="fundref-id">10.13039/100000002</named-content>
</contract-sponsor>
<counts>
<fig-count count="6"/>
<table-count count="3"/>
<equation-count count="2"/>
<ref-count count="39"/>
<page-count count="13"/>
<word-count count="6571"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>T Cell Biology</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<title>Introduction</title>
<p>The cellular immune response is a vital part of our defense mechanism against various diseases, including viral infection and cancer. As part of this immune response, cytotoxic T-cells are specialized to defend against specific diseases by pinpointing and eliminating infected cells. This occurs <italic>via</italic> interaction of the T-cell receptor (TCR) with the peptide-human leukocyte antigen class I (pHLA) complex on the surface of the target cells (<xref ref-type="bibr" rid="B1">1</xref>). The pHLA is formed when HLA receptors bind to peptides inside the cell and display them on the cell surface. TCRs are specialized to recognize pHLAs when the peptide that is being presented is not normally produced by the cell, which means that the T-cell can respond against non-self peptides.</p>
<p>An intrinsic feature of T-cells is cross-reactivity, which refers to the natural ability of a TCR to recognize more than one pHLA (<xref ref-type="bibr" rid="B2">2</xref>). In the context of a viral infection, cross-reactivity allows for a broader response of a single T-cell against multiple viral targets (e.g., variants of the same virus or related viruses) (<xref ref-type="bibr" rid="B3">3</xref>). However, the broader specificity caused by cross-reactivity can be the source of dangerous off-target toxicity in the context of cancer immunotherapy (e.g., T-cell-based immunotherapy) (<xref ref-type="bibr" rid="B4">4</xref>). One method of immunotherapy is adoptive T-cell transfer, where a large number of tumor-specific T-cells are delivered to the patient to amplify the immune response against the tumor. Because of cross-reactivity, there exist cases where therapeutic T-cells directed towards specific tumor pHLA complexes may also recognize self-peptide-HLAs, causing autoimmune side effects (<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B6">6</xref>). Therefore, preventing T-cell cross-reactivity in these cases is critical for designing T-cell-based cancer immunotherapies that are both effective and safe.</p>
<p>T-cell cross-reactivity is driven by the similarity between pHLAs, but the nature of that similarity is not fully defined. Previous studies have shown that peptide sequence similarity is not sufficient in all cases to predict T-cell cross-reactivity and highlighted the importance of pHLA structure and biochemical properties such as electrostatic potential (<xref ref-type="bibr" rid="B7">7</xref>&#x2013;<xref ref-type="bibr" rid="B9">9</xref>). Previous computational works on T-cell cross-reactivity define similarity in different ways. For example, one method defines peptide sequence similarity as the number of identical amino acids at each position in the peptide (<xref ref-type="bibr" rid="B10">10</xref>). Because each position in a peptide is not equally important to TCR-pHLA binding, the authors also examined the experimentally determined structure of TCR-pHLA complexes available in the Protein Data Bank (<xref ref-type="bibr" rid="B11">11</xref>) to determine the positions of the peptide that are in contact with the TCR. Those positions that are in contact with the TCR are deemed &#x201c;important&#x201d; and therefore considered in the calculation of sequence similarity. A similar method is employed by JanusMatrix (<xref ref-type="bibr" rid="B12">12</xref>), a part of EpiVax&#x2019;s proprietary immunogenicity screening kit. Focusing on what they define as the &#x201c;TCR facing residues&#x201d;, the authors define the similarity between peptides as identical amino acids. Additional methods of predicting T-cell cross-reactivity include RACER, a method of predicting TCR-pHLA binding affinity using supervised machine learning techniques (<xref ref-type="bibr" rid="B13">13</xref>). Also, iCrossR and Expitope use transcript and tissue abundance levels of peptide sequences to predict the likelihood of off-target toxicity (<xref ref-type="bibr" rid="B14">14</xref>, <xref ref-type="bibr" rid="B15">15</xref>). Expitope 2.0 is available as a web server. 
Finally, a method developed by Antunes et&#xa0;al. (<xref ref-type="bibr" rid="B7">7</xref>) and later optimized by Mendes et&#xa0;al. (<xref ref-type="bibr" rid="B8">8</xref>) implicitly accounted for both structural information and biochemical features through the analysis of 2D images of the TCR-interacting surfaces of pHLAs (<xref ref-type="bibr" rid="B7">7</xref>, <xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>In this paper, we present PepSim, a novel computational method for calculating the similarity between pHLAs to predict T-cell cross-reactivity. Our method calculates a similarity score based on peptide sequence and 3D structural information. We focus the structural analysis on the region of the pHLA that interacts with the TCR, specifically analyzing the pHLA surface. We show that our score can differentiate between cross-reactive and non-cross-reactive pHLAs with high accuracy using five datasets of viral, cancer, or self-peptides. Each dataset includes peptides that were experimentally determined to be recognized by the same TCR.</p>
<p>We define a novel similarity score that is calculated between pHLAs. The input is a list of peptides and the structures of those peptides bound to the HLA. These structures can be crystal structures (i.e., from the Protein Data Bank (<xref ref-type="bibr" rid="B11">11</xref>)) or generated by modeling programs such as APE-Gen (<xref ref-type="bibr" rid="B16">16</xref>) or DockTope (<xref ref-type="bibr" rid="B17">17</xref>). We calculate the sequence similarity between peptides as well as the structural and biochemical similarity between pHLAs, once the peptide has been docked on the HLA. The output is a 2D matrix where element <inline-formula>
<mml:math display="inline" id="im1">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the similarity score between peptides (and corresponding pHLAs) <inline-formula>
<mml:math display="inline" id="im2">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im3">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula>. Lower scores indicate greater similarity. The matrix can then be used to cluster the peptides. Peptides (and pHLAs) that are clustered together based on the similarity score are considered more likely to be cross-reactive. PepSim is available as a web server at pepsim.kavrakilab.org.</p>
</sec>
<sec id="s2">
<title>Methods</title>
<sec id="s2_1">
<title>Sequence similarity</title>
<p>We calculate the sequence similarity between each pair of input peptides in three ways. The input for each method is the list of peptide sequences, and the output is a 2D matrix containing the pairwise similarity scores. The first sequence similarity score is calculated using a BLOSUM matrix, in which each pair of amino acids is assigned an integer value based on how frequently that substitution is observed in alignments of related proteins (<xref ref-type="bibr" rid="B18">18</xref>). The BLOSUM62 matrix values are calculated based on amino acid sequence alignments with less than 62% identity. In our method, the similarity between two peptides is defined as the sum of the BLOSUM62 values for the amino acid pair at each position of the peptides, which is a common method of calculating sequence similarity. Second, we calculate the pairwise similarity between peptides using the similarity matrix calculated by HLAthena (<xref ref-type="bibr" rid="B19">19</xref>), which computes the entropy at each peptide position over the entire dataset and uses that entropy to weight the importance of each position when calculating similarity based on the PMBEC similarity matrix. Lastly, we calculate the Hamming distance between two peptides, defined as the number of amino acid positions that differ between them (e.g., AAAA and AAAB have a Hamming distance of 1 because only position 4 differs). The combination of these three similarity metrics was empirically observed to give the best results.</p>
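<p>As an illustration of the position-wise scores described above, the following minimal Python sketch computes a Hamming distance matrix and a summed substitution score for equal-length peptides. The function names and the toy scoring function are hypothetical and are not part of PepSim; in particular, the toy score is a placeholder and not the BLOSUM62 matrix, and the HLAthena entropy weighting is not reproduced here.</p>
<preformat><![CDATA[
```python
from typing import Callable, List, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(x != y for x, y in zip(a, b))


def positionwise_score(a: str, b: str,
                       pair_score: Callable[[str, str], float]) -> float:
    """Sum a substitution score over aligned positions (no gaps)."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(pair_score(x, y) for x, y in zip(a, b))


def pairwise_matrix(peptides: Sequence[str],
                    metric: Callable[[str, str], float]) -> List[List[float]]:
    """Evaluate a metric on every ordered pair of peptides."""
    return [[metric(p, q) for q in peptides] for p in peptides]


def toy_pair_score(x: str, y: str) -> float:
    # Placeholder only, NOT BLOSUM62: +1 for identical residues, -1 otherwise.
    return 1.0 if x == y else -1.0


peptides = ["AAAA", "AAAB", "ABAB"]
hamming = pairwise_matrix(peptides, hamming_distance)
toy = pairwise_matrix(
    peptides, lambda p, q: positionwise_score(p, q, toy_pair_score)
)
```
]]></preformat>
<p>To use a real substitution matrix, the placeholder scoring function would be replaced by a lookup into that matrix; the summation over positions stays the same.</p>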
</sec>
<sec id="s2_2">
<title>Structural and biochemical similarity</title>
<p>The pHLA pairwise similarity is calculated starting from a dataset of pHLA structures. The output is another 2D matrix of pairwise similarity scores. Depending on the source of the structures, their reference frame may be different, so we first align the structures using the align function of PyMOL (<xref ref-type="bibr" rid="B20">20</xref>). Then, we extract the solvent-accessible surface of each structure using the program MSMS with a density of 3.0 and a probe size of 1.5&#xa0;&#xc5; (<xref ref-type="bibr" rid="B21">21</xref>). MSMS computes the surface as a triangular mesh, which we then downsample to a resolution of 1.0&#xa0;&#xc5; using PyMesh (<xref ref-type="bibr" rid="B22">22</xref>). We then annotate the vertices of this mesh with biochemical features, specifically the electrostatic potential, hydrophobicity, and hydrogen bond potential. The electrostatic potential of the surface is calculated using APBS (<xref ref-type="bibr" rid="B23">23</xref>). The hydrophobicity of each amino acid in the pHLA is taken from the Kyte-Doolittle scale (<xref ref-type="bibr" rid="B24">24</xref>) and assigned to each surface point based on the closest amino acid. Finally, the hydrogen bond potential at each point is calculated based on the free hydrogens of the closest amino acid residues. We calculate the hydrogen bond potential using the data preparation method of MaSIF (<xref ref-type="bibr" rid="B25">25</xref>), based on an orientation-dependent hydrogen bonding potential (<xref ref-type="bibr" rid="B26">26</xref>). In brief, the hydrogen bond potential at a vertex is calculated based on the vertex&#x2019;s distance and angle from potential hydrogen donors (polar hydrogens) and potential acceptors (nitrogen or oxygen). The potential ranges between -1 (hydrogen bond acceptor) and +1 (hydrogen bond donor).</p>
<p>To account for the T-cell only interacting with a specific part of the pHLA complex, we define the TCR-interacting region as a round patch centered on the peptide bound to the HLA. The center of the patch is calculated by finding the closest surface point to the peptide&#x2019;s center of mass. A circular patch is extracted by selecting the vertices in the neighborhood of the center point up to 16 edges away from the center, as defined by the triangular mesh. The surface patch mesh is converted to a point cloud where each point is a vertex from the mesh, and each point is annotated with the biochemical features.</p>
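<p>The patch extraction described above can be sketched as a breadth-first search over the mesh edges. The Python code below is a minimal illustration under stated assumptions: the mesh is given as a list of vertex coordinates and a list of triangles (vertex index triples), and all function names are hypothetical rather than taken from the PepSim implementation.</p>
<preformat><![CDATA[
```python
from collections import deque
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Face = Tuple[int, int, int]


def build_adjacency(faces: Sequence[Face]) -> Dict[int, Set[int]]:
    """Map each vertex index to the vertices it shares a triangle edge with."""
    adjacency: Dict[int, Set[int]] = {}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    return adjacency


def closest_vertex(vertices: Sequence[Point], point: Point) -> int:
    """Index of the surface vertex nearest to a point (e.g., a center of mass)."""
    return min(
        range(len(vertices)),
        key=lambda i: sum((vertices[i][k] - point[k]) ** 2 for k in range(3)),
    )


def patch_vertices(adjacency: Dict[int, Set[int]], center: int,
                   max_edges: int = 16) -> Set[int]:
    """Breadth-first search: vertices at most max_edges edges from center."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if depth[u] == max_edges:
            continue  # do not expand beyond the patch radius
        for v in adjacency.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return set(depth)


# Toy strip of four triangles; a real surface mesh would come from MSMS.
vertices: List[Point] = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
                         (1, 1, 0), (0, 2, 0), (1, 2, 0)]
faces: List[Face] = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
adjacency = build_adjacency(faces)
center = closest_vertex(vertices, (0.1, 0.1, 0.5))
patch = patch_vertices(adjacency, center, max_edges=2)
```
]]></preformat>
<p>Because the radius is counted in edges rather than in &#xc5;, the physical size of the patch in this sketch depends on the mesh resolution.</p>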
<p>To calculate the similarity between pHLAs, we first perform a pairwise alignment of the point clouds using the Iterative Closest Point (ICP) algorithm (<xref ref-type="bibr" rid="B27">27</xref>). This alignment uses only geometric information and does not take into account the annotated biochemical features. The ICP is an iterative procedure that aligns a source point cloud <inline-formula>
<mml:math display="inline" id="im4">
<mml:mi>S</mml:mi>
</mml:math>
</inline-formula> to a target cloud <inline-formula>
<mml:math display="inline" id="im5">
<mml:mi>T</mml:mi>
</mml:math>
</inline-formula> in 3D space by iterating over three main steps. The first is to create a corresponding point set <inline-formula>
<mml:math display="inline" id="im6">
<mml:mi>C</mml:mi>
</mml:math>
</inline-formula> by matching points in the source point cloud to points on the target point cloud within a distance of <inline-formula>
<mml:math display="inline" id="im7">
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> &#xc5;. By taking points within distance <inline-formula>
<mml:math display="inline" id="im8">
<mml:mi>&#x3f5;</mml:mi>
</mml:math>
</inline-formula> of each other, we account for the fact that there may be an unequal number of points in point clouds <inline-formula>
<mml:math display="inline" id="im9">
<mml:mi>S</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im10">
<mml:mi>T</mml:mi>
</mml:math>
</inline-formula>. The <inline-formula>
<mml:math display="inline" id="im11">
<mml:mi>&#x3f5;</mml:mi>
</mml:math>
</inline-formula>-ball approach is a standard procedure in ICP (<xref ref-type="bibr" rid="B27">27</xref>). The second step is to calculate the rotation and translation that minimize the distance <inline-formula>
<mml:math display="inline" id="im12">
<mml:mi>D</mml:mi>
</mml:math>
</inline-formula> between each corresponding point pair (i.e., to find the best transformation to align each source point to its corresponding target point). For a pair of point clouds <inline-formula>
<mml:math display="inline" id="im13">
<mml:mi>S</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im14">
<mml:mi>T</mml:mi>
</mml:math>
</inline-formula> and the set of corresponding points <inline-formula>
<mml:math display="inline" id="im15">
<mml:mi>C</mml:mi>
</mml:math>
</inline-formula>, the distance <inline-formula>
<mml:math display="inline" id="im16">
<mml:mi>D</mml:mi>
</mml:math>
</inline-formula> is defined as</p>
<disp-formula>
<mml:math display="block" id="M1">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mo>&#x2016;</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mo>&#x2016;</mml:mo>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im17">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the vector of <inline-formula>
<mml:math display="inline" id="im18">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im19">
<mml:mi>y</mml:mi>
</mml:math>
</inline-formula>, and <inline-formula>
<mml:math display="inline" id="im20">
<mml:mi>z</mml:mi>
</mml:math>
</inline-formula> coordinates of point <inline-formula>
<mml:math display="inline" id="im21">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> of point cloud <inline-formula>
<mml:math display="inline" id="im22">
<mml:mi>S</mml:mi>
</mml:math>
</inline-formula>. The third step of ICP is to transform the source points using the rotation and translation found in the previous step. These three steps are repeated, recreating the corresponding point set for each iteration until convergence: when the change in <inline-formula>
<mml:math display="inline" id="im23">
<mml:mi>D</mml:mi>
</mml:math>
</inline-formula> is less than <inline-formula>
<mml:math display="inline" id="im24">
<mml:mrow>
<mml:mn>1.0</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mn>10</mml:mn>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>6</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, or until 30 iterations have been performed. ICP results in a final distance <inline-formula>
<mml:math display="inline" id="im25">
<mml:mi>D</mml:mi>
</mml:math>
</inline-formula> and the indices of the corresponding points for each point cloud.</p>
<p>We define a new score function to calculate the similarity between aligned point clouds using the geometric coordinates and the biochemical features. We expand the ICP distance <inline-formula>
<mml:math display="inline" id="im26">
<mml:mi>D</mml:mi>
</mml:math>
</inline-formula> so that each vertex in the point clouds has six dimensions, to include not only the geometric coordinates <inline-formula>
<mml:math display="inline" id="im27">
<mml:mi>x</mml:mi>
</mml:math>
</inline-formula>, <inline-formula>
<mml:math display="inline" id="im28">
<mml:mi>y</mml:mi>
</mml:math>
</inline-formula>, and <inline-formula>
<mml:math display="inline" id="im29">
<mml:mi>z</mml:mi>
</mml:math>
</inline-formula>, but also the biochemical features electrostatic potential, hydrogen bond potential, and hydrophobicity. We also account for the size of the corresponding point set, as a low number of corresponding points indicates that the source and target point clouds are not well aligned. Our distance score <inline-formula>
<mml:math display="inline" id="im30">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is defined as</p>
<disp-formula>
<mml:math display="block" id="M2">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2016;</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mo>&#x2016;</mml:mo>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>This score is calculated between each pair of point clouds, resulting in a 2D matrix where element <inline-formula>
<mml:math display="inline" id="im31">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the score between point clouds <inline-formula>
<mml:math display="inline" id="im32">
<mml:mi>s</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im33">
<mml:mi>t</mml:mi>
</mml:math>
</inline-formula>.</p>
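<p>The distance score defined above can be written in a few lines of Python. This is a minimal sketch rather than the PepSim implementation: it assumes that each point is a six-element tuple (three coordinates followed by the three biochemical features), that the norm is the Euclidean norm, and that the correspondence set is a list of index pairs returned by ICP. The function name is hypothetical, and any scaling of the biochemical features relative to the coordinates is left out because it is not specified in the text.</p>
<preformat><![CDATA[
```python
import math
from typing import List, Sequence, Tuple

Point6 = Tuple[float, float, float, float, float, float]


def d2_score(source: Sequence[Point6], target: Sequence[Point6],
             correspondences: Sequence[Tuple[int, int]]) -> float:
    """Sum of Euclidean distances over matched 6-D points, over sqrt(|C|)."""
    if not correspondences:
        raise ValueError("the correspondence set is empty")
    total = sum(math.dist(source[i], target[j]) for i, j in correspondences)
    return total / math.sqrt(len(correspondences))


# Two matched points: the first pair is identical, the second differs only
# in two feature dimensions (by 3 and 4, giving a distance of 5).
source: List[Point6] = [(0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0)]
target: List[Point6] = [(0, 0, 0, 0, 0, 0), (1, 0, 0, 3, 4, 0)]
score = d2_score(source, target, [(0, 0), (1, 1)])
```
]]></preformat>
<p>In this toy example the summed distance is 5 and there are two correspondences, so the score is 5 divided by the square root of 2.</p>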
</sec>
<sec id="s2_3">
<title>Combining similarity scores</title>
<p>Each similarity calculation method described above (i.e., BLOSUM62, HLAthena, Hamming distance, structural and biochemical) produces a 2D matrix, where element <inline-formula>
<mml:math display="inline" id="im34">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the similarity between peptides/pHLAs <inline-formula>
<mml:math display="inline" id="im35">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im36">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula>. Each matrix is normalized by subtracting its mean and dividing by its standard deviation. Then, all matrices are summed element-wise, producing a single 2D matrix that provides the similarity score between each pair of pHLAs.</p>
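<p>The normalization and summation step can be sketched as follows. This is an illustrative example under assumptions that the text does not fix: the mean and standard deviation are taken over all entries of a matrix (including the diagonal), and the population standard deviation is used. The function names are hypothetical. The sketch sums the matrices exactly as given and does not address how scores with different orientations are reconciled.</p>
<preformat><![CDATA[
```python
import statistics
from typing import List, Sequence

Matrix = List[List[float]]


def zscore_matrix(matrix: Sequence[Sequence[float]]) -> Matrix:
    """Subtract the matrix-wide mean and divide by the matrix-wide std."""
    values = [v for row in matrix for v in row]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std deviation
    if std == 0:
        raise ValueError("matrix has zero variance and cannot be normalized")
    return [[(v - mean) / std for v in row] for row in matrix]


def combine(matrices: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Normalize each matrix, then sum all of them element-wise."""
    normalized = [zscore_matrix(m) for m in matrices]
    size = len(normalized[0])
    return [
        [sum(m[i][j] for m in normalized) for j in range(size)]
        for i in range(size)
    ]


a = [[0, 2], [2, 0]]  # mean 1, std 1
b = [[0, 4], [4, 0]]  # mean 2, std 2
combined = combine([a, b])
```
]]></preformat>
<p>Both toy matrices normalize to the same values, which shows how the z-score puts scores with different ranges on a common scale before they are added.</p>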
</sec>
<sec id="s2_4">
<title>Clustering</title>
<p>The pairwise similarity scores produced by our method can be used to cluster the peptides. To validate the method, we performed agglomerative clustering using scikit-learn (<xref ref-type="bibr" rid="B28">28</xref>), with either Ward linkage or average linkage. We used agglomerative clustering to create 2 clusters (with parameters n_clusters=2 and distance_threshold=None) to represent the two possible clusters &#x201c;cross-reactive peptides&#x201d; and &#x201c;non-cross-reactive peptides&#x201d;. We also ran agglomerative clustering without fixing the number of clusters (with n_clusters=None and distance_threshold=0) and used the result to build a dendrogram to visualize the distance between the different peptides.</p>
<p>We also validated the method using K-nearest-neighbors (KNN) classification (<xref ref-type="bibr" rid="B29">29</xref>). KNN is a supervised method: the true label of every peptide is assumed to be known except for the peptide being labeled, which is assigned a label based on its nearest neighbors. We used k values between 1 and 8 and used KNN to label each peptide based on all the other peptides in the dataset.</p>
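<p>The leave-one-out labeling described above can be sketched in Python as shown below. This is a minimal illustration, not the code used in the study: it assumes a precomputed pairwise score matrix in which lower values mean more similar, a simple majority vote, and ties resolved in favor of the label encountered first among the nearest neighbors. The function name and the toy data are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter
from typing import List, Sequence


def knn_leave_one_out(scores: Sequence[Sequence[float]],
                      labels: Sequence[str], k: int) -> List[str]:
    """Label each item by majority vote among its k nearest other items."""
    n = len(labels)
    predictions = []
    for i in range(n):
        # Rank every other item by its score to item i (lower = closer).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: scores[i][j])
        votes = Counter(labels[j] for j in others[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data: two well-separated pairs of items.
scores = [
    [0, 1, 5, 6],
    [1, 0, 5, 5],
    [5, 5, 0, 1],
    [6, 5, 1, 0],
]
labels = ["cross-reactive", "cross-reactive", "decoy", "decoy"]
predicted = knn_leave_one_out(scores, labels, k=1)
```
]]></preformat>
<p>Comparing the predicted labels with the true labels for each value of k gives the accuracy of the score on a dataset.</p>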
</sec>
<sec id="s2_5">
<title>Datasets</title>
<p>We tested our method on five datasets, as explained below. The full list of peptides in each dataset is provided in the Supplementary Data.</p>
<sec id="s2_5_1">
<title>Dataset 1</title>
<p>Melanoma-associated antigen 3 (MAGE-A3) is an antigen expressed in multiple tumor types, and the MAGE-A3<sub>168&#x2013;176</sub> peptide is recognized by a specific T-cell clone. Gee et&#xa0;al. discovered 60 additional peptides that are recognized by the same T-cell clone (<xref ref-type="bibr" rid="B30">30</xref>). We use these 60 peptides in addition to the MAGE-A3 peptide as positive controls for cross-reactivity, for a total of 61 peptides. Negative controls were obtained by searching IEDB for peptides that bind to the same HLA allele (HLA-A*01) (<xref ref-type="bibr" rid="B31">31</xref>). Sixty peptides were chosen at random, and 59 were chosen for being similar in sequence to the 61 positive controls (i.e., fitting the pattern (EDK) (AGVLIMPFYW) (ED) (PWHST) (MYLK) (DEGN) (AGPVLIMF) (MYFL) (FYL)). The peptide-HLA structures for all 180 peptides were modeled in their docked position to HLA-A*01 using the peptide-HLA modeling tool APE-Gen (<xref ref-type="bibr" rid="B16">16</xref>).</p>
</sec>
<sec id="s2_5_2">
<title>Dataset 2</title>
<p>The second dataset was obtained from a previous study on T-cell cross-reactivity in HCV peptides (<xref ref-type="bibr" rid="B32">32</xref>). This dataset contains 28 peptides, each labeled with a T-cell response level (11 high response, 3 intermediate response, 13 low response, and 1 no response). The pHLA structures were obtained from CrossTope, a curated database of pHLA structures modeled using DockTope (<xref ref-type="bibr" rid="B33">33</xref>).</p>
</sec>
<sec id="s2_5_3">
<title>Dataset 3</title>
<p>The third dataset is an expansion of the second dataset, containing the 28 HCV peptides and 45 additional peptides (<xref ref-type="bibr" rid="B7">7</xref>). The pHLA structures were obtained from CrossTope (<xref ref-type="bibr" rid="B33">33</xref>).</p>
</sec>
<sec id="s2_5_4">
<title>Dataset 4</title>
<p>The fourth dataset contains 8 Dengue viral peptides, 4 of which are recognized by the same T-cell and 4 of which are not (<xref ref-type="bibr" rid="B34">34</xref>). The pHLA structures were obtained from CrossTope (<xref ref-type="bibr" rid="B33">33</xref>).</p>
</sec>
<sec id="s2_5_5">
<title>Dataset 5</title>
<p>The fifth dataset contains 11 peptides, including the cross-reactive pair of peptides HEV-1527 and MYH9-478 and 9 negative controls (<xref ref-type="bibr" rid="B35">35</xref>). The pHLA structures were obtained from CrossTope (<xref ref-type="bibr" rid="B33">33</xref>).</p>
</sec>
</sec>
<sec id="s2_6">
<title>PepSim web server</title>
<p>The PepSim scoring method is available through a web server interface (see <xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1</bold>
</xref>). Users may upload a dataset in PDB format, including the peptide that they wish to target. The target peptide will be used as the reference for the final ranking of peptides based on similarity. After submission and execution, users can view a dendrogram produced by agglomerative clustering of the peptides on their similarity scores. The peptides are also visualized in a 2D scatter plot created using non-metric multidimensional scaling (NMDS) (<xref ref-type="bibr" rid="B36">36</xref>). Users also receive a ranked list of the peptides based on their similarity to the given target. To perform offline analysis, including clustering, users can download all the results, which include the 2D array of pairwise similarity scores.</p>
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>PepSim web server interface. <bold>(A)</bold> The home page allows users to (1) name the peptide they wish to target, either using the amino acid sequence or the file name, (2) give their email address, and (3) upload the PDB files of the peptide-HLAs they wish to compare, including the target. A link is sent to the given email address for users to review the results. <bold>(B)</bold> On the results page, users can download the results, view the input peptides in ranked order of similarity to the target, and view interactive plots of either a dendrogram or scatter plot.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fimmu-14-1108303-g001.tif"/>
</fig>
<p>The PepSim web server is implemented using Docker (<xref ref-type="bibr" rid="B37">37</xref>), with the backend implemented in Django (<xref ref-type="bibr" rid="B38">38</xref>). Submitted jobs are managed by Celery (<xref ref-type="bibr" rid="B39">39</xref>), a distributed task queue. The web server is currently hosted on a virtual machine in the Owl Research Infrastructure Open Nebula (ORION) VM Pool on the Rice University campus.</p>
</sec>
</sec>
<sec id="s3" sec-type="results">
<title>Results</title>
<sec id="s3_1">
<title>Accurate separation of cross-reactive from non-cross-reactive peptides</title>
<p>To test the accuracy of our similarity score, we ran the same pipeline on five datasets. On dataset 1, which contains 61 peptides that are recognized by the same T-cell and 119 negative decoys, we performed agglomerative clustering on the similarity scores to create two clusters. The resulting clustering had a sensitivity of 98.36% and a specificity of 96.64%. A visualization of the clustering results, as well as the true clustering, can be seen in <xref ref-type="fig" rid="f2">
<bold>Figures&#xa0;2A, B</bold>
</xref>. This clustering produced 1 false negative (ASDPMNHYY) and 4 false positives, that is, peptides that have not been experimentally determined to produce a T-cell response (ELDPTNMTY, DSDPTGTAY, ELDPDNETY, ELDPNNAVY).</p>
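<p>As a rough illustration of this evaluation, the sketch below clusters a small, made-up similarity matrix into two groups and then computes sensitivity and specificity. It is not the PepSim implementation: the function names, the toy matrix, and the use of average linkage (rather than the Ward linkage reported for dataset 1) are assumptions made for brevity. The final check reuses only the counts reported above (61 cross-reactive peptides with 1 false negative, and 119 decoys with 4 false positives).</p>
<preformat>
```python
from itertools import combinations


def average_linkage_clusters(similarity, n_clusters=2):
    """Greedy agglomerative clustering on a symmetric similarity matrix.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity until n_clusters remain. Returns one cluster label per item.
    """
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > n_clusters:
        best_pair, best_score = None, float("-inf")
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            score = sum(similarity[i][j] for i, j in pairs) / len(pairs)
            if score > best_score:
                best_pair, best_score = (a, b), score
        a, b = best_pair  # combinations guarantees that b comes after a
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(similarity)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def sensitivity_specificity(truth, predicted):
    """Both arguments are lists of booleans (True means cross-reactive)."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    return tp / (tp + fn), tn / (tn + fp)


# Toy similarity matrix (invented values): items 0-2 resemble each other, as do 3-4.
toy_similarity = [
    [1.00, 0.90, 0.80, 0.20, 0.10],
    [0.90, 1.00, 0.70, 0.30, 0.20],
    [0.80, 0.70, 1.00, 0.40, 0.10],
    [0.20, 0.30, 0.40, 1.00, 0.85],
    [0.10, 0.20, 0.10, 0.85, 1.00],
]
toy_labels = average_linkage_clusters(toy_similarity, n_clusters=2)

# Counts reported for dataset 1: 61 positives (1 missed), 119 decoys (4 false positives).
truth = [True] * 61 + [False] * 119
predicted = [True] * 60 + [False] * 1 + [True] * 4 + [False] * 115
sens, spec = sensitivity_specificity(truth, predicted)
print(round(100 * sens, 2), round(100 * spec, 2))  # 98.36 96.64
```
</preformat>
<p>The reported percentages follow directly from the counts: 60 of 61 cross-reactive peptides recovered gives 98.36%, and 115 of 119 decoys correctly excluded gives 96.64%.</p>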
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>2D embedding of peptides in dataset 1 based on pairwise similarity scores, colored based on <bold>(A, C, E)</bold> true cluster labels and <bold>(B, D, F)</bold> agglomerative clustering results shows the accuracy of PepSim similarity scores. <bold>(A, B)</bold> use the full PepSim similarity score. In <bold>(A)</bold> peptides that are recognized by the same T-cell are red, and the negative decoys are blue. In <bold>(B)</bold>, the yellow cluster corresponds to the cross-reactive peptides. These peptides are clustered using Ward linkage, and the clustering has a sensitivity of 98.36% (1 false negative) and a specificity of 96.64%. <bold>(C, D)</bold> use the sequence analysis similarity score. In <bold>(D)</bold> the peptides are clustered using Ward linkage, and the clustering has a sensitivity of 96.72% (2 false negatives) and a specificity of 97.48% (3 false positives). <bold>(E, F)</bold> use the structural and biochemical similarity scores. In <bold>(F)</bold> the peptides are clustered using Ward linkage and the clustering has a sensitivity of 100% (0 false negatives) and a specificity of 44% (67 false positives). The 2D embedding is created using NMDS (<xref ref-type="bibr" rid="B36">36</xref>).</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fimmu-14-1108303-g002.tif"/>
</fig>
<p>We also performed agglomerative clustering on dataset 3, which contains 28 HCV peptides and 45 other viral peptides. One of the HCV peptides causes no T-cell response and the other 27 cause some level of T-cell response. As seen in <xref ref-type="fig" rid="f3">
<bold>Figures&#xa0;3A, B</bold>
</xref>, all 28 HCV peptides are grouped in the same cluster, and the decoys are in the other cluster. Given that all but one HCV peptide trigger a T-cell response, this clustering produces only one false positive (G3-18). This shows that our similarity score can differentiate between peptides from different origins, as all the decoys are different viral peptides.</p>
<fig id="f3" position="float">
<label>Figure&#xa0;3</label>
<caption>
<p>2D embedding of peptides in dataset 3 based on pairwise similarity scores, colored based on <bold>(A, C, E)</bold> true cluster labels and <bold>(B, D, F)</bold> agglomerative clustering result shows the accuracy of PepSim similarity scores. <bold>(A, B)</bold> use the full PepSim similarity score. In <bold>(A)</bold> peptides that are recognized by the same T-cell are red, and the negative decoys are blue. In <bold>(B)</bold>, the yellow cluster corresponds to the cross-reactive peptides. These peptides are clustered using Average linkage, and the clustering has a sensitivity of 100% (0 false negatives) and a specificity of 97.78% (1 false positive). The false positive is a pHLA that was experimentally determined to have no T cell response. <bold>(C, D)</bold> use the sequence analysis similarity score. In <bold>(D)</bold> the clustering also has a sensitivity of 100% (0 false negatives) and a specificity of 97.78% (1 false positive). <bold>(E, F)</bold> use the structural and biochemical similarity scores. In <bold>(F)</bold> the peptides are clustered using Ward linkage, and the clustering has a sensitivity of 100% (0 false negatives) and a specificity of 95.56% (2 false positives). The 2D embedding is created using NMDS (<xref ref-type="bibr" rid="B36">36</xref>).</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fimmu-14-1108303-g003.tif"/>
</fig>
<p>We performed a similar experiment on datasets 2, 4, and 5. In these cases, we did not specify the number of clusters and instead produced dendrograms to show the full cluster hierarchy. Dataset 2 contains the same 28 HCV-derived peptides as dataset 3. We see in <xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4A</bold>
</xref> that the final clustering separates the peptides with a high or intermediate T-cell response from most of the peptides with low or no response. The peptides with high or intermediate responses are split across two clusters, but within each cluster they are grouped together. The dendrogram defines two clusters, but if we split the larger cluster based on the dendrogram branches within it, we completely separate one set of high-response peptides from the low and no response groups. The other set of high and intermediate response peptides is in the other cluster in the dendrogram, along with one pHLA that produces a low T-cell response (G1-03).</p>
<fig id="f4" position="float">
<label>Figure&#xa0;4</label>
<caption>
<p>Dendrogram of peptides in dataset 2 based on agglomerative clustering of pairwise similarity scores shows the moderate success of PepSim in differentiating between cross-reactive and non-cross-reactive pHLAs. The peptides that cause a high or intermediate response from the T-cell are labeled in green and clustered into two separate clusters. The peptides that cause a low or no response from the T-cell are in black. In <bold>(A)</bold>, the clustering is based on the entire PepSim similarity score. In <bold>(B)</bold>, the clustering uses the sequence similarity scores. In <bold>(C)</bold>, the clustering uses the structural and biochemical similarity scores. The best clustering is achieved with the entire PepSim similarity score.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fimmu-14-1108303-g004.tif"/>
</fig>
<p>Datasets 4 and 5 are smaller datasets with few cross-reactive peptides. Dataset 4 shows a complete separation of cross-reactive and non-cross-reactive peptides, as seen in <xref ref-type="fig" rid="f5">
<bold>Figure&#xa0;5</bold>
</xref>. The cross-reactive peptides are very similar in sequence. Dataset 5 contains two cross-reactive peptides: HEV-1527 and MYH9-478. As seen in <xref ref-type="fig" rid="f6">
<bold>Figure&#xa0;6</bold>
</xref>, HEV-1527 and MYH9-478 are not separated into a different cluster from the negative decoys, but they are on the same branch of the dendrogram. If HEV-1527 is selected as the target peptide, then the closest peptide is MYH9-478, and vice versa. Because the method starts from a target peptide, we can accurately identify the cross-reactive peptide in this dataset.</p>
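<p>The target-based lookup described above can be sketched as a simple sort over one row of the pairwise similarity matrix. The function name and the numeric scores below are hypothetical and chosen only to mirror the reported outcome (each of HEV-1527 and MYH9-478 is the other's closest peptide); they are not PepSim output.</p>
<preformat>
```python
def rank_by_similarity(names, similarity, target):
    """Return the other peptides sorted from most to least similar to target."""
    t = names.index(target)
    others = [(similarity[t][i], name) for i, name in enumerate(names) if i != t]
    others.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in others]


# Invented scores for illustration only; the decoy names are placeholders.
names = ["HEV-1527", "MYH9-478", "decoy-A", "decoy-B"]
similarity = [
    [1.00, 0.80, 0.55, 0.40],
    [0.80, 1.00, 0.50, 0.45],
    [0.55, 0.50, 1.00, 0.60],
    [0.40, 0.45, 0.60, 1.00],
]

ranking_from_hev = rank_by_similarity(names, similarity, "HEV-1527")
ranking_from_myh9 = rank_by_similarity(names, similarity, "MYH9-478")
print(ranking_from_hev[0], ranking_from_myh9[0])
```
</preformat>
<p>Because the ranking depends only on the target's row of the matrix, a cross-reactive pair can be recovered even when a global two-cluster cut does not isolate it.</p>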
<fig id="f5" position="float">
<label>Figure&#xa0;5</label>
<caption>
<p>Dendrogram of peptides in dataset 4 based on agglomerative clustering of pairwise similarity scores shows PepSim&#x2019;s ability to separate cross-reactive from non-cross-reactive pHLAs. The cross-reactive peptides are colored green. Peptides are evenly split into two clusters with all cross-reactive peptides in one cluster and all non-cross-reactive peptides in the other. These peptides are clustered using Ward linkage, and Average linkage produces the same clusters.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fimmu-14-1108303-g005.tif"/>
</fig>
<fig id="f6" position="float">
<label>Figure&#xa0;6</label>
<caption>
<p>Dendrogram of peptides in dataset 5 based on agglomerative clustering of pairwise similarity scores shows PepSim&#x2019;s ability to separate cross-reactive from non-cross-reactive pHLAs. The peptides that are recognized by the same T-cell (HEV-1527 and MYH9-478) are labeled in green and are in the same branch of the dendrogram. If HEV-1527 is selected as the target peptide, then the closest peptide is MYH9-478, and vice versa.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fimmu-14-1108303-g006.tif"/>
</fig>
</sec>
<sec id="s3_2">
<title>Comparison with predictions in the literature</title>
<p>In 2015, Mendes and colleagues used multivariate statistical methods to perform structure-based prediction of T-cell cross-reactivity among the 28 viral peptides in dataset 2 (<xref ref-type="bibr" rid="B8">8</xref>). Using their method, they clustered the peptides into two distinct clusters. The peptides that trigger a high T-cell response are placed in one cluster and the peptides with a low response are in the other cluster. One of the peptides with an intermediate response is in the cluster with the high response peptides, and the other two are with the low response peptides. In contrast, with our method, the peptides with a high response are split into two different clusters, and the peptides with an intermediate response are clustered with the peptides with a high response.</p>
<p>Dataset 3, being an expansion of dataset 2, was previously studied in 2011 by Antunes et&#xa0;al. (<xref ref-type="bibr" rid="B7">7</xref>). Only 10 of the HCV peptides were included in the dataset (A0201_0031, A0201_0051, A0201_0052, A0201_0053, A0201_0054, A0201_0055, A0201_0056, A0201_0057, and A0201_0058), along with 45 other peptides. All 10 of the HCV peptides were clustered together, along with five other peptides not derived from HCV (A0201_0014, A0201_0083, A0201_0076, A0201_0095, and A0201_0073). In contrast, our method results in all the HCV peptides clustered together with only one false positive, A0201_0033, which is an HCV peptide that causes no T-cell response. We can also look at our results on dataset 2, where peptides A0201_0051-58 are clustered together with no other clusters, and A0201_0031 is in the other cluster.</p>
</sec>
<sec id="s3_3">
<title>The effects of structural analysis</title>
<p>To examine the effectiveness of the structural and biochemical analysis, we repeated the experiments with two variations of the similarity score. We defined one variation of the score to use only sequence analysis, and the other variation to use only the structural and biochemical analysis.</p>
<p>In dataset 1, when we use only sequence analysis, the final agglomerative clustering (using Ward linkage) has a sensitivity of 96.72% (2 false negatives) and a specificity of 97.48% (3 false positives), as seen in <xref ref-type="fig" rid="f2">
<bold>Figures&#xa0;2C, D</bold>
</xref>. Compared to the full similarity score, which includes the structural and biochemical analysis, the sequence-only score yields one more false negative and one fewer false positive.</p>
<p>When we use the similarity score calculated from only the structural and biochemical analysis, the sensitivity increases to 100%, but the specificity decreases to 44%, as seen in <xref ref-type="fig" rid="f2">
<bold>Figures&#xa0;2E, F</bold>
</xref>.</p>
<p>We also used K-nearest-neighbors to determine the effectiveness of the similarity score. As seen in <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref>, we achieved the highest sensitivity and specificity when we use all the components of the similarity score (sensitivity of 100% and specificity of 98.3%). We also achieve high accuracy with the variations of the similarity score, including using only the structure and biochemical analysis.</p>
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>Accuracy of leave-one-out KNN cross-validation on Dataset 1.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" rowspan="2" align="center">k</th>
<th valign="top" colspan="2" align="center">Complete score</th>
<th valign="top" colspan="2" align="center">Only sequence</th>
<th valign="top" colspan="2" align="center">Only structure/biochemical</th>
</tr>
<tr>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">1</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">2</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.918</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">3</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.966</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.958</td>
<td valign="top" align="right">0.951</td>
<td valign="top" align="right">0.933</td>
</tr>
<tr>
<td valign="top" align="center">4</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.983</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.966</td>
<td valign="top" align="right">0.885</td>
<td valign="top" align="right">0.953</td>
</tr>
<tr>
<td valign="top" align="center">5</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.924</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.882</td>
<td valign="top" align="right">0.902</td>
<td valign="top" align="right">0.875</td>
</tr>
<tr>
<td valign="top" align="center">6</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.958</td>
<td valign="top" align="right">0.984</td>
<td valign="top" align="right">0.899</td>
<td valign="top" align="right">0.902</td>
<td valign="top" align="right">0.924</td>
</tr>
<tr>
<td valign="top" align="center">7</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.832</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.824</td>
<td valign="top" align="right">0.902</td>
<td valign="top" align="right">0.857</td>
</tr>
<tr>
<td valign="top" align="center">8</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.874</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.849</td>
<td valign="top" align="right">0.902</td>
<td valign="top" align="right">0.916</td>
</tr>
</tbody>
</table>
</table-wrap>
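<p>A leave-one-out KNN evaluation on a precomputed similarity matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the table: the function name, the toy matrix, and the tie-breaking rule (a tied vote counts as negative) are all hypothetical, since the text does not specify how ties are handled for even k.</p>
<preformat>
```python
def loo_knn_predict(similarity, labels, k):
    """Leave-one-out k-nearest-neighbour vote on a precomputed similarity matrix.

    Each item is classified by the majority label of its k most similar
    other items. Assumption: a tied vote is treated as negative.
    """
    n = len(labels)
    predictions = []
    for i in range(n):
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )[:k]
        votes = sum(labels[j] for j in neighbours)
        predictions.append(2 * votes > k)
    return predictions


# Toy similarity matrix (invented values): items 0-2 are positives, 3-4 negatives.
toy_similarity = [
    [1.00, 0.90, 0.80, 0.20, 0.10],
    [0.90, 1.00, 0.70, 0.30, 0.20],
    [0.80, 0.70, 1.00, 0.40, 0.10],
    [0.20, 0.30, 0.40, 1.00, 0.85],
    [0.10, 0.20, 0.10, 0.85, 1.00],
]
toy_labels = [True, True, True, False, False]

pred_k1 = loo_knn_predict(toy_similarity, toy_labels, k=1)
pred_k3 = loo_knn_predict(toy_similarity, toy_labels, k=3)
print(pred_k1)
print(pred_k3)
```
</preformat>
<p>In this toy example every item is classified correctly at k = 1, while at k = 3 the two negatives are outvoted by positive neighbours. This illustrates only that the choice of k can change the result for a small class; it is not an analysis of the reported tables.</p>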
<p>In dataset 3, removing the structural and biochemical analysis from the similarity score does not change the final clustering when using Ward or Average linkage, as seen in <xref ref-type="fig" rid="f3">
<bold>Figures&#xa0;3C, D</bold>
</xref>. In contrast, using only the structural and biochemical analysis results in one additional false positive when using Ward linkage (A0201_0014). <xref ref-type="fig" rid="f3">
<bold>Figures&#xa0;3E, F</bold>
</xref> show the 2D embedding of the peptides based on the structural and biochemical similarity score; compared to <xref ref-type="fig" rid="f3">
<bold>Figures&#xa0;3A, B</bold>
</xref>, the clusters are less well separated.</p>
<p>We also used K-nearest-neighbors to determine the effectiveness of the similarity score. As seen in <xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>, we achieved the highest sensitivity and specificity when we use all the components of the similarity score (sensitivity of 100% and specificity of 95.6%). We also achieve high accuracy with the variations of the similarity score, including using only the structure and biochemical analysis.</p>
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>Accuracy of leave-one-out KNN cross-validation on Dataset 3.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" rowspan="2" align="center">k</th>
<th valign="top" colspan="2" align="center">Complete score</th>
<th valign="top" colspan="2" align="center">Only sequence</th>
<th valign="top" colspan="2" align="center">Only structure/biochemical</th>
</tr>
<tr>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">1</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">2</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">3</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.933</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.911</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.822</td>
</tr>
<tr>
<td valign="top" align="center">4</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.956</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.933</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.911</td>
</tr>
<tr>
<td valign="top" align="center">5</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.867</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.867</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.778</td>
</tr>
<tr>
<td valign="top" align="center">6</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.867</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.889</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.867</td>
</tr>
<tr>
<td valign="top" align="center">7</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.844</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.867</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.800</td>
</tr>
<tr>
<td valign="top" align="center">8</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.844</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.889</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.867</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Dataset 2 provides more interesting results. When the structural and biochemical analysis is removed from the similarity score, the dendrogram clustering shows that the high and intermediate responders are still separated into two different clusters, as seen in <xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4B</bold>
</xref>. One of the clusters has three false positives compared to the one false positive in <xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4A</bold>
</xref>.</p>
<p>When using only structural and biochemical analysis, there is one cluster made up entirely of peptides with a high or intermediate T-cell response, while the remaining high-response peptides are scattered through the other cluster (<xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4C</bold>
</xref>). Therefore, the clusters are less accurate with only the structural and biochemical analysis than with the complete score. However, the peptides derived from different viral genotypes are clustered together.</p>
<p>We also used K-nearest-neighbors to determine the effectiveness of the similarity score. As seen in <xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>, we achieved the highest sensitivity and specificity when we use all the components of the similarity score (sensitivity of 100% and specificity of 92.9%). We also achieve high accuracy with the variations of the similarity score, including using only the structure and biochemical analysis.</p>
<table-wrap id="T3" position="float">
<label>Table&#xa0;3</label>
<caption>
<p>Accuracy of leave-one-out KNN cross-validation on Dataset 2.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="middle" align="center" rowspan="2">k</th>
<th valign="top" colspan="2" align="center">Complete score</th>
<th valign="top" colspan="2" align="center">Only sequence</th>
<th valign="top" colspan="2" align="center">Only structure/biochemical</th>
</tr>
<tr>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
<th valign="top" align="center">sensitivity</th>
<th valign="top" align="center">specificity</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">1</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">2</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">3</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">4</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.714</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">5</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
</tr>
<tr>
<td valign="top" align="center">6</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">7</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.857</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.857</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">1.000</td>
</tr>
<tr>
<td valign="top" align="center">8</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">1.000</td>
<td valign="top" align="right">0.929</td>
<td valign="top" align="right">0.857</td>
<td valign="top" align="right">1.000</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Datasets 4 and 5 show little change when either the structural and biochemical analysis or the sequence analysis is removed. In dataset 4, the four cross-reactive peptides and four non-cross-reactive peptides are separated into different clusters with 100% accuracy. In dataset 5, HEV-1527 and MYH9-478 are again not separated into a different cluster from the negative decoys, but they are on the same branch of the dendrogram.</p>
</sec>
</sec>
<sec id="s4" sec-type="discussion">
<title>Discussion</title>
<p>T-cell cross-reactivity can cause devastating side effects in T-cell-based cancer immunotherapy, so it is vital to predict cross-reactivity when choosing immunotherapy targets. Here we proposed a scoring method that measures the similarity between peptide-HLA complexes in order to predict T-cell cross-reactivity. The metric incorporates several methods of comparing peptides and peptide-HLAs, including sequence, structure, and biochemical analysis. Peptide-HLAs that are more similar to each other are more likely to trigger an immune response from the same T-cell (<xref ref-type="bibr" rid="B7">7</xref>&#x2013;<xref ref-type="bibr" rid="B9">9</xref>); thus, our score can be used to predict T-cell cross-reactivity.</p>
<p>When we run our method on dataset 1, the agglomerative clustering accurately separates the cross-reactive peptides from the decoys with 1 false negative and 4 false positives. In this case, the decoys have not been experimentally validated, so what we call a false positive may actually be cross-reactive and is a good candidate for further experimentation. We successfully separate most of the cross-reactive peptides from the negative decoys that are similar in sequence to the cross-reactive peptides. We have partial success when we use dataset 2. We are not able to reproduce the previous results of Mendes and colleagues (<xref ref-type="bibr" rid="B8">8</xref>), but their method was specialized for dataset 2. They had previous knowledge of the TCR-interaction contacts, so specific areas of the peptide-HLAs were selected for analysis. Our method is designed to work on any dataset of class I peptide-HLAs. As shown in our results on dataset 2, the peptides from genotype 6 are clustered together, and the peptides from genotype 1 that trigger a T-cell response are also clustered together. Interestingly, when we use only the structural and biochemical analysis in the score, we get one cluster of peptides from multiple genotypes that are all cross-reactive, and the other cross-reactive peptides are spread out in the other cluster. This clustering of different genotypes does not occur when we use only sequence analysis or when combining the sequence and structure analysis. This suggests that the structural and biochemical analysis recognizes similarities between peptide-HLAs that the sequence analysis misses. Our KNN analysis has high sensitivity and specificity regardless of the score composition. In dataset 3, an expansion of dataset 2, we successfully separate the HCV-derived peptides from the negative decoys. Lastly, in the small datasets 4 and 5, the cross-reactive peptides are clustered together, showcasing how our method works on smaller datasets.</p>
<p>In this paper, we have presented a comparison to a previous method of predicting T-cell cross-reactivity <italic>via</italic> statistical analysis of the peptide-HLA structural features (<xref ref-type="bibr" rid="B8">8</xref>). This previous method has better results compared to our score, but it is only presented on a single dataset and is specialized to that dataset. It is also based on 2D image analysis, and therefore only partially accounts for the structural and biochemical features of the pHLA complexes: the structure is simplified from 3D to a 2D image, whereas our method uses the 3D structure. We have also analyzed the effects of structural and biochemical information on the accuracy of our tool. We find that structural and biochemical analysis is useful in determining the similarity between peptide-HLA complexes, but peptide sequence analysis is also vital to accurately determining peptide-HLA similarity.</p>
<p>There are other methods of cross-reactivity prediction that could potentially be compared to PepSim. JanusMatrix is part of EpiVax&#x2019;s proprietary immunogenicity screening kit and thus cannot be freely compared to PepSim. In the original study, JanusMatrix is used to find potential cross-reactivity to defined T-cell epitopes (one from HCV and one from influenza), but the authors only provide the number of cross-reactive hits and not the peptides that are potentially cross-reactive (<xref ref-type="bibr" rid="B12">12</xref>). Therefore, a comparison is difficult. Expitope 2.0 is available as a web server where the user inputs a single peptide and receives a calculated cross-reactivity index chart. In the original study, the authors show that Expitope successfully predicts the cross-reactivity between MAGEA3 and the Titin-derived peptide (<xref ref-type="bibr" rid="B15">15</xref>). In this paper, we have shown that PepSim successfully predicts cross-reactivity to MAGEA3. RACER is an energy model for predicting TCR-pMHC binding affinity and can be used to predict T-cell cross-reactivity. RACER is shown to accurately predict TCR recognition rates when tested on datasets of class II MHCs, and thus we cannot compare it with PepSim, which was designed for class I MHCs.</p>
<p>As we have shown, our method applies to datasets of different sizes and content. Dataset 1 includes cancer peptides and self-peptides, and the other datasets consist of different viral peptides. Each dataset is also of a different size, but each experiment produces an accurate clustering of the peptides. Also, our method is T-cell independent, meaning no information on the TCR including sequence or structure is necessary to compute the similarity score. Our score provides a likelihood of triggering a cross-reactive response, based on the driving impact of the pMHC similarity. However, experimental measurements of cross-reactivity between these pHLAs might provide different results depending on the specific T-cell clone that is used in the experiments. Our ability to perfect the methods presented in this paper is limited by data availability, as we are achieving high classification performance on the presented datasets. Thus, it is likely that similar methods will be able to build on this work to achieve greater performance as more T-cell cross-reactivity data becomes available.</p>
<p>In terms of usability, PepSim is available as a web server. PepSim takes the PDB files of each pHLA structure. The input files can be from multiple different sources, such as experimentally determined structures from the Protein Data Bank (<xref ref-type="bibr" rid="B11">11</xref>) or computational modeling software such as APE-Gen (<xref ref-type="bibr" rid="B16">16</xref>) or DockTope (<xref ref-type="bibr" rid="B17">17</xref>). With modeling software, researchers can generate structures for any number of peptide-HLA pairs. Given computational cost, we recommend using a dataset of at most 500 peptide-HLAs. PepSim is designed to be TCR independent, so it can be used when the peptide residues that are important to TCR recognition are unknown. However, we recognize that knowing the important residue positions would potentially improve PepSim&#x2019;s predictions, so a user of the web server has the option of specifying weights for the different peptide residue positions. In addition, although PepSim performs well in this study without needing to change the weights of the different sub-scores, the web server user can input their own sub-score weights. Additionally, the user defines a &#x201c;target&#x201d; pHLA, and PepSim outputs a ranked list of the input pHLAs in order of likelihood of cross-reactivity with the target pHLA. PepSim also generates the 2D score matrix that the user can use for further analysis, including clustering, and PepSim generates a dendrogram with the results of hierarchical clustering.</p>
</sec>
<sec id="s5" sec-type="conclusion">
<title>Conclusion</title>
<p>PepSim helps to fill a gap in the existing methods for predicting T-cell cross-reactivity. Previous attempts to incorporate structural features into cross-reactivity analysis were hindered by the lack of structures and the high computational demand of sampling methods. We overcome these limitations by relying on fast modeling through APE-Gen and efficient algorithms for geometrical comparisons. In a large dataset (Dataset 1), we were able to accurately separate cross-reactive from non-cross-reactive peptides. Our method also generalizes, as demonstrated on the smaller datasets (Datasets 2&#x2013;5). Additionally, our method does not depend on the size and content of datasets and can be used in a T-cell-independent manner. PepSim is available as a web server at pepsim.kavrakilab.org.</p>
</sec>
<sec id="s6" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="SM1"><bold>Supplementary Material</bold></xref>. Further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7" sec-type="author-contributions">
<title>Author contributions</title>
<p>SH-S, MR, LK, and GL contributed to the conception and design of the study. SH-S and JS developed the methodology and software. SH-S and MR curated data. SH-S wrote the manuscript, performed the experiments, and created the web server. LK, DA, and GL were responsible for the supervision and project administration. LK was responsible for funding acquisition. All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec id="s8" sec-type="funding-information">
<title>Funding</title>
<p>SH-S is supported by a National Library of Medicine Training Program fellowship (T15LM007093-29). MR is supported by a Computational Cancer Biology Training Program fellowship (CPRIT Grant No. RP170593). This work is also supported by NIH U01CA258512.</p>
</sec>
<ack>
<title>Acknowledgments</title>
<p>We thank the Center for Research Computing (CRC) at Rice University for supporting our use of the ORION VM Pool. Use of CRC resources is supported by the Data Analysis and Visualization Cyberinfrastructure funded by NSF (OCI-0959097) and by Rice University.</p>
</ack>
<sec id="s9" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
<p>Details on the datasets used in this study are available in the supplementary file Datasets.xls.</p>
</sec>
<sec id="s10" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s11" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fimmu.2023.1108303/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fimmu.2023.1108303/full#supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet_1.xlsx" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vandiedonck</surname> <given-names>C</given-names>
</name>
<name>
<surname>Knight</surname> <given-names>JC</given-names>
</name>
</person-group>. <article-title>The human major histocompatibility complex as a paradigm in genomics research</article-title>. <source>Briefings Funct Genomics Proteomics</source> (<year>2009</year>) <volume>8</volume>:<page-range>379&#x2013;94</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/bfgp/elp010</pub-id>
</citation>
</ref>
<ref id="B2">
<label>2</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Petrova</surname> <given-names>G</given-names>
</name>
<name>
<surname>Ferrante</surname> <given-names>A</given-names>
</name>
<name>
<surname>Gorski</surname> <given-names>J</given-names>
</name>
</person-group>. <article-title>Cross-reactivity of T cells and its role in the immune system</article-title>. <source>Crit Rev Immunol</source> (<year>2012</year>) <volume>32</volume>:<page-range>349&#x2013;72</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1615/critrevimmunol.v32.i4.50</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dykema</surname> <given-names>AG</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>B</given-names>
</name>
<name>
<surname>Woldemeskel</surname> <given-names>BA</given-names>
</name>
<name>
<surname>Garliss</surname> <given-names>CC</given-names>
</name>
<name>
<surname>Cheung</surname> <given-names>LS</given-names>
</name>
<name>
<surname>Choudhury</surname> <given-names>D</given-names>
</name>
<etal/>
</person-group>. <article-title>Functional characterization of CD4+ T-cell receptors cross-reactive for SARS-CoV-2 and endemic coronaviruses</article-title>. <source>J Clin Invest</source> (<year>2021</year>) <volume>131</volume>:<elocation-id>e146922</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1172/jci146922</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Waldman</surname> <given-names>AD</given-names>
</name>
<name>
<surname>Fritz</surname> <given-names>JM</given-names>
</name>
<name>
<surname>Lenardo</surname> <given-names>MJ</given-names>
</name>
</person-group>. <article-title>A guide to cancer immunotherapy: from T cell basic science to clinical practice</article-title>. <source>Nat Rev Immunol</source> (<year>2020</year>) <volume>20</volume>:<page-range>651&#x2013;68</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41577-020-0306-5</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Raman</surname> <given-names>MCC</given-names>
</name>
<name>
<surname>Rizkallah</surname> <given-names>PJ</given-names>
</name>
<name>
<surname>Simmons</surname> <given-names>R</given-names>
</name>
<name>
<surname>Donnellan</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Dukes</surname> <given-names>J</given-names>
</name>
<name>
<surname>Bossi</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>Direct molecular mimicry enables off-target cardiovascular toxicity by an enhanced affinity TCR designed for cancer immunotherapy</article-title>. <source>Sci Rep</source> (<year>2016</year>) <volume>6</volume>:<page-range>1&#x2013;10</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/srep18851</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Antunes</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Rigo</surname> <given-names>MM</given-names>
</name>
<name>
<surname>Freitas</surname> <given-names>MV</given-names>
</name>
<name>
<surname>Mendes</surname> <given-names>MFA</given-names>
</name>
<name>
<surname>Sinigaglia</surname> <given-names>M</given-names>
</name>
<name>
<surname>Liz&#xe9;e</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>Interpreting T-cell cross-reactivity through structure: implications for TCR-based cancer immunotherapy</article-title>. <source>Front Immunol</source> (<year>2017</year>) <volume>8</volume>. doi:&#xa0;<pub-id pub-id-type="doi">10.3389/fimmu.2017.01210</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Antunes</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Rigo</surname> <given-names>MM</given-names>
</name>
<name>
<surname>Silva</surname> <given-names>JP</given-names>
</name>
<name>
<surname>Cibulski</surname> <given-names>SP</given-names>
</name>
<name>
<surname>Sinigaglia</surname> <given-names>M</given-names>
</name>
<name>
<surname>Chies</surname> <given-names>JA</given-names>
</name>
<etal/>
</person-group>. <article-title>Structural in silico analysis of cross-genotype-reactivity among naturally occurring HCV NS3-1073-variants in the context of HLA-A*02:01 allele</article-title>. <source>Mol Immunol</source> (<year>2011</year>) <volume>48</volume>:<page-range>1461&#x2013;7</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.molimm.2011.03.019</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mendes</surname> <given-names>MF</given-names>
</name>
<name>
<surname>Antunes</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Rigo</surname> <given-names>MM</given-names>
</name>
<name>
<surname>Sinigaglia</surname> <given-names>M</given-names>
</name>
<name>
<surname>Vieira</surname> <given-names>GF</given-names>
</name>
</person-group>. <article-title>Improved structural method for T-cell cross-reactivity prediction</article-title>. <source>Mol Immunol</source> (<year>2015</year>) <volume>67</volume>:<page-range>303&#x2013;10</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.molimm.2015.06.017</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khan</surname> <given-names>JM</given-names>
</name>
<name>
<surname>Ranganathan</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Understanding TR binding to pMHC complexes: how does a TR scan many pMHC complexes yet preferentially bind to one</article-title>. <source>PLoS One</source> (<year>2011</year>) <volume>6</volume>:<elocation-id>e17194</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1371/journal.pone.0017194</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dhanik</surname> <given-names>A</given-names>
</name>
<name>
<surname>Kirshner</surname> <given-names>JR</given-names>
</name>
<name>
<surname>MacDonald</surname> <given-names>D</given-names>
</name>
<name>
<surname>Thurston</surname> <given-names>G</given-names>
</name>
<name>
<surname>Lin</surname> <given-names>HC</given-names>
</name>
<name>
<surname>Murphy</surname> <given-names>AJ</given-names>
</name>
<etal/>
</person-group>. <article-title>In-silico discovery of cancer-specific peptide-HLA complexes for targeted therapy</article-title>. <source>BMC Bioinf</source> (<year>2016</year>) <volume>17</volume>:<page-range>1&#x2013;14</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s12859-016-1150-2</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berman</surname> <given-names>HM</given-names>
</name>
</person-group>. <article-title>The Protein Data Bank</article-title>. <source>Nucleic Acids Res</source> (<year>2000</year>) <volume>28</volume>:<page-range>235&#x2013;42</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/nar/28.1.235</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moise</surname> <given-names>L</given-names>
</name>
<name>
<surname>Gutierrez</surname> <given-names>AH</given-names>
</name>
<name>
<surname>Bailey-Kellogg</surname> <given-names>C</given-names>
</name>
<name>
<surname>Terry</surname> <given-names>F</given-names>
</name>
<name>
<surname>Leng</surname> <given-names>Q</given-names>
</name>
<name>
<surname>Abdel Hady</surname> <given-names>KM</given-names>
</name>
<etal/>
</person-group>. <article-title>The two-faced T cell epitope</article-title>. <source>Hum Vaccines Immunotherapeutics</source> (<year>2013</year>) <volume>9</volume>:<page-range>1577&#x2013;86</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.4161/hv.24615</pub-id>
</citation>
</ref>
<ref id="B13">
<label>13</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lin</surname> <given-names>X</given-names>
</name>
<name>
<surname>George</surname> <given-names>JT</given-names>
</name>
<name>
<surname>Schafer</surname> <given-names>NP</given-names>
</name>
<name>
<surname>Chau</surname> <given-names>KN</given-names>
</name>
<name>
<surname>Birnbaum</surname> <given-names>ME</given-names>
</name>
<name>
<surname>Clementi</surname> <given-names>C</given-names>
</name>
<etal/>
</person-group>. <article-title>Rapid assessment of T-cell receptor specificity of the immune repertoire</article-title>. <source>Nat Comput Sci</source> (<year>2021</year>) <volume>1</volume>:<page-range>362&#x2013;73</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1101/2020.04.06.028415</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jaravine</surname> <given-names>V</given-names>
</name>
<name>
<surname>Raffegerst</surname> <given-names>S</given-names>
</name>
<name>
<surname>Schendel</surname> <given-names>DJ</given-names>
</name>
<name>
<surname>Frishman</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Assessment of cancer and virus antigens for cross-reactivity in human tissues</article-title>. <source>Bioinformatics</source> (<year>2016</year>) <volume>33</volume>:<page-range>104&#x2013;11</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/bioinformatics/btw567</pub-id>
</citation>
</ref>
<ref id="B15">
<label>15</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jaravine</surname> <given-names>V</given-names>
</name>
<name>
<surname>M&#xf6;sch</surname> <given-names>A</given-names>
</name>
<name>
<surname>Raffegerst</surname> <given-names>S</given-names>
</name>
<name>
<surname>Schendel</surname> <given-names>DJ</given-names>
</name>
<name>
<surname>Frishman</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Expitope 2.0: a tool to assess immunotherapeutic antigens for their potential cross-reactivity against naturally expressed proteins in human tissues</article-title>. <source>BMC Cancer</source> (<year>2017</year>) <volume>17</volume>:<fpage>892</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s12885-017-3854-8</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abella</surname> <given-names>J</given-names>
</name>
<name>
<surname>Antunes</surname> <given-names>D</given-names>
</name>
<name>
<surname>Clementi</surname> <given-names>C</given-names>
</name>
<name>
<surname>Kavraki</surname> <given-names>L</given-names>
</name>
</person-group>. <article-title>APE-Gen: a fast method for generating ensembles of bound peptide-MHC conformations</article-title>. <source>Molecules</source> (<year>2019</year>) <volume>24</volume>:<fpage>881</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/molecules24050881</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Menegatti Rigo</surname> <given-names>M</given-names>
</name>
<name>
<surname>Amaral Antunes</surname> <given-names>D</given-names>
</name>
<name>
<surname>Vaz de Freitas</surname> <given-names>M</given-names>
</name>
<name>
<surname>Fabiano de Almeida Mendes</surname> <given-names>M</given-names>
</name>
<name>
<surname>Meira</surname> <given-names>L</given-names>
</name>
<name>
<surname>Sinigaglia</surname> <given-names>M</given-names>
</name>
<etal/>
</person-group>. <article-title>DockTope: a web-based tool for automated pMHC-I modelling</article-title>. <source>Sci Rep</source> (<year>2015</year>) <volume>5</volume>:<fpage>18413</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/srep18413</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Henikoff</surname> <given-names>S</given-names>
</name>
<name>
<surname>Henikoff</surname> <given-names>JG</given-names>
</name>
</person-group>. <article-title>Amino acid substitution matrices from protein blocks</article-title>. <source>Proc Natl Acad Sci</source> (<year>1992</year>) <volume>89</volume>:<page-range>10915&#x2013;9</page-range>. doi: <pub-id pub-id-type="doi">10.1073/pnas.89.22.10915</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sarkizova</surname> <given-names>S</given-names>
</name>
<name>
<surname>Klaeger</surname> <given-names>S</given-names>
</name>
<name>
<surname>Le</surname> <given-names>PM</given-names>
</name>
<name>
<surname>Li</surname> <given-names>LW</given-names>
</name>
<name>
<surname>Oliveira</surname> <given-names>G</given-names>
</name>
<name>
<surname>Keshishian</surname> <given-names>H</given-names>
</name>
<etal/>
</person-group>. <article-title>A large peptidome dataset improves HLA class I epitope prediction across most of the human population</article-title>. <source>Nat Biotechnol</source> (<year>2019</year>) <volume>38</volume>(<issue>2</issue>):<page-range>199&#x2013;209</page-range>. doi: <pub-id pub-id-type="doi">10.1038/s41587-019-0322-9</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20</label>
<citation citation-type="book">
<person-group person-group-type="author">
<collab>Schr&#xf6;dinger, LLC</collab>
</person-group>. <source>The PyMOL molecular graphics system, version 1.8</source>. (<year>2015</year>). Available at: <uri xlink:href="http://www.pymol.org/pymol">http://www.pymol.org/pymol</uri>.</citation>
</ref>
<ref id="B21">
<label>21</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanner</surname> <given-names>MF</given-names>
</name>
<name>
<surname>Olson</surname> <given-names>AJ</given-names>
</name>
<name>
<surname>Spehner</surname> <given-names>JC</given-names>
</name>
</person-group>. <article-title>Reduced surface: an efficient way to compute molecular surfaces</article-title>. <source>Biopolymers</source> (<year>1996</year>) <volume>38</volume>:<page-range>305&#x2013;20</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1002/(sici)1097-0282(199603)38:3&lt;305::aid-bip4&gt;3.0.co;2-y</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22</label>
<citation citation-type="web">
<source>PyMesh - geometric processing library for Python</source> (<year>2019</year>). Available at: <uri xlink:href="https://github.com/PyMesh/PyMesh">https://github.com/PyMesh/PyMesh</uri>.</citation>
</ref>
<ref id="B23">
<label>23</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jurrus</surname> <given-names>E</given-names>
</name>
<name>
<surname>Engel</surname> <given-names>D</given-names>
</name>
<name>
<surname>Star</surname> <given-names>K</given-names>
</name>
<name>
<surname>Monson</surname> <given-names>K</given-names>
</name>
<name>
<surname>Brandi</surname> <given-names>J</given-names>
</name>
<name>
<surname>Felberg</surname> <given-names>LE</given-names>
</name>
<etal/>
</person-group>. <article-title>Improvements to the APBS biomolecular solvation software suite</article-title>. <source>Protein Sci</source> (<year>2017</year>) <volume>27</volume>:<page-range>112&#x2013;28</page-range>. doi: <pub-id pub-id-type="doi">10.1002/pro.3280</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kyte</surname> <given-names>J</given-names>
</name>
<name>
<surname>Doolittle</surname> <given-names>RF</given-names>
</name>
</person-group>. <article-title>A simple method for displaying the hydropathic character of a protein</article-title>. <source>J Mol Biol</source> (<year>1982</year>) <volume>157</volume>:<page-range>105&#x2013;32</page-range>. doi: <pub-id pub-id-type="doi">10.1016/0022-2836(82)90515-0</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gainza</surname> <given-names>P</given-names>
</name>
<name>
<surname>Sverrisson</surname> <given-names>F</given-names>
</name>
<name>
<surname>Monti</surname> <given-names>F</given-names>
</name>
<name>
<surname>Rodol&#xe0;</surname> <given-names>E</given-names>
</name>
<name>
<surname>Boscaini</surname> <given-names>D</given-names>
</name>
<name>
<surname>Bronstein</surname> <given-names>MM</given-names>
</name>
<etal/>
</person-group>. <article-title>Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning</article-title>. <source>Nat Methods</source> (<year>2019</year>) <volume>17</volume>:<page-range>184&#x2013;92</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41592-019-0666-6</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kortemme</surname> <given-names>T</given-names>
</name>
<name>
<surname>Morozov</surname> <given-names>AV</given-names>
</name>
<name>
<surname>Baker</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein&#x2013;protein complexes</article-title>. <source>J Mol Biol</source> (<year>2003</year>) <volume>326</volume>:<page-range>1239&#x2013;59</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/s0022-2836(03)00021-4</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arun</surname> <given-names>KS</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>TS</given-names>
</name>
<name>
<surname>Blostein</surname> <given-names>SD</given-names>
</name>
</person-group>. <article-title>Least-squares fitting of two 3-D point sets</article-title>. <source>IEEE Trans Pattern Anal Mach Intell</source> (<year>1987</year>) <volume>PAMI-9</volume>:<page-range>698&#x2013;700</page-range>.</citation>
</ref>
<ref id="B28">
<label>28</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pedregosa</surname> <given-names>F</given-names>
</name>
<name>
<surname>Varoquaux</surname> <given-names>G</given-names>
</name>
<name>
<surname>Gramfort</surname> <given-names>A</given-names>
</name>
<name>
<surname>Michel</surname> <given-names>V</given-names>
</name>
<name>
<surname>Thirion</surname> <given-names>B</given-names>
</name>
<name>
<surname>Grisel</surname> <given-names>O</given-names>
</name>
<etal/>
</person-group>. <article-title>Scikit-learn: machine learning in Python</article-title>. <source>J Mach Learn Res</source> (<year>2011</year>) <volume>12</volume>:<page-range>2825&#x2013;30</page-range>.</citation>
</ref>
<ref id="B29">
<label>29</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fix</surname> <given-names>E</given-names>
</name>
<name>
<surname>Hodges</surname> <given-names>JL</given-names>
</name>
</person-group>. <article-title>Discriminatory analysis. Nonparametric discrimination: consistency properties</article-title>. <source>Tech. Rep. 4, USAF School of Aviation Medicine</source> (<year>1951</year>).</citation>
</ref>
<ref id="B30">
<label>30</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gee</surname> <given-names>MH</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>X</given-names>
</name>
<name>
<surname>Garcia</surname> <given-names>KC</given-names>
</name>
</person-group>. <article-title>Facile method for screening clinical T cell receptors for off-target peptide-HLA reactivity</article-title>. <source>bioRxiv</source> (<year>2018</year>). doi: <pub-id pub-id-type="doi">10.1101/472480</pub-id>
</citation>
</ref>
<ref id="B31">
<label>31</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vita</surname> <given-names>R</given-names>
</name>
<name>
<surname>Mahajan</surname> <given-names>S</given-names>
</name>
<name>
<surname>Overton</surname> <given-names>JA</given-names>
</name>
<name>
<surname>Dhanda</surname> <given-names>SK</given-names>
</name>
<name>
<surname>Martini</surname> <given-names>S</given-names>
</name>
<name>
<surname>Cantrell</surname> <given-names>JR</given-names>
</name>
<etal/>
</person-group>. <article-title>The immune epitope database (IEDB): 2018 update</article-title>. <source>Nucleic Acids Res</source> (<year>2018</year>) <volume>47</volume>:<page-range>D339&#x2013;43</page-range>. doi: <pub-id pub-id-type="doi">10.1093/nar/gky1006</pub-id>
</citation>
</ref>
<ref id="B32">
<label>32</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fytili</surname> <given-names>P</given-names>
</name>
<name>
<surname>Dalekos</surname> <given-names>G</given-names>
</name>
<name>
<surname>Schlaphoff</surname> <given-names>V</given-names>
</name>
<name>
<surname>Suneetha</surname> <given-names>P</given-names>
</name>
<name>
<surname>Sarrazin</surname> <given-names>C</given-names>
</name>
<name>
<surname>Zauner</surname> <given-names>W</given-names>
</name>
<etal/>
</person-group>. <article-title>Cross-genotype-reactivity of the immunodominant HCV CD8 T-cell epitope NS3-1073</article-title>. <source>Vaccine</source> (<year>2008</year>) <volume>26</volume>:<page-range>3818&#x2013;26</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.vaccine.2008.05.045</pub-id>
</citation>
</ref>
<ref id="B33">
<label>33</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sinigaglia</surname> <given-names>M</given-names>
</name>
<name>
<surname>Antunes</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Rigo</surname> <given-names>MM</given-names>
</name>
<name>
<surname>Chies</surname> <given-names>JAB</given-names>
</name>
<name>
<surname>Vieira</surname> <given-names>GF</given-names>
</name>
</person-group>. <article-title>CrossTope: a curate repository of 3D structures of immunogenic peptide: MHC complexes</article-title>. <source>Database</source> (<year>2013</year>) <volume>2013</volume>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/database/bat002</pub-id>
</citation>
</ref>
<ref id="B34">
<label>34</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Halstead</surname> <given-names>SB</given-names>
</name>
</person-group>. <article-title>Identifying protective dengue vaccines: guide to mastering an empirical process</article-title>. <source>Vaccine</source> (<year>2013</year>) <volume>31</volume>:<page-range>4501&#x2013;7</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.vaccine.2013.06.079</pub-id>
</citation>
</ref>
<ref id="B35">
<label>35</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soon</surname> <given-names>CF</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>S</given-names>
</name>
<name>
<surname>Suneetha</surname> <given-names>PV</given-names>
</name>
<name>
<surname>Antunes</surname> <given-names>DA</given-names>
</name>
<name>
<surname>Manns</surname> <given-names>MP</given-names>
</name>
<name>
<surname>Raha</surname> <given-names>S</given-names>
</name>
<etal/>
</person-group>. <article-title>Hepatitis E virus (HEV)-specific T cell receptor cross-recognition: implications for immunotherapy</article-title>. <source>Front Immunol</source> (<year>2019</year>) <volume>10</volume>:<page-range>1&#x2013;14</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.3389/fimmu.2019.02076</pub-id>
</citation>
</ref>
<ref id="B36">
<label>36</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kruskal</surname> <given-names>JB</given-names>
</name>
</person-group>. <article-title>Nonmetric multidimensional scaling: a numerical method</article-title>. <source>Psychometrika</source> (<year>1964</year>) <volume>29</volume>:<page-range>115&#x2013;29</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/bf02289694</pub-id>
</citation>
</ref>
<ref id="B37">
<label>37</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Merkel</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Docker: lightweight Linux containers for consistent development and deployment</article-title>. <source>Linux Journal</source> (<year>2014</year>) <volume>2014</volume>:<fpage>2</fpage>.</citation>
</ref>
<ref id="B38">
<label>38</label>
<citation citation-type="web">
<source>Django (Version 1.5) [Computer Software]</source> (<year>2013</year>). Available at: <uri xlink:href="https://www.djangoproject.com/">https://www.djangoproject.com/</uri>.</citation>
</ref>
<ref id="B39">
<label>39</label>
<citation citation-type="web">
<source>Celery - distributed task queue - Celery 5.0.5 documentation</source> (<year>2021</year>). Available at: <uri xlink:href="https://docs.celeryproject.org/en/stable/">https://docs.celeryproject.org/en/stable/</uri>.</citation>
</ref>
</ref-list>
</back>
</article>