<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Immunol.</journal-id>
<journal-title>Frontiers in Immunology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Immunol.</abbrev-journal-title>
<issn pub-type="epub">1664-3224</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fimmu.2023.1236080</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Immunology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Bw4 ligand and direct T-cell receptor binding induced selection on HLA A and B alleles</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Levi</surname>
<given-names>Reut</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1074646"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Levi</surname>
<given-names>Lee</given-names>
</name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Louzoun</surname>
<given-names>Yoram</given-names>
</name>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/88412"/>
</contrib>
</contrib-group>
<aff id="aff1">
<institution>Department of Mathematics, Bar-Ilan University</institution>, <addr-line>Ramat Gan</addr-line>, <country>Israel</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Jonathan Kaye, Cedars Sinai Medical Center, United States</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Philip Bradley, Fred Hutchinson Cancer Center, United States; Johannes Schetelig, University Hospital Carl Gustav Carus, Germany</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Yoram Louzoun, <email xlink:href="mailto:louzouy@math.biu.ac.il">louzouy@math.biu.ac.il</email>
</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>21</day>
<month>11</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1236080</elocation-id>
<history>
<date date-type="received">
<day>07</day>
<month>06</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>10</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Levi, Levi and Louzoun</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Levi, Levi and Louzoun</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>The HLA region is the hallmark of balancing selection, argued to be driven by the pressure to present a wide variety of viral epitopes. As such, selection on the peptide-binding positions has been proposed to drive HLA population genetics. MHC molecules also bind directly to the T-cell receptor (TCR) and to killer cell immunoglobulin-like receptors (KIR).</p>
</sec>
<sec>
<title>Methods</title>
<p>We here combine the HLA allele frequencies in over six million hematopoietic stem cell (HSC) donors with a novel machine-learning-based method to predict allele frequency.</p>
</sec>
<sec>
<title>Results</title>
<p>We show for the first time that allele frequencies can be predicted from allele sequences. This prediction yields a natural measure for selection. The strongest selection affects KIR-binding regions, followed by the peptide-binding cleft. The selection from the direct interaction with the KIR and TCR is centered on positively charged residues (mainly arginine), and some positions in the peptide-binding cleft are not associated with the allele frequency, especially tyrosine residues.</p>
</sec>
<sec>
<title>Discussion</title>
<p>These results suggest that the balancing selection for peptide presentation is combined with a positive selection for KIR and TCR binding.</p>
</sec>
</abstract>
<kwd-group>
<kwd>selection</kwd>
<kwd>HLA</kwd>
<kwd>balancing</kwd>
<kwd>machine learning</kwd>
<kwd>allele</kwd>
<kwd>Bw4</kwd>
<kwd>T cell receptor</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="2"/>
<equation-count count="5"/>
<ref-count count="75"/>
<page-count count="10"/>
<word-count count="6424"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-in-acceptance</meta-name>
<meta-value>T Cell Biology</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<label>1</label>
<title>Introduction</title>
<p>A major challenge in understanding the evolutionary forces that act on species and affect their genetic variation is the identification of loci and positions under selection. In a simple model of directional selection, a novel mutation is favored if it confers a selective advantage to the organism (positive selection) (<xref ref-type="bibr" rid="B1">1</xref>). However, in some loci, balancing selection has been proposed to favor a large number of alleles in the same locus (<xref ref-type="bibr" rid="B2">2</xref>).</p>
<p>A hallmark of balancing selection is the MHC region (see <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref> for all abbreviations), which encodes the MHC molecules that present peptides to T lymphocytes (<xref ref-type="bibr" rid="B3">3</xref>) and is denoted HLA in humans. The HLA region contains the most diverse loci in the human genome (<xref ref-type="bibr" rid="B4">4</xref>). The selection has been argued to emerge from the need to bind peptides from different pathogens. As such, it is centered on peptide-binding positions in the MHC molecule (<xref ref-type="bibr" rid="B5">5</xref>). Classical HLA genes include two main groups: A, B and C, denoted class I and presenting intracellular peptides, and DR and DQ, denoted class II and typically presenting extracellular peptides. Most of the variation among alleles is indeed concentrated in the peptide-binding regions in the second and third exons of the class I loci and the second exon of the class II loci (<xref ref-type="bibr" rid="B6">6</xref>). Current estimates of DP allele frequencies have limited accuracy; DP was therefore not studied in the current analysis.</p>
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>List of acronyms used in the current analysis.</p>
</caption>
<table frame="hsides">
<tbody>
<tr>
<td valign="top" align="center">AA</td>
<td valign="top" align="left">Amino Acid</td>
</tr>
<tr>
<td valign="top" align="center">TCR</td>
<td valign="top" align="left">T-Cell Receptor</td>
</tr>
<tr>
<td valign="top" align="center">MHC</td>
<td valign="top" align="left">Major Histocompatibility Complex</td>
</tr>
<tr>
<td valign="top" align="center">HLA</td>
<td valign="top" align="left">Human Leukocyte Antigen</td>
</tr>
<tr>
<td valign="top" align="center">CDR</td>
<td valign="top" align="left">Complementarity-Determining Region</td>
</tr>
<tr>
<td valign="top" align="center">KIR</td>
<td valign="top" align="left">Killer cell Immunoglobulin-like Receptor</td>
</tr>
<tr>
<td valign="top" align="center">NK</td>
<td valign="top" align="left">Natural Killer</td>
</tr>
<tr>
<td valign="top" align="center">PB</td>
<td valign="top" align="left">Peptide-Binding</td>
</tr>
<tr>
<td valign="top" align="center">NPB</td>
<td valign="top" align="left">Non-Peptide-Binding</td>
</tr>
<tr>
<td valign="top" align="center">LILR</td>
<td valign="top" align="left">Leukocyte Immunoglobulin-Like Receptor</td>
</tr>
<tr>
<td valign="top" align="center">TSP</td>
<td valign="top" align="left">Trans-Species Polymorphism</td>
</tr>
<tr>
<td valign="top" align="center">ESP</td>
<td valign="top" align="left">Electrostatic Surface Potential</td>
</tr>
<tr>
<td valign="top" align="center">SVR</td>
<td valign="top" align="left">Support Vector Regression</td>
</tr>
<tr>
<td valign="top" align="center">RBF</td>
<td valign="top" align="left">Radial Basis Function</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The main lines of evidence for balancing selection in HLA are trans-species polymorphism (TSP) and high diversity. Many distinct mechanisms have been proposed to induce this balancing selection (<xref ref-type="bibr" rid="B7">7</xref>), including direct selection by pathogens, heterozygote advantage (<xref ref-type="bibr" rid="B8">8</xref>, <xref ref-type="bibr" rid="B9">9</xref>), MHC-dependent mate choice (assortative mating) and sexual selection, including MHC dependence on mother-fetal interactions and the apparent olfactory recognition of MHC haplotypes (<xref ref-type="bibr" rid="B10">10</xref>).</p>
<p>However, in humans, MHC-I also has direct interactions with three other molecules that could affect the selection of HLA alleles. The MHC-I molecule interacts directly with the TCR and plays a role in TCR-HLA peptide binding. Recently, the direct interaction of the TCR and the HLA was shown to be affected by the V gene and CDR3 sequence of the TCR <italic>&#x3b2;</italic> chain (<xref ref-type="bibr" rid="B11">11</xref>&#x2013;<xref ref-type="bibr" rid="B14">14</xref>). NK cells also bind MHC-I molecules via two distinct groups of receptors, killer cell immunoglobulin-like receptors (KIRs) and CD94:NKG2. Natural killer cells are lymphocytes of the innate immune response that provide an important defense against infection, particularly viral infections (<xref ref-type="bibr" rid="B15">15</xref>&#x2013;<xref ref-type="bibr" rid="B17">17</xref>). KIRs are inhibitory and activating receptors expressed mostly on the surface of NK cells and some T-cells. KIRs recognize broad groups of HLA class I molecules, mainly through the Bw4 binding domain in the A and B HLA alleles (<xref ref-type="bibr" rid="B18">18</xref>). Bw4 is a public epitope present on a subset of HLA-B and on some HLA-A alleles. NK cells can induce cell death in cells lacking Bw4.</p>
<p>MHC-I molecules are also the ligands for the leukocyte immunoglobulin-like receptors (LILR), of which LILRB1 and LILRB2 are the best characterized (<xref ref-type="bibr" rid="B18">18</xref>). A variety of HLA allotypes bind LILRB1 and LILRB2 with varying affinities, especially LILRB2, which shows considerable variation across HLA alleles (<xref ref-type="bibr" rid="B19">19</xref>). The LILRB1 and LILRB2 receptors are inhibitory receptors found mainly on myeloid cells such as dendritic cells and macrophages; signaling via LILR influences their activation (<xref ref-type="bibr" rid="B20">20</xref>). We here show that in humans, the direct interaction of MHC-I with TCR and KIR molecules leaves a direct signature of selection in the HLA region.</p>
<p>Several methods have been proposed for the identification of positions associated with selection (<xref ref-type="bibr" rid="B21">21</xref>), including, among others, the examination of a surplus of heterozygous genotypes (<xref ref-type="bibr" rid="B22">22</xref>), identification of locally elevated genetic variance (<xref ref-type="bibr" rid="B23">23</xref>), polymorphisms (<xref ref-type="bibr" rid="B24">24</xref>), shifts in the site frequency spectrum toward common frequencies (<xref ref-type="bibr" rid="B25">25</xref>&#x2013;<xref ref-type="bibr" rid="B27">27</xref>), deviation of genetic diversity from neutral models (<xref ref-type="bibr" rid="B28">28</xref>), presence of trans-species polymorphism (<xref ref-type="bibr" rid="B29">29</xref>, <xref ref-type="bibr" rid="B30">30</xref>), explicit models of polymorphism patterns (<xref ref-type="bibr" rid="B31">31</xref>, <xref ref-type="bibr" rid="B32">32</xref>), and correlation of environmental features with allele frequencies (<xref ref-type="bibr" rid="B33">33</xref>). Most of these methods are based on the distribution of nucleotides and amino acids at the appropriate position. As such, they provide indirect evidence for selection. Recently, some frequency-based methods have also been developed (<xref ref-type="bibr" rid="B34">34</xref>&#x2013;<xref ref-type="bibr" rid="B37">37</xref>). Such methods are based on the principle that non-neutral evolution leaves a signature of selection on the allele frequencies.</p>
<p>A more direct measure of selection would be to measure the effect of each amino acid in each position on the allele frequency. While in most genes the sampling depth and the polymorphism do not allow for such a direct measurement, the HLA locus is polymorphic enough (over 24,000 alleles in A, B, and C, and more than 7,000 as defined by the amino acid sequence of exons 2 and 3 only), and has a large enough coverage (over 39 million typed donors worldwide) (<xref ref-type="bibr" rid="B38">38</xref>, <xref ref-type="bibr" rid="B39">39</xref>). We have recently demonstrated the validity of the frequency estimates of HLA haplotypes and their adequacy for population structure modeling (<xref ref-type="bibr" rid="B37">37</xref>, <xref ref-type="bibr" rid="B40">40</xref>, <xref ref-type="bibr" rid="B41">41</xref>). We here show for the first time that allele sequence can be used to predict allele frequency on a test set. We then show that the coefficients of the prediction algorithm highlight a strong additional selection acting on regions of the HLA locus that bind not the peptide but rather NK cell receptors or, directly, the T-cell receptor (TCR). To the best of our knowledge, this is the first prediction of allele frequency in the human population from allele sequence at any locus.</p>
</sec>
<sec id="s2" sec-type="results">
<label>2</label>
<title>Results</title>
<p>To show that the HLA allele amino acid composition can be used to predict the frequency of an unseen allele, we regressed the log allele frequency on the amino acid composition represented as a one-hot per position (see Online Methods for formalism and training-test division and <xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1A</bold>
</xref> for a schematic). We used the HLA allele frequencies imputed from the HLA typings of 6.59 million donors of the National Marrow Donor Program registry. The frequencies are divided into 21 detailed and 5 broad sub-populations across the US (<xref ref-type="bibr" rid="B42">42</xref>) (see <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S1</bold>
</xref> for details). We tested multiple linear and non-linear regression methods for each population. Formally, for the linear regressors, each population <inline-formula>
<mml:math display="inline" id="im11">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula>, and each allele <inline-formula>
<mml:math display="inline" id="im12">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula> in locus <inline-formula>
<mml:math display="inline" id="im13">
<mml:mi>L</mml:mi>
</mml:math>
</inline-formula> (A, B or C), with frequency <inline-formula>
<mml:math display="inline" id="im14">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, we define the loss:</p>
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>Frequency prediction. <bold>(A)</bold> Schematic description of <inline-formula>
<mml:math display="inline" id="im1">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> estimate. We regress the log frequency on the AA in all positions and obtain a coefficient <inline-formula>
<mml:math display="inline" id="im2">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula> per AA per position. We normalize the sum of <inline-formula>
<mml:math display="inline" id="im3">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula> to be 0 in each position. <inline-formula>
<mml:math display="inline" id="im4">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the sum of the absolute <inline-formula>
<mml:math display="inline" id="im5">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula> values per position. <bold>(B)</bold> The average Spearman correlation test between the predicted and real log frequencies over all populations for different models. Complex models actually have a worse prediction than linear models. <bold>(C)</bold> Spearman correlation as a function of the population. The stars represent the broad populations (AFA, API, CAU, HIS and NAM). <bold>(D)</bold> The Spearman correlation between the log real <inline-formula>
<mml:math display="inline" id="im6">
<mml:mi>y</mml:mi>
</mml:math>
</inline-formula> values (allele frequencies) and the values predicted by SVR on the test set (from the amino acid sequence of each allele), for each locus and each population separately. The blue bars represent the A locus, the pink bars the B locus, and the green bars the C locus. The A and B loci have consistent positive correlations, while the C locus has no correlation. <bold>(E)</bold> The sum of the absolute <inline-formula>
<mml:math display="inline" id="im7">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula> values per position (the <inline-formula>
<mml:math display="inline" id="im8">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula> values are defined as the regression coefficients of the SVR). The black and red stars mark the peptide-binding and Bw4 positions, respectively, and the dark blue bars mark the significant positions. <bold>(F, G)</bold> The distribution of the <inline-formula>
<mml:math display="inline" id="im9">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values <bold>(F)</bold> and the <inline-formula>
<mml:math display="inline" id="im10">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo stretchy="false">/</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values <bold>(G)</bold> for each region. Pink dots are significantly different from the null model.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fimmu-14-1236080-g001.tif"/>
</fig>
<disp-formula>
<label>(1)</label>
<mml:math display="block" id="M1">
<mml:mrow>
<mml:mi>loss</mml:mi>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>i</mml:mi>
</mml:munder>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>log</mml:mi>
<mml:mn>10</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>k</mml:mi>
</mml:munder>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>*</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im15">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is a one-hot representation of the allele sequence, <inline-formula>
<mml:math display="inline" id="im16">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the coefficients for the appropriate population and <inline-formula>
<mml:math display="inline" id="im17">
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>&#x3b2;</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is a regularization term that varies among methods (e.g., Ridge or LASSO). The formalism is similar in non-linear methods (see Methods). We have also tested the possibility of regression of all loci simultaneously. In such cases, an additional term representing the locus was added to the regression. Finally, we also performed a similar regression on all populations simultaneously. In this case, an additional one-hot term was added for the population (see Methods).</p>
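<p>As a rough illustration of Equation 1, the following minimal sketch one-hot encodes a few toy sequences and fits a ridge-regularized linear model to the log frequencies in closed form. The sequences, frequencies, the value of <italic>alpha</italic>, and the function names are hypothetical choices made for illustration only; the analysis itself relies on SVR and the other regressors described in the Methods, not on this code.</p>
<preformat preformat-type="code">
```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(sequences):
    """Encode equal-length AA sequences as a flat one-hot matrix x[i, k]."""
    n_states = len(AMINO_ACIDS)
    n_positions = len(sequences[0])
    x = np.zeros((len(sequences), n_positions * n_states))
    for i, seq in enumerate(sequences):
        for pos, aa in enumerate(seq):
            x[i, pos * n_states + AMINO_ACIDS.index(aa)] = 1.0
    return x


def fit_ridge(x, y, alpha):
    """Minimize sum_i (y_i - sum_k beta_k * x_ik)^2 + alpha * ||beta||^2.

    Here g(beta) in Equation 1 is assumed to be the ridge penalty.
    """
    gram = x.T @ x + alpha * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


# Toy data (hypothetical): four alleles of three residues each.
sequences = ["ARN", "ARD", "GRN", "GKD"]
frequencies = np.array([0.5, 0.3, 0.15, 0.05])

x = one_hot_encode(sequences)
log_freq = np.log10(frequencies)
beta = fit_ridge(x, log_freq, alpha=0.1)
predicted_log_freq = x @ beta
```
</preformat>
<p>The closed-form ridge solution is used here only because it keeps the sketch self-contained; any linear regressor that exposes one coefficient per amino acid per position would play the same role.</p>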
<p>The RBF Support-Vector Regression (SVR) produced the highest average test correlation (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1B</bold>
</xref>), but a linear SVR scored almost as well (ANOVA test between all the models <inline-formula>
<mml:math display="inline" id="im18">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mn>9.74</mml:mn>
<mml:mi>e</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>27</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, T-test between the RBF SVR and the linear SVR <inline-formula>
<mml:math display="inline" id="im19">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mn>0.001</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>). Thus, to obtain coefficients with a simple interpretation, we used the SVR model with a linear kernel, trained on all loci together.</p>
<p>Note that more precise results can be obtained for specific loci and populations using other models (<xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S9</bold>
</xref>), but, as discussed below, their coefficients fail to detect previously reported selection, and they were thus not used. Moreover, the same model is mainly predicting the difference between populations, and not the direct effect of the sequence on the log-frequency.</p>
<p>The correlation between the predicted and real log frequencies decreases with the population size (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1C</bold>
</xref>; the broad populations (AFA, API, CAU, HIS and NAM) are marked with a star). As a result, the correlation for the broad groups is generally lower than for the detailed groups. We thus focus on the detailed groups in the remainder of the analysis. The correlation is highest in the A locus (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1D</bold>
</xref>), followed by B (0.178 vs 0.102). The average correlation in the C locus is almost null (0.03) and non-significant.</p>
<p>This may be due to sequence differences between the loci and a failure to transfer what is learned from A and B to C, which has fewer alleles. To test this possibility, we performed a regression on each of the loci separately. Again, in the C locus, the prediction models fail to predict the frequencies of the alleles (<xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figure S3</bold>
</xref>). Therefore, the lack of prediction at the C locus is not due to its difference from the A and B loci, nor is it because of the number of alleles, which is similar among loci (2,196 in C vs 2,477 in A and 3,219 in B).</p>
<p>In the linear models, each amino acid at each position is associated with a coefficient (<inline-formula>
<mml:math display="inline" id="im20">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula>). We computed the sum of the absolute <inline-formula>
<mml:math display="inline" id="im21">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula> values (<inline-formula>
<mml:math display="inline" id="im22">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) per position. A low <inline-formula>
<mml:math display="inline" id="im23">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> implies that mutations in this position have a minimal effect on the allele frequencies, and a high <inline-formula>
<mml:math display="inline" id="im24">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> implies that some AAs in this position are strongly correlated with a high or low allele frequency. As is the case for most selection measures, this is not proof of causality, since different positions may be in linkage disequilibrium (LD).</p>
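<p>The per-position score can be sketched as follows. This is a minimal illustration that assumes the coefficients are stored as a flat vector ordered position by position, and that the zero-sum normalization mentioned in the Figure 1 caption is applied by subtracting the per-position mean over all columns; the function name and the toy coefficients are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def s_beta(beta, n_positions, n_states):
    """Return one score per position: the sum of absolute centered coefficients.

    beta is assumed to be laid out as n_positions blocks of n_states values.
    """
    per_position = np.asarray(beta, dtype=float).reshape(n_positions, n_states)
    # Normalize the coefficients so that they sum to 0 within each position.
    centered = per_position - per_position.mean(axis=1, keepdims=True)
    return np.abs(centered).sum(axis=1)


# Toy coefficients (hypothetical): 3 positions, 4 possible residues each.
example_beta = np.array([
    0.2, -0.2, 0.0, 0.0,   # weak effect
    0.5, 0.5, 0.5, 0.5,    # constant offset only, no residue-specific effect
    1.0, 0.0, 0.0, -1.0,   # strong effect
])
example_scores = s_beta(example_beta, n_positions=3, n_states=4)
```
</preformat>
<p>In this toy case the second position receives a score of zero because a constant offset carries no information about which residue is favored, whereas the third position receives the highest score.</p>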
<p>The regression coefficients were consistent among the different populations, with an average correlation of <inline-formula>
<mml:math display="inline" id="im25">
<mml:mrow>
<mml:mn>0.3</mml:mn>
<mml:mo>&#xb1;</mml:mo>
<mml:mn>0.009</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> over the large enough coefficients (<inline-formula>
<mml:math display="inline" id="im26">
<mml:mrow>
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mi>i</mml:mi>
</mml:munder>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>&#x3b2;</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:mo>&gt;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>; when the correlation is computed over all positions, it is close to 1, but only because many positions have values near 0).</p>
<p>
<inline-formula>
<mml:math display="inline" id="im27">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is by definition biased toward positions with a more diverse amino acid composition (<xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figure S4</bold>
</xref>). This is expected since such positions are also the ones most associated with selection. Still, we have examined several possible methods of <inline-formula>
<mml:math display="inline" id="im28">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> estimation, including the sum of absolute values (as above), the average of absolute values, and the average of absolute values weighted by the frequency of each AA at the appropriate position. The sum of absolute values best reproduces known results on the selection affecting the peptide binding domain and was thus kept (<xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figure S5A</bold>
</xref> vs <xref ref-type="supplementary-material" rid="SM1">
<bold>S5B, C</bold>
</xref>). Similarly, a single model trained on all the populations together had less distinctive <inline-formula>
<mml:math display="inline" id="im29">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values in the peptide binding region than outside (<xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figure S5D</bold>
</xref>), and was thus not used.</p>
<p>As expected, the positions with the highest <inline-formula>
<mml:math display="inline" id="im30">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the peptide-binding region positions. However, surprisingly, those are followed by the Bw4 KIR ligand (see Online Methods for the definition of HLA positions, <xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1E</bold>
</xref> and <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figure S8</bold>
</xref>, where the black and red stars mark the peptide-binding and Bw4 positions, respectively). The dark blue bars represent the significant positions. A significant position is defined as one with <inline-formula>
<mml:math display="inline" id="im31">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> larger than the 95th percentile of the <inline-formula>
<mml:math display="inline" id="im32">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values in the null model (where all the frequencies are mixed; see Methods). Note that there is some overlap between PB and Bw4 positions. However, there is a clear selection for Bw4: positions 80-83 are significantly selected although only positions 80 and 81 are PB, whereas positions 76 and 77 are PB but not selected.</p>
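<p>The significance criterion above can be sketched as a permutation test. The example below is only a sketch under stated assumptions: a ridge fit stands in for the linear SVR, the null model is taken to be a random shuffle of the log frequencies among alleles, the null scores are pooled over positions and permutations, and the simulated data, function names, and parameter values are all hypothetical. The actual null model is the one defined in the Methods.</p>
<preformat preformat-type="code">
```python
import numpy as np


def fit_linear(x, y, alpha=1.0):
    """Ridge fit without intercept (a stand-in for the linear SVR)."""
    return np.linalg.solve(x.T @ x + alpha * np.eye(x.shape[1]), x.T @ y)


def s_beta(beta, n_positions, n_states):
    """Sum of absolute zero-centered coefficients per position."""
    b = beta.reshape(n_positions, n_states)
    b = b - b.mean(axis=1, keepdims=True)
    return np.abs(b).sum(axis=1)


def significant_positions(x, log_freq, n_positions, n_states,
                          n_permutations=200, percentile=95.0, seed=0):
    """Flag positions whose score exceeds a percentile of the shuffled-null scores."""
    rng = np.random.default_rng(seed)
    observed = s_beta(fit_linear(x, log_freq), n_positions, n_states)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = rng.permutation(log_freq)  # break the sequence-frequency link
        null_scores.append(s_beta(fit_linear(x, shuffled), n_positions, n_states))
    threshold = np.percentile(np.concatenate(null_scores), percentile)
    return observed, threshold, observed > threshold


# Simulated data (hypothetical): 60 alleles, 5 positions, 3 residues per position.
data_rng = np.random.default_rng(1)
n_alleles, n_pos, n_res = 60, 5, 3
states = data_rng.integers(0, n_res, size=(n_alleles, n_pos))
x_sim = np.zeros((n_alleles, n_pos * n_res))
x_sim[np.arange(n_alleles)[:, None], np.arange(n_pos) * n_res + states] = 1.0
# Only the first position influences the simulated log frequency.
log_freq_sim = (-3.0 + 1.5 * (states[:, 0] == 0)
                + 0.05 * data_rng.standard_normal(n_alleles))

observed, threshold, flags = significant_positions(x_sim, log_freq_sim, n_pos, n_res)
```
</preformat>
<p>Pooling the null scores over all positions gives a single threshold; a per-position threshold would be a stricter alternative when positions differ strongly in amino acid diversity.</p>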
<p>We further divided the 183 positions of exons 2 and 3 into 4 regions (the loop, helices, groove, and the Bw4 region), two exons (exon 2 and exon 3 regions) and peptide-binding/non-peptide-binding regions (PB vs NPB - see <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref> for all abbreviations and <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S5</bold>
</xref> for all groups&#x2019; positions). We computed <inline-formula>
<mml:math display="inline" id="im33">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> for each region (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1F</bold>
</xref>). The <inline-formula>
<mml:math display="inline" id="im34">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values in the Bw4 region and the PB region are the highest, and significantly different from the others for the A and B loci. No difference was detected between the other divisions (Kruskal-Wallis test and U-test p-values are shown in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S4</bold>
</xref>).</p>
<p>To validate the selection in positions outside the peptide binding domain, we compared the <inline-formula>
<mml:math display="inline" id="im35">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> based prediction to a more classical (albeit indirect) method - the ratio of non-synonymous to synonymous substitutions (<inline-formula>
<mml:math display="inline" id="im36">
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo stretchy="false">/</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) (<xref ref-type="bibr" rid="B43">43</xref>). <inline-formula>
<mml:math display="inline" id="im37">
<mml:mi>&#x3c9;</mml:mi>
</mml:math>
</inline-formula> measures selection pressures by comparing the rate of synonymous (<inline-formula>
<mml:math display="inline" id="im38">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) and non-synonymous substitutions (<inline-formula>
<mml:math display="inline" id="im39">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) at each codon. The expected ratio <inline-formula>
<mml:math display="inline" id="im40">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
<mml:mo stretchy="false">/</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is computed assuming an equal mutation rate at all positions, but different rates for substitutions between purines and pyrimidines and for substitutions within each class. If selection favors new mutations that affect the phenotype, a higher ratio is expected, and vice versa. This intuitive interpretation of <inline-formula>
<mml:math display="inline" id="im41">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
<mml:mo stretchy="false">/</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is supported by theoretical work on the relationship between the <inline-formula>
<mml:math display="inline" id="im42">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
<mml:mo stretchy="false">/</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> statistic and the underlying selection pressure in a Wright-Fisher model (<xref ref-type="bibr" rid="B44">44</xref>, <xref ref-type="bibr" rid="B45">45</xref>). We compared the <inline-formula>
<mml:math display="inline" id="im43">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> based results with the <inline-formula>
<mml:math display="inline" id="im44">
<mml:mi>&#x3c9;</mml:mi>
</mml:math>
</inline-formula> based results and obtained a similar trend, but a much clearer signal for <inline-formula>
<mml:math display="inline" id="im45">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1F</bold>
</xref>) than for <inline-formula>
<mml:math display="inline" id="im46">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo stretchy="false">/</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1G</bold>
</xref> for all loci and <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figure S1</bold>
</xref> for A, B, and C separately) in Bw4 and PB. Note that many NPB positions also have high and significant <inline-formula>
<mml:math display="inline" id="im47">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values and some PB have low <inline-formula>
<mml:math display="inline" id="im48">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values, as discussed further below. We repeated the analysis with a Kimura model (<xref ref-type="bibr" rid="B46">46</xref>) and obtained similar results (<xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figure S6</bold>
</xref>).</p>
<p>High <inline-formula>
<mml:math display="inline" id="im49">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values may simply reflect the time at which an AA appeared during HLA evolution. An AA that appeared at some position early in HLA evolution can be expected to be associated with frequent alleles. To test if high <inline-formula>
<mml:math display="inline" id="im50">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula> values only represent evolution time, the time of appearance of each amino acid in each position in the HLA phylogeny was compared with <inline-formula>
<mml:math display="inline" id="im51">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>The HLA locus is known to have undergone recombination and gene conversion events (<xref ref-type="bibr" rid="B47">47</xref>). Thus, a standard phylogeny may fail to capture the evolution of the HLA locus. To detect such events, we first built a tree of all class I alleles together (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2A</bold>
</xref>), using RDP4 (<xref ref-type="bibr" rid="B48">48</xref>). RDP4 builds phylogenies and, in parallel, detects recombination events. When the phylogeny of A, B, and C HLA alleles was computed on the same tree, a clear separation into the A, B, and C loci appeared, except for B*07:13, B*67:02, B*73:01, and B*73:02, which appeared on different branches of the tree than the other alleles in their locus (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2A</bold>
</xref>; for a higher resolution view of the tree, please refer to the following link: <ext-link ext-link-type="uri" xlink:href="https://itol.embl.de/tree/109672384336311547023541">https://itol.embl.de/tree/109672384336311547023541</ext-link>). We repeated the analysis per locus (A, B, and C) without these alleles to detect within-locus recombination events, and the affected alleles were then removed from the analysis (37 out of 7,892 alleles were found to have undergone recombination or gene conversion within exon 2 or 3; <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S2</bold>
</xref>). We then analyzed exons 2 and 3 separately to avoid between-exon recombinations (<xref ref-type="bibr" rid="B49">49</xref>) using the PHYLIP package (<xref ref-type="bibr" rid="B50">50</xref>), without the removed alleles mentioned in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S3</bold>
</xref>. Some of these events were previously reported and others are new. The phylogeny was performed at the amino acid level to be consistent with the regression analysis. Thus, amino acid conserving convergent evolution events (the same amino acid with different nucleotides (<xref ref-type="bibr" rid="B51">51</xref>)) were ignored.</p>
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>Depth. <bold>(A)</bold> A phylogenetic tree built with all the HLA class I nucleotide sequences, using RDP4 and the maximum likelihood algorithm on A, B and C loci together. Except for very rare cases, A, B and C are clustered separately. <bold>(B)</bold> The average test correlation over all populations for each of the three models (only sequence, only depth and both). The blue bars represent the A locus, the pink bars represent the B locus and the green bars represent the C locus. <bold>(C)</bold> The correlation between the weights (<inline-formula>
<mml:math display="inline" id="im52">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula>) and depth (<inline-formula>
<mml:math display="inline" id="im53">
<mml:mi>H</mml:mi>
</mml:math>
</inline-formula>) vectors over all positions and AA for each population separately. <bold>(D, E)</bold> PyMOL visualization of the side chains of the positions in the PB region <bold>(D)</bold> and in the significant NPB region <bold>(E)</bold>. <bold>(F)</bold> Heatmap of <inline-formula>
<mml:math display="inline" id="im54">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> values of all the significant positions in the NPB region. One can clearly see a dominant effect of Arginine (R). <bold>(G, H)</bold> PyMOL visualization of the MHC class I and the positions in the significant NPB region binding to TCR <bold>(G)</bold> and KIR molecules <bold>(H)</bold>. The orange color represents the positions and their side chains. <bold>(I)</bold> PyMOL visualization of the electrostatic surface potential (ESP) of MHC (left) and KIR (right). The blue color represents positive ESP and the red color represents negative ESP.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fimmu-14-1236080-g002.tif"/>
</fig>
<p>We defined <inline-formula>
<mml:math display="inline" id="im55">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to be the average depth of each AA in the phylogenetic tree of all HLA alleles (see Online Methods) at each position. We then tested whether the allele frequencies can be predicted using <inline-formula>
<mml:math display="inline" id="im56">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Three models were tested: A) <bold>Only sequence</bold> The sequence model used above. B) <bold>Only depth</bold> Prediction using only <inline-formula>
<mml:math display="inline" id="im57">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and 0 when an AA was observed fewer than 3 times in a position, as in the first model. C) <bold>Both</bold> Both values as input (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2B</bold>
</xref>). Note that the depth model contains the sequence information, since it also has 0 for AA not in the sequence, but it additionally contains the depth of each AA. Adding depth does not improve the prediction accuracy. Thus, the allele frequency is at most weakly affected by the appearance time of its amino acids. To further test for association between <inline-formula>
<mml:math display="inline" id="im58">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula>
<mml:math display="inline" id="im59">
<mml:mrow>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, we computed the correlation between the weights in the depth independent model (A) and the depth vectors: <inline-formula>
<mml:math display="inline" id="im60">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula> vs <inline-formula>
<mml:math display="inline" id="im61">
<mml:mi>H</mml:mi>
</mml:math>
</inline-formula> over all positions and AA for each population separately (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2C</bold>
</xref>). In the A and B loci, correlations are weak and around 0, while in the C locus, correlations tend to be positive and very significant (<inline-formula>
<mml:math display="inline" id="im62">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mn>0.001</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>) (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2C</bold>
</xref>). Thus, allele frequencies in the C locus are strongly associated with the appearance time of their amino acids, in contrast with the A and B loci (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2B</bold>
</xref>). The time of appearance is not generalizable to new alleles. As such it cannot be used to predict frequency. This is consistent with the lack of prediction of the C allele frequencies using <inline-formula>
<mml:math display="inline" id="im63">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula>. Note that if allele frequency were fully driven by peptide binding, one would expect no difference between A, B and C.</p>
<p>Beyond the selection induced by the Bw4 and PB domains, there are positions in the PB region with low <inline-formula>
<mml:math display="inline" id="im64">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values, and positions in neither PB nor Bw4 with high beta values. Out of the 9 insignificant positions in the PB region with <inline-formula>
<mml:math display="inline" id="im65">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
<mml:mo>&lt;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, 6 (67%) are Tyrosine (<xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S7</bold>
</xref> for all AAs of these positions). Tyrosine is known for its low evolution rate, among other reasons because of the neighboring stop codon (<xref ref-type="bibr" rid="B52">52</xref>).</p>
<p>In contrast, there are also high <inline-formula>
<mml:math display="inline" id="im66">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values in the NPB region (<xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S6</bold>
</xref>). We used PyMOL (<xref ref-type="bibr" rid="B53">53</xref>) to compute the positions of their side chains. Interestingly, all these positions are predicted to face outside of the binding cleft, toward the T-cell itself or other binding cells (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2E</bold>
</xref>), in contrast to the positions in the PB region that face the binding cleft (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2D</bold>
</xref>), suggesting a selection mediated by the direct interaction with other cells rather than the peptide.</p>
<p>There are two natural candidates for inducing this selection: T-cells and NK cells. To compare them, we used 3 TCR-MHC-I structures and 3 MHC-I KIR interactions with 3DL1, 2DL1 and 2DL2 receptors. We then computed the positions on the MHC molecule closest to the KIR or the TCR. Four of the 34 significant positions were found to directly bind the TCR (positions 65, 151, 154 and 161) (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2G</bold>
</xref> and <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figures S2A, B</bold>
</xref>). Six of the 34 were computed by PyMOL to directly bind KIR molecules (positions 151, 145 and 79 were common to all the structures, and positions 142, 75 and 83 were found in specific structures) (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2H</bold>
</xref> and <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figures S2C, D</bold>
</xref>). Most of these positions are Arginines. Some of those were found to bind two different KIR receptors. These results suggest a strong charge-mediated effect of KIR binding positions beyond the Bw4 domains, not only in B, but also in A HLA alleles. Note that the TCR variability is large. Thus, the three TCRs tested here may not represent the full variability, and the significant NPB positions that point outward may bind different TCRs.</p>
<p>To further understand the possible effect of charge on the difference in <inline-formula>
<mml:math display="inline" id="im67">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> among HLA positions with side chains toward other binding cells, we analyzed the <inline-formula>
<mml:math display="inline" id="im68">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> values of all the significant positions in the NPB region (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2F</bold>
</xref>). We performed a Chi-Square test between the sum of the <inline-formula>
<mml:math display="inline" id="im69">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values for each amino acid and the sum of the <inline-formula>
<mml:math display="inline" id="im70">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values when mixing all these values (see Online Methods). The top 4 AA are R, G, H and K, with R the most significant, suggesting again that selection is strongly associated with a positive charge. A selection for charge may simply be the result of an opposite charge on the 2DL1 binding site. Indeed, when computing the electrostatic surface potential of 2DL1 molecules in front of the positions computed to bind 3DL1 in the MHC, a clear negative charge can be observed (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2I</bold>
</xref>, and detailed view in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Figure S7</bold>
</xref>). Note that a positive charge was previously reported to be crucial in Immunoglobulin binding, especially in the context of autoimmunity (<xref ref-type="bibr" rid="B54">54</xref>, <xref ref-type="bibr" rid="B55">55</xref>). We here suggest that selection for positive charge in binding TCR and KIR may also be crucial.</p>
</sec>
<sec id="s3" sec-type="discussion">
<label>3</label>
<title>Discussion</title>
<p>Most population genetics methods use indirect measures to explain the gene diversity in present populations and the allele and genotype frequencies, and to identify selection pressures. We have here analyzed the Human Leukocyte Antigen (HLA) genes and shown that the sequence of HLA A and B alleles can be used to predict the log frequency of the corresponding allele with a linear model, where each amino acid at each position contributes a constant value to the allele log frequency. The linear model was found to be much better than the tested non-linear model, suggesting that epistatic effects are limited. Interestingly, the relation between AA sequence and frequency was only present in A and B alleles, suggesting a mechanism beyond peptide binding, which is similar in A, B, and C loci.</p>
<p>The relation between AA at a given position and the allele frequency can be explained by either selection or the time since the AA&#x2019;s first appearance in the phylogeny. An AA can be associated with a large allele frequency, either because it contributes to the fitness of the phenotype, or because it is ancient. We have previously addressed this problem through the branch imbalance following mutations (<xref ref-type="bibr" rid="B36">36</xref>). Given the very large number of alleles with measured frequencies, we could here compare directly the depth of each AA with its contribution to the allele frequency. We have shown that in the C locus, depth and contribution to size are highly correlated, but not in A and B.</p>
<p>To measure selection, we defined a novel score <inline-formula>
<mml:math display="inline" id="im71">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> for the relation between sequence and frequency, based on the sum of the regression coefficients&#x2019; absolute values. Applying this score to the MHC class I shows a clear selection in PB positions. However, there were many significant positions in the NPB region, with the strongest selection occurring at the Bw4 ligand. We computed the orientation of the AA side chains and showed that many of them bind directly to KIR even beyond the Bw4 regions. Some of the remaining positions bind directly to the TCR. We found no evidence for selection in LILR binding positions. While there are some sources in the literature of HLA positions that are reported to be bound to the LILR receptor (<xref ref-type="bibr" rid="B18">18</xref>), the current analysis was limited to exon 2 and 3, and the LILR binding region being farther away from the peptide binding cleft may affect other loci.</p>
<p>HLA allele frequencies have been argued to be mainly selected by a balancing selection for peptide-binding (<xref ref-type="bibr" rid="B56">56</xref>). However, our recent results suggest that the selection affecting the HLA region may be much more complex and dominated by a purifying selection at the haplotype level (<xref ref-type="bibr" rid="B37">37</xref>, <xref ref-type="bibr" rid="B41">41</xref>). We have here shown at the AA level that a very strong selection is induced by charge-mediated interactions between KIR and TCR and the MHC molecules. Such a selection may favor specific haplotypes in parallel with the binding peptide-induced balancing selection on alleles.</p>
<p>Multiple caveats have to be considered when analyzing these results. The most significant is the known Linkage Disequilibrium (LD) between HLA genes (<xref ref-type="bibr" rid="B57">57</xref>). Selection in the HLA locus may not be limited to single genes, but may act on full haplotypes. Thus, the frequency of an allele of one gene in a population may actually be affected by other genes. This may explain the limited accuracy of the prediction based only on each gene sequence. A combined haplotype-based score may improve the accuracy of the current predictor and will be further studied. Another important caveat is the effect of AA diversity. The current selection score is affected by the number of AA candidates in each position. We have tested different score combinations. A score that would avoid this dependence may further improve the accuracy of the selection estimation.</p>
<p>An interesting conclusion from the current study would be that some new alleles may have a higher probability of emerging in the population. To predict such alleles, one would need, beyond the current results, a model for the probability of generating new alleles from the existing ones.</p>
</sec>
<sec id="s4">
<label>4</label>
<title>Methods</title>
<sec id="s4_1">
<label>4.1</label>
<title>Data</title>
<p>For the lineage analysis, we used the HLA class I allele&#x2019;s exon 2 and exon 3 sequences from the IMGT/HLA Database (<xref ref-type="bibr" rid="B58">58</xref>). To compute the allele&#x2019;s frequencies, we used the data of 6.59 million donor HLA typing from the National Marrow Donor Program Registry (<xref ref-type="bibr" rid="B37">37</xref>, <xref ref-type="bibr" rid="B42">42</xref>, <xref ref-type="bibr" rid="B59">59</xref>). It consists of the abundances of all different HLA haplotypes in the registry. Allele frequencies were derived as marginal sums of the haplotype frequencies. For example, to compute the one-locus A frequencies for a given allele, we merged all extended A C B DRB1 DQB1 haplotypes with the appropriate A allele into an A allele frequency (<xref ref-type="bibr" rid="B42">42</xref>).</p>
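<p>The marginalization described above can be sketched as follows. This is a minimal illustration, not the registry pipeline: the function name, the tuple representation of a haplotype, and the toy frequencies are assumptions made for the example only.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def marginal_allele_frequencies(haplotype_freqs, locus_index):
    """Sum haplotype frequencies over all haplotypes that carry the same
    allele at the chosen locus (hypothetical helper, for illustration)."""
    totals = defaultdict(float)
    for haplotype, freq in haplotype_freqs.items():
        totals[haplotype[locus_index]] += freq
    return dict(totals)


# Toy A-C-B-DRB1-DQB1 haplotypes with made-up frequencies (not real data).
toy_haplotypes = {
    ("A*01:01", "C*07:01", "B*08:01", "DRB1*03:01", "DQB1*02:01"): 0.25,
    ("A*01:01", "C*07:02", "B*07:02", "DRB1*15:01", "DQB1*06:02"): 0.125,
    ("A*02:01", "C*07:02", "B*07:02", "DRB1*15:01", "DQB1*06:02"): 0.5,
}

# Locus index 0 is the A locus in this toy ordering.
a_freqs = marginal_allele_frequencies(toy_haplotypes, locus_index=0)
```
]]></preformat>
<p>In this toy example, the two haplotypes carrying A*01:01 are merged into a single A*01:01 frequency, while A*02:01 keeps the frequency of its only haplotype.</p>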
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Training test split</title>
<p>We divided the data into training and test sets using the <italic>train_test_split</italic> method from the python scikit-learn library (<xref ref-type="bibr" rid="B60">60</xref>). The first group constitutes 80% of the data and was used for training and finding the best hyperparameters. The second group constitutes 20% of the data and was used as an external test. All the results are reported on the test group.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Neural Network Intelligence</title>
<p>NNI (<xref ref-type="bibr" rid="B61">61</xref>) was used for hyperparameter tuning. For each algorithm, NNI was used in two steps for a broad hyperparameter search. First, a grid search over a wide range of parameters was performed to estimate the order of magnitude of the regularization. The second step was to refine the outcome by setting the tuner to the Tree-structured Parzen Estimator (TPE) and running another search, while considering historical measurements. We then selected the hyperparameters that produced the highest Spearman correlation on the internal validation set (our metric). The search space of the hyperparameters for the best model, SVR, is presented in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S8</bold>
</xref>.</p>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Prediction model</title>
<p>One-hot (OH) vectors were used to represent amino acid sequences in <inline-formula>
<mml:math display="inline" id="im72">
<mml:mrow>
<mml:msup>
<mml:mi>&#x211d;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. Each vector describes the AA positions of the HLA of one population. These vectors, <inline-formula>
<mml:math display="inline" id="im73">
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, are used as an input to the regression learning models, and the predicted values <inline-formula>
<mml:math display="inline" id="im74">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are each allele log frequencies in the appropriate population.</p>
<disp-formula>
<label>(2)</label>
<mml:math display="block" id="M2">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>L</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im75">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the frequency vector of the <italic>i</italic>-th population. Positions with fewer than 3 AA differing from the majority AA were ignored.</p>
<p>The OH vectors were the input to an SVR learning algorithm, for each population by itself. When the model was trained on A, B and C loci together, we added an OH vector at the end of the input sequence in order to separate the different alleles. When the model was trained on all the populations together, we added an OH vector at the end of the sequence (after the one-hot vector that separates the alleles) in order to differentiate between alleles that came from different populations.</p>
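<p>A minimal sketch of such an encoding is given below. It assumes a 20-letter amino acid alphabet and a position-major layout of the vector; the function names, the handling of unknown symbols, and the three-letter toy sequence are illustrative assumptions rather than the exact encoding used in the study.</p>
<preformat><![CDATA[
```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def one_hot_encode(sequence, alphabet=AMINO_ACIDS):
    """Return a flat 0/1 list with one block of len(alphabet) per position."""
    index = {aa: k for k, aa in enumerate(alphabet)}
    vector = [0] * (len(sequence) * len(alphabet))
    for pos, aa in enumerate(sequence):
        if aa in index:  # unknown symbols leave an all-zero block
            vector[pos * len(alphabet) + index[aa]] = 1
    return vector


def append_tag(vector, tag, tags):
    """Append a one-hot tag (for example the locus) to an encoded sequence."""
    return vector + [1 if t == tag else 0 for t in tags]


encoded = one_hot_encode("GSH")                          # toy 3-residue sequence
tagged = append_tag(encoded, "B", tags=["A", "B", "C"])  # mark the locus
```
]]></preformat>
<p>The same <italic>append_tag</italic> step could be applied a second time with a list of population labels when all populations are trained together.</p>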
<p>For each training test division, the Spearman correlations were averaged across all ten trials. The SVR (Support Vector Regression - python scikit-learn library (<xref ref-type="bibr" rid="B62">62</xref>)) gave the highest correlation on the test set. We optimized each algorithm separately using NNI (<xref ref-type="bibr" rid="B61">61</xref>) on an internal validation set. The parameters for the best algorithm, SVR (the linear and non-linear), are presented in <xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>.</p>
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>SVR models hyperparameters.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="center"/>
<th valign="top" align="center">SVR</th>
<th valign="top" align="center">Linear SVR</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="center">
<bold>Normalization</bold>
</td>
<td valign="top" align="center">z_score</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="center">
<bold>Kernel</bold>
</td>
<td valign="top" align="center">RBF</td>
<td valign="top" align="center">Linear</td>
</tr>
<tr>
<td valign="top" align="center">
<bold>C</bold>
</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.01</td>
</tr>
<tr>
<td valign="top" align="center">
<bold>Epsilon</bold>
</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">2</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The <inline-formula>
<mml:math display="inline" id="im76">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> score assigned to each position was calculated as the sum of the absolute values of the SVR coefficients attribute, which assigns a weight to each feature.</p>
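<p>The per-position summation can be sketched as follows, assuming the coefficient vector is laid out as consecutive blocks of one weight per amino acid for each position and that no further normalization is applied. The function name and the toy coefficients are hypothetical; the formal definition of the score is the one given in the Results.</p>
<preformat><![CDATA[
```python
def s_beta(coefficients, n_positions, alphabet_size):
    """Sum the absolute coefficient values over amino acids, per position.

    Assumes a flat, position-major coefficient vector (illustrative layout).
    """
    scores = []
    for pos in range(n_positions):
        block = coefficients[pos * alphabet_size:(pos + 1) * alphabet_size]
        scores.append(sum(abs(b) for b in block))
    return scores


# Toy coefficients for 2 positions and a 3-letter alphabet (made-up values).
toy_beta = [0.5, -0.25, 0.0,
            0.0, 0.0, 0.125]
scores = s_beta(toy_beta, n_positions=2, alphabet_size=3)
```
]]></preformat>
<p>With a linear-kernel model from scikit-learn, the flat coefficient vector would typically be read from the fitted model before being passed to such a function.</p>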
</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Definition of HLA positions</title>
<p>The peptide-binding and the Bw4 positions are shown in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S5</bold>
</xref>, as defined in (<xref ref-type="bibr" rid="B63">63</xref>, <xref ref-type="bibr" rid="B64">64</xref>).</p>
</sec>
<sec id="s4_6">
<label>4.6</label>
<title>Estimate of amino acid depth</title>
<p>To create the phylogenetic tree, we split our data into two exons (exon 2 - positions 1-90 and exon 3 - positions 91-183), removed the alleles mentioned in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supp. Mat. Table S3</bold>
</xref> and built phylogenetic trees for each of the exons separately using the PHYLIP package (<xref ref-type="bibr" rid="B50">50</xref>). The trees were built using a Maximum Parsimony algorithm for each locus by itself. We added a single allele from another locus as an outgroup to each such tree (A*01:02 for B and C, and B*07:03 for A). We checked that the root is indeed between the outgroup and all the alleles within this locus for all three loci. A Fitch algorithm was then used to estimate the sequence of the internal nodes in the tree.</p>
<p>Then, for each node in the tree, we calculated its level in the tree (<inline-formula>
<mml:math display="inline" id="im77">
<mml:mi>l</mml:mi>
</mml:math>
</inline-formula>) so that the level of a node is set to be the level of its child node plus 1, and the level of the leaves is set to be 0, as follows:</p>
<disp-formula>
<label>(3)</label>
<mml:math display="block" id="M3">
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Node <inline-formula>
<mml:math display="inline" id="im78">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> is the child of node <inline-formula>
<mml:math display="inline" id="im79">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>. Note that the leaves of the tree are the alleles, and the edges are the sequences composed of the amino acids. For each position in the sequence and each level, we calculated the probability of each amino acid at this level of the tree.</p>
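<p>A minimal sketch of the level assignment and of the per-level amino acid probabilities at a single position is given below. The data structures and names are hypothetical. When the children of a node lie at different levels, the sketch takes the deepest child; this is an assumption, since the recursion above does not specify this case.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def node_levels(children):
    """children: dict node -> list of child nodes (leaves may be absent)."""
    levels = {}

    def level(node):
        if node not in levels:
            kids = children.get(node, [])
            # Leaves are at level 0; otherwise one more than the deepest child (assumption).
            levels[node] = 0 if not kids else 1 + max(level(k) for k in kids)
        return levels[node]

    for node in children:
        level(node)
    return levels


def level_probabilities(levels, residue):
    """residue: dict node -> amino acid at one position.

    Returns a dict: level -> {amino acid: probability at that level}.
    """
    counts = defaultdict(Counter)
    for node, lvl in levels.items():
        counts[lvl][residue[node]] += 1
    return {
        lvl: {aa: n / sum(c.values()) for aa, n in c.items()}
        for lvl, c in counts.items()
    }


# Toy tree with hypothetical alleles a1..a3 and reconstructed internal nodes.
toy_children = {"root": ["n1", "a3"], "n1": ["a1", "a2"]}
toy_residue = {"a1": "K", "a2": "K", "a3": "R", "n1": "K", "root": "K"}
toy_levels = node_levels(toy_children)
toy_probs = level_probabilities(toy_levels, toy_residue)
```
]]></preformat>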
<p>To compare the weights vector (<inline-formula>
<mml:math display="inline" id="im80">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula>) to the phylogenetic results, we defined <inline-formula>
<mml:math display="inline" id="im81">
<mml:mi>H</mml:mi>
</mml:math>
</inline-formula> as the age matrix as follows:</p>
<disp-formula>
<label>(4)</label>
<mml:math display="block" id="M4">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mi>i</mml:mi>
<mml:mo>&#xb7;</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula>
<mml:math display="inline" id="im82">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula> is the amino acid, <inline-formula>
<mml:math display="inline" id="im83">
<mml:mrow>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>k</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the number of levels on the tree at the <inline-formula>
<mml:math display="inline" id="im84">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula>-th position and <inline-formula>
<mml:math display="inline" id="im85">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the probability of amino acid <inline-formula>
<mml:math display="inline" id="im86">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula> in level <inline-formula>
<mml:math display="inline" id="im87">
<mml:mi>i</mml:mi>
</mml:math>
</inline-formula>. We consider each value in the matrix as the depth of the <inline-formula>
<mml:math display="inline" id="im88">
<mml:mi>j</mml:mi>
</mml:math>
</inline-formula>-th amino acid at the <inline-formula>
<mml:math display="inline" id="im89">
<mml:mi>k</mml:mi>
</mml:math>
</inline-formula>-th position.</p>
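<p>The depth in Equation 4 is a probability-weighted mean level. A minimal sketch for one position is shown below. The function name and the input format (a dictionary from level to amino acid probabilities) are assumptions, as is returning NaN for an amino acid that never appears at levels 1 and above.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def depth(level_probs, amino_acid, n_levels):
    """Probability-weighted mean level of one amino acid at one position.

    level_probs -- dict: level i -> {amino acid j: p_ij}
    n_levels    -- number of levels of the tree at this position (l_k)
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(1, n_levels + 1):  # the sums in Equation 4 start at i = 1
        p = level_probs.get(i, {}).get(amino_acid, 0.0)
        numerator += i * p
        denominator += p
    # Amino acids absent from all summed levels have no defined depth.
    return numerator / denominator if denominator > 0 else math.nan


# Toy probabilities at one position (hypothetical values).
toy_level_probs = {1: {"K": 0.5, "R": 0.5}, 2: {"K": 1.0}}
depth_K = depth(toy_level_probs, "K", n_levels=2)  # (1*0.5 + 2*1.0) / 1.5
depth_R = depth(toy_level_probs, "R", n_levels=2)  # (1*0.5) / 0.5
depth_W = depth(toy_level_probs, "W", n_levels=2)  # absent amino acid
```
]]></preformat>
<p>An amino acid found mostly near the root thus receives a larger depth than one found only close to the leaves.</p>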
</sec>
<sec id="s4_7">
<label>4.7</label>
<title>Correlation between beta and depth</title>
<p>To compare the weights vector, <inline-formula>
<mml:math display="inline" id="im90">
<mml:mi>&#x3b2;</mml:mi>
</mml:math>
</inline-formula>, to the depth defined above, we compared two matrices: <inline-formula>
<mml:math display="inline" id="im91">
<mml:mrow>
<mml:msup>
<mml:mi>H</mml:mi>
<mml:mrow>
<mml:mn>20</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>183</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> representing the age rate of each amino acid in each position, and <inline-formula>
<mml:math display="inline" id="im92">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b2;</mml:mi>
<mml:mrow>
<mml:mn>20</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>183</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> representing the coefficient of each amino acid at each position (for each population). We computed the correlation between the beta and the depth values for each locus and each population (as a single flattened vector). We ignored amino acids (AA) that were absent from the data at a given position.</p>
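<p>The comparison can be sketched as follows: both matrices are flattened, entries for amino acids absent from the data are dropped, and a correlation coefficient is computed on the remaining pairs. The type of correlation is not specified above, so the use of the Pearson coefficient is an assumption of this sketch, as are the function names and the boolean presence mask.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def beta_depth_correlation(H, beta, present):
    """Correlate depth (H) and coefficients (beta) over observed entries only.

    H, beta, present -- nested lists of identical shape (amino acid x position);
    present[a][k] is truthy when amino acid a is observed at position k.
    """
    pairs = [
        (h, b)
        for h_row, b_row, p_row in zip(H, beta, present)
        for h, b, p in zip(h_row, b_row, p_row)
        if p
    ]
    depths, coefficients = zip(*pairs)
    return pearson(depths, coefficients)


# Toy 2 x 2 matrices (hypothetical values); the last entry is masked out.
toy_H = [[1.0, 2.0], [3.0, 99.0]]
toy_beta = [[2.0, 4.0], [6.0, -5.0]]
toy_present = [[True, True], [True, False]]
r = beta_depth_correlation(toy_H, toy_beta, toy_present)
```
]]></preformat>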
</sec>
<sec id="s4_8">
<label>4.8</label>
<title>
<italic>DN/DS</italic> based estimates of selection</title>
<p>We used the nucleotide sequences of all the alleles of each locus, obtained from (<xref ref-type="bibr" rid="B65">65</xref>); alleles containing non-AA codes were ignored. We separated each sequence into codons of three nucleotides and converted each codon to its corresponding amino acid. For each column (i.e., each codon), we calculated the number of mutants in the column and counted the number of mutants whose amino acid differs from the amino acid of the consensus in that column (<italic>diff_aa_mutants</italic>), where the consensus was based on the most frequent nucleotide at the same position among all loci. Then, for each column, we took the consensus codon and replaced each of its three nucleotides, in turn, with each of the three remaining nucleotides, yielding nine single-nucleotide variants. We translated these nine codons to amino acids and counted the number of amino acids that differ from the original amino acid (we divided by 9 to get a number between 0 and 1). We multiplied that number by the number of mutants in each codon to get the expected number of different amino acids. To get the real number of different amino acids for each codon, we divided the <italic>diff_aa_mutants</italic> value by the number of mutants in that codon.</p>
<p>Finally, for each codon, we calculated the ratio between the expected number of different amino acids and the real number of different amino acids and calculated the Chi-Square value by the following formula:</p>
<disp-formula>
<label>(5)</label>
<mml:math display="block" id="M5">
<mml:mrow>
<mml:mtext>Chi-Square</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>NumRealChangeAA</mml:mtext>
<mml:mo>&#x2212;</mml:mo>
<mml:mtext>NumExpectedChangeAA</mml:mtext>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:mtext>NumExpectedChangeAA</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>and extracted the corresponding p-value for each codon.</p>
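<p>The per-codon computation can be sketched as follows. Several choices here are assumptions of the sketch rather than details stated above: the standard genetic code is used, variants that become stop codons are counted as amino acid changes, the real and expected numbers are both treated as counts (with <italic>diff_aa_mutants</italic> used directly as the real count), and the p-value is taken from a Chi-Square distribution with one degree of freedom.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code; "*" marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_change_fraction(codon):
    """Fraction of the 9 single-nucleotide variants that change the amino acid."""
    original = CODON_TABLE[codon]
    changed = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            variant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[variant] != original:  # stop codons count as a change
                changed += 1
    return changed / 9


def codon_chi_square(consensus_codon, n_mutants, diff_aa_mutants):
    """Chi-Square value and p-value (1 degree of freedom, assumed) for one codon."""
    if n_mutants == 0:
        raise ValueError("codon without mutants: the expected count is zero")
    expected = expected_change_fraction(consensus_codon) * n_mutants
    chi2 = (diff_aa_mutants - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function for 1 d.o.f.
    return chi2, p_value


# Toy codon: GGT (glycine), 30 mutants, 26 of which change the amino acid.
chi2, p_value = codon_chi_square("GGT", n_mutants=30, diff_aa_mutants=26)
```
]]></preformat>
<p>For GGT, six of the nine single-nucleotide variants change the amino acid, so 20 of the 30 toy mutants are expected to differ from the consensus amino acid.</p>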
</sec>
<sec id="s4_9">
<label>4.9</label>
<title>MHC-I structures</title>
<p>We analyzed several structures: 6TDQ (<xref ref-type="bibr" rid="B66">66</xref>) for visualization of the PB and NPB regions; 1AO7 (<xref ref-type="bibr" rid="B67">67</xref>), 3WOW (<xref ref-type="bibr" rid="B68">68</xref>) and 1BD2 (<xref ref-type="bibr" rid="B69">69</xref>) for TCR-MHC-I structures; 1EFX (<xref ref-type="bibr" rid="B70">70</xref>), 1IM9 (<xref ref-type="bibr" rid="B71">71</xref>) and 5T6Z (<xref ref-type="bibr" rid="B72">72</xref>) for KIR-MHC-I structures; and 4NO0 (<xref ref-type="bibr" rid="B73">73</xref>) and 1P7Q (<xref ref-type="bibr" rid="B74">74</xref>) for LILR-MHC-I structures. For each structure, we used (<xref ref-type="bibr" rid="B75">75</xref>) to calculate the distances between the MHC positions and the TCR or KIR positions. We used PyMOL (<xref ref-type="bibr" rid="B53">53</xref>) to compute the positions of the side chains, to calculate the electrostatic surface potential (ESP), and for visualization.</p>
</sec>
<sec id="s4_10">
<label>4.10</label>
<title>Statistical test for selection</title>
<p>To test for the significant deviation of <inline-formula>
<mml:math display="inline" id="im93">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> at a given position in all loci combined from a null model, we compared the <inline-formula>
<mml:math display="inline" id="im94">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> value to the one obtained at the same position (with the same sequences) when the allele frequencies of each population were scrambled, i.e., the frequency of a given allele was assigned to a different allele, over all loci. Significant positions were defined as positions where <inline-formula>
<mml:math display="inline" id="im95">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is higher than 95% of the <inline-formula>
<mml:math display="inline" id="im96">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values in the scrambled model. Note that <inline-formula>
<mml:math display="inline" id="im97">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is defined as the sum of the absolute values of the coefficients over all populations and all AA at the given position. The null model was computed over 100 cross-validations (CV).</p>
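<p>The scrambling test can be sketched generically as below. The callable <italic>statistic</italic> is a hypothetical stand-in for the full computation of the score from the allele frequencies (which involves refitting the regression); the toy statistic, the function names and the fixed random seed are assumptions made only to keep the example self-contained.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def permutation_significant(statistic, frequencies, n_perm=100, quantile=0.95, seed=0):
    """Compare an observed statistic with its distribution under scrambled frequencies.

    statistic   -- callable mapping a list of allele frequencies to a score
    frequencies -- observed allele frequencies, one per allele
    Returns (observed score, True if it exceeds at least `quantile` of null scores).
    """
    rng = random.Random(seed)
    observed = statistic(frequencies)
    null_scores = []
    for _ in range(n_perm):
        shuffled = list(frequencies)
        rng.shuffle(shuffled)  # reassign each frequency to a different allele
        null_scores.append(statistic(shuffled))
    n_below = sum(score < observed for score in null_scores)
    return observed, n_below / n_perm >= quantile


# Toy statistic: total frequency of alleles carrying a given amino acid at a position.
carriers = [1, 1, 1] + [0] * 17


def toy_statistic(freqs):
    return sum(f * c for f, c in zip(freqs, carriers))


skewed = [0.30, 0.25, 0.20] + [0.25 / 17] * 17  # carriers hold the common alleles
uniform = [0.05] * 20                           # no signal: scrambling changes nothing
score_skewed, significant_skewed = permutation_significant(toy_statistic, skewed)
score_uniform, significant_uniform = permutation_significant(toy_statistic, uniform)
```
]]></preformat>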
<p>To test for AA that are consistently selected, based on the <inline-formula>
<mml:math display="inline" id="im98">
<mml:mrow>
<mml:mi>&#x3b2;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mtext>AA</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>Pos</mml:mtext>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> values of all the significant positions in the NPB region, we performed a Chi-Square test between the sum of the <inline-formula>
<mml:math display="inline" id="im99">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values for each amino acid and the sum of the <inline-formula>
<mml:math display="inline" id="im100">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>&#x3b2;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> values obtained after randomly mixing all these values, repeated over 100 random mixings. We computed how often each AA with a <inline-formula>
<mml:math display="inline" id="im101">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mtext>-value</mml:mtext>
<mml:mo>&lt;</mml:mo>
<mml:mn>0.05</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> appears in these cross-validations.</p>
</sec>
</sec>
<sec id="s5" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="s10">
<bold>Supplementary Material</bold>
</xref>. Further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s6" sec-type="author-contributions">
<title>Author contributions</title>
<p>YL supervised the work and wrote part of the manuscript. RL performed the analysis and wrote the manuscript. LL performed part of the analysis. All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>The author(s) declare financial support was received for the research, authorship, and/or publication of this article. The work of RL and YL was funded by ISF grant 870/20, a Vatat DSI grant and an Israel MOH grant.</p>
</sec>
<sec id="s8" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s9" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s10" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fimmu.2023.1236080/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fimmu.2023.1236080/full#supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet_1.pdf" id="SM1" mimetype="application/pdf"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Voight</surname> <given-names>BF</given-names>
</name>
<name>
<surname>Kudaravalli</surname> <given-names>S</given-names>
</name>
<name>
<surname>Wen</surname> <given-names>X</given-names>
</name>
<name>
<surname>Pritchard</surname> <given-names>JK</given-names>
</name>
</person-group>. <article-title>A map of recent positive selection in the human genome</article-title>. <source>PLoS Biol</source> (<year>2006</year>) <volume>4</volume>(<issue>3</issue>):<elocation-id>e72</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pbio.0040072</pub-id>
</citation>
</ref>
<ref id="B2">
<label>2</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Charlesworth</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Balancing selection and its effects on sequences in nearby genome regions</article-title>. <source>PLoS Genet</source> (<year>2006</year>) <volume>2</volume>(<issue>4</issue>):<elocation-id>e64</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pgen.0020064</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hedrick</surname> <given-names>PW</given-names>
</name>
<name>
<surname>Thomson</surname> <given-names>G</given-names>
</name>
</person-group>. <article-title>Evidence for balancing selection at HLA</article-title>. <source>Genetics</source> (<year>1983</year>) <volume>104</volume>(<issue>3</issue>):<page-range>449&#x2013;56</page-range>. doi: <pub-id pub-id-type="doi">10.1093/genetics/104.3.449</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cullen</surname> <given-names>M</given-names>
</name>
<name>
<surname>Perfetto</surname> <given-names>SP</given-names>
</name>
<name>
<surname>Klitz</surname> <given-names>W</given-names>
</name>
<name>
<surname>Nelson</surname> <given-names>G</given-names>
</name>
<name>
<surname>Carrington</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Complete sequence and gene map of a human major histocompatibility complex</article-title>. <source>Nature</source> (<year>1999</year>) <volume>401</volume>(<issue>6756</issue>):<page-range>921&#x2013;3</page-range>. doi: <pub-id pub-id-type="doi">10.1038/44853</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hughes</surname> <given-names>AL</given-names>
</name>
<name>
<surname>Ota</surname> <given-names>T</given-names>
</name>
<name>
<surname>Nei</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules</article-title>. <source>Mol Biol Evol</source> (<year>1990</year>) <volume>7</volume>(<issue>6</issue>):<page-range>515&#x2013;24</page-range>. doi: <pub-id pub-id-type="doi">10.1093/oxfordjournals.molbev.a040626</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Slatkin</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Joint estimation of selection intensity and mutation rate under balancing selection with applications to HLA</article-title>. <source>bioRxiv.</source> (<year>2021</year>). doi: <pub-id pub-id-type="doi">10.1101/2021.11.18.469194</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pierini</surname> <given-names>F</given-names>
</name>
<name>
<surname>Lenz</surname> <given-names>TL</given-names>
</name>
</person-group>. <article-title>Divergent allele advantage at human MHC genes: Signatures of past and ongoing selection</article-title>. <source>Mol Biol Evol</source> (<year>2018</year>) <volume>35</volume>(<issue>9</issue>):<page-range>2145&#x2013;58</page-range>. doi: <pub-id pub-id-type="doi">10.1093/molbev/msy116</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Klein</surname> <given-names>J</given-names>
</name>
<name>
<surname>Sato</surname> <given-names>A</given-names>
</name>
<name>
<surname>Nikolaidis</surname> <given-names>N</given-names>
</name>
</person-group>. <article-title>MHC, TSP, and the origin of species: From immunogenetics to evolutionary genetics</article-title>. <source>Annu Rev Genet</source> (<year>2007</year>) <volume>41</volume>:<fpage>281</fpage>&#x2013;<lpage>304</lpage>. doi: <pub-id pub-id-type="doi">10.1146/annurev.genet.41.110306.130137</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Radwan</surname> <given-names>J</given-names>
</name>
<name>
<surname>Babik</surname> <given-names>W</given-names>
</name>
<name>
<surname>Kaufman</surname> <given-names>J</given-names>
</name>
<name>
<surname>Lenz</surname> <given-names>TL</given-names>
</name>
<name>
<surname>Winternitz</surname> <given-names>J</given-names>
</name>
</person-group>. <article-title>Advances in the evolutionary understanding of MHC polymorphism</article-title>. <source>Trends Genet</source> (<year>2020</year>) <volume>36</volume>(<issue>4</issue>):<fpage>298</fpage>&#x2013;<lpage>311</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tig.2020.01.008</pub-id>
</citation>
</ref>
<ref id="B10">
<label>10</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lenz</surname> <given-names>TL</given-names>
</name>
<name>
<surname>Mueller</surname> <given-names>B</given-names>
</name>
<name>
<surname>Trillmich</surname> <given-names>F</given-names>
</name>
<name>
<surname>Wolf</surname> <given-names>JB</given-names>
</name>
</person-group>. <article-title>Divergent allele advantage at MHC-DRB through direct and maternal genotypic effects and its consequences for allele pool composition and mating</article-title>. <source>Proc R Soc B: Biol Sci</source> (<year>2013</year>) <volume>280</volume>(<issue>1762</issue>):<fpage>20130714</fpage>. doi: <pub-id pub-id-type="doi">10.1098/rspb.2013.0714</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rudolph</surname> <given-names>MG</given-names>
</name>
<name>
<surname>Stanfield</surname> <given-names>RL</given-names>
</name>
<name>
<surname>Wilson</surname> <given-names>IA</given-names>
</name>
</person-group>. <article-title>How TCRs bind MHCs, peptides, and coreceptors</article-title>. <source>Annu Rev Immunol</source> (<year>2006</year>) <volume>24</volume>:<page-range>419&#x2013;66</page-range>. doi: <pub-id pub-id-type="doi">10.1146/annurev.immunol.23.021704.115658</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marrack</surname> <given-names>P</given-names>
</name>
<name>
<surname>Scott-Browne</surname> <given-names>JP</given-names>
</name>
<name>
<surname>Dai</surname> <given-names>S</given-names>
</name>
<name>
<surname>Gapin</surname> <given-names>L</given-names>
</name>
<name>
<surname>Kappler</surname> <given-names>JW</given-names>
</name>
</person-group>. <article-title>Evolutionarily conserved amino acids that control TCR-MHC interaction</article-title>. <source>Annu Rev Immunol</source> (<year>2008</year>) <volume>26</volume>:<fpage>171</fpage>&#x2013;<lpage>203</lpage>. doi: <pub-id pub-id-type="doi">10.1146/annurev.immunol.26.021607.090421</pub-id>
</citation>
</ref>
<ref id="B13">
<label>13</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Glazer</surname> <given-names>N</given-names>
</name>
<name>
<surname>Akerman</surname> <given-names>O</given-names>
</name>
<name>
<surname>Louzoun</surname> <given-names>Y</given-names>
</name>
</person-group>. <article-title>Naive and memory T cells TCR-HLA binding prediction</article-title>. <source>Oxford Open Immunol</source> (<year>2022</year>) <volume>3</volume>. doi: <pub-id pub-id-type="doi">10.1093/oxfimm/iqac001</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wucherpfennig</surname> <given-names>KW</given-names>
</name>
<name>
<surname>Call</surname> <given-names>MJ</given-names>
</name>
<name>
<surname>Deng</surname> <given-names>L</given-names>
</name>
<name>
<surname>Mariuzza</surname> <given-names>R</given-names>
</name>
</person-group>. <article-title>Structural alterations in peptide&#x2013;MHC recognition by self-reactive T cell receptors</article-title>. <source>Curr Opin Immunol</source> (<year>2009</year>) <volume>21</volume>(<issue>6</issue>):<page-range>590&#x2013;5</page-range>. doi: <pub-id pub-id-type="doi">10.1016/j.coi.2009.07.008</pub-id>
</citation>
</ref>
<ref id="B15">
<label>15</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andoniou</surname> <given-names>CE</given-names>
</name>
<name>
<surname>Andrews</surname> <given-names>DM</given-names>
</name>
<name>
<surname>Degli-Esposti</surname> <given-names>MA</given-names>
</name>
</person-group>. <article-title>Natural killer cells in viral infection: More than just killers</article-title>. <source>Immunol Rev</source> (<year>2006</year>) <volume>214</volume>(<issue>1</issue>):<page-range>239&#x2013;50</page-range>. doi: <pub-id pub-id-type="doi">10.1111/j.1600-065X.2006.00465.x</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khakoo</surname> <given-names>SI</given-names>
</name>
<name>
<surname>Carrington</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>KIR and disease: A model system or system of models</article-title>? <source>Immunol Rev</source> (<year>2006</year>) <volume>214</volume>(<issue>1</issue>):<fpage>186</fpage>&#x2013;<lpage>201</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1600-065X.2006.00459.x</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moesta</surname> <given-names>AK</given-names>
</name>
<name>
<surname>Norman</surname> <given-names>PJ</given-names>
</name>
<name>
<surname>Yawata</surname> <given-names>M</given-names>
</name>
<name>
<surname>Yawata</surname> <given-names>N</given-names>
</name>
<name>
<surname>Gleimer</surname> <given-names>M</given-names>
</name>
<name>
<surname>Parham</surname> <given-names>P</given-names>
</name>
</person-group>. <article-title>Synergistic polymorphism at two positions distal to the ligand-binding site makes KIR2DL2 a stronger receptor for HLA-C than KIR2DL3</article-title>. <source>J Immunol</source> (<year>2008</year>) <volume>180</volume>(<issue>6</issue>):<page-range>3969&#x2013;79</page-range>. doi: <pub-id pub-id-type="doi">10.4049/jimmunol.180.6.3969</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Debebe</surname> <given-names>BJ</given-names>
</name>
<name>
<surname>Boelen</surname> <given-names>L</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>JC</given-names>
</name>
<name>
<surname>Thio</surname> <given-names>CL</given-names>
</name>
<name>
<surname>Astemborski</surname> <given-names>J</given-names>
</name>
<name>
<surname>Kirk</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>Identifying the immune interactions underlying HLA class I disease associations</article-title>. <source>Elife</source> (<year>2020</year>) <volume>9</volume>:<elocation-id>e54558</elocation-id>. doi: <pub-id pub-id-type="doi">10.7554/eLife.54558</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jones</surname> <given-names>DC</given-names>
</name>
<name>
<surname>Kosmoliaptsis</surname> <given-names>V</given-names>
</name>
<name>
<surname>Apps</surname> <given-names>R</given-names>
</name>
<name>
<surname>Lapaque</surname> <given-names>N</given-names>
</name>
<name>
<surname>Smith</surname> <given-names>I</given-names>
</name>
<name>
<surname>Kono</surname> <given-names>A</given-names>
</name>
<etal/>
</person-group>. <article-title>HLA class I allelic sequence and conformation regulate leukocyte Ig-like receptor binding</article-title>. <source>J Immunol</source> (<year>2011</year>) <volume>186</volume>(<issue>5</issue>):<page-range>2990&#x2013;7</page-range>. doi: <pub-id pub-id-type="doi">10.4049/jimmunol.1003078</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bashirova</surname> <given-names>AA</given-names>
</name>
<name>
<surname>Martin-Gayo</surname> <given-names>E</given-names>
</name>
<name>
<surname>Jones</surname> <given-names>DC</given-names>
</name>
<name>
<surname>Qi</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Apps</surname> <given-names>R</given-names>
</name>
<name>
<surname>Gao</surname> <given-names>X</given-names>
</name>
<etal/>
</person-group>. <article-title>LILRB2 interaction with HLA class I correlates with control of HIV-1 infection</article-title>. <source>PLoS Genet</source> (<year>2014</year>) <volume>10</volume>(<issue>3</issue>):<elocation-id>e1004196</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pgen.1004196</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Isildak</surname> <given-names>U</given-names>
</name>
<name>
<surname>Stella</surname> <given-names>A</given-names>
</name>
<name>
<surname>Fumagalli</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Distinguishing between recent balancing selection and incomplete sweep using deep neural networks</article-title>. <source>Mol Ecol Resour</source> (<year>2021</year>) <volume>21</volume>(<issue>8</issue>):<page-range>2706&#x2013;18</page-range>. doi: <pub-id pub-id-type="doi">10.1111/1755-0998.13379</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fumagalli</surname> <given-names>M</given-names>
</name>
<name>
<surname>Cagliani</surname> <given-names>R</given-names>
</name>
<name>
<surname>Pozzoli</surname> <given-names>U</given-names>
</name>
<name>
<surname>Riva</surname> <given-names>S</given-names>
</name>
<name>
<surname>Comi</surname> <given-names>G</given-names>
</name>
<name>
<surname>Menozzi</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>A population genetics study of the familial Mediterranean fever gene: Evidence of balancing selection under an overdominance regime</article-title>. <source>Genes Immun</source> (<year>2009</year>) <volume>10</volume>(<issue>8</issue>):<page-range>678&#x2013;86</page-range>. doi: <pub-id pub-id-type="doi">10.1038/gene.2009.59</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cagliani</surname> <given-names>R</given-names>
</name>
<name>
<surname>Fumagalli</surname> <given-names>M</given-names>
</name>
<name>
<surname>Riva</surname> <given-names>S</given-names>
</name>
<name>
<surname>Pozzoli</surname> <given-names>U</given-names>
</name>
<name>
<surname>Fracassetti</surname> <given-names>M</given-names>
</name>
<name>
<surname>Bresolin</surname> <given-names>N</given-names>
</name>
<etal/>
</person-group>. <article-title>Polymorphisms in the CPB2 gene are maintained by balancing selection and result in haplotype-preferential splicing of exon 7</article-title>. <source>Mol Biol Evol</source> (<year>2010</year>) <volume>27</volume>(<issue>8</issue>):<page-range>1945&#x2013;54</page-range>. doi: <pub-id pub-id-type="doi">10.1093/molbev/msq082</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soni</surname> <given-names>V</given-names>
</name>
<name>
<surname>Vos</surname> <given-names>M</given-names>
</name>
<name>
<surname>Eyre-Walker</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>A new test suggests that balancing selection maintains hundreds of non-synonymous polymorphisms in the human genome</article-title>. <source>bioRxiv.</source> (<year>2021</year>). doi: <pub-id pub-id-type="doi">10.1101/2021.02.08.430226</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andr&#xe9;s</surname> <given-names>AM</given-names>
</name>
<name>
<surname>Hubisz</surname> <given-names>MJ</given-names>
</name>
<name>
<surname>Indap</surname> <given-names>A</given-names>
</name>
<name>
<surname>Torgerson</surname> <given-names>DG</given-names>
</name>
<name>
<surname>Degenhardt</surname> <given-names>JD</given-names>
</name>
<name>
<surname>Boyko</surname> <given-names>AR</given-names>
</name>
<etal/>
</person-group>. <article-title>Targets of balancing selection in the human genome</article-title>. <source>Mol Biol Evol</source> (<year>2009</year>) <volume>26</volume>(<issue>12</issue>):<page-range>2755&#x2013;64</page-range>. doi: <pub-id pub-id-type="doi">10.1093/molbev/msp190</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Siewert</surname> <given-names>KM</given-names>
</name>
<name>
<surname>Voight</surname> <given-names>BF</given-names>
</name>
</person-group>. <article-title>Detecting long-term balancing selection using allele frequency correlation</article-title>. <source>Mol Biol Evol</source> (<year>2017</year>) <volume>34</volume>(<issue>11</issue>):<fpage>2996</fpage>&#x2013;<lpage>3005</lpage>. doi: <pub-id pub-id-type="doi">10.1093/molbev/msx209</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bitarello</surname> <given-names>BD</given-names>
</name>
<name>
<surname>De Filippo</surname> <given-names>C</given-names>
</name>
<name>
<surname>Teixeira</surname> <given-names>JC</given-names>
</name>
<name>
<surname>Schmidt</surname> <given-names>JM</given-names>
</name>
<name>
<surname>Kleinert</surname> <given-names>P</given-names>
</name>
<name>
<surname>Meyer</surname> <given-names>D</given-names>
</name>
<etal/>
</person-group>. <article-title>Signatures of long-term balancing selection in human genomes</article-title>. <source>Genome Biol Evol</source> (<year>2018</year>) <volume>10</volume>(<issue>3</issue>):<page-range>939&#x2013;55</page-range>. doi: <pub-id pub-id-type="doi">10.1093/gbe/evy054</pub-id>
</citation>
</ref>
<ref id="B28">
<label>28</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cagliani</surname> <given-names>R</given-names>
</name>
<name>
<surname>Fumagalli</surname> <given-names>M</given-names>
</name>
<name>
<surname>Riva</surname> <given-names>S</given-names>
</name>
<name>
<surname>Pozzoli</surname> <given-names>U</given-names>
</name>
<name>
<surname>Comi</surname> <given-names>GP</given-names>
</name>
<name>
<surname>Menozzi</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>The signature of long-standing balancing selection at the human defensin <italic>&#x3b2;</italic>-1 promoter</article-title>. <source>Genome Biol</source> (<year>2008</year>) <volume>9</volume>(<issue>9</issue>):<fpage>1</fpage>&#x2013;<lpage>11</lpage>. doi: <pub-id pub-id-type="doi">10.1186/gb-2008-9-9-r143</pub-id>
</citation>
</ref>
<ref id="B29">
<label>29</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leffler</surname> <given-names>EM</given-names>
</name>
<name>
<surname>Gao</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Pfeifer</surname> <given-names>S</given-names>
</name>
<name>
<surname>S&#xe9;gurel</surname> <given-names>L</given-names>
</name>
<name>
<surname>Auton</surname> <given-names>A</given-names>
</name>
<name>
<surname>Venn</surname> <given-names>O</given-names>
</name>
<etal/>
</person-group>. <article-title>Multiple instances of ancient balancing selection shared between humans and chimpanzees</article-title>. <source>Science</source> (<year>2013</year>) <volume>339</volume>(<issue>6127</issue>):<page-range>1578&#x2013;82</page-range>. doi: <pub-id pub-id-type="doi">10.1126/science.1234070</pub-id>
</citation>
</ref>
<ref id="B30">
<label>30</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Teixeira</surname> <given-names>JC</given-names>
</name>
<name>
<surname>De Filippo</surname> <given-names>C</given-names>
</name>
<name>
<surname>Weihmann</surname> <given-names>A</given-names>
</name>
<name>
<surname>Meneu</surname> <given-names>JR</given-names>
</name>
<name>
<surname>Racimo</surname> <given-names>F</given-names>
</name>
<name>
<surname>Dannemann</surname> <given-names>M</given-names>
</name>
<etal/>
</person-group>. <article-title>Long-term balancing selection in LAD1 maintains a missense trans-species polymorphism in humans, chimpanzees, and bonobos</article-title>. <source>Mol Biol Evol</source> (<year>2015</year>) <volume>32</volume>(<issue>5</issue>):<page-range>1186&#x2013;96</page-range>. doi: <pub-id pub-id-type="doi">10.1093/molbev/msv007</pub-id>
</citation>
</ref>
<ref id="B31">
<label>31</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>DeGiorgio</surname> <given-names>M</given-names>
</name>
<name>
<surname>Lohmueller</surname> <given-names>KE</given-names>
</name>
<name>
<surname>Nielsen</surname> <given-names>R</given-names>
</name>
</person-group>. <article-title>A model-based approach for identifying signatures of ancient balancing selection in genetic data</article-title>. <source>PloS Genet</source> (<year>2014</year>) <volume>10</volume>(<issue>8</issue>):<elocation-id>e1004561</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pgen.1004561</pub-id>
</citation>
</ref>
<ref id="B32">
<label>32</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheng</surname> <given-names>X</given-names>
</name>
<name>
<surname>DeGiorgio</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection</article-title>. <source>Mol Biol Evol</source> (<year>2020</year>) <volume>37</volume>(<issue>11</issue>):<page-range>3267&#x2013;91</page-range>. doi: <pub-id pub-id-type="doi">10.1093/molbev/msaa134</pub-id>
</citation>
</ref>
<ref id="B33">
<label>33</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fumagalli</surname> <given-names>M</given-names>
</name>
<name>
<surname>Cagliani</surname> <given-names>R</given-names>
</name>
<name>
<surname>Pozzoli</surname> <given-names>U</given-names>
</name>
<name>
<surname>Riva</surname> <given-names>S</given-names>
</name>
<name>
<surname>Comi</surname> <given-names>GP</given-names>
</name>
<name>
<surname>Menozzi</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>Widespread balancing selection and pathogen-driven selection at blood group antigen genes</article-title>. <source>Genome Res</source> (<year>2009</year>) <volume>19</volume>(<issue>2</issue>):<fpage>199</fpage>&#x2013;<lpage>212</lpage>. doi: <pub-id pub-id-type="doi">10.1101/gr.082768.108</pub-id>
</citation>
</ref>
<ref id="B34">
<label>34</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hughes</surname> <given-names>AL</given-names>
</name>
<name>
<surname>Yeager</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Natural selection at major histocompatibility complex loci of vertebrates</article-title>. <source>Annu Rev Genet</source> (<year>1998</year>) <volume>32</volume>(<issue>1</issue>):<page-range>415&#x2013;35</page-range>. doi: <pub-id pub-id-type="doi">10.1146/annurev.genet.32.1.415</pub-id>
</citation>
</ref>
<ref id="B35">
<label>35</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ronen</surname> <given-names>R</given-names>
</name>
<name>
<surname>Udpa</surname> <given-names>N</given-names>
</name>
<name>
<surname>Halperin</surname> <given-names>E</given-names>
</name>
<name>
<surname>Bafna</surname> <given-names>V</given-names>
</name>
</person-group>. <article-title>Learning natural selection from the site frequency spectrum</article-title>. <source>Genetics</source> (<year>2013</year>) <volume>195</volume>(<issue>1</issue>):<page-range>181&#x2013;93</page-range>. doi: <pub-id pub-id-type="doi">10.1534/genetics.113.152587</pub-id>
</citation>
</ref>
<ref id="B36">
<label>36</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liberman</surname> <given-names>G</given-names>
</name>
<name>
<surname>Benichou</surname> <given-names>JI</given-names>
</name>
<name>
<surname>Maman</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Glanville</surname> <given-names>J</given-names>
</name>
<name>
<surname>Alter</surname> <given-names>I</given-names>
</name>
<name>
<surname>Louzoun</surname> <given-names>Y</given-names>
</name>
</person-group>. <article-title>Estimate of within population incremental selection through branch imbalance in lineage trees</article-title>. <source>Nucleic Acids Res</source> (<year>2016</year>) <volume>44</volume>(<issue>5</issue>):<elocation-id>e46</elocation-id>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkv1198</pub-id>
</citation>
</ref>
<ref id="B37">
<label>37</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alter</surname> <given-names>I</given-names>
</name>
<name>
<surname>Gragert</surname> <given-names>L</given-names>
</name>
<name>
<surname>Fingerson</surname> <given-names>S</given-names>
</name>
<name>
<surname>Maiers</surname> <given-names>M</given-names>
</name>
<name>
<surname>Louzoun</surname> <given-names>Y</given-names>
</name>
</person-group>. <article-title>HLA class I haplotype diversity is consistent with selection for frequent existing haplotypes</article-title>. <source>PloS Comput Biol</source> (<year>2017</year>) <volume>13</volume>(<issue>8</issue>):<elocation-id>e1005693</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005693</pub-id>
</citation>
</ref>
<ref id="B38">
<label>38</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sawai</surname> <given-names>H</given-names>
</name>
<name>
<surname>Nishida</surname> <given-names>N</given-names>
</name>
<name>
<surname>Khor</surname> <given-names>S-S</given-names>
</name>
<name>
<surname>Honda</surname> <given-names>M</given-names>
</name>
<name>
<surname>Sugiyama</surname> <given-names>M</given-names>
</name>
<name>
<surname>Baba</surname> <given-names>N</given-names>
</name>
<etal/>
</person-group>. <article-title>Genome-wide association study identified new susceptible genetic variants in HLA class I region for hepatitis B virus-related hepatocellular carcinoma</article-title>. <source>Sci Rep</source> (<year>2018</year>) <volume>8</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>8</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-018-26217-7</pub-id>
</citation>
</ref>
<ref id="B39">
<label>39</label>
<citation citation-type="web">
<person-group person-group-type="author">
<collab>Total number of donors and cord blood units</collab>
</person-group>. (<year>2022</year>). Available at: <uri xlink:href="https://statistics.wmda.info/">https://statistics.wmda.info/</uri>.</citation>
</ref>
<ref id="B40">
<label>40</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Slater</surname> <given-names>N</given-names>
</name>
<name>
<surname>Louzoun</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Gragert</surname> <given-names>L</given-names>
</name>
<name>
<surname>Maiers</surname> <given-names>M</given-names>
</name>
<name>
<surname>Chatterjee</surname> <given-names>A</given-names>
</name>
<name>
<surname>Albrecht</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Power laws for heavy-tailed distributions: Modeling allele and haplotype diversity for the national marrow donor program</article-title>. <source>PloS Comput Biol</source> (<year>2015</year>) <volume>11</volume>(<issue>4</issue>):<elocation-id>e1004204</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004204</pub-id>
</citation>
</ref>
<ref id="B41">
<label>41</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lobkovsky</surname> <given-names>AE</given-names>
</name>
<name>
<surname>Levi</surname> <given-names>L</given-names>
</name>
<name>
<surname>Wolf</surname> <given-names>YI</given-names>
</name>
<name>
<surname>Maiers</surname> <given-names>M</given-names>
</name>
<name>
<surname>Gragert</surname> <given-names>L</given-names>
</name>
<name>
<surname>Alter</surname> <given-names>I</given-names>
</name>
<etal/>
</person-group>. <article-title>Multiplicative fitness, rapid haplotype discovery, and fitness decay explain evolution of human MHC</article-title>. <source>Proc Natl Acad Sci</source> (<year>2019</year>) <volume>116</volume>(<issue>28</issue>):<page-range>14098&#x2013;104</page-range>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1714436116</pub-id>
</citation>
</ref>
<ref id="B42">
<label>42</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gragert</surname> <given-names>L</given-names>
</name>
<name>
<surname>Madbouly</surname> <given-names>A</given-names>
</name>
<name>
<surname>Freeman</surname> <given-names>J</given-names>
</name>
<name>
<surname>Maiers</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Six-locus high resolution HLA haplotype frequencies derived from mixed-resolution DNA typing for the entire US donor registry</article-title>. <source>Hum Immunol</source> (<year>2013</year>) <volume>74</volume>(<issue>10</issue>):<page-range>1313&#x2013;20</page-range>. doi: <pub-id pub-id-type="doi">10.1016/j.humimm.2013.06.025</pub-id>
</citation>
</ref>
<ref id="B43">
<label>43</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kryazhimskiy</surname> <given-names>S</given-names>
</name>
<name>
<surname>Plotkin</surname> <given-names>JB</given-names>
</name>
</person-group>. <article-title>The population genetics of dN/dS</article-title>. <source>PloS Genet</source> (<year>2008</year>) <volume>4</volume>(<issue>12</issue>):<elocation-id>e1000304</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pgen.1000304</pub-id>
</citation>
</ref>
<ref id="B44">
<label>44</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Fisher</surname> <given-names>RA</given-names>
</name>
</person-group>. <source>The genetical theory of natural selection</source>. <publisher-loc>Oxford UK</publisher-loc>:<publisher-name>Dover Pubns</publisher-name> (<year>1958</year>).</citation>
</ref>
<ref id="B45">
<label>45</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wright</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Evolution in Mendelian populations</article-title>. <source>Genetics</source> (<year>1931</year>) <volume>16</volume>(<issue>2</issue>):<fpage>97</fpage>.
</citation>
</ref>
<ref id="B46">
<label>46</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nachman</surname> <given-names>MW</given-names>
</name>
<name>
<surname>Crowell</surname> <given-names>SL</given-names>
</name>
</person-group>. <article-title>Estimate of the mutation rate per nucleotide in humans</article-title>. <source>Genetics</source> (<year>2000</year>) <volume>156</volume>(<issue>1</issue>):<fpage>297</fpage>&#x2013;<lpage>304</lpage>. doi: <pub-id pub-id-type="doi">10.1093/genetics/156.1.297</pub-id>
</citation>
</ref>
<ref id="B47">
<label>47</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carrington</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Recombination within the human MHC</article-title>. <source>Immunol Rev</source> (<year>1999</year>) <volume>167</volume>(<issue>1</issue>):<page-range>245&#x2013;56</page-range>. doi: <pub-id pub-id-type="doi">10.1111/j.1600-065X.1999.tb01397.x</pub-id>
</citation>
</ref>
<ref id="B48">
<label>48</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martin</surname> <given-names>DP</given-names>
</name>
<name>
<surname>Murrell</surname> <given-names>B</given-names>
</name>
<name>
<surname>Golden</surname> <given-names>M</given-names>
</name>
<name>
<surname>Khoosal</surname> <given-names>A</given-names>
</name>
<name>
<surname>Muhire</surname> <given-names>B</given-names>
</name>
</person-group>. <article-title>RDP4: Detection and analysis of recombination patterns in virus genomes</article-title>. <source>Virus Evol</source> (<year>2015</year>) <volume>1</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi: <pub-id pub-id-type="doi">10.1093/ve/vev003</pub-id>
</citation>
</ref>
<ref id="B49">
<label>49</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boegel</surname> <given-names>S</given-names>
</name>
<name>
<surname>L&#xf6;wer</surname> <given-names>M</given-names>
</name>
<name>
<surname>Sch&#xe4;fer</surname> <given-names>M</given-names>
</name>
<name>
<surname>Bukur</surname> <given-names>T</given-names>
</name>
<name>
<surname>De Graaf</surname> <given-names>J</given-names>
</name>
<name>
<surname>Boisgu&#xe9;rin</surname> <given-names>V</given-names>
</name>
<etal/>
</person-group>. <article-title>HLA typing from RNA-Seq sequence reads</article-title>. <source>Genome Med</source> (<year>2013</year>) <volume>4</volume>(<issue>12</issue>):<fpage>1</fpage>&#x2013;<lpage>12</lpage>. doi: <pub-id pub-id-type="doi">10.1186/gm403</pub-id>
</citation>
</ref>
<ref id="B50">
<label>50</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Felsenstein</surname> <given-names>J</given-names>
</name>
</person-group>. <source>Phylip</source> (<year>2021</year>). Available at: <uri xlink:href="https://evolution.genetics.washington.edu/phylip.html">https://evolution.genetics.washington.edu/phylip.html</uri>.</citation>
</ref>
<ref id="B51">
<label>51</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Titus-Trachtenberg</surname> <given-names>E</given-names>
</name>
<name>
<surname>Rickards</surname> <given-names>O</given-names>
</name>
<name>
<surname>De Stefano</surname> <given-names>G</given-names>
</name>
<name>
<surname>Erlich</surname> <given-names>H</given-names>
</name>
</person-group>. <article-title>Analysis of HLA class II haplotypes in the Cayapa Indians of Ecuador: A novel DRB1 allele reveals evidence for convergent evolution and balancing selection at position 86</article-title>. <source>Am J Hum Genet</source> (<year>1994</year>) <volume>55</volume>(<issue>1</issue>):<fpage>160</fpage>.</citation>
</ref>
<ref id="B52">
<label>52</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Creixell</surname> <given-names>P</given-names>
</name>
<name>
<surname>Schoof</surname> <given-names>EM</given-names>
</name>
<name>
<surname>Tan</surname> <given-names>CSH</given-names>
</name>
<name>
<surname>Linding</surname> <given-names>R</given-names>
</name>
</person-group>. <article-title>Mutational properties of amino acid residues: Implications for evolvability of phosphorylatable residues</article-title>. <source>Philos Trans R Soc B: Biol Sci</source> (<year>2012</year>) <volume>367</volume>(<issue>1602</issue>):<page-range>2584&#x2013;93</page-range>. doi: <pub-id pub-id-type="doi">10.1098/rstb.2012.0076</pub-id>
</citation>
</ref>
<ref id="B53">
<label>53</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>DeLano</surname> <given-names>WL</given-names>
</name>
</person-group>. <source>PyMOL</source> (<year>2022</year>). Available at: <uri xlink:href="https://pymol.org/2/">https://pymol.org/2/</uri>.</citation>
</ref>
<ref id="B54">
<label>54</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Louzoun</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Weigert</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Editing anti-DNA B cells by V &#x3bb;x</article-title>. <source>J Exp Med</source> (<year>2004</year>) <volume>199</volume>(<issue>3</issue>):<page-range>337&#x2013;46</page-range>. doi: <pub-id pub-id-type="doi">10.1084/jem.20031712</pub-id>
</citation>
</ref>
<ref id="B55">
<label>55</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname> <given-names>C</given-names>
</name>
<name>
<surname>Li</surname> <given-names>H</given-names>
</name>
<name>
<surname>Tian</surname> <given-names>Q</given-names>
</name>
<name>
<surname>Beardall</surname> <given-names>M</given-names>
</name>
<name>
<surname>Xu</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Casanova</surname> <given-names>N</given-names>
</name>
<etal/>
</person-group>. <article-title>Selection of anti-double-stranded DNA B cells in autoimmune MRL-lpr/lpr mice</article-title>. <source>J Immunol</source> (<year>2006</year>) <volume>176</volume>(<issue>9</issue>):<page-range>5183&#x2013;90</page-range>. doi: <pub-id pub-id-type="doi">10.4049/jimmunol.176.9.5183</pub-id>
</citation>
</ref>
<ref id="B56">
<label>56</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hedrick</surname> <given-names>PW</given-names>
</name>
</person-group>. <article-title>Balancing selection and MHC</article-title>. <source>Genetica</source> (<year>1998</year>) <volume>104</volume>(<issue>3</issue>):<page-range>207&#x2013;14</page-range>. doi: <pub-id pub-id-type="doi">10.1023/A:1026494212540</pub-id>
</citation>
</ref>
<ref id="B57">
<label>57</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname> <given-names>D</given-names>
</name>
<name>
<surname>Single</surname> <given-names>RM</given-names>
</name>
<name>
<surname>Mack</surname> <given-names>SJ</given-names>
</name>
<name>
<surname>Erlich</surname> <given-names>HA</given-names>
</name>
<name>
<surname>Thomson</surname> <given-names>G</given-names>
</name>
</person-group>. <article-title>Signatures of demographic history and natural selection in the human major histocompatibility complex loci</article-title>. <source>Genetics</source> (<year>2006</year>) <volume>173</volume>(<issue>4</issue>):<page-range>2121&#x2013;42</page-range>. doi: <pub-id pub-id-type="doi">10.1534/genetics.105.052837</pub-id>
</citation>
</ref>
<ref id="B58">
<label>58</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robinson</surname> <given-names>J</given-names>
</name>
<name>
<surname>Halliwell</surname> <given-names>JA</given-names>
</name>
<name>
<surname>McWilliam</surname> <given-names>H</given-names>
</name>
<name>
<surname>Lopez</surname> <given-names>R</given-names>
</name>
<name>
<surname>Parham</surname> <given-names>P</given-names>
</name>
<name>
<surname>Marsh</surname> <given-names>SG</given-names>
</name>
</person-group>. <article-title>The IMGT/HLA database</article-title>. <source>Nucleic Acids Res</source> (<year>2012</year>) <volume>41</volume>(<issue>D1</issue>):<page-range>D1222&#x2013;7</page-range>. doi: <pub-id pub-id-type="doi">10.1093/nar/gks949</pub-id>
</citation>
</ref>
<ref id="B59">
<label>59</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Israeli</surname> <given-names>S</given-names>
</name>
<name>
<surname>Gragert</surname> <given-names>L</given-names>
</name>
<name>
<surname>Maiers</surname> <given-names>M</given-names>
</name>
<name>
<surname>Louzoun</surname> <given-names>Y</given-names>
</name>
</person-group>. <article-title>HLA haplotype frequency estimation for heterogeneous populations using a graph-based imputation algorithm</article-title>. <source>Hum Immunol</source> (<year>2021</year>) <volume>82</volume>(<issue>10</issue>):<page-range>746&#x2013;57</page-range>. doi: <pub-id pub-id-type="doi">10.1016/j.humimm.2021.07.001</pub-id>
</citation>
</ref>
<ref id="B60">
<label>60</label>
<citation citation-type="web">
<person-group person-group-type="author">
<collab>Train test split</collab>
</person-group>. (<year>2022</year>). Available at: <uri xlink:href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html">https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html</uri>.</citation>
</ref>
<ref id="B61">
<label>61</label>
<citation citation-type="web">
<person-group person-group-type="author">
<collab>Neural Network Intelligence</collab>
</person-group>. (<year>2021</year>). Available at: <uri xlink:href="https://nni.readthedocs.io/en/stable/">https://nni.readthedocs.io/en/stable/</uri>.</citation>
</ref>
<ref id="B62">
<label>62</label>
<citation citation-type="web">
<person-group person-group-type="author">
<collab>Support vector regression</collab>
</person-group>. (<year>2022</year>). Available at: <uri xlink:href="https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html">https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html</uri>.</citation>
</ref>
<ref id="B63">
<label>63</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nielsen</surname> <given-names>M</given-names>
</name>
<name>
<surname>Lundegaard</surname> <given-names>C</given-names>
</name>
<name>
<surname>Blicher</surname> <given-names>T</given-names>
</name>
<name>
<surname>Lamberth</surname> <given-names>K</given-names>
</name>
<name>
<surname>Harndahl</surname> <given-names>M</given-names>
</name>
<name>
<surname>Justesen</surname> <given-names>S</given-names>
</name>
<etal/>
</person-group>. <article-title>NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence</article-title>. <source>PloS One</source> (<year>2007</year>) <volume>2</volume>(<issue>8</issue>):<elocation-id>e796</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0000796</pub-id>
</citation>
</ref>
<ref id="B64">
<label>64</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gagne</surname> <given-names>K</given-names>
</name>
<name>
<surname>Busson</surname> <given-names>M</given-names>
</name>
<name>
<surname>Bignon</surname> <given-names>J-D</given-names>
</name>
<name>
<surname>Bal&#xe8;re-Appert</surname> <given-names>M-L</given-names>
</name>
<name>
<surname>Loiseau</surname> <given-names>P</given-names>
</name>
<name>
<surname>Dormoy</surname> <given-names>A</given-names>
</name>
<etal/>
</person-group>. <article-title>Donor KIR3DL1/3DS1 gene and recipient Bw4 KIR ligand as prognostic markers for outcome in unrelated hematopoietic stem cell transplantation</article-title>. <source>Biol Blood Marrow Transplant</source> (<year>2009</year>) <volume>15</volume>(<issue>11</issue>):<page-range>1366&#x2013;75</page-range>. doi: <pub-id pub-id-type="doi">10.1016/j.bbmt.2009.06.015</pub-id>
</citation>
</ref>
<ref id="B65">
<label>65</label>
<citation citation-type="web">
<person-group person-group-type="author">
<collab>IPD-IMGT/HLA database</collab>
</person-group>. (<year>2022</year>). Available at: <uri xlink:href="http://hla.alleles.org/alleles/index.html">http://hla.alleles.org/alleles/index.html</uri>.</citation>
</ref>
<ref id="B66">
<label>66</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Anjanappa</surname> <given-names>R</given-names>
</name>
<name>
<surname>Garcia-Alai</surname> <given-names>M</given-names>
</name>
<name>
<surname>Kopicki</surname> <given-names>J-D</given-names>
</name>
<name>
<surname>Lockhauserb&#xe4;umer</surname> <given-names>J</given-names>
</name>
<name>
<surname>Aboelmagd</surname> <given-names>M</given-names>
</name>
<name>
<surname>Hinrichs</surname> <given-names>J</given-names>
</name>
<etal/>
</person-group>. <article-title>Structures of peptide-free and partially loaded MHC class I molecules reveal mechanisms of peptide selection</article-title>. <source>Nat Commun</source> (<year>2020</year>) <volume>11</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>11</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41467-020-14862-4</pub-id>
</citation>
</ref>
<ref id="B67">
<label>67</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garboczi</surname> <given-names>DN</given-names>
</name>
<name>
<surname>Ghosh</surname> <given-names>P</given-names>
</name>
<name>
<surname>Utz</surname> <given-names>U</given-names>
</name>
<name>
<surname>Fan</surname> <given-names>QR</given-names>
</name>
<name>
<surname>Biddison</surname> <given-names>WE</given-names>
</name>
<name>
<surname>Wiley</surname> <given-names>DC</given-names>
</name>
</person-group>. <article-title>Structure of the complex between human T-cell receptor, viral peptide and HLA-A2</article-title>. <source>Nature</source> (<year>1996</year>) <volume>384</volume>(<issue>6605</issue>):<page-range>134&#x2013;41</page-range>. doi: <pub-id pub-id-type="doi">10.1038/384134a0</pub-id>
</citation>
</ref>
<ref id="B68">
<label>68</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shimizu</surname> <given-names>A</given-names>
</name>
<name>
<surname>Kawana-Tachikawa</surname> <given-names>A</given-names>
</name>
<name>
<surname>Yamagata</surname> <given-names>A</given-names>
</name>
<name>
<surname>Han</surname> <given-names>C</given-names>
</name>
<name>
<surname>Zhu</surname> <given-names>D</given-names>
</name>
<name>
<surname>Sato</surname> <given-names>Y</given-names>
</name>
<etal/>
</person-group>. <article-title>Structure of TCR and antigen complexes at an immunodominant CTL epitope in HIV-1 infection</article-title>. <source>Sci Rep</source> (<year>2013</year>) <volume>3</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>9</lpage>. doi: <pub-id pub-id-type="doi">10.1038/srep03097</pub-id>
</citation>
</ref>
<ref id="B69">
<label>69</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname> <given-names>Y-H</given-names>
</name>
<name>
<surname>Smith</surname> <given-names>KJ</given-names>
</name>
<name>
<surname>Garboczi</surname> <given-names>DN</given-names>
</name>
<name>
<surname>Utz</surname> <given-names>U</given-names>
</name>
<name>
<surname>Biddison</surname> <given-names>WE</given-names>
</name>
<name>
<surname>Wiley</surname> <given-names>DC</given-names>
</name>
</person-group>. <article-title>Two human T cell receptors bind in a similar diagonal mode to the HLA-A2/Tax peptide complex using different TCR amino acids</article-title>. <source>Immunity</source> (<year>1998</year>) <volume>8</volume>(<issue>4</issue>):<page-range>403&#x2013;11</page-range>. doi: <pub-id pub-id-type="doi">10.1016/S1074-7613(00)80546-4</pub-id>
</citation>
</ref>
<ref id="B70">
<label>70</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boyington</surname> <given-names>JC</given-names>
</name>
<name>
<surname>Motyka</surname> <given-names>SA</given-names>
</name>
<name>
<surname>Schuck</surname> <given-names>P</given-names>
</name>
<name>
<surname>Brooks</surname> <given-names>AG</given-names>
</name>
<name>
<surname>Sun</surname> <given-names>PD</given-names>
</name>
</person-group>. <article-title>Crystal structure of an NK cell immunoglobulin-like receptor in complex with its class I MHC ligand</article-title>. <source>Nature</source> (<year>2000</year>) <volume>405</volume>(<issue>6786</issue>):<page-range>537&#x2013;43</page-range>. doi: <pub-id pub-id-type="doi">10.1038/35014520</pub-id>
</citation>
</ref>
<ref id="B71">
<label>71</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fan</surname> <given-names>QR</given-names>
</name>
<name>
<surname>Long</surname> <given-names>EO</given-names>
</name>
<name>
<surname>Wiley</surname> <given-names>DC</given-names>
</name>
</person-group>. <article-title>Crystal structure of the human natural killer cell inhibitory receptor KIR2DL1&#x2013;HLA-Cw4 complex</article-title>. <source>Nat Immunol</source> (<year>2001</year>) <volume>2</volume>(<issue>5</issue>):<page-range>452&#x2013;60</page-range>. doi: <pub-id pub-id-type="doi">10.1038/87766</pub-id>
</citation>
</ref>
<ref id="B72">
<label>72</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pymm</surname> <given-names>P</given-names>
</name>
<name>
<surname>Illing</surname> <given-names>PT</given-names>
</name>
<name>
<surname>Ramarathinam</surname> <given-names>SH</given-names>
</name>
<name>
<surname>O&#x2019;Connor</surname> <given-names>GM</given-names>
</name>
<name>
<surname>Hughes</surname> <given-names>VA</given-names>
</name>
<name>
<surname>Hitchen</surname> <given-names>C</given-names>
</name>
<etal/>
</person-group>. <article-title>MHC-I peptides get out of the groove and enable a novel mechanism of HIV-1 escape</article-title>. <source>Nat Struct Mol Biol</source> (<year>2017</year>) <volume>24</volume>(<issue>4</issue>):<page-range>387&#x2013;94</page-range>. doi: <pub-id pub-id-type="doi">10.1038/nsmb.3381</pub-id>
</citation>
</ref>
<ref id="B73">
<label>73</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mohammed</surname> <given-names>F</given-names>
</name>
<name>
<surname>Stones</surname> <given-names>DH</given-names>
</name>
<name>
<surname>Zarling</surname> <given-names>AL</given-names>
</name>
<name>
<surname>Willcox</surname> <given-names>CR</given-names>
</name>
<name>
<surname>Shabanowitz</surname> <given-names>J</given-names>
</name>
<name>
<surname>Cummings</surname> <given-names>KL</given-names>
</name>
<etal/>
</person-group>. <article-title>The antigenic identity of human class I MHC phosphopeptides is critically dependent upon phosphorylation status</article-title>. <source>Oncotarget</source> (<year>2017</year>) <volume>8</volume>(<issue>33</issue>):<fpage>54160</fpage>. doi: <pub-id pub-id-type="doi">10.18632/oncotarget.16952</pub-id>
</citation>
</ref>
<ref id="B74">
<label>74</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Willcox</surname> <given-names>E</given-names>
</name>
<name>
<surname>Thomas</surname> <given-names>L</given-names>
</name>
<name>
<surname>Bjorkman</surname> <given-names>P</given-names>
</name>
</person-group>. <article-title>Crystal structure of HLA-A2 bound to LIR-1, a host and viral major histocompatibility complex receptor</article-title>. <source>Nat Immunol</source> (<year>2003</year>) <volume>4</volume>(<issue>9</issue>):<fpage>913</fpage>&#x2013;<lpage>919</lpage>.</citation>
</ref>
<ref id="B75">
<label>75</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Touw</surname> <given-names>WG</given-names>
</name>
<name>
<surname>Baakman</surname> <given-names>C</given-names>
</name>
<name>
<surname>Black</surname> <given-names>J</given-names>
</name>
<name>
<surname>Te Beek</surname> <given-names>TA</given-names>
</name>
<name>
<surname>Krieger</surname> <given-names>E</given-names>
</name>
<name>
<surname>Joosten</surname> <given-names>RP</given-names>
</name>
<etal/>
</person-group>. <article-title>A series of PDB-related databanks for everyday needs</article-title>. <source>Nucleic Acids Res</source> (<year>2015</year>) <volume>43</volume>(<issue>D1</issue>):<page-range>D364&#x2013;8</page-range>. doi: <pub-id pub-id-type="doi">10.1093/nar/gku1028</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>