<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Mol. Biosci.</journal-id>
<journal-title>Frontiers in Molecular Biosciences</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Mol. Biosci.</abbrev-journal-title>
<issn pub-type="epub">2296-889X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">773385</article-id>
<article-id pub-id-type="doi">10.3389/fmolb.2021.773385</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Molecular Biosciences</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>3D Interaction Homology: Computational Titration of Aspartic Acid, Glutamic Acid and Histidine Can Create pH-Tunable Hydropathic Environment Maps</article-title>
<alt-title alt-title-type="left-running-head">Herrington and Kellogg</alt-title>
<alt-title alt-title-type="right-running-head">pH Tuning of Protein Hydropathic Environments</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Herrington</surname>
<given-names>Noah B.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Kellogg</surname>
<given-names>Glen E.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/180856/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Department of Medicinal Chemistry and Institute for Structural Biology, Drug Discovery and Development, Virginia Commonwealth University, <addr-line>Richmond</addr-line>, <addr-line>VA</addr-line>, <country>United&#x20;States</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Center for the Study of Biological Complexity, Virginia Commonwealth University, <addr-line>Richmond</addr-line>, <addr-line>VA</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/142943/overview">Andrea Mozzarelli</ext-link>, University of Parma, Italy</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1127704/overview">Giulio Vistoli</ext-link>, University of Milan, Italy</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1491178/overview">Ruibin Liu</ext-link>, University of Maryland, Baltimore, United&#x20;States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Glen E. Kellogg, <email>glen.kellogg@vcu.edu</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Structural Biology, a section of the journal Frontiers in Molecular Biosciences</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>03</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>8</volume>
<elocation-id>773385</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>09</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Herrington and Kellogg.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Herrington and Kellogg</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Aspartic acid, glutamic acid and histidine are ionizable residues that occupy various protein environments and perform many different functions in structures. Their roles are tied to their acid/base equilibria, solvent exposure, and backbone conformations. We propose that the number of unique environments for ASP, GLU and HIS is quite limited. We generated maps of these residues&#x2019; environments using a hydropathic scoring function to record the type and magnitude of interactions for each residue in a 2,703-protein structural dataset. These maps are backbone-dependent and suggest the existence of new structural motifs for each residue type. Additionally, we developed an algorithm for tuning these maps to any pH, a potentially useful element for protein design and structure building. Here, we elucidate the complex interplay between secondary structure, relative solvent accessibility, and residue ionization states: the degree of protonation for ionizable residues increases with solvent accessibility, which in turn is notably dependent on backbone structure.</p>
</abstract>
<kwd-group>
<kwd>ionizable residues</kwd>
<kwd>aspartic acid</kwd>
<kwd>glutamic acid</kwd>
<kwd>histidine</kwd>
<kwd>hydropathic interactions</kwd>
<kwd>solvent-accessible surface area</kwd>
<kwd>pKa</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Proteins are largely composed of unique combinations of 20 possible amino acids, varying from tens to thousands of residues in length. Specific protein sequences organize themselves into unique and well-defined secondary structures that comprise much larger and more complex structures that ultimately determine their functions. This relationship between structure and function is important to grasp in order to understand how different features of biological targets can be exploited for treatments of various disease states.</p>
<sec id="s1-1">
<title>pH, pK<sub>a</sub> and Protonation States</title>
<p>One important aspect of this relationship is the dependence of protein structure on pH and protonation states of constituent residues. Histidine (HIS), for example, has a nominal pK<sub>a</sub> of 6.00 (<xref ref-type="bibr" rid="B28">Hunt, 2021</xref>), close enough to physiological pH that its imidazole sidechain can act either as a cationic dual hydrogen bond donor or as a neutral donor and acceptor, depending on its local pH environment, that is, the resultant influence of the residue&#x2019;s neighborhood: the hydrogen bond donors, acceptors, charged species, etc. that modulate the effective pH surrounding it (<xref ref-type="bibr" rid="B16">Di Russo et&#x20;al., 2012</xref>). The importance of histidine&#x2019;s protonation state in the so-called &#x201c;catalytic triad&#x201d; of serine, histidine, and aspartate in serine proteases was shown decades ago for trypsin (<xref ref-type="bibr" rid="B31">Kasserra and Laidler, 1969</xref>; <xref ref-type="bibr" rid="B5">Antonino and Ascenzi, 1981</xref>). The pH-dependence of protein function is a well-established principle and has promoted extensive research into identifying optimum pH for activity of various other macromolecules (<xref ref-type="bibr" rid="B51">Talley and Alexov, 2010</xref>).</p>
<p>The pK<sub>a</sub>s of aspartic acid (ASP) and glutamic acid (GLU) when isolated or in model peptides are reported to be 3.65 and 4.25, respectively (<xref ref-type="bibr" rid="B28">Hunt, 2021</xref>), making them functionally similar residues and leaving them both largely deprotonated at physiological pH. These pK<sub>a</sub>s are not static, and large deviations from these values are not uncommon. For example, the active site of bacteriorhodopsin contains an aspartic acid with an experimental pK<sub>a</sub> of 7.68 (<xref ref-type="bibr" rid="B40">Otto et&#x20;al., 1989</xref>).</p>
<p>Unfortunately, protein structures elucidated by X-ray crystallography or cryogenic electron microscopy are seldom of sufficient resolution to locate hydrogens, owing to their extremely low electron density. X-ray crystallography detects protons only under difficult-to-achieve conditions such as resolution &#x223c;1&#xa0;&#xc5; (<xref ref-type="bibr" rid="B54">Woi&#x144;ska et&#x20;al., 2016</xref>). Such resolution is not yet possible with cryo-EM. While neutron diffraction experiments can overcome this problem (<xref ref-type="bibr" rid="B39">O&#x2019;Dell et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B46">Schr&#xf6;der and Meilleur, 2020</xref>) because they detect nuclei rather than electrons, experimental constraints, such as required crystal sizes and the availability of neutron sources, among others, make neutron diffraction-derived structures for proteins quite rare. Multidimensional nuclear magnetic resonance methods can be applied to protein structure determination (<xref ref-type="bibr" rid="B8">Barrett et&#x20;al., 2013</xref>), but only within limits on protein size and solubility. Because NMR directly probes hydrogens, it can be used for pK<sub>a</sub> determination of specific residues (<xref ref-type="bibr" rid="B9">Bartik et&#x20;al., 1994</xref>; <xref ref-type="bibr" rid="B22">Schmidt et&#x20;al., 2010</xref>), but this is only a probe of the residue under the NMR experimental conditions, which may differ greatly from its native physiological or solution conditions. 
In general, it is quite difficult to discern structural reasons for residue pK<sub>a</sub> shifts experimentally, although this is a quite active area of computational research as many reports have been published suggesting what types of environments stabilize shifts (<xref ref-type="bibr" rid="B29">Isom et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B30">Isom et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B7">Bandyopadhyay et&#x20;al., 2020</xref>). Interestingly, experimental methodologies such as NMR perform well in determining pK<sub>a</sub>s for surface ionizable residues but are less applicable to buried residues (<xref ref-type="bibr" rid="B19">Fitch et&#x20;al., 2002</xref>).</p>
<p>Much of the effort to study protonation of ionizable residues via computational means has focused on predicting their pK<sub>a</sub>s by understanding the effects of other residues in the local environment. Li et&#x20;al. developed a method, known as PROPKA, to empirically calculate pK<sub>a</sub> values impacted by nearby residues (<xref ref-type="bibr" rid="B36">Li et&#x20;al., 2005</xref>). In this model, hydrogen bonding to aspartates and glutamates stabilizes their deprotonated forms and lowers their pK<sub>a</sub>s. <xref ref-type="bibr" rid="B49">Spassov and Yan (2008)</xref> utilized CHARMM (<xref ref-type="bibr" rid="B10">Brooks et&#x20;al., 1983</xref>) to develop a molecular dynamics-based approach to predict pK<sub>a</sub> values of titratable groups. Several factors of 3D protein structure determination&#x2014;and the resulting structural model&#x2014;can compromise such predictions, e.g., uncertainties in sidechain conformations if the collected data resolution is too low (<xref ref-type="bibr" rid="B38">Miao and Cao, 2016</xref>).</p>
</sec>
<sec id="s1-2">
<title>Computational Titration</title>
<p>Our lab has also previously examined this problem using our in-house force field HINT (Hydropathic INTeractions) (<xref ref-type="bibr" rid="B33">Kellogg et&#x20;al., 1991</xref>; <xref ref-type="bibr" rid="B18">Kellogg and Abraham, 2000</xref>; <xref ref-type="bibr" rid="B45">Sarkar and Kellogg, 2010</xref>) that, briefly, exploits experimental libraries of data for atomistic partial logP<sub>o/w</sub> values of small molecules and residues to account for enthalpic, entropic, and solvation contributions to free energy and score protein-ligand, protein-protein, protein-nucleotide, etc. interactions. In one study, HINT was used to predict the degree of protonation of ligand-active site interactions of neuraminidase-inhibitor complexes using a method that we termed &#x201c;computational titration&#x201d; (<xref ref-type="bibr" rid="B20">Fornabaio et&#x20;al., 2003</xref>). By scoring all potential models, i.e.,&#x20;those in which the number of protons attached to ionizable residues and ligand functional groups was exhaustively enumerated, lower energy models were identified. Since proton positions are not unambiguously known from experiment, we term all such models &#x201c;isocrystallographic&#x201d; in that all would fit the available electron density envelope. In another report, HINT modeled the protonation state of a peptide inhibitor&#x2013;HIV-1 protease complex with pH-dependent interaction scores that paralleled experimental pH-dependent binding data (<xref ref-type="bibr" rid="B50">Spyrakis et&#x20;al., 2004</xref>).</p>
<p>Clearly, the presence or absence of protic hydrogens on these residue types within a protein will impact the interactions that these residues make, and in turn the protein&#x2019;s 3D structure. For example, the interaction between two aspartates is radically different if one of the pair is protonated and the proton is oriented to form a hydrogen bond between them. Evaluating and understanding these phenomena is part of our long-term goal of building a new paradigm for protein structure elucidation and prediction.</p>
</sec>
<sec id="s1-3">
<title>Three-Dimensional Interaction Homology</title>
<p>Since the dawn of protein structure elucidation, our understanding of the roles and contributions of interatomic interactions between protein residues toward biomolecular structural organization has evolved dramatically. Each of the 20 amino acid residues, regardless of how many unique protein structures they compose, is likely to situate itself within a limited set of environments with a unique system of interactions of varying magnitude, type, and loci. Our model describes four classes of interactions: favorable polar (e.g., hydrogen bond, acid-base), unfavorable polar (acid-acid, base-base, repulsive Coulombic), favorable hydrophobic (hydrophobic-hydrophobic, hydrophobic packing, &#x3c0;-&#x3c0; stacking) and unfavorable hydrophobic (hydrophobic-polar, desolvation).</p>
<p>Importantly, the interactions of each constituent residue of a protein with its environment contribute in some part toward its rotameric structure and the protein&#x2019;s overall secondary, tertiary, and quaternary structure. Our hypothesis is that each residue has a &#x201c;hydropathic valence&#x201d; that must somehow be satiated by nearby interacting groups. Hydrophobic residues such as phenylalanine and leucine, by interacting with other hydrophobic groups, pack together to avoid water, while polar residues, such as the three of this study, favor environments where they can engage in polar interactions, e.g., hydrogen bonding, with other residues or water. In this view, 3D protein structure is not driven by &#x201c;primary&#x201d; structure, but by the hydropathic interactions that each residue must make based on its type and sidechain and backbone conformations.</p>
<p>In our first report to address this concept, we calculated 3D hydropathic interaction maps to visualize and probe all possible environments of tyrosine (TYR) using a dataset of &#x223c;30,000 residues. Our analysis organized all of our TYR residues into 262 unique, backbone-dependent environments, each with a unique map encoding the specific interactions made by the residue in that environment (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>). A similar analysis with over 57,000 alanine (ALA) residues, separately calculating backbone-environment and sidechain-environment maps, yielded 136 and 150 backbone- and sidechain-dependent maps, respectively, despite ALA&#x2019;s simplicity. We concluded that ALA&#x2019;s mapped environments are a new and insightful form of structural motif (<xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>). Recently, in our report on phenylalanine, tryptophan, and tyrosine, we showed that the subtle effects of &#x3c0;-&#x3c0; and &#x3c0;-cation interactions are encoded in their 3D hydropathic interaction maps (<xref ref-type="bibr" rid="B3">AL Mughram et&#x20;al., 2021</xref>). In a report on serine and cysteine we highlight the major structural features&#x2014;similarities and differences&#x2014;between these two isosteric residues (<xref ref-type="bibr" rid="B13">Catalano et&#x20;al., 2021</xref>). Importantly, our analyses describe residues by cataloguing their environments in terms of interactions and not identity. A water molecule oriented for a residue can play the same &#x201c;acidic&#x201d; role as a TYR&#x2013;OH or a LYS&#x2013;NH<sub>3</sub>
<sup>&#x2b;</sup> to satisfy its hydropathic valence. Protein structure is driven by the set of these hydropathic interactions for each residue.</p>
<p>In the current report, we focus our attention on the hydropathic environments of aspartic acid, glutamic acid and histidine, three residue types considered to be &#x201c;ionizable&#x201d;, extracted from the same relatively large dataset of X-ray crystallographic protein structures. Following the same logic used in our previous work, we believe that not only is each of these residue types likely to make its own unique sets of interactions that can be clustered, but their environments also determine each residue&#x2019;s ionization state. Thus, using our scoring methods, we have simulated titration of thousands of each of these ionizable residue types to model their protonation in available crystal structures by computationally varying pH. We have generated interaction maps similar to those in our reports on tyrosine, alanine, phenylalanine, and tryptophan, but with each possessing an individually optimized protonation state. The role of sidechain buriedness was examined using a calculated solvent-accessible surface area for each of the extracted residues. Further, we show that each residue&#x2019;s backbone conformation plays a significant role in determining these protonation states. With these, we can directly predict a specific residue&#x2019;s ionization state, explore the effects of varying pH, i.e.,&#x20;tuning, on their hydropathic environments, and collect 3D interaction-similar residue environments by clustering. Moreover, we highlight the most common environments that contribute to one state or another, but more importantly we have developed a basis set of 3D backbone-dependent residue interaction profiles for these three residues that are pieces of the protein structure analysis and prediction puzzle.</p>
</sec>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>Materials and Methods</title>
<sec id="s2-1">
<title>Dataset</title>
<p>From a collection of 2,703 randomly selected proteins from the RCSB Protein Data Bank, using only structures containing no ligand or cofactor, we extracted all ASP, GLU, and HIS residues from each structure, excluding N- and C-terminal residues. We have previously described our selection criteria for these structures (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>). Our intention was to abide by random population-based sampling of a variety of primary, secondary, and tertiary structures, thus not excluding proteins with similar or identical sequences. We believe our dataset is large enough to exhaust all unique residue environments of HIS, ASP, and GLU. Hydrogen atoms were added to all heavy atoms of all structures based on their hybridization states, and the positions of these hydrogen atoms were optimized by conjugate gradient minimization.</p>
</sec>
<sec id="s2-2">
<title>Alignment Calculations</title>
<p>We overlaid an 8 by 8&#x20;&#x201c;chessboard&#x201d; on the standard Ramachandran plot, where each &#x201c;chess square&#x201d; has dimensions of 45&#xb0; by 45&#xb0; in &#x3d5; (phi)&#x2013;&#x3c8; (psi) space. The grid of the board was shifted by &#x2212;20&#xb0; and &#x2212;25&#xb0; in the &#x3d5; and &#x3c8; directions, respectively, to enclose higher-density regions of the plot within single squares. The &#x3d5;, &#x3c8;, and &#x3c7; angles were calculated for every residue in our dataset, and each residue was binned into its proper chess square based on its &#x3d5; and &#x3c8; angles. All residues in each chess square were further divided by their &#x3c7;<sub>1</sub> angles into three parse groups: group &#x201c;0.60&#x201d; (0&#xb0; &#x2264; &#x3c7;<sub>1</sub> &#x3c; 120&#xb0;), group &#x201c;0.180&#x201d; (120&#xb0; &#x2264; &#x3c7;<sub>1</sub> &#x3c; 240&#xb0;), and group &#x201c;0.300&#x201d; (240&#xb0; &#x2264; &#x3c7;<sub>1</sub> &#x3c; 360&#xb0;). In the case of GLU, residues were further parsed by their &#x3c7;<sub>2</sub> angles, yielding a total of nine parses for this residue. <xref ref-type="sec" rid="s9">Supplementary Table S1</xref> contains all information for each residue of each type in our dataset, including their chess squares, parses, PDB IDs, &#x3d5;, &#x3c8; and &#x3c9; torsion angles and atom numbers for the backbone atoms and CB of each residue.</p>
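<p>The binning described above can be illustrated with a minimal sketch. It assumes that &#x3d5; and &#x3c8; are given in degrees on the usual &#x2212;180&#xb0; to 180&#xb0; scale, that the shifted grid wraps around periodically, and that squares are identified by zero-based (column, row) indices; the function names and the index convention are hypothetical and are not the labels used in our supplementary tables.</p>
<preformat>
```python
# Illustrative sketch only: names and index conventions are assumptions.
SQUARE_DEG = 45.0   # edge length of one chess square, in degrees
PHI_SHIFT = -20.0   # grid shift along phi, from the text
PSI_SHIFT = -25.0   # grid shift along psi, from the text


def chess_square(phi, psi):
    """Return zero-based (column, row) of the square holding (phi, psi).

    The first grid line sits at -180 degrees plus the shift, and angles
    wrap around with a period of 360 degrees (an assumption).
    """
    col = int(((phi - (-180.0 + PHI_SHIFT)) % 360.0) // SQUARE_DEG) % 8
    row = int(((psi - (-180.0 + PSI_SHIFT)) % 360.0) // SQUARE_DEG) % 8
    return col, row


def chi1_parse(chi1):
    """Return the chi1 parse group, labelled by the centre of its range."""
    centres = ("60", "180", "300")
    # Map chi1 onto the 0-360 scale, then split into three 120-degree bins.
    return centres[int((chi1 % 360.0) // 120.0) % 3]
```
</preformat>
<p>For example, under these assumptions a helical residue with &#x3d5; &#x3d; &#x2212;60&#xb0;, &#x3c8; &#x3d; &#x2212;45&#xb0; and &#x3c7;<sub>1</sub> &#x3d; &#x2212;60&#xb0; falls in square (3, 3) and in the parse group centred on 300&#xb0;.</p>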
<p>A single model residue of each type was constructed at the center of each chess square with characteristic &#x3d5; and &#x3c8; angles for that centroid. The CA of the peptide backbone was placed at the origin with the CA-CB oriented along the z-axis and the CA-HA bond oriented into the -y, -z quadrant of the yz-plane. All residues of each type were aligned to this model, and rotation and translation matrices were calculated by least-squares fitting of the residue constituent atoms to the model. This effectively shifted coordinates of every protein structure to align the residue of interest with the centroid within a common frame and ensures that all calculated maps and environments are attributable to a residue&#x2019;s interactions and not misalignments in backbone structure. The average root-mean square distances (RMSDs) for superimpositions of backbone atoms in each chess square are close to 0.15&#xa0;&#xc5;, indicating that errors arising from aligning residue backbones to the centroid model (based on the CA-CB bond) are minimal.</p>
</sec>
<sec id="s2-3">
<title>HINT Scoring Function</title>
<p>The HINT forcefield (<xref ref-type="bibr" rid="B33">Kellogg et&#x20;al., 1991</xref>; <xref ref-type="bibr" rid="B18">Kellogg and Abraham, 2000</xref>; <xref ref-type="bibr" rid="B45">Sarkar and Kellogg, 2010</xref>) was used for all scoring of interactions between protein atoms. HINT relies on atom-focused parameters, namely the hydrophobic atom constant (<italic>a</italic>
<sub><italic>i</italic></sub>) and a value for solvent-accessible surface area (SASA, <italic>S</italic>
<sub>
<italic>i</italic>
</sub>) for atom <italic>i</italic>. Generally speaking, <italic>a</italic>
<sub>
<italic>i</italic>
</sub> &#x3e; 0 for hydrophobic atoms and <italic>a</italic>
<sub>
<italic>i</italic>
</sub> &#x3c; 0 for polar&#x20;atoms.</p>
<p>
<italic>S</italic>
<sub>
<italic>i</italic>
</sub> is greater for more solvent-exposed external atoms. The interaction score between atoms <italic>i</italic> and <italic>j</italic> is calculated by:<disp-formula id="equ1">
<mml:math id="m1">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#xa0;</mml:mtext>
<mml:msup>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>where r is the distance in angstroms between atoms <italic>i</italic> and <italic>j</italic>. <italic>T</italic>
<sub>
<italic>ij</italic>
</sub> takes the value &#x2212;1, 0, or 1 according to the acidic, basic, etc. character of the atoms involved and thereby assigns the proper sign to the interaction score. Finally, <italic>L</italic><sub>
<italic>ij</italic>
</sub> implements the Lennard-Jones potential function (<xref ref-type="bibr" rid="B33">Kellogg et&#x20;al., 1991</xref>). <italic>b</italic>
<sub>
<italic>ij</italic>
</sub> &#x3e; 0 for favorable interactions, such as Lewis acid-base and hydrophobic-hydrophobic interactions, while <italic>b</italic>
<sub>
<italic>ij</italic>
</sub> &#x3c; 0 for unfavorable interactions, including hydrophobic-polar or Lewis base-base interactions.</p>
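<p>As a worked illustration of the pairwise expression above, the following minimal sketch evaluates a single atom-atom score. The Lennard-Jones contribution is passed in as a precomputed number because its functional form is not restated here; the function and argument names are hypothetical and do not correspond to the HINT program itself.</p>
<preformat>
```python
import math


def hint_pair_score(a_i, s_i, a_j, s_j, t_ij, r, l_ij=0.0):
    """Evaluate b_ij = a_i * S_i * a_j * S_j * T_ij * exp(-r) + L_ij.

    a_i, a_j : hydrophobic atom constants
    s_i, s_j : solvent-accessible surface areas
    t_ij     : sign term, one of -1, 0 or 1
    r        : interatomic distance in angstroms
    l_ij     : Lennard-Jones term, supplied by the caller (assumption)
    """
    return a_i * s_i * a_j * s_j * t_ij * math.exp(-r) + l_ij
```
</preformat>
<p>With this form, flipping the sign term from 1 to &#x2212;1 turns an otherwise favorable (positive) contribution into an unfavorable (negative) one of the same magnitude, and the contribution decays exponentially with distance.</p>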
</sec>
<sec id="s2-4">
<title>Computational Titration of Ionizable Residues</title>
<p>To determine the optimal ionization state of each studied residue, we adapted an algorithm that we reported previously for improving protein-ligand models for scoring (<xref ref-type="bibr" rid="B33">Kellogg et&#x20;al., 1991</xref>; <xref ref-type="bibr" rid="B32">Kellogg et&#x20;al., 2004</xref>; <xref ref-type="bibr" rid="B45">Sarkar and Kellogg, 2010</xref>). Our algorithm scores all possible ionization states of a model residue with other residues in its environment. Here, we optimized the ionization states of residues by first calculating the normal (environment-free) cost for ionizations of these residues using published data (ASP, pK<sub>a</sub> &#x3d; 3.65; GLU, pK<sub>a</sub> &#x3d; 4.25; HIS, pK<sub>a1</sub> &#x3d; 6.00, pK<sub>a2</sub> &#x3d; 14.44) (<xref ref-type="bibr" rid="B24">George et&#x20;al., 1964</xref>) and applying the Henderson-Hasselbalch equation. For ASP, at pH 7, log ([CO<sub>2</sub><sup>&#x2013;</sup>]/[CO<sub>2</sub>H]) &#x3d; 3.35, the logarithm of an equilibrium ratio that can be converted to a &#x394;G of 4.57&#xa0;kcal mol<sup>&#x2212;1</sup>. Using the previously reported relation that &#x2212;1&#xa0;kcal mol<sup>&#x2212;1</sup> &#x2248; 500 HINT score units, the energy cost in HINT score units for protonating aspartate at pH 7, in the absence of local pH effects, is 2,295. <xref ref-type="table" rid="T1">Table&#x20;1</xref> summarizes these energy&#x20;costs.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Energy costs in HINT scores for computational titration of aspartic acid, glutamic acid and histidine at various pH values.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">pK<sub>a</sub>
</th>
<th align="center">pH 4</th>
<th align="center">pH 5</th>
<th align="center">pH 6</th>
<th align="center">pH 7</th>
<th align="center">pH 8</th>
<th align="center">pH 9</th>
<th align="center">pH 10</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Aspartic Acid<sup>a</sup>
</td>
<td align="char" char=".">3.65</td>
<td align="center">240</td>
<td align="center">925</td>
<td align="center">1,610</td>
<td align="center">2,295</td>
<td align="center">2,980</td>
<td align="center">3,665</td>
<td align="center">4,350</td>
</tr>
<tr>
<td align="left">Glutamic Acid<sup>a</sup>
</td>
<td align="char" char=".">4.25</td>
<td align="center">&#x2212;171</td>
<td align="center">514</td>
<td align="center">1,199</td>
<td align="center">1,884</td>
<td align="center">2,569</td>
<td align="center">3,254</td>
<td align="center">3,939</td>
</tr>
<tr>
<td align="left">Histidine K<sub>a1</sub>
<sup>b</sup>
</td>
<td align="char" char=".">6.00</td>
<td align="center">&#x2212;1,370</td>
<td align="center">&#x2212;685</td>
<td align="center">0</td>
<td align="center">685</td>
<td align="center">1,370</td>
<td align="center">2,055</td>
<td align="center">2,740</td>
</tr>
<tr>
<td align="left">Histidine K<sub>a2</sub>
<sup>c</sup>
</td>
<td align="char" char=".">14.44</td>
<td align="center">7,151</td>
<td align="center">6,466</td>
<td align="center">5,781</td>
<td align="center">5,096</td>
<td align="center">4,411</td>
<td align="center">3,726</td>
<td align="center">3,041</td>
</tr>
</tbody>
</table>
</table-wrap>
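<p>The entries of <xref ref-type="table" rid="T1">Table&#x20;1</xref> can be reproduced with a short sketch. It assumes a constant of 685 HINT score units per pH unit, which is inferred here from the spacing between adjacent columns of the table (about 1.37&#xa0;kcal mol<sup>&#x2212;1</sup> per pH unit under the 500-units relation); it also assumes that the second histidine row is the cost of deprotonation, so that its sign is reversed. The names used are hypothetical.</p>
<preformat>
```python
# Assumption: 685 HINT units per pH unit, inferred from the column spacing
# of Table 1; this constant is not stated explicitly in the text.
HINT_UNITS_PER_PH_UNIT = 685

PKA = {"ASP": 3.65, "GLU": 4.25, "HIS_Ka1": 6.00, "HIS_Ka2": 14.44}


def ionization_cost(species, ph):
    """Environment-free cost, in HINT score units, at a given pH.

    For ASP, GLU and HIS_Ka1 this is the cost of protonation,
    proportional to (pH - pKa). For HIS_Ka2 the sign is reversed,
    (pKa - pH), read here as the cost of deprotonation (an assumption
    inferred from the table values).
    """
    delta = ph - PKA[species]
    if species == "HIS_Ka2":
        delta = -delta
    return round(delta * HINT_UNITS_PER_PH_UNIT)


def cost_table(ph_values=range(4, 11)):
    """Return one list of costs per species over the given pH values."""
    return {sp: [ionization_cost(sp, ph) for ph in ph_values] for sp in PKA}
```
</preformat>
<p>Calling <monospace>cost_table()</monospace> with the default pH values 4 through 10 yields the four rows of the table; for example, the ASP entry at pH 7 is round(3.35 &#xd7; 685) &#x3d; 2,295.</p>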
<p>The second term, calculated for each residue in varying protonation states, also as a HINT score, measures the effects of the local environment around the residue. This assessment of the environment scores the interactions of the residue in question with those nearby, in each accessible protonation state. These scores are summed together with the appropriate values in <xref ref-type="table" rid="T1">Table&#x20;1</xref> to determine the best scoring, and therefore most likely, protonation state of the residue. For ASP and GLU, we examined the ionized (carboxylate, CO<sub>2</sub>
<sup>&#x2013;</sup>) and neutral states with protonation at each oxygen atom (OD1/OE1 and OD2/OE2). For the latter, the -C-C-O-H dihedral angles were exhaustively optimized for ideal hydrogen bonding to surrounding residues. For HIS, four potential ionization states exist: 1) protonation at both ND1 and NE2 (HIS<sup>&#x2b;</sup>), 2) protonation at only ND1 (HIS-&#x3b4;), 3) protonation at only NE2 (HIS-&#x3b5;) and 4) deprotonated (HIS<sup>&#x2212;</sup>), the last of which is reported to be exceedingly rare. Since the entire imidazole ring of HIS can be flipped, the potential cases for this residue are doubled to eight (<italic>vide infra</italic>). If the HINT score of a trial case was higher than that of the starting case by 50 or more (&#x223c;0.1&#xa0;kcal mol<sup>&#x2212;1</sup>), the residue&#x2019;s molecular model was replaced with the (protonated or deprotonated) trial model for that case. All further calculations at that pH were performed with the resulting optimized residue structure and coordinates.</p>
</sec>
<sec id="s2-5">
<title>pK<sub>a</sub> Calculations</title>
<p>We identified 94 residues with experimental pK<sub>a</sub> values in the PKAD database (<xref ref-type="bibr" rid="B41">Pahari et&#x20;al., 2019</xref>) that were also present in our dataset. Using the technique described above, we calculated individual pK<sub>a</sub> values for these residues and compared them with their experimental values. Each residue&#x2019;s protonation state was calculated over the pH range from 1 to 14 in increments of a quarter of a pH unit. We treated the two points bracketing the change in protonation state as part of a linear regression and solved for the &#x201c;equivalence point&#x201d; between&#x20;them.</p>
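<p>The equivalence-point step can be sketched as a two-point linear interpolation. The quantity that is interpolated is an assumption made only for this illustration: the difference between the total score of the best protonated model and that of the best deprotonated model at each pH, so that the estimated pK<sub>a</sub> is the pH at which this difference crosses zero. The function and variable names are hypothetical.</p>
<preformat>
```python
def estimate_pka(ph_values, score_diff):
    """Return the pH at which score_diff crosses zero, or None.

    score_diff[k] is assumed to be (protonated total score) minus
    (deprotonated total score) at ph_values[k]. The two grid points
    that bracket the sign change define a straight line, and its zero
    is returned as the estimated equivalence point.
    """
    points = list(zip(ph_values, score_diff))
    for (ph_lo, d_lo), (ph_hi, d_hi) in zip(points, points[1:]):
        if d_lo == 0:
            return ph_lo
        if 0 > d_lo * d_hi:  # opposite signs: the transition lies between
            return ph_lo + d_lo * (ph_hi - ph_lo) / (d_lo - d_hi)
    if points and points[-1][1] == 0:
        return points[-1][0]
    return None  # no change of protonation state on this grid


# pH 1 to 14 in quarter-unit steps, as in the text (53 grid points).
ph_grid = [1 + 0.25 * k for k in range(53)]
```
</preformat>
<p>When the score difference is exactly linear in pH, this interpolation recovers the crossing point regardless of where it falls between grid points; for a nonlinear difference it is accurate only to within the quarter-unit grid spacing.</p>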
</sec>
<sec id="s2-6">
<title>HINT Basis Interaction Maps</title>
<p>Each residue, with its CA-CB bond along the z-axis, was placed within a three-dimensional box large enough to accommodate the structure of a residue, plus an additional 5&#xa0;&#xc5; in each dimension. These boxes, based on residue type, are as follows: ASP, &#x2013;8.5&#xa0;&#xc5; &#x2264; x &#x2264; 8.5&#xa0;&#xc5;; &#x2013;8.5&#xa0;&#xc5; &#x2264; y &#x2264; 8.5&#xa0;&#xc5;; &#x2013;7.5&#xa0;&#xc5; &#x2264; z &#x2264; 9.5&#xa0;&#xc5; (42,875 points, 4,913&#xa0;&#xc5;<sup>3</sup>); GLU, &#x2013;8.5&#xa0;&#xc5; &#x2264; x &#x2264; 8.5&#xa0;&#xc5;; &#x2013;8.5&#xa0;&#xc5; &#x2264; y &#x2264; 8.5&#xa0;&#xc5;; &#x2013;7.5&#xa0;&#xc5; &#x2264; z &#x2264; 10.5&#xa0;&#xc5; (45,325 points, 5,202&#xa0;&#xc5;<sup>3</sup>); and HIS, &#x2013;10.0&#xa0;&#xc5; &#x2264; x &#x2264; 10.0&#xa0;&#xc5;; &#x2013;10.0&#xa0;&#xc5; &#x2264; y &#x2264; 10.0&#xa0;&#xc5;; &#x2013;7.5&#xa0;&#xc5; &#x2264; z &#x2264; 9.5&#xa0;&#xc5; (58,835 points, 6,800&#xa0;&#xc5;<sup>3</sup>); all with a point spacing of 0.5&#xa0;&#xc5;. As described previously (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>), HINT was used to calculate an interaction grid representing the 3D interaction space surrounding a residue of interest. In short, these maps translate sums of pairwise HINT scores (<xref ref-type="bibr" rid="B33">Kellogg et&#x20;al., 1991</xref>; <xref ref-type="bibr" rid="B18">Kellogg and Abraham, 2000</xref>; <xref ref-type="bibr" rid="B45">Sarkar and Kellogg, 2010</xref>) into 3D map objects indicating position, intensity, and type of interaction between atoms of the residue and those in close proximity. The value at each grid point of a map was calculated according to:<disp-formula id="equ2">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>z</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>z</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>where <italic>&#x3c1;</italic>
<sub>
<italic>xyz</italic>
</sub> is the map interaction score at coordinates (<italic>x</italic>, <italic>y</italic>, <italic>z</italic>), <italic>x</italic>
<sub>
<italic>ij</italic></sub>, <italic>y</italic>
<sub>
<italic>ij</italic>
</sub> and <italic>z</italic>
<sub>
<italic>ij</italic>
</sub> are coordinates of the midpoint of the vector between atoms <italic>i</italic> and <italic>j</italic>, and &#x3c3; is the width of the Gaussian map peak, 0.5 for our purposes (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>). Map data were calculated for sidechain atoms of all ASP, GLU, and HIS residues with individual maps for the four interaction classes: favorable/unfavorable polar and favorable/unfavorable hydrophobic.</p>
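<p>As an illustration, the box dimensions and the map equation above can be sketched as follows. The function names and the representation of interactions as (<italic>b</italic><sub><italic>ij</italic></sub>, midpoint) pairs are assumptions made for this sketch; it is not our GPU implementation.</p>
<preformat>
```python
import math

# Box extents (x, y, z) in Angstroms, as listed in the text
BOXES = {
    "ASP": ((-8.5, 8.5), (-8.5, 8.5), (-7.5, 9.5)),
    "GLU": ((-8.5, 8.5), (-8.5, 8.5), (-7.5, 10.5)),
    "HIS": ((-10.0, 10.0), (-10.0, 10.0), (-7.5, 9.5)),
}
SPACING = 0.5  # grid point spacing in Angstroms
SIGMA = 0.5    # width of the Gaussian map peak


def axis_points(lo, hi, spacing=SPACING):
    """Number of grid points along one axis, both end points included."""
    return int(round((hi - lo) / spacing)) + 1


def box_summary(residue):
    """Return (number of grid points, box volume) for a residue type."""
    n_points = 1
    volume = 1.0
    for lo, hi in BOXES[residue]:
        n_points *= axis_points(lo, hi)
        volume *= hi - lo
    return n_points, volume


def map_value(point, interactions, sigma=SIGMA):
    """Sum of b_ij * exp(-r^2 / sigma) over all interactions.

    interactions is an iterable of (b_ij, (x_ij, y_ij, z_ij)) pairs, where the
    coordinates are the midpoint between atoms i and j (hypothetical layout).
    """
    total = 0.0
    for b_ij, midpoint in interactions:
        r2 = sum((p - m) ** 2 for p, m in zip(point, midpoint))
        total += b_ij * math.exp(-r2 / sigma)
    return total
```
</preformat>
<p>Calling <italic>box_summary</italic> for ASP, GLU and HIS reproduces the point counts and volumes quoted above.</p>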
</sec>
<sec id="s2-7">
<title>Calculation of Map-Map Correlation Metrics</title>
<p>Comparison of two maps, <bold>m</bold> and <bold>n</bold>, is based on:<disp-formula id="equ3">
<mml:math id="m3">
<mml:mrow>
<mml:mi mathvariant="normal">if</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>1.0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>F</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>;</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:mi mathvariant="normal">else</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>where each raw map data point (<italic>G</italic>
<sub>
<italic>t</italic>
</sub>, for the point at index <italic>t</italic>) is transformed to log<sub>10</sub> space and normalized with a predefined floor value, F &#x3d; 1.0. The similarity between maps <bold>m</bold> and <bold>n</bold>, defined as <italic>D</italic> (<bold>m</bold>,<bold>n</bold>), is calculated following previous methods (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>):<disp-formula id="equ4">
<mml:math id="m4">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">n</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">n</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">n</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>In this metric, <italic>A</italic>
<sub>
<italic>t</italic>
</sub>(<bold>m</bold>) and <italic>A</italic>
<sub>
<italic>t</italic>
</sub>(<bold>n</bold>) are map values for the same grid points in maps <bold>m</bold> and <bold>n</bold>, respectively, and &#x7c;<italic>A</italic>&#x7c;<sub>max</sub> is the maximum absolute value of the grid points in <bold>m</bold> and <bold>n</bold>. Our map boxes are designed to accommodate all possible residue environments and usually contain a majority (&#x3e;60%) of zero-valued points. To keep these shared zero-valued points from making all map pairs appear similar, only points where &#x7c;<italic>A</italic><sub><italic>t</italic></sub>(<bold>m</bold>)&#x7c; &#x2265; 8&#x20;&#x7c;<italic>A</italic>(<bold>m</bold>)<sub>stddev</sub>&#x7c; or &#x7c;<italic>A</italic><sub><italic>t</italic></sub>(<bold>n</bold>)&#x7c; &#x2265; 8&#x20;&#x7c;<italic>A</italic>(<bold>n</bold>)<sub>stddev</sub>&#x7c; were considered in calculating <italic>D</italic> (<bold>m</bold>,<bold>n</bold>) (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>), where <italic>A</italic><sub>stddev</sub> is the standard deviation of the average value of all points in the map.</p>
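<p>A minimal sketch of the transform and the similarity metric follows. Two points are assumptions of the sketch rather than statements about our programs: the per-point terms are averaged over the considered points so that <italic>D</italic> falls between 0 and 1, and <italic>A</italic><sub>stddev</sub> is taken as the population standard deviation of the transformed map values. Function names are illustrative.</p>
<preformat>
```python
import math
import statistics

FLOOR = 1.0  # F in the text


def log_transform(g, floor=FLOOR):
    """Signed log10 transform of one raw map value G_t."""
    ratio = abs(g) / floor
    if ratio > 1.0:
        return math.copysign(math.log10(ratio), g)
    return 0.0


def similarity(a_m, a_n, k=8.0):
    """Mean per-point similarity over the points that pass the k*stddev cut.

    Averaging over the considered points is an assumption of this sketch.
    """
    max_sum = max(abs(v) for v in a_m) + max(abs(v) for v in a_n)
    cut_m = k * statistics.pstdev(a_m)
    cut_n = k * statistics.pstdev(a_n)
    terms = []
    for am, an in zip(a_m, a_n):
        magnitude = abs(am) + abs(an)
        if magnitude == 0.0:
            continue  # both maps are empty at this point: nothing to compare
        if abs(am) >= cut_m or abs(an) >= cut_n:
            terms.append(1.0 - (am - an) ** 2 / (magnitude * max_sum))
    return sum(terms) / len(terms) if terms else None
```
</preformat>
<p>Under these assumptions, two identical single-peak maps give a value of 1, the same peak with opposite sign gives 0, and two equal peaks at different grid points give 0.5.</p>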
<p>
<italic>D</italic> (<bold>m</bold>,<bold>n</bold>) should normally range from 0 to 1, where 1 indicates identical maps; realistically, <italic>D</italic> (<bold>m</bold>,<bold>n</bold>) &#x3d; 0 cannot exist, as it would signify completely overlapping maps with opposite signs. Neither will <italic>D</italic> (<bold>m</bold>,<bold>n</bold>) &#x3d; 0.5 exist, as it would require completely non-overlapping maps. Typically, the minimum <italic>D</italic> thus falls between 0.6 and 0.7. To calculate the overall similarity (<bold>
<italic>D</italic>
</bold>
<sub>
<italic>all</italic>
</sub>) between two like residue maps <bold>m</bold> and <bold>n</bold>, one composite metric was calculated from four metrics containing data for the map quartet described above [hydro (&#x2b;), hydro (&#x2212;), polar (&#x2b;), and polar (&#x2212;), which are favorable and unfavorable hydrophobic (e.g. hydrophobic-polar) contributions, and favorable and unfavorable polar contributions to each map, respectively]. Here, <bold>
<italic>D</italic>
</bold> (<bold>m</bold>,<bold>n</bold>)<sub>
<italic>all</italic>
</sub> &#x3d; { 4[<italic>D</italic> (<bold>m</bold>,<bold>n</bold>)<sub>hydro(&#x2b;)</sub>] &#x2b; 2[<italic>D</italic> (<bold>m</bold>,<bold>n</bold>)<sub>hydro(&#x2013;)</sub>] &#x2b; [<italic>D</italic> (<bold>m</bold>,<bold>n</bold>)<sub>polar(&#x2b;)</sub>] &#x2b; [<italic>D</italic> (<bold>m</bold>,<bold>n</bold>)<sub>polar(&#x2013;)</sub>]&#x20;}/8.</p>
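<p>The composite metric is a weighted mean of the four per-class similarities. A short sketch, with dictionary keys and function name chosen only for illustration:</p>
<preformat>
```python
# Weights for the four map classes, as given in the composite formula
WEIGHTS = {"hydro(+)": 4.0, "hydro(-)": 2.0, "polar(+)": 1.0, "polar(-)": 1.0}


def composite_similarity(d_by_class):
    """Weighted mean of the four per-class D(m,n) values (weights sum to 8)."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * d_by_class[name] for name in WEIGHTS)
    return weighted / total_weight
```
</preformat>
<p>Because the weights sum to 8, four identical per-class values are returned unchanged, so the composite stays on the same 0 to 1 scale as its components.</p>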
<p>The favorable and unfavorable hydrophobic interactions were scaled by 4 and 2, respectively, because these two terms are more subtle, diverse and potentially information-rich than those driven by electrostatic, particularly ionic, interactions.</p>
<p>Also, to reduce the computational burden, we applied a first-pass similarity filter (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>) to our matrix calculations. Because many residues share highly similar environments, their maps are highly similar, and such redundant residues can be removed from further consideration. This substantially reduces our pool of calculations, which matters because several steps scale more or less as&#x20;n<sup>2</sup>.</p>
<p>As described previously (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>), all of the above calculations were performed with in-house programs that exploit the inherent parallelism of our methods on GPUs, specifically for calculating maps and similarity matrices.</p>
</sec>
<sec id="s2-8">
<title>Clustering and Validation</title>
<p>We utilized the freely available R programming language and environment (<xref ref-type="bibr" rid="B43">R Core Team, 2013</xref>) to perform our clustering analysis on the pairwise map similarity matrices calculated above. We determined (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>) that for our purposes, out of a number of different clustering methods, the k-means method was most reliable. Through the experience of our previous reports (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>) and preliminary studies here, we opted to set a uniform maximum number of clusters of 12 for each chess square-parse combination. This allows for significant map diversity and facilitates inter-chess square/inter-residue comparisons. Most <bold>
<italic>chess squares/parses</italic>
</bold>, however, had fewer than 12 clusters in their optimal solutions. Additionally, k-means clustering will not form singleton clusters, i.e.,&#x20;clusters with a single member. Although such singletons are fairly rare (&#x223c;5%), their maps could be interesting, so our protocols are designed to optionally recover them by reconstructing the cluster solutions with the missing singletons. Any chess square-parse with four or fewer maps was not clustered but, instead, averaged to create what is, effectively, a 1-cluster&#x20;case.</p>
</sec>
<sec id="s2-9">
<title>Average Map, RMSD, and Solvent-Accessible Surface Area Calculations</title>
<p>Careful consideration must be given to calculation of average maps. First, to avoid what we have described as &#x201c;brown mapping&#x201d; (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>), only maps sharing high similarity should be combined. Second, the average maps are calculated by Gaussian weighting (<italic>w</italic>) the contribution of each map with respect to its Euclidean distance from the cluster centroid, given by:<disp-formula id="equ5">
<mml:math id="m5">
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi>d</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>where <italic>d</italic> is the map&#x2019;s distance from the centroid and &#x3c3; &#x3d; <italic>d</italic>
<sub>max</sub>/8, where <italic>d</italic><sub>max</sub> is the average of all maximum distances across all clusters in the chess square. This weighting ensures that maps closer to the centroid contribute more to the average map of the cluster, whereas a flat average of all map data would overweight maps farther from the centroid. While a formal definition exists for &#x201c;exemplar&#x201d; in affinity propagation clustering, for our purposes it denotes the residue datum closest to the centroid of each cluster output by the k-means algorithm.</p>
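<p>The weighting can be sketched as below. Normalizing by the sum of the weights is an assumption of the sketch, and the function names and the flat list representation of a map are illustrative only.</p>
<preformat>
```python
import math


def gaussian_weights(distances, d_max):
    """w = exp(-d^2 / sigma^2) with sigma = d_max / 8, one weight per map."""
    sigma = d_max / 8.0
    return [math.exp(-(d * d) / (sigma * sigma)) for d in distances]


def weighted_average_map(maps, distances, d_max):
    """Weighted mean of equally sized maps (each a flat list of grid values).

    Dividing by the sum of the weights is an assumption of this sketch.
    """
    weights = gaussian_weights(distances, d_max)
    total = sum(weights)
    n_points = len(maps[0])
    return [
        sum(w * grid[t] for w, grid in zip(weights, maps)) / total
        for t in range(n_points)
    ]
```
</preformat>
<p>A map lying at the centroid receives a weight of 1, and the weight of any other map falls off with the square of its distance, so outlying maps contribute very little to the cluster average.</p>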
<p>RMSDs (root-mean-square distances) for each residue type were calculated by first constructing one average residue structure per cluster through weighted averaging, as above, of all atomic positions from all residues in the cluster. For each non-hydrogen atom, an RMSD was calculated from the average structure, and then all atomic values were averaged to obtain the reported RMSD for the cluster.</p>
<p>We calculated SASAs for all residue sidechains using the GETAREA algorithm (<xref ref-type="bibr" rid="B21">Fraczkiewicz and Braun, 1998</xref>) with its default settings. The protein coordinates in PDB files were submitted as input. In addition, from GETAREA&#x2019;s &#x201c;In/Out&#x201d; parameter, we created a new metric, &#x201c;<italic>f</italic>
<sub>
<italic>outside</italic>
</sub>&#x201d;, to represent the buriedness of the set of residues in a cluster, parse, chess square, etc., by recasting &#x201c;In&#x201d; as 0.0, &#x201c;Out&#x201d; as 1.0 and &#x201c;indeterminate&#x201d; as 0.5, and averaging over the&#x20;set.</p>
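<p>The recasting and averaging can be written in a few lines. The label strings used as dictionary keys are placeholders for whatever form the GETAREA output takes after parsing; they are assumptions of this sketch.</p>
<preformat>
```python
# Numerical recasting of the In/Out classification (label strings are placeholders)
IN_OUT_SCORE = {"In": 0.0, "Out": 1.0, "Indeterminate": 0.5}


def f_outside(labels):
    """Average of the recast In/Out values over a set of residues."""
    return sum(IN_OUT_SCORE[label] for label in labels) / len(labels)
```
</preformat>
<p>A value near 1 therefore indicates a largely solvent-exposed set of residues, and a value near 0 a largely buried one.</p>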
</sec>
</sec>
<sec sec-type="results|discussion" id="s3">
<title>Results and Discussion</title>
<sec id="s3-1">
<title>Dataset: Binning and Parsing Residues</title>
<p>From the dataset of 2,703 protein structures described in Methods, we extracted 42,713 ASPs, 49,306 GLUs, and 15,276 HISs, all of which were non-terminal residues. An 8 by 8 chessboard was overlaid on a standard Ramachandran plot (<xref ref-type="bibr" rid="B44">Ramachandran et&#x20;al., 1963</xref>), such that each grid square has dimensions of 45&#xb0; by 45&#xb0; in &#x3d5;&#x2013;&#x3c8; space and the extents of the board are shifted slightly to contain regions of high residue population density in single squares (<xref ref-type="fig" rid="F1">Figure&#x20;1</xref>), named as <bold>
<italic>a1</italic>
</bold> through <bold>
<italic>h8</italic>
</bold>. We binned residues into each square by their backbone &#x3d5; and &#x3c8; angles and further parsed them by their &#x3c7;<sub>1</sub> angles into three groups corresponding to those normally observed in rotamer libraries (<xref ref-type="bibr" rid="B48">Shapovalov and Dunbrack, 2011</xref>): a group averaging &#x223c;60&#xb0;, a group averaging &#x223c;180&#xb0;, and a group averaging &#x223c;300&#xb0;, from here on referred to as the &#x201c;.60&#x201d;, &#x201c;.180&#x201d;, and &#x201c;.300&#x201d; parses. In the case of GLU, residues were further parsed by their &#x3c7;<sub>2</sub> angles, yielding a total of nine parses for this residue: &#x201c;.60.60&#x201d;, &#x201c;.60.180&#x201d;, &#x201c;.60.300&#x201d;, &#x201c;.180.60&#x201d;, &#x201c;.180.180&#x201d;, &#x201c;.180.300&#x201d;, &#x201c;.300.60&#x201d;, &#x201c;.300.180&#x201d; and &#x201c;.300.300&#x201d; (<xref ref-type="fig" rid="F2">Figure&#x20;2</xref>). We showed previously (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>) that map-based clustering was able to easily identify this (&#x3c7;<sub>1</sub>, &#x3c7;<sub>2</sub>) low level of detail, except for surface-exposed residues that show few interactions with anything apart from solvent. However, even a few such failures were problematic in calculating average maps and residue coordinates. Furthermore, parsing of the chess square members into &#x3c7; bins increased computational efficiency. (Many calculations scale as n<sup>2</sup>: 3&#x20;&#xd7; (n/3)<sup>2</sup> &#x3d; n<sup>2</sup>/3 &#x3c; n<sup>2</sup>.) The additional &#x3c7;<sub>2</sub> parse for GLU further reduced the computations and made the ASP and GLU data more comparable, i.e.,&#x20;the (unparsed) remainder of their sidechains is the same &#x2013;C&#x2013;COOH fragment.</p>
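<p>The &#x3c7; parsing can be illustrated with a short sketch. It assumes that each angle is assigned to the nearest of the three rotamer centers, i.e., that the bin boundaries lie at 0&#xb0;, 120&#xb0; and 240&#xb0;; the boundaries are not stated above, so this is an assumption, as are the function names. Chess square assignment is omitted because the exact offsets of the board are not given here.</p>
<preformat>
```python
ROTAMER_CENTERS = (60, 180, 300)


def chi_parse(chi_degrees):
    """Map a chi angle (any sign, in degrees) to its 60/180/300 parse.

    Assumes 120-degree bins starting at 0 degrees (not stated in the text).
    """
    chi = chi_degrees % 360.0
    return ROTAMER_CENTERS[int(chi // 120.0) % 3]


def parse_label(square, chis):
    """Build a chess square/parse name such as d4.300 or d4.300.180."""
    return square + "".join(".%d" % chi_parse(chi) for chi in chis)
```
</preformat>
<p>For ASP and HIS the list of angles holds only &#x3c7;<sub>1</sub>; for GLU it holds &#x3c7;<sub>1</sub> and &#x3c7;<sub>2</sub>, which yields the nine two-level parses.</p>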
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Ramachandran chessboard displaying the chess square/parse population for aspartic acid. The Ramachandran &#x3d5; vs. &#x3c8; plot is rendered into 64&#x20;45&#xb0; by 45&#xb0; (&#x3c0;/4 by &#x3c0;/4) chess squares. The (&#x3c7;<sub>1</sub>) parse populations for ASP are represented in log<sub>10</sub> scale with the colored bars. Their colors reflect the average weighted fraction outside or solvent exposed, i.e.,&#x20;&#x201c;<italic>f</italic>
<sub>
<italic>outside</italic>
</sub>&#x201d;, a measure of solvent accessibility (see text for definition). The &#x3d5; vs. &#x3c8; regions associated with &#x3b2;-pleat, &#x3b1;-helix, and left-hand &#x3b1;-helix secondary structure motifs are shaded in light purple, light orange and light green chess squares, respectively.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g001.tif"/>
</fig>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>The &#x3c7;<sub>1</sub> and &#x3c7;<sub>2</sub> rotamer parses. CB (black) has three &#x3c7;<sub>1</sub> rotamers (dark gray, CG): .60, .180 and .300. Each of those, for GLU, has three &#x3c7;<sub>2</sub> rotamers (light gray, CD), as&#x20;shown.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g002.tif"/>
</fig>
<p>Throughout this work, chess square names will be given in bold italics, e.g., <bold><italic>a1</italic></bold>, <bold><italic>b4</italic></bold>, etc. The &#x3c7;<sub>1</sub> parses for ASP and HIS will be denoted by the suffixes .60, .180 and .300, and the &#x3c7;<sub>1</sub>/&#x3c7;<sub>2</sub> parses for GLU will be denoted by the suffixes .60.60, .60.180, .60.300,&#x20;etc.</p>
<p>The occupancies of the chess square/parses range from 0 to 6,215 (<bold>
<italic>d4.300</italic>
</bold>) for aspartate, to 4,563 (<bold>
<italic>d4.300.180</italic>
</bold>) for glutamate, and to 1,504 (<bold>
<italic>d4.180</italic>
</bold>) for histidine. For aspartate, 44 (of 64) chess squares contain 10 or more residues, and 159 chess squares/parses (of 192) are occupied at all. These metrics are 40/64 and 356/576 for glutamate and 32/64 and 120/192 for histidine. <xref ref-type="sec" rid="s9">Supplementary Table S1</xref> provides occupancies in the Ramachandran chessboards for these three residues. To simplify nomenclature in this article, we are using a numerical scheme wherein the sequential number of that residue in its chess square/parse is its name. Thus, histidine 100 in chess square <bold>
<italic>a1.60</italic>
</bold> is the 100th histidine contained within that chess square/parse combination, as tabulated in <xref ref-type="sec" rid="s9">Supplementary Table S2</xref>, wherein the actual PDB ID, chain, residue name, etc. for each datum in this study can be found. Each cluster (<italic>vide infra</italic>) will be named for the residue closest to its centroid, i.e., its exemplar, and will be given in bold numerals.</p>
<p>The Ramachandran plot generally contains four regions associated with specific secondary structure motifs. According to our schema (<xref ref-type="fig" rid="F1">Figure&#x20;1</xref>), fifteen chess squares (<bold>
<italic>a1</italic>
</bold>, <bold>
<italic>a6</italic>
</bold>, <bold>
<italic>a7</italic>
</bold>, <bold>
<italic>a8</italic>
</bold>, <bold>
<italic>b1</italic>
</bold>, <bold>
<italic>b2</italic>
</bold>, <bold>
<italic>b7</italic>
</bold>, <bold>
<italic>b8</italic>
</bold>, <bold>
<italic>c1</italic>
</bold>, <bold>
<italic>c2</italic>
</bold>, <bold>
<italic>c6</italic>
</bold>, <bold>
<italic>c7</italic>
</bold>, <bold>
<italic>c8</italic>
</bold>, <bold>
<italic>d1</italic>
</bold> and <bold>
<italic>d8</italic>
</bold>) correspond to the &#x3b2;-pleat motif, seven chess squares (<bold>
<italic>b4</italic>
</bold>, <bold>
<italic>b5</italic>
</bold>, <bold>
<italic>b6</italic>
</bold>, <bold>
<italic>c4</italic>
</bold>, <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d4</italic>
</bold> and <bold>
<italic>d5</italic>
</bold>) correspond to the right-hand &#x3b1;-helix motif and five chess squares (<bold>
<italic>f5</italic>
</bold>, <bold>
<italic>f6</italic>
</bold>, <bold>
<italic>f7</italic>
</bold>, <bold>
<italic>g5</italic>
</bold> and <bold>
<italic>g6</italic>
</bold>) correspond to the left-hand &#x3b1;-helix motif. The remaining chess squares, some of which may contain mixtures of secondary structural motifs, account for the remaining residues.</p>
<p>Calculations in this study were performed for all Ramachandran chess squares, but, for brevity&#x2019;s sake, we focus our discussion on four in particular, chosen to sample the three major regions of the standard Ramachandran plot: <bold>
<italic>b1</italic>
</bold>, <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold> and <bold>
<italic>f6</italic>
</bold>. The <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold> pair allows us to compare independently-calculated map and environment data between chess squares within the same right-hand &#x3b1;-helix structural motif region.</p>
</sec>
<sec id="s3-2">
<title>Ionization State Optimization</title>
<p>While our primary goal for this study is to evaluate the hydropathic environments of the ASP, GLU and HIS residue types, a key requirement was to use molecular models that are in appropriate ionization states. We were also interested in examining the effects of these ionization states on the residue environments. Also, such structures (and 3D maps) should have rational and tunable pH dependencies to enable prediction of structure, properties, and function.</p>
<p>As the local environment heavily influences the protonation states of ionizable residues, we updated the computational titration algorithm that we reported earlier (<xref ref-type="bibr" rid="B18">Kellogg and Abraham, 2000</xref>; <xref ref-type="bibr" rid="B20">Fornabaio et&#x20;al., 2003</xref>) to optimize the ionization state (and concomitantly the &#x2013;C&#x2013;O&#x2013;H dihedral angle) of all residues in this study. Briefly (see Methods), we calculated the HINT score between each residue and its local environment in each of its possible ionization/rotameric states (3 for ASP and GLU, 8 for HIS; <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>). These scores were modified by pK<sub>a</sub>- and pH-dependent factors derived from the Henderson-Hasselbalch equation. It is important to emphasize that all these calculations were performed without changing the positions of the non-hydrogen atoms, except for the &#x3c0; rotation about &#x3c7;<sub>2</sub> shown on the right side of <xref ref-type="fig" rid="F3">Figure&#x20;3B</xref>. In other words, all models generated and scored are isocrystallographic. The highest-scoring model of the set generated for each residue was carried forward in the study. We note an advantage here: since the positions of the heavy atoms are fixed at their X-ray coordinates, the calculations will likely identify the protonation model most favorable for that conformation.</p>
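<p>For reference, the textbook Henderson-Hasselbalch relation for an isolated titratable group is sketched below. This is only the underlying relation; the specific score correction derived from it is described in Methods and is not reproduced here, and the function name is illustrative.</p>
<preformat>
```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch fraction of a group in its protonated form.

    Textbook relation for an isolated site; not the HINT score correction.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))
```
</preformat>
<p>At pH equal to the pK<sub>a</sub> the fraction is one half, and it changes by roughly a factor of ten in the minority species for each pH unit away from that point.</p>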
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Various possibilities for ASP, GLU and HIS ionization/rotameric states. <bold>(A)</bold> ASP, GLU, and <bold>(B)</bold> HIS sidechain functional groups. Red &#x3d; Lewis acid, blue &#x3d; Lewis base, green &#x3d; hydrophobic. Note that &#x201c;ring flips&#x201d; of HIS present distinct patterns for interaction.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g003.tif"/>
</fig>
<sec id="s3-2-1">
<title>Aspartic Acid</title>
<p>We calculated the optimal structure for each studied aspartic acid over a range of pHs. For this residue, whose nominal pK<sub>a</sub> is 3.65, we determined the fraction of the nearly 43,000 residues protonated at pHs 0 through 8 in increments of 1. The result, which is reminiscent of a titration curve, is shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>. The overall titration curve is centered close to the nominal ASP pK<sub>a</sub>, differing from it by &#x223c;0.31 pH units. Our calculations suggest that residue backbone structure has an impact on levels of protonation. Our data (<italic>vide infra</italic>) also suggest that differences in secondary structure have an effect on solvent accessibility: these two phenomena are intimately linked, and in fact difficult to separate. pK<sub>a</sub> shifts associated with differences in solvent-accessible surface area are known, as less solvent exposure may increase the pK<sub>a</sub>s of acidic residues (<xref ref-type="bibr" rid="B25">Harms et&#x20;al., 2009</xref>). Highly solvent-exposed residues are, in practice, <italic>in vacuo</italic> in many protein structure models so that there are no inter-residue interactions to account for. The pH in our calculations at which the aspartic acids are 50% ionized (which we are calling pH<sub>50</sub>) is 3.345. While this is an arbitrary value, we will use pH<sub>50</sub>s as set points for map calculations (see below).</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Titration curves of ASP residues by secondary structure. The native pK<sub>a</sub> for aspartic acid is indicated.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g004.tif"/>
</fig>
</sec>
<sec id="s3-2-2">
<title>Glutamic Acid</title>
<p>The titration curves for the more than 49,000 GLU residues in our study are shown in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>. These look very similar to those of ASP and, in the same way, are centered very close to the native experimental pK<sub>a</sub> of GLU. In fact, the average calculated GLU pK<sub>a</sub> deviated from the experimentally determined pK<sub>a</sub> for the GLU model peptide by only &#x223c;0.03 pH units. There is also seemingly less secondary structure dependence for these results, which is likely due to differences in solvent accessibility between ASP and GLU sidechains. The pH<sub>50</sub> for our glutamic acid data is&#x20;4.224.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Titration curves of GLU residues by secondary structure. The native pK<sub>a</sub> for glutamic acid is indicated.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g005.tif"/>
</fig>
</sec>
<sec id="s3-2-3">
<title>Histidine</title>
<p>This residue type potentially has three different protonation states, resulting in four unique protonation patterns (<xref ref-type="fig" rid="F3">Figure&#x20;3</xref>), compared to ASP&#x2019;s and GLU&#x2019;s two, and thus tells a more complicated story (<xref ref-type="fig" rid="F6">Figure&#x20;6</xref>). In addition to the expected HIS to HIS<sup>&#x2b;</sup> protonation, HIS can be deprotonated to HIS<sup>&#x2212;</sup> (<xref ref-type="bibr" rid="B6">Ascone et&#x20;al., 1997</xref>) in exceedingly rare cases, such as Cu, Zn superoxide dismutase. We simulated the titration of more than 15,000 HIS residues in our dataset together and separately by their secondary structure. According to our calculations, in the neutral state, a greater fraction of HIS residues were protonated at the &#x3b5;-nitrogen in all secondary structures. However, factors contributing to protonation of HIS are much more complicated, including solvent accessibility and conformational changes, discussed later. The deviation of our calculated pH<sub>50</sub> of 5.174 from the nominal HIS pK<sub>a1</sub> of 6.00, &#x223c;0.83 pH units, is greater than the corresponding deviations for ASP and GLU. Also interesting is that apparently only around 80% of HIS residues can even be protonated to HIS<sup>&#x2b;</sup>, likely due to steric constraints disallowing that configuration, but for HIS in left-hand &#x3b1;-helix conformations, 90% can be protonated, presumably due to less structural constraint imposed by that backbone&#x20;motif.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Titration curves of HIS residues by secondary structure. The native pK<sub>a1</sub> for histidine is indicated. Full deprotonation of HIS to HIS<sup>&#x2212;</sup> is shown with data colored in gray and right-hand y-axis.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g006.tif"/>
</fig>
</sec>
<sec id="s3-2-4">
<title>Summary of pH Optimization Results</title>
<p>Although this was a secondary goal, our predictions for residue pK<sub>a</sub>s are reasonable enough (<xref ref-type="sec" rid="s9">Supplementary Table S3</xref>) that the molecular models upon which our 3D maps are constructed are likely to be correct, at least as snapshots of them in the dynamic biological solution. Our algorithm tends to simulate ionization for <italic>highly solvent-exposed</italic> residues in protonated forms (charge neutral for ASP and GLU and cationic for HIS). As noted above, there are no interacting residues and (usually) few or no explicit water molecules in the protein models for such residues to aid in the estimation, and the few interactions that are found prefer uncharged species. Our simulation of &#x201c;bulk&#x201d; solvent is only through the pressure applied by the external pH term in the Henderson-Hasselbalch relation. For high-level pK<sub>a</sub> estimations, clearly more rigorous consideration of solvent molecules and, as <xref ref-type="bibr" rid="B23">Friedman (2011)</xref> showed, ions, may provide more accurate predictions of ionization states. However, on the &#x223c;10<sup>5</sup> case scale of this study, we used our more practical and accessible approach.</p>
<p>Interestingly, the pK<sub>a</sub>s of surface residues are easier to determine experimentally (<xref ref-type="bibr" rid="B19">Fitch et&#x20;al., 2002</xref>), whereas those of more buried residues are easier to calculate, and relatively little experimental data is available. The ionization state-optimized molecular models, which are more important for our purposes, are likely to be quite reasonable except in edge cases. The computationally more problematic highly solvent-exposed residues are fully immersed in water and are thus less participatory in protein structure. We will show below that the edge cases, themselves, are also not a significant issue because it is interactions that are assayed by the maps, and an ASP, GLU or HIS can be a donor and/or an acceptor.</p>
</sec>
</sec>
<sec id="s3-3">
<title>Calculation of Hydropathic Environment Maps</title>
<p>Based on methods in our previous reports (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B3">AL Mughram et&#x20;al., 2021</xref>), we evaluated interatomic interactions using the <italic>HINT</italic> force field and score model (<xref ref-type="bibr" rid="B33">Kellogg et&#x20;al., 1991</xref>; <xref ref-type="bibr" rid="B45">Sarkar and Kellogg, 2010</xref>), which uses two atom-centered parameters <italic>a</italic>
<sub>
<italic>i</italic>
</sub> and <italic>S</italic>
<sub>
<italic>i</italic>
</sub>, the partial log <italic>P</italic>
<sub>
<italic>o/w</italic>
</sub> (for 1-octanol and water solute transfer) and a term related to solvent accessible surface area, respectively, for atom <italic>i</italic>, to score atom-atom interactions (see Materials and Methods). We have reported previously on <italic>HINT</italic>&#x2019;s ability to estimate changes in free energy for ligand-protein, protein-protein and other complexes in various systems (<xref ref-type="bibr" rid="B12">Burnett et&#x20;al., 2000</xref>; <xref ref-type="bibr" rid="B11">Burnett et&#x20;al., 2001</xref>; <xref ref-type="bibr" rid="B14">Cozzini et&#x20;al., 2004</xref>; <xref ref-type="bibr" rid="B15">Da et&#x20;al., 2013</xref>), such that &#x223c;500&#x20;<italic>HINT</italic> score units correspond to a &#x394;&#x394;G of approximately &#x2212;1&#xa0;kcal mol<sup>&#x2212;1</sup>.</p>
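<p>The approximate score-to-energy scale quoted above can be written as a one-line conversion. This is a minimal sketch that assumes a strictly linear relation; the constant and function names are hypothetical, and the 500-unit factor is only the rough correlation stated in the text, not a calibrated parameter.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Approximate scale quoted in the text: ~500 HINT score units per -1 kcal/mol.
HINT_UNITS_PER_KCAL = 500.0


def hint_score_to_ddg(score: float) -> float:
    """Rough free energy change (kcal/mol) for a HINT score difference.

    Positive (favorable) scores map to negative, i.e. stabilizing, values.
    """
    return -score / HINT_UNITS_PER_KCAL


if __name__ == "__main__":
    for score in (500.0, 1250.0, -250.0):
        print(f"{score:8.1f} HINT units -> {hint_score_to_ddg(score):+.2f} kcal/mol")
```
]]></preformat>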
<p>As stated above, one of our primary hypotheses is that there is a limited set of unique 3D hydropathic interaction environments that satisfy the &#x201c;valence&#x201d; of a residue. These valences are based on interaction types, strengths and geometry. For example, as we showed in previous work (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>), the phenol hydroxyl of tyrosine can make favorable polar interactions with an appropriately positioned hydrogen bond donor and/or acceptor, which can be a backbone amide, another polar sidechain, or a water molecule. In contrast, our alanine maps showed fewer unique interactions, as expected for a methyl sidechain with no rotamers, but about four to six specific patterns appeared to be conserved (<xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>). Common to both studies is that we need only focus on the interactions that a residue makes with its environment by class, not on the specific donor-acceptor pair or residue type identities. In other words, the <italic>type</italic> of interaction, its strength and its location are more significant than its participants.</p>
<p>Maps were constructed within rectangular boxes tailored to be large enough to contain each of our three studied residue types with its interacting atoms (<italic>Materials and Methods</italic>). These maps are calculated to quantify the strength of the variety of interactions each residue in our dataset makes with the other atoms in its environment. Our maps categorize interactions in &#x201c;quartets&#x201d; of four separate types: favorable polar, unfavorable polar, favorable hydrophobic and unfavorable hydrophobic. Our previous work on tyrosine (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>) and alanine (<xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>) examined the hydropathic environments as stand-ins for structure. Here, we exploit these maps, which encode extensive information concerning the structural roles of the carboxylates and sidechains of aspartate and glutamate and the dual proton acceptor-donor nature of histidine&#x2019;s imidazole. Our map data further use this information to account for the environments that potentially stabilize any of these residues&#x2019; ionization states, particularly in response to changes in&#x20;pH.</p>
</sec>
<sec id="s3-4">
<title>Evaluating the Fundamental Patterns in the Maps</title>
<p>To extract the information encoded in the 3D hydropathic interaction maps, we first developed a map-map similarity metric (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>) to score two maps <bold>m</bold> and <bold>n</bold> (section <italic>Materials and Methods</italic>). In brief, the overall similarity (<bold>
<italic>D</italic>
</bold>
<sub>
<italic>all</italic>
</sub>) between two like residue maps <bold>m</bold> and <bold>n</bold> is a single scalar derived from a linear combination of four terms, one for each member of the map quartet. These scalars were loaded into square matrices, one for each chess square and parse, for statistical analysis. Next, we clustered these matrices with k-means clustering within the R programming environment. As described in <italic>Materials and Methods</italic>, we set a maximum of 12 clusters per chess square-parse combination; this was sufficient to capture the diversity of residue environments while balancing computational efficiency. <xref ref-type="sec" rid="s9">Supplementary Table S4</xref> sets out the number of clusters found on a chess square-parse basis for the three residue types in this&#x20;study.</p>
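<p>The clustering step can be sketched as follows. This is an illustrative stand-in written in plain Python, not the R procedure used in this work: it assumes that each row of the square similarity matrix serves as the feature vector for one map, it seeds the centroids with a simple farthest-point rule, and it caps the cluster count at the maximum of 12 stated above. The function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
MAX_CLUSTERS = 12  # cap per chess square-parse combination, as stated in the text


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_rows(rows, k, max_iter=100):
    """Lloyd's k-means on the rows of a matrix; returns one label per row."""
    if not rows:
        return []
    k = max(1, min(k, MAX_CLUSTERS, len(rows)))
    # Farthest-point seeding (an assumption; deterministic and easy to follow).
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy 4 x 4 similarity matrix: maps 0 and 1 are alike, as are maps 2 and 3.
    sim = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.9],
        [0.2, 0.1, 0.9, 1.0],
    ]
    print(kmeans_rows(sim, 2))
```
]]></preformat>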
</sec>
<sec id="s3-5">
<title>Hydropathic Interaction Maps</title>
<p>The objective of examining maps is to view 3D representations of the positions and magnitudes of the constellation of interactions made by residues. We expected secondary structural differences to affect the interactions a residue makes with its environment, which we accounted for with the chessboard schema. Additionally, the parse inside each chess square may impact these interactions. For these reasons, we focused the analysis presented here on four particular chess squares, <bold>
<italic>b1</italic>
</bold>, <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold> and <bold><italic>f6</italic></bold>, to survey the environments from each of the three secondary structural regions of the Ramachandran plot, as in previous reports (<xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B3">AL Mughram et&#x20;al., 2021</xref>). We performed complete studies for all three residues at pHs 3, 5, 7, and 9 and, for each residue type, at the pH at which half of all residues of that type were protonated, which we named pH<sub>50</sub> above. However, we constructed visual map contour displays only at each residue&#x2019;s pH<sub>50</sub>, as we believed this pH would be most representative of the diversity of maps in protonated and deprotonated&#x20;cases.</p>
<sec id="s3-5-1">
<title>Aspartic Acid</title>
<p>Aspartic acid, by nature, is an extremely polar residue, owing to its carboxylic acid sidechain. For this reason, we expected to see two things: 1) many maps indicating strong favorable and unfavorable polar interactions localized around the carboxylate end of the sidechain and 2) many clusters of maps with high solvent-accessible surface areas, due to the prevalence of ASP residues on protein exteriors. Indeed, many clusters of ASP within our studied chess squares show intense positive and negative polar interactions surrounding the carboxylate, particularly in clusters with low SASA. Those maps that appear largely devoid of interactions are in clusters with high solvent-accessible surface area, where, as we noted above, there are no residue-protein interactions.</p>
<p>For brevity, we discuss in detail only ASP residues in the <bold>
<italic>b1</italic>
</bold> chess square, but further detail on the <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold> and <bold>
<italic>f6</italic>
</bold> chess square results is in the Supporting Information. Aspartic acid residues in the <bold>
<italic>b1</italic>
</bold> chess square appear to be, comparatively, the least solvent-exposed of the four squares, yielding more robust sidechain interactions; this point is the subject of further discussion in a later section. <xref ref-type="fig" rid="F7">Figures 7</xref>&#x2013;<xref ref-type="fig" rid="F9">9</xref> display the contoured maps for ASP in the 60&#xb0;, 180&#xb0; and 300&#xb0; parses of <bold>
<italic>b1</italic>
</bold>, respectively. The percentage contribution of each cluster to the chess square/parse is listed, along with the average GETAREA (<xref ref-type="bibr" rid="B21">Fraczkiewicz and Braun, 1998</xref>) SASA (<italic>S</italic>) and the fraction of the members of that cluster that are protonated (<italic>f</italic>
<sub>
<italic>prot</italic>
</sub>).</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Hydropathic interaction maps displaying the Gaussian-weighted average sidechain environments of aspartic acid in the &#x3c7;<sub>1</sub> &#x3d; 60&#xb0; parse of the <italic>b1</italic> chess square at pH &#x3d; 3.345. Two map viewpoints are given for each cluster, whose ID is given in bold. The left map in each pair is oriented such that the CA-CB z-axis bond points upward, while the right is oriented to point it out of the page. The x-axis is oriented horizontally in both. The percentage indicates the fraction of the parse represented by that cluster. <italic>S</italic> represents the solvent accessible surface area in &#xc5;<sup>2</sup>, and <italic>f</italic>
<sub>
<italic>prot</italic>
</sub> indicates the fraction of the cluster protonated at pH<sub>50</sub>. Blue contours indicate positive polar interactions made with the sidechain, and red indicates negative polar interactions, while green and purple indicate positive and negative hydrophobic interactions, respectively.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g007.tif"/>
</fig>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Hydropathic interaction maps displaying the Gaussian-weighted average sidechain environments of aspartic acid in the &#x3c7;<sub>1</sub> &#x3d; 180&#xb0; parse of the <italic>b1</italic> chess square at pH &#x3d; 3.345. See caption for <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g008.tif"/>
</fig>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Hydropathic interaction maps displaying the Gaussian-weighted average sidechain environments of aspartic acid in the &#x3c7;<sub>1</sub> &#x3d; 300&#xb0; parse of the <italic>b1</italic> chess square at pH &#x3d; 3.345. See caption for <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g009.tif"/>
</fig>
<p>One significant point is that the displayed contours, as they represent a map, show <italic>interactions</italic>. Thus, cases where an ionized ASP (acting as an H-bond acceptor) interacts with a donor could be indistinguishable from cases where a protonated ASP (acting as a donor) interacts with an acceptor. It is therefore entirely reasonable for some clusters to have a mixture of ionized and protonated ASPs, although most have <italic>f</italic>
<sub>
<italic>prot</italic>
</sub> &#x2264; 0.2 or <italic>f</italic>
<sub>
<italic>prot</italic>
</sub> &#x2265; 0.8. Most interactions shown are of the positive polar type, which is appropriate, given the role we expect ASP to serve. These are the prominent, mostly blue contours near the carboxylic acid/carboxylate oxygens that signify hydrogen bonds between one or both of these atoms and their environment. Additionally, many clusters in buried environments with low SASA (&#x3c;20&#xa0;&#xc5;<sup>2</sup>) were calculated to be largely deprotonated, i.e.,&#x20;ASP in these environments acts as a hydrogen bond acceptor. However, some clusters showed high degrees of protonation at pH<sub>50</sub> &#x3d; 3.345, such as clusters <bold>12</bold>, <bold>118</bold> and <bold>540</bold> in <bold>
<italic>b1.60</italic>
</bold> (<xref ref-type="fig" rid="F7">Figure&#x20;7</xref>) and <bold>84</bold> in <bold>
<italic>b1.300</italic>
</bold> (<xref ref-type="fig" rid="F9">Figure&#x20;9</xref>). Cluster <bold>84</bold>, in particular, showed protonation of 77% of its members with a SASA of 13&#x20;&#xb1; 12&#xa0;&#xc5;<sup>2</sup> at this&#x20;pH.</p>
<p>Contour maps for the <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold> and <bold>
<italic>f6</italic>
</bold> chess squares show largely similar map profiles and are presented in <xref ref-type="sec" rid="s9">Supplementary Figures S1&#x2013;S3</xref> for <bold>
<italic>c5</italic>
</bold> parses .60, .180 and .300, respectively; in <xref ref-type="sec" rid="s9">Supplementary Figures S4&#x2013;S6</xref> for <bold>
<italic>d5</italic>
</bold> parses .60, .180 and .300, respectively; and in <xref ref-type="sec" rid="s9">Supplementary Figures S7&#x2013;S9</xref> for <bold>
<italic>f6</italic>
</bold> parses .60, .180 and .300, respectively. Further numerical data supporting these results and encompassing all chess squares is provided in <xref ref-type="sec" rid="s9">Supplementary Table S5</xref>. In summary, each map appears to be a backbone-specific representation of a unique collection of interactions made by an aspartate/aspartic acid residue. To demonstrate this, we calculated inter-cluster similarities using the previously described algorithms. The average cluster-cluster similarities <italic>within</italic> chess squares are: 0.799 in <bold>
<italic>b1</italic>
</bold>, 0.795 in <bold>
<italic>c5</italic>
</bold>, 0.791 in <bold>
<italic>d5</italic>
</bold>, and 0.802 in <bold>
<italic>f6</italic>
</bold> chess squares. However, a few pairs of cluster maps in the adjacent chess squares <bold>
<italic>c5</italic>
</bold> and <bold>
<italic>d5</italic>
</bold> have similarities of &#x3e;0.900: <bold>637</bold> (<bold>
<italic>c5.60</italic>
</bold>) and <bold>146</bold> (<bold>
<italic>d5.60</italic>
</bold>)<italic>,</italic> <bold>57</bold> (<bold>
<italic>c5.180</italic>
</bold>) and <bold>70</bold> (<bold>
<italic>d5.180</italic>
</bold>), and <bold>217</bold> (<bold>
<italic>c5.300</italic>
</bold>) and <bold>58</bold> (<bold>
<italic>d5.300</italic>
</bold>), indicating that backbone secondary structural elements may encode inherent similarities in the kinds of environments likely to surround a given residue.</p>
</sec>
<sec id="s3-5-2">
<title>Glutamic Acid</title>
<p>Glutamic acid tells a very similar story to that of aspartic acid, so many of the points made for that residue stand here as well. First, the bulk of interactions made with the GLU sidechain are of the positive polar type, followed by negative polar. Again, many clusters were also calculated to have high SASA. Also, we calculated GLU maps with three times as many parses as ASP (<italic>vide supra</italic>), due to the 1-carbon extension to its sidechain, so there are about three times as many clusters. We believe it would be redundant to showcase maps for every average cluster in every subparse. Instead, we have chosen to focus on the <bold>
<italic>b1</italic>
</bold> chess square and show maps of its highest occupied clusters in each parse (<xref ref-type="fig" rid="F10">Figure&#x20;10</xref>). This collection is representative of the 67&#x20;<bold>
<italic>b1</italic>
</bold> clusters, and suggests the diversity of sidechain orientations available in the full map set. One aspect of the GLU maps that we expected to see was an amplified presence of hydrophobic interactions compared to the ASP maps. The maps of these specific clusters do show some indication, albeit slight, of additional hydrophobic interactions localized around the hydrophobic chain, although these interactions appear more likely in the lower population parses. Their lack of visibility in <xref ref-type="fig" rid="F10">Figure&#x20;10</xref> may be due more to the limitations of contouring at consistent values than to anything else, but perhaps the expected hydrophobic interactions with this sidechain are actually rare or depend on backbone conformation. A confounding factor certainly is that GLU is even more solvent exposed than ASP, and this will be explored below. Numerical data for all GLU chess squares is provided in <xref ref-type="sec" rid="s9">Supplementary Table&#x20;S6</xref>.</p>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Hydropathic interaction maps displaying the Gaussian-weighted average sidechain environments of glutamic acid in the highest populated clusters of the nine parses of the <italic>b1</italic> chess square at pH &#x3d; 4.224. Residues are oriented such that the CA-CB z-axis points upward and the x-axis runs to the right. The parses of the &#x3c7;<sub>1</sub> and &#x3c7;<sub>2</sub> angles are indicated along the side of each map. The cluster ID and number of clusters in the parse are given above the map in black and red, respectively. Below each map, in blue, is indicated the fraction of the entire chess square represented by each map, followed in black by the parse&#x2019;s representative fraction of the chess square. Blue contours indicate position and magnitude of positive polar interactions near the sidechain, while red represents negative polar interactions. Green and purple contours indicate positive and negative hydrophobic interactions, respectively.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g010.tif"/>
</fig>
</sec>
<sec id="s3-6-1">
<title>Histidine</title>
<p>Histidine naturally tells a very different story from ASP and GLU. Its imidazole sidechain can play numerous roles in protein structure. Not only does it have more protonation states than the acidic residues we have discussed, but its two nitrogens can act as hydrogen bond donors, acceptors, or both, in any combination. Its ring is partially hydrophobic and aromatic, meaning it can make a variety of polar, nonpolar, and &#x3c0;-&#x3c0; stacking interactions with other residues. These &#x3c0;-&#x3c0; stacking interactions with aromatic residues, for example, may be indicated in maps where the ring is bordered by large, flat, green contours. This versatility is very clearly indicated in our generated maps for HIS. <xref ref-type="fig" rid="F11">Figure&#x20;11</xref> displays the contour maps for the HIS <bold>
<italic>b1.60</italic>
</bold> chess square parse. See <xref ref-type="sec" rid="s9">Supplementary Figures S20&#x2013;S30</xref> for histidine maps in the <bold>
<italic>b1.180</italic>
</bold> and <bold>
<italic>b1.300</italic>
</bold> parses and all parses of the <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold> and <bold>
<italic>f6</italic>
</bold> chess squares. The patterns in these maps are complex, but interpretable in terms of the interaction types. A detailed description of all 12 clustered maps in the .60 parse of the <bold>
<italic>b1</italic>
</bold> chess square is beyond the scope of this section, but it is clear that all maps displayed here (and in <xref ref-type="sec" rid="s9">Supplementary Figures S20&#x2013;S30</xref>) represent unique sets of interaction features, or routes to complete the residue&#x2019;s hydropathic valences. Consider cluster <bold>31</bold> in the <bold>
<italic>b1.60</italic>
</bold> map set (<xref ref-type="fig" rid="F11">Figure&#x20;11</xref>): 93.3% of the histidines in this cluster are protonated, it has mid-range solvent exposure, the CB methylene is making hydrophobic interactions (green) with its environment, and the protonated NE is engaged in a hydrogen bonding interaction (blue) largely perpendicular to the ring. Cluster <bold>235</bold> here is singly protonated at NE, which engages in an on-axis hydrogen bond; it has very low solvent exposure, and its environment is dominated by hydrophobic interactions, both favorable (green) and unfavorable (purple), with the former above the ring and the latter below it. Comprehensive numerical data for all chess squares of histidine is provided in <xref ref-type="sec" rid="s9">Supplementary Table&#x20;S7</xref>.</p>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption>
<p>Hydropathic interaction maps displaying the Gaussian-weighted average sidechain environments of histidine in the &#x3c7;<sub>1</sub> &#x3d; 60&#xb0; parse of the <italic>b1</italic> chess square at pH &#x3d; 5.174. See caption for <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g011.tif"/>
</fig>
</sec>
</sec>
<sec id="s3-7">
<title>Hydropathic Character of Maps With Changes in pH</title>
<p>We were interested to see how changing the environmental pH would affect the maps. In other words, can we rationally &#x201c;tune&#x201d; the residue interactions by this means, and can that be exploited in protein design, e.g., to stabilize or destabilize binding sites, folds or interfaces? As an illustration, consider ASP141A in PDB structure 1WNS, the family B DNA polymerase from the hyperthermophilic archaeon <italic>Pyrococcus kodakaraensis</italic> KOD1 (<xref ref-type="bibr" rid="B26">Hashimoto et&#x20;al., 2001</xref>), which is situated in a highly anionic region with three other acidic residue side chains. This residue is in our cluster <bold>202</bold> of parse <bold>
<italic>b1.180</italic>
</bold> with <italic>f</italic>
<sub>
<italic>prot</italic>
</sub> &#x3d; 0.520 and has a significant free energy difference between protonated and deprotonated states. Our model suggests ASP141A has an elevated pK<sub>a</sub> and, when protonated, forms a hydrogen bond with ASP215A. There are significant visible differences between the calculated maps for this particular residue (<xref ref-type="fig" rid="F12">Figure&#x20;12</xref>): at high pH (9), the interactions surrounding ASP141A (top) are largely unfavorable polar, but at low pH (5) one of the carboxylate oxygens is protonated, yielding a strong favorable hydrogen bond between it and ASP215A. As described earlier, the map contours displayed in this work were calculated at what we are calling pH<sub>50</sub>, which shows the highest diversity of protonated and deprotonated cases. Such maps can be calculated, clustered, etc. at any pH, and indeed making use of different maps at different protonation states will expand the scope of protein structure prediction to real situations where ionization states can vary due to local environments.</p>
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption>
<p>Variations in mapped environments around ASP141A in PDB structure 1WNS. <bold>(A)</bold> Structure model and mapped environment around deprotonated ASP141A with strong unfavorable polar interaction between it and nearby residue ASP215A (pH 9). <bold>(B)</bold> Structure model and mapped environment around protonated ASP141A with new strong, favorable polar interaction with ASP215A (pH 5).</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g012.tif"/>
</fig>
<p>For further insight, we examined the interaction character of ASPs in one parse, <bold>
<italic>b1.300</italic></bold>, to determine whether the relative fractions of our four-type quartet of interactions were altered with changes in pH (<xref ref-type="fig" rid="F13">Figure&#x20;13</xref>). We expected to see small, but noticeable, changes in the clustering of residues, as adjusting the pH altered cluster memberships when protonation became more or less favorable. To facilitate comparisons between the cluster sets at different pH values, the bars are arranged by increasing average solvent-accessible surface area for the cluster (low to high). At pHs of 1, 3.345 (i.e.,&#x20;pH<sub>50</sub>) and 7, some character changes were in fact observed but, interestingly, most of these occurred in low population clusters. We theorize that, as residues clustered differently, residues moving into or out of a group simply had a greater impact on the overall character of smaller clusters. One point of note, however, is that, although most clusters with high SASA had the highest protonation levels (discussed later), only cluster <bold>84</bold> retained any level of protonation at pH 7, in spite of having the lowest SASA. This suggests that this cluster, in particular, describes scenarios where aspartate protonation is energetically required.</p>
<fig id="F13" position="float">
<label>FIGURE 13</label>
<caption>
<p>Character interaction charts for ASP residues in the b1.300 parse at pH 1, 3.345, and 7. The fraction of each interaction type is given on the x-axis, for each cluster ID on the y-axis. The bars are arranged such that, descending, clusters have smaller SASAs. The thickness of the bars indicates residue population contained within that cluster. The black bars indicate <italic>f</italic>
<sub>
<italic>prot</italic>
</sub>, the fraction of the residues in the cluster protonated.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g013.tif"/>
</fig>
<p>We also examined the interaction character of the GLU <bold>
<italic>b1.300.180</italic>
</bold> parse (<xref ref-type="sec" rid="s9">Supplementary Figure S31</xref>), which is probably the parse most like the <bold>
<italic>b1.300</italic>
</bold> parse of ASP. The clusters within this GLU parse generally involved more hydrophobic interactions, both favorable and unfavorable, than those of the ASP <bold>
<italic>b1.300</italic>
</bold> parse. However, these observations are subtle and not easily visualized in the map contours. Nevertheless, overall, the average fractions of favorable and unfavorable hydrophobic interaction contributions, <italic>f</italic>
<sub>
<italic>hydro</italic>(&#x2b;)</sub> and <italic>f</italic>
<sub>
<italic>hydro</italic>(&#x2013;)</sub>, are 0.038 and 0.218, respectively, for GLU, and 0.021 and 0.153 for ASP at their respective pH<sub>50</sub>s. Importantly, the higher propensity for hydrophobic interactions by GLU, due to the additional methylene in the sidechain, is encoded in the interaction maps on a cluster-by-cluster&#x20;basis.</p>
<p>Our ability to generate tunable maps for HIS is slightly more limited. The constrained conformational flexibility of the HIS sidechain and surrounding protein allowed by our approach could clearly be remedied by molecular dynamics or even energy minimization, but the cost, beyond CPU time, would be the loss of the positional certainty afforded by experimental data. That said, our map data for HIS, like those for ASP and GLU, exhaustively capture the many possible HIS interaction environments found in crystallographic structures that are exploitable for protein structure analyses and predictions.</p>
</sec>
<sec id="s3-8">
<title>Solvent-Accessible Surface Areas for the Ionizable Residues</title>
<p>The historical Ramachandran plots showed the relationship between backbone angles and frequency of observation. Our chessboard schema (<xref ref-type="fig" rid="F1">Figure&#x20;1</xref> for ASP, <xref ref-type="fig" rid="F14">Figure&#x20;14</xref> for GLU and HIS) was intended to organize our dataset by backbone structure, and thus facilitate comparisons between like residues. We also see a further population dependence on &#x3c7;<sub>1</sub> (and &#x3c7;<sub>2</sub> for GLU). In fact, further exploration revealed that solvent accessibility for each of our three residues is also seemingly dependent on the residue&#x2019;s backbone and &#x3c7; angles, which suggests a trend between this level of solvent exposure and underlying protein structure. For example, the average SASAs for ASP residues were calculated to be 37, 59, 64, and 64&#xa0;&#xc5;<sup>2</sup> for the <bold>
<italic>b1</italic>
</bold>, <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold>, and <bold>
<italic>f6</italic>
</bold> chess squares, respectively. With a similar trend, the average SASAs for GLU residues were calculated to be 57, 75, 80, and 81&#xa0;&#xc5;<sup>2</sup> for the <bold>
<italic>b1</italic>
</bold>, <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold>, and <bold>
<italic>f6</italic>
</bold> chess squares, respectively. However, although HIS is significantly more hydrophobic than ASP and GLU, and thus more likely to be buried, GETAREA calculations yielded surprisingly large average SASAs for HIS of 41, 59, 62, and 79&#xa0;&#xc5;<sup>2</sup> for the <bold>
<italic>b1</italic>
</bold>, <bold>
<italic>c5</italic>
</bold>, <bold>
<italic>d5</italic>
</bold>, and <bold>
<italic>f6</italic>
</bold> chess squares, respectively.</p>
<fig id="F14" position="float">
<label>FIGURE 14</label>
<caption>
<p>Ramachandran chessboard displaying the chess square/parse population for <bold>(A)</bold> glutamic acid and <bold>(B)</bold> histidine. The (&#x3c7;<sub>1</sub>/&#x3c7;<sub>2</sub>) parse populations for GLU are represented by colored squares with sizes as indicated on the legend. The (&#x3c7;<sub>1</sub>) parse populations for HIS are represented in log<sub>10</sub> scale with colored bars. See also caption for <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>.</p>
</caption>
<graphic xlink:href="fmolb-08-773385-g014.tif"/>
</fig>
<p>To evaluate our data in a more nuanced way, we calculated the &#x201c;fraction outside&#x201d; (<italic>f</italic>
<sub>
<italic>outside</italic>
</sub>) metric based on GETAREA (<xref ref-type="bibr" rid="B21">Fraczkiewicz and Braun, 1998</xref>), as described in Methods. The <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> values for each chess square/parse are also illustrated in <xref ref-type="fig" rid="F1">Figures 1</xref> and <xref ref-type="fig" rid="F14">14</xref> by the colors of the bars (whose lengths represent parse populations) for ASP and HIS or of the squares (whose areas represent parse populations) for GLU. Chess square/parses within the &#x3b2;-pleat region of the Ramachandran plot for aspartate (<xref ref-type="fig" rid="F1">Figure&#x20;1</xref>), as expected, show lower <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> (more buried) relative to the right- and left-hand &#x3b1;-helix, i.e.,&#x20;most parses show averaged <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> in the 0.4&#x2013;0.6 (green) range, whereas in the &#x3b1;-helix region most are in the <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> range 0.6&#x2013;0.8, and the left-hand &#x3b1;-helix is still more exposed, in the <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> range 0.8&#x2013;1.0. The same trends hold for glutamates (<xref ref-type="fig" rid="F14">Figure&#x20;14A</xref>), although the data suggest somewhat larger <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> values. This is likely a result of GLU&#x2019;s inherent additional surface area concomitant with its 1-carbon chain extension. The <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> trends for HIS (<xref ref-type="fig" rid="F14">Figure&#x20;14B</xref>) suggest greater burial: in the &#x3b2;-pleat region of the Ramachandran plot, the parses are evenly split between the 0.2&#x2013;0.4 and 0.4&#x2013;0.6 ranges (yellow and green); histidines in the &#x3b1;-helix region are in the <italic>f</italic><sub><italic>outside</italic></sub> range 0.4&#x2013;0.6, while those in the left-hand &#x3b1;-helix are more exposed, in the range 0.6&#x2013;0.8.</p>
<p>It should be noted that the sidechain solvent-accessible surface areas for these three residues in Gly-X-Gly &#x201c;random coil&#x201d; tripeptides show that histidine has a larger surface area (154.6&#xa0;&#xc5;<sup>2</sup>) than either aspartate (113.0&#xa0;&#xc5;<sup>2</sup>) or glutamate (141.2&#xa0;&#xc5;<sup>2</sup>) (<xref ref-type="bibr" rid="B21">Fraczkiewicz and Braun, 1998</xref>), which is incorporated into the <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> calculations. Thus, while HIS may have, overall, higher solvent exposure in surface area, the actual fraction of solvent-exposed residues is smaller. All three residues show the same trend: larger solvent exposure in the &#x3b1;-helix regions that is more extreme in the left-hand region, and greater burial in the &#x3b2;-pleat region. These conclusions are in qualitative agreement with those of <xref ref-type="bibr" rid="B37">Lins et&#x20;al. (2003)</xref> in their report on differences in solvent-accessible surface area between residues in different secondary structures. However, <italic>f</italic>
<sub>
<italic>outside</italic>
</sub>, just as SASA does, varies from cluster to cluster within each chess square and parse. For example, <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> for ASP <bold>
<italic>b1.300</italic>
</bold> ranges widely, from 0.077 (cluster <bold>84</bold>) to 1.000 (cluster <bold>162</bold>), despite its overall <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> of &#x3c;0.4 suggesting mostly burial for this group of residues.</p>
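<p>The effect of normalizing by residue-specific reference areas can be illustrated with a minimal sketch. It assumes, purely for illustration, that <italic>f</italic><sub><italic>outside</italic></sub> is the sidechain SASA divided by the Gly-X-Gly reference area quoted above, capped at 1.0. The function name <monospace>f_outside</monospace>, the dictionary <monospace>REFERENCE_SASA</monospace> and the exposed area of 70&#xa0;&#xc5;<sup>2</sup> are hypothetical names and values introduced here, not a description of the actual implementation.</p>
<preformat>
```python
# Reference sidechain SASA values (square angstroms) for Gly-X-Gly tripeptides,
# as quoted in the text (Fraczkiewicz and Braun, 1998).
REFERENCE_SASA = {"ASP": 113.0, "GLU": 141.2, "HIS": 154.6}


def f_outside(residue, sidechain_sasa):
    """Return the exposed fraction of a sidechain relative to its reference area.

    ASSUMPTION: the fraction is sidechain SASA / Gly-X-Gly reference SASA,
    clamped to the interval [0, 1]. This normalization is illustrative only.
    """
    reference = REFERENCE_SASA[residue]
    return min(max(sidechain_sasa / reference, 0.0), 1.0)


if __name__ == "__main__":
    exposed_area = 70.0  # hypothetical sidechain SASA, identical for all three
    for residue in ("ASP", "GLU", "HIS"):
        print(residue, round(f_outside(residue, exposed_area), 3))
```
</preformat>
<p>Under this assumed normalization, the same absolute exposed area yields the smallest fraction for HIS and the largest for ASP, which is the sense in which a residue with a larger reference area can appear more buried.</p>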
<p>The SASA and <italic>f</italic>
<sub>
<italic>outside</italic>
</sub> values for all three residues in this study, on a cluster-by-cluster basis, are included in <xref ref-type="sec" rid="s9">Supplementary Tables S5&#x2013;S7</xref>. To summarize, each 3D map cluster represents a unique set of interactions that also encodes solvent exposure and buriedness. We should emphasize that map profiles <italic>appearing</italic> to be similar could manifest with different buriedness and/or protonation, and thus remain unique.</p>
</sec>
<sec id="s3-9">
<title>Summary and Conclusion</title>
<p>We analyzed the interaction environments of more than 105,000 ionizable amino acid residues (aspartic acid, glutamic acid, histidine) in a diverse collection of protein structures. From the above and our previous reports (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>), it is clear that the hydropathic environment surrounding an amino acid residue in a protein can be mapped in terms of its interactions. Significantly, the patterns of interactions within the maps, representing the constellation of contacts and their interaction strengths and characters, cluster into a fairly limited set of unique, backbone-dependent motifs. Each of these motifs can be rendered into an average map quartet and an average prototype residue structure. Thus, we have produced a backbone-dependent library of not only sidechain rotamers, but also 3D residue interaction preferences. A feature in one of these maps, such as a favorable polar interaction, e.g., for an ASP in the <bold>
<italic>b1.300</italic>
</bold> (&#x3b2;-pleat) cluster <bold>100</bold> (<xref ref-type="fig" rid="F9">Figure&#x20;9</xref>), where the carboxylate/carboxylic acid functional group is involved in hydrogen bonding through both oxygens, implies complementary donors/acceptors on neighboring residue(s). Accordingly, those residues&#x2019; maps should contain similar features, and the alignment of these features&#x2013;and all others from a collection of such maps&#x2013;would describe a well-organized hydropathic interaction network.</p>
<p>It is not just the favorable hydrophobic and polar interactions that constitute this network. The maps illustrated by contours here, and previously (<xref ref-type="bibr" rid="B2">Ahmed et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B3">AL Mughram et&#x20;al., 2021</xref>), nearly ubiquitously display unfavorable polar and hydrophobic interactions. These interactions are integral parts of protein structure; for example, even polar residues like the ASP, GLU, and HIS of this report have hydrophobic atoms covalently bonded to the polar functional groups. Thus, a background of unfavorable hydrophobic interactions is usually seen with strong favorable polar interactions. However, other hydrophobic interactions are functional components of structure that Nature uses, e.g., for adding flexibility or isolating water. Developing an understanding of them will help illuminate protein design and drug discovery. Unfavorable polar interactions, on the other hand, provide a route to understanding and predicting residue ionization states. The presence of this type of interaction signals an opportunity for water intervention or an adjustment in local pH, or can be used as a drug design&#x20;cue.</p>
<p>While our predictions of pK<sub>a</sub>s for ASP and GLU are adequate (and seemingly less so for HIS over a much smaller training set), our primary goal was instead to evaluate the hydropathic environments surrounding these residue types. As expected, those environments change drastically with pH. We illustrated environments with 3D maps for an artificial half-way point&#x2013;pH<sub>50</sub>&#x2013;that showed a range of environments, but we have also calculated maps for other pH cases, and the nature of the interactions displayed therein is, although unsurprising, quite informative. Importantly, this means that we can <italic>tune</italic> residue hydropathic environment maps as a function of pH, and that they encode this critical element of structure, interaction and energetics in a rational way. Thus, if we use these maps as part of a scheme for protein structure building and prediction, we have the additional scope to explore ionization states in understanding and defining optimal protein structures.</p>
<p>In our 2019 report (<xref ref-type="bibr" rid="B1">Ahmed et&#x20;al., 2019</xref>), we stated that full understanding of the individual environment maps for alanine would first require completing the analysis for all residue types. This current report is a status update on that task&#x2013;for ASP, GLU and HIS. The remaining residues are in various stages of completion and analysis, and we anticipate additional communications in the near future.</p>
<p>As with alanine, our evaluation of interactions of the ionizable residues with 3D maps backs our interaction homology paradigm&#x2013;for understanding and potentially predicting protein structure. The hydropathic valence for ASP and GLU is largely satisfied by a functional group that complements the carboxylic acid, and some involvement with the CB, CG (and for GLU, the CD) methylenes by a hydrophobic interaction partner, except if the sidechain is fully solvent exposed. HIS is, however, much more complex, involving additional terms such as hydrophobic interactions with aromatic carbons that may be of &#x3c0;-&#x3c0; character and polar interactions that include hydrogen bonding with its ND1 and/or NE2, as either acceptors or donors. As these effects are recorded within the maps, we see that it is the hydropathic &#x201c;field&#x201d; of the atoms surrounding a residue, not specific residue types or atoms, that directs its conformation or other properties, including rotameric and secondary structure. Finally, biological structure is a puzzle consisting of a delicate balance of effects, mostly favorable but others seemingly counterproductive. Assembling structure by homology modeling (<xref ref-type="bibr" rid="B17">Eisenmenger et&#x20;al., 1993</xref>; <xref ref-type="bibr" rid="B35">Laughton, 1994</xref>; <xref ref-type="bibr" rid="B34">Krivov et&#x20;al., 2009</xref>) or even <italic>de novo</italic> structure prediction (<xref ref-type="bibr" rid="B4">Alley et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B47">Senior et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B55">Yang et&#x20;al., 2020</xref>) involves many puzzle pieces and interactions, but some key information involving, e.g., hydrophobic interactions or residue ionizations is not utilized in the usual Newtonian physics-based approaches.</p>
<p>Our ability to map interactions in 3D space, including a rational means to explore the local pH of individual residues in more or less real time, should be advantageous in later studies. Since the maps highlight <italic>interactions</italic>, building structural models that optimize the map-map overlaps of interactions arising from adjacent or through-space residue map pairs (or larger sets) could yield a very useful and unique target function for protein structure prediction, likely quite amenable to machine learning optimization.</p>
</sec>
</sec>
</body>
<back>
<sec id="s4">
<title>Data Availability Statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/<xref ref-type="sec" rid="s9">Supplementary Material</xref>.</p>
</sec>
<sec id="s5">
<title>Author Contributions</title>
<p>NH and GK contributed to all aspects of this study, including performing computational experiments, data analysis, preparation of figures and writing the manuscript.</p>
</sec>
<sec id="s6">
<title>Funding</title>
<p>Preliminary studies for this research were partially funded by 1910 Genetics, Cambridge, Massachusetts, and were performed by Erik W. Kellogg, Sean G. Kellogg and Olivia Xu of 1910 Genetics.</p>
</sec>
<sec sec-type="COI-statement" id="s7">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>We acknowledge the motivation for continuing this project given to us by numerous proposal and manuscript reviewers who have critiqued our past work. Further, J.&#x20;Neel Scarsdale provided advice and insight into protein structure. We are also grateful to Jen Nwankwo of 1910 Genetics for her enthusiasm.</p>
</ack>
<sec id="s9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fmolb.2021.773385/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fmolb.2021.773385/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="Table1.xls" id="SM1" mimetype="application/xls" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table4.pdf" id="SM2" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table2.pdf" id="SM3" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table5.xls" id="SM4" mimetype="application/xls" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table3.xls" id="SM5" mimetype="application/xls" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table6.xls" id="SM6" mimetype="application/xls" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="DataSheet1.pdf" id="SM7" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table7.xls" id="SM8" mimetype="application/xls" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ahmed</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Catalano</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Portillo</surname>
<given-names>S. C.</given-names>
</name>
<name>
<surname>Safo</surname>
<given-names>M. K.</given-names>
</name>
<name>
<surname>Neel Scarsdale</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>3D Interaction Homology: The Hydropathic Interaction Environments of Even Alanine Are Diverse and Provide Novel Structural Insight</article-title>. <source>J.&#x20;Struct. Biol.</source> <volume>207</volume>, <fpage>183</fpage>&#x2013;<lpage>198</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsb.2019.05.007</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ahmed</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Koparde</surname>
<given-names>V. N.</given-names>
</name>
<name>
<surname>Safo</surname>
<given-names>M. K.</given-names>
</name>
<name>
<surname>Neel Scarsdale</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>3D Interaction Homology: The Structurally Known Rotamers of Tyrosine Derive from a Surprisingly Limited Set of Information-Rich Hydropathic Interaction Environments Described by Maps</article-title>. <source>Proteins</source> <volume>83</volume>, <fpage>1118</fpage>&#x2013;<lpage>1136</lpage>. <pub-id pub-id-type="doi">10.1002/prot.24813</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>AL Mughram</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Catalano</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bowry</surname>
<given-names>J.&#x20;P.</given-names>
</name>
<name>
<surname>Safo</surname>
<given-names>M. K.</given-names>
</name>
<name>
<surname>Scarsdale</surname>
<given-names>J.&#x20;N.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>3D Interaction Homology: Hydropathic Analyses of the "&#x3c0;-Cation" and "&#x3c0;-&#x3c0;" Interaction Motifs in Phenylalanine, Tyrosine, and Tryptophan Residues</article-title>. <source>J.&#x20;Chem. Inf. Model.</source> <volume>61</volume>, <fpage>2937</fpage>&#x2013;<lpage>2956</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.1c00235</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alley</surname>
<given-names>E. C.</given-names>
</name>
<name>
<surname>Khimulya</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Biswas</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>AlQuraishi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>G. M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Unified Rational Protein Engineering with Sequence-Based Deep Representation Learning</article-title>. <source>Nat. Methods</source> <volume>16</volume>, <fpage>1315</fpage>&#x2013;<lpage>1322</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-019-0598-1</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Antonino</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Ascenzi</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1981</year>). <article-title>The Mechanism of Trypsin Catalysis at Low pH. Proposal for a Structural Model</article-title>. <source>J.&#x20;Biol. Chem.</source> <volume>256</volume>, <fpage>12449</fpage>&#x2013;<lpage>12455</lpage>. </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ascone</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Casta&#xf1;er</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Tarricone</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bolognesi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Stroppolo</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Desideri</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Evidence of His61 Imidazolate Bridge Rupture in Reduced Crystalline Cu,Zn Superoxide Dismutase</article-title>. <source>Biochem. Biophysical Res. Commun.</source> <volume>241</volume>, <fpage>119</fpage>&#x2013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.1006/bbrc.1997.7777</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bandyopadhyay</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bhatnagar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Jain</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pratyaksh</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Selective Stabilization of Aspartic Acid Protonation State within a Given Protein Conformation Occurs via Specific &#x201c;Molecular Association&#x201d;</article-title>. <source>J.&#x20;Phys. Chem. B</source> <volume>124</volume>, <fpage>5350</fpage>&#x2013;<lpage>5361</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jpcb.0c02629</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barrett</surname>
<given-names>P. J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Cho</surname>
<given-names>M.-K.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.-H.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Mathew</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>The Quiet Renaissance of Protein Nuclear Magnetic Resonance</article-title>. <source>Biochemistry</source> <volume>52</volume>, <fpage>1303</fpage>&#x2013;<lpage>1320</lpage>. <pub-id pub-id-type="doi">10.1021/bi4000436</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bartik</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Redfield</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Dobson</surname>
<given-names>C. M.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Measurement of the Individual pKa Values of Acidic Residues of Hen and Turkey Lysozymes by Two-Dimensional 1H NMR</article-title>. <source>Biophysical J.</source> <volume>66</volume>, <fpage>1180</fpage>&#x2013;<lpage>1184</lpage>. <pub-id pub-id-type="doi">10.1016/S0006-3495(94)80900-2</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brooks</surname>
<given-names>B. R.</given-names>
</name>
<name>
<surname>Bruccoleri</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Olafson</surname>
<given-names>B. D.</given-names>
</name>
<name>
<surname>States</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Swaminathan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Karplus</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>1983</year>). <article-title>CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations</article-title>. <source>J.&#x20;Comput. Chem.</source> <volume>4</volume>, <fpage>187</fpage>&#x2013;<lpage>217</lpage>. <pub-id pub-id-type="doi">10.1002/jcc.540040211</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burnett</surname>
<given-names>J.&#x20;C.</given-names>
</name>
<name>
<surname>Botti</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Abraham</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Computationally Accessible Method for Estimating Free Energy Changes Resulting from Site-specific Mutations of Biomolecules: Systematic Model Building and Structural/hydropathic Analysis of Deoxy and Oxy Hemoglobins</article-title>. <source>Proteins</source> <volume>42</volume>, <fpage>355</fpage>&#x2013;<lpage>377</lpage>. <pub-id pub-id-type="doi">10.1002/1097-0134(20010215)42:3&#x3c;355:aid-prot60&#x3e;3.0.co;2-f</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burnett</surname>
<given-names>J.&#x20;C.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
<name>
<surname>Abraham</surname>
<given-names>D. J.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Computational Methodology for Estimating Changes in Free Energies of Biomolecular Association upon Mutation. The Importance of Bound Water in Dimer&#x2212;Tetramer Assembly for &#x3b2;37 Mutant Hemoglobins</article-title>. <source>Biochemistry</source> <volume>39</volume>, <fpage>1622</fpage>&#x2013;<lpage>1633</lpage>. <pub-id pub-id-type="doi">10.1021/bi991724u</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Catalano</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>AL Mughram</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>3D Interaction Homology: Hydropathic Interaction Environments of Serine and Cysteine Are Strikingly Different and Their Roles Adapt in Membrane Proteins</article-title>. <source>Curr. Res. Struct. Biol.</source> <volume>3</volume>, <fpage>239</fpage>&#x2013;<lpage>256</lpage>. <pub-id pub-id-type="doi">10.1016/j.crstbi.2021.09.002</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cozzini</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Fornabaio</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Marabotti</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Abraham</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
<name>
<surname>Mozzarelli</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Free Energy of Ligand Binding to Protein: Evaluation of the Contribution of Water Molecules by Computational Methods</article-title>. <source>Curr. Med. Chem.</source> <volume>11</volume>, <fpage>3093</fpage>&#x2013;<lpage>3118</lpage>. <pub-id pub-id-type="doi">10.2174/0929867043363929</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Da</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Mooberry</surname>
<given-names>S. L.</given-names>
</name>
<name>
<surname>Gupton</surname>
<given-names>J.&#x20;T.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>How to Deal with Low-Resolution Target Structures: Using SAR, Ensemble Docking, Hydropathic Analysis, and 3D-QSAR to Definitively Map the &#x3b1;&#x3b2;-Tubulin Colchicine Site</article-title>. <source>J.&#x20;Med. Chem.</source> <volume>56</volume>, <fpage>7382</fpage>&#x2013;<lpage>7395</lpage>. <pub-id pub-id-type="doi">10.1021/jm400954h</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Di Russo</surname>
<given-names>N. V.</given-names>
</name>
<name>
<surname>Estrin</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Mart&#xed;</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Roitberg</surname>
<given-names>A. E.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>pH-Dependent Conformational Changes in Proteins and Their Effect on Experimental pKas: The Case of Nitrophorin 4</article-title>. <source>PLoS Comput. Biol.</source> <volume>8</volume>, <fpage>e1002761</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1002761</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eisenmenger</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Argos</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Abagyan</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>1993</year>). <article-title>A Method to Configure Protein Side-Chains from the Main-Chain Trace in Homology Modelling</article-title>. <source>J.&#x20;Mol. Biol.</source> <volume>231</volume>, <fpage>849</fpage>&#x2013;<lpage>860</lpage>. <pub-id pub-id-type="doi">10.1006/jmbi.1993.1331</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eugene Kellogg</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Abraham</surname>
<given-names>D. J.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Hydrophobicity: Is LogPo/w More Than the Sum of its Parts?</article-title> <source>Eur. J.&#x20;Med. Chem.</source> <volume>35</volume>, <fpage>651</fpage>&#x2013;<lpage>661</lpage>. <pub-id pub-id-type="doi">10.1016/s0223-5234(00)00167-7</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fitch</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Karp</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K. K.</given-names>
</name>
<name>
<surname>Stites</surname>
<given-names>W. E.</given-names>
</name>
<name>
<surname>Lattman</surname>
<given-names>E. E.</given-names>
</name>
<name>
<surname>Garc&#xed;a-Moreno</surname>
<given-names>E. B.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Experimental pKa Values of Buried Residues: Analysis with Continuum Methods and Role of Water Penetration</article-title>. <source>Biophysical J.</source> <volume>82</volume>, <fpage>3289</fpage>&#x2013;<lpage>3304</lpage>. <pub-id pub-id-type="doi">10.1016/s0006-3495(02)75670-1</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fornabaio</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cozzini</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mozzarelli</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Abraham</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Simple, Intuitive Calculations of Free Energy of Binding for Protein&#x2212;Ligand Complexes. 2. Computational Titration and pH Effects in Molecular Models of Neuraminidase&#x2212;Inhibitor Complexes</article-title>. <source>J.&#x20;Med. Chem.</source> <volume>46</volume>, <fpage>4487</fpage>&#x2013;<lpage>4500</lpage>. <pub-id pub-id-type="doi">10.1021/jm0302593</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fraczkiewicz</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Braun</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Exact and Efficient Analytical Calculation of the Accessible Surface Areas and Their Gradients for Macromolecules</article-title>. <source>J.&#x20;Comput. Chem.</source> <volume>19</volume>, <fpage>319</fpage>&#x2013;<lpage>333</lpage>. <pub-id pub-id-type="doi">10.1002/(sici)1096-987x(199802)19:3&#x3c;319:aid-jcc6&#x3e;3.0.co;2-w</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frericks Schmidt</surname>
<given-names>H. L.</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>G. J.</given-names>
</name>
<name>
<surname>Sperling</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Rienstra</surname>
<given-names>C. M.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>NMR Determination of Protein pKa Values in the Solid State</article-title>. <source>J.&#x20;Phys. Chem. Lett.</source> <volume>1</volume>, <fpage>1623</fpage>&#x2013;<lpage>1628</lpage>. <pub-id pub-id-type="doi">10.1021/jz1004413</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedman</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Ions and the Protein Surface Revisited: Extensive Molecular Dynamics Simulations and Analysis of Protein Structures in Alkali-Chloride Solutions</article-title>. <source>J.&#x20;Phys. Chem. B</source> <volume>115</volume>, <fpage>9213</fpage>&#x2013;<lpage>9223</lpage>. <pub-id pub-id-type="doi">10.1021/jp112155m</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>George</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hanania</surname>
<given-names>G. I. H.</given-names>
</name>
<name>
<surname>Irvine</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Abu-Issa</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>1964</year>). <article-title>1090. The Effect of Co-ordination on Ionization. Part IV. Imidazole and its Ferrimyoglobin Complex</article-title>. <source>J.&#x20;Chem. Soc.</source>, <fpage>5689</fpage>&#x2013;<lpage>5694</lpage>. <pub-id pub-id-type="doi">10.1039/JR9640005689</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harms</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Casta&#xf1;eda</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Schlessman</surname>
<given-names>J.&#x20;L.</given-names>
</name>
<name>
<surname>Sue</surname>
<given-names>G. R.</given-names>
</name>
<name>
<surname>Isom</surname>
<given-names>D. G.</given-names>
</name>
<name>
<surname>Cannon</surname>
<given-names>B. R.</given-names>
</name>
<etal/>
</person-group> (<year>2009</year>). <article-title>The pKa Values of Acidic and Basic Residues Buried at the Same Internal Location in a Protein Are Governed by Different Factors</article-title>. <source>J.&#x20;Mol. Biol.</source> <volume>389</volume>, <fpage>34</fpage>&#x2013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmb.2009.03.039</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hashimoto</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Nishioka</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Fujiwara</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Takagi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Imanaka</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Inoue</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2001</year>). <article-title>Crystal Structure of DNA Polymerase from Hyperthermophilic Archaeon Pyrococcus Kodakaraensis KOD1</article-title>. <source>J.&#x20;Mol. Biol.</source> <volume>306</volume>, <fpage>469</fpage>&#x2013;<lpage>477</lpage>. <pub-id pub-id-type="doi">10.1006/jmbi.2000.4403</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>R.-B.</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>Q.-S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.-H.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>S.-M.</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>K.-C.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>A Fast and Accurate Method for Predicting pKa of Residues in Proteins</article-title>. <source>Protein Eng. Des. Selection</source> <volume>23</volume>, <fpage>35</fpage>&#x2013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1093/protein/gzp067</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hunt</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2021</year>). <source>Table of pK<sub>a</sub> and pI Values</source>. <publisher-name>University of Calgary</publisher-name>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.chem.ucalgary.ca/courses/351/Carey5th/Ch27/ch27-1-4-2.html">https://www.chem.ucalgary.ca/courses/351/Carey5th/Ch27/ch27-1-4-2.html</ext-link> (accessed March 31, 2021)</comment>. </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Isom</surname>
<given-names>D. G.</given-names>
</name>
<name>
<surname>Cannon</surname>
<given-names>B. R.</given-names>
</name>
<name>
<surname>Casta&#xf1;eda</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Robinson</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Garc&#xed;a-Moreno E.</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>High Tolerance for Ionizable Residues in the Hydrophobic Interior of Proteins</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>105</volume>, <fpage>17784</fpage>&#x2013;<lpage>17788</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0805113105</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Isom</surname>
<given-names>D. G.</given-names>
</name>
<name>
<surname>Casta&#xf1;eda</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Cannon</surname>
<given-names>B. R.</given-names>
</name>
<name>
<surname>Garc&#xed;a-Moreno E.</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Large Shifts in pKa Values of Lysine Residues Buried inside a Protein</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>108</volume>, <fpage>5260</fpage>&#x2013;<lpage>5265</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1010750108</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kasserra</surname>
<given-names>H. P.</given-names>
</name>
<name>
<surname>Laidler</surname>
<given-names>K. J.</given-names>
</name>
</person-group> (<year>1969</year>). <article-title>pH Effects in Trypsin Catalysis</article-title>. <source>Can. J.&#x20;Chem.</source> <volume>47</volume>, <fpage>4021</fpage>&#x2013;<lpage>4029</lpage>. <pub-id pub-id-type="doi">10.1139/v69-668</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
<name>
<surname>Fornabaio</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Spyrakis</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lodola</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Cozzini</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mozzarelli</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2004</year>). <article-title>Getting it Right: Modeling of pH, Solvent and "nearly" Everything Else in Virtual Screening of Biological Targets</article-title>. <source>J.&#x20;Mol. Graphics Model.</source> <volume>22</volume>, <fpage>479</fpage>&#x2013;<lpage>486</lpage>. <pub-id pub-id-type="doi">10.1016/j.jmgm.2004.03.008</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
<name>
<surname>Semus</surname>
<given-names>S. F.</given-names>
</name>
<name>
<surname>Abraham</surname>
<given-names>D. J.</given-names>
</name>
</person-group> (<year>1991</year>). <article-title>HINT: A New Method of Empirical Hydrophobic Field Calculation for CoMFA</article-title>. <source>J.&#x20;Computer-aided Mol. Des.</source> <volume>5</volume>, <fpage>545</fpage>&#x2013;<lpage>552</lpage>. <pub-id pub-id-type="doi">10.1007/BF00135313</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krivov</surname>
<given-names>G. G.</given-names>
</name>
<name>
<surname>Shapovalov</surname>
<given-names>M. V.</given-names>
</name>
<name>
<surname>Dunbrack</surname>
<given-names>R. L.</given-names>
<suffix>Jr.</suffix>
</name>
</person-group> (<year>2009</year>). <article-title>Improved Prediction of Protein Side-Chain Conformations with SCWRL4</article-title>. <source>Proteins</source> <volume>77</volume>, <fpage>778</fpage>&#x2013;<lpage>795</lpage>. <pub-id pub-id-type="doi">10.1002/prot.22488</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Laughton</surname>
<given-names>C. A.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>Prediction of Protein Side-Chain Conformations from Local Three-Dimensional Homology Relationships</article-title>. <source>J.&#x20;Mol. Biol.</source> <volume>235</volume>, <fpage>1088</fpage>&#x2013;<lpage>1097</lpage>. <pub-id pub-id-type="doi">10.1006/jmbi.1994.1059</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>A. D.</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>J.&#x20;H.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Very Fast Empirical Prediction and Rationalization of Protein pKa Values</article-title>. <source>Proteins</source> <volume>61</volume>, <fpage>704</fpage>&#x2013;<lpage>721</lpage>. <pub-id pub-id-type="doi">10.1002/prot.20660</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lins</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Brasseur</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Analysis of Accessible Surface of Residues in Proteins</article-title>. <source>Protein Sci.</source> <volume>12</volume>, <fpage>1406</fpage>&#x2013;<lpage>1417</lpage>. <pub-id pub-id-type="doi">10.1110/ps.0304803</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miao</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Quantifying Side-Chain Conformational Variations in Protein Structure</article-title>. <source>Sci. Rep.</source> <volume>6</volume>, <fpage>37024</fpage>. <pub-id pub-id-type="doi">10.1038/srep37024</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>O&#x27;Dell</surname>
<given-names>W. B.</given-names>
</name>
<name>
<surname>Bodenheimer</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Meilleur</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Neutron Protein Crystallography: A Complementary Tool for Locating Hydrogens in Proteins</article-title>. <source>Arch. Biochem. Biophys.</source> <volume>602</volume>, <fpage>48</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1016/j.abb.2015.11.033</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Otto</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Marti</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Holz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mogi</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lindau</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Khorana</surname>
<given-names>H. G.</given-names>
</name>
<etal/>
</person-group> (<year>1989</year>). <article-title>Aspartic Acid-96 Is the Internal Proton Donor in the Reprotonation of the Schiff Base of Bacteriorhodopsin</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>86</volume>, <fpage>9228</fpage>&#x2013;<lpage>9232</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.86.23.9228</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pahari</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Alexov</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>PKAD: a Database of Experimentally Measured pKa Values of Ionizable Groups in Proteins</article-title>. <source>Database (Oxford)</source> <volume>2019</volume>, <fpage>baz024</fpage>. <pub-id pub-id-type="doi">10.1093/database/baz024</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Pedretti</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vistoli</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>PropKa</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.ddl.unimi.it/vegaol/propka.htm">https://www.ddl.unimi.it/vegaol/propka.htm</ext-link> (accessed April 3, 2021)</comment>. </citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<collab>R Core Team</collab> (<year>2013</year>). <source>R: A Language and Environment for Statistical Computing</source>. <publisher-name>R Foundation for Statistical Computing</publisher-name>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.R-project.org/">http://www.R-project.org/</ext-link></comment>. </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ramachandran</surname>
<given-names>G. N.</given-names>
</name>
<name>
<surname>Ramakrishnan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sasisekharan</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>1963</year>). <article-title>Stereochemistry of Polypeptide Chain Configurations</article-title>. <source>J.&#x20;Mol. Biol.</source> <volume>7</volume>, <fpage>95</fpage>&#x2013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1016/s0022-2836(63)80023-6</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sarkar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Hydrophobicity - Shake Flasks, Protein Folding and Drug Discovery</article-title>. <source>Curr. Top. Med. Chem.</source> <volume>10</volume>, <fpage>67</fpage>&#x2013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.2174/156802610790232233</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schr&#xf6;der</surname>
<given-names>G. C.</given-names>
</name>
<name>
<surname>Meilleur</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Neutron Crystallography Data Collection and Processing for Modelling Hydrogen Atoms in Protein Structures</article-title>. <source>JoVE</source> <volume>166</volume>, <fpage>e61903</fpage>. <pub-id pub-id-type="doi">10.3791/61903</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Senior</surname>
<given-names>A. W.</given-names>
</name>
<name>
<surname>Evans</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Jumper</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kirkpatrick</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sifre</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Improved Protein Structure Prediction Using Potentials from Deep Learning</article-title>. <source>Nature</source> <volume>577</volume>, <fpage>706</fpage>&#x2013;<lpage>710</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1923-7</pub-id> </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shapovalov</surname>
<given-names>M. V.</given-names>
</name>
<name>
<surname>Dunbrack</surname>
<given-names>R. L.</given-names>
<suffix>Jr.</suffix>
</name>
</person-group> (<year>2011</year>). <article-title>A Smoothed Backbone-dependent Rotamer Library for Proteins Derived from Adaptive Kernel Density Estimates and Regressions</article-title>. <source>Structure</source> <volume>19</volume>, <fpage>844</fpage>&#x2013;<lpage>858</lpage>. <pub-id pub-id-type="doi">10.1016/j.str.2011.03.019</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spassov</surname>
<given-names>V. Z.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>A Fast and Accurate Computational Approach to Protein Ionization</article-title>. <source>Protein Sci.</source> <volume>17</volume>, <fpage>1955</fpage>&#x2013;<lpage>1970</lpage>. <pub-id pub-id-type="doi">10.1110/ps.036335.108</pub-id> </citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spyrakis</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Fornabaio</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cozzini</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mozzarelli</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Abraham</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Kellogg</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Computational Titration Analysis of a Multiprotic HIV-1 Protease&#x2013;Ligand Complex</article-title>. <source>J.&#x20;Am. Chem. Soc.</source> <volume>126</volume>, <fpage>11764</fpage>&#x2013;<lpage>11765</lpage>. <pub-id pub-id-type="doi">10.1021/ja0465754</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Talley</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Alexov</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>On the pH-Optimum of Activity and Stability of Proteins</article-title>. <source>Proteins</source> <volume>78</volume>. <pub-id pub-id-type="doi">10.1002/prot.22786</pub-id> </citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Alexov</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>DelPhiPKa Web Server: Predicting pKa of Proteins, RNAs and DNAs</article-title>. <source>Bioinformatics</source> <volume>32</volume>, <fpage>614</fpage>&#x2013;<lpage>615</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv607</pub-id> </citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woi&#x144;ska</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Grabowsky</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Dominiak</surname>
<given-names>P. M.</given-names>
</name>
<name>
<surname>Wo&#x17a;niak</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Jayatilaka</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Hydrogen Atoms Can Be Located Accurately and Precisely by X-ray Crystallography</article-title>. <source>Sci. Adv.</source> <volume>2</volume>, <fpage>e1600192</fpage>. <pub-id pub-id-type="doi">10.1126/sciadv.1600192</pub-id> </citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Anishchenko</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Ovchinnikov</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Improved Protein Structure Prediction Using Predicted Interresidue Orientations</article-title>. <source>Proc. Natl. Acad. Sci. USA</source> <volume>117</volume>, <fpage>1496</fpage>&#x2013;<lpage>1503</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1914677117</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>