<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2023.1130594</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Discrimination of psychrophilic enzymes using machine learning algorithms with amino acid composition descriptor</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Huang</surname><given-names>Ailan</given-names></name><xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Lu</surname><given-names>Fuping</given-names></name><xref rid="aff1" ref-type="aff"><sup>1</sup></xref><xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Liu</surname><given-names>Fufeng</given-names></name><xref rid="aff1" ref-type="aff"><sup>1</sup></xref><xref rid="aff2" ref-type="aff"><sup>2</sup></xref><xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/677887/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>College of Biotechnology, Tianjin University of Science &#x0026; Technology</institution>, <addr-line>Tianjin</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology</institution>, <addr-line>Tianjin</addr-line>, <country>China</country></aff>
<author-notes>
<fn id="fn0001" fn-type="edited-by">
<p>Edited by: Bowen Li, Newcastle University, United Kingdom</p>
</fn>
<fn id="fn0002" fn-type="edited-by">
<p>Reviewed by: Rahul Kaushik, RIKEN Yokohama, Japan; Yao Nie, Jiangnan University, China</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Fufeng Liu, &#x02709; <email>fufengliu@tju.edu.cn</email>; <email>fufengliu@tust.edu.cn</email></corresp>
<fn id="fn0003" fn-type="other">
<p>This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>02</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1130594</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>12</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>01</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Huang, Lu and Liu.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Huang, Lu and Liu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<sec>
<title>Introduction</title>
<p>Psychrophilic enzymes are a class of macromolecules with high catalytic activity at low temperatures. Because they are eco-friendly and cost-effective, cold-active enzymes have great potential for application in the detergent, textile, environmental remediation, pharmaceutical, and food industries. Compared with time-consuming and labor-intensive experiments, computational modeling, especially machine learning (ML), offers a high-throughput screening tool for identifying psychrophilic enzymes efficiently.</p>
</sec>
<sec>
<title>Methods</title>
<p>In this study, the influence of four ML methods (support vector machines, K-nearest neighbor, random forest, and na&#x00EF;ve Bayes) and three descriptors, i.e., amino acid composition (AAC), dipeptide composition (DPC), and AAC&#x2009;+&#x2009;DPC, on model performance was systematically analyzed.</p>
</sec>
<sec>
<title>Results and discussion</title>
<p>Among the four ML methods, the support vector machine model based on the AAC descriptor using 5-fold cross-validation achieved the best prediction accuracy, 80.6%. The AAC descriptor outperformed the DPC and AAC&#x2009;+&#x2009;DPC descriptors regardless of the ML method used. In addition, a comparison of amino acid frequencies between psychrophilic and non-psychrophilic proteins revealed that higher frequencies of Ala, Gly, Ser, and Thr, and lower frequencies of Glu, Lys, Arg, Ile, Val, and Leu, could be related to protein psychrophilicity. Further, ternary models were developed that could effectively classify psychrophilic, mesophilic, and thermophilic proteins. The predictive accuracy of the ternary classification model using the AAC descriptor <italic>via</italic> the support vector machine algorithm was 75.8%. These findings enhance our insight into the cold-adaptation mechanisms of psychrophilic proteins and could aid in the design of engineered cold-active enzymes. Moreover, the proposed model could be used as a screening tool to identify novel cold-adapted proteins.</p>
</sec>
</abstract>
<kwd-group>
<kwd>psychrophilic enzyme</kwd>
<kwd>machine learning</kwd>
<kwd>support vector machine</kwd>
<kwd>amino acid composition</kwd>
<kwd>structural flexibility</kwd>
</kwd-group>
<contract-num rid="cn1">32272269</contract-num>
<contract-sponsor id="cn1">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<counts>
<fig-count count="3"/>
<table-count count="4"/>
<equation-count count="4"/>
<ref-count count="65"/>
<page-count count="8"/>
<word-count count="6846"/>
</counts>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<title>Introduction</title>
<p>Psychrophilic enzymes, also called cold-adapted enzymes, maintain catalytic efficiency and function at low temperatures (0&#x2013;25&#x00B0;C; <xref ref-type="bibr" rid="ref48">Siddiqui and Cavicchioli, 2006</xref>; <xref ref-type="bibr" rid="ref45">Sarmiento et al., 2015</xref>). These enzymes are mainly isolated from organisms inhabiting glaciers, polar regions, and the deep sea. Because they combine high catalytic activity at low and moderate temperatures with heat lability, psychrophilic enzymes can be used in various industries, such as the detergent, food, medical, and bioremediation sectors (<xref ref-type="bibr" rid="ref42">Saeki et al., 2007</xref>; <xref ref-type="bibr" rid="ref5">Al-Ghanayem and Joseph, 2020</xref>; <xref ref-type="bibr" rid="ref25">Gupta et al., 2020</xref>; <xref ref-type="bibr" rid="ref34">Mangiagalli et al., 2020</xref>; <xref ref-type="bibr" rid="ref31">Kumar et al., 2021</xref>; <xref ref-type="bibr" rid="ref37">Mhetras et al., 2021</xref>), and thus offer considerable economic benefits. For example, adding cold-adapted proteases, lipases, and cellulases to detergents allows dirt to be removed efficiently at low temperatures, which is eco-friendly and cost-effective because no extensive heating is required. Cold-active lipase additives can prevent spoilage and adverse changes in substrates used in food processing. The application of cold-adapted lipases in the synthesis of chiral organic compounds has also been reviewed (<xref ref-type="bibr" rid="ref37">Mhetras et al., 2021</xref>). Beyond their industrial value, psychrophilic enzymes are also valuable models for basic research on protein folding and catalysis (<xref ref-type="bibr" rid="ref20">Feller and Gerday, 2003</xref>; <xref ref-type="bibr" rid="ref48">Siddiqui and Cavicchioli, 2006</xref>; <xref ref-type="bibr" rid="ref7">&#x00C5;qvist et al., 2017</xref>).</p>
<p>According to the Arrhenius equation <inline-formula>
<mml:math id="M1">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>A</mml:mi>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>a</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, where A is the pre-exponential factor, E<sub>a</sub> is the activation energy, R is the gas constant, and T is the absolute temperature, the rate constant K decreases exponentially as the temperature falls (<xref ref-type="bibr" rid="ref49">Struvay and Feller, 2012</xref>; <xref ref-type="bibr" rid="ref7">&#x00C5;qvist et al., 2017</xref>). The central challenge for psychrophilic enzymes is therefore to maintain catalytic rates at low temperatures. The first psychrophilic protein structure to be resolved was the alpha-amylase from <italic>Alteromonas haloplanctis</italic> (<xref ref-type="bibr" rid="ref1">Aghajari et al., 1996</xref>). The growing number of resolved 3D structures of psychrophilic enzymes has shed light on the molecular basis of cold adaptation (<xref ref-type="bibr" rid="ref9">Arnorsdottir et al., 2005</xref>). Comparison with mesophilic and thermophilic homologs shows that psychrophilic enzymes have evolved structural features that help maintain catalytic activity at low temperatures, such as more flexible structures, decreased core hydrophobicity, increased surface hydrophobicity, fewer disulfide bonds (<xref ref-type="bibr" rid="ref46">Schr&#x00F8;der Leiros et al., 2000</xref>), and fewer hydrogen bonds (<xref ref-type="bibr" rid="ref46">Schr&#x00F8;der Leiros et al., 2000</xref>; <xref ref-type="bibr" rid="ref2">Aghajari et al., 2003</xref>; <xref ref-type="bibr" rid="ref48">Siddiqui and Cavicchioli, 2006</xref>; <xref ref-type="bibr" rid="ref6">Almog et al., 2009</xref>). Comparative structural analyses have shown that enzymes from different families adopt one or a combination of these features to adapt to low temperatures (<xref ref-type="bibr" rid="ref49">Struvay and Feller, 2012</xref>; <xref ref-type="bibr" rid="ref53">Tribelli and L&#x00F3;pez, 2018</xref>).</p>
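<p>As a rough numerical illustration of the Arrhenius relationship, the following Python sketch computes the ratio of rate constants at 0&#x00B0;C and 37&#x00B0;C. In this ratio the pre-exponential factor A cancels. The activation energy of 50 kJ/mol is a hypothetical value chosen only for illustration and does not come from this study. The function names are likewise illustrative.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def arrhenius_k(A, Ea, T):
    """Rate constant K = A * exp(-Ea / (R * T)), with Ea in J/mol and T in kelvin."""
    return A * math.exp(-Ea / (R * T))


def rate_ratio(Ea, T_low, T_high):
    """K(T_low) / K(T_high); the pre-exponential factor A cancels out."""
    return math.exp(-Ea / R * (1.0 / T_low - 1.0 / T_high))


# Hypothetical activation energy of 50 kJ/mol (illustrative, not from this study)
ratio = rate_ratio(50_000, 273.15, 310.15)  # 0 degC versus 37 degC
print(f"K(0 degC) / K(37 degC) = {ratio:.3f}")
```
</preformat>
<p>With this assumed activation energy, the rate constant at 0&#x00B0;C is only about 7% of that at 37&#x00B0;C. This illustrates why maintaining activity in the cold is demanding.</p>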
<p>Unlike wet-lab experiments, which are time-consuming and costly, <italic>in silico</italic> methods are reliable and powerful tools. Machine learning (ML) is a data-driven technology that has been applied in various fields, such as protein structure prediction (<xref ref-type="bibr" rid="ref47">Senior et al., 2020</xref>; <xref ref-type="bibr" rid="ref29">Jumper et al., 2021</xref>), protein engineering (<xref ref-type="bibr" rid="ref43">Saito et al., 2018</xref>; <xref ref-type="bibr" rid="ref54">Wang et al., 2018</xref>; <xref ref-type="bibr" rid="ref35">Mazurenko et al., 2019</xref>; <xref ref-type="bibr" rid="ref58">Wu et al., 2019</xref>; <xref ref-type="bibr" rid="ref59">Yang et al., 2019</xref>), protein function prediction (<xref ref-type="bibr" rid="ref26">Han et al., 2006</xref>; <xref ref-type="bibr" rid="ref12">Bonetta and Valentino, 2020</xref>; <xref ref-type="bibr" rid="ref63">Zhang Y. H. et al., 2021</xref>), enzyme substrate scope prediction (<xref ref-type="bibr" rid="ref38">Mou et al., 2021</xref>), and the screening of novel pharmaceutical candidates (<xref ref-type="bibr" rid="ref14">Chandak et al., 2020</xref>) and efficient catalysts (<xref ref-type="bibr" rid="ref41">Niu et al., 2021</xref>). Computational methods have been used to classify acidic and alkaline enzymes effectively based on protein sequences (<xref ref-type="bibr" rid="ref62">Zhang et al., 2009</xref>; <xref ref-type="bibr" rid="ref30">Khan et al., 2015</xref>). 
Similarly, predictive models have also been developed to discriminate thermophilic proteins from mesophilic proteins (<xref ref-type="bibr" rid="ref23">Gromiha and Suresh, 2008</xref>; <xref ref-type="bibr" rid="ref32">Lin and Chen, 2011</xref>; <xref ref-type="bibr" rid="ref4">Ai et al., 2018</xref>; <xref ref-type="bibr" rid="ref21">Feng et al., 2020</xref>; <xref ref-type="bibr" rid="ref24">Guo et al., 2020</xref>; <xref ref-type="bibr" rid="ref55">Wang et al., 2020</xref>; <xref ref-type="bibr" rid="ref3">Ahmed et al., 2022</xref>). These models, built on various sequence-based descriptors, achieved reliable predictive performance. Many comparative analyses have shown that amino acid usage differs between mesophilic and thermophilic proteins. They have also shown that the amino acid composition (AAC) descriptor can discriminate mesophilic from thermophilic proteins using the support vector machine (SVM), <italic>K</italic>-nearest neighbor (KNN), random forest (RF), and na&#x00EF;ve Bayes (Bayes) algorithms. In addition, other sequence descriptors, such as dipeptide composition (DPC), have also been used to build predictive models.</p>
<p>Owing to the essential role of psychrophilic enzymes in industrial applications and scientific research, considerable effort has been devoted to investigating cold-adapted enzymes. A previous study showed that a random forest model using the AAC descriptor and hydrophobic residue patterns as input features could discriminate psychrophilic from mesophilic proteins with an accuracy of 70.3% (<xref ref-type="bibr" rid="ref39">Nath et al., 2012</xref>). To improve interpretability, a cascade model was also proposed that used the percentages of different amino acid composition ranges as input features. In that model, the most discriminative attributes were the compositions of serine, lysine, glutamic acid, and alanine, and rotation forest reached the highest accuracy, 70.5% (<xref ref-type="bibr" rid="ref40">Nath and Subbiah, 2014</xref>). Although these models achieved good accuracy, several issues remain to be addressed. First, the influence of different features on predictive accuracy should be investigated. The AAC descriptor alone has proved very useful for discriminating psychrophilic from mesophilic proteins, but the DPC descriptor has not been explored. Second, the feasibility of a ternary (psychrophilic&#x2013;mesophilic&#x2013;thermophilic) classification model is also worth exploring.</p>
<p>To address these issues, the iLearnPlus software was used to develop computational models, with feature extraction, feature selection, model construction, and result visualization all performed within this software (<xref ref-type="bibr" rid="ref17">Chen et al., 2021</xref>). Given the demonstrated ability of the AAC descriptor to distinguish psychrophilic from mesophilic proteins, the AAC descriptor was used in this study. The DPC descriptor was also tested, and the abilities of AAC, DPC, and AAC&#x2009;+&#x2009;DPC to distinguish psychrophilic from non-psychrophilic proteins were compared. The results indicated that the binary and ternary classification models could be used to discriminate psychrophilic from mesophilic and thermophilic enzymes. In addition, the accuracies of different models were compared, and the amino acid frequency distributions of psychrophilic and non-psychrophilic proteins were explored.</p>
</sec>
<sec id="sec2" sec-type="materials|methods">
<title>Materials and methods</title>
<sec id="sec3">
<title>Datasets preparation</title>
<p>The thermophilic and mesophilic proteins were obtained from a previous study (<xref ref-type="bibr" rid="ref32">Lin and Chen, 2011</xref>). The psychrophilic proteins were retrieved from UniProt using the search keywords &#x201C;psychrophilic,&#x201D; &#x201C;cold-adaptive,&#x201D; and &#x201C;low-temperature.&#x201D; The retrieved entries were then filtered in three steps. First, only reviewed and manually annotated sequences were retained. Second, entries representing parts of other proteins were excluded. Finally, to avoid redundancy and homology bias, the CD-HIT program (<xref ref-type="bibr" rid="ref27">Huang et al., 2010</xref>) was applied with a 40% sequence identity cutoff. The final dataset contained 2,400 protein sequences: 915 thermophilic, 793 mesophilic, and 692 psychrophilic. The dataset was split into training and test sets at a 4:1 ratio, giving 731, 574, and 554 thermophilic, mesophilic, and psychrophilic proteins in the training set and 184, 219, and 138 in the test set. The sequences of the datasets can be downloaded from the supporting material.</p>
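<p>The text does not state how the random 4:1 split was drawn. The following plain-Python sketch shows one common approach: a per-class (stratified) random split. The function name <monospace>stratified_split</monospace> and the fixed seed are assumptions for illustration. With the class sizes above, the sketch yields 1,920 training and 480 test sequences, so its per-class counts differ from those reported for this dataset.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Hypothetical helper: split indices so that each class is divided at about 4:1."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)  # random order within each class
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes of the combined dataset described above
labels = ["thermo"] * 915 + ["meso"] * 793 + ["psychro"] * 692
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```
</preformat>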
</sec>
<sec id="sec4">
<title>Feature extraction</title>
<p>Protein feature descriptors are generated from protein sequences. Feature descriptor extraction and model construction were implemented using iLearnPlus, a machine-learning platform for protein sequence analysis and prediction. The AAC and DPC descriptors have been reported to discriminate thermophilic from mesophilic proteins effectively (<xref ref-type="bibr" rid="ref61">Zhang and Fang, 2006b</xref>; <xref ref-type="bibr" rid="ref23">Gromiha and Suresh, 2008</xref>; <xref ref-type="bibr" rid="ref32">Lin and Chen, 2011</xref>; <xref ref-type="bibr" rid="ref4">Ai et al., 2018</xref>; <xref ref-type="bibr" rid="ref24">Guo et al., 2020</xref>; <xref ref-type="bibr" rid="ref55">Wang et al., 2020</xref>). Therefore, both descriptors were calculated for each protein sequence.</p>
<p>AAC is the frequency of occurrence of each amino acid in a protein sequence (<xref ref-type="bibr" rid="ref56">Wang et al., 2011</xref>; <xref ref-type="bibr" rid="ref30">Khan et al., 2015</xref>; <xref ref-type="bibr" rid="ref24">Guo et al., 2020</xref>; <xref ref-type="bibr" rid="ref50">Sun et al., 2020</xref>; <xref ref-type="bibr" rid="ref15">Charoenkwan et al., 2021</xref>). Because there are 20 standard amino acids (ACDEFGHIKLMNPQRSTVWY), AAC yields a 20-dimensional feature vector. The frequency of residue type i is calculated by the following formula, where n(i) is the number of occurrences of residue type i and n is the sequence length:</p>
<disp-formula id="E1">
<mml:math id="M2">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:mfrac>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>F</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mi>W</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
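<p>The following is a minimal Python sketch of the AAC formula, for illustration only. iLearnPlus computes this descriptor internally, and the function name <monospace>aac</monospace> is hypothetical. In this sketch, non-standard letters such as X count toward the length n but receive no feature of their own.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in AAC order


def aac(sequence):
    """Return the 20-dimensional AAC vector: Comp(i) = n(i) / n."""
    seq = sequence.upper()
    n = len(seq)  # sequence length, including any non-standard letters
    counts = Counter(seq)
    return [counts[aa] / n for aa in AMINO_ACIDS]


vec = aac("MKTAYIAKQRQISFVKSHFSRQ")  # toy sequence, for illustration only
print(dict(zip(AMINO_ACIDS, (round(v, 3) for v in vec))))
```
</preformat>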
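<p>DPC calculates the dipeptide composition, i.e., the frequency of each of the 20&#x2009;&#x00D7;&#x2009;20&#x2009;=&#x2009;400 possible pairs of adjacent residues, and generates a 400-dimensional feature vector (<xref ref-type="bibr" rid="ref15">Charoenkwan et al., 2021</xref>). It is defined as follows, where n(i, j) is the number of occurrences of dipeptide ij and n&#x2009;&#x2212;&#x2009;1 is the total number of dipeptides in a sequence of length n:</p>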
<disp-formula id="E2">
<mml:math id="M3">
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi mathvariant="normal">,</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi mathvariant="normal">,</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>F</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mi>W</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>Y</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
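<p>The DPC formula can be sketched in Python in the same way. The function name <monospace>dpc</monospace> is hypothetical, and the fixed ordering of the 400 dipeptides (AA, AC, &#x2026;, YY) is an assumption made for illustration.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs in a fixed order: AA, AC, ..., YW, YY
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dpc(sequence):
    """Return the 400-dimensional DPC vector: Comp(i, j) = n(i, j) / (n - 1)."""
    seq = sequence.upper()
    # Count overlapping adjacent pairs: positions (1, 2), (2, 3), ..., (n - 1, n)
    pair_counts = Counter(a + b for a, b in zip(seq, seq[1:]))
    n_pairs = max(len(seq) - 1, 1)  # guard against division by zero for n = 0 or 1
    # Pairs containing non-standard letters count toward n - 1 but get no feature
    return [pair_counts[dp] / n_pairs for dp in DIPEPTIDES]


features = dpc("MKTAYIAKQRQISFVKSHFSRQ")  # toy sequence, for illustration only
nonzero = {dp: round(f, 3) for dp, f in zip(DIPEPTIDES, features) if f}
print(len(features), nonzero)
```
</preformat>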
<p>The CHI2 algorithm (<xref ref-type="bibr" rid="ref16">Chen et al., 2009</xref>) was used for DPC feature selection and dimensionality reduction.</p>
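<p>The exact CHI2 variant used in iLearnPlus is not described here. As an assumption, the following sketch scores each non-negative feature with the chi-square statistic in the form used by scikit-learn's <monospace>chi2</monospace> scoring function. This form compares the observed per-class sums of each feature with the sums expected from the class frequencies. The sketch then keeps the k highest-scoring features. The helper names are hypothetical.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
def chi2_scores(X, y):
    """Chi-square score of each non-negative feature (columns of X) against labels y."""
    n = len(y)
    n_features = len(X[0])
    feature_totals = [sum(row[f] for row in X) for f in range(n_features)]
    scores = [0.0] * n_features
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        class_prob = len(rows) / n
        for f in range(n_features):
            observed = sum(row[f] for row in rows)
            expected = class_prob * feature_totals[f]
            if expected:  # a feature that is zero in every sample contributes nothing
                scores[f] += (observed - expected) ** 2 / expected
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return ranked[:k]


# Toy data: feature 0 separates the classes, feature 1 is constant and uninformative
X = [[3, 1], [2, 1], [0, 1], [1, 1]]
y = ["psychro", "psychro", "non", "non"]
scores = chi2_scores(X, y)
print(scores, select_top_k(scores, 1))
```
</preformat>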
</sec>
<sec id="sec5">
<title>Model construction</title>
<p>Several machine learning algorithms were tested to distinguish psychrophilic from non-psychrophilic proteins. Given the reliable performance of SVM, RF, KNN, and Bayes in classifying thermophilic and mesophilic proteins, these algorithms were used in this study (<xref ref-type="bibr" rid="ref18">Cortes and Vapnik, 1995</xref>; <xref ref-type="bibr" rid="ref13">Breiman, 2001</xref>). RF is an ensemble of decision trees. By building multiple trees and aggregating their outputs, it generally obtains more accurate results than a single decision tree. For a new sample, RF assigns the class label by voting over the predictions of the individual trees. The number of trees (n_trees) was set to 300. SVM is a simple but powerful supervised machine learning algorithm used for classification and regression. It seeks the hyperplane that separates samples of different classes with the maximum margin. When samples are not linearly separable in the original low-dimensional space, a kernel function is used to map them into a higher-dimensional space in which they become linearly separable. The radial basis function (RBF) kernel was selected for the SVM, and the optimized &#x03B3; and C were 8.0 and 15.0, respectively. KNN is also one of the most basic supervised machine learning algorithms. It assumes that similar samples lie near each other. Here, the Euclidean distance between samples was calculated, and each sample was classified according to its K nearest neighbors, with K set to 3. Na&#x00EF;ve Bayes is a set of supervised machine learning algorithms based on Bayes&#x2019; theorem. It makes the &#x201C;na&#x00EF;ve&#x201D; assumption that every pair of features is conditionally independent given the value of the class variable. Bayes&#x2019; theorem states the following relationship:</p>
<disp-formula id="E3">
<mml:math id="M4">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi mathvariant="normal">|</mml:mi>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mi mathvariant="normal">|</mml:mi>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where A and B are events and P(B)&#x2009;&#x2260;&#x2009;0. The hyperparameters of all four machine learning algorithms were optimized by grid search.</p>
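<p>To make the KNN rule and the RBF kernel concrete, the following plain-Python sketch implements two pieces. The first is a 3-nearest-neighbor majority vote using Euclidean distance. The second is the RBF kernel K(x, z)&#x2009;=&#x2009;exp(&#x2212;&#x03B3;&#x2016;x&#x2009;&#x2212;&#x2009;z&#x2016;&#x00B2;) with &#x03B3;&#x2009;=&#x2009;8.0. This is an illustrative re-implementation under these assumptions, not the iLearnPlus code. The function names and toy vectors are hypothetical.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - z) ** 2 for x, z in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` the majority label among its k nearest training samples."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def rbf_kernel(a, b, gamma=8.0):
    """RBF kernel exp(-gamma * ||a - b||^2), as used inside an RBF-SVM."""
    return math.exp(-gamma * euclidean(a, b) ** 2)


# Toy 2-D feature vectors (hypothetical, not real AAC values)
train_X = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85], [0.90, 0.10], [0.80, 0.20]]
train_y = ["psychro", "psychro", "psychro", "non", "non"]
print(knn_predict(train_X, train_y, [0.20, 0.90]))  # three nearest are all "psychro"
print(knn_predict(train_X, train_y, [0.85, 0.15]))  # two of three nearest are "non"
print(rbf_kernel([0.0, 0.0], [0.5, 0.0]))           # exp(-8 * 0.25) = exp(-2)
```
</preformat>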
</sec>
<sec id="sec6">
<title>Performance evaluation</title>
<p>The dataset was randomly divided into a training set and a test set at a ratio of 4:1. Five-fold cross-validation was also used. The data were randomly divided into five subsets; one subset was used to test the model, and the remaining four were used to train the model and optimize its parameters. This process was repeated five times so that each subset was used exactly once as the test set. Four indicators were adopted to evaluate model performance: sensitivity (Sn), specificity (Sp), accuracy (ACC), and Matthews correlation coefficient (MCC). They were calculated as follows:<table-wrap position="anchor" id="tab1">
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td align="left" valign="top">
<inline-formula>
<mml:math id="M5">
<mml:mrow>
<mml:mi mathvariant="normal">Sensitivity</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">TP</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">TP</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">FN</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" valign="top">
<inline-formula>
<mml:math id="M6">
<mml:mrow>
<mml:mi mathvariant="normal">Specificity</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">TN</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">TN</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">FP</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" valign="top">
<inline-formula>
<mml:math id="M7">
<mml:mrow>
<mml:mi mathvariant="normal">Accuracy</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">TP</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">TN</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">TP</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">TN</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">FP</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">FN</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left" valign="top">
<inline-formula>
<mml:math id="M8">
<mml:mrow>
<mml:mi mathvariant="normal">MCC</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">TP</mml:mi>
<mml:mo>&#x00D7;</mml:mo>
<mml:mi mathvariant="normal">TN</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">FP</mml:mi>
<mml:mo>&#x00D7;</mml:mo>
<mml:mi mathvariant="normal">FN</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">TP</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">FP</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">TP</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">FN</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">TN</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">FP</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">TN</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">FN</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>where TP, TN, FP, and FN are the numbers of true positives (positive samples correctly predicted as positive), true negatives (negative samples correctly predicted as negative), false positives (negative samples incorrectly predicted as positive), and false negatives (positive samples incorrectly predicted as negative), respectively. For a multi-class task, the ACC for the i-th class was calculated as follows:</p>
<disp-formula id="E4">
<mml:math id="M9">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where TP(i) is the number of samples correctly predicted as the i-th class and TN(i) is the number of samples correctly predicted as not belonging to the i-th class. FP(i) is the number of samples not in the i-th class that are wrongly predicted as the i-th class, and FN(i) is the number of samples in the i-th class that are wrongly predicted as not belonging to it. Additionally, receiver operating characteristic (ROC) curves were used to visualize the predictive performance of the classifiers.</p>
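<p>The evaluation formulas translate directly into code. The following sketch computes Sn, Sp, ACC, and MCC from binary confusion-matrix counts. It also computes the one-vs-rest ACC of class i from a multi-class confusion matrix. The function names and all counts are hypothetical and are not the results reported in Table 1. Setting MCC to 0 when its denominator vanishes is a common convention assumed here.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Sn, Sp, ACC, and MCC from binary confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


def class_acc(confusion, i):
    """One-vs-rest ACC for class i; confusion[r][c] = samples of true class r predicted as c."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[i][i]
    fn = sum(confusion[i]) - tp                 # class i predicted as another class
    fp = sum(row[i] for row in confusion) - tp  # other classes predicted as class i
    tn = total - tp - fn - fp                   # neither true nor predicted class i
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, for illustration only
print(binary_metrics(tp=80, tn=170, fp=30, fn=20))
confusion = [[50, 5, 5], [10, 40, 10], [5, 5, 70]]  # rows/cols: psychro, meso, thermo
print([round(class_acc(confusion, i), 3) for i in range(3)])
```
</preformat>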
</sec>
</sec>
<sec id="sec7">
<title>Results and discussion</title>
<sec id="sec8">
<title>Performance of models for discriminating psychrophilic and non-psychrophilic proteins</title>
<p>The predictive performance of the machine learning models based on AAC, DPC, and the combination of the two descriptors is listed in <xref rid="tab2" ref-type="table">Table 1</xref>. Among the models using the AAC descriptor, the SVM model achieved the highest prediction accuracy, 80.6%. The accuracy of RF (80.2%) was 0.4 percentage points lower than that of SVM. The other two models, KNN and Bayes, had accuracies below 80%, and the Bayes model had the lowest prediction accuracy, 73.8%. All trained models are publicly available on GitHub<xref rid="fn0004" ref-type="fn"><sup>1</sup></xref>.</p>
<table-wrap position="float" id="tab2">
<label>Table 1</label>
<caption>
<p>Prediction results of the AAC, DPC, and AAC&#x2009;+&#x2009;DPC descriptors for psychrophilic and non-psychrophilic proteins.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Descriptor</th>
<th align="left" valign="top">Model</th>
<th align="center" valign="top">Sn</th>
<th align="center" valign="top">Sp</th>
<th align="center" valign="top">Acc</th>
<th align="center" valign="top">MCC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="4">AAC</td>
<td align="left" valign="top">RF</td>
<td align="center" valign="top">0.524</td>
<td align="center" valign="top">0.919</td>
<td align="center" valign="top">0.802</td>
<td align="center" valign="top">0.497</td>
</tr>
<tr>
<td align="left" valign="top">SVM</td>
<td align="center" valign="top">0.780</td>
<td align="center" valign="top">0.859</td>
<td align="center" valign="top">0.806</td>
<td align="center" valign="top">0.546</td>
</tr>
<tr>
<td align="left" valign="top">Bayes</td>
<td align="center" valign="top">0.711</td>
<td align="center" valign="top">0.749</td>
<td align="center" valign="top">0.738</td>
<td align="center" valign="top">0.439</td>
</tr>
<tr>
<td align="left" valign="top">KNN</td>
<td align="center" valign="top">0.667</td>
<td align="center" valign="top">0.808</td>
<td align="center" valign="top">0.766</td>
<td align="center" valign="top">0.470</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="4">DPC</td>
<td align="left" valign="top">RF</td>
<td align="center" valign="top">0.266</td>
<td align="center" valign="top">0.940</td>
<td align="center" valign="top">0.740</td>
<td align="center" valign="top">0.300</td>
</tr>
<tr>
<td align="left" valign="top">SVM</td>
<td align="center" valign="top">0.548</td>
<td align="center" valign="top">0.874</td>
<td align="center" valign="top">0.747</td>
<td align="center" valign="top">0.370</td>
</tr>
<tr>
<td align="left" valign="top">Bayes</td>
<td align="center" valign="top">0.654</td>
<td align="center" valign="top">0.696</td>
<td align="center" valign="top">0.684</td>
<td align="center" valign="top">0.348</td>
</tr>
<tr>
<td align="left" valign="top">KNN</td>
<td align="center" valign="top">0.461</td>
<td align="center" valign="top">0.823</td>
<td align="center" valign="top">0.716</td>
<td align="center" valign="top">0.304</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="4">AAC&#x2009;+&#x2009;DPC</td>
<td align="left" valign="top">RF</td>
<td align="center" valign="top">0.529</td>
<td align="center" valign="top">0.943</td>
<td align="center" valign="top">0.790</td>
<td align="center" valign="top">0.497</td>
</tr>
<tr>
<td align="left" valign="top">SVM</td>
<td align="center" valign="top">0.785</td>
<td align="center" valign="top">0.850</td>
<td align="center" valign="top">0.801</td>
<td align="center" valign="top">0.546</td>
</tr>
<tr>
<td align="left" valign="top">Bayes</td>
<td align="center" valign="top">0.743</td>
<td align="center" valign="top">0.702</td>
<td align="center" valign="top">0.714</td>
<td align="center" valign="top">0.439</td>
</tr>
<tr>
<td align="left" valign="top">KNN</td>
<td align="center" valign="top">0.629</td>
<td align="center" valign="top">0.817</td>
<td align="center" valign="top">0.761</td>
<td align="center" valign="top">0.470</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Sn: sensitivity, Sp: specificity, Acc: Accuracy, MCC: Matthews correlation coefficient.</p>
</table-wrap-foot>
</table-wrap>
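<p>The four metrics in <xref rid="tab2" ref-type="table">Table 1</xref> follow directly from confusion-matrix counts. The sketch below treats psychrophilic proteins as the positive class and uses the standard definitions of Sn, Sp, Acc, and MCC. The counts are hypothetical, and the function name <monospace>binary_metrics</monospace> is illustrative rather than part of iLearnPlus.</p>

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, Acc and MCC from confusion-matrix counts.

    Positive class = psychrophilic, negative class = non-psychrophilic.
    """
    sn = tp / (tp + fn)                    # sensitivity: recall of positives
    sp = tn / (tn + fp)                    # specificity: recall of negatives
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # guard a zero margin
    return sn, sp, acc, mcc


# Hypothetical counts for illustration only, not results from this study.
sn, sp, acc, mcc = binary_metrics(tp=78, fn=22, tn=258, fp=42)
print(f"Sn={sn:.3f} Sp={sp:.3f} Acc={acc:.3f} MCC={mcc:.3f}")
```

<p>Unlike Acc, MCC uses all four cells of the confusion matrix, which is why it is usually reported alongside Acc when the two classes differ in size.</p>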
<p>The DPC descriptor generates a 400-dimensional vector (the frequencies of all 20&#x2009;&#x00D7;&#x2009;20 dipeptides), so the chi-square (CHI2) algorithm was used for feature selection to reduce its dimensionality. Compared with the AAC descriptor, the prediction accuracy of the models based on the DPC descriptor declined by about 5&#x2013;7 percentage points. As with the AAC descriptor, the SVM algorithm achieved the best accuracy among the models using the DPC descriptor.</p>
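<p>A minimal sketch of the DPC encoding and a chi-square filter follows. It assumes the standard 20-letter alphabet and that dipeptide frequencies are normalised by the number of adjacent residue pairs. The helper names (<monospace>dpc</monospace>, <monospace>chi2_scores</monospace>, <monospace>select_k_best</monospace>) are hypothetical. The score mirrors the chi-square statistic computed by scikit-learn's <monospace>chi2</monospace>, not necessarily the exact iLearnPlus implementation, and the number of retained features used in this study is not restated here.</p>

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dpc(seq):
    """Dipeptide composition: frequency of each of the 400 adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    for pair in pairs:
        if pair in counts:  # pairs with non-standard residues are not counted
            counts[pair] += 1
    total = max(len(pairs), 1)
    return [counts[d] / total for d in DIPEPTIDES]


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels."""
    classes = sorted(set(y))
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        feature_total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = feature_total * y.count(c) / n  # share expected under independence
            if expected > 0:  # skip empty features instead of dividing by zero
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Indices of the k features with the highest chi-square scores."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy sequences and labels (1 = psychrophilic, 0 = non-psychrophilic).
seqs = ["MSTGAGSAT", "MKKLEERLK", "MGSATSGAG", "MEKRLLEKE"]
labels = [1, 0, 1, 0]
X = [dpc(s) for s in seqs]
top = select_k_best(X, labels, k=5)
print([DIPEPTIDES[j] for j in top])
```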
<p>In addition, the two descriptors were combined to construct a classification model. Compared with the DPC descriptor alone, the AAC&#x2009;+&#x2009;DPC descriptor improved accuracy to varying degrees. Compared with the AAC descriptor alone, however, the accuracies of the SVM and KNN models were almost unchanged, and those of the RF and Bayes models dropped by 1.2 and 2.4 percentage points, respectively. Thus, the models built with AAC achieved the best accuracy for all four machine learning algorithms in this study. DPC is nonetheless an informative feature for distinguishing psychrophilic from non-psychrophilic proteins and also achieved relatively good prediction accuracy. However, adding DPC to the descriptor may introduce redundant features, which could explain the slight decrease in accuracy. Similarly, in a study using AAC and DPC to distinguish thermophilic from mesophilic proteins, AAC and DPC achieved prediction accuracies of 0.9256 and 0.9157, respectively, and combining the two also lowered accuracy, even though DPC contained more parameters (<xref ref-type="bibr" rid="ref32">Lin and Chen, 2011</xref>).</p>
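<p>The AAC&#x2009;+&#x2009;DPC descriptor can be formed by simple feature-level concatenation (20&#x2009;+&#x2009;400&#x2009;=&#x2009;420 values). The sketch below assumes that fusion scheme, since the exact iLearnPlus procedure is not restated here, and uses hypothetical helper names. The <monospace>dpc_marginal</monospace> helper illustrates one possible source of the redundancy suggested above. Summing the DPC frequencies over the second residue approximately recovers the AAC of a long sequence, so much of the AAC information is already implicit in DPC.</p>

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: frequency of each of the 20 standard residues."""
    residues = [r for r in seq if r in AMINO_ACIDS]
    total = max(len(residues), 1)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition over the 400 ordered residue pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = max(len(pairs), 1)
    return [pairs.count(d) / total for d in DIPEPTIDES]


def aac_dpc(seq):
    """Feature-level fusion: concatenate AAC (20) and DPC (400) into 420 values."""
    return aac(seq) + dpc(seq)


def dpc_marginal(dpc_vec):
    """Sum DPC over the second residue; close to AAC for long sequences."""
    return [sum(dpc_vec[20 * i:20 * i + 20]) for i in range(20)]


features = aac_dpc("MSTGAGSATKE")
print(len(features))  # 420
```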
<p>The ROC curves of the four models using the AAC and DPC descriptors (<xref rid="fig1" ref-type="fig">Figure 1</xref>) also show that the AAC descriptor outperformed the DPC descriptor. In a comparison of amino acid frequencies between thermophilic and non-thermophilic proteins, it has been proposed that AAC captures protein thermostability (<xref ref-type="bibr" rid="ref50">Sun et al., 2020</xref>). By analogy with thermostability, the results of this study indicate that psychrophilicity is also closely related to the AAC descriptor.</p>
<fig position="float" id="fig1">
<label>Figure 1</label>
<caption>
<p>Receiver operating characteristic (ROC) curves of the four machine learning models using the AAC and DPC descriptors. <bold>(A)</bold> Random forest (RF); <bold>(B)</bold> SVM; <bold>(C)</bold> Bayes; <bold>(D)</bold> KNN.</p>
</caption>
<graphic xlink:href="fmicb-14-1130594-g001.tif"/>
</fig>
<p>Besides showing the better predictive performance of the AAC descriptor, <xref rid="fig1" ref-type="fig">Figure 1</xref> shows that the SVM model achieved the best predictive performance among the four models (AUC 0.8609). Many studies have shown that SVM models based on the AAC descriptor perform well in discriminating thermophilic from mesophilic enzymes. For example, an SVM model using AAC descriptors distinguished thermophiles from mesophiles with an accuracy of 89% (<xref ref-type="bibr" rid="ref23">Gromiha and Suresh, 2008</xref>), and a model employing AAC descriptors with only 16 dimensions discriminated thermophilic from non-thermophilic proteins with 93% accuracy (<xref ref-type="bibr" rid="ref24">Guo et al., 2020</xref>).</p>
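<p>ROC curves such as those in <xref rid="fig1" ref-type="fig">Figure 1</xref> are traced by sweeping a decision threshold over the classifier's scores, and the AUC is the area beneath the resulting curve. The sketch below uses hypothetical scores and labels (1&#x2009;=&#x2009;psychrophilic) and hypothetical function names. It is not the plotting routine used in this study.</p>

```python
def roc_points(scores, labels):
    """ROC curve as (FPR, TPR) points, lowering the threshold one score at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # emit a point only when the next score differs, so tied scores move together
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(f"AUC = {auc_trapezoid(points):.4f}")  # AUC = 0.8889
```

<p>For this toy input the AUC is 8/9, which equals the fraction of positive-negative pairs in which the positive example receives the higher score. This equivalence is why AUC is often read as a ranking measure.</p>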
</sec>
<sec id="sec9">
<title>Performance of ternary classification for discriminating psychrophilic, mesophilic, and thermophilic proteins</title>
<p>To verify the feasibility of ternary classification, a scatter diagram of the three types of enzymes was generated using K-means clustering, with psychrophilic, mesophilic, and thermophilic proteins labeled 1, 0, and 2, respectively. As shown in <xref rid="fig2" ref-type="fig">Figure 2</xref>, the three types of proteins had different distribution patterns along principal components 1 and 2, which suggests that multi-class classification is feasible. A ternary classification model was therefore established. The prediction accuracies for psychrophilic (P), mesophilic (M), and thermophilic (T) proteins are listed in <xref rid="tab3" ref-type="table">Table 2</xref> and <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S1</xref>.</p>
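<p>The text does not give the clustering parameters behind <xref rid="fig2" ref-type="fig">Figure 2</xref>. The sketch below therefore only illustrates Lloyd's K-means algorithm with k&#x2009;=&#x2009;3 on hypothetical 2-D coordinates, standing in for the first two principal components. Seeding the centroids with the first k points is an assumption made for reproducibility. Library implementations normally use randomised or k-means++ seeding.</p>

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: seed with the first k points
    assignment = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid by Euclidean distance
        new_assignment = [
            min(range(k), key=lambda c, p=p: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


# Hypothetical (PC1, PC2) coordinates forming three well-separated groups.
coords = [(0, 0), (5, 5), (10, 0), (0.5, 0.2), (5.3, 4.8),
          (9.6, 0.4), (0.1, 0.6), (4.9, 5.2), (10.2, -0.3)]
clusters, centroids = kmeans(coords, k=3)
print(clusters)
```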
<fig position="float" id="fig2">
<label>Figure 2</label>
<caption>
<p>Scatter plot of the three types of enzymes after dimensionality reduction.</p>
</caption>
<graphic xlink:href="fmicb-14-1130594-g002.tif"/>
</fig>
<table-wrap position="float" id="tab3">
<label>Table 2</label>
<caption>
<p>Prediction accuracies of ternary classification model for psychrophilic, mesophilic, and thermophilic proteins.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Class</th>
<th align="left" valign="top">Descriptor</th>
<th align="center" valign="top">RF</th>
<th align="center" valign="top">SVM</th>
<th align="center" valign="top">Bayes</th>
<th align="center" valign="top">KNN</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="3">P-M-T (P)<xref rid="tfn1" ref-type="table-fn"><sup>a</sup></xref></td>
<td align="left" valign="top">AAC</td>
<td align="center" valign="top">0.738 (0.731)</td>
<td align="center" valign="top">0.758 (0.761)</td>
<td align="center" valign="top">0.756 (0.717)</td>
<td align="center" valign="top">0.746 (0.724)</td>
</tr>
<tr>
<td align="left" valign="top">DPC</td>
<td align="center" valign="top">0.700 (0.710)</td>
<td align="center" valign="top">0.721 (0.703)</td>
<td align="center" valign="top">0.702 (0.703)</td>
<td align="center" valign="top">0.671 (0.688)</td>
</tr>
<tr>
<td align="left" valign="top">AAC&#x2009;+&#x2009;DPC</td>
<td align="center" valign="top">0.736 (0.717)</td>
<td align="center" valign="top">0.761 (0.753)</td>
<td align="center" valign="top">0.716 (0.710)</td>
<td align="center" valign="top">0.688 (0.710)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn1">
<label>a</label>
<p>Values are overall accuracies across psychrophilic (P), mesophilic (M), and thermophilic (T) proteins for each descriptor; accuracies for psychrophilic proteins alone are given in parentheses.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>The accuracies of ternary classification were slightly lower than those of binary classification. The SVM model with the AAC&#x2009;+&#x2009;DPC descriptor reached 76.1% accuracy, 4.0 percentage points lower than the binary classification model with the same descriptor and method. Overall, the SVM model performed well in discriminating the three types of enzymes. The RF ensemble classifier also achieved relatively good accuracy (73.8%) using the AAC descriptor alone. Among the four models, KNN had relatively low accuracy, reaching only 67.1% with the DPC descriptor. Taken together, the AAC descriptor gave the highest or near-highest accuracy for every algorithm (the AAC&#x2009;+&#x2009;DPC SVM model exceeded the AAC SVM model by only 0.3 percentage points), which indicates the capacity of amino acid composition to distinguish psychrophilic proteins.</p>
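<p><xref rid="tab3" ref-type="table">Table 2</xref> reports an overall accuracy for each model together with a separate accuracy for psychrophilic proteins. The sketch below assumes that the per-class value is the fraction of proteins of that class predicted correctly (class recall). Under that assumption, both numbers can be read off a 3&#x2009;&#x00D7;&#x2009;3 confusion matrix. The labels and the function name are hypothetical.</p>

```python
def ternary_accuracies(y_true, y_pred, classes=("P", "M", "T")):
    """Overall accuracy and per-class accuracy (recall) for a 3-class problem."""
    # confusion[t][p] counts proteins of true class t predicted as class p
    confusion = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1
    correct = sum(confusion[c][c] for c in classes)
    overall = correct / len(y_true)
    per_class = {
        c: confusion[c][c] / sum(confusion[c].values())
        for c in classes
        if sum(confusion[c].values())  # skip classes absent from y_true
    }
    return overall, per_class, confusion


# Hypothetical labels: P = psychrophilic, M = mesophilic, T = thermophilic.
y_true = ["P", "P", "P", "P", "M", "M", "M", "T", "T", "T"]
y_pred = ["P", "P", "P", "M", "M", "M", "T", "T", "T", "P"]
overall, per_class, _ = ternary_accuracies(y_true, y_pred)
print(overall, per_class["P"])
```

<p>Here 7 of the 10 toy proteins are classified correctly (overall accuracy 0.7), and 3 of the 4 psychrophilic ones are classified correctly (0.75).</p>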
</sec>
<sec id="sec10">
<title>Differences of amino acid composition in psychrophilic, mesophilic, and thermophilic proteins</title>
<p>The frequencies of the 20 amino acids in psychrophilic, mesophilic, and thermophilic proteins were computed (<xref rid="fig3" ref-type="fig">Figure 3</xref>). The frequencies of Ala, Gly, Ser, and Thr were higher in psychrophilic than in non-psychrophilic proteins, whereas those of Glu, Lys, and Arg were lower, and those of the aliphatic amino acids Ile, Val, and Leu were slightly lower.</p>
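<p>The comparison in <xref rid="fig3" ref-type="fig">Figure 3</xref> amounts to averaging the AAC profiles of each group and comparing them residue by residue. The sketch below assumes each protein is weighted equally; the study may instead pool residues across proteins. It uses toy sequences only, and the function names are hypothetical.</p>

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition as a residue-to-frequency mapping."""
    residues = [r for r in seq if r in AMINO_ACIDS]
    total = max(len(residues), 1)
    return {a: residues.count(a) / total for a in AMINO_ACIDS}


def mean_composition(seqs):
    """Average AAC over a set of sequences, each protein weighted equally."""
    profiles = [aac(s) for s in seqs]
    return {a: sum(p[a] for p in profiles) / len(profiles) for a in AMINO_ACIDS}


def composition_difference(psychro, non_psychro):
    """Psychrophilic minus non-psychrophilic mean frequency for each residue."""
    mp, mn = mean_composition(psychro), mean_composition(non_psychro)
    return {a: mp[a] - mn[a] for a in AMINO_ACIDS}


# Toy sequences only; not data from this study.
diff = composition_difference(["GASTGAST", "GGSATT"], ["KREELIVK", "EKRLVIEE"])
for a, d in sorted(diff.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{a}: {d:+.3f}")
```

<p>Positive values mark residues enriched in the psychrophilic set. Because both mean profiles sum to one, the differences sum to zero.</p>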
<fig position="float" id="fig3">
<label>Figure 3</label>
<caption>
<p>Amino acid composition of psychrophilic and non-psychrophilic proteins.</p>
</caption>
<graphic xlink:href="fmicb-14-1130594-g003.tif"/>
</fig>
<p>Many studies have demonstrated that psychrophilic proteins maintain high catalytic activity at low temperatures mainly because of their more flexible structures (<xref ref-type="bibr" rid="ref48">Siddiqui and Cavicchioli, 2006</xref>; <xref ref-type="bibr" rid="ref44">Santiago et al., 2016</xref>; <xref ref-type="bibr" rid="ref7">&#x00C5;qvist et al., 2017</xref>; <xref ref-type="bibr" rid="ref8">Arcus and Mulholland, 2020</xref>). Several factors contribute to conformational flexibility, such as reduced inter-domain and inter-subunit interactions, fewer inter-protein disulfide bonds, and fewer hydrogen bonds and electrostatic interactions. Glycine and alanine are very small amino acids whose side chains are a hydrogen atom and a methyl group, respectively, so they impose few steric constraints on the backbone. A comparative analysis of psychrophilic and mesophilic protein datasets also showed that Ala and Gly residues are over-represented in psychrophilic proteins, and an increased Gly content has been suggested to be related to psychrophilicity.</p>
<p>A higher percentage of serine and threonine is also found in psychrophilic proteins. A study of classification rules for psychrophilic and mesophilic proteins showed that proteins are classified as psychrophilic when the percentages of Ser and Thr exceed certain values (<xref ref-type="bibr" rid="ref39">Nath et al., 2012</xref>). Meanwhile, a pairwise comparison of proteins from cold-adapted organisms revealed a higher content of uncharged polar residues, especially threonine (<xref ref-type="bibr" rid="ref11">Berthelot et al., 2019</xref>; <xref ref-type="bibr" rid="ref10">Bargiela et al., 2020</xref>). Ser and Thr are uncharged polar amino acids that prefer to reside on the surface of psychrophilic proteins (<xref ref-type="bibr" rid="ref28">Jahandideh et al., 2007</xref>), so they tend to interact more with the surrounding water molecules (<xref ref-type="bibr" rid="ref50">Sun et al., 2020</xref>). Structural and molecular dynamics (MD) analyses of homologous psychrophilic, mesophilic, and thermophilic serine proteases (<xref ref-type="bibr" rid="ref52">Tiberti and Papaleo, 2011</xref>; <xref ref-type="bibr" rid="ref19">du et al., 2017</xref>) and serine hydroxymethyltransferases (<xref ref-type="bibr" rid="ref64">Zhang Z. B. et al., 2021</xref>) reported that the psychrophilic proteins formed more hydrogen bonds with solvent water molecules. Further analysis revealed that the serine content of the psychrophilic proteases and hydroxymethyltransferases is higher than that of their mesophilic and thermophilic homologs. Although these studies cover only a few protein types, it appears that serine and threonine increase surface hydrophilicity by forming more H-bonds with water molecules, thereby enhancing the mobility and flexibility of psychrophilic enzymes.</p>
<p>The charged amino acids in proteins fall into two groups: the basic amino acids lysine, arginine, and histidine, and the acidic amino acids glutamic acid and aspartic acid. Under physiological conditions, basic and acidic residues carry positive and negative charges, respectively, and can therefore form salt bridges and other electrostatic interactions. Accordingly, more charged residues are found in non-psychrophilic than in psychrophilic proteins, which helps maintain the conformational stability of their structures (<xref ref-type="bibr" rid="ref22">Gianese et al., 2002</xref>; <xref ref-type="bibr" rid="ref52">Tiberti and Papaleo, 2011</xref>; <xref ref-type="bibr" rid="ref57">Wu et al., 2017</xref>). However, <xref rid="fig3" ref-type="fig">Figure 3</xref> indicates that Asp is favored in psychrophilic proteins. Asp may be less stable at high temperatures, so an increased Asp content could contribute to the structural flexibility of psychrophilic proteins. Glu, the other acidic amino acid, promotes the formation of helical structures, and structural comparisons show that psychrophilic proteins contain fewer helical structures than mesophilic proteins (<xref ref-type="bibr" rid="ref36">Metpally and Reddy, 2009</xref>); the reduced Glu content may therefore help maintain the thermolability of psychrophilic proteins. In contrast, the higher content of charged amino acids in thermophilic proteins is essential for protein stabilization at high temperatures (<xref ref-type="bibr" rid="ref60">Zhang and Fang, 2006a</xref>,<xref ref-type="bibr" rid="ref61">b</xref>; <xref ref-type="bibr" rid="ref23">Gromiha and Suresh, 2008</xref>; <xref ref-type="bibr" rid="ref51">Taylor and Vaisman, 2010</xref>; <xref ref-type="bibr" rid="ref4">Ai et al., 2018</xref>). 
For example, a model using only the Lys residue feature to classify thermophilic and non-thermophilic proteins reached 76.41% accuracy, reflecting a striking difference between the two groups (<xref ref-type="bibr" rid="ref24">Guo et al., 2020</xref>).</p>
<p>The content of the three aliphatic amino acids (valine, leucine, and isoleucine) is slightly lower in psychrophilic than in non-psychrophilic proteins. Aliphatic amino acids maintain the conformational stability of protein structures through hydrophobic interactions. Many studies have shown that psychrophilic enzymes have reduced core hydrophobicity (<xref ref-type="bibr" rid="ref33">Lonhienne et al., 2000</xref>; <xref ref-type="bibr" rid="ref20">Feller and Gerday, 2003</xref>; <xref ref-type="bibr" rid="ref48">Siddiqui and Cavicchioli, 2006</xref>; <xref ref-type="bibr" rid="ref7">&#x00C5;qvist et al., 2017</xref>; <xref ref-type="bibr" rid="ref8">Arcus and Mulholland, 2020</xref>). For example, fewer Ile residues were found in the cores of psychrophilic citrate synthase, trypsins, and AHA (the &#x03B1;-amylase from <italic>Alteromonas haloplanctis</italic>) (<xref ref-type="bibr" rid="ref48">Siddiqui and Cavicchioli, 2006</xref>). In other comparative studies, fewer Leu residues were proposed to contribute to reduced hydrophobic interactions within the protein (<xref ref-type="bibr" rid="ref65">Zhou et al., 2008</xref>).</p>
<p>In conclusion, psychrophilicity results from numerous characteristics, and different families of psychrophilic enzymes may adopt one or several strategies to adapt to low temperatures; consequently, no single structural feature is present in all psychrophilic enzymes.</p>
</sec>
<sec id="sec11">
<title>Feature importance</title>
<p>To identify the key amino acids, the influence of different feature subsets on model accuracy was investigated. Based on the residue differences between psychrophilic and non-psychrophilic proteins, four groups of residues were examined: hydrophobic (ILV), charged (KRED), aromatic (WYF), and polar uncharged (STQ). The features of each group were removed in turn, and the remaining residues were used to build the classification model. Removing any residue group markedly decreased sensitivity (from 0.780 to between 0.615 and 0.667 for single groups) and lowered MCC (<xref rid="tab4" ref-type="table">Table 3</xref>). The largest declines occurred for the models excluding the KRED and STQ residues, which suggests that charged and polar uncharged amino acids play a vital role in discriminating psychrophilic from non-psychrophilic proteins. However, Acc and MCC did not decrease significantly because psychrophilic proteins were fewer than non-psychrophilic proteins, so small changes in the number of true positives (TP) had little effect on Acc and MCC.</p>
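<p>The ablation experiment removes the frequency columns of one residue group from the 20-dimensional AAC vector before the model is retrained. The sketch below covers only the column removal; the retraining step is omitted. It assumes that the AAC columns are ordered alphabetically by one-letter code, which may differ from the iLearnPlus column order. The function names are hypothetical.</p>

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed AAC column order

# Residue groups removed in Table 3, with the property named in the text.
GROUPS = {
    "WYF": "aromatic",
    "ILV": "hydrophobic",
    "KRED": "charged",
    "STQ": "polar uncharged",
    "KREDSTQ": "charged + polar uncharged",
}


def ablated_columns(removed):
    """Indices of the AAC features kept after dropping the residues in `removed`."""
    return [i for i, aa in enumerate(AMINO_ACIDS) if aa not in removed]


def ablate(rows, removed):
    """Drop the frequency columns of the given residues from each AAC row."""
    keep = ablated_columns(removed)
    return [[row[i] for i in keep] for row in rows]


for residues, label in GROUPS.items():
    print(f"AAC-{residues} ({label}): {len(ablated_columns(residues))} features kept")
```

<p>Removing WYF, ILV, or STQ leaves 17 features, removing KRED leaves 16, and removing KREDSTQ leaves 13.</p>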
<table-wrap position="float" id="tab4">
<label>Table 3</label>
<caption>
<p>Prediction results of using different AAC descriptors for psychrophilic and non-psychrophilic proteins.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Descriptor</th>
<th align="center" valign="top">Sn</th>
<th align="center" valign="top">Sp</th>
<th align="center" valign="top">Acc</th>
<th align="center" valign="top">MCC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">AAC</td>
<td align="center" valign="top">0.780</td>
<td align="center" valign="top">0.859</td>
<td align="center" valign="top">0.806</td>
<td align="center" valign="top">0.550</td>
</tr>
<tr>
<td align="left" valign="top">AAC-WYF&#x002A;</td>
<td align="center" valign="top">0.667</td>
<td align="center" valign="top">0.860</td>
<td align="center" valign="top">0.803</td>
<td align="center" valign="top">0.536</td>
</tr>
<tr>
<td align="left" valign="top">AAC-ILV&#x002A;</td>
<td align="center" valign="top">0.660</td>
<td align="center" valign="top">0.869</td>
<td align="center" valign="top">0.807</td>
<td align="center" valign="top">0.544</td>
</tr>
<tr>
<td align="left" valign="top">AAC-KRED&#x002A;</td>
<td align="center" valign="top">0.615</td>
<td align="center" valign="top">0.879</td>
<td align="center" valign="top">0.800</td>
<td align="center" valign="top">0.520</td>
</tr>
<tr>
<td align="left" valign="top">AAC-STQ&#x002A;</td>
<td align="center" valign="top">0.631</td>
<td align="center" valign="top">0.865</td>
<td align="center" valign="top">0.795</td>
<td align="center" valign="top">0.514</td>
</tr>
<tr>
<td align="left" valign="top">AAC-KREDSTQ&#x002A;</td>
<td align="center" valign="top">0.537</td>
<td align="center" valign="top">0.866</td>
<td align="center" valign="top">0.768</td>
<td align="center" valign="top">0.426</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>AAC-WYF&#x002A; denotes the model constructed after deleting the frequency features of the W, Y, and F amino acids; the other descriptors in the table were constructed in the same way.</p>
</table-wrap-foot>
</table-wrap>
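<p>The class-imbalance explanation in the text can be stated explicitly for Acc. Accuracy equals Acc&#x2009;=&#x2009;(n<sub>P</sub>&#x2009;Sn&#x2009;+&#x2009;n<sub>N</sub>&#x2009;Sp)/(n<sub>P</sub>&#x2009;+&#x2009;n<sub>N</sub>), where n<sub>P</sub> and n<sub>N</sub> are the numbers of psychrophilic and non-psychrophilic proteins. A drop in Sn is therefore diluted by the minority class's share of the data. The class sizes below (100 versus 300 proteins) are hypothetical. With them, a fall in Sn from 0.78 to 0.62 lowers Acc only from 0.84 to 0.80.</p>

```python
def accuracy_from_rates(sn, sp, n_pos, n_neg):
    """Accuracy as the class-size-weighted mean of sensitivity and specificity."""
    return (sn * n_pos + sp * n_neg) / (n_pos + n_neg)


# Hypothetical class sizes: 100 psychrophilic vs 300 non-psychrophilic proteins.
before = accuracy_from_rates(sn=0.78, sp=0.86, n_pos=100, n_neg=300)
after = accuracy_from_rates(sn=0.62, sp=0.86, n_pos=100, n_neg=300)
print(f"Sn drop: {0.78 - 0.62:.2f}, Acc drop: {before - after:.2f}")
```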
</sec>
</sec>
<sec id="sec12" sec-type="conclusions">
<title>Conclusion</title>
<p>In this study, the iLearnPlus platform was used to develop binary and ternary machine learning classification models to identify psychrophilic proteins. The models were built on the AAC descriptor, the DPC descriptor, and their combination. In binary classification, the SVM model with the AAC descriptor achieved the highest prediction accuracy (80.6%), whereas the SVM model with the DPC descriptor reached 74.7%, indicating that AAC distinguishes psychrophilic from non-psychrophilic proteins better than DPC. In addition, the amino acid frequencies of psychrophilic and non-psychrophilic proteins were compared, and the influence of individual residue groups within the AAC descriptor on model accuracy was assessed, which helps explain why the AAC descriptor better distinguishes psychrophilic from non-psychrophilic proteins. The amino acid composition results suggest that the abundance of Ala and Gly in psychrophilic proteins may provide greater conformational mobility. Meanwhile, the higher content of Ser and Thr in psychrophilic enzymes could enhance interactions between the protein and water molecules, thereby increasing structural flexibility. Moreover, the reduced content of charged amino acids in psychrophilic proteins means fewer salt bridges and hydrogen bonds within the protein, which may be important for the structural plasticity of cold-adapted enzymes. Non-psychrophilic proteins were enriched in aliphatic residues (Leu, Ile, Val) relative to psychrophilic proteins. In short, the sequence changes of psychrophilic proteins are related to protein structural flexibility. The feasibility of ternary classification was also investigated alongside binary classification. 
The proposed machine learning models are expected to be useful for identifying psychrophilic enzymes and may provide guidance for engineering cold adaptation in enzymes.</p>
</sec>
<sec id="sec13" sec-type="data-availability">
<title>Data availability statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/<xref rid="sec17" ref-type="sec">Supplementary material</xref>.</p>
</sec>
<sec id="sec14">
<title>Author contributions</title>
<p>AH: conceptualization, methodology, software, investigation, and writing-original draft. FpL: supervision and funding acquisition. FfL: writing&#x2014;review &#x0026; editing and funding acquisition. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="sec15" sec-type="funding-information">
<title>Funding</title>
<p>This work was supported by the National Key R&#x0026;D Program of China (2021YFC2102700), National Natural Science Foundation of China (No. 32272269) and Tianjin Research Innovation Project for Postgraduate Students (2021YJSB214).</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec id="sec17" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fmicb.2023.1130594/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fmicb.2023.1130594/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.docx" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aghajari</surname> <given-names>N.</given-names></name> <name><surname>Feller</surname> <given-names>G.</given-names></name> <name><surname>Gerday</surname> <given-names>C.</given-names></name> <name><surname>Haser</surname> <given-names>R.</given-names></name></person-group> (<year>1996</year>). <article-title>Crystallization and preliminary X-ray diffraction studies of alpha-amylase from the antarctic psychrophile <italic>Alteromonas haloplanctis</italic> A23</article-title>. <source>Protein Sci.</source> <volume>5</volume>, <fpage>2128</fpage>&#x2013;<lpage>2129</lpage>. doi: <pub-id pub-id-type="doi">10.1002/pro.5560051021</pub-id>, PMID: <pub-id pub-id-type="pmid">8897615</pub-id></citation></ref>
<ref id="ref2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aghajari</surname> <given-names>N.</given-names></name> <name><surname>Van Petegem</surname> <given-names>F.</given-names></name> <name><surname>Villeret</surname> <given-names>V.</given-names></name> <name><surname>Chessa</surname> <given-names>J. P.</given-names></name> <name><surname>Gerday</surname> <given-names>C.</given-names></name> <name><surname>Haser</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2003</year>). <article-title>Crystal structures of a psychrophilic metalloprotease reveal new insights into catalysis by cold-adapted proteases</article-title>. <source>Proteins</source> <volume>50</volume>, <fpage>636</fpage>&#x2013;<lpage>647</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.10264</pub-id>, PMID: <pub-id pub-id-type="pmid">12577270</pub-id></citation></ref>
<ref id="ref3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahmed</surname> <given-names>Z.</given-names></name> <name><surname>Zulfiqar</surname> <given-names>H.</given-names></name> <name><surname>Khan</surname> <given-names>A. A.</given-names></name> <name><surname>Gul</surname> <given-names>I.</given-names></name> <name><surname>Dao</surname> <given-names>F. Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Z. Y.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>iThermo: a sequence-based model for identifying Thermophilic proteins using a multi-feature fusion strategy</article-title>. <source>Front. Microbiol.</source> <volume>13</volume>:<fpage>790063</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2022.790063</pub-id>, PMID: <pub-id pub-id-type="pmid">35273581</pub-id></citation></ref>
<ref id="ref4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ai</surname> <given-names>H. X.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>J. K.</given-names></name> <name><surname>Cui</surname> <given-names>T.</given-names></name> <name><surname>Chang</surname> <given-names>A. K.</given-names></name> <name><surname>Liu</surname> <given-names>H. S.</given-names></name></person-group> (<year>2018</year>). <article-title>Discrimination of Thermophilic and Mesophilic proteins using support vector machine and decision tree</article-title>. <source>Curr. Proteom.</source> <volume>15</volume>, <fpage>374</fpage>&#x2013;<lpage>383</lpage>. doi: <pub-id pub-id-type="doi">10.2174/1570164615666180718143606</pub-id></citation></ref>
<ref id="ref5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Al-Ghanayem</surname> <given-names>A. A.</given-names></name> <name><surname>Joseph</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>Current prospective in using cold-active enzymes as eco-friendly detergent additive</article-title>. <source>Appl. Microbiol. Biotechnol.</source> <volume>104</volume>, <fpage>2871</fpage>&#x2013;<lpage>2882</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00253-020-10429-x</pub-id>, PMID: <pub-id pub-id-type="pmid">32037467</pub-id></citation></ref>
<ref id="ref6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Almog</surname> <given-names>O.</given-names></name> <name><surname>Gonzalez</surname> <given-names>A.</given-names></name> <name><surname>Godin</surname> <given-names>N.</given-names></name> <name><surname>de Leeuw</surname> <given-names>M.</given-names></name> <name><surname>Mekel</surname> <given-names>M. J.</given-names></name> <name><surname>Klein</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>The crystal structures of the psychrophilic subtilisin S41 and the mesophilic subtilisin Sph reveal the same calcium-loaded state</article-title>. <source>Proteins</source> <volume>74</volume>, <fpage>489</fpage>&#x2013;<lpage>496</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.22175</pub-id>, PMID: <pub-id pub-id-type="pmid">18655058</pub-id></citation></ref>
<ref id="ref7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>&#x00C5;qvist</surname> <given-names>J.</given-names></name> <name><surname>Isaksen</surname> <given-names>G. V.</given-names></name> <name><surname>Brandsdal</surname> <given-names>B. O.</given-names></name></person-group> (<year>2017</year>). <article-title>Computation of enzyme cold adaptation</article-title>. <source>Nat. Rev. Chem.</source> <volume>1</volume>, <fpage>1</fpage>&#x2013;<lpage>14</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41570-017-0051</pub-id></citation></ref>
<ref id="ref8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arcus</surname> <given-names>V. L.</given-names></name> <name><surname>Mulholland</surname> <given-names>A. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Temperature, dynamics, and enzyme-catalyzed reaction rates</article-title>. <source>Annu. Rev. Biophys.</source> <volume>49</volume>, <fpage>163</fpage>&#x2013;<lpage>180</lpage>. doi: <pub-id pub-id-type="doi">10.1146/annurev-biophys-121219-081520</pub-id>, PMID: <pub-id pub-id-type="pmid">32040931</pub-id></citation></ref>
<ref id="ref9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arnorsdottir</surname> <given-names>J.</given-names></name> <name><surname>Kristjansson</surname> <given-names>M. M.</given-names></name> <name><surname>Ficner</surname> <given-names>R.</given-names></name></person-group> (<year>2005</year>). <article-title>Crystal structure of a subtilisin-like serine proteinase from a psychrotrophic vibrio species reveals structural aspects of cold adaptation</article-title>. <source>FEBS J.</source> <volume>272</volume>, <fpage>832</fpage>&#x2013;<lpage>845</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1742-4658.2005.04523.x</pub-id>, PMID: <pub-id pub-id-type="pmid">15670163</pub-id></citation></ref>
<ref id="ref10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bargiela</surname> <given-names>R.</given-names></name> <name><surname>Lanthaler</surname> <given-names>K.</given-names></name> <name><surname>Potter</surname> <given-names>C. M.</given-names></name> <name><surname>Ferrer</surname> <given-names>M.</given-names></name> <name><surname>Yakunin</surname> <given-names>A. F.</given-names></name> <name><surname>Paizs</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Proteome cold-shock response in the extremely acidophilic Archaeon, Cuniculiplasma divulgatum</article-title>. <source>Microorganisms</source> <volume>8</volume>:<fpage>759</fpage>. doi: <pub-id pub-id-type="doi">10.3390/microorganisms8050759</pub-id>, PMID: <pub-id pub-id-type="pmid">32438588</pub-id></citation></ref>
<ref id="ref11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berthelot</surname> <given-names>C.</given-names></name> <name><surname>Clarke</surname> <given-names>J.</given-names></name> <name><surname>Desvignes</surname> <given-names>T.</given-names></name> <name><surname>Detrich</surname> <given-names>H. W.</given-names></name> <name><surname>Flicek</surname> <given-names>P.</given-names></name> <name><surname>Peck</surname> <given-names>L. S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Adaptation of proteins to the cold in Antarctic fish: a role for methionine?</article-title> <source>Genome Biol. Evol.</source> <volume>11</volume>, <fpage>220</fpage>&#x2013;<lpage>231</lpage>. doi: <pub-id pub-id-type="doi">10.1093/gbe/evy262</pub-id>, PMID: <pub-id pub-id-type="pmid">30496401</pub-id></citation></ref>
<ref id="ref12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonetta</surname> <given-names>R.</given-names></name> <name><surname>Valentino</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <article-title>Machine learning techniques for protein function prediction</article-title>. <source>Proteins</source> <volume>88</volume>, <fpage>397</fpage>&#x2013;<lpage>413</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.25832</pub-id></citation></ref>
<ref id="ref13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L.</given-names></name></person-group> (<year>2001</year>). <article-title>Random forests</article-title>. <source>Mach. Learn.</source> <volume>45</volume>, <fpage>5</fpage>&#x2013;<lpage>32</lpage>. doi: <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></citation></ref>
<ref id="ref14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chandak</surname> <given-names>T.</given-names></name> <name><surname>Mayginnes</surname> <given-names>J. P.</given-names></name> <name><surname>Mayes</surname> <given-names>H.</given-names></name> <name><surname>Wong</surname> <given-names>C. F.</given-names></name></person-group> (<year>2020</year>). <article-title>Using machine learning to improve ensemble docking for drug discovery</article-title>. <source>Proteins</source> <volume>88</volume>, <fpage>1263</fpage>&#x2013;<lpage>1270</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.25899</pub-id>, PMID: <pub-id pub-id-type="pmid">32401384</pub-id></citation></ref>
<ref id="ref15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Charoenkwan</surname> <given-names>P.</given-names></name> <name><surname>Chotpatiwetchkul</surname> <given-names>W.</given-names></name> <name><surname>Lee</surname> <given-names>V. S.</given-names></name> <name><surname>Nantasenamat</surname> <given-names>C.</given-names></name> <name><surname>Shoombuatong</surname> <given-names>W.</given-names></name></person-group> (<year>2021</year>). <article-title>A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides</article-title>. <source>Sci. Rep.</source> <volume>11</volume>:<fpage>23782</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-021-03293-w</pub-id>, PMID: <pub-id pub-id-type="pmid">34893688</pub-id></citation></ref>
<ref id="ref16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>K.</given-names></name> <name><surname>Jiang</surname> <given-names>Y.</given-names></name> <name><surname>Du</surname> <given-names>L.</given-names></name> <name><surname>Kurgan</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). <article-title>Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs</article-title>. <source>J. Comput. Chem.</source> <volume>30</volume>, <fpage>163</fpage>&#x2013;<lpage>172</lpage>. doi: <pub-id pub-id-type="doi">10.1002/jcc.21053</pub-id>, PMID: <pub-id pub-id-type="pmid">18567007</pub-id></citation></ref>
<ref id="ref17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>P.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>F.</given-names></name> <name><surname>Xiang</surname> <given-names>D.</given-names></name> <name><surname>Chen</surname> <given-names>Y. Z.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization</article-title>. <source>Nucleic Acids Res.</source> <volume>49</volume>:<fpage>e60</fpage>. doi: <pub-id pub-id-type="doi">10.1093/nar/gkab122</pub-id>, PMID: <pub-id pub-id-type="pmid">33660783</pub-id></citation></ref>
<ref id="ref18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cortes</surname> <given-names>C.</given-names></name> <name><surname>Vapnik</surname> <given-names>V.</given-names></name></person-group> (<year>1995</year>). <article-title>Support-vector networks</article-title>. <source>Mach. Learn.</source> <volume>20</volume>, <fpage>273</fpage>&#x2013;<lpage>297</lpage>. doi: <pub-id pub-id-type="doi">10.1007/BF00994018</pub-id></citation></ref>
<ref id="ref19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>X.</given-names></name> <name><surname>Sang</surname> <given-names>P.</given-names></name> <name><surname>Xia</surname> <given-names>Y. L.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Liang</surname> <given-names>J.</given-names></name> <name><surname>Ai</surname> <given-names>S. M.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Comparative thermal unfolding study of psychrophilic and mesophilic subtilisin-like serine proteases by molecular dynamics simulations</article-title>. <source>J. Biomol. Struct. Dyn.</source> <volume>35</volume>, <fpage>1500</fpage>&#x2013;<lpage>1517</lpage>. doi: <pub-id pub-id-type="doi">10.1080/07391102.2016.1188155</pub-id>, PMID: <pub-id pub-id-type="pmid">27485684</pub-id></citation></ref>
<ref id="ref20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Feller</surname> <given-names>G.</given-names></name> <name><surname>Gerday</surname> <given-names>C.</given-names></name></person-group> (<year>2003</year>). <article-title>Psychrophilic enzymes: hot topics in cold adaptation</article-title>. <source>Nat. Rev. Microbiol.</source> <volume>1</volume>, <fpage>200</fpage>&#x2013;<lpage>208</lpage>. doi: <pub-id pub-id-type="doi">10.1038/nrmicro773</pub-id>, PMID: <pub-id pub-id-type="pmid">15035024</pub-id></citation></ref>
<ref id="ref21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Feng</surname> <given-names>C.</given-names></name> <name><surname>Ma</surname> <given-names>Z.</given-names></name> <name><surname>Yang</surname> <given-names>D.</given-names></name> <name><surname>Li</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>A method for prediction of thermophilic protein based on reduced amino acids and mixed features</article-title>. <source>Front. Bioeng. Biotechnol.</source> <volume>8</volume>:<fpage>285</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fbioe.2020.00285</pub-id>, PMID: <pub-id pub-id-type="pmid">32432088</pub-id></citation></ref>
<ref id="ref22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gianese</surname> <given-names>G.</given-names></name> <name><surname>Bossa</surname> <given-names>F.</given-names></name> <name><surname>Pascarella</surname> <given-names>S.</given-names></name></person-group> (<year>2002</year>). <article-title>Comparative structural analysis of psychrophilic and meso- and thermophilic enzymes</article-title>. <source>Proteins</source> <volume>47</volume>, <fpage>236</fpage>&#x2013;<lpage>249</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.10084</pub-id>, PMID: <pub-id pub-id-type="pmid">11933070</pub-id></citation></ref>
<ref id="ref23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gromiha</surname> <given-names>M. M.</given-names></name> <name><surname>Suresh</surname> <given-names>M. X.</given-names></name></person-group> (<year>2008</year>). <article-title>Discrimination of mesophilic and thermophilic proteins using machine learning algorithms</article-title>. <source>Proteins</source> <volume>70</volume>, <fpage>1274</fpage>&#x2013;<lpage>1279</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.21616</pub-id>, PMID: <pub-id pub-id-type="pmid">17876820</pub-id></citation></ref>
<ref id="ref24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>P.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Discrimination of thermophilic proteins and non-thermophilic proteins using feature dimension reduction</article-title>. <source>Front. Bioeng. Biotechnol.</source> <volume>8</volume>:<fpage>584807</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fbioe.2020.584807</pub-id>, PMID: <pub-id pub-id-type="pmid">33195148</pub-id></citation></ref>
<ref id="ref25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gupta</surname> <given-names>S. K.</given-names></name> <name><surname>Kataki</surname> <given-names>S.</given-names></name> <name><surname>Chatterjee</surname> <given-names>S.</given-names></name> <name><surname>Prasad</surname> <given-names>R. K.</given-names></name> <name><surname>Datta</surname> <given-names>S.</given-names></name> <name><surname>Vairale</surname> <given-names>M. G.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Cold adaptation in bacteria with special focus on cellulase production and its potential application</article-title>. <source>J. Clean. Prod.</source> <volume>258</volume>:<fpage>120351</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jclepro.2020.120351</pub-id></citation></ref>
<ref id="ref26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>L. Y.</given-names></name> <name><surname>Cui</surname> <given-names>J.</given-names></name> <name><surname>Lin</surname> <given-names>H. H.</given-names></name> <name><surname>Ji</surname> <given-names>Z. L.</given-names></name> <name><surname>Cao</surname> <given-names>Z. W.</given-names></name> <name><surname>Li</surname> <given-names>Y. X.</given-names></name> <etal/></person-group>. (<year>2006</year>). <article-title>Recent progresses in the application of machine learning approach for predicting protein functional class independent of sequence similarity</article-title>. <source>Proteomics</source> <volume>6</volume>, <fpage>4023</fpage>&#x2013;<lpage>4037</lpage>. doi: <pub-id pub-id-type="doi">10.1002/pmic.200500938</pub-id>, PMID: <pub-id pub-id-type="pmid">16791826</pub-id></citation></ref>
<ref id="ref27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Niu</surname> <given-names>B.</given-names></name> <name><surname>Gao</surname> <given-names>Y.</given-names></name> <name><surname>Fu</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name></person-group> (<year>2010</year>). <article-title>CD-HIT suite: a web server for clustering and comparing biological sequences</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>680</fpage>&#x2013;<lpage>682</lpage>. doi: <pub-id pub-id-type="doi">10.1093/bioinformatics/btq003</pub-id>, PMID: <pub-id pub-id-type="pmid">20053844</pub-id></citation></ref>
<ref id="ref28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jahandideh</surname> <given-names>S.</given-names></name> <name><surname>Barzegari Asadabadi</surname> <given-names>E.</given-names></name> <name><surname>Abdolmaleki</surname> <given-names>P.</given-names></name> <name><surname>Jahandideh</surname> <given-names>M.</given-names></name> <name><surname>Hoseini</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Protein psychrophilicity: role of residual structural properties in adaptation of proteins to low temperatures</article-title>. <source>J. Theor. Biol.</source> <volume>248</volume>, <fpage>721</fpage>&#x2013;<lpage>726</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jtbi.2007.06.019</pub-id>, PMID: <pub-id pub-id-type="pmid">17669434</pub-id></citation></ref>
<ref id="ref29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jumper</surname> <given-names>J.</given-names></name> <name><surname>Evans</surname> <given-names>R.</given-names></name> <name><surname>Pritzel</surname> <given-names>A.</given-names></name> <name><surname>Green</surname> <given-names>T.</given-names></name> <name><surname>Figurnov</surname> <given-names>M.</given-names></name> <name><surname>Ronneberger</surname> <given-names>O.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Highly accurate protein structure prediction with AlphaFold</article-title>. <source>Nature</source> <volume>596</volume>, <fpage>583</fpage>&#x2013;<lpage>589</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41586-021-03819-2</pub-id>, PMID: <pub-id pub-id-type="pmid">34265844</pub-id></citation></ref>
<ref id="ref30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khan</surname> <given-names>Z. U.</given-names></name> <name><surname>Hayat</surname> <given-names>M.</given-names></name> <name><surname>Khan</surname> <given-names>M. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Discrimination of acidic and alkaline enzyme using Chou's pseudo amino acid composition in conjunction with probabilistic neural network model</article-title>. <source>J. Theor. Biol.</source> <volume>365</volume>, <fpage>197</fpage>&#x2013;<lpage>203</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jtbi.2014.10.014</pub-id>, PMID: <pub-id pub-id-type="pmid">25452135</pub-id></citation></ref>
<ref id="ref31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kumar</surname> <given-names>A.</given-names></name> <name><surname>Mukhia</surname> <given-names>S.</given-names></name> <name><surname>Kumar</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Industrial applications of cold-adapted enzymes: challenges, innovations and future perspective</article-title>. <source>3 Biotech</source> <volume>11</volume>:<fpage>426</fpage>. doi: <pub-id pub-id-type="doi">10.1007/s13205-021-02929-y</pub-id>, PMID: <pub-id pub-id-type="pmid">34567931</pub-id></citation></ref>
<ref id="ref32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name></person-group> (<year>2011</year>). <article-title>Prediction of thermophilic proteins using feature selection technique</article-title>. <source>J. Microbiol. Methods</source> <volume>84</volume>, <fpage>67</fpage>&#x2013;<lpage>70</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.mimet.2010.10.013</pub-id>, PMID: <pub-id pub-id-type="pmid">21044646</pub-id></citation></ref>
<ref id="ref33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lonhienne</surname> <given-names>T.</given-names></name> <name><surname>Gerday</surname> <given-names>C.</given-names></name> <name><surname>Feller</surname> <given-names>G.</given-names></name></person-group> (<year>2000</year>). <article-title>Psychrophilic enzymes: revisiting the thermodynamic parameters of activation may explain local flexibility</article-title>. <source>Biochim. Biophys. Acta</source> <volume>1543</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. doi: <pub-id pub-id-type="doi">10.1016/s0167-4838(00)00210-7</pub-id></citation></ref>
<ref id="ref34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mangiagalli</surname> <given-names>M.</given-names></name> <name><surname>Brocca</surname> <given-names>S.</given-names></name> <name><surname>Orlando</surname> <given-names>M.</given-names></name> <name><surname>Lotti</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>The "cold revolution". Present and future applications of cold-active enzymes and ice-binding proteins</article-title>. <source>New Biotechnol.</source> <volume>55</volume>, <fpage>5</fpage>&#x2013;<lpage>11</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.nbt.2019.09.003</pub-id>, PMID: <pub-id pub-id-type="pmid">31546027</pub-id></citation></ref>
<ref id="ref35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mazurenko</surname> <given-names>S.</given-names></name> <name><surname>Prokop</surname> <given-names>Z.</given-names></name> <name><surname>Damborsky</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>Machine learning in enzyme engineering</article-title>. <source>ACS Catal.</source> <volume>10</volume>, <fpage>1210</fpage>&#x2013;<lpage>1223</lpage>. doi: <pub-id pub-id-type="doi">10.1021/acscatal.9b04321</pub-id></citation></ref>
<ref id="ref36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Metpally</surname> <given-names>R. P.</given-names></name> <name><surname>Reddy</surname> <given-names>B. V.</given-names></name></person-group> (<year>2009</year>). <article-title>Comparative proteome analysis of psychrophilic versus mesophilic bacterial species: insights into the molecular basis of cold adaptation of proteins</article-title>. <source>BMC Genomics</source> <volume>10</volume>:<fpage>11</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2164-10-11</pub-id>, PMID: <pub-id pub-id-type="pmid">19133128</pub-id></citation></ref>
<ref id="ref37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mhetras</surname> <given-names>N.</given-names></name> <name><surname>Mapare</surname> <given-names>V.</given-names></name> <name><surname>Gokhale</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <article-title>Cold active lipases: biocatalytic tools for greener technology</article-title>. <source>Appl. Biochem. Biotechnol.</source> <volume>193</volume>, <fpage>2245</fpage>&#x2013;<lpage>2266</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s12010-021-03516-w</pub-id>, PMID: <pub-id pub-id-type="pmid">33544363</pub-id></citation></ref>
<ref id="ref38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mou</surname> <given-names>Z. Y.</given-names></name> <name><surname>Eakes</surname> <given-names>J.</given-names></name> <name><surname>Cooper</surname> <given-names>C. J.</given-names></name> <name><surname>Foster</surname> <given-names>C. M.</given-names></name> <name><surname>Standaert</surname> <given-names>R. F.</given-names></name> <name><surname>Podar</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Machine learning-based prediction of enzyme substrate scope: application to bacterial nitrilases</article-title>. <source>Proteins</source> <volume>89</volume>, <fpage>336</fpage>&#x2013;<lpage>347</lpage>. doi: <pub-id pub-id-type="doi">10.1002/prot.26019</pub-id>, PMID: <pub-id pub-id-type="pmid">33118210</pub-id></citation></ref>
<ref id="ref39"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Nath</surname> <given-names>A.</given-names></name> <name><surname>Chaube</surname> <given-names>R.</given-names></name> <name><surname>Karthikeyan</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). &#x201C;Discrimination of psychrophilic and mesophilic proteins using random forest algorithm,&#x201D; in <italic>International Conference on Biomedical Engineering and Biotechnology</italic>. <fpage>179</fpage>&#x2013;<lpage>182</lpage>.</citation></ref>
<ref id="ref40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nath</surname> <given-names>A.</given-names></name> <name><surname>Subbiah</surname> <given-names>K.</given-names></name></person-group> (<year>2014</year>). <article-title>Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier</article-title>. <source>Comput. Biol. Chem.</source> <volume>53</volume>, <fpage>198</fpage>&#x2013;<lpage>203</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compbiolchem.2014.10.002</pub-id>, PMID: <pub-id pub-id-type="pmid">25462328</pub-id></citation></ref>
<ref id="ref41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Niu</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>X. H.</given-names></name> <name><surname>Wang</surname> <given-names>X. T.</given-names></name> <name><surname>Shao</surname> <given-names>C.</given-names></name> <name><surname>Robertson</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Z. F.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Single-atom rhodium on defective g-C<sub>3</sub>N<sub>4</sub>: a promising bifunctional oxygen electrocatalyst</article-title>. <source>ACS Sustain. Chem. Eng.</source> <volume>9</volume>, <fpage>3590</fpage>&#x2013;<lpage>3599</lpage>. doi: <pub-id pub-id-type="doi">10.1021/acssuschemeng.0c09192</pub-id></citation></ref>
<ref id="ref42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saeki</surname> <given-names>K.</given-names></name> <name><surname>Ozaki</surname> <given-names>K.</given-names></name> <name><surname>Kobayashi</surname> <given-names>T.</given-names></name> <name><surname>Ito</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Detergent alkaline proteases: enzymatic properties, genes, and crystal structures</article-title>. <source>J. Biosci. Bioeng.</source> <volume>103</volume>, <fpage>501</fpage>&#x2013;<lpage>508</lpage>. doi: <pub-id pub-id-type="doi">10.1263/jbb.103.501</pub-id>, PMID: <pub-id pub-id-type="pmid">17630120</pub-id></citation></ref>
<ref id="ref43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saito</surname> <given-names>Y.</given-names></name> <name><surname>Oikawa</surname> <given-names>M.</given-names></name> <name><surname>Nakazawa</surname> <given-names>H.</given-names></name> <name><surname>Niide</surname> <given-names>T.</given-names></name> <name><surname>Kameda</surname> <given-names>T.</given-names></name> <name><surname>Tsuda</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Machine-learning-guided mutagenesis for directed evolution of fluorescent proteins</article-title>. <source>ACS Synth. Biol.</source> <volume>7</volume>, <fpage>2014</fpage>&#x2013;<lpage>2022</lpage>. doi: <pub-id pub-id-type="doi">10.1021/acssynbio.8b00155</pub-id>, PMID: <pub-id pub-id-type="pmid">30103599</pub-id></citation></ref>
<ref id="ref44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Santiago</surname> <given-names>M.</given-names></name> <name><surname>Ramirez-Sarmiento</surname> <given-names>C. A.</given-names></name> <name><surname>Zamora</surname> <given-names>R. A.</given-names></name> <name><surname>Parra</surname> <given-names>L. P.</given-names></name></person-group> (<year>2016</year>). <article-title>Discovery, molecular mechanisms, and industrial applications of cold-active enzymes</article-title>. <source>Front. Microbiol.</source> <volume>7</volume>:<fpage>1408</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fmicb.2016.01408</pub-id>, PMID: <pub-id pub-id-type="pmid">27667987</pub-id></citation></ref>
<ref id="ref45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sarmiento</surname> <given-names>F.</given-names></name> <name><surname>Peralta</surname> <given-names>R.</given-names></name> <name><surname>Blamey</surname> <given-names>J. M.</given-names></name></person-group> (<year>2015</year>). <article-title>Cold and hot extremozymes: industrial relevance and current trends</article-title>. <source>Front. Bioeng. Biotechnol.</source> <volume>3</volume>:<fpage>148</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fbioe.2015.00148</pub-id>, PMID: <pub-id pub-id-type="pmid">26539430</pub-id></citation></ref>
<ref id="ref46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schr&#x00F8;der Leiros</surname> <given-names>H.-K.</given-names></name> <name><surname>Willassen</surname> <given-names>N. P.</given-names></name> <name><surname>Smal&#x00E5;s</surname> <given-names>A. O.</given-names></name></person-group> (<year>2000</year>). <article-title>Structural comparison of psychrophilic and mesophilic trypsins</article-title>. <source>Eur. J. Biochem.</source> <volume>267</volume>, <fpage>1039</fpage>&#x2013;<lpage>1049</lpage>. doi: <pub-id pub-id-type="doi">10.1046/j.1432-1327.2000.01098.x</pub-id>, PMID: <pub-id pub-id-type="pmid">10672012</pub-id></citation></ref>
<ref id="ref47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Senior</surname> <given-names>A. W.</given-names></name> <name><surname>Evans</surname> <given-names>R.</given-names></name> <name><surname>Jumper</surname> <given-names>J.</given-names></name> <name><surname>Kirkpatrick</surname> <given-names>J.</given-names></name> <name><surname>Sifre</surname> <given-names>L.</given-names></name> <name><surname>Green</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Improved protein structure prediction using potentials from deep learning</article-title>. <source>Nature</source> <volume>577</volume>, <fpage>706</fpage>&#x2013;<lpage>710</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41586-019-1923-7</pub-id></citation></ref>
<ref id="ref48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Siddiqui</surname> <given-names>K. S.</given-names></name> <name><surname>Cavicchioli</surname> <given-names>R.</given-names></name></person-group> (<year>2006</year>). <article-title>Cold-adapted enzymes</article-title>. <source>Annu. Rev. Biochem.</source> <volume>75</volume>, <fpage>403</fpage>&#x2013;<lpage>433</lpage>. doi: <pub-id pub-id-type="doi">10.1146/annurev.biochem.75.103004.142723</pub-id></citation></ref>
<ref id="ref49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Struvay</surname> <given-names>C.</given-names></name> <name><surname>Feller</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>Optimization to low temperature activity in psychrophilic enzymes</article-title>. <source>Int. J. Mol. Sci.</source> <volume>13</volume>, <fpage>11643</fpage>&#x2013;<lpage>11665</lpage>. doi: <pub-id pub-id-type="doi">10.3390/ijms130911643</pub-id>, PMID: <pub-id pub-id-type="pmid">23109875</pub-id></citation></ref>
<ref id="ref50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>S.</given-names></name> <name><surname>Ao</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Dong</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>The frequencies of oppositely charged, uncharged polar, and &#x03B2;-branched amino acids determine proteins&#x2019; thermostability</article-title>. <source>IEEE Access</source> <volume>8</volume>, <fpage>66839</fpage>&#x2013;<lpage>66845</lpage>. doi: <pub-id pub-id-type="doi">10.1109/access.2020.2985737</pub-id></citation></ref>
<ref id="ref51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taylor</surname> <given-names>T. J.</given-names></name> <name><surname>Vaisman</surname> <given-names>I. I.</given-names></name></person-group> (<year>2010</year>). <article-title>Discrimination of thermophilic and mesophilic proteins</article-title>. <source>BMC Struct. Biol.</source> <volume>10</volume>:<fpage>S5</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1472-6807-10-S1-S5</pub-id>, PMID: <pub-id pub-id-type="pmid">20487512</pub-id></citation></ref>
<ref id="ref52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tiberti</surname> <given-names>M.</given-names></name> <name><surname>Papaleo</surname> <given-names>E.</given-names></name></person-group> (<year>2011</year>). <article-title>Dynamic properties of extremophilic subtilisin-like serine-proteases</article-title>. <source>J. Struct. Biol.</source> <volume>174</volume>, <fpage>69</fpage>&#x2013;<lpage>83</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jsb.2011.01.006</pub-id>, PMID: <pub-id pub-id-type="pmid">21276854</pub-id></citation></ref>
<ref id="ref53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tribelli</surname> <given-names>P. M.</given-names></name> <name><surname>L&#x00F3;pez</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>Reporting key features in cold-adapted bacteria</article-title>. <source>Life</source> <volume>8</volume>:<fpage>8</fpage>. doi: <pub-id pub-id-type="doi">10.3390/life8010008</pub-id>, PMID: <pub-id pub-id-type="pmid">29534000</pub-id></citation></ref>
<ref id="ref54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Cao</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>J. Z. H.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>Computational protein design with deep learning neural networks</article-title>. <source>Sci. Rep.</source> <volume>8</volume>:<fpage>6349</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-018-24760-x</pub-id>, PMID: <pub-id pub-id-type="pmid">29679026</pub-id></citation></ref>
<ref id="ref55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X. F.</given-names></name> <name><surname>Gao</surname> <given-names>P.</given-names></name> <name><surname>Liu</surname> <given-names>Y. F.</given-names></name> <name><surname>Li</surname> <given-names>H. F.</given-names></name> <name><surname>Lu</surname> <given-names>F.</given-names></name></person-group> (<year>2020</year>). <article-title>Predicting thermophilic proteins by machine learning</article-title>. <source>Curr. Bioinforma.</source> <volume>15</volume>, <fpage>493</fpage>&#x2013;<lpage>502</lpage>. doi: <pub-id pub-id-type="doi">10.2174/1574893615666200207094357</pub-id></citation></ref>
<ref id="ref56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Yang</surname> <given-names>L.</given-names></name> <name><surname>Fu</surname> <given-names>Z.</given-names></name> <name><surname>Xia</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Prediction of thermophilic protein with pseudo amino acid composition: an approach from combined feature selection and reduction</article-title>. <source>Protein Pept. Lett.</source> <volume>18</volume>, <fpage>684</fpage>&#x2013;<lpage>689</lpage>. doi: <pub-id pub-id-type="doi">10.2174/092986611795446085</pub-id>, PMID: <pub-id pub-id-type="pmid">21413920</pub-id></citation></ref>
<ref id="ref57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>W. L.</given-names></name> <name><surname>Chen</surname> <given-names>M. Y.</given-names></name> <name><surname>Tu</surname> <given-names>I. F.</given-names></name> <name><surname>Lin</surname> <given-names>Y. C.</given-names></name> <name><surname>EswarKumar</surname> <given-names>N.</given-names></name> <name><surname>Chen</surname> <given-names>M. Y.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>The discovery of novel heat-stable keratinases from <italic>Meiothermus taiwanensis</italic> WR-220 and other extremophiles</article-title>. <source>Sci. Rep.</source> <volume>7</volume>:<fpage>4658</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-017-04723-4</pub-id>, PMID: <pub-id pub-id-type="pmid">28680127</pub-id></citation></ref>
<ref id="ref58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Kan</surname> <given-names>S. B. J.</given-names></name> <name><surname>Lewis</surname> <given-names>R. D.</given-names></name> <name><surname>Wittmann</surname> <given-names>B. J.</given-names></name> <name><surname>Arnold</surname> <given-names>F. H.</given-names></name></person-group> (<year>2019</year>). <article-title>Machine learning-assisted directed protein evolution with combinatorial libraries</article-title>. <source>Proc. Natl. Acad. Sci. U. S. A.</source> <volume>116</volume>, <fpage>8852</fpage>&#x2013;<lpage>8858</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1901979116</pub-id>, PMID: <pub-id pub-id-type="pmid">30979809</pub-id></citation></ref>
<ref id="ref59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>K. K.</given-names></name> <name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Arnold</surname> <given-names>F. H.</given-names></name></person-group> (<year>2019</year>). <article-title>Machine-learning-guided directed evolution for protein engineering</article-title>. <source>Nat. Methods</source> <volume>16</volume>, <fpage>687</fpage>&#x2013;<lpage>694</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41592-019-0496-6</pub-id>, PMID: <pub-id pub-id-type="pmid">31308553</pub-id></citation></ref>
<ref id="ref60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>G. Y.</given-names></name> <name><surname>Fang</surname> <given-names>B. S.</given-names></name></person-group> (<year>2006a</year>). <article-title>Application of amino acid distribution along the sequence for discriminating mesophilic and thermophilic proteins</article-title>. <source>Process Biochem.</source> <volume>41</volume>, <fpage>1792</fpage>&#x2013;<lpage>1798</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.procbio.2006.03.026</pub-id></citation></ref>
<ref id="ref61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>G. Y.</given-names></name> <name><surname>Fang</surname> <given-names>B. S.</given-names></name></person-group> (<year>2006b</year>). <article-title>Discrimination of thermophilic and mesophilic proteins via pattern recognition methods</article-title>. <source>Process Biochem.</source> <volume>41</volume>, <fpage>552</fpage>&#x2013;<lpage>556</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.procbio.2005.09.003</pub-id></citation></ref>
<ref id="ref62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>G.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Fang</surname> <given-names>B.</given-names></name></person-group> (<year>2009</year>). <article-title>Discriminating acidic and alkaline enzymes using a random forest model with secondary structure amino acid composition</article-title>. <source>Process Biochem.</source> <volume>44</volume>, <fpage>654</fpage>&#x2013;<lpage>660</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.procbio.2009.02.007</pub-id></citation></ref>
<ref id="ref63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y. H.</given-names></name> <name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Lu</surname> <given-names>L.</given-names></name> <name><surname>Zeng</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Analysis of the sequence characteristics of antifreeze protein</article-title>. <source>Life</source> <volume>11</volume>:<fpage>520</fpage>. doi: <pub-id pub-id-type="doi">10.3390/life11060520</pub-id>, PMID: <pub-id pub-id-type="pmid">34204983</pub-id></citation></ref>
<ref id="ref64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Z. B.</given-names></name> <name><surname>Xia</surname> <given-names>Y. L.</given-names></name> <name><surname>Dong</surname> <given-names>G. H.</given-names></name> <name><surname>Fu</surname> <given-names>Y. X.</given-names></name> <name><surname>Liu</surname> <given-names>S. Q.</given-names></name></person-group> (<year>2021</year>). <article-title>Exploring the cold-adaptation mechanism of serine hydroxymethyltransferase by comparative molecular dynamics simulations</article-title>. <source>Int. J. Mol. Sci.</source> <volume>22</volume>:<fpage>1781</fpage>. doi: <pub-id pub-id-type="doi">10.3390/ijms22041781</pub-id>, PMID: <pub-id pub-id-type="pmid">33670090</pub-id></citation></ref>
<ref id="ref65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>X. X.</given-names></name> <name><surname>Wang</surname> <given-names>Y. B.</given-names></name> <name><surname>Pan</surname> <given-names>Y. J.</given-names></name> <name><surname>Li</surname> <given-names>W. F.</given-names></name></person-group> (<year>2008</year>). <article-title>Differences in amino acids composition and coupling patterns between mesophilic and thermophilic proteins</article-title>. <source>Amino Acids</source> <volume>34</volume>, <fpage>25</fpage>&#x2013;<lpage>33</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00726-007-0589-x</pub-id>, PMID: <pub-id pub-id-type="pmid">17710363</pub-id></citation></ref></ref-list>
<fn-group>
<fn id="fn0004">
<p><sup>1</sup><ext-link xlink:href="https://github.com/ailanhuang/A-machine-learning-model-for-psychrophilic-proteins" ext-link-type="uri">https://github.com/ailanhuang/A-machine-learning-model-for-psychrophilic-proteins</ext-link>
</p>
</fn>
</fn-group>
</back>
</article>