<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Pharmacol.</journal-id>
<journal-title>Frontiers in Pharmacology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Pharmacol.</abbrev-journal-title>
<issn pub-type="epub">1663-9812</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1099093</article-id>
<article-id pub-id-type="doi">10.3389/fphar.2023.1099093</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Pharmacology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>DEEPCYPs: A deep learning platform for enhanced cytochrome P450 activity prediction</article-title>
<alt-title alt-title-type="left-running-head">Ai et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fphar.2023.1099093">10.3389/fphar.2023.1099093</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ai</surname>
<given-names>Daiqiao</given-names>
</name>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cai</surname>
<given-names>Hanxuan</given-names>
</name>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1591712/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wei</surname>
<given-names>Jiajia</given-names>
</name>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhao</surname>
<given-names>Duancheng</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1591717/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chen</surname>
<given-names>Yihao</given-names>
</name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Wang</surname>
<given-names>Ling</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/610536/overview"/>
</contrib>
</contrib-group>
<aff>
<institution>Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering</institution>, <institution>Joint International Research Laboratory of Synthetic Biology and Medicine</institution>, <institution>Ministry of Education</institution>, <institution>Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals</institution>, <institution>School of Biology and Biological Engineering</institution>, <institution>South China University of Technology</institution>, <addr-line>Guangzhou</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/904991/overview">Xiangxiang Zeng</ext-link>, Hunan University, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1534875/overview">Grover Paul Miller</ext-link>, University of Arkansas for Medical Sciences, United States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1594884/overview">Noah Flynn</ext-link>, Amazon, United States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Ling Wang, <email>lingwang@scut.edu.cn</email>
</corresp>
<fn fn-type="equal" id="fn1">
<label>
<sup>&#x2020;</sup>
</label>
<p>These authors have contributed equally to this work</p>
</fn>
<fn fn-type="other">
<p>This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>10</day>
<month>04</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1099093</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>11</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>31</day>
<month>03</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Ai, Cai, Wei, Zhao, Chen and Wang.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Ai, Cai, Wei, Zhao, Chen and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Cytochrome P450 (CYP) is a superfamily of heme-containing oxidizing enzymes involved in the metabolism of a wide range of medicines, xenobiotics, and endogenous compounds. Five of the CYPs (1A2, 2C9, 2C19, 2D6, and 3A4) are responsible for metabolizing the vast majority of approved drugs. Adverse drug-drug interactions, many of which are mediated by CYPs, are an important cause of the premature termination of drug development and of drug withdrawal from the market. In this work, we report <italic>in silico</italic> classification models that predict the inhibitory activity of molecules against these five CYP isoforms using our recently developed FP-GNN deep learning method. The evaluation results showed that, to the best of our knowledge, the multi-task FP-GNN model achieved the best predictive performance, with the highest average AUC (0.905), F1 (0.779), BA (0.819), and MCC (0.647) values for the test sets, even compared to advanced machine learning, deep learning, and existing models. Y-scrambling testing confirmed that the results of the multi-task FP-GNN model were not attributable to chance correlation. Furthermore, the interpretability of the multi-task FP-GNN model enables the discovery of critical structural fragments associated with CYPs inhibition. Finally, an online webserver called DEEPCYPs and its local version software were created based on the optimal multi-task FP-GNN model to detect whether compounds bear potential inhibitory activity against CYPs. These tools can support the prediction of drug-drug interactions in clinical practice and can be used to rule out inappropriate compounds in the early stages of drug discovery and/or to identify new CYPs inhibitors.</p>
</abstract>
<kwd-group>
<kwd>cytochrome P450</kwd>
<kwd>multi-task FP-GNN</kwd>
<kwd>deep learning</kwd>
<kwd>online webserver</kwd>
<kwd>CYPs inhibitors</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">Natural Science Foundation of Guangdong Province<named-content content-type="fundref-id">10.13039/501100003453</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Cytochrome P450 (CYP) is a superfamily of heme-containing oxidase enzymes found in the smooth endoplasmic reticulum and mitochondria of liver cells and intestines (<xref ref-type="bibr" rid="B27">Neve and Ingelman-Sundberg, 2010</xref>). In humans, 57 CYP isoforms have been found to be involved in the oxidative metabolism of various xenobiotics as well as organic endogenous chemicals (<xref ref-type="bibr" rid="B2">Arimoto, 2006</xref>; <xref ref-type="bibr" rid="B31">Redlich et al., 2008</xref>). Five CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4) play crucial roles in approximately 90% of metabolic reactions (<xref ref-type="bibr" rid="B2">Arimoto, 2006</xref>). For example, CYP1A2 is responsible for metabolizing about 9% of clinically used drugs, such as antipsychotics and antibiotics (<xref ref-type="bibr" rid="B9">Chen et al., 2017</xref>). CYP2C9 contributes to the metabolism of around 15% of all medications and plays an important role in the metabolism of routinely used pharmaceuticals such as non-steroidal anti-inflammatory drugs (NSAIDs) and warfarin (<xref ref-type="bibr" rid="B11">Daly et al., 2017</xref>; <xref ref-type="bibr" rid="B13">Goldwaser et al., 2022</xref>). Many therapeutic drugs, including clopidogrel, voriconazole, and proton pump inhibitors, are metabolized by CYP2C19 (<xref ref-type="bibr" rid="B6">Botton et al., 2021</xref>). Furthermore, CYP3A4 and CYP2D6 are responsible for approximately 30% and 20% of clinical drug metabolism, respectively (<xref ref-type="bibr" rid="B5">Boji&#x107; et al., 2019</xref>). Avoiding the inhibition of drug-metabolizing CYPs is a major challenge in drug development, as inhibiting these CYP isoforms may lead to drug-drug interactions and significant adverse effects. 
For example, Miguel and Albuquerque illustrated that most antitumor drugs are metabolized by CYP3A4, and their co-administration with antidepressants that inhibit CYP3A4 (e.g., sertraline, fluoxetine, fluvoxamine, and paroxetine) may result in loss of efficacy or increased toxicity (<xref ref-type="bibr" rid="B26">Miguel and Albuquerque, 2011</xref>). In 2016, over two million significant cases of adverse drug reactions were reported in the United States, of which approximately 26% were judged to be preventable drug-drug interactions (<xref ref-type="bibr" rid="B16">Ho et al., 2016</xref>; <xref ref-type="bibr" rid="B20">Le and Le, 2016</xref>). Similarly, Tateishi and coworkers reported a risk of hypoglycemia prompted by the combination of bucolome and glimepiride, which may be caused by CYP2C9-mediated drug interactions with bucolome (<xref ref-type="bibr" rid="B35">Tateishi et al., 2021</xref>). Therefore, determining the potential for CYPs inhibition can help weed out underperforming drug candidates early in the drug discovery process and reduce the risk of terminated development programs, drug withdrawal from the market, or restriction of therapeutic use, which is crucial for drug discovery and development.</p>
<p>Various computational approaches have been used to predict or explore CYP-mediated metabolism and inhibition. It is difficult to accurately predict CYP450 inhibitors using structure-based techniques such as molecular docking and pharmacophore mapping because of the flexible conformation of CYP450 (<xref ref-type="bibr" rid="B23">Li et al., 2018</xref>). In contrast, machine learning (ML)- and deep learning (DL)-based quantitative structure-activity relationship (QSAR) approaches, the most popular ligand-based methods, are widely used to predict CYP450 inhibitors (<xref ref-type="bibr" rid="B37">Tyzack et al., 2016</xref>; <xref ref-type="bibr" rid="B41">Xiong et al., 2019</xref>). For example, previous studies often used conventional ML (CML) and DL methods to predict inhibitors of different CYP isoforms with varying prediction accuracies (<xref ref-type="bibr" rid="B10">Cheng et al., 2011</xref>; <xref ref-type="bibr" rid="B33">Sun et al., 2011</xref>). Considering the high sequence homology and structural similarity of binding active sites in the CYP family (<xref ref-type="bibr" rid="B14">Graham and Peterson, 1999</xref>; <xref ref-type="bibr" rid="B33">Sun et al., 2011</xref>), multi-task models can simultaneously predict inhibitors of different CYP isoforms to provide better predictive power. In 2018, <xref ref-type="bibr" rid="B23">Li et al. (2018)</xref> constructed a multi-task DNN model for the five CYP isoenzymes with an average prediction accuracy of 88.7% for the external test datasets. In 2021, <xref ref-type="bibr" rid="B28">Nguyen-Vo et al. (2021)</xref> developed iCYP-MFE to further improve the prediction accuracy of CYPs inhibitors using multitask learning and molecular fingerprint-embedded encoding.</p>
<p>Recently, we developed a new DL architecture called FP-GNN (fingerprints and graph neural networks), which combines molecular graphs with three molecular fingerprints to improve the ability of deep learning models to predict molecular properties (<xref ref-type="bibr" rid="B7">Cai et al., 2022</xref>). Herein, we used a multi-task FP-GNN DL architecture (<xref ref-type="fig" rid="F1">Figure 1</xref>) to construct a classification model for predicting the inhibitory activity of molecules against five CYPs (1A2, 2C9, 2C19, 2D6, and 3A4), which achieved state-of-the-art performance compared to baseline predictive models based on four conventional machine learning methods, three deep learning algorithms, and two existing models. Moreover, Y-scrambling testing verified that the model results were not due to chance. The interpretability analysis provided critical structural fragments associated with CYPs inhibition. An online webserver called DEEPCYPs (<ext-link ext-link-type="uri" xlink:href="https://deepcyps.idruglab.cn/">https://deepcyps.idruglab.cn/</ext-link>) and its local version python software (<ext-link ext-link-type="uri" xlink:href="https://github.com/idrugLab/FP-GNN_CYP">https://github.com/idrugLab/FP-GNN_CYP</ext-link>) were established to prioritize compounds in drug discovery to avoid adverse reactions and/or identify new CYP inhibitors.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Model construction pipeline.</p>
</caption>
<graphic xlink:href="fphar-14-1099093-g001.tif"/>
</fig>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>2 Materials and methods</title>
<sec id="s2-1">
<title>2.1 Dataset collection and preparation</title>
<p>We used the CYP inhibitor modelling datasets reported by <xref ref-type="bibr" rid="B28">Nguyen-Vo et al. (2021)</xref>. The modelling datasets contain inhibitors of five major CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4). Briefly, chemical data were gathered from six datasets (AID 1851, AID 410, AID 883, AID 899, AID 891, and AID 884) in the PubChem BioAssay Database (<xref ref-type="bibr" rid="B39">Wang et al., 2017</xref>), which together contain 71,456 samples. The dataset AID 1851 contains compounds tested against all five isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4). The datasets AID 410, AID 883, AID 899, AID 891, and AID 884 contain compounds tested against the CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 isoforms, respectively. These bioassay data come from the same institution (National Center for Advancing Translational Sciences), which ensures consistent experimental protocols for data gathering and minimizes the impact of noise. The processing applied by <xref ref-type="bibr" rid="B28">Nguyen-Vo et al. (2021)</xref> was, briefly, as follows: 1) elimination of inorganics and mixtures; 2) conversion of SMILES to canonical SMILES and discarding of salts based on XlogP values; 3) elimination of compounds with multiple structural patterns based on canonical SMILES to avoid incomplete duplication; and 4) deduplication. The resulting datasets contained 65,467 samples, and 4,352 compounds were present in all five datasets. To limit data leakage and make multitask benefits more interpretable, they adopted a stringent structure-based data splitting method to generate training, validation, and test sets. <xref ref-type="bibr" rid="B28">Nguyen-Vo et al. (2021)</xref> employed k-means clustering with k &#x3d; 6 to divide the samples into six groups. 
They calculated the within-cluster sum of squared errors (WSS) for different k values using the Elbow method and chose the k value with the smallest WSS. The validation and test sets for each isoform were created with 2,000 and 1,000 samples, respectively. The 50,467 remaining samples served as training data. The numbers of training samples for the CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 isoforms were 9,357, 9,244, 9,871, 10,284, and 11,711, respectively (<xref ref-type="bibr" rid="B28">Nguyen-Vo et al., 2021</xref>). The final modelling datasets are freely available at <ext-link ext-link-type="uri" xlink:href="https://github.com/idrugLab/FP-GNN_CYP">https://github.com/idrugLab/FP-GNN_CYP</ext-link>.</p>
</sec>
<sec id="s2-2">
<title>2.2 Multi-task FP-GNN framework and model training protocol</title>
<p>In this study, we used a multi-task FP-GNN framework to predict the inhibition of the five major CYP isoforms (CYP1A2, CYP2C19, CYP2C9, CYP2D6, and CYP3A4) by small molecules. The workflow is described in <xref ref-type="fig" rid="F1">Figure 1</xref>. To better predict molecular properties such as physicochemical properties, biological activities, and ADMET properties, we recently developed the FP-GNN DL algorithm, which concurrently learns molecular graph information and mixed molecular fingerprint information (<xref ref-type="bibr" rid="B7">Cai et al., 2022</xref>). On the one hand, the fingerprint-based network (FPN) module of the FP-GNN architecture uses an artificial neural network (ANN) to learn information from two substructure-based molecular fingerprints (PubChem FP and MACCS FP) as well as a pharmacophore-based fingerprint (Pharmacophore ErG FP). On the other hand, the graph-based module of the FP-GNN architecture uses a spatial graph neural network (GNN) with an attention mechanism to capture structural information in molecular graphs. The GNN module encodes pre-defined atom and bond information into molecular graph data and passes information between neighboring atoms along the molecular graph structure, so that the information gradually propagates over the entire molecular graph. Meanwhile, the attention mechanism is used to update the nodes, focusing on the interactions between neighboring atoms and on the atoms that matter most for the target property during training. The information from all atoms in a molecule is then combined to predict its properties. Finally, the FP-GNN architecture employs fully connected layers (FCL) to fuse the features from the GNN and FPN paths and outputs the molecular property prediction results.</p>
<p>The FP-GNN deep learning algorithm (<ext-link ext-link-type="uri" xlink:href="https://github.com/idrugLab/FP-GNN">https://github.com/idrugLab/FP-GNN</ext-link>) that we developed is a general QSAR modeling method that can be used to build models for predicting the properties of molecules, including physicochemical properties, biological activity, and ADMET properties. Our lab reported that the FP-GNN model achieved the best predictive performance on 13 public datasets (covering biological activities, physicochemical properties, physiology, and toxicity properties), an unbiased LIT-PCBA dataset, and 14 phenotypic screening datasets for breast cell lines (<xref ref-type="bibr" rid="B7">Cai et al., 2022</xref>). Using the FP-GNN model, we also selected five compounds as cyclin-dependent kinase 9 (CDK9) inhibitors, which showed good anti-cancer activity against eight tumor cell lines in <italic>in vitro</italic> cell assays (<xref ref-type="bibr" rid="B43">Zhang et al., 2022</xref>). However, most datasets in drug discovery feature significant linkages between subtasks. If only a single-task model is used for modelling, the association information between subtasks is lost. Therefore, we developed the multi-task FP-GNN framework to retain this information across subtasks, and it was successfully used to accurately predict inhibitors of four poly ADP-ribose polymerase (PARP) isoforms (<xref ref-type="bibr" rid="B1">Ai et al., 2022</xref>). In this study, we extend the application of the multi-task FP-GNN method to predicting the inhibitory activity of molecules against five CYPs (1A2, 2C9, 2C19, 2D6, and 3A4, <xref ref-type="fig" rid="F1">Figure 1</xref>). 
Specifically, the multi-task FP-GNN uses a parameter-sharing multi-task learning approach, inherits the molecular graph and molecular fingerprints modules of the single-task FP-GNN model, and finally expands the fusion module into a multi-task output module (<xref ref-type="fig" rid="F1">Figure 1</xref>, middle). All subtasks share the weights of molecular graph and molecular fingerprint modules and extract common features of samples in subtasks. The multi-task output module of FP-GNN accepts the feature information from both GNN (molecular graph path) and ANN (fingerprints path), and then uses the data of different subtasks to optimize the weight of the network, and finally outputs the specific prediction results of different subtasks.</p>
<p>The binary cross-entropy loss function (BCELoss) is commonly used in binary classification tasks, where the goal is to predict a binary outcome (e.g., positive or negative). In multi-task learning, where several subtasks are predicted, the BCELoss is calculated for each subtask separately and the per-task losses are then averaged to obtain the overall loss of the multi-task model. The loss is expressed as follows:<disp-formula id="e1">
<mml:math id="m1">
<mml:mrow>
<mml:mfenced open="" close=")" separators="|">
<mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mfenced open="(" close="" separators="|">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mi mathvariant="italic">log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
<p>Here, <italic>n</italic> is the number of training molecules in each batch, <italic>Label</italic> is the true label of the molecule, and <italic>Pred</italic> is the probability predicted by the model for that molecule. For multi-task prediction, the loss is calculated for each subtask and the average over subtasks is taken as the total loss.</p>
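<p>As an illustration of Eq. 1 and of the averaging over subtasks, the following minimal sketch computes the loss in plain Python. It is not the implementation used in FP-GNN; the function names, the dictionary layout (one list of labels and predictions per isoform), and the clipping constant <italic>eps</italic> are assumptions made for this example.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def bce_loss(labels, preds, eps=1e-12):
    """Mean binary cross-entropy over one batch, following Eq. 1.

    labels: iterable of 0/1 ground-truth labels.
    preds:  iterable of predicted probabilities in [0, 1].
    eps:    assumed clipping constant that keeps log() finite.
    """
    total = 0.0
    count = 0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        count += 1
    return -total / count


def multitask_bce_loss(labels_by_task, preds_by_task):
    """Average of the per-task BCE losses.

    Both arguments map a task name (e.g. an isoform) to a list;
    this layout is an assumption of the sketch.
    """
    losses = [
        bce_loss(labels_by_task[task], preds_by_task[task])
        for task in labels_by_task
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    labels = {"CYP1A2": [1, 0, 1], "CYP3A4": [0, 0, 1]}
    preds = {"CYP1A2": [0.9, 0.2, 0.7], "CYP3A4": [0.1, 0.4, 0.8]}
    print(multitask_bce_loss(labels, preds))
```
]]></preformat>
<p>In this sketch each subtask contributes equally to the total loss regardless of how many labeled molecules it has, which matches the plain average described above; a compound without a label for a given isoform is simply omitted from that isoform's lists.</p>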
</sec>
<sec id="s2-3">
<title>2.3 The baseline machine learning and deep learning algorithms</title>
<p>We constructed fingerprint- and graph-based models (<xref ref-type="sec" rid="s10">Supplementary Table S1</xref>) to enable a fair comparison with the multi-task FP-GNN model in the CYPs inhibitors prediction tasks. Fingerprint-based prediction models were constructed based on the Morgan fingerprint (similar to ECFP, 1,024&#xa0;bits) using four CML algorithms, i.e., Naive Bayes (NB) (<xref ref-type="bibr" rid="B12">Duda and Hart, 1973</xref>), random forest (RF) (<xref ref-type="bibr" rid="B34">Svetnik et al., 2003</xref>), support vector machine (SVM) (<xref ref-type="bibr" rid="B42">Zernov et al., 2003</xref>), and extreme gradient boosting (XGBoost) (<xref ref-type="bibr" rid="B8">Chen and Guestrin, 2016</xref>), and one DL method, deep neural networks (DNN) (<xref ref-type="bibr" rid="B24">McCulloch and Pitts, 1943</xref>). Two DL algorithms were used to create graph-based prediction models, i.e., graph attention network (GAT) (<xref ref-type="bibr" rid="B38">Veli&#x10d;kovi&#x107; et al., 2018</xref>) and graph convolutional networks (GCN) (<xref ref-type="bibr" rid="B19">Kipf and Welling, 2017</xref>). A basic overview of these CML and DL techniques can be found elsewhere (<xref ref-type="bibr" rid="B40">Wu et al., 2017</xref>; <xref ref-type="bibr" rid="B15">He et al., 2021</xref>). All these CML and DL models, as well as the FP-GNN models presented here, were trained on a CPU (Intel(R) Xeon(R) Silver 4216 CPU @ 2.10&#xa0;GHz) and a GPU (NVIDIA Corporation GV100GL [Tesla V100 PCIe 32&#xa0;GB]). In addition, we compared the multi-task FP-GNN model to previously reported models, namely SuperCYPsPred (<xref ref-type="bibr" rid="B3">Banerjee et al., 2020</xref>) and iCYP-MFE (<xref ref-type="bibr" rid="B28">Nguyen-Vo et al., 2021</xref>).</p>
</sec>
<sec id="s2-4">
<title>2.4 Performance evaluation of models</title>
<p>The performance of the multi-task FP-GNN model and of the baseline CML and DL models was evaluated using the following four metrics: the area under the receiver operating characteristic curve (AUC), F1-measure (F1 score), Matthews correlation coefficient (MCC), and balanced accuracy (BA). Following common practice in evaluating classification models (<xref ref-type="bibr" rid="B29">Niijima et al., 2012</xref>; <xref ref-type="bibr" rid="B22">Li et al., 2020</xref>; <xref ref-type="bibr" rid="B17">Jiang et al., 2021</xref>; <xref ref-type="bibr" rid="B28">Nguyen-Vo et al., 2021</xref>), we also used the AUC value to optimize and select the best models. These metrics are defined as follows:<disp-formula id="e2">
<mml:math id="m2">
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
<disp-formula id="e3">
<mml:math id="m3">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mfenced open="(" close=")" separators="|">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
<disp-formula id="e4">
<mml:math id="m4">
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>R</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
<disp-formula id="e5">
<mml:math id="m5">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
<disp-formula id="e6">
<mml:math id="m6">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively, and SP, SE, TNR, and TPR denote specificity, sensitivity, true negative rate, and true positive rate, respectively.</p>
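<p>Equations 2 to 6 can be evaluated directly from the four confusion-matrix counts. The sketch below is a plain-Python illustration rather than the evaluation code used in this study; the function name, the returned dictionary, and the example counts are hypothetical, and the sketch assumes that both classes are present so that no denominator in SE or SP is zero.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute SE, SP, F1, BA and MCC from confusion-matrix counts.

    Assumes at least one actual positive and one actual negative,
    so that SE and SP are well defined.
    """
    se = tp / (tp + fn)               # sensitivity = TPR = recall (Eq. 6)
    sp = tn / (tn + fp)               # specificity = TNR (Eq. 5)
    f1 = 2 * tp / (2 * tp + fn + fp)  # F1 score (Eq. 2)
    ba = (se + sp) / 2                # balanced accuracy (Eq. 4)
    denom = math.sqrt(
        (tp + fn) * (tp + fp) * (tn + fn) * (tn + fp)
    )
    # MCC (Eq. 3); returning 0.0 for a zero denominator is a
    # convention assumed here, not something stated in the text.
    mcc = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "F1": f1, "BA": ba, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for a 100-compound test set.
    for name, value in classification_metrics(40, 45, 5, 10).items():
        print(f"{name}: {value:.3f}")
```
]]></preformat>
<p>AUC is not included because it depends on the ranking of predicted probabilities rather than on a single set of counts at one decision threshold.</p>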
</sec>
<sec id="s2-5">
<title>2.5 Model applicability domain</title>
<p>The applicability domain (AD) analysis helps us to figure out whether the built QSAR model can be applied to any set of compounds (<xref ref-type="bibr" rid="B30">Peter et al., 2019</xref>). For AD analysis, we used the Euclidean distance-based method (DM), which is based on structural similarity. Here is the detailed formula:<disp-formula id="e7">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>v</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>Z</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>where <italic>d</italic>
<sub>
<italic>ave</italic>
</sub> is the average Euclidean distance between each compound in the training set and its k nearest neighbors, <italic>&#x3b8;</italic> is the corresponding standard deviation, and <italic>Z</italic> is an adjustable parameter representing the significance level. First, RDKit software is used to calculate the Pharmacophore ErG, PubChem, and MACCS fingerprints of the training and test sets. Then, for each molecule in the training set, the Euclidean distances to its k nearest neighbors are computed, and <italic>d</italic>
<sub>
<italic>ave</italic>
</sub> and <italic>&#x3b8;</italic> are obtained from these distances. Finally, the Euclidean distance between each molecule in the test set and its nearest neighbor in the training set is determined. A compound is regarded as outside the domain (OD) if this distance exceeds the threshold <italic>D</italic>
<sub>
<italic>T</italic>
</sub>; otherwise, it is regarded as inside the domain (ID). We used the test set to select suitable values of k and Z and then computed the AD threshold of the model.</p>
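<p>The distance-based AD check can be sketched as follows. This is a hedged illustration under stated assumptions rather than the authors' code: the helper names (<monospace>ad_threshold</monospace>, <monospace>in_domain</monospace>) are hypothetical, the k-nearest-neighbor distances of all training compounds are pooled before taking the mean and the population standard deviation, and short toy vectors stand in for the real fingerprints.</p>
<preformat>
```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ad_threshold(train, k=3, z=0.2):
    """Compute D_T = d_ave + Z * theta from training-set k-NN distances.

    Assumption: distances to the k nearest neighbors of every training
    compound are pooled, then averaged (d_ave) and summarized by their
    population standard deviation (theta).
    """
    knn_distances = []
    for i, vec in enumerate(train):
        dists = sorted(
            euclidean(vec, other) for j, other in enumerate(train) if j != i
        )
        knn_distances.extend(dists[:k])
    d_ave = statistics.mean(knn_distances)
    theta = statistics.pstdev(knn_distances)
    return d_ave + z * theta


def in_domain(query, train, threshold):
    """True (ID) if the nearest training compound lies within the threshold."""
    nearest = min(euclidean(query, vec) for vec in train)
    return threshold >= nearest


# Toy 2-D "fingerprints" (hypothetical data).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = ad_threshold(train, k=2, z=0.5)
inside = in_domain((0.5, 0.5), train, threshold)
outside = in_domain((3.0, 3.0), train, threshold)
```
</preformat>
<p>In this toy case every pooled neighbor distance equals 1, so the standard deviation is zero and the threshold reduces to the mean distance; with real fingerprints, larger Z widens the domain.</p>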
</sec>
</sec>
<sec sec-type="results|discussion" id="s3">
<title>3 Results and discussion</title>
<sec id="s3-1">
<title>3.1 Datasets analysis and model construction</title>
<p>
<xref ref-type="fig" rid="F2">Figure 2</xref> confirms strong correlations between the datasets of the five isoforms used for modelling, as they share a large number of compounds. As shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, the compounds in the CYP modelling datasets span a wide range of molecular weight (32.042&#x2013;1701.206) and LogP (&#x2212;15.231 to 20.751), indicating that the modelling datasets cover a broad chemical space. Moreover, the distribution for each isoform was similar to that of the total dataset, indicating that the five CYP isoforms are closely related and suitable for multi-task modelling. Furthermore, as shown in <xref ref-type="fig" rid="F4">Figure 4</xref>, Bemis-Murcko scaffold (<xref ref-type="bibr" rid="B4">Bemis and Murcko, 1996</xref>) analysis revealed that the ratio of scaffolds to compounds in the modelling datasets ranged from 22.33% to 25.47%, indicating considerable structural diversity among the compounds for the five CYP isoforms.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>The data occupation distribution for the five isoforms in the CYPs modelling datasets.</p>
</caption>
<graphic xlink:href="fphar-14-1099093-g002.tif"/>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>The distribution of molecular chemical space of CYP1A2 <bold>(A)</bold>, CYP2C9 <bold>(B)</bold>, CYP2C19 <bold>(C)</bold>, CYP2D6 <bold>(D)</bold>, and CYP3A4 <bold>(E)</bold>. LogP (<italic>X-axis</italic>) and molecular weight (MW, <italic>Y-axis</italic>) were used to define chemical space. RDKit software was used to calculate MW and LogP.</p>
</caption>
<graphic xlink:href="fphar-14-1099093-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>The Bemis Murcko scaffold analysis of CYP1A2 (orange), CYP2C19 (green), CYP2C9 (purple), CYP2D6 (yellow), and CYP3A4 (blue) inhibitors. The five CYPs isoforms (<italic>X-axis</italic>) and the fraction of scaffolds in the modeling datasets (scaffolds/compounds (%), <italic>Y-axis</italic>) were used to determine structural diversity.</p>
</caption>
<graphic xlink:href="fphar-14-1099093-g004.tif"/>
</fig>
</sec>
<sec id="s3-2">
<title>3.2 Performance of the multi-task FP-GNN model on CYPs datasets</title>
<p>The multi-task FP-GNN model is compared with other advanced CML, DL, and previously reported models in <xref ref-type="table" rid="T1">Table 1</xref> and <xref ref-type="sec" rid="s10">Supplementary Tables S2&#x2013;S4</xref>. The multi-task FP-GNN model achieves the best overall performance on the five CYP isoforms, with the highest average AUC (0.905), F1 (0.779), BA (0.819), and MCC (0.647) values for the test sets. Specifically, taking the AUC value as the main evaluation metric, the multi-task FP-GNN model ranked first in four of the five subtypes (CYP1A2, CYP2C9, CYP2C19, and CYP3A4) and second on CYP2D6 (AUC &#x3d; 0.883). The single-task and multi-task iCYP-MFE models performed best on CYP2D6, while SuperCYPsPred based on Morgan fingerprints performed as well on CYP2C19 as the multi-task FP-GNN model. In addition, <xref ref-type="sec" rid="s10">Supplementary Tables S2&#x2013;S4</xref> indicate that the multi-task FP-GNN model also achieves the best performance on the other metrics. For example, in three of the five subtypes (CYP2C9, CYP2C19, and CYP3A4), it ranked first in terms of F1, BA, and MCC values. These results show that, compared with current advanced CML and DL models as well as existing multi-task models, the multi-task FP-GNN model presented here achieves state-of-the-art (SOTA) performance in predicting the inhibitory activity of compounds against the five CYP isoforms. Furthermore, no compound in the test set was mispredicted for all five targets (<xref ref-type="sec" rid="s10">Supplementary Figure S1</xref>). The optimal set of hyperparameters for each CYP isoform is provided in <xref ref-type="sec" rid="s10">Supplementary Table S5</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>AUC values of the FP-GNN models on the CYP datasets compared with those of the baseline models.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Model</th>
<th align="center">CYP1A2</th>
<th align="center">CYP2C9</th>
<th align="center">CYP2C19</th>
<th align="center">CYP2D6</th>
<th align="center">CYP3A4</th>
<th align="center">AVE</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">SuperCYP-MACCS<sup>31</sup>
</td>
<td align="center">0.820</td>
<td align="center">0.790</td>
<td align="center">0.880</td>
<td align="center">0.880</td>
<td align="center">0.870</td>
<td align="center">0.848</td>
</tr>
<tr>
<td align="center">SuperCYP-Morgan<sup>31</sup>
</td>
<td align="center">0.830</td>
<td align="center">0.870</td>
<td align="center">
<bold>0.900</bold>
</td>
<td align="center">0.880</td>
<td align="center">0.880</td>
<td align="center">0.872</td>
</tr>
<tr>
<td align="center">iCYP-MFE (single)<sup>19</sup>
</td>
<td align="center">0.900</td>
<td align="center">0.850</td>
<td align="center">0.860</td>
<td align="center">
<bold>0.930</bold>
</td>
<td align="center">0.880</td>
<td align="center">0.884</td>
</tr>
<tr>
<td align="center">iCYP-MFE (multi)<sup>19</sup>
</td>
<td align="center">0.910</td>
<td align="center">0.890</td>
<td align="center">0.860</td>
<td align="center">
<bold>0.930</bold>
</td>
<td align="center">0.890</td>
<td align="center">0.896</td>
</tr>
<tr>
<td align="center">DNN::Morgan</td>
<td align="center">0.904</td>
<td align="center">0.878</td>
<td align="center">0.887</td>
<td align="center">0.848</td>
<td align="center">0.883</td>
<td align="center">0.880</td>
</tr>
<tr>
<td align="center">RF::Morgan</td>
<td align="center">0.910</td>
<td align="center">0.891</td>
<td align="center">0.881</td>
<td align="center">0.867</td>
<td align="center">0.891</td>
<td align="center">0.888</td>
</tr>
<tr>
<td align="center">SVM::Morgan</td>
<td align="center">0.909</td>
<td align="center">0.856</td>
<td align="center">0.898</td>
<td align="center">0.838</td>
<td align="center">0.884</td>
<td align="center">0.877</td>
</tr>
<tr>
<td align="center">NB::Morgan</td>
<td align="center">0.848</td>
<td align="center">0.826</td>
<td align="center">0.822</td>
<td align="center">0.816</td>
<td align="center">0.829</td>
<td align="center">0.828</td>
</tr>
<tr>
<td align="center">GCN</td>
<td align="center">0.921</td>
<td align="center">0.860</td>
<td align="center">0.886</td>
<td align="center">0.875</td>
<td align="center">0.900</td>
<td align="center">0.889</td>
</tr>
<tr>
<td align="center">XGB::Morgan</td>
<td align="center">0.888</td>
<td align="center">0.857</td>
<td align="center">0.868</td>
<td align="center">0.842</td>
<td align="center">0.864</td>
<td align="center">0.864</td>
</tr>
<tr>
<td align="center">GAT</td>
<td align="center">0.928</td>
<td align="center">0.888</td>
<td align="center">0.885</td>
<td align="center">0.861</td>
<td align="center">0.896</td>
<td align="center">0.891</td>
</tr>
<tr>
<td align="center">FP-GNN (single)</td>
<td align="center">0.928</td>
<td align="center">0.893</td>
<td align="center">0.879</td>
<td align="center">0.881</td>
<td align="center">0.907</td>
<td align="center">0.897</td>
</tr>
<tr>
<td align="center">FP-GNN (multi)</td>
<td align="center">
<bold>0.930</bold>
</td>
<td align="center">
<bold>0.902</bold>
</td>
<td align="center">
<bold>0.900</bold>
</td>
<td align="center">0.883</td>
<td align="center">
<bold>0.911</bold>
</td>
<td align="center">
<bold>0.905</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Bold font indicates the best value in each column.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>In addition to the multi-task FP-GNN model, the single-task FP-GNN model also performs well, achieving the second-best overall predictive performance on the CYP modelling datasets, with average AUC (0.897), F1 (0.773), BA (0.812), and MCC (0.631) values. Specifically, the single-task FP-GNN model performed best on three CYP isoforms (CYP1A2, CYP2C9, and CYP3A4) compared with the other CML and DL models and the existing multi-task models. Thus, even without the multi-task module, the FP-GNN model showed strong predictive performance on the five CYP isoforms, supporting the effectiveness of the FP-GNN DL algorithm.</p>
<p>Although the FP-GNN model showed strong predictive performance in both single-task and multi-task settings, the multi-task FP-GNN model outperformed the single-task FP-GNN model in the CYP inhibitor prediction task. The datasets of the five CYP isoforms are highly correlated (<xref ref-type="fig" rid="F1">Figure 1</xref>), and the multi-task FP-GNN model can capture relevant information shared among the subtasks, thereby improving the performance of the model.</p>
<p>Y-scrambling testing was used to demonstrate that the results were not attributable to chance correlation. As illustrated in <xref ref-type="sec" rid="s10">Supplementary Figure S2</xref>, the AUC values of the multi-task FP-GNN model were significantly higher than those of any of the Y-scrambled models, confirming that the results did not arise from chance correlations.</p>
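<p>The idea behind Y-scrambling can be sketched as follows. This is only an illustration under loud assumptions: a toy one-dimensional centroid model (<monospace>fit_centroid_model</monospace>, hypothetical) stands in for FP-GNN, the data are invented, and scrambled models are scored against the true test labels; the exact protocol used in the study may differ.</p>
<preformat>
```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_centroid_model(xs, ys):
    """Toy stand-in model: orient the score toward the positive-class mean."""
    pos_mean = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    neg_mean = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    direction = 1.0 if pos_mean >= neg_mean else -1.0
    return lambda x: direction * x


def y_scramble(train_x, train_y, test_x, test_y, n_rounds=20, seed=0):
    """Return the real AUC and the AUCs of models trained on shuffled labels."""
    rng = random.Random(seed)
    model = fit_centroid_model(train_x, train_y)
    real_auc = auc(test_y, [model(x) for x in test_x])
    scrambled_aucs = []
    for _ in range(n_rounds):
        shuffled = train_y[:]
        rng.shuffle(shuffled)  # break the link between structures and labels
        scrambled_model = fit_centroid_model(train_x, shuffled)
        scrambled_aucs.append(auc(test_y, [scrambled_model(x) for x in test_x]))
    return real_auc, scrambled_aucs


# Hypothetical, well-separated toy data.
train_x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [0.15, 0.35, 0.65, 0.85]
test_y = [0, 0, 1, 1]
real_auc, scrambled_aucs = y_scramble(train_x, train_y, test_x, test_y)
```
</preformat>
<p>A model that has learned a genuine structure-activity relationship is expected to score clearly above the distribution of scrambled-label AUCs.</p>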
</sec>
<sec id="s3-3">
<title>3.3 Model applicability domain</title>
<p>The numbers of compounds outside the AD in the test sets at different Z and k values are shown in <xref ref-type="sec" rid="s10">Supplementary Table S6</xref>. When Z increased and k remained constant, the number of compounds outside the AD decreased. The multi-task FP-GNN model was then used to predict the ID and OD compounds in the test sets at various k and Z values, and the detailed performance on each dataset is presented in <xref ref-type="sec" rid="s10">Supplementary Table S7</xref>. We found that with k &#x3d; 3 and <italic>Z</italic> &#x3d; 0.2, the overall evaluation metrics of the model improved, and the AD discriminated between ID and OD compounds of the CYP datasets to the greatest extent. The predictive performance on ID compounds (AUC &#x3d; 0.920, F1 &#x3d; 0.804, BA &#x3d; 0.842, and MCC &#x3d; 0.689) was substantially better than that on OD compounds (AUC &#x3d; 0.852, F1 &#x3d; 0.692, BA &#x3d; 0.748, and MCC &#x3d; 0.510). These results show that the defined AD is appropriate for the proposed multi-task FP-GNN model and may help the model to be applied more reliably in real-world settings.</p>
</sec>
<sec id="s3-4">
<title>3.4 Interpretation of the multi-task FP-GNN model</title>
<p>To understand how the multi-task FP-GNN model predicts CYP inhibitors, we interpreted its GNN and FPN modules. Taking an active molecule (miconazole, CHEMBL91, <xref ref-type="fig" rid="F5">Figure 5A</xref>) and an inactive molecule (<xref ref-type="fig" rid="F5">Figure 5B</xref>) as examples, the multi-task FP-GNN architecture calculates the attention coefficients of neighboring atoms and maps them to the bonds that connect them. Chemical fragments with higher attention coefficients contribute more to the prediction of CYP inhibitory activity. In other words, the more darkly colored portions of the molecule were more important in predicting whether the molecule can inhibit CYPs, and <italic>vice versa</italic>.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>The importance of molecular structures during the prediction process of the GNN module of the multi-task FP-GNN model on the CYPs dataset. The darker the color, the more important the structure. <bold>(A)</bold> An active molecule on the five isoforms. <bold>(B)</bold> An inactive molecule on the five isoforms.</p>
</caption>
<graphic xlink:href="fphar-14-1099093-g005.tif"/>
</fig>
<p>In addition to the GNN module, we also investigated the interpretation of the FPN module on the CYP modelling datasets. <xref ref-type="table" rid="T2">Table 2</xref> summarizes the ten most important bits, which represent structural fragments or pharmacophore features that contribute strongly to the inhibitory activity against CYPs. Collectively, these fragments may facilitate the design and optimization of novel CYP inhibitors.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>The top ten significant bits from the FPN module of the multi-task FP-GNN model on the CYPs datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Rank</th>
<th align="center">Importance</th>
<th align="center">Mixed FP Bit</th>
<th align="center">FP Class</th>
<th align="center">Meaning</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">1</td>
<td align="center">0.00210</td>
<td align="center">35</td>
<td align="center">MACCS</td>
<td align="center">CH2 &#x3d; A</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">0.00184</td>
<td align="center">31</td>
<td align="center">MACCS</td>
<td align="center">CQ(C) (C)A</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">0.00180</td>
<td align="center">489</td>
<td align="center">Pharmacophore ErG</td>
<td align="center">(&#x2018;Negative&#x2019;, &#x2018;Negative&#x2019;, 7)</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">0.00177</td>
<td align="center">493</td>
<td align="center">Pharmacophore ErG</td>
<td align="center">(&#x2018;Negative&#x2019;, &#x2018;Negative&#x2019;, 11)</td>
</tr>
<tr>
<td align="center">5</td>
<td align="center">0.00173</td>
<td align="center">640</td>
<td align="center">PubChem</td>
<td align="center">&#x3e;&#x3d; 2&#xa0;P</td>
</tr>
<tr>
<td align="center">6</td>
<td align="center">0.00164</td>
<td align="center">369</td>
<td align="center">Pharmacophore ErG</td>
<td align="center">(&#x2018;Acceptor&#x2019;, &#x2018;Hydrophobic&#x2019;, 13)</td>
</tr>
<tr>
<td align="center">7</td>
<td align="center">0.00153</td>
<td align="center">956</td>
<td align="center">PubChem</td>
<td align="center">C (&#x223c;C) (&#x223c;H) (&#x223c;O) (&#x223c;O)</td>
</tr>
<tr>
<td align="center">8</td>
<td align="center">0.00139</td>
<td align="center">371</td>
<td align="center">Pharmacophore ErG</td>
<td align="center">(&#x2018;Acceptor&#x2019;, &#x2018;Hydrophobic&#x2019;, 15)</td>
</tr>
<tr>
<td align="center">9</td>
<td align="center">0.00122</td>
<td align="center">612</td>
<td align="center">PubChem</td>
<td align="center">&#x3e;&#x3d; 32&#xa0;H</td>
</tr>
<tr>
<td align="center">10</td>
<td align="center">0.00119</td>
<td align="center">32</td>
<td align="center">MACCS</td>
<td align="center">QX</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Q: any atom other than C or H.</p>
</fn>
<fn>
<p>X: any atom other than H, C, N, O, Si, P, S, F, Cl, Br, I.</p>
</fn>
<fn>
<p>A: any valid periodic table element symbol.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3-5">
<title>3.5 Webserver construction and use</title>
<p>DEEPCYPs, an online platform for the prediction of cytochrome activity, was constructed based on the established multi-task FP-GNN model. Users can draw a structure online or input or upload structures in SMILES format to conveniently predict the inhibitory activity and selectivity of molecules against the five major CYP isoforms (i.e., CYP1A2, CYP2C19, CYP2C9, CYP2D6, and CYP3A4) (<xref ref-type="fig" rid="F6">Figure 6A</xref>, left). Existing machine learning-based predictive models, including our multi-task FP-GNN model, are classification models that can only assess the likelihood that a compound of interest inhibits CYPs (i.e., a probability score of 0&#x2013;1). A threshold of 0.5 is used to determine whether or not a molecule inhibits a CYP. Based on the predicted score, DEEPCYPs can be used to assess the relative inhibitory potential of compounds against specific CYP subtypes. The higher the score, the more likely it is that the subtype will be inhibited.</p>
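<p>The way a per-isoform probability score is turned into a call can be sketched as follows. This is a minimal illustration, not the webserver's code: the function names and the example scores are hypothetical, and treating a score of exactly 0.5 as an inhibitor is an assumption, since the text states only that 0.5 is the threshold.</p>
<preformat>
```python
THRESHOLD = 0.5  # cutoff stated in the text


def classify(scores, threshold=THRESHOLD):
    """Map per-isoform probability scores to inhibitor / non-inhibitor calls.

    Assumption: a score equal to the threshold counts as an inhibitor.
    """
    return {
        isoform: ("inhibitor" if score >= threshold else "non-inhibitor")
        for isoform, score in scores.items()
    }


def rank_isoforms(scores):
    """Order isoforms from most to least likely to be inhibited."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores for one molecule (not actual webserver output).
scores = {
    "CYP1A2": 0.91,
    "CYP2C9": 0.12,
    "CYP2C19": 0.48,
    "CYP2D6": 0.05,
    "CYP3A4": 0.63,
}
calls = classify(scores)
ranking = rank_isoforms(scores)
```
</preformat>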
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>
<bold>(A)</bold> Represents the bioactivity prediction diagram of the DEEPCYPs. <bold>(B)</bold> Represents a case result display of the model applicability domain module of the DEEPCYPs. The chemical structure of thiabendazole is used as an example.</p>
</caption>
<graphic xlink:href="fphar-14-1099093-g006.tif"/>
</fig>
<p><xref ref-type="bibr" rid="B3">Banerjee et al. (2020)</xref> constructed SuperCYP using the RF model with two types of molecular fingerprints (Morgan and MACCS). iCYP-MFE (<xref ref-type="bibr" rid="B28">Nguyen-Vo et al., 2021</xref>) was developed using multi-task convolutional neural networks combined with molecular fingerprint embedding features. The dataset used to develop DEEPCYPs is larger than that of SuperCYP, which is important for improving model performance. Compared with SuperCYP and iCYP-MFE, DEEPCYPs not only learns effective and complementary information from molecular graphs and molecular fingerprints but also learns interdependent information across the related datasets, which helps improve the prediction accuracy of the model. Furthermore, unlike SuperCYP and iCYP-MFE, DEEPCYPs lets users obtain the distance between the input molecule and the k-nearest training instances through its Model Applicability Domain module (<xref ref-type="fig" rid="F6">Figure 6B</xref>). A detailed comparison of DEEPCYPs with advanced existing models such as SuperCYP and iCYP-MFE is shown in <xref ref-type="table" rid="T3">Table 3</xref>. Overall, DEEPCYPs shows advantages in terms of accuracy, functionality, and ease of use.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>The detailed comparison of DEEPCYPs and the most advanced existing models (SuperCYP and iCYP-MFE).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="center">Model</th>
<th colspan="3" align="center">Average values</th>
<th rowspan="2" align="center">Interpretation</th>
<th rowspan="2" align="center">Model applicability domain</th>
<th rowspan="2" align="center">Webserver</th>
</tr>
<tr>
<th align="center">AUC<sup>a</sup>
</th>
<th align="center">F1<sup>b</sup>
</th>
<th align="center">BA<sup>c</sup>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3" align="center">DEEPCYPs (ours)</td>
<td rowspan="3" align="center">
<bold>0.905</bold>
</td>
<td rowspan="3" align="center">
<bold>0.779</bold>
</td>
<td rowspan="3" align="center">
<bold>0.819</bold>
</td>
<td rowspan="3" align="center">
<bold>Yes</bold>
</td>
<td rowspan="3" align="center">
<bold>Yes</bold>
</td>
<td align="left">&#x25a0; Input format: draw a structure online, input or upload structures in SMILES.</td>
</tr>
<tr>
<td align="left">&#x25a0; Prediction: supports both single-molecule and batch prediction</td>
</tr>
<tr>
<td align="left">&#x25a0; Model Applicability Domain: users can get the distance between the input molecule and the k-nearest training instances</td>
</tr>
<tr>
<td align="center">iCYP-MFE (multi)</td>
<td align="center">0.896</td>
<td align="center">0.754</td>
<td align="center">0.796</td>
<td align="center">No</td>
<td align="center">No</td>
<td align="center">Yes, but not working</td>
</tr>
<tr>
<td align="center">SuperCYP-MACCS</td>
<td align="center">0.848</td>
<td align="center">0.572</td>
<td align="center">0.710</td>
<td rowspan="2" align="center">No</td>
<td rowspan="2" align="center">No</td>
<td align="center">&#x25a0; Input format: input structures in SMILES.</td>
</tr>
<tr>
<td align="center">SuperCYP-Morgan</td>
<td align="center">0.872</td>
<td align="center">0.538</td>
<td align="center">0.704</td>
<td align="center">&#x25a0; Prediction: single-molecule prediction</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>
<sup>a</sup>AUC: area under the receiver operating characteristic curve.</p>
</fn>
<fn>
<p>
<sup>b</sup>F1: F1-measure.</p>
</fn>
<fn>
<p>
<sup>c</sup>BA: balanced accuracy.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>To validate the prediction performance of the webserver, we chose molecules within the dataset (thiabendazole and fenofibrate) and molecules outside the dataset (quinidine and telithromycin) that have been reported to be CYP inhibitors. Taking thiabendazole as an example (<xref ref-type="fig" rid="F6">Figure 6A</xref>, right), it has a predicted score of 0.998 in the CYP1A2 model, indicating a strong inhibitory effect on the CYP1A2 isoform. Indeed, thiabendazole is an effective and specific inhibitor of CYP1A2 (IC<sub>50</sub> &#x3d; 0.830&#xa0;&#x3bc;M) (<xref ref-type="bibr" rid="B36">Thelingwani et al., 2009</xref>), which supports the accuracy and usability of the DEEPCYPs webserver. In addition, the bioactivity prediction results for quinidine (a potent CYP2D6 inhibitor, IC<sub>50</sub> &#x3d; 0.156&#xa0;&#x3bc;M) (<xref ref-type="bibr" rid="B25">McLaughlin et al., 2005</xref>; <xref ref-type="bibr" rid="B18">Kang et al., 2019</xref>), telithromycin (a potent CYP3A4 inhibitor, IC<sub>50</sub> &#x3d; 11.800&#xa0;&#x3bc;M) (<xref ref-type="bibr" rid="B21">Li et al., 2019</xref>), and fenofibrate (an effective CYP2C19/2C9 inhibitor; CYP2C19, IC<sub>50</sub> &#x3d; 0.200&#xa0;&#x3bc;M; CYP2C9, IC<sub>50</sub> &#x3d; 9.700&#xa0;&#x3bc;M) (<xref ref-type="bibr" rid="B32">Schelleman et al., 2014</xref>) from DEEPCYPs are shown in <xref ref-type="sec" rid="s10">Supplementary Figure S3</xref>. The predicted results are generally consistent with the reported inhibitory effects of these drugs, indicating that the DEEPCYPs webserver can predict not only whether a compound inhibits an individual CYP450 isoform but also whether it is selective for dual CYP450 isoforms. However, DEEPCYPs provides predictions only, and these predictions are not guaranteed to be correct; they should be combined with experiments for further verification. The value of predictive models is that large-scale compound libraries can be evaluated quickly, and the models can indicate which chemical fragments are more likely to produce CYP inhibitors, which can help optimize subsequent lead compounds.</p>
</sec>
</sec>
<sec sec-type="conclusion" id="s4">
<title>4 Conclusion</title>
<p>The multi-task FP-GNN model was used to predict CYP inhibitors and outperformed the baseline models, including Morgan fingerprint-based ML models (i.e., NB, RF, SVM, XGBoost, and DNN), graph-based DL models (i.e., GAT and GCN), and previously reported models (i.e., SuperCYP and iCYP-MFE). Based on the multi-task FP-GNN model, we therefore constructed DEEPCYPs, a user-friendly webserver for predicting the inhibitory activity of molecules against the five CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4). We anticipate that DEEPCYPs and its Python software can support the scientific community in prioritizing molecules in drug discovery practice and/or identifying CYP inhibitors.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s10">Supplementary Material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s6">
<title>Author contributions</title>
<p>LW conceived and designed the experiments. DA, HC, JW, DZ, and YC contributed to the literature search, data collection, implemented the algorithm, and created the web-server. DA performed the analysis and wrote the manuscript. LW offered support and critically revised the manuscript. All authors have read and agreed to the published version of the manuscript.</p>
</sec>
<sec id="s7">
<title>Funding</title>
<p>This work was supported in part by the Natural Science Foundation of Guangdong Province (2023B1515020042) and the National Natural Science Foundation of China (81973241).</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s10">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fphar.2023.1099093/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fphar.2023.1099093/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="Table1.DOCX" id="SM1" mimetype="application/DOCX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ai</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>A multi-task FP-GNN framework enables accurate prediction of selective PARP inhibitors</article-title>. <source>Front. Pharmacol.</source> <volume>13</volume>, <fpage>971369</fpage>. <pub-id pub-id-type="doi">10.3389/fphar.2022.971369</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arimoto</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Computational models for predicting interactions with cytochrome p450 enzyme</article-title>. <source>Curr. Top. Med. Chem.</source> <volume>6</volume>, <fpage>1609</fpage>&#x2013;<lpage>1618</lpage>. <pub-id pub-id-type="doi">10.2174/156802606778108951</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Banerjee</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Dunkel</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kemmler</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Preissner</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>SuperCYPsPred&#x2014;A web server for the prediction of cytochrome activity</article-title>. <source>Nucleic Acids Res.</source> <volume>48</volume>, <fpage>W580</fpage>&#x2013;<lpage>W585</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkaa166</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bemis</surname>
<given-names>G. W.</given-names>
</name>
<name>
<surname>Murcko</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>1996</year>). <article-title>The properties of known drugs. 1. Molecular frameworks</article-title>. <source>J. Med. Chem.</source> <volume>39</volume>, <fpage>2887</fpage>&#x2013;<lpage>2893</lpage>. <pub-id pub-id-type="doi">10.1021/jm9602928</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boji&#x107;</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kond&#x17e;a</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rimac</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Benkovi&#x107;</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Male&#x161;</surname>
<given-names>&#x17d;.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>The effect of flavonoid aglycones on the CYP1A2, CYP2A6, CYP2C8 and CYP2D6 enzymes activity</article-title>. <source>Molecules</source> <volume>24</volume>, <fpage>3174</fpage>. <pub-id pub-id-type="doi">10.3390/molecules24173174</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Botton</surname>
<given-names>M. R.</given-names>
</name>
<name>
<surname>Whirl&#x2010;Carrillo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Del Tredici</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Sangkuhl</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Cavallari</surname>
<given-names>L. H.</given-names>
</name>
<name>
<surname>Ag&#xfa;ndez</surname>
<given-names>J. A. G.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>PharmVar GeneFocus: CYP2C19</article-title>. <source>Clin. Pharmacol. Ther.</source> <volume>109</volume>, <fpage>352</fpage>&#x2013;<lpage>366</lpage>. <pub-id pub-id-type="doi">10.1002/cpt.1973</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cai</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>FP-GNN: A versatile deep learning architecture for enhanced molecular property prediction</article-title>. <source>Brief. Bioinform.</source> <volume>23</volume>, <fpage>bbac408</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbac408</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Guestrin</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>XGBoost: A scalable tree boosting system</article-title>,&#x201d; in <source>Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source> (<publisher-loc>San Francisco California USA</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>785</fpage>&#x2013;<lpage>794</lpage>. <pub-id pub-id-type="doi">10.1145/2939672.2939785</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tolleson</surname>
<given-names>W. H.</given-names>
</name>
<name>
<surname>Knox</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>The expression, induction and pharmacological activity of CYP1A2 are post-transcriptionally regulated by microRNA hsa-miR-132-5p</article-title>. <source>Biochem. Pharmacol.</source> <volume>145</volume>, <fpage>178</fpage>&#x2013;<lpage>191</lpage>. <pub-id pub-id-type="doi">10.1016/j.bcp.2017.08.012</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheng</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Classification of cytochrome P450 inhibitors and noninhibitors using combined classifiers</article-title>. <source>J. Chem. Inf. Model.</source> <volume>51</volume>, <fpage>996</fpage>&#x2013;<lpage>1011</lpage>. <pub-id pub-id-type="doi">10.1021/ci200028n</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Daly</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Rettie</surname>
<given-names>A. E.</given-names>
</name>
<name>
<surname>Fowler</surname>
<given-names>D. M.</given-names>
</name>
<name>
<surname>Miners</surname>
<given-names>J. O.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Pharmacogenomics of CYP2C9: Functional and clinical considerations</article-title>. <source>J. Pers. Med.</source> <volume>8</volume>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.3390/jpm8010001</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Duda</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Hart</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1973</year>). <source>Pattern classification and scene analysis</source>. <publisher-name>A Wiley-Interscience publication</publisher-name>. <pub-id pub-id-type="doi">10.2307/2286028</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goldwaser</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Laurent</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Lagarde</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Fabrega</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Nay</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Villoutreix</surname>
<given-names>B. O.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Machine learning-driven identification of drugs inhibiting cytochrome P450 2C9</article-title>. <source>PLOS Comput. Biol.</source> <volume>18</volume>, <fpage>e1009820</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1009820</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Graham</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>Peterson</surname>
<given-names>J. A.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>How similar are P450s and what can their differences teach us?</article-title> <source>Arch. Biochem. Biophys.</source> <volume>369</volume>, <fpage>24</fpage>&#x2013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1006/abbi.1999.1350</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ling</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Machine learning enables accurate and rapid prediction of active molecules against breast cancer cells</article-title>. <source>Front. Pharmacol.</source> <volume>12</volume>, <fpage>796534</fpage>. <pub-id pub-id-type="doi">10.3389/fphar.2021.796534</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ho</surname>
<given-names>T. B.</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Thai</surname>
<given-names>D. T.</given-names>
</name>
<name>
<surname>Taewijit</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Data-driven approach to detect and predict adverse drug reactions</article-title>. <source>Curr. Pharm. Des.</source> <volume>22</volume>, <fpage>3498</fpage>&#x2013;<lpage>3526</lpage>. <pub-id pub-id-type="doi">10.2174/1381612822666160509125047</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Hsieh</surname>
<given-names>C. Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models</article-title>. <source>J. Cheminformatics</source> <volume>13</volume>, <fpage>12</fpage>. <pub-id pub-id-type="doi">10.1186/s13321-020-00479-8</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ginex</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Luque</surname>
<given-names>F. J.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Identification of dihydrofuro[3,4-d]pyrimidine derivatives as novel HIV-1 non-nucleoside reverse transcriptase inhibitors with promising antiviral activities and desirable physicochemical properties</article-title>. <source>J. Med. Chem.</source> <volume>62</volume>, <fpage>1484</fpage>&#x2013;<lpage>1501</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jmedchem.8b01656</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kipf</surname>
<given-names>T. N.</given-names>
</name>
<name>
<surname>Welling</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Semi-supervised classification with graph convolutional networks</article-title>. <comment>arXiv</comment>. <pub-id pub-id-type="doi">10.48550/arXiv.1609.02907</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Systems Pharmacology: A unified framework for prediction of drug-target interactions</article-title>. <source>Curr. Pharm. Des.</source> <volume>22</volume>, <fpage>3569</fpage>&#x2013;<lpage>3575</lpage>. <pub-id pub-id-type="doi">10.2174/1381612822666160418121534</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>X. M.</given-names>
</name>
<name>
<surname>Lv</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>S. Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y. X.</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>B. Z.</given-names>
</name>
<name>
<surname>Cushman</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Synthesis and structure-bactericidal activity relationships of non-ketolides: 9-Oxime clarithromycin 11,12-cyclic carbonate featured with three-to eight-atom-length spacers at 3-OH</article-title>. <source>Eur. J. Med. Chem.</source> <volume>171</volume>, <fpage>235</fpage>&#x2013;<lpage>254</lpage>. <pub-id pub-id-type="doi">10.1016/j.ejmech.2019.03.037</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Deep learning enhancing kinome-wide polypharmacology profiling: Model construction and experiment validation</article-title>. <source>J. Med. Chem.</source> <volume>63</volume>, <fpage>8723</fpage>&#x2013;<lpage>8737</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jmedchem.9b00855</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lai</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Pei</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Prediction of human cytochrome P450 inhibition using a multitask deep autoencoder neural network</article-title>. <source>Mol. Pharm.</source> <volume>15</volume>, <fpage>4336</fpage>&#x2013;<lpage>4345</lpage>. <pub-id pub-id-type="doi">10.1021/acs.molpharmaceut.8b00110</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCulloch</surname>
<given-names>W. S.</given-names>
</name>
<name>
<surname>Pitts</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>1943</year>). <article-title>A logical calculus of the ideas immanent in nervous activity</article-title>. <source>Bull. Math. Biophys.</source> <volume>5</volume>, <fpage>115</fpage>&#x2013;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.1007/BF02478259</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McLaughlin</surname>
<given-names>L. A.</given-names>
</name>
<name>
<surname>Paine</surname>
<given-names>M. J. I.</given-names>
</name>
<name>
<surname>Kemp</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Mar&#xe9;chal</surname>
<given-names>J. D.</given-names>
</name>
<name>
<surname>Flanagan</surname>
<given-names>J. U.</given-names>
</name>
<name>
<surname>Ward</surname>
<given-names>C. J.</given-names>
</name>
<etal/>
</person-group> (<year>2005</year>). <article-title>Why is quinidine an inhibitor of cytochrome P450 2D6? The role of key active-site residues in quinidine binding</article-title>. <source>J. Biol. Chem.</source> <volume>280</volume>, <fpage>38617</fpage>&#x2013;<lpage>38624</lpage>. <pub-id pub-id-type="doi">10.1074/jbc.M505974200</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miguel</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Albuquerque</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Drug interaction in psycho-oncology: Antidepressants and antineoplastics</article-title>. <source>Pharmacology</source> <volume>88</volume>, <fpage>333</fpage>&#x2013;<lpage>339</lpage>. <pub-id pub-id-type="doi">10.1159/000334738</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neve</surname>
<given-names>E. P. A.</given-names>
</name>
<name>
<surname>Ingelman-Sundberg</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Cytochrome P450 proteins: Retention and distribution from the endoplasmic reticulum</article-title>. <source>Curr. Opin. Drug Discov. Devel.</source> <volume>13</volume>, <fpage>78</fpage>&#x2013;<lpage>85</lpage>.</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nguyen-Vo</surname>
<given-names>T. H.</given-names>
</name>
<name>
<surname>Trinh</surname>
<given-names>Q. H.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Nguyen-Hoang</surname>
<given-names>P. U.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>T. N.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>D. T.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>iCYP-MFE: Identifying human cytochrome P450 inhibitors using multitask learning and molecular fingerprint-embedded encoding</article-title>. <source>J. Chem. Inf. Model.</source> <volume>62</volume>, <fpage>5059</fpage>&#x2013;<lpage>5068</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.1c00628</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Niijima</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Shiraishi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Okuno</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Dissecting kinase profiling data to predict activity and understand cross-reactivity of kinase inhibitors</article-title>. <source>J. Chem. Inf. Model.</source> <volume>52</volume>, <fpage>901</fpage>&#x2013;<lpage>912</lpage>. <pub-id pub-id-type="doi">10.1021/ci200607f</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Peter</surname>
<given-names>S. C.</given-names>
</name>
<name>
<surname>Dhanjal</surname>
<given-names>J. K.</given-names>
</name>
<name>
<surname>Malik</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Radhakrishnan</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Jayakanthan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sundar</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Quantitative structure-activity relationship (QSAR): Modeling approaches to biological applications</article-title>,&#x201d; in <source>Encyclopedia of bioinformatics and computational biology</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Ranganathan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Gribskov</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nakai</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Sch&#xf6;nbach</surname>
<given-names>C.</given-names>
</name>
</person-group> (<publisher-loc>Oxford</publisher-loc>: <publisher-name>Academic Press</publisher-name>), <fpage>661</fpage>&#x2013;<lpage>676</lpage>. <pub-id pub-id-type="doi">10.1016/B978-0-12-809633-8.20197-0</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Redlich</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Zanger</surname>
<given-names>U. M.</given-names>
</name>
<name>
<surname>Riedmaier</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bache</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Giessing</surname>
<given-names>A. B. M.</given-names>
</name>
<name>
<surname>Eisenacher</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2008</year>). <article-title>Distinction between human cytochrome P450 (CYP) isoforms and identification of new phosphorylation sites by mass spectrometry</article-title>. <source>J. Proteome Res.</source> <volume>7</volume>, <fpage>4678</fpage>&#x2013;<lpage>4688</lpage>. <pub-id pub-id-type="doi">10.1021/pr800231w</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schelleman</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Brensinger</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Quinney</surname>
<given-names>S. K.</given-names>
</name>
<name>
<surname>Bilker</surname>
<given-names>W. B.</given-names>
</name>
<name>
<surname>Flockhart</surname>
<given-names>D. A.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Pharmacoepidemiologic and <italic>in vitro</italic> evaluation of potential drug-drug interactions of sulfonylureas with fibrates and statins</article-title>. <source>Br. J. Clin. Pharmacol.</source> <volume>78</volume>, <fpage>639</fpage>&#x2013;<lpage>648</lpage>. <pub-id pub-id-type="doi">10.1111/bcp.12353</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Veith</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Austin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Predictive models for cytochrome P450 isozymes based on quantitative high throughput screening data</article-title>. <source>J. Chem. Inf. Model.</source> <volume>51</volume>, <fpage>2474</fpage>&#x2013;<lpage>2481</lpage>. <pub-id pub-id-type="doi">10.1021/ci200311w</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Svetnik</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Liaw</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tong</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Culberson</surname>
<given-names>J. C.</given-names>
</name>
<name>
<surname>Sheridan</surname>
<given-names>R. P.</given-names>
</name>
<name>
<surname>Feuston</surname>
<given-names>B. P.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Random forest: A classification and regression tool for compound classification and QSAR modeling</article-title>. <source>J. Chem. Inf. Comput. Sci.</source> <volume>43</volume>, <fpage>1947</fpage>&#x2013;<lpage>1958</lpage>. <pub-id pub-id-type="doi">10.1021/ci034160g</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tateishi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Miyazu</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kurinami</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ieiri</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Hirakawa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Watanabe</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Hypoglycemia possibly caused by CYP2C9-mediated drug interaction in combination with bucolome: A case report</article-title>. <source>J. Pharm. Health Care Sci.</source> <volume>7</volume>, <fpage>39</fpage>. <pub-id pub-id-type="doi">10.1186/s40780-021-00221-y</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thelingwani</surname>
<given-names>R. S.</given-names>
</name>
<name>
<surname>Zvada</surname>
<given-names>S. P.</given-names>
</name>
<name>
<surname>Dolgos</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ungell</surname>
<given-names>A. L. B.</given-names>
</name>
<name>
<surname>Masimirembwa</surname>
<given-names>C. M.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>
<italic>In vitro</italic> and <italic>in silico</italic> identification and characterization of thiabendazole as a mechanism-based inhibitor of CYP1A2 and simulation of possible pharmacokinetic drug-drug interactions</article-title>. <source>Drug Metab. Dispos. Biol. Fate Chem.</source> <volume>37</volume>, <fpage>1286</fpage>&#x2013;<lpage>1294</lpage>. <pub-id pub-id-type="doi">10.1124/dmd.108.024604</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tyzack</surname>
<given-names>J. D.</given-names>
</name>
<name>
<surname>Hunt</surname>
<given-names>P. A.</given-names>
</name>
<name>
<surname>Segall</surname>
<given-names>M. D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Predicting regioselectivity and lability of cytochrome P450 metabolism using quantum mechanical simulations</article-title>. <source>J. Chem. Inf. Model.</source> <volume>56</volume>, <fpage>2180</fpage>&#x2013;<lpage>2193</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.6b00233</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Veli&#x10d;kovi&#x107;</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Cucurull</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Casanova</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Romero</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Li&#xf2;</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Graph attention networks</article-title>. <comment>arXiv:1710.10903</comment>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bryant</surname>
<given-names>S. H.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gindulyte</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Shoemaker</surname>
<given-names>B. A.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>PubChem BioAssay: 2017 update</article-title>. <source>Nucleic Acids Res.</source> <volume>45</volume>, <fpage>D955</fpage>&#x2013;<lpage>D963</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkw1118</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Ramsundar</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Feinberg</surname>
<given-names>E. N.</given-names>
</name>
<name>
<surname>Gomes</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Geniesse</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Pappu</surname>
<given-names>A. S.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>MoleculeNet: A benchmark for molecular machine learning</article-title>. <comment>arXiv:1703.00564</comment>.</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiong</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Qiao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H. Y.</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>D. Q.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Survey of machine learning techniques for prediction of the isoform specificity of cytochrome P450 substrates</article-title>. <source>Curr. Drug Metab.</source> <volume>20</volume>, <fpage>229</fpage>&#x2013;<lpage>235</lpage>. <pub-id pub-id-type="doi">10.2174/1389200219666181019094526</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zernov</surname>
<given-names>V. V.</given-names>
</name>
<name>
<surname>Balakin</surname>
<given-names>K. V.</given-names>
</name>
<name>
<surname>Ivaschenko</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Savchuk</surname>
<given-names>N. P.</given-names>
</name>
<name>
<surname>Pletnev</surname>
<given-names>I. V.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Drug discovery using support vector machines. The case studies of drug-likeness, agrochemical-likeness, and enzyme inhibition predictions</article-title>. <source>J. Chem. Inf. Comput. Sci.</source> <volume>43</volume>, <fpage>2048</fpage>&#x2013;<lpage>2056</lpage>. <pub-id pub-id-type="doi">10.1021/ci0340916</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Ligand- and structure-based identification of novel CDK9 inhibitors for the potential treatment of leukemia</article-title>. <source>Bioorg. Med. Chem.</source> <volume>72</volume>, <fpage>116994</fpage>. <pub-id pub-id-type="doi">10.1016/j.bmc.2022.116994</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>