<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Bioinform.</journal-id>
<journal-title>Frontiers in Bioinformatics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Bioinform.</abbrev-journal-title>
<issn pub-type="epub">2673-7647</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1225149</article-id>
<article-id pub-id-type="doi">10.3389/fbinf.2023.1225149</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Bioinformatics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An algorithm for drug discovery based on deep learning with an example of developing a drug for the treatment of lung cancer</article-title>
<alt-title alt-title-type="left-running-head">Chebanov et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fbinf.2023.1225149">10.3389/fbinf.2023.1225149</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Chebanov</surname>
<given-names>Dmitrii K.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2310271/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Misyurin</surname>
<given-names>Vsevolod A.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Shubina</surname>
<given-names>Irina Zh.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/762456/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Molecular Biology of Cancer</institution>, <institution>BioAlg Corp.</institution>, <addr-line>Covina</addr-line>, <addr-line>CA</addr-line>, <country>United States</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>The Russian Melanoma Professional Association (Melanoma.PRO)</institution>, <addr-line>Moscow</addr-line>, <country>Russia</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/685112/overview">Sajjad Gharaghani</ext-link>, University of Tehran, Iran</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/515459/overview">Xuezhong Zhou</ext-link>, Beijing Jiaotong University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/2306334/overview">Yiyong Zhao</ext-link>, Harvard University, United States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/460811/overview">Amit Kumar Banerjee</ext-link>, Indian Institute of Chemical Technology (CSIR), India</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Dmitrii K. Chebanov, <email>chebanov.dk@gmail.com</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>11</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>3</volume>
<elocation-id>1225149</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>05</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>10</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Chebanov, Misyurin and Shubina.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Chebanov, Misyurin and Shubina</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>In this study, we present an algorithmic framework, integrated into a purpose-built software platform, for the discovery of novel small-molecule anti-tumor agents. We illustrate the approach using lung cancer as an example. In the initial phase, we identified targets for therapeutic intervention. Using deep learning, we analyzed gene expression profiles, focusing on those associated with adverse clinical outcomes in lung cancer patients. In addition, generative adversarial networks (GANs) were employed to generate additional synthetic patient data. This effort yielded a subset of genes associated with unfavorable prognoses. We further employed deep learning to identify genes capable of discriminating between normal and tumor tissues based on expression patterns. The remaining genes were earmarked as potential targets for precision lung cancer therapy. Subsequently, a dedicated module was developed to predict interactions between inhibitors and proteins. To achieve this, protein amino acid sequences and the structures of chemical compounds involved in protein interactions were encoded as vectors. Additionally, a deep learning-based component was developed to predict whether IC<sub>50</sub> values would be achieved in cell line experiments. Virtual pre-clinical trials with these inhibitors facilitated the selection of pertinent cell lines for subsequent laboratory assays. In summary, our study produced several small-molecule structures predicted to bind selectively to specific proteins. This algorithmic platform holds promise for accelerating the identification and design of anti-tumor compounds, a critical pursuit in advancing targeted cancer therapies.</p>
</abstract>
<kwd-group>
<kwd>drug discovery</kwd>
<kwd>artificial intelligence</kwd>
<kwd>deep learning</kwd>
<kwd>modeling</kwd>
<kwd>simulation</kwd>
<kwd>chemoinformatic</kwd>
</kwd-group>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Drug Discovery in Bioinformatics</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>The persistent challenge of effectively treating cancer patients remains a matter of utmost significance as issues related to relapses and drug resistance in antitumor therapies continue to pose unresolved hurdles. Addressing these challenges necessitates the development of novel therapeutic agents that exhibit superior efficacy compared to those already sanctioned. However, this pursuit of innovation inevitably contributes to escalated research and production costs.</p>
<p>To surmount these obstacles, bioinformatics has embraced computational methodologies, which offer a promising avenue for improving drug discovery. Notably, deep learning technologies have achieved substantial success across diverse scientific and industrial domains, solving intricate problems through a degree of abstraction that is difficult for the human mind to attain.</p>
<p>In this context, the integration of machine learning models emerges as a transformative solution for identifying potential candidates for novel drugs. A prime example lies in harnessing machine learning to predict the therapeutic attributes of molecular compounds, thereby facilitating systematic exploration within vast chemical libraries. Furthermore, the predictive capabilities of machine learning extend to deciphering intricate drug&#x2013;protein interactions, thereby unveiling precise protein targets and potential inhibitor molecules.</p>
<p>An additional facet of machine learning&#x2019;s prowess lies in its capacity to forecast the outcomes of IC<sub>50</sub> experiments. By combining gene expression profiles of cell lines with molecular structures, these models can predict whether a compound is likely to achieve established IC<sub>50</sub> values. Such <italic>in silico</italic> emulation of cellular experiments has the potential to streamline research efforts and accelerate the drug discovery process.</p>
<p>This article delves into the application of machine learning methodologies throughout the drug discovery process, encompassing stages such as target identification, literature retrieval, and selection of molecular inhibitors guided by target interactions, as well as the strategic planning and predictive modeling of preclinical studies (<xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Overview of the overall pipeline structure.</p>
</caption>
<graphic xlink:href="fbinf-03-1225149-g001.tif"/>
</fig>
<sec id="s1-1">
<title>Target identification</title>
<p>The foundational step in drug discovery hinges upon the precise delineation of therapeutic targets. The multifaceted nature of this endeavor necessitates adherence to several imperative criteria. Primarily, targets should exhibit a degree of specificity that highlights disparities between tumor and normal tissues. Furthermore, their involvement in tumor cell survival pathways is paramount. Equally significant is their amenability to small molecule-based interventions. Notably, our research advances the proposition that altered genes, underpinning increased disease aggressiveness and decreased overall survival, hold promise as prime targets. This aligns with the overarching objective of machine learning-driven target identification&#x2014;an effective prediction of overall survival and relapse-free interval through the discerning selection of indispensable genes while simultaneously accounting for distinctive expression profiles in comparison to normal tissues.</p>
</sec>
<sec id="s1-2">
<title>Optimizing target gene selection</title>
<p>A nuanced challenge arises from genes implicated in pivotal cellular processes that are shared between tumor and normal cells. Targeting such genes calls for a judicious approach. To this end, we have incorporated a refined strategy. Using comprehensive gene expression data from tumor and normal tissues, we exclude genes that display only marginal expression differences. Employing deep learning algorithms, we identify genes that clearly separate tumor from normal tissues, thus improving the precision of target gene selection.</p>
</sec>
<sec id="s1-3">
<title>Harnessing deep learning for literature mining</title>
<p>Beyond the confines of experimental data, deep learning extends its scope to the vast expanse of the published literature. Our research capitalizes on this potential, streamlining decision-making processes by assimilating insights from a meticulously curated repository of scientific articles. A dedicated tabulated summary of findings from a PubMed and PMC search fortifies the arsenal of tools for selecting promising molecular inhibitors.</p>
</sec>
<sec id="s1-4">
<title>Predictive modeling for drug&#x2013;protein interactions</title>
<p>A pivotal axis of drug discovery revolves around predicting the interaction dynamics between drug molecules and target proteins. We introduce an innovative deep learning model, integrating intricate protein and drug molecule information. This model prognosticates the impact of drug molecules on target proteins, ushering in a refined selection process. This stage inherently winnows the gamut of potential targets, eliminating candidates whose inhibition feasibility or binding efficacy raises concerns.</p>
</sec>
<sec id="s1-5">
<title>Navigating toward preclinical trials</title>
<p>Transitioning toward preclinical studies demands the emulation of cellular experiments, a crucial precursor to laboratory validation. Deliberations encompassing cell lines that accurately mirror real-world conditions are pivotal. Our investigation extends to prognosticating the likelihood of compounds from prior stages attaining IC<sub>50</sub> concentrations within select cell lines, paving a trajectory toward informed preclinical trial design.</p>
<p>In the synthesis of these insights, our study embarks on a journey through the intricate tapestry of machine learning-driven drug discovery in oncology. From the meticulous identification of target genes to the finesse of molecular interaction prediction and the refinement of preclinical trial design, the amalgamation of cutting-edge techniques and comprehensive data analyses presents a formidable paradigm shift in the pursuit of effective therapeutics.</p>
</sec>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>Materials and methods</title>
<sec id="s2-1">
<title>Target identification</title>
<p>To assess the effect of gene expression on disease prognosis, we used gene expression data from the open TCGA database (<ext-link ext-link-type="uri" xlink:href="https://gdc.cancer.gov/">https://gdc.cancer.gov</ext-link>) (<xref ref-type="bibr" rid="B18">Cancer Genome Atlas Research Network et al., 2013</xref>). The gene expression (RNA-Seq) data were acquired and then normalized by aligning the expression values to the reference level of the control gene GAPDH, commonly referred to as a &#x201c;housekeeping&#x201d; gene. This normalization facilitates the integration of newly acquired data into the database. The data on overall survival (OS) and progression-free interval (PFI), and the same parameters within the follow-up period, were derived from this database. The problem of OS prediction was successfully solved in our previous study (<xref ref-type="bibr" rid="B3">Chebanov et al., 2021</xref>).</p>
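<p>The GAPDH-based normalization described above can be illustrated with a minimal Python sketch. The exact formula is not stated in the text, so dividing each expression value by the GAPDH value of the same sample is an assumption, and the function name normalize_to_gapdh and the sample values are hypothetical.</p>
<preformat>
```python
def normalize_to_gapdh(sample):
    """Scale every gene in one sample by that sample's GAPDH expression.

    Assumption: normalization is a simple ratio to GAPDH; the article
    does not specify the exact formula.
    """
    reference = sample["GAPDH"]
    if reference == 0:
        raise ValueError("GAPDH expression is zero; cannot normalize")
    return {gene: value / reference for gene, value in sample.items()}


# Hypothetical expression values for one sample (arbitrary units).
sample = {"GAPDH": 200.0, "LEF1": 50.0, "DKK4": 10.0}
normalized = normalize_to_gapdh(sample)
```
</preformat>
<p>After this step, GAPDH itself equals 1.0 in every sample, so values from newly added samples are expressed on the same relative scale.</p>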
<p>The essence of applying machine learning in this context can be summarized as follows: a dataset is prepared comprising features that include cancer-associated genes and patient medical history data. The target variable is a binary outcome representing whether the patient surpasses the median PFI or OS value for the whole dataset. A model is constructed using a multi-layer perceptron within the Python environment utilizing the Keras library. Upon achieving satisfactory training quality, the most influential features affecting the prediction are extracted from the dataset.</p>
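<p>The binary target described above can be sketched as follows. This is an illustration only: the survival times are hypothetical, the function name label_by_median is our own, and the sketch ignores censoring, which the text does not discuss.</p>
<preformat>
```python
from statistics import median


def label_by_median(values):
    """Return 1 where a value exceeds the dataset median, else 0."""
    cutoff = median(values)
    return [1 if value > cutoff else 0 for value in values]


# Hypothetical overall-survival times in months for six patients.
os_months = [5.0, 12.0, 18.0, 24.0, 36.0, 60.0]
labels = label_by_median(os_months)
```
</preformat>
<p>Here the median is 21 months, so the last three patients receive the label 1. The same labeling can be applied to PFI values.</p>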
<p>As an example, we selected the diagnosis of lung cancer and extracted a cohort of 514 patients with this diagnosis from the database.</p>
<p>In order to minimize training noise, we specifically curated a dataset containing genes that are integral to tumor-associated signaling pathways, as defined by the KEGG resource. This meticulous curation resulted in the inclusion of a total of 1,821 genes (<xref ref-type="bibr" rid="B7">Kanehisa and Goto, 2000</xref>).</p>
<p>We initially trained the algorithm on a dataset comprising 514 patient records. However, five-fold cross-validation gave mean ROC-AUC values of 0.69 (0.61&#x2013;0.74) for overall survival (OS) and 0.61 (0.54&#x2013;0.69) for progression-free interval (PFI). This level of predictive performance fell short of expectations and indicated the need to refine the predictive model. The likely reason is that 514 records form a very small dataset for training a neural network, especially with more than 1,800 features.</p>
<p>To address this, we generated additional data comprising 50,000 synthetic patients by applying a generative adversarial network (GAN) to the tabular data on the existing 514 patients. GANs have been used successfully in various applications, such as image generation (<xref ref-type="bibr" rid="B5">Goodfellow et al., 2014</xref>). To do so, we used the Python SDV library (<xref ref-type="bibr" rid="B13">Patki et al., 2016</xref>), specifically its CTGAN module (<xref ref-type="bibr" rid="B20">Xu et al., 2019</xref>).</p>
</sec>
<sec id="s2-2">
<title>Optimizing target gene selection</title>
<p>To refine the list of potential targets, we trained a deep learning algorithm to classify tissue into healthy and tumor categories based on gene expression. Subsequently, we ranked the features of the original dataset by importance, and genes with the greatest influence on the prediction were identified as more probable candidates for targeting as they contribute more significantly to distinguishing tumors from healthy tissue.</p>
<p>Gene expression data for tumor and normal tissues were taken from the GENT2 database (<xref ref-type="bibr" rid="B12">Park et al., 2019</xref>). This database comprises information on 68,287 samples of patients&#x2019; tissues and cell lines across all diagnoses, of which 58,041 are tumor samples and 10,246 are normal samples.</p>
<sec id="s2-2-1">
<title>Literature mining</title>
<p>We developed a deep learning-based tool for named entity recognition (NER) using natural language processing (NLP) and the Python library Biopython. We trained the NLP algorithm on hand-labeled article abstracts to identify the name of the gene or protein of interest and the name of its inhibitor. We used the BERT algorithm (<xref ref-type="bibr" rid="B4">Devlin et al., 2019</xref>) as the basis and fine-tuned it on a dataset of labeled abstracts concerning the BRAF gene and its inhibitors.</p>
<p>The resulting algorithm achieved a prediction accuracy of 98%.</p>
</sec>
<sec id="s2-2-2">
<title>Drug&#x2013;protein interactions</title>
<p>We obtained the drug data from the open DrugBank database (<xref ref-type="bibr" rid="B19">Wishart et al., 2017</xref>). This database contains data about drugs together with their targets, including the protein that each drug targets, as well as structural representations of the molecules. We selected only those small molecules that have a representation in the SMILES format. We needed two types of data to prepare the dataset: a target protein and a structural representation of the molecule.</p>
<p>As a result, the data array included positive examples with 19,256 interactions for 5,769 drugs and comprised 4,104 unique proteins encoded by 3,516 genes.</p>
<p>A challenge was to find negative examples for the training set. Researchers solve the problem in different ways: for instance, <xref ref-type="bibr" rid="B16">Wang et al. (2018</xref>) reported that they randomly selected negative drug&#x2013;target pairs with no interaction data. Researchers of another study also obtained negative examples by extracting pairs with no interaction data while randomly choosing the number of examples equal to the number of positive examples (<xref ref-type="bibr" rid="B17">Wang et al., 2020</xref>). Some authors predicted the absence of an interaction by the algorithm (<xref ref-type="bibr" rid="B9">Liu et al., 2015</xref>).</p>
<p>We analyzed the STITCH database (<xref ref-type="bibr" rid="B15">Szklarczyk et al., 2016</xref>), which contains scores of interactions between proteins and compounds.</p>
<p>To determine which score to regard as negative, we matched interactions from the STITCH database against the DrugBank database, which includes only positive pairs. In this way, we expected to determine which score value to consider &#x201c;positive.&#x201d;</p>
<p>The left part of <xref ref-type="fig" rid="F2">Figure 2</xref> presents a boxplot for pairs that are present in both databases (STITCH and DrugBank) and are therefore considered positive. The right part presents a boxplot for all values from the STITCH database. The range of positive scores does not intersect with the main range of scores for all interactions in the database; relative to that distribution, the positive scores are outliers.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>1&#x2013;Strength of the interaction between the compound and the protein, described in both databases (STITCH and DrugBank). 2&#x2013;Boxplot for all values from the STITCH database.</p>
</caption>
<graphic xlink:href="fbinf-03-1225149-g002.tif"/>
</fig>
<p>We used the lower quartile of interaction scores to form a sample of non-protein-binding drugs (negative examples). We then sorted the compounds in the resulting data by the number of known interactions with proteins and selected the 50,000 most common compounds.</p>
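<p>The lower-quartile rule for negative examples can be sketched as follows. The scores and pair identifiers are hypothetical, the function name lower_quartile_pairs is our own, and the choice of the inclusive quantile method is an assumption, since the text does not say how the quartile was computed.</p>
<preformat>
```python
from statistics import quantiles


def lower_quartile_pairs(scores):
    """Keep compound-protein pairs scoring at or below the first quartile."""
    q1 = quantiles(list(scores.values()), n=4, method="inclusive")[0]
    return {pair: score for pair, score in scores.items() if q1 >= score}


# Hypothetical STITCH-style interaction scores for five pairs.
scores = {
    ("cmpd_1", "P1"): 150,
    ("cmpd_2", "P1"): 300,
    ("cmpd_3", "P2"): 450,
    ("cmpd_4", "P2"): 700,
    ("cmpd_5", "P3"): 900,
}
negatives = lower_quartile_pairs(scores)
```
</preformat>
<p>With these toy values the first quartile is 300, so the two lowest-scoring pairs are kept as negative examples.</p>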
<p>Amino acid sequences were obtained from the UniProt database (<xref ref-type="bibr" rid="B10">Martin et al., 2021</xref>). We represented them as vectors using the approach described elsewhere (<xref ref-type="bibr" rid="B1">Asgari and Mofrad, 2015</xref>), in which each of the 8,000 possible amino acid triplets is represented as a 100-dimensional vector, so that the vector representation of any protein is the sum of the vectors of its constituent triplets.</p>
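<p>The triplet-sum representation can be illustrated with a short sketch. The embedding table here is a toy two-dimensional stand-in for the published 100-dimensional vectors, the use of overlapping triplets is an assumption, and the function name protein_vector is hypothetical.</p>
<preformat>
```python
def protein_vector(sequence, triplet_vectors, dim):
    """Sum the embedding vectors of all overlapping triplets in a sequence."""
    total = [0.0] * dim
    for start in range(len(sequence) - 2):
        triplet = sequence[start:start + 3]
        vector = triplet_vectors.get(triplet)
        if vector is None:
            continue  # skip triplets missing from the embedding table
        total = [a + b for a, b in zip(total, vector)]
    return total


# Toy 2-dimensional table; the published embedding is 100-dimensional.
toy_vectors = {"MKT": [1.0, 0.0], "KTA": [0.5, 0.5], "TAY": [0.0, 2.0]}
vec = protein_vector("MKTAY", toy_vectors, dim=2)
```
</preformat>
<p>Because the result is a plain sum, proteins of any length map to a vector of the same dimension, which is what allows them to be fed to a fixed-size network input.</p>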
<p>We also represented the compounds in the dataset as 100-dimensional vectors, using the embedding approach from natural language processing as implemented in the RDKit (RDKit), mol2vec (<xref ref-type="bibr" rid="B6">Jaeger et al., 2018</xref>), and word2vec (<xref ref-type="bibr" rid="B11">Mikolov et al., 2013</xref>) libraries for Python 3.</p>
<p>To search for candidate molecules, we predicted interactions for all possible pairs of the selected genes and candidate compounds.</p>
<p>We used all the molecules from the PubChem library that had a SMILES representation (23 million in total). They were converted to vectors in the same way as during training. Amino acid sequences of the encoded proteins were obtained from the UniProt database for the 12 genes identified earlier. The prediction result was the probability of a drug&#x2013;protein interaction (DPI).</p>
</sec>
</sec>
<sec id="s2-3">
<title>Preclinical trial modeling</title>
<p>The dataset for cell experiment emulation was formed using the data on gene expression profiles of the cell lines from the Cancer Cell Line Encyclopedia (CCLE) database and compound sensitivity data for cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) database (<xref ref-type="bibr" rid="B2">Barretina et al., 2012</xref>; <xref ref-type="bibr" rid="B21">Yang et al., 2013</xref>). We selected only lung cancer cell lines. A total of 11,330 interactions were obtained for 122 drugs and 32,000 genes in 106 cell lines.</p>
<p>We then predicted the response of these 106 cell lines to the potential inhibitors. We substituted each of the 2,921 candidate inhibitors in turn and predicted the success of the IC<sub>50</sub> test. After that, we selected all molecules with a predicted probability above 0.9 of achieving IC<sub>50</sub> in the A549 and CALU1 cell lines. These cell lines were chosen because they are the most frequently used in studies of lung cancer.</p>
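<p>The selection step can be sketched as a simple filter. The molecule identifiers and probabilities below are hypothetical, the function name select_candidates is our own, and requiring the threshold to be exceeded in both cell lines (rather than in either one) is an assumption about how the criterion was applied.</p>
<preformat>
```python
def select_candidates(predictions, cell_lines, threshold=0.9):
    """Return molecules whose probability exceeds the threshold in every line."""
    return [
        molecule
        for molecule, by_line in predictions.items()
        if all(by_line.get(line, 0.0) > threshold for line in cell_lines)
    ]


# Hypothetical predicted probabilities of achieving IC50 per cell line.
predictions = {
    "mol_001": {"A549": 0.95, "CALU1": 0.93},
    "mol_002": {"A549": 0.97, "CALU1": 0.40},
    "mol_003": {"A549": 0.91, "CALU1": 0.92},
}
selected = select_candidates(predictions, ["A549", "CALU1"])
```
</preformat>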
<p>This step yielded the structures of 37 molecules with potential toxicity for lung cancer cells. We visualized them in our own module and found that the resulting molecules were large and consisted of many repeating structural elements of radicals. Therefore, we decided to isolate the active parts of the molecules for further analysis. As a result, another 15 molecules, obtained by decomposition, were added to the initial set of 37 molecules.</p>
<p>In the next step, we planned to test the selectivity of the obtained molecules in all 1,018 cell lines. We designed a similar experiment to predict the IC<sub>50</sub> value.</p>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec id="s3-1">
<title>Target identification</title>
<p>With the data for 50,000 generated patients, the ROC-AUC value reached 0.73. We used the Lasso linear regression algorithm with 5-fold cross-validation to determine the significance of the features. Each experiment produced a list of genes ordered by decreasing impact on the prediction. We combined the lists of genes obtained in the experiments with OS and with PFI.</p>
<p>Consequently, we identified 36 genes whose expression correlates with compromised survival indicators in individuals diagnosed with lung cancer. <xref ref-type="table" rid="T1">Table 1</xref> presents some characteristics of these genes.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Biological features of the discovered genes.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Gene</th>
<th align="left">Type of encoded protein</th>
<th align="left">Relationship with various processes in the cells or the organism</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<italic>GNB3&#x2a;</italic>
</td>
<td align="left">Signal</td>
<td align="left">Obesity</td>
</tr>
<tr>
<td align="left">
<italic>CHRM1&#x2a;</italic>
</td>
<td align="left">Receptor and signal</td>
<td align="left">Regulation of nerve impulses</td>
</tr>
<tr>
<td align="left">
<italic>SHC4&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Signal</td>
<td align="left">Proliferation and apoptosis</td>
</tr>
<tr>
<td align="left">
<italic>FKBP4&#x2a;&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Signal</td>
<td align="left">Immunoregulation</td>
</tr>
<tr>
<td align="left">
<italic>IL17B&#x2a;</italic>
</td>
<td align="left">Cytokine</td>
<td align="left">Immunoregulation</td>
</tr>
<tr>
<td align="left">
<italic>ATP6V1E2&#x2a;</italic>
</td>
<td align="left">Membrane transporter</td>
<td align="left">ATP synthesis</td>
</tr>
<tr>
<td align="left">
<italic>FASLG</italic>
</td>
<td align="left">Receptor and signaling</td>
<td align="left">Proapoptosis</td>
</tr>
<tr>
<td align="left">
<italic>DKK4&#x2a;&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Signaling</td>
<td align="left">Proliferation, stemness, and chemoresistance</td>
</tr>
<tr>
<td align="left">
<italic>GDF6&#x2a;&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Signaling</td>
<td align="left">Growth factor</td>
</tr>
<tr>
<td align="left">
<italic>GP6</italic>
</td>
<td align="left">Structural</td>
<td align="left">Platelet aggregation</td>
</tr>
<tr>
<td align="left">
<italic>WNT6&#x2a;&#x2a;&#x2a;&#x2a;</italic> and <italic>WNT8B&#x2a;&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Signaling</td>
<td align="left">Differentiation</td>
</tr>
<tr>
<td align="left">
<italic>HMOX1</italic>
</td>
<td align="left">Enzyme</td>
<td align="left">Respiration</td>
</tr>
<tr>
<td align="left">
<italic>LEF1&#x2a;&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Transcription factor</td>
<td align="left">Differentiation and morphology</td>
</tr>
<tr>
<td align="left">
<italic>ATP1A4</italic>
</td>
<td align="left">Membrane transporter</td>
<td align="left">ATP synthesis</td>
</tr>
<tr>
<td align="left">
<italic>ACVR2A&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Receptor and signal</td>
<td align="left">Growth activator</td>
</tr>
<tr>
<td align="left">
<italic>SMAD9&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Signal</td>
<td align="left">Proliferation</td>
</tr>
<tr>
<td align="left">
<italic>CUL1</italic>
</td>
<td align="left">Complex</td>
<td align="left">Protein utilization and cell cycle control</td>
</tr>
<tr>
<td align="left">
<italic>KRT10</italic>
</td>
<td align="left">Structural</td>
<td align="left">Cytoskeleton</td>
</tr>
<tr>
<td align="left">
<italic>PIAS4</italic>
</td>
<td align="left">Regulator</td>
<td align="left">Blocks STAT4</td>
</tr>
<tr>
<td align="left">
<italic>FSHR&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Receptor</td>
<td align="left">Proliferation</td>
</tr>
<tr>
<td align="left">
<italic>CCNA2&#x2a;&#x2a;</italic>
</td>
<td align="left">Regulator</td>
<td align="left">Cyclin</td>
</tr>
<tr>
<td align="left">
<italic>RPS6KA4&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Transcriptional factor</td>
<td align="left">Proliferation</td>
</tr>
<tr>
<td align="left">
<italic>CSF2&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Cytokine</td>
<td align="left">Proliferation</td>
</tr>
<tr>
<td align="left">
<italic>EFNA3&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Signaling</td>
<td align="left">Proliferation</td>
</tr>
<tr>
<td align="left">
<italic>KRT24</italic>, <italic>KRT27</italic>, <italic>MYL10</italic>, and <italic>MYLPF</italic>
</td>
<td align="left">Structural</td>
<td align="left">Cytoskeleton</td>
</tr>
<tr>
<td align="left">
<italic>ITGA3&#x2a;</italic>
</td>
<td align="left">Structural</td>
<td align="left">Adhesion</td>
</tr>
<tr>
<td align="left">
<italic>ZFYVE9&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Transcription factor</td>
<td align="left">Proliferation</td>
</tr>
<tr>
<td align="left">
<italic>ATP6V1G3&#x2a;</italic>
</td>
<td align="left">Complex</td>
<td align="left">Protein</td>
</tr>
<tr>
<td align="left">
<italic>HEY1</italic>
</td>
<td align="left">Transcription factor</td>
<td align="left">Differentiation</td>
</tr>
<tr>
<td align="left">
<italic>PIK3R6&#x2a;&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Receptor and signal</td>
<td align="left">Unknown</td>
</tr>
<tr>
<td align="left">
<italic>BIRC3&#x2a;&#x2a;&#x2a;</italic>
</td>
<td align="left">Complex</td>
<td align="left">Inhibitor</td>
</tr>
<tr>
<td align="left">
<italic>GHSR&#x2a;</italic>
</td>
<td align="left">Receptor and signaling</td>
<td align="left">Obesity</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>At the tumor&#x2013;normal filtering stage, the deep learning model was trained as described above and reached a ROC-AUC of 0.83. A total of 4,912 genes were selected in the process of determining the significant features. Using this list, we isolated from the previously found 36 genes those whose expression is associated with distinguishing tumor tissue from normal tissue. These 12 genes were <italic>DKK4</italic>, <italic>GP6</italic>, <italic>LEF1</italic>, <italic>CUL1</italic>, <italic>KRT10</italic>, <italic>PIAS4</italic>, <italic>FSHR</italic>, <italic>MYLPF</italic>, <italic>EFNA3</italic>, <italic>ZFYVE9</italic>, <italic>GHSR</italic>, and <italic>MYL10</italic>.</p>
</sec>
<sec id="s3-2">
<title>Literature mining</title>
<p>Automated literature mining for inhibitors produced <xref ref-type="table" rid="T2">Table 2</xref>, which presents the number of articles returned by various queries for each of the 12 genes of interest. Such a table helps assess how extensively each gene has been studied as a target.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Results of the NLP search for keywords associated with the studied genes and their inhibitors. Tags: <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>green&#x2014;low-molecular weight inhibitor that triggers apoptosis</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>blue&#x2014;protein</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="red">
<mml:mi>red&#x2014;toxin</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Gene</th>
<th align="left">Gene &#x2b; &#x2018;target&#x2019;</th>
<th align="left">Gene &#x2b; &#x2018;cancer&#x2019;</th>
<th align="left">Gene &#x2b; &#x2018;lung cancer&#x2019;</th>
<th align="left">Gene &#x2b; &#x2018;phase&#x2019;</th>
<th align="left">Gene &#x2b; &#x2018;drug&#x2019;</th>
<th align="left">Gene &#x2b; &#x2018;approval&#x2019;</th>
<th align="left">Gene &#x2b; &#x2018;FDA&#x2019;</th>
<th align="left">Total mentions</th>
<th align="left">Inhibitors</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<italic>LEF1</italic>
</td>
<td align="left">9,661</td>
<td align="left">9,138</td>
<td align="left">4,301</td>
<td align="left">4,962</td>
<td align="left">4,771</td>
<td align="left">2,129</td>
<td align="left">665</td>
<td align="left">35,627</td>
<td align="left">
<inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>Imatinib</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>wnt10b</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf6">
<mml:math id="m6">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>dlx3</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf7">
<mml:math id="m7">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>sb203580</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf8">
<mml:math id="m8">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>ex527</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf9">
<mml:math id="m9">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>dasatinib</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf10">
<mml:math id="m10">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>t0070907</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf11">
<mml:math id="m11">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>sb431542</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>CUL1</italic>
</td>
<td align="left">4,854</td>
<td align="left">4,245</td>
<td align="left">1,823</td>
<td align="left">3,629</td>
<td align="left">2,518</td>
<td align="left">634</td>
<td align="left">374</td>
<td align="left">18,077</td>
<td align="left">
<inline-formula id="inf12">
<mml:math id="m12">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>fbx4</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf13">
<mml:math id="m13">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>selumetinib</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf14">
<mml:math id="m14">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>fbxo7</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf15">
<mml:math id="m15">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>fbxo31</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf16">
<mml:math id="m16">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>fbxo21</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf17">
<mml:math id="m17">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>fbxo4</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf18">
<mml:math id="m18">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>fbxw7</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>FSHR</italic>
</td>
<td align="left">2,143</td>
<td align="left">2,150</td>
<td align="left">466</td>
<td align="left">1,657</td>
<td align="left">1,752</td>
<td align="left">752</td>
<td align="left">205</td>
<td align="left">9,125</td>
<td align="left">
<inline-formula id="inf19">
<mml:math id="m19">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>Sunitinib</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf20">
<mml:math id="m20">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>uk5099</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf21">
<mml:math id="m21">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="red">
<mml:mi>clxbpa</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>GP6</italic>
</td>
<td align="left">1,385</td>
<td align="left">1,977</td>
<td align="left">460</td>
<td align="left">1,023</td>
<td align="left">770</td>
<td align="left">766</td>
<td align="left">199</td>
<td align="left">6,580</td>
<td align="left">
<inline-formula id="inf22">
<mml:math id="m22">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>ono1714</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>GHSR</italic>
</td>
<td align="left">1,308</td>
<td align="left">926</td>
<td align="left">365</td>
<td align="left">828</td>
<td align="left">1,224</td>
<td align="left">382</td>
<td align="left">417</td>
<td align="left">5,450</td>
<td align="left">
<inline-formula id="inf23">
<mml:math id="m23">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>Gefitinib</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf24">
<mml:math id="m24">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>pd98059</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf25">
<mml:math id="m25">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>pd90859</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf26">
<mml:math id="m26">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>sb203580</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>KRT10</italic>
</td>
<td align="left">934</td>
<td align="left">998</td>
<td align="left">388</td>
<td align="left">700</td>
<td align="left">604</td>
<td align="left">320</td>
<td align="left">75</td>
<td align="left">4,019</td>
<td align="left">
<inline-formula id="inf27">
<mml:math id="m27">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>r115866</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf28">
<mml:math id="m28">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>sb431542</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>PIAS4</italic>
</td>
<td align="left">810</td>
<td align="left">792</td>
<td align="left">319</td>
<td align="left">505</td>
<td align="left">427</td>
<td align="left">127</td>
<td align="left">53</td>
<td align="left">3,033</td>
<td align="left">
<inline-formula id="inf29">
<mml:math id="m29">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="green">
<mml:mi>nur77</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf30">
<mml:math id="m30">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>pax8</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf31">
<mml:math id="m31">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>foxm1b</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf32">
<mml:math id="m32">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>trim32</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf33">
<mml:math id="m33">
<mml:mrow>
<mml:mstyle displaystyle="false" mathbackground="lightblue">
<mml:mi>zif268</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>DKK4</italic>
</td>
<td align="left">652</td>
<td align="left">661</td>
<td align="left">339</td>
<td align="left">312</td>
<td align="left">368</td>
<td align="left">139</td>
<td align="left">48</td>
<td align="left">2,519</td>
<td align="left"/>
</tr>
<tr>
<td align="left">
<italic>EFNA3</italic>
</td>
<td align="left">542</td>
<td align="left">541</td>
<td align="left">328</td>
<td align="left">272</td>
<td align="left">318</td>
<td align="left">141</td>
<td align="left">28</td>
<td align="left">2,170</td>
<td align="left"/>
</tr>
<tr>
<td align="left">
<italic>MYLPF</italic>
</td>
<td align="left">301</td>
<td align="left">251</td>
<td align="left">98</td>
<td align="left">192</td>
<td align="left">168</td>
<td align="left">90</td>
<td align="left">17</td>
<td align="left">1,117</td>
<td align="left"/>
</tr>
<tr>
<td align="left">
<italic>ZFYVE9</italic>
</td>
<td align="left">166</td>
<td align="left">155</td>
<td align="left">82</td>
<td align="left">85</td>
<td align="left">75</td>
<td align="left">46</td>
<td align="left">12</td>
<td align="left">621</td>
<td align="left"/>
</tr>
<tr>
<td align="left">
<italic>MYL10</italic>
</td>
<td align="left">73</td>
<td align="left">74</td>
<td align="left">33</td>
<td align="left">50</td>
<td align="left">28</td>
<td align="left">35</td>
<td align="left">2</td>
<td align="left">295</td>
<td align="left"/>
</tr>
</tbody>
</table>
</table-wrap>
<p>However, a co-mention does not necessarily indicate a direct connection between the gene and the drug. The drug may not inhibit the gene product at all and may simply appear in a similar context.</p>
<p>The rightmost column of the table, also derived from the NLP search, lists the inhibitors found for each gene. These data let a researcher weigh the number of available inhibitors when choosing among the genes under consideration.</p>
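<p>The per-query counts in Table 2 can be thought of as co-mention counts: the number of articles in which a gene name appears together with a given keyword, with the total being the sum over the keywords. The Python sketch below illustrates this with plain whole-word matching. It is a simplification and not the NLP method used in the study; the function names and the three example abstracts are hypothetical.</p>
<preformat><![CDATA[
```python
import re

KEYWORDS = ["target", "cancer", "lung cancer", "phase", "drug", "approval", "FDA"]


def mentions(term, text):
    """Return True if term occurs in text as a whole word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def count_co_mentions(gene, abstracts, keywords=KEYWORDS):
    """Count abstracts that mention the gene together with each keyword."""
    counts = {keyword: 0 for keyword in keywords}
    for text in abstracts:
        if not mentions(gene, text):
            continue  # the gene itself must be mentioned
        for keyword in keywords:
            if mentions(keyword, text):
                counts[keyword] += 1
    # Total mentions: sum over the per-keyword columns.
    counts["total"] = sum(counts.values())
    return counts


# Hypothetical abstracts, for illustration only.
abstracts = [
    "LEF1 is a potential drug target in lung cancer.",
    "Expression of LEF1 was measured in a phase II trial.",
    "GHSR agonists received FDA approval.",
]
lef1 = count_co_mentions("LEF1", abstracts)
print(lef1)
```
]]></preformat>
<p>As the text notes, such counts only show that two terms share a context; they do not establish that a drug inhibits the gene product.</p>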
</sec>
<sec id="s3-3">
<title>Drug&#x2013;protein interactions</title>
<p>The resulting dataset comprised 118,379 compound&#x2013;protein pairs: 19,250 positive pairs, in which the compound binds the protein, and 99,129 negative pairs, in which it does not.</p>
<p>Deep learning was applied in the same way as in the previous step. ROC analysis gave an average area under the curve of 0.86 (<xref ref-type="fig" rid="F3">Figure 3</xref>).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Model quality for predicting drug&#x2013;protein interactions.</p>
</caption>
<graphic xlink:href="fbinf-03-1225149-g003.tif"/>
</fig>
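<p>For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The Python sketch below computes it directly from that definition. The function name roc_auc and the toy labels and scores are hypothetical, and the pairwise loop is meant for illustration rather than for datasets of the size used here.</p>
<preformat><![CDATA[
```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (assumption): 2 binding pairs and 4 non-binding pairs.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```
]]></preformat>
<p>Because the metric depends only on the ranking of scores, it is not inflated by the imbalance between positive and negative pairs in the way that raw accuracy can be.</p>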
<p>The search for candidate molecules yielded 160,000 pairs with a predicted interaction probability above 0.99, of which 2,921 pairs had a predicted probability of 1.0.</p>
<p><xref ref-type="table" rid="T3">Table 3</xref> shows how these 2,921 potential inhibitors are distributed across the target genes.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Results of predicting the inhibitors for the target genes.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Gene</th>
<th align="left">Number of inhibitors found</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<italic>LEF1</italic>
</td>
<td align="left">639</td>
</tr>
<tr>
<td align="left">
<italic>MYL10</italic>
</td>
<td align="left">524</td>
</tr>
<tr>
<td align="left">
<italic>FSHR</italic>
</td>
<td align="left">461</td>
</tr>
<tr>
<td align="left">
<italic>EFNA3</italic>
</td>
<td align="left">385</td>
</tr>
<tr>
<td align="left">
<italic>GHSR</italic>
</td>
<td align="left">279</td>
</tr>
<tr>
<td align="left">
<italic>CUL1</italic>
</td>
<td align="left">248</td>
</tr>
<tr>
<td align="left">
<italic>GP6</italic>
</td>
<td align="left">190</td>
</tr>
<tr>
<td align="left">
<italic>DKK4</italic>
</td>
<td align="left">134</td>
</tr>
<tr>
<td align="left">
<italic>PIAS4</italic>
</td>
<td align="left">61</td>
</tr>
</tbody>
</table>
</table-wrap>
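<p>The thresholding and per-gene tally reported in this section can be sketched in Python as follows, assuming the model output is available as triples of compound identifier, target gene, and predicted probability. The function name tally_confident_pairs, the compound identifiers, and the probabilities are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter


def tally_confident_pairs(predictions, threshold=1.0):
    """Count, per gene, the predicted pairs whose probability reaches the threshold."""
    counts = Counter(gene for _, gene, prob in predictions if prob >= threshold)
    # most_common() sorts by descending count, like the ordering of a ranked table.
    return counts.most_common()


# Hypothetical predictions: (compound id, target gene, predicted probability).
predictions = [
    ("cmpd-1", "LEF1", 1.0),
    ("cmpd-2", "LEF1", 1.0),
    ("cmpd-3", "FSHR", 1.0),
    ("cmpd-4", "FSHR", 0.995),
    ("cmpd-5", "PIAS4", 0.42),
]
table = tally_confident_pairs(predictions)
print(table)
```
]]></preformat>
<p>Lowering the threshold argument, for example to 0.99, widens the candidate pool in the same way as the larger set of pairs mentioned above.</p>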
</sec>
<sec id="s3-4">
<title>Preclinical trial modeling</title>
<p>During the cell experiment emulation, the feature-importance algorithm selected 129 genes. The characteristics of the proteins encoded by these genes are presented in <xref ref-type="table" rid="T4">Table 4</xref>, and the ROC curve is shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Characteristics of proteins encoded by the revealed genes.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Encoded proteins</th>
<th align="left">Genes</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Structural</td>
<td align="left">
<italic>LAMB4</italic>, <italic>PDLIM2</italic>, <italic>C1QC</italic>, <italic>SPATA48</italic>, <italic>CDC42SE1</italic>, <italic>UPK3B</italic>, and <italic>APOBEC3</italic>
</td>
</tr>
<tr>
<td align="left">Inhibitor</td>
<td align="left">
<italic>APOBEC3</italic>, <italic>HTN1</italic>, <italic>APOC1</italic>, <italic>CST5</italic>, and <italic>MRGPRX2</italic>
</td>
</tr>
<tr>
<td align="left">Metabolism</td>
<td align="left">
<italic>MGLL</italic>, <italic>UPB1</italic>, <italic>PPIF</italic>, <italic>AMPD1</italic>, <italic>ESYT2</italic>, <italic>RAB30</italic>, <italic>SLC40A1</italic>, <italic>LHFPL2</italic>, <italic>GALNT14</italic>, <italic>TENT5B</italic>, <italic>PADI4</italic>, <italic>FABP6</italic>, <italic>AKR1B10</italic>, <italic>LIPK</italic>, <italic>AWAT1</italic>, <italic>GAPDHP45</italic>, and <italic>CCDC71L</italic>
</td>
</tr>
<tr>
<td align="left">Energy</td>
<td align="left">
<italic>SCN4A</italic>, <italic>SMOX</italic>, <italic>SLC34A1</italic>, <italic>ATP10A</italic>, and <italic>SLC12A8</italic> membrane</td>
</tr>
<tr>
<td align="left">Receptor</td>
<td align="left">
<italic>TSPAN9</italic>, <italic>GPRC5A</italic>, <italic>OXT</italic>, <italic>ANXA10</italic>, <italic>ARTN</italic>, <italic>IL37</italic>, <italic>GNG11</italic>, <italic>EPB41L4A</italic>, <italic>OR11HGU1</italic>, <italic>GYAMC</italic>, <italic>UFCAM3</italic>, <italic>FKBP2</italic>, <italic>CCR4</italic>, <italic>OR10J5</italic>, <italic>OR1D2</italic>, <italic>TNFAIP2</italic>, <italic>ANGPTL5</italic>, <italic>TMEM207</italic>, <italic>TRBV6-5</italic>, <italic>TRAV16</italic>, <italic>OMP</italic>, and <italic>FBXW7-AS1</italic>
</td>
</tr>
<tr>
<td align="left">Kinase</td>
<td align="left">
<italic>PRKY</italic>, <italic>SERTAD2</italic>, and <italic>RN7SKP257</italic>
</td>
</tr>
<tr>
<td align="left">SH3BP1</td>
<td align="left">
<italic>SH3BP1</italic> and <italic>BCL2L1</italic>
</td>
</tr>
<tr>
<td align="left">Transcription factor</td>
<td align="left">
<italic>HOXA3</italic>, <italic>AGFG2</italic>, <italic>NKX3-1</italic>, <italic>MBD3L1</italic>, <italic>GBP6</italic>, <italic>AHNAK2</italic>, and <italic>ZNF680P1</italic>
</td>
</tr>
<tr>
<td align="left">Pseudogene</td>
<td align="left">
<italic>TPRXL</italic>, <italic>OR5AZ1P</italic>, <italic>TTTY2</italic>, <italic>GNL3LP1</italic>, <italic>HIGD1AP16</italic>, <italic>RNU2-37P</italic>, <italic>RN7SKP172</italic>, <italic>KRT18P27</italic>, <italic>C1DP3</italic>, <italic>USP21P1</italic>, <italic>ABCD1P4</italic>, <italic>LINC01529</italic>, <italic>LINC01209</italic>, <italic>LINC01433</italic>, <italic>FDPSP7</italic>, <italic>RPL4P2</italic>, <italic>DPYD-AS1</italic>, <italic>MTND6P24</italic>, <italic>LINC00892</italic>, <italic>RPS24P6</italic>, <italic>LINC01731</italic>, <italic>LINC01440</italic>, <italic>LINC00601</italic>, <italic>LINC00993</italic>, <italic>HS6ST2-AS1</italic>, <italic>MCCD1P1</italic>, <italic>YRDCP2</italic>, <italic>HIGD1AP15</italic>, <italic>NRBF2P3</italic>, <italic>RPS13P4</italic>, <italic>RN7SL589P</italic>, <italic>RN7SL573P</italic>, <italic>VPS26AP1</italic>, <italic>RN7SL454P</italic>, <italic>LINC02150</italic>, <italic>LINC02014</italic>, <italic>SALL4P1</italic>, <italic>AACSP1</italic>, <italic>IGHV3-52</italic>, <italic>LINC02700</italic>, <italic>MRGPRF-AS1</italic>, <italic>LBX2-AS1</italic>, <italic>LINC01580</italic>, <italic>LINC00524</italic>, <italic>NDUFB4P11</italic>, <italic>CYCSP2</italic>, <italic>TBC1D3P5</italic>, <italic>RDM1P2</italic>, <italic>RDM1P1</italic>, <italic>ACTBP9</italic>, <italic>NTF6A</italic>, <italic>OR7E16P</italic>, <italic>VN1R80P</italic>, <italic>IMMP1LP3</italic>, and <italic>RDM1P4</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Quality of the model that estimates the likelihood of a compound reaching IC<sub>50</sub> on lung cancer cell lines.</p>
</caption>
<graphic xlink:href="fbinf-03-1225149-g004.tif"/>
</fig>
<p>At the cell experiment emulation stage, we kept predicted interactions with a probability of at least 0.9. From these, we selected the molecules predicted to act on the fewest cell lines above that threshold, i.e., those with the highest specificity.</p>
<p>As a result, five small molecules were selected. The cell lines used for validation were &#x201c;A549,&#x201d; &#x201c;NCIH23,&#x201d; &#x201c;NCIH460,&#x201d; &#x201c;NCIH1299,&#x201d; &#x201c;HCT116,&#x201d; &#x201c;AMO1,&#x201d; &#x201c;PC3,&#x201d; and &#x201c;CAPAN1.&#x201d;</p>
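<p>One way to read the specificity criterion is to count, for each molecule, the cell lines with a predicted probability of at least 0.9 and to prefer molecules with the smallest non-zero count. The Python sketch below follows that reading, which is an assumption about the selection rule rather than the study's implementation. The function name most_specific, the molecule identifiers, and the probabilities are hypothetical.</p>
<preformat><![CDATA[
```python
def most_specific(predictions, threshold=0.9, top_n=5):
    """Rank molecules by how few cell lines they are predicted to hit at or above threshold."""
    hits = {}
    for molecule, cell_line, prob in predictions:
        if prob >= threshold:
            hits.setdefault(molecule, set()).add(cell_line)
    # Molecules without any confident hit never enter `hits`, so they are excluded.
    ranked = sorted(hits.items(), key=lambda item: (len(item[1]), item[0]))
    return [(molecule, sorted(lines)) for molecule, lines in ranked[:top_n]]


# Hypothetical predictions: (molecule id, cell line, predicted probability).
predictions = [
    ("mol-A", "A549", 0.95), ("mol-A", "NCIH23", 0.97), ("mol-A", "PC3", 0.93),
    ("mol-B", "A549", 0.96), ("mol-B", "PC3", 0.40),
    ("mol-C", "NCIH460", 0.91), ("mol-C", "NCIH1299", 0.92),
    ("mol-D", "A549", 0.50),
]
selected = most_specific(predictions, top_n=2)
print(selected)
```
]]></preformat>
<p>Ties in the number of cell lines are broken alphabetically by molecule identifier here, purely to make the output deterministic.</p>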
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>During the initial phase of target discovery, 36 genes were selected for further investigation. Several of them are associated with WNT signaling (<italic>DKK4, LEF1, WNT6</italic>, and <italic>WNT8B</italic>), BMP signaling (<italic>SMAD9</italic>), and TGF signaling (<italic>ACVR2A, GDF6</italic>, and <italic>ZFYVE9</italic>). The selection also included various cytoskeletal components and membrane proteins that transport molecules. Each of these genes potentially encodes a protein suitable for targeted lung cancer therapy.</p>
<p>Four of the genes that passed the tumor&#x2013;normal filter encode cytoskeletal proteins. Notably, tumor invasion and metastasis are often already present at the time of diagnosis in patients with lung cancer (<xref ref-type="bibr" rid="B8">Kuo et al., 2021</xref>). These features are mediated by the well-developed cytoskeleton of tumor cells. It is therefore not surprising that the expression of cytoskeletal proteins correlates with decreased overall survival in patients with lung cancer.</p>
<p>A key consideration in using artificial intelligence algorithms for drug discovery is the reliability of their results. As with mathematical modeling in other fields, there is a substantial risk that predictions will not be confirmed in practice. Three main factors contribute to this risk: an inadequate model architecture, an incorrect data representation, and insufficient data. Deep learning models are well established and can capture the nonlinear relationships that characterize this domain. The data representation developed in this study yielded high predictive quality, as confirmed by standard cross-validation. Accuracy is expected to improve further as more data accumulate with the spread of high-tech diagnostic methods and data management systems.</p>
<p>In this study, we assumed that all mutated genes are targets. In practice, however, active signaling pathways often change even when a key gene is inhibited. The results therefore require laboratory validation.</p>
</sec>
<sec sec-type="conclusion" id="s5">
<title>Conclusion</title>
<p>The pipeline of methods presented in this paper can serve as a basis for automated AI-driven drug discovery. We demonstrated the application of modern machine learning methods, in particular deep learning, together with ways of representing the input data for learning algorithms. The performance of the methods was demonstrated on data from open sources and confirmed by cross-validation against known results. The methodology could be improved by using more data, including proprietary data, and by a more detailed representation of the underlying knowledge, in particular three-dimensional modeling of interacting molecules.</p>
<p>The natural language processing techniques used in this work proved effective for processing tens of thousands of articles. They can similarly be used to compile custom databases of scientific publications.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found at: <ext-link ext-link-type="uri" xlink:href="https://gdc.cancer.gov">https://gdc.cancer.gov</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author contributions</title>
<p>DC: idea, algorithm development, and data formatting. VM: idea correction and discussion, data preparation, and physical sense control. IS: correction and management. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of interest</title>
<p>Author DC is a shareholder of BioAlg Corp. Author VM was employed by BioAlg Corp.</p>
<p>The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Asgari</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Mofrad</surname>
<given-names>M. R. K.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Continuous distributed representation of biological sequences for deep proteomics and genomics</article-title>. <source>PLoS One</source> <volume>10</volume> (<issue>11</issue>), <fpage>e0141287</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0141287</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barretina</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Caponigro</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Stransky</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Venkatesan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Margolin</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity</article-title>. <source>Nature</source> <volume>483</volume> (<issue>7391</issue>), <fpage>603</fpage>&#x2013;<lpage>607</lpage>. <pub-id pub-id-type="doi">10.1038/nature11003</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Chebanov</surname>
<given-names>D. K.</given-names>
</name>
<name>
<surname>Tatevosova</surname>
<given-names>N. S.</given-names>
</name>
<name>
<surname>Mikhailova</surname>
<given-names>I. N.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Machine learning for predicting overall survival using whole exome DNA and gene expression data and analyzing the significance of features</article-title>, <conf-name>Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging</conf-name>, <conf-date>2021 Jan 13-14</conf-date>, <conf-loc>China</conf-loc>, <volume>27</volume>. <publisher-name>Philadelphia (PA): AACR; Clin. Cancer. Res</publisher-name>.</citation>
</ref>
<ref id="B4">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Devlin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>M.-W.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Toutanova</surname>
<given-names>Kr.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>, <conf-name>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</conf-name>, <conf-date>June 2-7, 2019</conf-date>, <conf-loc>USA</conf-loc>, <volume>Vol. 1</volume>. <publisher-name>IEEE</publisher-name>, <fpage>4171</fpage>&#x2013;<lpage>4186</lpage>. <comment>(Long and Short Papers)</comment>.</citation>
</ref>
<ref id="B5">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goodfellow</surname>
<given-names>I. J.</given-names>
</name>
<name>
<surname>Pouget-Abadie</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mirza</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Warde-Farley</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ozair</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <source>Generative adversarial networks</source>. <publisher-loc>USA</publisher-loc>: <publisher-name>arXiv</publisher-name>, <fpage>1406.2661</fpage>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jaeger</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fulle</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Turk</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Mol2vec: unsupervised machine learning approach with chemical intuition</article-title>. <source>J. Chem. Inf. Model.</source> <volume>58</volume> (<issue>1</issue>), <fpage>27</fpage>&#x2013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jcim.7b00616</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>KEGG: Kyoto Encyclopedia of Genes and Genomes</article-title>. <source>Nucleic Acids Res.</source> <volume>28</volume> (<issue>1</issue>), <fpage>27</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1093/nar/28.1.27</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuo</surname>
<given-names>C.-H.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>M.-W.</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>Y.-W.</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>H.-C.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>L.-H.</given-names>
</name>
<name>
<surname>Law</surname>
<given-names>C.-H.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Biomarker discovery in highly invasive lung cancer cell through proteomics approaches</article-title>. <source>Cell. Biochem. Funct.</source> <volume>39</volume> (<issue>3</issue>), <fpage>367</fpage>&#x2013;<lpage>379</lpage>. <pub-id pub-id-type="doi">10.1002/cbf.3599</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Improving compound&#x2013;protein interaction prediction by building up highly credible negative samples</article-title>. <source>Bioinformatics</source> <volume>31</volume> (<issue>12</issue>), <fpage>i221</fpage>&#x2013;<lpage>i229</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv256</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martin</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Orchard</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Magrane</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Agivetova</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Ahmad</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>UniProt: the universal protein knowledgebase in 2021</article-title>. <source>Nucleic Acids Res.</source> <volume>49</volume> (<issue>D1</issue>), <fpage>D480</fpage>&#x2013;<lpage>D489</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkaa1100</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mikolov</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Corrado</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Efficient estimation of word representations in vector space</article-title>. <source>arXiv</source>, <fpage>1301.3781</fpage>. <pub-id pub-id-type="doi">10.48550/arXiv.1301.3781</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Yoon</surname>
<given-names>B. H.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S. K.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S. Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>GENT2: an updated gene expression database for normal and tumor tissues</article-title>. <source>BMC Med. Genomics</source> <volume>12</volume> (<issue>5</issue>), <fpage>101</fpage>. <pub-id pub-id-type="doi">10.1186/s12920-019-0514-7</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Patki</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Wedge</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Veeramachaneni</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>The synthetic data vault</article-title>. <source>IEEE Int. Conf. Data Sci. Adv. Anal. (DSAA)</source>, <fpage>399</fpage>&#x2013;<lpage>410</lpage>. <pub-id pub-id-type="doi">10.1109/DSAA.2016.49</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="web">
<collab>RDKit</collab> <article-title>Open-source cheminformatics</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.rdkit.org">http://www.rdkit.org</ext-link>.</comment>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Szklarczyk</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>von Mering</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Kuhn</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data</article-title>. <source>Nucleic Acids Res.</source> <volume>44</volume> (<issue>D1</issue>), <fpage>D380</fpage>&#x2013;<lpage>D384</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv1277</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>Z.-H.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>S.-X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>X.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>A computational-based method for predicting drug&#x2013;target interactions by using stacked autoencoder deep neural network</article-title>. <source>J. Comput. Biol.</source> <volume>25</volume> (<issue>3</issue>), <fpage>361</fpage>&#x2013;<lpage>373</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2017.0135</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y. B.</given-names>
</name>
<name>
<surname>You</surname>
<given-names>Z. H.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yi</surname>
<given-names>H.-C.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.-H.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>K. A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network</article-title>. <source>BMC Med. Inf. Decis. Mak.</source> <volume>20</volume> (<issue>2</issue>), <fpage>49</fpage>. <pub-id pub-id-type="doi">10.1186/s12911-020-1052-0</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<collab>Cancer Genome Atlas Research Network</collab>,
<person-group person-group-type="author">
<name>
<surname>Weinstein</surname>
<given-names>J. N.</given-names>
</name>
<name>
<surname>Collisson</surname>
<given-names>E. A.</given-names>
</name>
<name>
<surname>Mills</surname>
<given-names>G. B.</given-names>
</name>
<name>
<surname>Shaw</surname>
<given-names>K. R.</given-names>
</name>
<name>
<surname>Ozenberger</surname>
<given-names>B. A.</given-names>
</name>
<name>
<surname>Ellrott</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>The Cancer Genome Atlas Pan-Cancer analysis project</article-title>. <source>Nat. Genet.</source> <volume>45</volume> (<issue>10</issue>), <fpage>1113</fpage>&#x2013;<lpage>1120</lpage>. <pub-id pub-id-type="doi">10.1038/ng.2764</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wishart</surname>
<given-names>D. S.</given-names>
</name>
<name>
<surname>Feunang</surname>
<given-names>Y. D.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Lo</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Marcu</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Grant</surname>
<given-names>J. R.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>DrugBank 5.0: a major update to the DrugBank database for 2018</article-title>. <source>Nucleic Acids Res.</source> <volume>46</volume> (<issue>D1</issue>), <fpage>D1074</fpage>&#x2013;<lpage>D1082</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkx1037</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Skoularidou</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cuesta-Infante</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Veeramachaneni</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Modeling tabular data using conditional GAN</source>. <publisher-name>NeurIPS</publisher-name>.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Soares</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Greninger</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Edelman</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Lightfoot</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Forbes</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells</article-title>. <source>Nucleic Acids Res.</source> <volume>41</volume> (<issue>D1</issue>), <fpage>D955</fpage>&#x2013;<lpage>D961</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gks1111</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>