<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">729834</article-id>
<article-id pub-id-type="doi">10.3389/frai.2021.729834</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk</article-title>
<alt-title alt-title-type="left-running-head">Wu et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">BERT-Based Modeling for DILI</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Wu</surname>
<given-names>Yue</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/437432/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Liu</surname>
<given-names>Zhichao</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/293118/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wu</surname>
<given-names>Leihong</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/595960/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Chen</surname>
<given-names>Minjun</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/304873/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Tong</surname>
<given-names>Weida</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/39650/overview"/>
</contrib>
</contrib-group>
<aff>Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United&#x20;States Food and Drug Administration, <addr-line>Jefferson</addr-line>, <addr-line>AR</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/569637/overview">Ruchir Shah</ext-link>, Sciome LLC, United&#x20;States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1028642/overview">Arpit Tandon</ext-link>, Sciome LLC, United&#x20;States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1483217/overview">Adyasha Maharana</ext-link>, University of North Carolina at Chapel Hill, United&#x20;States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Weida Tong, <email>Weida.Tong@fda.hhs.gov</email>; Minjun Chen, <email>Minjun.Chen@fda.hhs.gov</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Medicine and Public Health, a section of the journal Frontiers in Artificial Intelligence</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>12</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>729834</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>06</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>11</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Wu, Liu, Wu, Chen and Tong.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Wu, Liu, Wu, Chen and Tong</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>
<bold>Background &#x26; Aims:</bold> The United&#x20;States Food and Drug Administration (FDA) regulates a broad range of consumer products, which account for about 25% of the United&#x20;States market. FDA regulatory activities often involve producing and reading a large number of documents, which is time consuming and labor intensive. To support regulatory science at FDA, we evaluated artificial intelligence (AI)-based natural language processing (NLP) of regulatory documents for text classification and compared deep learning-based models with a conventional keywords-based&#x20;model.</p>
<p>
<bold>Methods:</bold> FDA drug labeling documents were used as a representative regulatory data source to classify drug-induced liver injury (DILI) risk by employing the state-of-the-art language model BERT. The resulting NLP-DILI classification model was statistically validated with both internal and external validation procedures and applied to the labeling data from the European Medicines Agency (EMA) for cross-agency application.</p>
<p>
<bold>Results:</bold> The NLP-DILI model developed using FDA labeling documents and evaluated by cross-validations in this study showed remarkable performance in DILI classification with a recall of 1 and a precision of 0.78. When cross-agency data were used to validate the model, the performance remained comparable, demonstrating that the model was portable across agencies. Results also suggested that the model was able to capture the semantic meanings of sentences in drug labeling.</p>
<p>
<bold>Conclusion:</bold> Deep learning-based NLP models performed well in DILI classification of drug labeling documents and learned the meanings of complex text in drug labeling. This proof-of-concept work demonstrated that using AI technologies to assist regulatory activities is a promising approach to modernize and advance regulatory science.</p>
</abstract>
<kwd-group>
<kwd>regulatory science</kwd>
<kwd>drug labeling</kwd>
<kwd>natural language processing</kwd>
<kwd>BERT</kwd>
<kwd>drug induced liver injury</kwd>
<kwd>United&#x20;States Food and Drug Administration</kwd>
<kwd>European medicines agency</kwd>
<kwd>named entity recognition</kwd>
</kwd-group>
<contract-num rid="cn001">E0767701</contract-num>
<contract-sponsor id="cn001">U.S. Food and Drug Administration<named-content content-type="fundref-id">10.13039/100000038</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>The United&#x20;States FDA regulates consumer products including foods, medications and tobacco, which account for about 25% of the United&#x20;States market (<xref ref-type="bibr" rid="B24">US Food and Drug Administration, 2011a</xref>). The core responsibility of FDA is to ensure safe and effective products while promoting innovation that yields products of better quality (<xref ref-type="bibr" rid="B25">US Food and Drug Administration, 2010</xref>). Therefore, FDA must be equipped with the best available tools and methods to facilitate pre-market evaluation and post-market surveillance, which requires a strong field of regulatory science to develop standards and approaches that assess FDA-regulated products with reliable efficiency and consistency (<xref ref-type="bibr" rid="B24">US Food and Drug Administration, 2011a</xref>; <xref ref-type="bibr" rid="B15">Hamburg, 2011</xref>).</p>
<p>Currently, science and technology are rapidly evolving in the field of healthcare, introducing more complexity to the development and manufacture of new drugs, biologics and medical devices. Artificial intelligence (AI), in particular, is a fast-growing area and has shown great potential in addressing unmet medical and public health needs (<xref ref-type="bibr" rid="B34">Yu et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B4">Basile et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B5">Chan et&#x20;al., 2019</xref>). A long-standing challenge for FDA is to efficiently retrieve needed information from the huge number of documents it receives and regularly generates, such as approval documents, guidance, policies and meeting minutes. In addition to product evaluation and decision making, a significant amount of time must be spent manually reading documents and searching for information of interest. AI-based natural language processing (NLP) is a promising approach to speeding up this time-consuming and labor-intensive process.</p>
<p>In this study, we applied AI-based NLP to classify drug labeling documents as a proof-of-concept to demonstrate the utility of AI for regulatory applications. Drug labeling provides comprehensive summaries of medications as a reference for healthcare professionals in making prescribing decisions (<xref ref-type="bibr" rid="B29">Watson and Barash, 2009</xref>; <xref ref-type="bibr" rid="B18">McMahon and Preskorn, 2014</xref>). It is also an essential resource for FDA reviewers during drug evaluations, and for the research community in pharmacovigilance and drug repositioning (<xref ref-type="bibr" rid="B8">Chen et&#x20;al., 2011</xref>, <xref ref-type="bibr" rid="B7">2016</xref>; <xref ref-type="bibr" rid="B16">Hoffman et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B13">Fang et&#x20;al., 2020</xref>). There are over 130,000 drug labeling documents in the FDALabel repository, of which 47,000 are labeling for prescription drugs and biologics (<xref ref-type="bibr" rid="B13">Fang et&#x20;al., 2020</xref>). This represents a large amount of regulatory text, making manual assessment of all drug labeling documents prohibitive, if not impossible. Here, we developed an AI-based approach to classify drug-induced liver injury (DILI) risk indicated in drug labeling documents, which serves as a proxy to test the applicability of AI in facilitating text classification from regulatory documents.</p>
<p>Adverse drug reactions (ADRs) such as DILI are described in three sections, &#x201c;Adverse Reactions&#x201d;, &#x201c;Warnings and Precautions&#x201d; and &#x201c;Boxed Warning&#x201d;, in FDA drug labeling documents (<xref ref-type="bibr" rid="B26">US Food and Drug Administration, 2006</xref>; <xref ref-type="bibr" rid="B27">US Food and Drug Administration, 2011b</xref>). The &#x201c;Warnings and Precautions&#x201d; section contains the most comprehensive and complicated descriptions, which are not limited to ADRs but also cover related aspects such as warnings to patients about signs and symptoms, clinical/laboratory monitoring plans and contraindications; sentences containing DILI-related terms in this section therefore do not necessarily indicate attributable DILI events (<xref ref-type="bibr" rid="B27">US Food and Drug Administration, 2011b</xref>). In contrast, the &#x201c;Boxed Warning&#x201d; section, specific to FDA labeling, contains concise highlights of the most serious ADRs from the &#x201c;Warnings and Precautions&#x201d; section (<xref ref-type="bibr" rid="B27">US Food and Drug Administration, 2011b</xref>), while the &#x201c;Adverse Reactions&#x201d; section more or less lists all possible ADRs (<xref ref-type="bibr" rid="B26">US Food and Drug Administration, 2006</xref>). The current manual classification approach largely relies on the use of pre-defined DILI terms to determine whether sentences in the three labeling sections indicate DILI (<xref ref-type="bibr" rid="B8">Chen et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B7">2016</xref>). Because the terms used in drug labeling are not well normalized to international standards such as the Medical Dictionary for Regulatory Activities (MedDRA) and the Systematized Nomenclature of Medicine (SNOMED), and because the language used to describe ADRs is complex, interpretation and judgement by experts with relevant knowledge and experience are necessary.
We used an AI-based approach to address these issues in the current study, as language models can capture the semantic meanings of sentences in free text rather than simple string matching (<xref ref-type="bibr" rid="B19">Radford et&#x20;al., 2018</xref>). Specifically, the state-of-the-art language model, Bidirectional Encoder Representations from Transformers (BERT) (<xref ref-type="bibr" rid="B11">Devlin et&#x20;al., 2019</xref>), was trained for binary DILI classification of FDA-approved drug labeling documents and was externally validated using EMA-approved drug labeling documents. The deep learning-based model, hybrid deep learning-based model and keywords-based model developed in this study were compared for DILI risk classification on drug labeling documents.</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>Materials and Methods</title>
<sec id="s2-1">
<title>Data Sources for Drug Labeling</title>
<p>FDA drug labeling documents were retrieved from DailyMed (<ext-link ext-link-type="uri" xlink:href="http://www.dailymed.nlm.nih.gov">www.dailymed.nlm.nih.gov</ext-link>), a public database that contains up-to-date drug labeling approved by the FDA. Meanwhile, since the EMA issues standardized drug labeling for drugs approved through a centralized procedure, we used UK-marketed drugs as representatives of drugs authorized in Europe (<xref ref-type="bibr" rid="B12">European Medicines Agency, 2009</xref>). EMA drug labeling documents were collected from the EMC (<ext-link ext-link-type="uri" xlink:href="http://www.medicines.org.uk">www.medicines.org.uk</ext-link>), which maintains the EMA-approved drug labeling for drugs licensed in the United&#x20;Kingdom.</p>
</sec>
<sec id="s2-2">
<title>Drug Selection Criteria</title>
<p>We selected prescription drugs based on three criteria: i) a single active ingredient, ii) oral or injection use, and iii) an application category of NDA, ANDA or BLA, by querying the FDALabel database (<ext-link ext-link-type="uri" xlink:href="https://nctr-crs.fda.gov/fdalabel/ui/search">https://nctr-crs.fda.gov/fdalabel/ui/search</ext-link>), which maintains over 130,000 drug labeling documents containing critical information pertinent to the safe and effective use of medications (<xref ref-type="bibr" rid="B13">Fang et&#x20;al., 2020</xref>). Over-the-counter drugs were removed because of their different labeling format and requirements compared to prescription drugs. The DILIrank dataset provides the DILI risk annotation for 1,036 drugs marketed in the United&#x20;States as of 2010 (<xref ref-type="bibr" rid="B7">Chen et&#x20;al., 2016</xref>). We retrieved the most recent drug labeling documents for the queried 750 representative prescription drugs from the DILIrank dataset. Among these drugs, 540 were also licensed in the United&#x20;Kingdom market. The corresponding EMA drug labeling documents were collected and assessed for DILI risk using the same classification schema described in previous studies (<xref ref-type="bibr" rid="B8">Chen et&#x20;al., 2011</xref>).</p>
</sec>
<sec id="s2-3">
<title>Datasets</title>
<p>We focused our analysis on the &#x201c;Warnings and Precautions&#x201d; section of FDA labeling documents, as the language for ADR descriptions in this section has the highest complexity compared with the other two sections (<xref ref-type="bibr" rid="B27">US Food and Drug Administration, 2011b</xref>). The corresponding section in the EMA labeling documents is the &#x201c;Special warnings and precautions for use&#x201d; section (<xref ref-type="bibr" rid="B12">European Medicines Agency, 2009</xref>). Texts were extracted from either the &#x201c;Warnings and Precautions&#x201d; section (FDA) or the &#x201c;Special warnings and precautions for use&#x201d; section (EMA), followed by format cleaning and sentence tokenization (<xref ref-type="fig" rid="F1">Figures 1B</xref>,&#x20;<xref ref-type="fig" rid="F2">2</xref>).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Quorum flowchart describes the study design. <bold>(A)</bold> Drug labeling document classification models developed and compared in this study. <bold>(B)</bold> The study design of model training and evaluation using FDA labeling documents and model validation using EMA labeling documents.</p>
</caption>
<graphic xlink:href="frai-04-729834-g001.tif"/>
</fig>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Workflow for the training of sentence classification module and the development of final document classification&#x20;model.</p>
</caption>
<graphic xlink:href="frai-04-729834-g002.tif"/>
</fig>
<p>For model training on FDA labeling documents, the representative documents (N &#x3d; 750) were split with stratification into an 80% training document dataset (N &#x3d; 600) and a 20% test document dataset (N &#x3d; 150). Unique sentences (N &#x3d; 29,252) were extracted from the training document dataset and labeled independently by two experts as DILI-positive (N &#x3d; 540) or DILI-negative (N &#x3d; 28,712). All disagreements were resolved by discussion. To generate data with more balanced class labels, intermediate datasets were created to facilitate filtering of context prior to sentence classification, via Named Entity Recognition (NER). The unique sentences (<italic>N</italic>&#x20;&#x3d; 29,252) from training documents were annotated using the Inside-Outside-Beginning (IOB) style. The annotated sentences were randomly split into an 80% training sentence dataset (N &#x3d; 23,401) and a 20% development sentence dataset (N &#x3d; 5,851) for NER model training (<xref ref-type="fig" rid="F1">Figure&#x20;1B</xref>). Sentences with tokens related to a liver context, 540&#x20;DILI-positive and 1,313&#x20;DILI-negative, were selected as liver-related sentences. To simplify the comparison between models, human-validated liver-related sentences from the annotated sentences (<italic>N</italic>&#x20;&#x3d; 29,252) were used for developing the sentence classification module. The test document dataset was used to evaluate the developed models, and cross-agency data, i.e.,&#x20;EMA labeling documents for drugs not included in the FDA training data, were used for external validation (<xref ref-type="fig" rid="F1">Figure&#x20;1B</xref>).</p>
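<p>The IOB annotation and the selection of liver-related sentences can be illustrated with a minimal sketch. The term list and the tag names (B-LIVER, I-LIVER) below are assumptions made for illustration only; in the study the annotation was performed and validated by human experts, and the actual tag set is not given in the text.</p>
<preformat><![CDATA[
```python
# Toy illustration of IOB tagging. LIVER_TERMS and the "LIVER" tag name are
# assumptions for this example, not the annotation scheme used in the study.
LIVER_TERMS = {"hepatic", "liver", "hepatotoxicity"}


def iob_tags(tokens):
    """Return one IOB tag per token: B-LIVER, I-LIVER or O."""
    tags = []
    previous_inside = False
    for token in tokens:
        inside = token.lower().strip(".,;") in LIVER_TERMS
        if inside:
            # B- opens a new entity, I- continues the entity opened just before.
            tags.append("I-LIVER" if previous_inside else "B-LIVER")
        else:
            tags.append("O")
        previous_inside = inside
    return tags


def is_liver_related(tags):
    """A sentence is kept when at least one token lies inside a liver entity."""
    return any(tag != "O" for tag in tags)


liver_sentence = (
    "Rozerem should not be used by patients with severe hepatic impairment"
).split()
other_sentence = (
    "Treat all infections due to Group A beta-hemolytic streptococci "
    "for at least 10 days"
).split()

liver_tags = iob_tags(liver_sentence)
other_tags = iob_tags(other_sentence)
```
]]></preformat>
<p>In this sketch only the token &#x201c;hepatic&#x201d; receives a non-O tag in the first sentence, so that sentence would be retained as liver-related, while the second sentence would be filtered out.</p>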
<p>Examples are given here to illustrate the datasets created for model training. Dataset for context classification included liver-related sentences such as &#x201c;Hepatic toxicity including hepatic failure resulting in transplantation or death have been reported&#x201d; and &#x201c;Rozerem should not be used by patients with severe hepatic impairment&#x201d; and sentences irrelevant to liver including &#x201c;Treat all infections due to Group A beta-hemolytic streptococci for at least 10&#xa0;days&#x201d;. The first two liver-related sentences were used for developing sentence classification models. The first sentence was considered as DILI-positive, while the second sentence is for contraindication information and thus considered as DILI-negative.</p>
<p>To further examine the portability of BERT-based models across agencies, we also developed models using EMA labeling documents as training data and validated the models using FDA labeling documents (<xref ref-type="sec" rid="s12">Supplementary Figure&#x20;1</xref>). EMA labeling documents (N &#x3d; 540) were split with stratification into an 80% training document dataset (N &#x3d; 431) and a 20% test document dataset (N &#x3d; 109). Unique sentences (N &#x3d; 14,915) were extracted from the training document dataset, including 232&#x20;DILI-positive and 14,683&#x20;DILI-negative sentences. Similarly, intermediate datasets were created to facilitate filtering of context prior to sentence classification, via NER. The unique sentences (<italic>N</italic>&#x20;&#x3d; 14,915) from training documents were annotated using the IOB style and randomly split into an 80% training sentence dataset (<italic>N</italic>&#x20;&#x3d; 11,931) and a 20% development sentence dataset (<italic>N</italic>&#x20;&#x3d; 2,984) for NER model training (<xref ref-type="sec" rid="s12">Supplementary Figure&#x20;1</xref>). Sentences with tokens related to a liver context, 232&#x20;DILI-positive and 927&#x20;DILI-negative, were selected as liver-related sentences. Human-validated liver-related sentences from the annotated sentences (<italic>N</italic>&#x20;&#x3d; 14,915) were used for developing the sentence classification module. The EMA test document dataset was used to evaluate the developed models, and FDA labeling documents for drugs not included in the EMA training data were used for external validation.</p>
</sec>
<sec id="s2-4">
<title>Models for Document Classification</title>
<p>In this study, deep learning-based (BERT for DILI classification), hybrid deep learning-based and keywords-based models were developed for classifying drug labeling documents based on whether they contain any sentence suggesting DILI risk (<xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>).</p>
<p>The deep learning-based and hybrid deep learning-based document classification models consisted of two working modules, a context classification module and a BERT sentence classification module (<xref ref-type="fig" rid="F1">Figures 1A</xref>, <xref ref-type="fig" rid="F2">2</xref>). These two models shared the same BERT sentence classification module but differed in the context classification module. For each input document, each sentence was passed into the two working modules sequentially (<xref ref-type="fig" rid="F2">Figure&#x20;2</xref>). The first step was to determine whether the current sentence was related to the liver topic at the context classification module. If not, this sentence was DILI-negative. If yes, this sentence was then passed to the BERT sentence classification module to determine whether it was DILI-positive or DILI-negative. After evaluating all the sentences in the input document, an array of predicted sentence labels was generated. If any DILI-positive sentences were found in the input document, the document was considered DILI-positive, otherwise as DILI-negative.</p>
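<p>The two-step decision logic described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation used in the study: the two callables stand in for the context classification module and the BERT sentence classification module, and the toy rules passed to them are hypothetical placeholders.</p>
<preformat><![CDATA[
```python
from typing import Callable, Iterable, List


def classify_document(
    sentences: Iterable[str],
    is_liver_context: Callable[[str], bool],
    is_dili_positive: Callable[[str], bool],
) -> int:
    """Return 1 (DILI-positive) if any sentence is DILI-positive, else 0."""
    labels: List[int] = []
    for sentence in sentences:
        if not is_liver_context(sentence):
            # Sentences outside the liver context are DILI-negative by rule.
            labels.append(0)
        else:
            # Only liver-related sentences reach the sentence classifier.
            labels.append(1 if is_dili_positive(sentence) else 0)
    return 1 if any(labels) else 0


# Hypothetical stand-ins for the two trained modules.
def toy_context(sentence: str) -> bool:
    text = sentence.lower()
    return "hepatic" in text or "liver" in text


def toy_sentence_classifier(sentence: str) -> bool:
    return "have been reported" in sentence.lower()


doc_a = [
    "Hepatic toxicity including hepatic failure resulting in "
    "transplantation or death have been reported",
    "Treat all infections due to Group A beta-hemolytic streptococci "
    "for at least 10 days",
]
doc_b = [
    "Rozerem should not be used by patients with severe hepatic impairment",
    "Treat all infections due to Group A beta-hemolytic streptococci "
    "for at least 10 days",
]

label_a = classify_document(doc_a, toy_context, toy_sentence_classifier)
label_b = classify_document(doc_b, toy_context, toy_sentence_classifier)
```
]]></preformat>
<p>Passing the modules as callables mirrors the design in which the deep learning-based and hybrid models share the sentence classifier and differ only in the context filter.</p>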
<p>A keywords-based document classification model was also developed as a comparison to the deep learning-based and hybrid deep learning-based models (<xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>). Keywords for detecting DILI risk in the drug labeling were collected from three previous studies (<xref ref-type="bibr" rid="B8">Chen et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B10">Demner-Fushman et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B23">Suzuki et&#x20;al., 2015</xref>) (<xref ref-type="sec" rid="s12">Supplementary Table&#x20;1</xref>). Chen et&#x20;al. summarized a list of DILI keywords for text-mining (via human reading) in the drug labeling, while Suzuki et&#x20;al. selected a list of MedDRA PT terms for hepatocellular and cholestatic liver injury for text-mining in the WHO VigiBase&#x2122;. These two lists covered most of the DILI terms, but the keywords commonly had multiple imperfect matches in the drug labeling documents. Thus, these keywords could not be used directly for computerized text-mining in the drug labeling documents. Demner-Fushman et&#x20;al. normalized the ADR terms in 200 drug labeling documents to MedDRA Preferred Terms (PTs). By using the matching data in the Demner-Fushman et&#x20;al. study, we generated a keyword list that covers DILI (<xref ref-type="bibr" rid="B8">Chen et&#x20;al., 2011</xref>), liver injury (<xref ref-type="bibr" rid="B23">Suzuki et&#x20;al., 2015</xref>) and hepatic ADRs (<xref ref-type="bibr" rid="B10">Demner-Fushman et&#x20;al., 2018</xref>) terms used in drug labeling. The FDA and EMA test document sets were used to evaluate the performance of keywords-based document classification.</p>
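<p>A keywords-based document classifier of this kind can be sketched as a case-insensitive phrase match. The three keywords below are placeholders chosen for illustration; the actual list is given in Supplementary Table 1 and is not reproduced here, and the third example sentence is hypothetical.</p>
<preformat><![CDATA[
```python
import re


def build_matcher(keywords):
    """Compile one case-insensitive pattern that matches any whole keyword."""
    # Longer phrases first so that multi-word terms are preferred.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = "|".join(re.escape(keyword) for keyword in ordered)
    return re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)


def keyword_label(document_text, matcher):
    """Return 1 if any keyword occurs in the document text, else 0."""
    return 1 if matcher.search(document_text) else 0


# Placeholder keywords, not the list used in the study.
matcher = build_matcher({"hepatic failure", "hepatotoxicity", "liver injury"})

reported = keyword_label(
    "Hepatic toxicity including hepatic failure resulting in "
    "transplantation or death have been reported",
    matcher,
)
contraindication = keyword_label(
    "Rozerem should not be used by patients with severe hepatic impairment",
    matcher,
)
# Hypothetical sentence: the keyword is present but no DILI event is described.
not_studied = keyword_label(
    "This drug has not been studied in patients with hepatic failure",
    matcher,
)
```
]]></preformat>
<p>The last example is flagged even though it does not describe a drug-induced event, which illustrates why string matching alone can overcall DILI risk.</p>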
</sec>
<sec id="s2-5">
<title>Development of the Context Classification Modules</title>
<p>Two types of context classification modules were created in this study. The first one is a string pattern matching-based context filter. The other one is an NER-based context classification&#x20;model.</p>
<p>For the hybrid deep learning-based model, general string patterns were used to match sentences with any possible relation to liver, including indications, contraindications, ADRs, clinical monitoring, immune disorders, etc. (<xref ref-type="sec" rid="s12">Supplementary Table&#x20;2</xref>). Most DILI-negative sentences irrelevant to liver were filtered out by applying such pre-defined context, yielding relatively balanced sentence datasets without losing any DILI-positive sentences (<xref ref-type="table" rid="T1">Table&#x20;1</xref>).</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Sentence count with or without pre-defined liver-related context.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th colspan="2" align="center">Without pre-defined context</th>
<th colspan="2" align="center">In context of liver (string-filter)</th>
<th colspan="2" align="center">In context of liver (BERT for NER)</th>
</tr>
<tr>
<th align="left"/>
<th align="center">FDA</th>
<th align="center">EMA</th>
<th align="center">FDA</th>
<th align="center">EMA</th>
<th align="center">FDA</th>
<th align="center">EMA</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">DILI positive sentences</td>
<td align="center">540</td>
<td align="center">232</td>
<td align="center">540</td>
<td align="center">232</td>
<td align="center">540</td>
<td align="center">232</td>
</tr>
<tr>
<td align="left">DILI negative sentences</td>
<td align="center">28,712</td>
<td align="center">14,683</td>
<td align="center">961</td>
<td align="center">764</td>
<td align="center">1,313</td>
<td align="center">927</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Meanwhile, a BERT-based NER model was developed as the context classification module in the deep learning-based model. The NER model was developed by using training sentence dataset and evaluated on development sentence dataset at each epoch of training. The hyperparameters used for model training are listed in <xref ref-type="sec" rid="s12">Supplementary Table&#x20;3</xref>. This BERT-based context classification module was then evaluated by performing context classification on sentences extracted from test documents and cross-agency validation documents.</p>
</sec>
<sec id="s2-6">
<title>Development of the BERT-Based Sentence Classification Module</title>
<p>The liver-related sentences selected from the training sentence dataset were used for developing a BERT (base, uncased) model for binary DILI classification as the sentence classification module, while the liver-related sentences selected from the development sentence dataset were used to evaluate its performance. The hyperparameters used for model training are listed in <xref ref-type="sec" rid="s12">Supplementary Table&#x20;4</xref>. The sentence classification module was evaluated using shuffled five-fold cross-validation on the liver-related sentences, repeated 100&#x20;times (<xref ref-type="sec" rid="s12">Supplementary Figure&#x20;2</xref>). For comparison with this context-dependent sentence classification model, we also trained a sentence classification model using the imbalanced sentence datasets extracted from training documents. To address the dataset imbalance issue, we applied an oversampling method, i.e.,&#x20;random sampling based on class weights.</p>
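<p>Oversampling by class weights can be sketched as weighted sampling with replacement. The inverse-frequency weighting used here is an assumption, since the exact weighting scheme is not specified in the text; the class counts are those of the FDA training sentences.</p>
<preformat><![CDATA[
```python
import random
from collections import Counter


def class_weights(labels):
    """Weight each class by the inverse of its frequency (an assumption)."""
    counts = Counter(labels)
    return {label: 1.0 / count for label, count in counts.items()}


def oversample(examples, labels, k, seed=0):
    """Draw k examples with replacement, each weighted by its class weight."""
    weights_by_class = class_weights(labels)
    weights = [weights_by_class[label] for label in labels]
    rng = random.Random(seed)
    indices = rng.choices(range(len(labels)), weights=weights, k=k)
    return [examples[i] for i in indices], [labels[i] for i in indices]


# 540 DILI-positive and 28,712 DILI-negative sentences, as in the FDA data.
labels = [1] * 540 + [0] * 28712
examples = list(range(len(labels)))  # sentence indices stand in for sentences
sampled_examples, sampled_labels = oversample(examples, labels, k=10000)
positive_fraction = sum(sampled_labels) / len(sampled_labels)
```
]]></preformat>
<p>With inverse-frequency weights each class carries the same total sampling weight, so the resampled set is expected to be roughly balanced despite the original 540 to 28,712 ratio.</p>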
<p>Permutation analysis was conducted to determine whether the models developed in this study performed at chance level (<xref ref-type="bibr" rid="B6">Chen et&#x20;al., 2013</xref>). Permuted datasets were generated by resampling the liver-related training and test sentence datasets 100&#x20;times with randomly shuffled DILI classification labels (positive or negative). The performance of the resulting 100 models was compared with that from 100 repetitions of cross-validation with random sampling (<xref ref-type="sec" rid="s12">Supplementary Figure&#x20;2</xref>). A two-sided <italic>t</italic>-test was used to determine the statistical significance of the difference between the accuracy scores obtained from the permuted data and the original&#x20;data.</p>
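<p>The idea behind the permutation analysis can be illustrated with a simplified sketch. Note that it differs from the procedure described above in two ways, both of which are simplifications made here: it rescores a fixed set of predictions against shuffled labels rather than retraining a model on each permuted dataset, and it reports an empirical permutation p-value rather than a <italic>t</italic>-test. The labels and predictions are synthetic.</p>
<preformat><![CDATA[
```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that agree with the labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def permuted_accuracies(predictions, labels, n_permutations=100, seed=0):
    """Accuracy of fixed predictions against randomly shuffled labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        scores.append(accuracy(predictions, shuffled))
    return scores


def empirical_p_value(observed, permuted_scores):
    """Share of permuted scores at least as large as the observed score."""
    at_least = sum(score >= observed for score in permuted_scores)
    return (1 + at_least) / (1 + len(permuted_scores))


# Synthetic example: 200 sentences, 10 of the 50 positives are misclassified.
labels = [1] * 50 + [0] * 150
predictions = [0] * 10 + [1] * 40 + [0] * 150

observed = accuracy(predictions, labels)
scores = permuted_accuracies(predictions, labels)
p_value = empirical_p_value(observed, scores)
```
]]></preformat>
<p>If the observed accuracy lies far above every permuted score, the predictions carry information about the labels that a chance-level model would not have.</p>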
<p>Shapley Additive Explanations (SHAP) values (<xref ref-type="bibr" rid="B17">Lundberg and Lee, 2017</xref>) were used to quantify the contribution of each token to the prediction made by the model. Higher feature values (red) push the model prediction towards DILI-positive, while lower feature values (blue) push the model prediction towards DILI-negative.</p>
</sec>
<sec id="s2-7">
<title>Implementation</title>
<p>The embedding layer and 12-layer encoder from BERT were adopted and connected with a dense layer for token or sentence classification. The deep learning-based model combines NER (token classification) and sentence classification modules. A document is broken down into sentences <italic>s</italic><sub><italic>1</italic></sub>, <italic>s</italic><sub><italic>2</italic></sub>, &#x2026;, <italic>s</italic><sub><italic>i</italic></sub>. All sentences are passed into the NER module, where the tokens [<italic>t</italic><sub><italic>11</italic></sub>, <italic>t</italic><sub><italic>12</italic></sub>, &#x2026;, <italic>t</italic><sub><italic>1j</italic></sub>], [<italic>t</italic><sub><italic>21</italic></sub>, <italic>t</italic><sub><italic>22</italic></sub>, &#x2026;, <italic>t</italic><sub><italic>2j</italic></sub>], &#x2026;, [<italic>t</italic><sub><italic>i1</italic></sub>, <italic>t</italic><sub><italic>i2</italic></sub>, &#x2026;, <italic>t</italic><sub><italic>ij</italic></sub>] are classified. A sentence <italic>s</italic><sub><italic>i</italic></sub> is selected as liver-related if (<italic>argmax(t</italic><sub><italic>i1</italic></sub><italic>)</italic> &#x3d; <italic>y</italic>) &#x7c; (<italic>argmax(t</italic><sub><italic>i2</italic></sub><italic>)</italic> &#x3d; <italic>y</italic>) &#x7c; &#x2026; &#x7c; (<italic>argmax(t</italic><sub><italic>ij</italic></sub><italic>)</italic> &#x3d; <italic>y</italic>) is True, where <italic>y</italic> is the value of the &#x201c;Liver&#x201d; tag. If this expression is False for every sentence in a given document, i.e.,&#x20;none of the tokens is associated with &#x201c;Liver&#x201d;, the document label is returned as 0 (DILI negative). Otherwise, all selected liver-related sentences are passed into the sentence classification module. The document label is returned as 0 if none of the liver-related sentences is DILI positive (<italic>&#x2211;</italic><sub><italic>i</italic></sub> <italic>argmax(s</italic><sub><italic>i</italic></sub><italic>)</italic> &#x3d; 0), and as 1 (DILI positive) otherwise.</p>
</sec>
<sec id="s2-8">
<title>Evaluation Metrics</title>
<p>The NER-based context classification was evaluated at two levels. Recall, precision, and F1 score were reported at token level. Context classification at sentence level was evaluated by recall and precision. The BERT-based binary sentence classification was evaluated using accuracy, recall and precision. The test documents were used to assess the performance of the deep learning-based and hybrid deep learning-based models on document classification. Matthews correlation coefficient (MCC), recall and precision were used to evaluate the quality of binary DILI classification predicted by the models.</p>
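<p>For reference, the document-level metrics can be computed from the four cells of a binary confusion matrix using their standard definitions, as in the following sketch. The function name and the convention of returning 0 for an undefined ratio are choices made for this example.</p>
<preformat><![CDATA[
```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Recall, precision and MCC from the cells of a binary confusion matrix."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty;
    # returning 0.0 in that case is a convention adopted for this sketch.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"recall": recall, "precision": precision, "mcc": mcc}


perfect = binary_metrics(tp=5, fp=0, fn=0, tn=5)
inverted = binary_metrics(tp=0, fp=5, fn=5, tn=0)
degenerate = binary_metrics(tp=0, fp=0, fn=0, tn=10)
```
]]></preformat>
<p>MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement), and unlike accuracy it accounts for all four cells of the confusion matrix, which is useful when the classes are imbalanced.</p>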
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec id="s3-1">
<title>Development of the Deep Learning-Based Model for DILI Classification of Labeling Documents</title>
<p>The developed deep learning-based model had a BERT-based NER model as the context classification module and a BERT-based sentence classification module (<xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>). FDA test documents were used to evaluate the performance of the NER-based context classification module in selecting liver-related sentences. At token level, the context classification module showed excellent performance in recognizing liver-related words, with an F1 score of 0.98&#x20;&#xb1; 0.003, recall of 0.99&#x20;&#xb1; 0.002 and precision of 0.98&#x20;&#xb1; 0.008. When evaluated at sentence level, it had high sensitivity (0.99), extracting 431 of 435&#x20;liver-related sentences from the test documents (<xref ref-type="fig" rid="F3">Figure&#x20;3A</xref>). The precision was 0.83 (0.83&#x20;&#xb1; 0.001 from cross-validations) because 88 false positives were generated. Considering the large number of non-liver sentences (N &#x3d; 8,763) in the test documents, the context classification module performed well in predicting non-liver sentences, with a false positive rate of 1%. Further, the context classification module was externally validated using EMA test documents. It detected 334 of 341&#x20;liver-related sentences, while 79 false positives were predicted from 6,115&#x20;non-liver sentences, which was comparable to the results obtained using FDA test documents (<xref ref-type="fig" rid="F3">Figure&#x20;3B</xref>).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Evaluation and validation of the BERT NER models for context classification. <bold>(A)</bold> Confusion matrix obtained from evaluation of the BERT-based context classification module using the FDA test documents. <bold>(B)</bold> Confusion matrix obtained from evaluation of the BERT-based context classification module using the EMA validation documents.</p>
</caption>
<graphic xlink:href="frai-04-729834-g003.tif"/>
</fig>
<p>The BERT-based sentence classification module is the same as the one in the hybrid deep learning-based model, which was developed using liver-related sentences. This module showed an accuracy of 0.81&#x20;&#xb1; 0.02, recall of 0.82&#x20;&#xb1; 0.03 and precision of 0.82&#x20;&#xb1; 0.02. To confirm that the sentence classification module did not perform at chance, we conducted permutation tests. The sentence classification models trained on the permuted FDA training sentences exhibited a large decrease in average accuracy compared with that obtained from cross-validations (0.56 versus 0.81, <italic>p</italic>&#x20;&#x3c; 0.0001) (<xref ref-type="sec" rid="s12">Supplementary Figure&#x20;2</xref>). These results suggested that the observed accuracy scores of the sentence classification models were unlikely to be obtained by chance.</p>
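<p>The idea behind a label-permutation test can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: a toy 1-nearest-neighbour classifier on synthetic one-dimensional data stands in for the BERT sentence classifier, all function names are hypothetical, and the empirical p-value shown here is one common way to summarize such a test (the statistical test behind the reported <italic>p</italic> value is not specified in this section).</p>
<preformat><![CDATA[
```python
import random
from statistics import mean


def predict_1nn(train_x, train_y, x):
    """Toy 1-nearest-neighbour classifier on a single numeric feature."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def accuracy(train_x, train_y, test_x, test_y):
    hits = [predict_1nn(train_x, train_y, x) == y for x, y in zip(test_x, test_y)]
    return sum(hits) / len(hits)


def permuted_accuracies(train_x, train_y, test_x, test_y, n_perm=200, seed=0):
    """Accuracy of models 'trained' on label-shuffled copies of the training set."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        shuffled = list(train_y)
        rng.shuffle(shuffled)  # break the link between features and labels
        scores.append(accuracy(train_x, shuffled, test_x, test_y))
    return scores


# Synthetic, perfectly separable toy data (not from the study).
train_x = [i / 100 for i in range(100)]
train_y = [int(x >= 0.5) for x in train_x]
test_x = [(i + 0.3) / 40 for i in range(40)]
test_y = [int(x >= 0.5) for x in test_x]

observed = accuracy(train_x, train_y, test_x, test_y)
null_scores = permuted_accuracies(train_x, train_y, test_x, test_y)
# Empirical p-value with the usual +1 correction.
p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + len(null_scores))
print(observed, round(mean(null_scores), 2), p_value)
```
]]></preformat>
<p>If the model trained on the true labels scores well above the distribution of scores from label-shuffled training sets, the observed accuracy is unlikely to reflect chance alone.</p>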
<p>The performance of the deep learning-based model regarding document classification was evaluated using FDA test documents and externally validated using EMA validation documents (<xref ref-type="fig" rid="F1">Figure&#x20;1B</xref>). The deep learning-based model also showed excellent performance in DILI prediction on drug labeling documents with an MCC of 0.84 (<xref ref-type="table" rid="T2">Table&#x20;2</xref>). It could detect all 40 of the DILI-positive documents in the FDA test set (<xref ref-type="fig" rid="F4">Figure&#x20;4A</xref> and <xref ref-type="table" rid="T2">Table&#x20;2</xref>). Eleven false positives were found from a total of 110&#x20;DILI-negative documents, and thus the precision was 0.78. These results were consistent with those from model validation using cross-agency data (EMA validation documents), which had an MCC of 0.79, recall of 1 and precision of 0.71 (<xref ref-type="fig" rid="F4">Figure&#x20;4D</xref> and <xref ref-type="table" rid="T2">Table&#x20;2</xref>).</p>
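<p>The document-level metrics follow directly from the confusion-matrix counts given above. The short Python sketch below recomputes recall, precision and MCC for the deep learning-based model on the FDA test set (40 of 40 DILI-positive documents detected, 11 false positives among 110 DILI-negative documents); the function names are illustrative and not taken from the study's code.</p>
<preformat><![CDATA[
```python
from math import sqrt


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# FDA test set, deep learning-based model (counts stated in the text).
tp, fn, fp = 40, 0, 11
tn = 110 - fp  # remaining DILI-negative documents are true negatives

print(round(recall(tp, fn), 2), round(precision(tp, fp), 2),
      round(mcc(tp, fp, tn, fn), 2))  # 1.0 0.78 0.84
```
]]></preformat>
<p>Because MCC uses all four cells of the confusion matrix, it stays informative when the classes are imbalanced, as they are here (40 positive versus 110 negative documents).</p>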
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Model evaluation and validation using cross-agency&#x20;data.</p>
</caption>
<table>
<tbody valign="top">
<tr>
<td colspan="4" align="left">Model evaluation using FDA test documents</td>
</tr>
<tr>
<td align="left">&#x2003;Document classification models</td>
<td align="center">Matthews correlation coefficient</td>
<td align="center">Recall</td>
<td align="center">Precision</td>
</tr>
<tr>
<td align="left">&#x2003;&#x2003;Deep learning-based model</td>
<td align="center">0.84</td>
<td align="center">1.00</td>
<td align="center">0.78</td>
</tr>
<tr>
<td align="left">&#x2003;&#x2003;Hybrid deep learning-based model</td>
<td align="center">0.87</td>
<td align="center">1.00</td>
<td align="center">0.82</td>
</tr>
<tr>
<td align="left">&#x2003;&#x2003;Keywords-based model</td>
<td align="center">0.60</td>
<td align="center">0.90</td>
<td align="center">0.58</td>
</tr>
<tr>
<td colspan="4" align="left">Model validation using cross-agency data (EMA test documents)</td>
</tr>
<tr>
<td align="left">&#x2003;Document classification models</td>
<td align="center">Matthews correlation coefficient</td>
<td align="center">Recall</td>
<td align="center">Precision</td>
</tr>
<tr>
<td align="left">&#x2003;&#x2003;Deep learning-based model</td>
<td align="center">0.79</td>
<td align="center">1.00</td>
<td align="center">0.71</td>
</tr>
<tr>
<td align="left">&#x2003;&#x2003;Hybrid deep learning-based model</td>
<td align="center">0.84</td>
<td align="center">1.00</td>
<td align="center">0.77</td>
</tr>
<tr>
<td align="left">&#x2003;&#x2003;Keywords-based model</td>
<td align="center">0.61</td>
<td align="center">0.96</td>
<td align="center">0.55</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Evaluation and validation of the document classification models. <bold>(A)</bold> Confusion matrix obtained from evaluation of the deep learning-based model using FDA test documents. <bold>(B)</bold> Confusion matrix obtained from evaluation of the hybrid deep learning-based model using FDA test documents. <bold>(C)</bold> Confusion matrix obtained from evaluation of the keywords-based model using FDA test documents. <bold>(D)</bold> Confusion matrix obtained from evaluation of the deep learning-based model using EMA validation documents. <bold>(E)</bold> Confusion matrix obtained from evaluation of the hybrid deep learning-based model using EMA validation documents. <bold>(F)</bold> Confusion matrix obtained from evaluation of the keywords-based model using EMA validation documents.</p>
</caption>
<graphic xlink:href="frai-04-729834-g004.tif"/>
</fig>
<p>In comparison with models trained on liver-related sentences, we also developed sentence classification models using all sentences from the training documents, which were extremely imbalanced between DILI-positive and DILI-negative labels. We observed decreased recall (0.75&#x20;&#xb1; 0.08) and precision (0.76&#x20;&#xb1; 0.04) as compared to models developed using liver-related sentences. When oversampling was conducted by randomly sampling according to class weights, recall increased to 0.80&#x20;&#xb1; 0.04 while precision dropped significantly to 0.68&#x20;&#xb1; 0.04. None of these models outperformed the deep learning-based model with the NER-based intermediate module at sentence level. When evaluated at document level, the sentence classification model trained on all sentences predicted more false negative FDA documents (<italic>N</italic>&#x20;&#x3d; 4), causing decreased recall (0.90). Interestingly, precision (0.86) was higher than that obtained from the deep learning-based model, as fewer false positive documents were obtained (<italic>N</italic>&#x20;&#x3d; 6). Similarly, decreased recall (0.89) and increased precision (0.82) were observed when EMA documents were used as external validation data. Higher recall is preferred for the topic investigated in this study, i.e.,&#x20;ADR detection in drug labeling documents, because false positive documents are much easier to detect during result interpretation or model validation than false negative documents.</p>
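<p>Oversampling by class weights can be sketched as weighted sampling with replacement, where each sentence is drawn with a probability inversely proportional to the size of its class. The snippet below is a minimal sketch on synthetic data; the function name, the 5% positive rate and the number of draws are assumptions made for illustration and do not reflect the study's actual sampler or class ratio.</p>
<preformat><![CDATA[
```python
import random
from collections import Counter


def oversample_by_class_weight(samples, labels, n_draws, seed=0):
    """Draw with replacement, weighting each sample by 1 / (size of its class)."""
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    idx = rng.choices(range(len(samples)), weights=weights, k=n_draws)
    return [samples[i] for i in idx], [labels[i] for i in idx]


# Synthetic imbalanced data: 5% positive (illustrative, not the study's ratio).
labels = [1] * 50 + [0] * 950
samples = [f"sentence {i}" for i in range(len(labels))]

_, resampled = oversample_by_class_weight(samples, labels, n_draws=10_000)
positive_share = sum(resampled) / len(resampled)
print(round(positive_share, 2))
```
]]></preformat>
<p>After resampling, the two classes appear in roughly equal proportion, which typically raises recall on the minority class at the cost of more false positives, in line with the trade-off reported above.</p>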
</sec>
<sec id="s3-2">
<title>Development of the Hybrid Deep Learning-Based Model for DILI Classification of Labeling Documents</title>
<p>The developed hybrid deep learning-based model had a string filter-based context classification module followed by a BERT-based sentence classification module (<xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>). After context filtering of sentences from the &#x201c;Warnings and Precautions&#x201d; section of FDA training documents, 1,501 unique liver-related sentences were collected, of which 540 were DILI-positive while 961 were DILI-negative (<xref ref-type="table" rid="T1">Table&#x20;1</xref>). This sentence dataset was used for training the BERT-based sentence classification module. The developed sentence classification module reached high performance regarding DILI classification with accuracy scores of 0.81&#x20;&#xb1; 0.02 obtained from 100 repetitions of five-fold cross-validations.</p>
<p>The performance of this hybrid deep learning-based model regarding document classification was evaluated using FDA test documents and externally validated using EMA validation documents (<xref ref-type="fig" rid="F1">Figures 1B</xref>, <xref ref-type="fig" rid="F2">2</xref>). The hybrid deep learning-based model achieved excellent performance in DILI prediction on drug labeling documents with an MCC of 0.87 (<xref ref-type="table" rid="T2">Table&#x20;2</xref>). It had a high recall of 1, as it could detect all 40 of the DILI-positive documents in the FDA test set (<xref ref-type="fig" rid="F4">Figure&#x20;4B</xref> and <xref ref-type="table" rid="T2">Table&#x20;2</xref>). Nine false positives were found, which resulted in a precision of 0.82. These results were corroborated by those from model validation using cross-agency data (EMA test documents). The hybrid deep learning-based model had a consistent MCC of 0.84, recall of 1 and precision of 0.77 when predicting on the EMA validation documents (<xref ref-type="fig" rid="F4">Figure&#x20;4E</xref> and <xref ref-type="table" rid="T2">Table&#x20;2</xref>).</p>
<p>Interestingly, we observed subtle differences between the deep learning-based and hybrid deep learning-based models in predicting DILI risk. The hybrid deep learning-based model was better at distinguishing liver injury statements in animal studies from human liver injury statements. Also, hepatosplenic T-cell lymphomas due to immunosuppressive treatment could confuse the deep learning-based model but not the hybrid deep learning-based model. In contrast, the deep learning-based model performed better in detecting term variants/abbreviations, such as SGOT/AST for aspartate aminotransferase and SGPT/ALT for alanine aminotransferase. Although limited in number, the examples from the current data could provide some insight for future research.</p>
</sec>
<sec id="s3-3">
<title>Comparison of the Deep Learning-Based and Hybrid Deep Learning-Based Models With the Keyword-Based Model for DILI Classification of Labeling Documents</title>
<p>As a comparison to the deep learning-based and hybrid deep learning-based models, a keyword matching-based approach was also used to classify the FDA and EMA test documents. The keyword-based classification on FDA test documents showed a substantially lower MCC of 0.60, as compared to that from predictions made by the deep learning-based (0.84) and hybrid deep learning-based (0.87) models (<xref ref-type="table" rid="T2">Table&#x20;2</xref>). It produced a larger number of false positives (N &#x3d; 26), and thus its precision (0.58) was remarkably lower than that of the deep learning-based (0.78) and hybrid deep learning-based (0.82) models (<xref ref-type="fig" rid="F4">Figure&#x20;4C</xref> and <xref ref-type="table" rid="T2">Table&#x20;2</xref>). Most of the false positives produced by keyword-based DILI classification, but not by the deep learning-based and hybrid deep learning-based models, were related to descriptions of contraindications or precautions for special populations (e.g., patients with hepatic impairment) and hypersensitivity reactions (<xref ref-type="sec" rid="s12">Supplementary Table&#x20;5</xref>). Also, four false negatives were generated by the keywords-based document classification model, but none by the deep learning-based and hybrid deep learning-based models. Consistent with the DILI classification results obtained from the FDA test documents, the keywords-based DILI classification on the EMA validation documents also showed poor performance in controlling the number of false positives, which resulted in a low precision of 0.55 (<xref ref-type="fig" rid="F4">Figure&#x20;4F</xref> and <xref ref-type="table" rid="T2">Table&#x20;2</xref>). The MCC was calculated to be&#x20;0.61.</p>
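<p>The failure modes of keyword matching described above can be illustrated with a small sketch. The keyword list and the example sentences below are invented for illustration and are not those used in the study; the point is only that a match on a liver-related term cannot distinguish a DILI warning from a precaution for patients with hepatic impairment, and that a term variant missing from the list goes undetected.</p>
<preformat><![CDATA[
```python
import re

# Hypothetical keyword list for illustration; not the list used in the study.
KEYWORDS = ("hepatotoxicity", "hepatic", "hepatitis", "liver")
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def keyword_label(sentences):
    """Label a document 1 if any sentence contains a listed keyword, else 0."""
    return int(any(PATTERN.search(s) for s in sentences))


# Invented sentences modelled on the error types described in the text.
dili_warning = ["Cases of severe hepatotoxicity have been reported."]
special_population = ["Use with caution in patients with hepatic impairment."]
unlisted_variant = ["Elevations of SGPT were observed."]

results = [keyword_label(d)
           for d in (dili_warning, special_population, unlisted_variant)]
print(results)  # [1, 1, 0]
```
]]></preformat>
<p>In this toy case, the second document is flagged even though it only describes a precaution for a special population, while the third is missed because the abbreviation is not in the list.</p>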
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>In this study we used an AI-based NLP approach to classify drug labeling documents according to the DILI risk suggested in the text from the &#x201c;Warnings and Precautions&#x201d; section. The motivation of this investigation was to address two questions that are important to both regulatory application and drug safety research, i) whether AI-based NLP tools can be used to classify a drug&#x2019;s DILI potential specified in the drug labeling documents, and ii) whether an AI-based model developed using FDA labeling documents was portable to the documents in other regulatory agencies with comparable performance. Therefore, we developed BERT-based deep learning models for DILI classification, which were rigorously evaluated in this&#x20;study.</p>
<p>Our results showed that both the deep learning-based model and the hybrid deep learning-based model developed in this study had outstanding performance in predicting DILI risk encoded in the drug labeling documents, regardless of whether FDA labeling documents or EMA labeling documents were used for model training. This suggested that the deep learning-based models could capture the semantic meanings of sentences in the drug labeling documents, considering that the descriptions approved by the two agencies have some degree of difference in terms of language style and format. The contributions of word tokens to model predictions were explored to examine whether the model learned reasonable semantic meanings of the sentences in the drug labeling. SHAP values were used to quantify the contributions of each word token to the prediction made by the model. In the representative DILI-positive sentences (<xref ref-type="fig" rid="F5">Figures 5A,B</xref>), DILI-related words such as &#x201c;hepatic failure&#x201d;, &#x201c;hepatotoxicity&#x201d; and &#x201c;hepatitis&#x201d; showed positive contributions (red) and pushed the model prediction toward DILI-positive. In contrast, the word &#x201c;hepatitis&#x201d; did not have positive contributions when it was in the phrases &#x201c;chronic hepatitis B&#x201d; and &#x201c;chronic hepatitis C&#x201d;. Collectively, these results suggested that the developed NLP models could capture the semantic relationships between words in a given sentence.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Representative sentences showing contributions of word tokens to model predictions. <bold>(A)</bold> DILI-positive sentence due to fatal hepatic failure. <bold>(B)</bold> DILI-positive sentence due to hepatitis/hepatic failure. <bold>(C)</bold> DILI-negative sentence that provides indication information.</p>
</caption>
<graphic xlink:href="frai-04-729834-g005.tif"/>
</fig>
<p>Notably, the deep learning-based NLP models developed using FDA labeling documents could also be used by other agencies such as EMA without a notable decrease in performance. Furthermore, we also developed a deep learning-based model and a hybrid deep learning-based model using EMA labeling documents (<xref ref-type="sec" rid="s12">Supplementary Figure&#x20;1</xref>). The models trained on the EMA data showed comparable performance when evaluated using EMA test documents and the FDA validation documents (<xref ref-type="sec" rid="s12">Supplementary Table&#x20;6</xref>), which confirmed the portability of the deep learning-based NLP models across agencies. This demonstrated a promising potential of using AI technology to facilitate regulatory activities including drug evaluation and pharmacovigilance.</p>
<p>To best resemble our human reading-based approach and allow for an interpretable classification, we chose a sentence classification strategy over directly using whole documents as input. Briefly, we wanted our final model to be able to select liver-related sentences and determine whether they suggest DILI risk. The determination of DILI risk of a document was not based on quantitative measurement of the number of DILI-positive sentences, but rather dependent on detection of at least one DILI-positive sentence. In this regard, the document classification model is sensitive to false positives. Both the FDA and EMA models developed in this study had low false positive rates (6&#x2013;10%), suggesting that the models performed well in controlling false positives. Furthermore, the sentence classification strategy allowed us to easily track which sentences in a document were the basis for the document classification model to determine DILI potential. It also provided information regarding what type of sentences were ambiguous in DILI risk to the models. From a technical perspective, the current BERT pre-trained model has an input limit of 512 tokens. In order to process lengthy documents such as the &#x201c;Warnings and Precautions&#x201d; section containing hundreds to thousands of words, various solutions have been proposed, including i) text truncation and ii) text splitting combined with different pooling methods or Long Short-Term Memory networks (<xref ref-type="bibr" rid="B1">Adhikari et&#x20;al., 2019a</xref>, <xref ref-type="bibr" rid="B2">2019b</xref>; <xref ref-type="bibr" rid="B22">Sun et&#x20;al., 2020</xref>). Compared with a sentence classification-based model structure, such more complex model structures do not fit the classification criteria of this study any better, and they complicate model interpretation.
Therefore, we used a hierarchical model structure to predict DILI risk on each individual sentence in a given drug labeling document and output a document classification label based on the combined sentence classification results. Moreover, since not all sentences should contribute to the DILI prediction, we used a context filter as a gating mechanism to select liver-related sentences for DILI prediction, which is similar to aspect-based sentiment analysis (<xref ref-type="bibr" rid="B21">Sun et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B32">Xu et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B9">Choi et&#x20;al., 2020</xref>). The framework for dataset creation and training of the context classification model can be extended to other topics, e.g., cardiotoxicity, drug indication and drug-drug interactions. Outputs from context classification can also be used for information retrieval pipelines.</p>
<p>Of note, sentence classification models trained on all sentences with skewed distributions did not perform dramatically worse than the combined NER and sentence classification models. We observed 7 and 6% drops in recall and precision, respectively, at sentence level, and a 10% decrease in recall but an 8% increase in precision at document level. However, addition of an NER-based context classification module would be a better approach for the following reasons. First, all the BERT-based models developed in this study were designed to record sentences that were predicted as DILI-positive for human justification. Since the number of sentences suggesting adverse events is far smaller than that of sentences carrying no information on adverse events, it is much easier to find false positive documents than false negative documents. Also, the false positive sentences collected from users could be used later for model improvement by further training or re-training. Therefore, higher recall is preferred. Second, inclusion of an NER-based context classification module enables context-specific sentence classification, which is more flexible, especially when classifying sentences that belong to multiple contexts. For example, DILI can be associated with immune-mediated cutaneous ADRs such as Drug Reaction with Eosinophilia and Systemic Symptoms, Stevens-Johnson syndrome and toxic epidermal necrolysis (<xref ref-type="bibr" rid="B3">Andrade et&#x20;al., 2019</xref>). Sentences containing information across different contexts could be ambiguous to multiclass sentence classification models for detecting different types of ADRs. If binary sentence classification models were developed for detecting each type of ADR, a large number of negative samples would be used repeatedly for model training, which is not an efficient design.
Moreover, the NER-based context classification module is versatile and can provide additional functionalities, including facilitating information retrieval.</p>
<p>Previous efforts in data mining of drug labeling documents primarily relied on the use of specific ADR terms (<xref ref-type="bibr" rid="B8">Chen et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B10">Demner-Fushman et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B31">Wu et&#x20;al., 2019</xref>). International standards, MedDRA and SNOMED, have been used for searching ADR terms in drug labeling (<xref ref-type="bibr" rid="B10">Demner-Fushman et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B31">Wu et&#x20;al., 2019</xref>). The ADR descriptions in drug labeling often do not follow these standards, which requires human effort in matching ADR terms in drug labeling with standards. Annotation resources have been reported to normalize the terms used in drug labeling (<xref ref-type="bibr" rid="B10">Demner-Fushman et&#x20;al., 2018</xref>). However, providing annotations for such a large repository is not a trivial task. As shown in <xref ref-type="sec" rid="s12">Supplementary Table&#x20;3</xref>, many standard terms such as MedDRA PTs have a number of matched terms in drug labeling. For example, there have been at least 31 different terms in FDA labeling for the MedDRA PT &#x201c;Alanine aminotransferase increased&#x201d;, and 34 for &#x201c;Blood bilirubin increased&#x201d;. New variations in ADR terms are likely to be introduced into drug labeling in the future. Therefore, updating and maintaining such annotations are labor intensive. The deep learning-based model developed in the current study, with BERT-based NER and sentence classification combined, outperformed the keywords-based model by a large margin. Importantly, BERT-based models are not only easy to implement and extend but can also be further improved with better pretrained models in the future.</p>
<p>Furthermore, DILI classification of the labeling documents is a more complicated task than keywords matching. In some cases, a sentence containing hepatic ADR terms does not necessarily suggest DILI. For example, a sentence containing the term hepatitis could indicate antiviral treatment of hepatitis B viruses. It could also be contraindication information specifying that patients with hepatic deficiency due to hepatitis should not take the drug. All these cases are present in the complex descriptions from the &#x201c;Warnings and Precautions&#x201d; section. Therefore, human interpretation has been necessary to determine DILI-positive sentences in drug labeling documents (<xref ref-type="bibr" rid="B8">Chen et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B7">2016</xref>).</p>
<p>Over the past few years, transformer models have changed the landscape of NLP (<xref ref-type="bibr" rid="B30">Wolf et&#x20;al., 2020</xref>). The BERT model used in this study enables bidirectional text learning by using masks (<xref ref-type="bibr" rid="B11">Devlin et&#x20;al., 2019</xref>). Notably, the multi-headed attention architecture leverages the use of deep neural networks to capture the relationships between words within a sentence and across sentences (<xref ref-type="bibr" rid="B28">Vaswani et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B11">Devlin et&#x20;al., 2019</xref>). These two important features allow the BERT model to learn the semantic meanings of a sentence or sentences effectively and efficiently. Thus, we chose BERT as our first attempt to develop AI-based NLP tools, which do not rely on keyword dictionaries but rather learn the meaning of text and perform tasks close to humans. Indeed, our results showed that model predictions were driven by DILI-related words such as hepatic failure, hepatotoxicity and hepatitis in the representative DILI-positive sentences. For the representative DILI-negative sentences, model predictions were based on the detection of DILI-negative information including chronic hepatitis B/C, even though DILI-related words were also present in the sentence.</p>
<p>Additionally, we acknowledge the following limitations of this study. The dataset size is relatively small, especially for document-level classification results. This is largely because DILI is not a common adverse event, with an incidence of approximately 20 cases per 100,000 persons annually (<xref ref-type="bibr" rid="B14">Garcia-Cortes et&#x20;al., 2020</xref>). There is a limited number of drugs carrying warnings for DILI. The developed pipeline was evaluated on just a single topic, i.e.,&#x20;liver injury. Thus, it remains to be proven by future research that this framework is indeed extensible to other topics. The pre-trained BERT model was trained on corpora of general language. Drug labeling, however, uses many domain-specific terms. Further in-domain training of the BERT model might improve the model performance. Also, we did not try other transformer models such as GPT-2 (<xref ref-type="bibr" rid="B20">Radford et&#x20;al., 2019</xref>) and XLNet (<xref ref-type="bibr" rid="B33">Yang et&#x20;al., 2019</xref>) for comparison. The main purpose of this work was to test the applicability of modern language models to regulatory documents, rather than to select better models.</p>
</sec>
<sec sec-type="conclusion" id="s5">
<title>Conclusion</title>
<p>In the current study we demonstrated that AI-based NLP tools performed well in DILI classification of drug labeling documents from two different regulatory agencies, FDA and EMA. The deep learning-based and hybrid deep learning-based models outperformed the keywords-based models and were portable from one agency to the other without a notable decrease in performance. Our results suggest that AI models are able to learn the meaning of text and handle NLP tasks with good accuracy. This proof-of-concept work shows that using AI technology to facilitate regulatory activities is a promising approach to modernize and advance regulatory science.</p>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link ext-link-type="uri" xlink:href="https://nctr-crs.fda.gov/fdalabel/ui/search">https://nctr-crs.fda.gov/fdalabel/ui/search</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>Conceptualization (WT); Data acquisition (YW and LW); Methodology (YW, WT, ZL, and MC); Data analysis (YW); Manuscript writing (YW, MC, and WT); Manuscript review and editing (YW, WT, and&#x20;MC).</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>The research is internally funded by the project (E0767701) at National Center for Toxicological Research, United&#x20;States Food and Drug Administration.</p>
</sec>
<sec id="s9">
<title>Author Disclaimer</title>
<p>The views presented in this article do not necessarily reflect those of the United&#x20;States Food and Drug Administration. Any mention of commercial products is for clarification and is not intended as an endorsement.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s12">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2021.729834/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2021.729834/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="Table1.DOCX" id="SM1" mimetype="application/DOCX" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Adhikari</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ram</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019a</year>). <source>DocBERT: BERT for Document Classification</source>. <comment>
<italic>arXiv</italic>:1904.08398v3 [cs.CL]</comment>. </citation>
</ref>
<ref id="B2">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Adhikari</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ram</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019b</year>). &#x201c;<article-title>Rethinking Complex Neural Network Architectures for Document Classification</article-title>,&#x201d; in <conf-name>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</conf-name> (<publisher-loc>Minneapolis, MN</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>), <volume>1</volume>, <fpage>4046</fpage>&#x2013;<lpage>4051</lpage>. <pub-id pub-id-type="doi">10.18653/v1/N19-1408</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andrade</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Chalasani</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Bj&#xf6;rnsson</surname>
<given-names>E. S.</given-names>
</name>
<name>
<surname>Suzuki</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kullak-Ublick</surname>
<given-names>G. A.</given-names>
</name>
<name>
<surname>Watkins</surname>
<given-names>P. B.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Drug-induced Liver Injury</article-title>. <source>Nat. Rev. Dis. Primers</source> <volume>5</volume> (<issue>1</issue>), <fpage>58</fpage>. <pub-id pub-id-type="doi">10.1038/s41572-019-0105-0</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Basile</surname>
<given-names>A. O.</given-names>
</name>
<name>
<surname>Yahi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tatonetti</surname>
<given-names>N. P.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Artificial Intelligence for Drug Toxicity and Safety</article-title>. <source>Trends Pharmacol. Sci.</source> <volume>40</volume>, <fpage>624</fpage>&#x2013;<lpage>635</lpage>. <pub-id pub-id-type="doi">10.1016/j.tips.2019.07.005</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>H. C. S.</given-names>
</name>
<name>
<surname>Shan</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Dahoun</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Vogel</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Advancing Drug Discovery via Artificial Intelligence</article-title>. <source>Trends Pharmacol. Sci.</source> <volume>40</volume>, <fpage>592</fpage>&#x2013;<lpage>604</lpage>. <pub-id pub-id-type="doi">10.1016/j.tips.2019.06.004</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kelly</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Borlak</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Quantitative Structure-Activity Relationship Models for Predicting Drug-Induced Liver Injury Based on FDA-Approved Drug Labeling Annotation and Using a Large Collection of Drugs</article-title>. <source>Toxicol. Sci.</source> <volume>136</volume>, <fpage>242</fpage>&#x2013;<lpage>249</lpage>. <pub-id pub-id-type="doi">10.1093/toxsci/kft189</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Suzuki</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Thakkar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tong</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>DILIrank: the Largest Reference Drug List Ranked by the Risk for Developing Drug-Induced Liver Injury in Humans</article-title>. <source>Drug Discov. Today</source> <volume>21</volume>, <fpage>648</fpage>&#x2013;<lpage>653</lpage>. <pub-id pub-id-type="doi">10.1016/j.drudis.2016.02.015</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Vijay</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Tong</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>FDA-approved Drug Labeling for the Study of Drug-Induced Liver Injury</article-title>. <source>Drug Discov. Today</source> <volume>16</volume>, <fpage>697</fpage>&#x2013;<lpage>703</lpage>. <pub-id pub-id-type="doi">10.1016/j.drudis.2011.05.007</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Oh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Improving Document-Level Sentiment Classification Using Importance of Sentences</article-title>. <source>Entropy</source> <volume>22</volume>, <fpage>1336</fpage>. <pub-id pub-id-type="doi">10.3390/e22121336</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Demner-Fushman</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Shooshan</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>Rodriguez</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Aronson</surname>
<given-names>A. R.</given-names>
</name>
<name>
<surname>Lang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rogers</surname>
<given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>A Dataset of 200 Structured Product Labels Annotated for Adverse Drug Reactions</article-title>. <source>Sci. Data</source> <volume>5</volume>, <fpage>180001</fpage>&#x2013;<lpage>180008</lpage>. <pub-id pub-id-type="doi">10.1038/sdata.2018.1</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Devlin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>M.-W.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Toutanova</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2019</year>). <source>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</source>. <comment>
<italic>arXiv</italic>:1810.04805v2 [cs.CL]</comment>. </citation>
</ref>
<ref id="B12">
<citation citation-type="web">
<collab>European Medicines Agency</collab> (<year>2009</year>). <article-title>A Guideline on Summary of Product Characteristics</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://ec.europa.eu/health/files/eudralex/vol-2/c/smpc_guideline_rev2_en.pdf">http://ec.europa.eu/health/files/eudralex/vol-2/c/smpc_guideline_rev2_en.pdf</ext-link>.</comment> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Thakkar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ingle</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>FDALabel for Drug Repurposing Studies and Beyond</article-title>. <source>Nat. Biotechnol.</source> <volume>38</volume>, <fpage>1378</fpage>&#x2013;<lpage>1379</lpage>. <pub-id pub-id-type="doi">10.1038/s41587-020-00751-0</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garcia-Cortes</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Robles-Diaz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Stephens</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ortega-Alonso</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lucena</surname>
<given-names>M. I.</given-names>
</name>
<name>
<surname>Andrade</surname>
<given-names>R. J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Drug Induced Liver Injury: an Update</article-title>. <source>Arch. Toxicol.</source> <volume>94</volume>, <fpage>3381</fpage>&#x2013;<lpage>3407</lpage>. <pub-id pub-id-type="doi">10.1007/s00204-020-02885-1</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hamburg</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Advancing Regulatory Science</article-title>. <source>Science</source> <volume>331</volume>, <fpage>987</fpage>. <pub-id pub-id-type="doi">10.1126/science.1204432</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffman</surname>
<given-names>K. B.</given-names>
</name>
<name>
<surname>Dimbil</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tatonetti</surname>
<given-names>N. P.</given-names>
</name>
<name>
<surname>Kyle</surname>
<given-names>R. F.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>A Pharmacovigilance Signaling System Based on FDA Regulatory Action and Post-Marketing Adverse Event Reports</article-title>. <source>Drug Saf.</source> <volume>39</volume>, <fpage>561</fpage>&#x2013;<lpage>575</lpage>. <pub-id pub-id-type="doi">10.1007/s40264-016-0409-x</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lundberg</surname>
<given-names>S. M.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S.-I.</given-names>
</name>
</person-group> (<year>2017</year>). <source>A Unified Approach to Interpreting Model Predictions</source>. <comment>
<italic>arXiv</italic>:1705.07874v2 [cs.AI]</comment>. </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McMahon</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Preskorn</surname>
<given-names>S. H.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>The Package Insert</article-title>. <source>J.&#x20;Psychiatr. Pract.</source> <volume>20</volume>, <fpage>284</fpage>&#x2013;<lpage>290</lpage>. <pub-id pub-id-type="doi">10.1097/01.pra.0000452565.83039.20</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Radford</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Narasimhan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Salimans</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Improving Language Understanding with Unsupervised Learning</source>. <publisher-loc>OpenAI</publisher-loc>: <publisher-name>Technical report</publisher-name>. </citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Radford</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Child</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Luan</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Amodei</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Language Models Are Unsupervised Multitask Learners</source>. <publisher-loc>OpenAI</publisher-loc>: <publisher-name>Technical report</publisher-name>. </citation>
</ref>
<ref id="B21">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence</source>. <comment>
<italic>arXiv</italic>:1903.09588 [cs.CL]</comment>. </citation>
</ref>
<ref id="B22">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2020</year>). <source>How to Fine-Tune BERT for Text Classification</source>. <comment>
<italic>arXiv</italic>:1905.05583v3 [cs.CL]</comment>. </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Suzuki</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Yuen</surname>
<given-names>N. A.</given-names>
</name>
<name>
<surname>Ilic</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>R. T.</given-names>
</name>
<name>
<surname>Reese</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>H. R.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>Comedications Alter Drug-Induced Liver Injury Reporting Frequency: Data Mining in the WHO VigiBase&#x2122;</article-title>. <source>Regul. Toxicol. Pharmacol.</source> <volume>72</volume>, <fpage>481</fpage>&#x2013;<lpage>490</lpage>. <pub-id pub-id-type="doi">10.1016/j.yrtph.2015.05.004</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="web">
<collab>US Food and Drug Administration</collab> (<year>2011a</year>). <article-title>Advancing Regulatory Science at FDA: A Strategic Plan</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.fda.gov/media/81109/download">https://www.fda.gov/media/81109/download</ext-link>
</comment>. </citation>
</ref>
<ref id="B25">
<citation citation-type="web">
<collab>US Food and Drug Administration</collab> (<year>2010</year>). <article-title>Advancing Regulatory Science for Public Health</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.fda.gov/media/123792/download">https://www.fda.gov/media/123792/download</ext-link>
</comment>. </citation>
</ref>
<ref id="B26">
<citation citation-type="web">
<collab>US Food and Drug Administration</collab> (<year>2006</year>). <article-title>Adverse Reactions Section of Labeling for Human Prescription Drug and Biological Products &#x2014;&#x20;Content and Format</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.fda.gov/media/71836/download">https://www.fda.gov/media/71836/download</ext-link>
</comment>. </citation>
</ref>
<ref id="B27">
<citation citation-type="web">
<collab>US Food and Drug Administration</collab> (<year>2011b</year>). <article-title>Warnings and Precautions, Contraindications, and Boxed Warning Sections of Labeling for Human Prescription Drug and Biological Products &#x2014;&#x20;Content and Format</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.fda.gov/media/71866/download">https://www.fda.gov/media/71866/download</ext-link>
</comment>. </citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Vaswani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Shazeer</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Parmar</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Uszkoreit</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gomez</surname>
<given-names>A. N.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <source>Attention Is All You Need</source>. <comment><italic>arXiv</italic>:1706.03762v5 [cs.CL]</comment>. </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Watson</surname>
<given-names>K. T.</given-names>
</name>
<name>
<surname>Barash</surname>
<given-names>P. G.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>The New Food and Drug Administration Drug Package Insert: Implications for Patient Safety and Clinical Care</article-title>. <source>Anesth. Analgesia</source> <volume>108</volume>, <fpage>211</fpage>&#x2013;<lpage>218</lpage>. <pub-id pub-id-type="doi">10.1213/ane.0b013e31818c1b27</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wolf</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Debut</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sanh</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Chaumond</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Delangue</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Moi</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <source>Transformers: State-Of-The-Art Natural Language Processing</source>. <comment><italic>arXiv</italic>:1910.03771v5 [cs.CL]</comment>. </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Ingle</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhao-Wong</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Thakkar</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Study of Serious Adverse Drug Reactions Using FDA-Approved Drug Labeling and MedDRA</article-title>. <source>BMC Bioinformatics</source> <volume>20</volume>, <fpage>97</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-019-2628-5</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Shu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>P. S.</given-names>
</name>
</person-group> (<year>2019</year>). <source>BERT Post-Training for Review Reading Comprehension and Aspect-Based Sentiment Analysis</source>. <comment><italic>arXiv</italic>:1904.02232v2 [cs.CL]</comment>. </citation>
</ref>
<ref id="B33">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Dai</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Carbonell</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Salakhutdinov</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>Q. V.</given-names>
</name>
</person-group> (<year>2019</year>). <source>XLNet: Generalized Autoregressive Pretraining for Language Understanding</source>. <comment><italic>arXiv</italic>:1906.08237 [cs.CL]</comment>. </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>K.-H.</given-names>
</name>
<name>
<surname>Beam</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Kohane</surname>
<given-names>I. S.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Artificial Intelligence in Healthcare</article-title>. <source>Nat. Biomed. Eng.</source> <volume>2</volume>, <fpage>719</fpage>&#x2013;<lpage>731</lpage>. <pub-id pub-id-type="doi">10.1038/s41551-018-0305-z</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>