<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Res. Metr. Anal.</journal-id>
<journal-title>Frontiers in Research Metrics and Analytics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Res. Metr. Anal.</abbrev-journal-title>
<issn pub-type="epub">2504-0537</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frma.2018.00007</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Metrics and Analytics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Application of Public Knowledge Discovery Tool (PKDE4J) to Represent Biomedical Scientific Knowledge</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Song</surname> <given-names>Min</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="cor1">&#x0002A;</xref>
<uri xlink:href="http://frontiersin.org/people/u/284444"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Kim</surname> <given-names>Munui</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://frontiersin.org/people/u/528316"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Kang</surname> <given-names>Keunyoung</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Kim</surname> <given-names>Yong Hwan</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://frontiersin.org/people/u/506596"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Jeon</surname> <given-names>Sieun</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Library and Information Science Department, Yonsei University</institution>, <addr-line>Seoul</addr-line>, <country>South Korea</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Xianwen Wang, Dalian University of Technology (DUT), China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Andreas Holzinger, Medical University of Graz, Austria; Sam Henry, Virginia Commonwealth University, United States</p></fn>
<corresp content-type="corresp" id="cor1">&#x0002A;Correspondence: Min Song, <email>min.song&#x00040;yonsei.ac.kr</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>02</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection">
<year>2018</year>
</pub-date>
<volume>3</volume>
<elocation-id>7</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>09</month>
<year>2017</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>02</month>
<year>2018</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2018 Song, Kim, Kang, Kim and Jeon.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Song, Kim, Kang, Kim and Jeon</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>In today&#x02019;s era of information explosion, extracting entities and their relations in large-scale, unstructured collections of text to better represent knowledge has emerged as a daunting challenge in biomedical text mining. To respond to the demand to automatically extract scientific knowledge with higher precision, the public knowledge discovery tool PKDE4J (Song et al., <xref ref-type="bibr" rid="B42">2015</xref>) was proposed as a flexible text-mining tool. In this study, we propose an extended version of PKDE4J to represent scientific knowledge for literature-based knowledge discovery. Specifically, we assess the performance of PKDE4J in terms of three extraction tasks: entity, relation, and event detection. We also suggest applications of PKDE4J along three lines: (1) knowledge search, (2) knowledge linking, and (3) knowledge inference. We first describe the updated features of PKDE4J and report on tests of its performance. With additional options in the processes of named entity extraction, verb expansion, and event detection, we expect that the enhanced PKDE4J can be utilized for literature-based knowledge discovery.</p>
</abstract>
<kwd-group>
<kwd>text mining</kwd>
<kwd>named entity recognition</kwd>
<kwd>relation extraction</kwd>
<kwd>scientific knowledge discovery tool</kwd>
<kwd>scientific knowledge representation</kwd>
</kwd-group>
<contract-num rid="cn01">NRF-2013M3A9C4078138</contract-num>
<contract-sponsor id="cn01">National Research Foundation of Korea<named-content content-type="fundref-id">10.13039/501100003725</named-content></contract-sponsor>
<counts>
<fig-count count="14"/>
<table-count count="8"/>
<equation-count count="0"/>
<ref-count count="61"/>
<page-count count="16"/>
<word-count count="8836"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="introduction">
<title>Introduction</title>
<p>Owing to the deluge of data in today&#x02019;s digital world, mining useful information from large-scale, unstructured collections of text is a challenging task. The demand to discover knowledge from large amounts of data has been steadily growing over the years. Knowledge extraction requires at least two techniques, named entity recognition (NER) and relation extraction (RE). Identifying entities represented in text and the relations among them is a fundamental process of knowledge extraction. Using this process, the extracted knowledge can be utilized in a knowledge network or various systems. The US National Academy of Sciences claimed in a 2011 report that a biomedical knowledge network based on biological data and knowledge is essential for precision medicine (National Research Council, <xref ref-type="bibr" rid="B32">2011</xref>). For knowledge extraction, Song et al. (<xref ref-type="bibr" rid="B42">2015</xref>) proposed PKDE4J, the Public Knowledge Discovery Engine for Java. The goal of PKDE4J is to extract biomedical knowledge from unstructured texts for literature-based knowledge discovery. This is a daunting goal requiring long-term research and development. In our previous study, we introduced PKDE4J as a knowledge extraction system (Song et al., <xref ref-type="bibr" rid="B42">2015</xref>). In this paper, as a first step, we extend PKDE4J to make it as flexible as possible, such that it can be applied to various knowledge extraction tasks. In the second step, knowledge identification, we focus on how PKDE4J can be used to represent scientific knowledge.</p>
<p>Several data mining-based approaches to represent biological knowledge have been proposed. Bio2RDF is a mash-up system that can be used to integrate knowledge from multiple bioinformatics databases (Belleau et al., <xref ref-type="bibr" rid="B7">2008</xref>). Bell et al. (<xref ref-type="bibr" rid="B6">2011</xref>) integrated bio-entities and their relations into an existing database. A feature of this approach is that it utilizes structured data. However, in this paper, we introduce an extended version of PKDE4J based on text mining for users in the biomedical domain to transition from the micro-level of knowledge entities to the macro-topical level by applying it to unstructured data. Swanson&#x02019;s ABC model (Swanson, <xref ref-type="bibr" rid="B44">1986</xref>) helped unveil knowledge discovery in terms of constructing a knowledge network and discovering new knowledge.</p>
<p>Prior to knowledge discovery with techniques from text mining, a knowledge extraction stage is needed. A biological entity is tagged according to its type, such as gene, disease, cell, and tissue. In PKDE4J, we extended the NER process so that it can be conducted in several modes, such as dictionary-based and machine learning-based methods combined with ontology. The RE process follows to determine the relations among the entities. This process is performed by using a set of predefined rules.</p>
<p>In past studies, NER has been used to extract entities and their types from text (Hanisch et al., <xref ref-type="bibr" rid="B18">2005</xref>; Yang et al., <xref ref-type="bibr" rid="B51">2008</xref>; Munkhdalai et al., <xref ref-type="bibr" rid="B31">2015</xref>; Tang et al., <xref ref-type="bibr" rid="B45">2015</xref>; Leaman and Lu, <xref ref-type="bibr" rid="B26">2016</xref>). In the biomedical field, entity types usually include genes, diseases, and chemicals.</p>
<p>A dictionary-based or a lexicon-based approach is widely used in biomedicine. It matches terms from prepared dictionaries to a given text. Despite its simplicity and high accuracy, the dictionary-based approach has two major problems. The first is that it can miss new terminology not included in the dictionary, and the second is the difficulty of matching variants and synonyms of dictionary terms. Several studies (Yang et al., <xref ref-type="bibr" rid="B51">2008</xref>; Munkhdalai et al., <xref ref-type="bibr" rid="B31">2015</xref>) have attempted to combine various dictionaries to solve these problems.</p>
<p>The rule-based approach observes general features of an entity in text and extracts entities based on heuristically acquired rules. These features include parts-of-speech tags, dependencies, and grammatical features. ProMiner (Hanisch et al., <xref ref-type="bibr" rid="B18">2005</xref>) used contextual rules to achieve an accuracy of 92.9%. However, the relevant study also identified the risk of overfitting of the proposed rules.</p>
<p>For machine learning-based approaches, conditional random fields (CRFs), support vector machines (SVMs), and Markov models are widely used. Deep learning-based techniques are also being researched. Munkhdalai et al. (<xref ref-type="bibr" rid="B31">2015</xref>) proposed BANNER-CHEMDNER, which uses semi-supervised learning to extract chemical entities. It recorded an <italic>F</italic>-measure score of 85.68% on the Chemical Entity Mention testing set. Tang et al. (<xref ref-type="bibr" rid="B45">2015</xref>) and Li et al. (<xref ref-type="bibr" rid="B27">2015</xref>) used CRFs with a system based on MapReduce and Hadoop to process big data. Although many studies have used CRFs to calculate the probability of the occurrence of a certain word as a biomedical/chemical entity, Tang et al. (<xref ref-type="bibr" rid="B45">2015</xref>) proposed an SSVM-based system (<italic>F</italic>-score: 85.05%) that outperforms CRF-based systems. Leaman and Lu (<xref ref-type="bibr" rid="B26">2016</xref>) used a semi-Markov, structured linear classifier that works well, especially with diseases (NCBI Disease corpus, <italic>F</italic>-score: 0.829) and chemicals (BioCreative 5 CDR corpus, <italic>F</italic>-score: 0.914). Recently, to process large amounts of bio-literature data, machine learning-based approaches have often been combined with parallel and distributed systems (Li et al., <xref ref-type="bibr" rid="B27">2015</xref>; Tang et al., <xref ref-type="bibr" rid="B45">2015</xref>).</p>
<p>Determining the relations among entities is also a fundamental task in discovering knowledge from biomedical text. Although early studies (Jelier et al., <xref ref-type="bibr" rid="B20">2005</xref>) focused on extracting binary relations by using the co-occurrence approach, techniques for the extraction of complex relationships among biomedical entities have received a considerable amount of research interest because complicated and accurate relationships among entities in text can be extracted as knowledge. Extracting these complex relations involves processing using pattern and rule matching (Fundel et al., <xref ref-type="bibr" rid="B16">2006</xref>) or, recently, machine learning-based techniques (Bunescu et al., <xref ref-type="bibr" rid="B12">2005</xref>).</p>
<p>In pattern and rule matching, predefined rules based on a dependency tree and a relation trigger word are used to identify relations between entities, whereas several techniques, including SVM, Markov models, and RNN, are used in machine learning approaches. In past studies, RE for specific types of biomedical entities has been studied widely. Protein&#x02013;protein interactions (PPIs) have been the subject of extensive focus (Thomas et al., <xref ref-type="bibr" rid="B46">2011</xref>; Li et al., <xref ref-type="bibr" rid="B27">2015</xref>). Li et al. (<xref ref-type="bibr" rid="B27">2015</xref>) described miRTex, which is designed for microRNA-gene RE and achieved an <italic>F</italic>-score of 88%. Others, like Bravo et al. (<xref ref-type="bibr" rid="B10">2015</xref>), focused on the relation between gene and disease.</p>
<p>It is challenging to find integrated systems for all types of biomedical entities. Such a system requires sophisticated techniques and expertise that can be applied to various entity types. To overcome this limitation, Yimam et al. (<xref ref-type="bibr" rid="B53">2016</xref>) proposed an interactive machine learning (iML) approach to improve biomedical knowledge extraction. Holzinger (<xref ref-type="bibr" rid="B19">2016</xref>) defined iML as an algorithm that can optimize training data through interactions between a computer and a human. Although iML may help expedite the discovery process, we focus here only on systems for biomedical knowledge discovery.</p>
<p>An integrated system requires comprehensive techniques, and research on this has not been extensive thus far, despite its potential for knowledge extraction. As PKDE4J is an integrated system, multiple types of entities and relations can be extracted from various types of data sources using it.</p>
<p>In addition to NER and RE, event detection has gained attention for accurate knowledge extraction in recent years. Event detection refers to the task of extracting descriptions of the actions of and relations among one or more entities from the biomedical literature (Bj&#x000F6;rne et al., <xref ref-type="bibr" rid="B8">2010</xref>). In the expanded version of PKDE4J, we enhance the likelihood of extracting accurate relations by adding an event detection module.</p>
<p>The functions of PKDE4J can be applied to practical problems, such as knowledge search, knowledge network construction, and knowledge inference. For knowledge search, a system integrating PKDE4J with PubMed articles provides annotated articles. As such, users can perform more effective searches using annotated papers. Moreover, using PKDE4J, large amounts of knowledge constructed from various resources can be transformed into a network. Applying Swanson&#x02019;s ABC model to the extracted knowledge, new knowledge that has not been found before can be inferred.</p>
<p>In this paper, we compare the extended PKDE4J with other well-known algorithms on various types of entity extraction, RE, and event detection. We also provide a detailed description of how the results are extracted by PKDE4J. To highlight its utility, we introduce examples of how it can be used for knowledge annotation, search, linking, and inference.</p>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<p>The system has been upgraded since the original version of PKDE4J was published in 2015. In this section, we introduce the updated version. The overall architecture of the system is illustrated in Figure <xref ref-type="fig" rid="F1">1</xref>. The three major modules are (1) named entity extraction, (2) RE, and (3) event detection.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Overall architecture of the new version of PKDE4J.</p></caption>
<graphic xlink:href="frma-03-00007-g001.tif"/>
</fig>
<sec id="S2-1">
<title>Named Entity Extraction</title>
<p>The original PKDE4J version extracts biological entities based on a dictionary. To make the NER module more flexible, we propose three options for entity extraction in addition to the dictionary: (1) the Unified Medical Language System (UMLS) combined with the dictionary, (2) machine learning, and (3) the UMLS combined with machine learning. By adding these options, it is expected that the updated system will exhibit better performance and flexibility.</p>
<p>For the dictionary-based approach, we updated the previous version of PKDE4J dictionaries by integrating data from the open biomedical database GoPubMed. GoPubMed is a search engine for biomedical literature designed to structure a large number of articles from the MEDLINE database (Doms and Schroeder, <xref ref-type="bibr" rid="B15">2005</xref>). It allows users to query and explore PubMed results with controlled vocabulary, such as Gene Ontology (GO) and Medical Subject Headings (MeSH). GO aims to unify the representations of genes and gene products into structured vocabularies. Starting with three databases for organisms&#x02014;FlyBase, the <italic>Saccharomyces</italic> Genome Database, and the Mouse Genome Informatics Project&#x02014;GO has grown by integrating 35 major gene/protein repositories (Ashburner et al., <xref ref-type="bibr" rid="B2">2000</xref>). Similarly, MeSH is the National Library of Medicine&#x02019;s controlled vocabulary thesaurus used to index biomedical publications such as those in the PubMed database. To add GoPubMed data to various types of PKDE4J dictionaries, we retrieved articles with the queries presented in Table <xref ref-type="table" rid="T1">1</xref>. For each article, the tagged GO terms and MeSH terms were collected. Collected terms that did not represent the dictionary type were filtered by certain criteria. For example, general terms like &#x0201C;homo sapiens&#x0201D; repeatedly appeared in several queries, and thus were deleted from the list.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Queries used to gather data from GoPubMed.</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td align="left" valign="top">&#x0201C;Humans[mesh] Cells[mesh],&#x0201D;<break/>&#x0201C;Humans[mesh] all[protein],&#x0201D;<break/>&#x0201C;Humans[mesh] Organisms[mesh],&#x0201D;<break/>&#x0201C;Humans[mesh] Metabolism[mesh],&#x0201D;<break/>&#x0201C;Humans[mesh] Diseases[mesh],&#x0201D;<break/>&#x0201C;Humans[mesh] &#x0005C;&#x0201C;Body Regions&#x0005C;&#x0201D;[mesh],&#x0201D;<break/>&#x0201C;Humans[mesh] biological_process[go],&#x0201D;<break/>&#x0201C;Humans[mesh] Tissues[mesh],&#x0201D;<break/>&#x0201C;Humans[mesh] &#x0005C;&#x0201C;Chemicals and Drugs&#x0005C;&#x0201D;[mesh]&#x0201D;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In addition to MeSH and GO, KEGG Disease (as a disease dictionary) and DrugBank data (as a drug dictionary) were added to the dictionaries for PKDE4J. KEGG Disease is a collection of disease vocabularies. It provides various information concerning diseases on perturbed molecular networks. Approximately 2,000 items of disease information were added. DrugBank contains biochemical and pharmacological information about drugs and their targets (Wishart et al., <xref ref-type="bibr" rid="B50">2006</xref>). More than 400 drugs and their 1,200 metabolites were added to the drug dictionary.</p>
<p>We also propose a combination of the dictionary with the UMLS-based approach, which first recognizes biological entities in text using the dictionary-based approach and then maps to UMLS terms matching the extracted entities. Many existing NER systems (Rindflesch et al., <xref ref-type="bibr" rid="B40">2000</xref>; Aronson, <xref ref-type="bibr" rid="B1">2001</xref>; Jimeno et al., <xref ref-type="bibr" rid="B21">2008</xref>) that are lexicon based largely depend on knowledge sources such as GO (Doms and Schroeder, <xref ref-type="bibr" rid="B15">2005</xref>) and the UMLS (Bodenreider, <xref ref-type="bibr" rid="B9">2004</xref>). Specifically, the UMLS is a collection of multiple controlled vocabularies (e.g., NCBI and MeSH) in the biomedical domain developed by the US National Library of Medicine. It consists of over 3&#x02009;million concepts, each of which is assigned to at least one of 134 semantic types from the UMLS Semantic Network, such as gene, genome, and cell. Accordingly, the integration of the UMLS into dictionary-based entity extraction can enhance the interoperability of our system and help utilize additional information concerning entities. It also allows for further analysis, such as the measurement of similarity among the extracted biological entities based on the corresponding semantic types in the Semantic Network.</p>
<p>Similarly, we integrated the UMLS into the machine learning-based approach. To this end, the machine learning-based entity recognizers included Abner (Settles, <xref ref-type="bibr" rid="B41">2005</xref>), CheNER (Usi&#x000E9; et al., <xref ref-type="bibr" rid="B47">2013</xref>), and LingPipe (Baldwin and Carpenter, <xref ref-type="bibr" rid="B4">2003</xref>). The model was chosen according to the characteristics of the experiment to be conducted. For instance, if mutation was the entity type for entity extraction, we could choose MutationFinder (Caporaso et al., <xref ref-type="bibr" rid="B13">2007</xref>). After model selection, the NER process was performed using the selected model and the extracted entities were mapped to UMLS terms. Abner is a tagger for biological entities (e.g., protein, cell line, DNA, and RNA) in text and provides two models trained on the standard NLPBA (Kim et al., <xref ref-type="bibr" rid="B22">2003</xref>) and BioCreative (Yeh et al., <xref ref-type="bibr" rid="B52">2004</xref>) corpora. Using these models, two types of biological entities, gene and cell, extracted from the text are tagged and mapped to terms in the UMLS. CheNER (Usi&#x000E9; et al., <xref ref-type="bibr" rid="B47">2013</xref>) is a named entity recognizer that performs a function similar to that of Abner, except that it recognizes chemical compounds in biomedical text. The CheNER model trained on the corpora provided by Kol&#x000E1;&#x00159;ik and Klinger (Klinger et al., <xref ref-type="bibr" rid="B24">2008</xref>; Kol&#x000E1;rik et al., <xref ref-type="bibr" rid="B25">2008</xref>) was used to tag the drug names mentioned in text.</p>
</sec>
<sec id="S2-2">
<title>Relation Extraction</title>
<p>The core module of RE is similar to that of PKDE4J (2015). PKDE4J extracts relations between entities using 17 strategies, and its RE module focuses on the presence of verbs. Using verbs as the core of RE, we can extract more precise relations. Therefore, in the new version of PKDE4J, we expand the range of verbs in the RE module.</p>
<sec id="S2-2-1">
<title>Expansion of Biomedical Verb List: Biomedical Verbs</title>
<p>The number of verbs used in the previous version of PKDE4J was 398. This means that only 398 relation types could be extracted, and the remaining relations were identified as &#x0201C;none&#x0201D; or &#x0201C;juxtaposed.&#x0201D; Therefore, a new verb list should be constructed and applied to PKDE4J to overcome this limitation.</p>
<p>In this study, we define biomedical verbs as verbs that describe relations between biological entities in the biomedical field. Verbs from both the general field and the biomedical field are included if they represent a relation between entities. Therefore, as shown in Figure <xref ref-type="fig" rid="F2">2</xref>, a subset of both the general verbs and the specialized verbs used in biomedicine can serve as biomedical verbs.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>The definition of biomedical verbs.</p></caption>
<graphic xlink:href="frma-03-00007-g002.tif"/>
</fig>
<p>To construct biomedical verbs, we used the 2014 version of PubMed articles. We collected a total of 14,447,667 article records, each with a title and an abstract. We then modified PKDE4J to extract verbs from sentences containing two entities. To use the dictionary-based approach for PKDE4J, we constructed the dictionaries shown in Table <xref ref-type="table" rid="T2">2</xref>. Dictionaries for each entity included KEGG, HMDB, GO, Entrez Gene, MeSH, DrugBank, Tiger, and GDSC.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Dictionaries used to construct biomedical verbs.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">Entity type</th>
<th valign="top" align="left">Dictionary</th>
<th valign="top" align="center">&#x00023; of unique names</th>
<th valign="top" align="left">Entity type</th>
<th valign="top" align="left">Dictionary</th>
<th valign="top" align="center">&#x00023; of unique names</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Cell</td>
<td align="left" valign="top">KEGG (Kanehisa and Goto, <xref ref-type="bibr" rid="B56">2000</xref>)</td>
<td align="center" valign="top">1,559</td>
<td align="left" valign="top">Body part</td>
<td align="left" valign="top">KEGG (Kanehisa and Goto, <xref ref-type="bibr" rid="B56">2000</xref>)</td>
<td align="center" valign="top">564</td>
</tr>
<tr>
<td align="left" valign="top">Cellular component</td>
<td align="left" valign="top">HMDB (Wishart et al., <xref ref-type="bibr" rid="B60">2012</xref>)</td>
<td align="center" valign="top">672</td>
<td align="left" valign="top">Disease</td>
<td align="left" valign="top">MeSH, KEGG (Kanehisa and Goto, <xref ref-type="bibr" rid="B56">2000</xref>)</td>
<td align="center" valign="top">73,345</td>
</tr>
<tr>
<td align="left" valign="top">Molecular function</td>
<td align="left" valign="top">Gene Ontology (GO) (Ashburner et al., <xref ref-type="bibr" rid="B2">2000</xref>)</td>
<td align="center" valign="top">14,857</td>
<td align="left" valign="top">Drug</td>
<td align="left" valign="top">DrugBank (Knox et al., <xref ref-type="bibr" rid="B57">2011</xref>)</td>
<td align="center" valign="top">30,703</td>
</tr>
<tr>
<td align="left" valign="top">Biological process</td>
<td align="left" valign="top">GO (Ashburner et al., <xref ref-type="bibr" rid="B2">2000</xref>)</td>
<td align="center" valign="top">43,391</td>
<td align="left" valign="top">Tissue</td>
<td align="left" valign="top">Tiger, GDSC (Liu et al., <xref ref-type="bibr" rid="B58">2008</xref>; Yang et al., <xref ref-type="bibr" rid="B61">2013</xref>)</td>
<td align="center" valign="top">76</td>
</tr>
<tr>
<td align="left" valign="top">Gene/protein</td>
<td align="left" valign="top">Entrez Gene (Maglott et al., <xref ref-type="bibr" rid="B59">2011</xref>)</td>
<td align="center" valign="top">104,872</td>
<td align="left" valign="top">Metabolite</td>
<td align="left" valign="top">HMDB (Wishart et al., <xref ref-type="bibr" rid="B60">2012</xref>)</td>
<td align="center" valign="top">297,256</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>MeSH, Medical Subject Headings</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>Verb extraction using PKDE4J involves several processes. Figure <xref ref-type="fig" rid="F3">3</xref> outlines the method for biomedical verb construction. Each PubMed record consists of a title and an abstract, and each abstract was split into sentences. The sentences were tokenized into words, and entities were extracted by matching these words against the dictionaries. If two or more entities were extracted from a sentence, the terms relating them were extracted. Specifically, if a term had a dependency relation with two entities that satisfied a set of rules on the dependency tree provided by Stanford CoreNLP (Manning et al., <xref ref-type="bibr" rid="B28">2014</xref>), the term was extracted as a candidate biomedical verb. Through this process, a total of 72,844 candidate terms were extracted.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Outline of the method for biomedical verb construction.</p></caption>
<graphic xlink:href="frma-03-00007-g003.tif"/>
</fig>
<p>The candidate terms included verbs that could not explain biological interactions between entities, as well as spelling errors and incomprehensible terms. Thus, to retain only biomedical verbs, two additional filtering tasks were performed. First, to remove terms that were not verbs, WordNet (Miller, <xref ref-type="bibr" rid="B29">1995</xref>) and Wiktionary (<uri xlink:href="https://www.wiktionary.org/">https://www.wiktionary.org/</uri>) were used. WordNet is a dictionary database that provides meaning, part of speech, and thesaurus information for each word. It has been used as an ontology for text analytics in various studies (Goikoetxea et al., <xref ref-type="bibr" rid="B17">2016</xref>). Wiktionary is a Wiki project that aims to create a multilingual Web dictionary, and it has also been used in research (Zesch et al., <xref ref-type="bibr" rid="B54">2008</xref>). We filtered the candidate terms using these two dictionaries. After this process, 8,855 verbs remained. The second filtering process involved the manual selection of verbs representing meaningful relations. In this process, a Ph.D. student in the Department of Library and Information Science and a doctor in biology conducted manual filtering. Verbs representing the relationship between entities were extracted. In the case of transitive verbs, the relationship between subject and object was indicated. In other words, the verbs were directly related to two entities. However, if the extracted transitive verbs did not convey a meaningful relation, such as &#x0201C;investigate,&#x0201D; &#x0201C;survey,&#x0201D; and &#x0201C;study,&#x0201D; we deleted them. Moreover, intransitive verbs that indicate relationships between entities when used with a preposition were added. In case of a discrepancy between the opinions of the experts, a decision was made through consultation. After all the filtering processes, 4,558 verbs were obtained in a final list.</p>
<p>To construct a verb dictionary, two tasks needed to be performed. First, the 4,558 verbs were grouped into similar types depending on their meanings. If a number of verbs are classified as belonging to a similar type, the relations derived from them can be represented by a relatively small number of relation types. In this study, verbs were classified using the semantic relations of the UMLS, which are organized hierarchically, and were divided into 54 categories, where the six largest categories were &#x0201C;ISA,&#x0201D; &#x0201C;physically_related_to,&#x0201D; &#x0201C;conceptually_related_to,&#x0201D; &#x0201C;functionally_related_to,&#x0201D; &#x0201C;temporally_related_to,&#x0201D; and &#x0201C;spatially_related_to.&#x0201D; To classify the verbs more accurately, a manual classification process was carried out. Second, after the classification process, the nominalized forms of the verbs were added, because a relation between entities is not expressed only by verbs. For example, in the sentence &#x0201C;Binding A and B,&#x0201D; the relation between A and B was identified through &#x0201C;binding.&#x0201D; Nominalized forms and gerunds are frequently used to express relations. Therefore, the dictionary should include not only the verb form, but also the nominalized form and gerund. This process was also performed manually.</p>
<p>Finally, we constructed a list of 4,558 verbs including the semantic relations from UMLS, as well as the nominalized form and gerund of each. Table <xref ref-type="table" rid="T3">3</xref> shows an example of a list of constructed biomedical verbs. The entire verb list can be downloaded from the following URL: <uri xlink:href="http://informatics.yonsei.ac.kr/tsmm/data/Biomedical_Verb_List.xlsx">http://informatics.yonsei.ac.kr/tsmm/data/Biomedical_Verb_List.xlsx</uri>.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Top 10 biomedical verbs by frequency.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">Rel. type</th>
<th valign="top" align="left">Verb</th>
<th valign="top" align="left">Nominalization</th>
<th valign="top" align="center">Freq.</th>
<th valign="top" align="left">Rel. type</th>
<th valign="top" align="left">Verb</th>
<th valign="top" align="left">Nominalization</th>
<th valign="top" align="center">Freq.</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Consists of</td>
<td align="left" valign="top">Have</td>
<td align="left" valign="top">Having</td>
<td align="center" valign="top">2,960,104</td>
<td align="left" valign="top">Disrupt</td>
<td align="left" valign="top">Inhibit</td>
<td align="left" valign="top">Inhibiting</td>
<td align="center" valign="top">1,443,204</td>
</tr>
<tr>
<td align="left" valign="top">Uses</td>
<td align="left" valign="top">Use</td>
<td align="left" valign="top">Using</td>
<td align="center" valign="top">2,744,217</td>
<td align="left" valign="top">Isa</td>
<td align="left" valign="top">Be</td>
<td align="left" valign="top">Being</td>
<td align="center" valign="top">1,327,238</td>
</tr>
<tr>
<td align="left" valign="top">Associate with</td>
<td align="left" valign="top">Associate</td>
<td align="left" valign="top">Associating/association</td>
<td align="center" valign="top">2,034,881</td>
<td align="left" valign="top">Indicate</td>
<td align="left" valign="top">Find</td>
<td align="left" valign="top">Finding</td>
<td align="center" valign="top">1,305,364</td>
</tr>
<tr>
<td align="left" valign="top">Affects</td>
<td align="left" valign="top">Increase</td>
<td align="left" valign="top">Increasing/increment</td>
<td align="center" valign="top">1,880,464</td>
<td align="left" valign="top">Disrupt</td>
<td align="left" valign="top">Reduce</td>
<td align="left" valign="top">Reducing/reduction</td>
<td align="center" valign="top">1,168,892</td>
</tr>
<tr>
<td align="left" valign="top">Causes</td>
<td align="left" valign="top">Induce</td>
<td align="left" valign="top">Inducing</td>
<td align="center" valign="top">1,461,874</td>
<td align="left" valign="top">Contains</td>
<td align="left" valign="top">Include</td>
<td align="left" valign="top">Including</td>
<td align="center" valign="top">1,114,958</td>
</tr>
</tbody>
</table>
</table-wrap>
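<p>As a minimal sketch of how such a verb list could be used for lookup, the following Python fragment maps each surface form (verb, gerund, and nominalization) to its relation type. The four entries are taken from Table <xref ref-type="table" rid="T3">3</xref>; the field names, the whitespace tokenizer, and the function names are illustrative assumptions and do not reflect the layout of the published spreadsheet or the PKDE4J implementation.</p>
<preformat preformat-type="code">
```python
# Hypothetical in-memory form of a few rows of Table 3.
# Field names are assumptions made for illustration only.
VERB_ENTRIES = [
    {"verb": "inhibit", "relation": "disrupt", "variants": ["inhibiting"]},
    {"verb": "reduce", "relation": "disrupt", "variants": ["reducing", "reduction"]},
    {"verb": "induce", "relation": "causes", "variants": ["inducing"]},
    {"verb": "associate", "relation": "associate with",
     "variants": ["associating", "association"]},
]


def build_lookup(entries):
    """Map every surface form (verb, gerund, nominalization) to its relation type."""
    lookup = {}
    for entry in entries:
        for form in [entry["verb"]] + entry["variants"]:
            lookup[form] = entry["relation"]
    return lookup


def relation_types(sentence, lookup):
    """Return the relation type of each token that matches a known surface form."""
    # Naive whitespace tokenizer; a real pipeline would lemmatize first.
    tokens = [tok.strip(".,;:()").lower() for tok in sentence.split()]
    return [lookup[tok] for tok in tokens if tok in lookup]


LOOKUP = build_lookup(VERB_ENTRIES)
print(relation_types("Reduction of A by inhibiting B.", LOOKUP))
```
</preformat>
<p>Storing the gerund and nominalized forms alongside the verb lets a phrase such as &#x0201C;Binding A and B&#x0201D; be matched without a finite verb in the sentence.</p>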
</sec>
</sec>
<sec id="S2-3">
<title>Event Detection</title>
<p>By considering contextual information in the NER process, we added an event trigger detection module to the original PKDE4J process as shown in Figure <xref ref-type="fig" rid="F4">4</xref>.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Architecture of event detection function.</p></caption>
<graphic xlink:href="frma-03-00007-g004.tif"/>
</fig>
<p>Event detection refers to the task of extracting, from the biomedical literature, descriptions of actions and relations involving one or more entities (Bj&#x000F6;rne et al., <xref ref-type="bibr" rid="B8">2010</xref>). Events can function as participants in other events, thus allowing for the construction of complex conceptual networks. Events include complex interactions among biological entities, and are highly reliant on context (Miwa et al., <xref ref-type="bibr" rid="B30">2012</xref>). They are usually composed of triggers, which are words or phrases indicating the occurrence of certain events, such as &#x0201C;inhibition&#x0201D; and &#x0201C;expression&#x0201D; (Rahul et al., <xref ref-type="bibr" rid="B36">2017</xref>). Figure <xref ref-type="fig" rid="F5">5</xref> shows an example of events detected from sentences. Thus, event trigger identification is essential to extract interactions among biological entities in a more precise manner. To this end, we developed a model to recognize trigger words in text and added it to our system.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>An example of event detection in a sentence.</p></caption>
<graphic xlink:href="frma-03-00007-g005.tif"/>
</fig>
<p>For event detection, we developed a Stanford CoreNLP-based model using token-based features (lemmatization, POS tagging, and phrases) and dependency trees. The model was trained on the GENIA Event Extraction corpus (Kim et al., <xref ref-type="bibr" rid="B23">2012</xref>) by selecting 653 trigger words belonging to four event types (gene expression, negative regulation, positive regulation, and regulation) and 4,132 event sets. To create rules, we applied the k-shortest path algorithm to the dependency trees. Each path from a given entity to a trigger word can be a rule, and the frequency and percentage of its occurrence were analyzed. In a given corpus, the generated rules are ranked by the distance between a trigger word and an entity, and by their frequency and percentage of occurrence. Figure <xref ref-type="fig" rid="F6">6</xref> shows an example of rule generation. The event trigger detection module was added following entity extraction because events can be extracted only after biological entities have been recognized.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>An example of rule generation.</p></caption>
<graphic xlink:href="frma-03-00007-g006.tif"/>
</fig>
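<p>The idea of turning a dependency path between an entity and a trigger word into a candidate rule can be sketched as follows. This is a simplified illustration and not the authors&#x02019; implementation: it uses a plain breadth-first search for a single shortest path rather than the k-shortest path algorithm, and the dependency edges are written by hand for a toy sentence rather than produced by a parser. All names are hypothetical.</p>
<preformat preformat-type="code">
```python
from collections import Counter, deque

# Hand-written, illustrative dependency edges (head, label, dependent) for
# "PROT1 inhibits expression of PROT2"; this is not actual parser output.
EDGES = [
    ("inhibits", "nsubj", "PROT1"),
    ("inhibits", "dobj", "expression"),
    ("expression", "prep_of", "PROT2"),
]


def build_graph(edges):
    """Build an undirected adjacency list; upward steps get an '-inv' suffix."""
    graph = {}
    for head, label, dep in edges:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label + "-inv"))
    return graph


def shortest_label_path(graph, start, goal):
    """Breadth-first search returning the edge labels on one shortest path."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, labels = queue.popleft()
        if node == goal:
            return labels
        for neighbour, label in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, labels + [label]))
    return None


GRAPH = build_graph(EDGES)

# Each entity-to-trigger path is a candidate rule; count how often it occurs.
rule_counts = Counter()
for entity in ["PROT1", "PROT2"]:
    path = shortest_label_path(GRAPH, entity, "inhibits")
    rule_counts[tuple(path)] += 1

print(rule_counts)
```
</preformat>
<p>Over a full corpus, the counts collected this way would supply the frequency and percentage statistics, and the path length would supply the distance used for ranking.</p>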
</sec>
</sec>
<sec id="S3">
<title>Evaluation</title>
<sec id="S3-1">
<title>Named Entity Extraction</title>
<p>To validate the updated version of the NER module of PKDE4J, we compared the performance of the original version with that of the updated version in terms of <italic>F</italic>-score by mapping the extracted entities to biological entities in the PubMed papers collected from GOPubMed. On average across biomedical entity types, the updated version achieved a precision of 99.9%, a recall of 86.6%, and an <italic>F</italic>-score of 92.8%, as shown in Figure <xref ref-type="fig" rid="F7">7</xref>.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Comparative evaluation (<italic>F</italic>-score) of dictionary-based named entity recognition by type of biomedical entity.</p></caption>
<graphic xlink:href="frma-03-00007-g007.tif"/>
</fig>
<p>When combining the UMLS with dictionary-based NER, we measured the <italic>F</italic>-score for 10 biomedical entity types. As shown in Figure <xref ref-type="fig" rid="F8">8</xref>, averaged over the entity types, the system yielded a precision of 98.1%, a recall of 67.7%, and an <italic>F</italic>-score of 78.9%. The module combining the UMLS for entity extraction exhibited high precision and relatively low recall, as it assisted the natural language process by mapping the extracted entities to semantic entity types.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Evaluation of dictionary combined with Unified Medical Language System (UMLS) in the named entity recognition process by biomedical entity type.</p></caption>
<graphic xlink:href="frma-03-00007-g008.tif"/>
</fig>
<p>For machine learning-based NER, we exploited three prevalent models (Abner, LingPipe, and CheNER) with PKDE4J. The models were trained to recognize particular entity types as mentioned in the methodology section. Abner is a tool for cells and genes, and the entity type that can be identified by the CheNER model is limited to drugs. Therefore, we evaluated each model with a limited number of entity types (gene, cell, and drug). As shown in Figure <xref ref-type="fig" rid="F9">9</xref>, when the Abner model was wrapped with PKDE4J, the <italic>F</italic>-score was 13% for cells (<italic>P</italic>: 26%, <italic>R</italic>: 9%) and 31% for genes (<italic>P</italic>: 30%, <italic>R</italic>: 33%). The CheNER model yielded an <italic>F</italic>-score of 4.8% (<italic>P</italic>: 17.3%, <italic>R</italic>: 2.8%). These results indicate that the machine learning-based approach requires a high-quality trained model that represents the entire population; the out-of-the-box trained models for Abner and CheNER yielded poor performance.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p><italic>F-</italic>scores of the machine learning and the Unified Medical Language System (UMLS)-combined machine learning-connected systems.</p></caption>
<graphic xlink:href="frma-03-00007-g009.tif"/>
</fig>
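<p>For reference, the <italic>F</italic>-score used throughout this section is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall values reported above; the function name is an illustrative choice. Note that where the reported figures are averages over entity types, the <italic>F</italic>-score of the averaged precision and recall need not equal the average of the per-type <italic>F</italic>-scores.</p>
<preformat preformat-type="code">
```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs (in percent) as reported in the text.
reported = {
    "dictionary-based NER (average)": (99.9, 86.6),
    "Abner, cells": (26.0, 9.0),
    "Abner, genes": (30.0, 33.0),
    "CheNER, drugs": (17.3, 2.8),
}

for name, (p, r) in reported.items():
    print(name, round(f_score(p, r), 1))
```
</preformat>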
<p>When combining the UMLS with the machine learning-based system, we applied the Abner and LingPipe models to PKDE4J for genes and cells. In this case, LingPipe was limited to tagging the gene type. Integrating Abner and LingPipe into PKDE4J yielded low precision and recall. This can be attributed to the use of the out-of-the-box trained models.</p>
</sec>
<sec id="S3-2">
<title>Relation Extraction</title>
<sec id="S3-2-1">
<title>Comparison between Biomedical Verb List and Predication in SemRep and UMLS</title>
<p>The entity relations extracted using the biomedical verb list were compared with the set of entity relations constructed using SemMed (Rindflesch et al., <xref ref-type="bibr" rid="B39">2011</xref>) and the UMLS to confirm the agreement rate. SemMed is a database that stores &#x0201C;subject&#x02013;predicate&#x02013;object&#x0201D; triples extracted by SemRep (Rindflesch and Fiszman, <xref ref-type="bibr" rid="B38">2003</xref>) from Medline. It provided triples extracted from 165,670,113 sentences from approximately 26 million PubMed abstracts. Both the subject and object were biomedical entities, and a CUI of the UMLS was assigned to them. The predicate type was provided by SemRep, and represented 61 predicate types including positive and negative distinctions. In this study, the biomedical verb list was classified using the semantic relation of the UMLS. We collected semantic relations through the UMLS using the CUI of each entity provided by SemMed, and the matching rate was computed based on it. A total of 4,406,360 sentences were randomly selected from the approximately 20 million sentences containing a relation verb provided by SemMed. We checked the agreement rate by comparing SemMed predicates with the results of applying the biomedical verb list to the same sentences. Table <xref ref-type="table" rid="T4">4</xref> shows the correspondence ratio between the relation &#x0201C;subject entity&#x02013;object entity&#x0201D; extracted using PKDE4J based on the biomedical verbs and the relations provided by SemMed.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Concordance rate between PKDE4J and SemRep.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"/>
<th valign="top" align="center">Subject entity&#x02013;object entity</th>
<th valign="top" align="center">Subject&#x02013;predicate&#x02013;object</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Concordance rate (%)</td>
<td align="center" valign="top">33</td>
<td align="center" valign="top">8.4</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The &#x0201C;subject entity&#x02013;object entity&#x0201D; matching rate was calculated by matching the subject and object entities, in the context of relations extracted through PKDE4J, to entities provided by SemRep. Approximately 33% matched. We discarded entity pairs whose relation type was &#x0201C;none&#x0201D; or &#x0201C;juxtaposed.&#x0201D; The former indicated that the relation between the relevant entities was likely to be positioned close to them, where a verb was located in proximity. The relation type &#x0201C;juxtaposed&#x0201D; meant that two entities co-occurred without being connected to each other <italic>via</italic> a meaningful verb. In addition to filtering by these two criteria, another reason for the low match rate is the preprocessing of entities in PKDE4J. Because entities were preprocessed, a significant number of entity pairs extracted by PKDE4J did not match those extracted by SemRep.</p>
<p>The agreement rate of triples was approximately 8.4%, which was relatively low. This is for the following reasons: First, Semantic Relation in UMLS determines the predefined semantic relation based on the CUI. This means that although other expressions appeared in a sentence, semantic relation was always the same if the CUIs were the same. Therefore, even if the same verb was used in several sentences, the semantic relation of the sentence varied depending on the CUI of the two entities. Moreover, it was also difficult to clearly classify the verb according to semantic relations. For example, the verb &#x0201C;effect&#x0201D; can be classified into various semantic relations such as &#x0201C;interacts_with,&#x0201D; &#x0201C;affects,&#x0201D; &#x0201C;causes,&#x0201D; &#x0201C;associate_with,&#x0201D; &#x0201C;result_of,&#x0201D; and &#x0201C;derive_from.&#x0201D; Because of the difficulty of precise classification, the concordance rate of the predicate decreased.</p>
<p>Although the concordance rate was low, it was useful in two respects. First, as mentioned above, if the result was extracted using the UMLS semantic type, the semantic relation was determined depending on the CUI of the extracted entity. However, if a biomedical verb was used, it had the advantage whereby the relation could be extracted through information provided in the sentence. As the information in the sentence had been used, it was possible to extract a relation more suitable for the context. Second, even if two entities did not have a semantic relation, it was possible to find an entity relation using information in a sentence in PKDE4J. SemRep or UMLS only extract entity relations with a semantic relation even if entity relations appear in the sentence structure. On the contrary, if biomedical verbs are used in RE, more relations can be extracted.</p>
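<p>The difference between the two concordance measures in Table <xref ref-type="table" rid="T4">4</xref> can be illustrated with a small sketch. The triples below are invented placeholders, and the choice of denominator (the number of system triples) is an assumption made for illustration; the source does not specify how the rate was normalized.</p>
<preformat preformat-type="code">
```python
def concordance(system_items, reference_items):
    """Percentage of system items that also occur in the reference set."""
    if not system_items:
        return 0.0
    reference = set(reference_items)
    matched = sum(1 for item in system_items if item in reference)
    return 100.0 * matched / len(system_items)


# Placeholder (subject, predicate, object) triples, not real extraction output.
system_triples = [
    ("A", "affects", "B"),
    ("C", "causes", "D"),
    ("E", "isa", "F"),
    ("G", "uses", "H"),
]
reference_triples = [
    ("A", "affects", "B"),
    ("C", "associated_with", "D"),
    ("X", "causes", "Y"),
]


def to_pairs(triples):
    """Drop the predicate, keeping only (subject, object)."""
    return [(subj, obj) for subj, _pred, obj in triples]


pair_rate = concordance(to_pairs(system_triples), to_pairs(reference_triples))
triple_rate = concordance(system_triples, reference_triples)
print(pair_rate, triple_rate)
```
</preformat>
<p>Because a triple can match only if its entity pair matches and the predicate also agrees, the triple-level rate can never exceed the pair-level rate, which is consistent with the ordering of the two values in Table <xref ref-type="table" rid="T4">4</xref>.</p>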
</sec>
<sec id="S3-2-2">
<title>Experiments on Entity-Entity RE</title>
<p>To measure the performance of the RE component of PKDE4J (PKDE4J-RE), we used five corpora with different characteristics and relation types as shown in Table <xref ref-type="table" rid="T5">5</xref>. AIMed is among the best-known corpora for PPIs (Bunescu et al., <xref ref-type="bibr" rid="B12">2005</xref>). It contains 225 MEDLINE abstracts with 1,955 sentences pertaining to proteins found in humans. The corpus was curated manually, and had 177 abstracts with PPI and 48 without. BioInfer and GAD are general RE corpora consisting of more than two entity types. Of the relation-type tags available in these corpora, we used only relation tagging for PPI. The BioInfer corpus is known for representing relationships among proteins, genes, and RNA (Pyysalo et al., <xref ref-type="bibr" rid="B35">2007</xref>). It contains 1,100 sentences from PubMed abstracts, and the sentences contain annotations concerning entity, entity relationship, and dependency. A total of 2,662 relationships appeared in 840 sentences and the remainder had no relations. GAD (Becker et al., <xref ref-type="bibr" rid="B5">2004</xref>) is a corpus that was semi-automatically annotated with three types of relations: drug&#x02013;disease, target&#x02013;disease, and gene&#x02013;disease relationships. The corpus had 5,329 sentences containing 2,800 true interactions and 2,529 false associations. The former consisted of 1,833 positive interactions and 967 negative ones. HPRD50 was created as a corpus for RelEx based on a subset of the Human Protein Reference Database (HPRD) (Fundel et al., <xref ref-type="bibr" rid="B16">2006</xref>). The corpus contained (1) direct physical interactions, (2) regulatory relations, and (3) modifications (e.g., phosphorylation), which were manually annotated by two domain experts. It contained 145 sentences with a list of 433 PPIs. IEPA, the Interaction Extraction Performance Assessment, is a corpus for PPIs consisting of 303 abstracts, with 486 sentences and 817 relations (Ding et al., <xref ref-type="bibr" rid="B14">2002</xref>).</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Five corpora for evaluation of relation extraction module.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">Relation type</th>
<th valign="top" align="left">Corpus</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Protein&#x02013;protein interaction</td>
<td align="left" valign="top">AIMed, BioInfer, HPRD50, IEPA</td>
</tr>
<tr>
<td align="left" valign="top">Gene&#x02013;disease association</td>
<td align="left" valign="top">GAD</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To evaluate RE performance, we compared PKDE4J-RE with five approaches: SVM (Song et al., <xref ref-type="bibr" rid="B43">2014</xref>), co-occurrence (Pyysalo et al., <xref ref-type="bibr" rid="B34">2008</xref>), RelEx (Pyysalo et al., <xref ref-type="bibr" rid="B35">2007</xref>), PPInterFinder (Raja et al., <xref ref-type="bibr" rid="B37">2013</xref>), and Bui et al.&#x02019;s (Bui et al., <xref ref-type="bibr" rid="B11">2011</xref>) algorithm. The results for these methods were taken from our previous study (Song et al., <xref ref-type="bibr" rid="B43">2014</xref>). The performance of each model was measured in terms of the <italic>F</italic>-score. SVM and co-occurrence are the two best-known approaches to RE. Other advanced algorithms could have been considered as well, such as CRFs, which recorded an <italic>F</italic>-score of 0.852 on 27,000 abstracts from ISI in a study by Tang et al. (<xref ref-type="bibr" rid="B45">2015</xref>), or deep learning, which achieved an <italic>F</italic>-score of 0.613 on the SemEval-2010 Task 8 dataset in a study by Nguyen and Grishman (<xref ref-type="bibr" rid="B33">2015</xref>). However, our study intended to compare commonly used techniques for RE. PKDE4J-RE yielded the best performance.</p>
<p>RelEx is an RE technique that uses dependency trees and simple rules applied to these trees. PPInterFinder is specifically designed to extract PPIs by identifying relation keywords using a parser with Tregex and a relation keyword dictionary for 11 specific patterns based on the syntactic nature of PPI pairs. Bui et al.&#x02019;s approach is tuned to PPI extraction based on dependency trees and SVM. Most of these approaches have already been evaluated using AIMed, BioInfer, HPRD50, and IEPA. As shown in Table <xref ref-type="table" rid="T6">6</xref>, in the experiments, PKDE4J-RE outperformed the other five RE techniques on AIMed, HPRD50, IEPA, and GAD, and tied with the SVM on BioInfer (0.83).</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Comparative assessment of relation extraction (RE) module.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">Model</th>
<th valign="top" align="center">AIMed</th>
<th valign="top" align="center">BioInfer</th>
<th valign="top" align="center">HPRD50</th>
<th valign="top" align="center">IEPA</th>
<th valign="top" align="center">GAD</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Co-occurrence (Pyysalo et al., <xref ref-type="bibr" rid="B35">2007</xref>)</td>
<td align="center" valign="top">0.29</td>
<td align="center" valign="top">0.23</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.58</td>
<td align="center" valign="top">0.38</td>
</tr>
<tr>
<td align="left" valign="top">RelEx (Pyysalo et al., <xref ref-type="bibr" rid="B35">2007</xref>)</td>
<td align="center" valign="top">0.44</td>
<td align="center" valign="top">0.41</td>
<td align="center" valign="top">0.69</td>
<td align="center" valign="top">0.67</td>
<td align="center" valign="top">N/A</td>
</tr>
<tr>
<td align="left" valign="top">Bui et al. (<xref ref-type="bibr" rid="B11">2011</xref>)</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">0.59</td>
<td align="center" valign="top">0.72</td>
<td align="center" valign="top">0.73</td>
<td align="center" valign="top">N/A</td>
</tr>
<tr>
<td align="left" valign="top">PPInterFinder (Raja et al., <xref ref-type="bibr" rid="B37">2013</xref>)</td>
<td align="center" valign="top">0.57</td>
<td align="center" valign="top">N/A</td>
<td align="center" valign="top">0.52</td>
<td align="center" valign="top">N/A</td>
<td align="center" valign="top">N/A</td>
</tr>
<tr>
<td align="left" valign="top">Support vector machine (Song et al., <xref ref-type="bibr" rid="B43">2014</xref>)</td>
<td align="center" valign="top">0.47</td>
<td align="center" valign="top">0.83</td>
<td align="center" valign="top">0.54</td>
<td align="center" valign="top">0.74</td>
<td align="center" valign="top">0.76</td>
</tr>
<tr>
<td align="left" valign="top">RE component of PKDE4J</td>
<td align="center" valign="top">0.74</td>
<td align="center" valign="top">0.83</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top">0.84</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Adapted from &#x0201C;Grounded feature selection for biomedical RE by the combinative approach,&#x0201D; by Song et al. (<xref ref-type="bibr" rid="B43">2014</xref>)</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec id="S3-3">
<title>Event Detection</title>
<p>Effective event detection requires in-depth analysis of sentence structure, and can benefit in particular from the use of semantic processing or deep parsing techniques that analyze both the syntactic and semantic structures of texts. Event detection is a daunting challenge, and is more complex and difficult than RE. We extended PKDE4J for event detection based on dependency trees and 17 rules applied to them. For example, in a dependency tree, if the dependency relation between a governor and a dependent contained a preposition property, such as &#x0201C;prep-at&#x0201D;&#x02009;&#x0002B;&#x02009;a noun phrase, the dependency relation was tagged as an event. Table <xref ref-type="table" rid="T7">7</xref> shows a sample sentence and the results of event detection based on the aforementioned rules.</p>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>An example of the results of semantic parsing for event detection (Zhou and He, <xref ref-type="bibr" rid="B55">2011</xref>).</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td align="left" valign="top">Sentence</td>
<td align="left" valign="top">We concluded that CTCF expression and activity were controlled at transcriptional and posttranscriptional levels</td>
</tr>
<tr>
<td align="left" valign="top">Parse results</td>
<td align="left" valign="top">SS&#x02009;&#x0002B;&#x02009;protein (CTCF)<break/>SS&#x02009;&#x0002B;&#x02009;protein&#x02009;&#x0002B;&#x02009;gene expression (expression)<break/>SS&#x02009;&#x0002B;&#x02009;protein&#x02009;&#x0002B;&#x02009;gene expression&#x02009;&#x0002B;&#x02009;regulation (controlled levels)</td>
</tr>
<tr>
<td align="left" valign="top">Events</td>
<td align="left" valign="top">E1 gene expression: expression; theme: CTCF<break/>E2 regulation: controlled levels; theme: E1<break/>E3 regulation: controlled levels; theme: CTCF</td>
</tr>
</tbody>
</table>
</table-wrap>
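<p>The preposition-based rule described above can be sketched in a few lines. The dependency tuples are written by hand for the sample sentence in Table <xref ref-type="table" rid="T7">7</xref> and are not actual parser output; the tuple layout and the function name are illustrative assumptions, and only this one rule of the 17 is shown.</p>
<preformat preformat-type="code">
```python
# Illustrative (governor, relation, dependent, dependent_is_noun_phrase) tuples,
# hand-written for the sample sentence in Table 7.
DEPENDENCIES = [
    ("controlled", "nsubjpass", "expression", True),
    ("controlled", "prep-at", "levels", True),
    ("expression", "nn", "CTCF", True),
]


def tag_events(dependencies):
    """Tag a dependency as an event when its relation is a preposition
    relation (for example 'prep-at') and the dependent is a noun phrase."""
    events = []
    for governor, relation, dependent, is_noun_phrase in dependencies:
        if relation.startswith("prep-") and is_noun_phrase:
            events.append((governor, relation, dependent))
    return events


print(tag_events(DEPENDENCIES))
```
</preformat>
<p>On this toy input, the rule selects the pair &#x0201C;controlled&#x0201D; and &#x0201C;levels,&#x0201D; which corresponds to the regulation trigger listed in Table <xref ref-type="table" rid="T7">7</xref>.</p>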
<sec id="S3-3-1">
<title>Experiments for Event Detection</title>
<p>To test the event detection performance of PKDE4J (PKDE4J-EVENT), we used the PASBIO corpus. It consists of 30 biomedicine-related predicates (Wattarujeekrit et al., <xref ref-type="bibr" rid="B48">2004</xref>), and was built using predicate&#x02013;argument structures from unstructured texts in the biomedical literature. The predicates available in PASBIO were from sentences mostly from Medline and the Embo, PNAS, NAR, and JV journals. We compared PKDE4J-EVENT with Wattarujeekrit et al.&#x02019;s approach to PASBIO. The results are shown in Table <xref ref-type="table" rid="T8">8</xref>. The lexicon-based model consisted of six features: surface word, lemma form, head word of noun phrase, parts of speech, orthographic features, and phrase chunks. The PAS-based model consisted of features from the lexicon-based model as well as the predicate surface form, predicate lemma, voice, and surface syntactic role to represent the semantic roles of the arguments. The path model included features related to a syntactic path from the subject argument to the related predicate, and from the related predicate to the object argument. The head pair model included features of the PAS-based model and those representing a pair of subject and object heads. The trans/intrans model contained features of the PAS-based model and supplementary features indicating whether a predicate had been used in the transitive or intransitive sense. As shown in Table <xref ref-type="table" rid="T8">8</xref>, PKDE4J-EVENT outperformed the other five models on the predicate &#x0201C;regulate&#x0201D; but achieved the second-best performance on &#x0201C;associate,&#x0201D; where the lexicon-based model achieved the best performance. It is notable that, among the five comparison models, the one with the simplest feature set (the lexicon-based model) achieved the best performance on both predicates. As PKDE4J is primarily intended for entity extraction and RE, the performance of PKDE4J-EVENT was acceptable.</p>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption><p>Performance results of six models for cases involving two predicates.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left" rowspan="2">Model</th>
<th valign="top" align="center" colspan="2">Agent or theme<hr/></th>
</tr><tr>
<th valign="top" align="center">Regulate (525)</th>
<th valign="top" align="center">Associate (377)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Lexicon based</td>
<td align="center" valign="top">61.87</td>
<td align="center" valign="top">52.09</td>
</tr>
<tr>
<td align="left" valign="top">PAS based</td>
<td align="center" valign="top">60.48</td>
<td align="center" valign="top">51.48</td>
</tr>
<tr>
<td align="left" valign="top">Path</td>
<td align="center" valign="top">60.13</td>
<td align="center" valign="top">51.29</td>
</tr>
<tr>
<td align="left" valign="top">Head pair</td>
<td align="center" valign="top">60.72</td>
<td align="center" valign="top">50.43</td>
</tr>
<tr>
<td align="left" valign="top">Trans/Intrans</td>
<td align="center" valign="top">60.01</td>
<td align="center" valign="top">51.40</td>
</tr>
<tr>
<td align="left" valign="top">PKDE4J-EVENT</td>
<td align="center" valign="top">63.32</td>
<td align="center" valign="top">52.05</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Adapted from &#x0201C;PASBio: predicate-argument structures for event extraction in molecular biology.&#x0201D; by Wattarujeekrit et al. (<xref ref-type="bibr" rid="B48">2004</xref>)</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="S4">
<title>Applications</title>
<p>The results of the evaluation show that PKDE4J is a useful and effective text-mining tool for NER, RE, and event extraction. As the results of these extraction tasks were sources for biomedical scientific knowledge representation, the results can be applied to knowledge discovery tasks such as literature-based discovery, hypothesis generation, and semantic annotation. In this section, we describe ongoing efforts to extend PKDE4J to knowledge discovery.</p>
<sec id="S4-1">
<title>Knowledge Search</title>
<p>PKDE4J can be applied to knowledge search. Figures <xref ref-type="fig" rid="F10">10</xref> and <xref ref-type="fig" rid="F11">11</xref> show screenshots of an application that is publicly available at <uri xlink:href="http://informatics.yonsei.ac.kr:8080/ner-re">http://informatics.yonsei.ac.kr:8080/ner-re</uri>. PKDE4J can be embedded into any search engine to render the search results more meaningful for users. For example, when users enter queries, the system searches and returns the matching PubMed records with the NER results. On the results page, the extracted entities and relation types recognized by PKDE4J are highlighted, and the list of their relation types is also provided.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Example of the results page of the search engine: an annotated abstract.</p></caption>
<graphic xlink:href="frma-03-00007-g010.tif"/>
</fig>
<p>The results page consists of an annotated abstract and an annotated relation list. The relation list contains the types of relations between entities. The annotated abstract shows the entities and relations extracted from the abstract, highlighted in different colors by entity type, as shown in Figure <xref ref-type="fig" rid="F10">10</xref>. Moreover, the system provides a list of the extracted relations of each entity and their relation types as well as a pie chart showing the ratio of the extracted types of entity relations (Figure <xref ref-type="fig" rid="F11">11</xref>).</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Example of the results page of the search engine: an annotated relation list and a pie chart showing the ratio of annotated relation types.</p></caption>
<graphic xlink:href="frma-03-00007-g011.tif"/>
</fig>
</sec>
<sec id="S4-2">
<title>Knowledge Linking</title>
<p>Another application of PKDE4J is knowledge linking. Biomedical data are available from various types of data sources, including research articles, clinical data, and health-care-related social media. Thus, extracting entities and relations from these heterogeneous data sources requires that they be connected to one another for knowledge discovery. These entities and their relations can be organized and linked in the form of a multi-layer network as shown in Figure <xref ref-type="fig" rid="F12">12</xref>. With the constructed network, the links among entities within the same layer as well as with those from different layers can be analyzed. If the network is used for new hypothesis generation, it can provide more sophisticated and integrated hypotheses. Moreover, the multi-layer graph can be efficiently managed using a graph database such as Neo4J (Webber, <xref ref-type="bibr" rid="B49">2012</xref>).</p>
<fig id="F12" position="float">
<label>Figure 12</label>
<caption><p>Conceptual architecture of multi-layer analysis of knowledge network.</p></caption>
<graphic xlink:href="frma-03-00007-g012.tif"/>
</fig>
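<p>A multi-layer network of this kind can be sketched with plain Python data structures before committing to a graph database. In the fragment below, the node names, layer names, and relation labels are invented placeholders; the point is only to show how links within a layer can be separated from links across layers.</p>
<preformat preformat-type="code">
```python
# Placeholder nodes, each assigned to the data-source layer it came from.
NODE_LAYER = {
    "gene_X": "literature",
    "disease_Y": "literature",
    "drug_Z": "clinical",
    "symptom_W": "social_media",
}

# Placeholder (source, target, relation) links.
EDGES = [
    ("gene_X", "disease_Y", "associated_with"),
    ("drug_Z", "disease_Y", "treats"),
    ("symptom_W", "disease_Y", "mentioned_with"),
]


def split_links(edges, node_layer):
    """Separate links inside one layer from links that cross layers."""
    intra, inter = [], []
    for source, target, relation in edges:
        if node_layer[source] == node_layer[target]:
            intra.append((source, target, relation))
        else:
            inter.append((source, target, relation))
    return intra, inter


intra_links, inter_links = split_links(EDGES, NODE_LAYER)
print(len(intra_links), len(inter_links))
```
</preformat>
<p>In a graph database, the layer would typically become a node label or property, so the same distinction could be expressed as a query filter.</p>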
</sec>
<sec id="S4-3">
<title>Knowledge Inference</title>
<p>After searching and linking entities and their relations, we can infer knowledge as the next step of knowledge representation. The application of PKDE4J to knowledge inference can help us discover new relations or patterns based on the constructed knowledge networks. With the constructed network, PKDE4J can be applied to generate new plausible hypotheses for knowledge inference, as proposed by Baek et al. (<xref ref-type="bibr" rid="B3">2017</xref>) for literature-based discovery. This application is accessible at <uri xlink:href="http://informatics.yonsei.ac.kr:8080/hypothsis_generator/index.html">http://informatics.yonsei.ac.kr:8080/hypothsis_generator/index.html</uri>.</p>
<p>Users can search by using multiple query terms (e.g., &#x0201C;vacuolation&#x0201D;), and the system returns the matching PubMed results, including the PubMed ID, abstract, and PubMed link to the article as shown in Figure <xref ref-type="fig" rid="F13">13</xref>. Moreover, the search terms that users enter into the system are highlighted in the results. After browsing the results list, users can select the PubMed records to be included to generate new hypotheses.</p>
<fig id="F13" position="float">
<label>Figure 13</label>
<caption><p>Search result page of the hypothesis generator system.</p></caption>
<graphic xlink:href="frma-03-00007-g013.tif"/>
</fig>
<p>After choosing the number of abstracts, users click the &#x0201C;generate paths&#x0201D; button at the top right-hand corner of the results page, and the results are displayed in the path analysis page, as illustrated in Figure <xref ref-type="fig" rid="F14">14</xref>. In the path results page, the entities extracted from the selected articles are listed on the left and the search bar is on the right. Using the list of entities, users can generate paths by selecting two entities of interest. Following Swanson&#x02019;s ABC model, users enter two entities (the A and C terms), and the system generates plausible hypotheses as paths containing zero, one, or multiple intermediate B-terms between A and C. The generated paths are ranked by a semantic relatedness score: if connected paths exist between the selected entities, they are displayed in order of that score. For instance, the resulting path &#x0201C;Vacuolation-(CAUSES)-&#x02009;&#x0003E;&#x02009;Amphotericin B&#x0201D; can be interpreted to mean that &#x0201C;vacuolation&#x0201D; and &#x0201C;amphotericin B&#x0201D; are linked <italic>via</italic> a path that implies a causal relation.</p>
<fig id="F14" position="float">
<label>Figure 14</label>
<caption><p>Results page for the generated paths.</p></caption>
<graphic xlink:href="frma-03-00007-g014.tif"/>
</fig>
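<p>The path generation step described above can be sketched in Python as a bounded path search between an A term and a C term, followed by ranking. This is a minimal sketch under stated assumptions, not the algorithm used by the hypothesis generator: the toy graph, its edge weights, the limit on intermediate B-terms, and the use of the mean edge weight as a stand-in for the semantic relatedness score are all hypothetical.</p>
<preformat>
```python
# Hypothetical directed graph: graph[source][target] is an illustrative
# relatedness weight for the relation source -> target.
graph = {
    "A": {"C": 0.4, "B1": 0.9, "B2": 0.5},
    "B1": {"C": 0.8},
    "B2": {"B1": 0.6},
}


def find_paths(graph, start, goal, max_intermediates=2):
    """Enumerate simple paths from start to goal with a bounded number of B-terms."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for neighbor in graph.get(node, {}):
            if neighbor in path:
                continue  # keep paths simple (no repeated entities)
            new_path = path + [neighbor]
            if neighbor == goal:
                paths.append(new_path)
            elif max_intermediates >= len(new_path) - 1:
                # neighbor is an intermediate term; extend only within the bound
                stack.append((neighbor, new_path))
    return paths


def path_score(graph, path):
    """Assumed score: mean weight of the edges along the path."""
    weights = [graph[a][b] for a, b in zip(path, path[1:])]
    return sum(weights) / len(weights)


def rank_paths(graph, start, goal, max_intermediates=2):
    """Return candidate paths ordered from highest to lowest score."""
    paths = find_paths(graph, start, goal, max_intermediates)
    return sorted(paths, key=lambda p: path_score(graph, p), reverse=True)


ranked = rank_paths(graph, "A", "C")
direct_only = rank_paths(graph, "A", "C", max_intermediates=0)
```
</preformat>
<p>Setting <monospace>max_intermediates</monospace> to zero keeps only the direct A to C link, which corresponds to the case with no intermediate term; larger values admit longer chains at the cost of more candidate paths to rank.</p>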
<p>As demonstrated by the above three applications for representing scientific knowledge, PKDE4J serves as a basis for effective and automatic knowledge discovery. Nor is it limited to knowledge extraction: it can be adapted to other types of knowledge representation, such as ontology augmentation.</p>
</sec>
</sec>
<sec id="S5">
<title>Conclusion</title>
<p>Compared with the original PKDE4J, the upgraded version of PKDE4J was shown in this study to be a flexible system for knowledge representation. For named entity extraction, the following three options were added to it: (1) a dictionary with the UMLS, (2) machine learning, and (3) machine learning with the UMLS. For RE, verb expansion was added for more accurate detection of relations. For more precise extraction, the event trigger extraction module was attached to PKDE4J as part of the RE process based on the contextual information of sentences. The improved PKDE4J was verified to be effective compared with the original version as well as commonly used extraction techniques.</p>
<p>We also proposed applications of PKDE4J for knowledge representation. First, it enables knowledge search. Through a Web search system built on PKDE4J, users can search for the extracted entities and their relations. Second, PKDE4J can be used to connect the extracted entities <italic>via</italic> a multi-layered network. Linking previously separate pieces of knowledge can suggest new and plausible knowledge paths. Third, PKDE4J can be applied to knowledge inference. With the constructed knowledge network, PKDE4J can generate promising candidate hypotheses. Although we described only three applications of PKDE4J, other interesting and meaningful applications in biology as well as other domains could be developed.</p>
</sec>
<sec id="S6" sec-type="author-contributor">
<title>Author Contributions</title>
<p>MS (first and corresponding author) made substantial contributions to conception and design, and evaluation and application. He also gave final approval to the version to be submitted as well as revised versions. MK, KK, YK, and SJ participated in drafting the article and revising it.</p>
</sec>
<sec id="S7">
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This work was supported by the Bio-Synergy Research Project (NRF-2013M3A9C4078138) of the Ministry of Science, ICT, and Future Planning through the National Research Foundation.</p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Aronson</surname> <given-names>A. R.</given-names></name></person-group> (<year>2001</year>). <article-title>&#x0201C;Effective mapping of biomedical text to the UMLS metathesaurus: the metamap program,&#x0201D;</article-title> in <conf-name>Proceedings of the AMIA Symposium</conf-name> (<conf-loc>Washington, DC</conf-loc>: <conf-sponsor>American Medical Informatics Association</conf-sponsor>), <fpage>17</fpage>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ashburner</surname> <given-names>M.</given-names></name> <name><surname>Ball</surname> <given-names>C. A.</given-names></name> <name><surname>Blake</surname> <given-names>J. A.</given-names></name> <name><surname>Botstein</surname> <given-names>D.</given-names></name> <name><surname>Butler</surname> <given-names>H.</given-names></name> <name><surname>Cherry</surname> <given-names>J. M.</given-names></name> <etal/></person-group> (<year>2000</year>). <article-title>Gene ontology: tool for the unification of biology</article-title>. <source>Nat. Genet.</source> <volume>25</volume>, <fpage>25</fpage>.<pub-id pub-id-type="doi">10.1038/75556</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baek</surname> <given-names>S. H.</given-names></name> <name><surname>Lee</surname> <given-names>D.</given-names></name> <name><surname>Kim</surname> <given-names>M.</given-names></name> <name><surname>Lee</surname> <given-names>J. H.</given-names></name> <name><surname>Song</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Enriching plausible new hypothesis generation in PubMed</article-title>. <source>PLoS ONE</source> <volume>12</volume>:<fpage>e0180539</fpage>.<pub-id pub-id-type="doi">10.1371/journal.pone.0180539</pub-id><pub-id pub-id-type="pmid">28678852</pub-id></citation></ref>
<ref id="B4"><citation citation-type="web"><person-group person-group-type="author"><name><surname>Baldwin</surname> <given-names>B.</given-names></name> <name><surname>Carpenter</surname> <given-names>B.</given-names></name></person-group> (<year>2003</year>). <source>LingPipe</source>. Available from World Wide Web: <uri xlink:href="http://alias-i.com/lingpipe/">http://alias-i.com/lingpipe/</uri></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Becker</surname> <given-names>K. G.</given-names></name> <name><surname>Barnes</surname> <given-names>K. C.</given-names></name> <name><surname>Bright</surname> <given-names>T. J.</given-names></name> <name><surname>Wang</surname> <given-names>S. A.</given-names></name></person-group> (<year>2004</year>). <article-title>The genetic association database</article-title>. <source>Nat. Genet.</source> <volume>36</volume>, <fpage>431</fpage>&#x02013;<lpage>432</lpage>.<pub-id pub-id-type="doi">10.1038/ng0504-431</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bell</surname> <given-names>L.</given-names></name> <name><surname>Chowdhary</surname> <given-names>R.</given-names></name> <name><surname>Liu</surname> <given-names>J. S.</given-names></name> <name><surname>Niu</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Integrated bio-entity network: a system for biological knowledge discovery</article-title>. <source>PLoS ONE</source> <volume>6</volume>:<fpage>e21474</fpage>.<pub-id pub-id-type="doi">10.1371/journal.pone.0021474</pub-id><pub-id pub-id-type="pmid">21738677</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Belleau</surname> <given-names>F.</given-names></name> <name><surname>Nolin</surname> <given-names>M. A.</given-names></name> <name><surname>Tourigny</surname> <given-names>N.</given-names></name> <name><surname>Rigault</surname> <given-names>P.</given-names></name> <name><surname>Morissette</surname> <given-names>J.</given-names></name></person-group> (<year>2008</year>). <article-title>Bio2RDF: towards a mashup to build bioinformatics knowledge systems</article-title>. <source>J. Biomed. Inform.</source> <volume>41</volume>, <fpage>706</fpage>&#x02013;<lpage>716</lpage>.<pub-id pub-id-type="doi">10.1016/j.jbi.2008.03.004</pub-id><pub-id pub-id-type="pmid">18472304</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bj&#x000F6;rne</surname> <given-names>J.</given-names></name> <name><surname>Ginter</surname> <given-names>F.</given-names></name> <name><surname>Pyysalo</surname> <given-names>S.</given-names></name> <name><surname>Tsujii</surname> <given-names>J. I.</given-names></name> <name><surname>Salakoski</surname> <given-names>T.</given-names></name></person-group> (<year>2010</year>). <article-title>Complex event extraction at PubMed scale</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>i382</fpage>&#x02013;<lpage>i390</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/btq180</pub-id><pub-id pub-id-type="pmid">20529932</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bodenreider</surname> <given-names>O.</given-names></name></person-group> (<year>2004</year>). <article-title>The unified medical language system (UMLS): integrating biomedical terminology</article-title>. <source>Nucleic Acids Res.</source> <volume>32</volume>(<issue>Suppl._1</issue>), <fpage>D267</fpage>&#x02013;<lpage>D270</lpage>.<pub-id pub-id-type="doi">10.1093/nar/gkh061</pub-id><pub-id pub-id-type="pmid">14681409</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bravo</surname> <given-names>&#x000C0;</given-names></name> <name><surname>Pi&#x000F1;ero</surname> <given-names>J.</given-names></name> <name><surname>Queralt-Rosinach</surname> <given-names>N.</given-names></name> <name><surname>Rautschka</surname> <given-names>M.</given-names></name> <name><surname>Furlong</surname> <given-names>L. I.</given-names></name></person-group> (<year>2015</year>). <article-title>Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research</article-title>. <source>BMC Bioinformatics</source> <volume>16</volume>:<fpage>55</fpage>.<pub-id pub-id-type="doi">10.1186/s12859-015-0472-9</pub-id><pub-id pub-id-type="pmid">25886734</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bui</surname> <given-names>Q. C.</given-names></name> <name><surname>Katrenko</surname> <given-names>S.</given-names></name> <name><surname>Sloot</surname> <given-names>P. M.</given-names></name></person-group> (<year>2011</year>). <article-title>A hybrid approach to extract protein&#x02013;protein interactions</article-title>. <source>Bioinformatics</source> <volume>27</volume>, <fpage>259</fpage>&#x02013;<lpage>265</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/btq620</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bunescu</surname> <given-names>R.</given-names></name> <name><surname>Ge</surname> <given-names>R.</given-names></name> <name><surname>Kate</surname> <given-names>R. J.</given-names></name> <name><surname>Marcotte</surname> <given-names>E. M.</given-names></name> <name><surname>Mooney</surname> <given-names>R. J.</given-names></name> <name><surname>Ramani</surname> <given-names>A. K.</given-names></name> <etal/></person-group> (<year>2005</year>). <article-title>Comparative experiments on learning information extractors for proteins and their interactions</article-title>. <source>Artif. Intell. Med.</source> <volume>33</volume>, <fpage>139</fpage>&#x02013;<lpage>155</lpage>.<pub-id pub-id-type="doi">10.1016/j.artmed.2004.07.016</pub-id><pub-id pub-id-type="pmid">15811782</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caporaso</surname> <given-names>J. G.</given-names></name> <name><surname>Baumgartner</surname> <given-names>W. A.</given-names> <suffix>Jr.</suffix></name> <name><surname>Randolph</surname> <given-names>D. A.</given-names></name> <name><surname>Cohen</surname> <given-names>K. B.</given-names></name> <name><surname>Hunter</surname> <given-names>L.</given-names></name></person-group> (<year>2007</year>). <article-title>MutationFinder: a high-performance system for extracting point mutation mentions from text</article-title>. <source>Bioinformatics</source> <volume>23</volume>, <fpage>1862</fpage>&#x02013;<lpage>1865</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/btm235</pub-id><pub-id pub-id-type="pmid">17495998</pub-id></citation></ref>
<ref id="B14"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Ding</surname> <given-names>J.</given-names></name> <name><surname>Berleant</surname> <given-names>D.</given-names></name> <name><surname>Nettleton</surname> <given-names>D.</given-names></name> <name><surname>Wurtele</surname> <given-names>E.</given-names></name></person-group> (<year>2002</year>). <article-title>&#x0201C;Mining MEDLINE: abstracts, sentences, or phrases?,&#x0201D;</article-title> in <conf-name>Pacific Symposium on Biocomputing</conf-name> Vol. <volume>7</volume>, (<conf-loc>Kauai, HI</conf-loc>), <fpage>326</fpage>&#x02013;<lpage>337</lpage>.</citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doms</surname> <given-names>A.</given-names></name> <name><surname>Schroeder</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>GoPubMed: exploring PubMed with the gene ontology</article-title>. <source>Nucleic Acids Res.</source> <volume>33</volume>(<issue>Suppl._2</issue>), <fpage>W783</fpage>&#x02013;<lpage>W786</lpage>.<pub-id pub-id-type="doi">10.1093/nar/gki470</pub-id><pub-id pub-id-type="pmid">15980585</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fundel</surname> <given-names>K.</given-names></name> <name><surname>K&#x000FC;ffner</surname> <given-names>R.</given-names></name> <name><surname>Zimmer</surname> <given-names>R.</given-names></name></person-group> (<year>2006</year>). <article-title>RelEx&#x02014;relation extraction using dependency parse trees</article-title>. <source>Bioinformatics</source> <volume>23</volume>, <fpage>365</fpage>&#x02013;<lpage>371</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/btl616</pub-id></citation></ref>
<ref id="B17"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Goikoetxea</surname> <given-names>J.</given-names></name> <name><surname>Agirre</surname> <given-names>E.</given-names></name> <name><surname>Soroa</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Single or multiple? Combining word representations independently learned from text and WordNet,&#x0201D;</article-title> in <conf-name>Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence</conf-name>, (<conf-loc>Phoenix, AZ</conf-loc>: <conf-sponsor>AAAI Press</conf-sponsor>), <fpage>2608</fpage>&#x02013;<lpage>2614</lpage>.</citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanisch</surname> <given-names>D.</given-names></name> <name><surname>Fundel</surname> <given-names>K.</given-names></name> <name><surname>Mevissen</surname> <given-names>H. T.</given-names></name> <name><surname>Zimmer</surname> <given-names>R.</given-names></name> <name><surname>Fluck</surname> <given-names>J.</given-names></name></person-group> (<year>2005</year>). <article-title>ProMiner: rule-based protein and gene entity recognition</article-title>. <source>BMC Bioinformatics</source> <volume>6</volume>:<fpage>S14</fpage>.<pub-id pub-id-type="doi">10.1186/1471-2105-6-14</pub-id><pub-id pub-id-type="pmid">15960826</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holzinger</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Interactive machine learning for health informatics: when do we need the human-in-the-loop?</article-title> <source>Brain Inform.</source> <volume>3</volume>, <fpage>119</fpage>&#x02013;<lpage>131</lpage>.<pub-id pub-id-type="doi">10.1007/s40708-016-0042-6</pub-id><pub-id pub-id-type="pmid">27747607</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jelier</surname> <given-names>R.</given-names></name> <name><surname>Jenster</surname> <given-names>G.</given-names></name> <name><surname>Dorssers</surname> <given-names>L. C.</given-names></name> <name><surname>van der Eijk</surname> <given-names>C. C.</given-names></name> <name><surname>van Mulligen</surname> <given-names>E. M.</given-names></name> <name><surname>Mons</surname> <given-names>B.</given-names></name> <etal/></person-group> (<year>2005</year>). <article-title>Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes</article-title>. <source>Bioinformatics</source> <volume>21</volume>, <fpage>2049</fpage>&#x02013;<lpage>2058</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/bti268</pub-id><pub-id pub-id-type="pmid">15657104</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jimeno</surname> <given-names>A.</given-names></name> <name><surname>Jimenez-Ruiz</surname> <given-names>E.</given-names></name> <name><surname>Lee</surname> <given-names>V.</given-names></name> <name><surname>Gaudan</surname> <given-names>S.</given-names></name> <name><surname>Berlanga</surname> <given-names>R.</given-names></name> <name><surname>Rebholz-Schuhmann</surname> <given-names>D.</given-names></name></person-group> (<year>2008</year>). <article-title>Assessment of disease named entity recognition on a corpus of annotated sentences</article-title>. <source>BMC Bioinformatics</source> <volume>9</volume>(<issue>Suppl. 3</issue>):<fpage>S3</fpage>.<pub-id pub-id-type="doi">10.1186/1471-2105-9-S3-S3</pub-id><pub-id pub-id-type="pmid">18426548</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kanehisa</surname> <given-names>M.</given-names></name> <name><surname>Goto</surname> <given-names>S.</given-names></name></person-group> (<year>2000</year>). <article-title>KEGG: Kyoto encyclopedia of genes and genomes</article-title>. <source>Nucleic Acids Res.</source> <volume>28</volume>, <fpage>27</fpage>&#x02013;<lpage>30</lpage>.<pub-id pub-id-type="doi">10.1093/nar/28.1.27</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>J. D.</given-names></name> <name><surname>Ohta</surname> <given-names>T.</given-names></name> <name><surname>Tateisi</surname> <given-names>Y.</given-names></name> <name><surname>Tsujii</surname> <given-names>J. I.</given-names></name></person-group> (<year>2003</year>). <article-title>GENIA corpus&#x02014;a semantically annotated corpus for bio-textmining</article-title>. <source>Bioinformatics</source> <volume>19</volume>(<issue>Suppl._1</issue>), <fpage>i180</fpage>&#x02013;<lpage>i182</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/btg1023</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>J. D.</given-names></name> <name><surname>Nguyen</surname> <given-names>N.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Tsujii</surname> <given-names>J. I.</given-names></name> <name><surname>Takagi</surname> <given-names>T.</given-names></name> <name><surname>Yonezawa</surname> <given-names>A.</given-names></name></person-group> (<year>2012</year>). <article-title>The Genia event and protein coreference tasks of the BioNLP shared task 2011</article-title>. <source>BMC Bioinformatics</source> <volume>13</volume>(<issue>Suppl. 11</issue>):<fpage>s1</fpage>.<pub-id pub-id-type="doi">10.1186/1471-2105-13-S11-S1</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klinger</surname> <given-names>R.</given-names></name> <name><surname>Kol&#x000E1;&#x00159;ik</surname> <given-names>C.</given-names></name> <name><surname>Fluck</surname> <given-names>J.</given-names></name> <name><surname>Hofmann-Apitius</surname> <given-names>M.</given-names></name> <name><surname>Friedrich</surname> <given-names>C. M.</given-names></name></person-group> (<year>2008</year>). <article-title>Detection of IUPAC and IUPAC-like chemical names</article-title>. <source>Bioinformatics</source> <volume>24</volume>, <fpage>i268</fpage>&#x02013;<lpage>i276</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/btn181</pub-id><pub-id pub-id-type="pmid">18586724</pub-id></citation></ref>
<ref id="B25"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kol&#x000E1;rik</surname> <given-names>C.</given-names></name> <name><surname>Klinger</surname> <given-names>R.</given-names></name> <name><surname>Friedrich</surname> <given-names>C.</given-names></name> <name><surname>Hofmann-Apitius</surname> <given-names>M.</given-names></name> <name><surname>Fluck</surname> <given-names>J.</given-names></name></person-group> (<year>2008</year>). <article-title>&#x0201C;Chemical names: terminological resources and corpora annotation,&#x0201D;</article-title> in <conf-name>Workshop on Building and Evaluating Resources for Biomedical Text Mining (6th Edition of the Language Resources and Evaluation Conference)</conf-name> (<conf-loc>Marrakech, Morocco</conf-loc>), <fpage>51</fpage>&#x02013;<lpage>58</lpage>.</citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Knox</surname> <given-names>C.</given-names></name> <name><surname>Law</surname> <given-names>V.</given-names></name> <name><surname>Jewison</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>P.</given-names></name> <name><surname>Ly</surname> <given-names>S.</given-names></name> <name><surname>Frolkis</surname> <given-names>A.</given-names></name> <etal/></person-group> (<year>2011</year>). <article-title>DrugBank 3.0: a comprehensive resource for &#x02018;OMICS&#x02019; research on drugs</article-title>. <source>Nucleic Acids Res.</source> <volume>39</volume>(<issue>Suppl. 1</issue>), <fpage>D1035</fpage>&#x02013;<lpage>D1041</lpage>.<pub-id pub-id-type="doi">10.1093/nar/gkq1126</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leaman</surname> <given-names>R.</given-names></name> <name><surname>Lu</surname> <given-names>Z.</given-names></name></person-group> (<year>2016</year>). <article-title>TaggerOne: joint named entity recognition and normalization with semi-Markov models</article-title>. <source>Bioinformatics</source> <volume>32</volume>, <fpage>2839</fpage>&#x02013;<lpage>2846</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/btw343</pub-id><pub-id pub-id-type="pmid">27283952</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Ross</surname> <given-names>K. E.</given-names></name> <name><surname>Arighi</surname> <given-names>C. N.</given-names></name> <name><surname>Peng</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>C. H.</given-names></name> <name><surname>Vijay-Shanker</surname> <given-names>K.</given-names></name></person-group> (<year>2015</year>). <article-title>miRTex: a text mining system for miRNA-gene relation extraction</article-title>. <source>PLoS Comput. Biol.</source> <volume>11</volume>:<fpage>e1004391</fpage>.<pub-id pub-id-type="doi">10.1371/journal.pcbi.1004391</pub-id><pub-id pub-id-type="pmid">26407127</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Yu</surname> <given-names>X.</given-names></name> <name><surname>Zack</surname> <given-names>D. J.</given-names></name> <name><surname>Zhu</surname> <given-names>H.</given-names></name> <name><surname>Qian</surname> <given-names>J.</given-names></name></person-group> (<year>2008</year>). <article-title>TiGER: a database for tissue-specific gene expression and regulation</article-title>. <source>BMC Bioinformatics</source> <volume>9</volume>:<fpage>271</fpage>.</citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maglott</surname> <given-names>D.</given-names></name> <name><surname>Ostell</surname> <given-names>J.</given-names></name> <name><surname>Pruitt</surname> <given-names>K. D.</given-names></name> <name><surname>Tatusova</surname> <given-names>T.</given-names></name></person-group> (<year>2011</year>). <article-title>Entrez gene: gene-centered information at NCBI</article-title>. <source>Nucleic Acids Res.</source> <volume>39</volume>(<issue>Suppl. 1</issue>), <fpage>D52</fpage>&#x02013;<lpage>D57</lpage>.<pub-id pub-id-type="doi">10.1093/nar/gkq1237</pub-id></citation></ref>
<ref id="B28"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Manning</surname> <given-names>C. D.</given-names></name> <name><surname>Surdeanu</surname> <given-names>M.</given-names></name> <name><surname>Bauer</surname> <given-names>J.</given-names></name> <name><surname>Finkel</surname> <given-names>J. R.</given-names></name> <name><surname>Bethard</surname> <given-names>S.</given-names></name> <name><surname>McClosky</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x0201C;The Stanford coreNLP natural language processing toolkit,&#x0201D;</article-title> in <conf-name>Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</conf-name> (<conf-loc>Baltimore, Maryland</conf-loc>: <conf-sponsor>Association for Computational Linguistics</conf-sponsor>), <fpage>55</fpage>&#x02013;<lpage>60</lpage>.</citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname> <given-names>G. A.</given-names></name></person-group> (<year>1995</year>). <article-title>WordNet: a lexical database for English</article-title>. <source>Commun. ACM</source> <volume>38</volume>, <fpage>39</fpage>&#x02013;<lpage>41</lpage>.<pub-id pub-id-type="doi">10.1145/219717.219748</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miwa</surname> <given-names>M.</given-names></name> <name><surname>Thompson</surname> <given-names>P.</given-names></name> <name><surname>Ananiadou</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). <article-title>Boosting automatic event extraction from the literature using domain adaptation and coreference resolution</article-title>. <source>Bioinformatics</source> <volume>28</volume>, <fpage>1759</fpage>&#x02013;<lpage>1765</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/bts237</pub-id><pub-id pub-id-type="pmid">22539668</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Munkhdalai</surname> <given-names>T.</given-names></name> <name><surname>Li</surname> <given-names>M.</given-names></name> <name><surname>Batsuren</surname> <given-names>K.</given-names></name> <name><surname>Park</surname> <given-names>H. A.</given-names></name> <name><surname>Choi</surname> <given-names>N. H.</given-names></name> <name><surname>Ryu</surname> <given-names>K. H.</given-names></name></person-group> (<year>2015</year>). <article-title>Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations</article-title>. <source>J. Cheminform.</source> <volume>7</volume>, <fpage>S9</fpage>.<pub-id pub-id-type="doi">10.1186/1758-2946-7-S1-S9</pub-id><pub-id pub-id-type="pmid">25810780</pub-id></citation></ref>
<ref id="B32"><citation citation-type="book"><collab>National Research Council</collab>. (<year>2011</year>). <source>Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>National Academies Press</publisher-name>.</citation></ref>
<ref id="B33"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Nguyen</surname> <given-names>T. H.</given-names></name> <name><surname>Grishman</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Relation extraction: perspective from convolutional neural networks,&#x0201D;</article-title> in <conf-name>Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing</conf-name> (<conf-loc>Denver, Colorado</conf-loc>: <conf-sponsor>Association for Computational Linguistics</conf-sponsor>), <fpage>39</fpage>&#x02013;<lpage>48</lpage>.</citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pyysalo</surname> <given-names>S.</given-names></name> <name><surname>Airola</surname> <given-names>A.</given-names></name> <name><surname>Heimonen</surname> <given-names>J.</given-names></name> <name><surname>Bj&#x000F6;rne</surname> <given-names>J.</given-names></name> <name><surname>Ginter</surname> <given-names>F.</given-names></name> <name><surname>Salakoski</surname> <given-names>T.</given-names></name></person-group> (<year>2008</year>). <article-title>Comparative analysis of five protein-protein interaction corpora</article-title>. <source>BMC Bioinformatics</source> <volume>9</volume>:<fpage>S6</fpage>.<pub-id pub-id-type="doi">10.1186/1471-2105-9-S3-S6</pub-id><pub-id pub-id-type="pmid">18426551</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pyysalo</surname> <given-names>S.</given-names></name> <name><surname>Ginter</surname> <given-names>F.</given-names></name> <name><surname>Heimonen</surname> <given-names>J.</given-names></name> <name><surname>Bj&#x000F6;rne</surname> <given-names>J.</given-names></name> <name><surname>Boberg</surname> <given-names>J.</given-names></name> <name><surname>J&#x000E4;rvinen</surname> <given-names>J.</given-names></name> <etal/></person-group> (<year>2007</year>). <article-title>BioInfer: a corpus for information extraction in the biomedical domain</article-title>. <source>BMC Bioinformatics</source> <volume>8</volume>:<fpage>50</fpage>.<pub-id pub-id-type="doi">10.1186/1471-2105-8-50</pub-id><pub-id pub-id-type="pmid">17291334</pub-id></citation></ref>
<ref id="B36"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Rahul</surname> <given-names>P. V.</given-names></name> <name><surname>Sahu</surname> <given-names>S. K.</given-names></name> <name><surname>Anand</surname> <given-names>A.</given-names></name></person-group> (<year>2017</year>). <source>Biomedical Event Trigger Identification Using Bidirectional Recurrent Neural Network Based Models</source>. <publisher-loc>Vancouver, Canada</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>, <fpage>316</fpage>&#x02013;<lpage>321</lpage>.</citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Raja</surname> <given-names>K.</given-names></name> <name><surname>Subramani</surname> <given-names>S.</given-names></name> <name><surname>Natarajan</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>PPInterFinder&#x02014;a mining tool for extracting causal relations on human proteins from literature</article-title>. <source>Database (Oxford)</source> <volume>2013</volume>, <fpage>bas052</fpage>.<pub-id pub-id-type="doi">10.1093/database/bas052</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rindflesch</surname> <given-names>T. C.</given-names></name> <name><surname>Fiszman</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text</article-title>. <source>J. Biomed. Inform.</source> <volume>36</volume>, <fpage>462</fpage>&#x02013;<lpage>477</lpage>.<pub-id pub-id-type="doi">10.1016/j.jbi.2003.11.003</pub-id><pub-id pub-id-type="pmid">14759819</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rindflesch</surname> <given-names>T. C.</given-names></name> <name><surname>Kilicoglu</surname> <given-names>H.</given-names></name> <name><surname>Fiszman</surname> <given-names>M.</given-names></name> <name><surname>Rosemblat</surname> <given-names>G.</given-names></name> <name><surname>Shin</surname> <given-names>D.</given-names></name></person-group> (<year>2011</year>). <article-title>Semantic MEDLINE: an advanced information management application for biomedicine</article-title>. <source>Inf. Serv. Use</source> <volume>31</volume>, <fpage>15</fpage>&#x02013;<lpage>21</lpage>.<pub-id pub-id-type="doi">10.3233/ISU-2011-0627</pub-id></citation></ref>
<ref id="B40"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Rindflesch</surname> <given-names>T. C.</given-names></name> <name><surname>Tanabe</surname> <given-names>L.</given-names></name> <name><surname>Weinstein</surname> <given-names>J. N.</given-names></name> <name><surname>Hunter</surname> <given-names>L.</given-names></name></person-group> (<year>2000</year>). <article-title>&#x0201C;EDGAR: extraction of drugs, genes and relations from the biomedical literature,&#x0201D;</article-title> in <conf-name>Pacific Symposium on Biocomputing</conf-name> (<conf-loc>Honolulu, Hawaii</conf-loc>: <conf-sponsor>NIH Public Access</conf-sponsor>), <fpage>517</fpage>.</citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Settles</surname> <given-names>B.</given-names></name></person-group> (<year>2005</year>). <article-title>ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text</article-title>. <source>Bioinformatics</source> <volume>21</volume>, <fpage>3191</fpage>&#x02013;<lpage>3192</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/bti475</pub-id><pub-id pub-id-type="pmid">15860559</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>M.</given-names></name> <name><surname>Kim</surname> <given-names>W. C.</given-names></name> <name><surname>Lee</surname> <given-names>D.</given-names></name> <name><surname>Heo</surname> <given-names>G. E.</given-names></name> <name><surname>Kang</surname> <given-names>K. Y.</given-names></name></person-group> (<year>2015</year>). <article-title>PKDE4J: entity and relation extraction for public knowledge discovery</article-title>. <source>J. Biomed. Inform.</source> <volume>57</volume>, <fpage>320</fpage>&#x02013;<lpage>332</lpage>.<pub-id pub-id-type="doi">10.1016/j.jbi.2015.08.008</pub-id><pub-id pub-id-type="pmid">26277115</pub-id></citation></ref>
<ref id="B43"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>S. J.</given-names></name> <name><surname>Heo</surname> <given-names>G. E.</given-names></name> <name><surname>Kim</surname> <given-names>H. J.</given-names></name> <name><surname>Jung</surname> <given-names>H. J.</given-names></name> <name><surname>Kim</surname> <given-names>Y. H.</given-names></name> <name><surname>Song</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x0201C;Grounded feature selection for biomedical relation extraction by the combinative approach,&#x0201D;</article-title> in <conf-name>Proceedings of the ACM 8th International Workshop on Data and Text Mining in Bioinformatics</conf-name> (<conf-loc>Shanghai, China</conf-loc>: <conf-sponsor>ACM</conf-sponsor>), <fpage>29</fpage>&#x02013;<lpage>32</lpage>.</citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Swanson</surname> <given-names>D. R.</given-names></name></person-group> (<year>1986</year>). <article-title>Fish oil, Raynaud&#x02019;s syndrome, and undiscovered public knowledge</article-title>. <source>Perspect. Biol. Med.</source> <volume>30</volume>, <fpage>7</fpage>&#x02013;<lpage>18</lpage>.<pub-id pub-id-type="doi">10.1353/pbm.1986.0087</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>B.</given-names></name> <name><surname>Feng</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Jiang</surname> <given-names>M.</given-names></name> <etal/></person-group> (<year>2015</year>). <article-title>A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature</article-title>. <source>J. Cheminform.</source> <volume>7</volume>, <fpage>S8</fpage>.<pub-id pub-id-type="doi">10.1186/1758-2946-7-S1-S8</pub-id><pub-id pub-id-type="pmid">25810779</pub-id></citation></ref>
<ref id="B46"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Thomas</surname> <given-names>P.</given-names></name> <name><surname>Solt</surname> <given-names>I.</given-names></name> <name><surname>Klinger</surname> <given-names>R.</given-names></name> <name><surname>Leser</surname> <given-names>U.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Learning protein protein interaction extraction using distant supervision,&#x0201D;</article-title> in <conf-name>Proceedings of Robust Unsupervised and Semi-Supervised Methods in Natural Language Processing (Workshop at International Conference Recent Advances in Natural Language Processing)</conf-name> (<conf-loc>Hissar, Bulgaria</conf-loc>: <conf-sponsor>INCOMA Ltd</conf-sponsor>).</citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Usi&#x000E9;</surname> <given-names>A.</given-names></name> <name><surname>Alves</surname> <given-names>R.</given-names></name> <name><surname>Solsona</surname> <given-names>F.</given-names></name> <name><surname>V&#x000E1;zquez</surname> <given-names>M.</given-names></name> <name><surname>Valencia</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>CheNER: chemical named entity recognizer</article-title>. <source>Bioinformatics</source> <volume>30</volume>, <fpage>1039</fpage>&#x02013;<lpage>1040</lpage>.<pub-id pub-id-type="doi">10.1093/bioinformatics/btt639</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wattarujeekrit</surname> <given-names>T.</given-names></name> <name><surname>Shah</surname> <given-names>P. K.</given-names></name> <name><surname>Collier</surname> <given-names>N.</given-names></name></person-group> (<year>2004</year>). <article-title>PASBio: predicate-argument structures for event extraction in molecular biology</article-title>. <source>BMC Bioinformatics</source> <volume>5</volume>:<fpage>155</fpage>.<pub-id pub-id-type="doi">10.1186/1471-2105-5-155</pub-id><pub-id pub-id-type="pmid">15494078</pub-id></citation></ref>
<ref id="B49"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Webber</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;A programmatic introduction to neo4j,&#x0201D;</article-title> in <conf-name>Proceedings of the 3rd Annual Conference on Systems, Programming, and Applications: Software for Humanity</conf-name> (<conf-loc>Tucson, Arizona</conf-loc>: <conf-sponsor>ACM</conf-sponsor>), <fpage>217</fpage>&#x02013;<lpage>218</lpage>.</citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wishart</surname> <given-names>D. S.</given-names></name> <name><surname>Knox</surname> <given-names>C.</given-names></name> <name><surname>Guo</surname> <given-names>A. C.</given-names></name> <name><surname>Shrivastava</surname> <given-names>S.</given-names></name> <name><surname>Hassanali</surname> <given-names>M.</given-names></name> <name><surname>Stothard</surname> <given-names>P.</given-names></name> <etal/></person-group> (<year>2006</year>). <article-title>DrugBank: a comprehensive resource for in silico drug discovery and exploration</article-title>. <source>Nucleic Acids Res.</source> <volume>34</volume>(<issue>Suppl_1</issue>), <fpage>D668</fpage>&#x02013;<lpage>D672</lpage>.<pub-id pub-id-type="doi">10.1093/nar/gkj067</pub-id><pub-id pub-id-type="pmid">16381955</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wishart</surname> <given-names>D. S.</given-names></name> <name><surname>Jewison</surname> <given-names>T.</given-names></name> <name><surname>Guo</surname> <given-names>A. C.</given-names></name> <name><surname>Wilson</surname> <given-names>M.</given-names></name> <name><surname>Knox</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <etal/></person-group> (<year>2012</year>). <article-title>HMDB 3.0 &#x02013; the human metabolome database in 2013</article-title>. <source>Nucleic Acids Res.</source> <volume>41</volume>, <fpage>D801</fpage>&#x02013;<lpage>D807</lpage>.</citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>W.</given-names></name> <name><surname>Soares</surname> <given-names>J.</given-names></name> <name><surname>Greninger</surname> <given-names>P.</given-names></name> <name><surname>Edelman</surname> <given-names>E. J.</given-names></name> <name><surname>Lightfoot</surname> <given-names>H.</given-names></name> <name><surname>Forbes</surname> <given-names>S.</given-names></name> <etal/></person-group> (<year>2013</year>). <article-title>Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells</article-title>. <source>Nucleic Acids Res.</source> <volume>41</volume>, <fpage>D955</fpage>&#x02013;<lpage>D961</lpage>.<pub-id pub-id-type="doi">10.1093/nar/gks1111</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Lin</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name></person-group> (<year>2008</year>). <article-title>Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature</article-title>. <source>Comput. Biol. Chem.</source> <volume>32</volume>, <fpage>287</fpage>&#x02013;<lpage>291</lpage>.<pub-id pub-id-type="doi">10.1016/j.compbiolchem.2008.03.008</pub-id><pub-id pub-id-type="pmid">18467180</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yeh</surname> <given-names>A.</given-names></name> <name><surname>Morgan</surname> <given-names>A.</given-names></name> <name><surname>Colosimo</surname> <given-names>M.</given-names></name> <name><surname>Hirschman</surname> <given-names>L.</given-names></name></person-group> (<year>2005</year>). <article-title>BioCreAtIvE task 1A: gene mention finding evaluation</article-title>. <source>BMC Bioinformatics</source> <volume>6</volume>:<fpage>S2</fpage>.<pub-id pub-id-type="doi">10.1186/1471-2105-6-2</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yimam</surname> <given-names>S. M.</given-names></name> <name><surname>Biemann</surname> <given-names>C.</given-names></name> <name><surname>Majnaric</surname> <given-names>L.</given-names></name> <name><surname>&#x00160;abanovi&#x00107;</surname> <given-names>&#x00160;.</given-names></name> <name><surname>Holzinger</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>An adaptive annotation approach for biomedical entity and relation recognition</article-title>. <source>Brain Inform.</source> <volume>3</volume>, <fpage>157</fpage>&#x02013;<lpage>168</lpage>.<pub-id pub-id-type="doi">10.1007/s40708-016-0036-4</pub-id><pub-id pub-id-type="pmid">27747591</pub-id></citation></ref>
<ref id="B54"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Zesch</surname> <given-names>T.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>C.</given-names></name> <name><surname>Gurevych</surname> <given-names>I.</given-names></name></person-group> (<year>2008</year>). <article-title>&#x0201C;Using wiktionary for computing semantic relatedness,&#x0201D;</article-title> in <conf-name>Proceedings of the 23rd National Conference on Artificial Intelligence</conf-name>, Vol. <volume>2</volume> (<conf-loc>Chicago, IL</conf-loc>: <conf-sponsor>AAAI Press</conf-sponsor>), <fpage>861</fpage>&#x02013;<lpage>866</lpage>.</citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>D.</given-names></name> <name><surname>He</surname> <given-names>Y.</given-names></name></person-group> (<year>2011</year>). <article-title>Biomedical events extraction using the hidden vector state model</article-title>. <source>Artif. Intell. Med.</source> <volume>53</volume>, <fpage>205</fpage>&#x02013;<lpage>213</lpage>.<pub-id pub-id-type="doi">10.1016/j.artmed.2011.08.002</pub-id></citation></ref>
</ref-list>
</back>
</article>