<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Cell Dev. Biol.</journal-id>
<journal-title>Frontiers in Cell and Developmental Biology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Cell Dev. Biol.</abbrev-journal-title>
<issn pub-type="epub">2296-634X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcell.2021.735687</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Cell and Developmental Biology</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Heterogeneous Information Network-Based Patient Similarity Search</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Huang</surname> <given-names>Hao-zhe</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1397955/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Lu</surname> <given-names>Xu-dong</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1392158/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Guo</surname> <given-names>Wei</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Jiang</surname> <given-names>Xin-bo</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Yan</surname> <given-names>Zhong-min</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Shi-peng</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1392136/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Software, Shandong University</institution>, <addr-line>Jinan</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University</institution>, <addr-line>Jinan</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>Shandong Provincial Key Laboratory of Software Engineering, Shandong University</institution>, <addr-line>Jinan</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Liang Cheng, Harbin Medical University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Hongbo Sun, Yantai University, China; Di Wang, Nanyang Technological University, Singapore</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Xu-dong Lu <email>dongxul&#x00040;sdu.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Molecular and Cellular Pathology, a section of the journal Frontiers in Cell and Developmental Biology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>08</day>
<month>09</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>9</volume>
<elocation-id>735687</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>07</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>07</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Huang, Lu, Guo, Jiang, Yan and Wang.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Huang, Lu, Guo, Jiang, Yan and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract><p>Patient similarity search is a fundamental task in artificial intelligence-assisted medical services. It benefits medical diagnosis, for example by making accurate predictions for similar diseases and by recommending personalized treatment plans. Existing patient similarity search methods retrieve the medical events associated with patients from Electronic Health Record (EHR) data and map them to vectors. The similarity between two patients is then expressed as the similarity or dissimilarity between their medical-event vectors. However, these vectors tend to be high-dimensional and sparse, which makes it hard to calculate patient similarity accurately. In addition, most existing methods cannot capture the time information in EHRs, which hinders analysis of how time factors influence patient similarity search. To solve these problems, we propose a patient similarity search method based on a heterogeneous information network. On the one hand, the proposed method uses a heterogeneous information network to connect patients, diseases, and drugs, which solves the problem of representing the mixed information related to patients, diseases, and drugs. Our method then measures the similarity between patients by calculating the similarity between nodes in the heterogeneous information network, which addresses the challenges caused by high-dimensional and sparse vectors. On the other hand, the proposed method encodes time information into an annotated heterogeneous information network, which addresses the inaccurate patient similarity search caused by ignoring time information during similarity measurement. Experiments show that our method outperforms the compared baseline methods.</p></abstract>
<kwd-group>
<kwd>heterogeneous information network</kwd>
<kwd>clinical similarity</kwd>
<kwd>electronic health records</kwd>
<kwd>patient similarity search</kwd>
<kwd>weighted meta path</kwd>
</kwd-group>
<contract-num rid="cn001">No. 2019YFB1705904</contract-num>
<contract-sponsor id="cn001">National Key Research and Development Program of China<named-content content-type="fundref-id">10.13039/501100012166</named-content></contract-sponsor>
<counts>
<fig-count count="8"/>
<table-count count="4"/>
<equation-count count="4"/>
<ref-count count="33"/>
<page-count count="10"/>
<word-count count="6960"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Patient similarity search has been identified as one of the key techniques in artificial intelligence (AI)-assisted medical services. It benefits medical diagnosis, for example by making accurate predictions for similar diseases and by recommending personalized treatment plans (Sharafoddini et al., <xref ref-type="bibr" rid="B18">2017</xref>). Generally speaking, patient similarity analysis selects certain clinical records as patient features in a specific medical environment and then quantifies the distance between patients. A proper similarity measure should support various downstream applications, such as personalized medicine recommendation (Zhang et al., <xref ref-type="bibr" rid="B32">2014</xref>; Lee et al., <xref ref-type="bibr" rid="B11">2015</xref>), target patient retrieval (Sun et al., <xref ref-type="bibr" rid="B20">2012</xref>), medical diagnosis (Gottlieb et al., <xref ref-type="bibr" rid="B6">2013</xref>), and cohort studies (Che et al., <xref ref-type="bibr" rid="B3">2017</xref>).</p>
<p>The wide availability of Electronic Health Records (EHRs) makes it possible to calculate the similarity between patients quickly and accurately. Many similarity learning methods have been proposed for healthcare datasets (Tsevas and Iakovidis, <xref ref-type="bibr" rid="B24">2011</xref>; Wang et al., <xref ref-type="bibr" rid="B26">2012b</xref>; Barkhordari and Niamanesh, <xref ref-type="bibr" rid="B1">2015</xref>; Wang and Sun, <xref ref-type="bibr" rid="B27">2015</xref>; Sha et al., <xref ref-type="bibr" rid="B17">2016</xref>; Zhan et al., <xref ref-type="bibr" rid="B30">2016</xref>; Sharafoddini et al., <xref ref-type="bibr" rid="B18">2017</xref>; Huai et al., <xref ref-type="bibr" rid="B9">2018</xref>; Suo et al., <xref ref-type="bibr" rid="B23">2018</xref>). Existing methods have successfully derived similarity measures from EHR data by mapping medical events into vector spaces. However, EHRs contain many kinds of data (diagnoses, drugs, etc.) and a large number of medical events, which usually results in high-dimensional and sparse embedding vectors.</p>
<p>A heterogeneous information network (HIN) contains rich structural and semantic information and can effectively address the problems caused by high-dimensional and sparse embedding vectors. When calculating patient similarity, the diseases that patients have and the drugs they use provide essential information. A patient&#x00027;s disease is critical to the doctor&#x00027;s clinical decision, and it is largely determined by the patient&#x00027;s clinical symptoms and clinical indicators; in this sense, the disease is a comprehensive reflection of the clinical indicators. The medicine is the doctor&#x00027;s solution to the patient&#x00027;s disease and symptoms and is the final manifestation of the doctor&#x00027;s clinical decision. It is therefore natural to connect patients, diseases, and drugs to form an HIN.</p>
<p>However, EHRs contain many duplicate diseases and drugs, so if we used classic HIN modeling techniques with the above schema, we would lose the correlation information between patients and drugs. To address this problem, we propose an HIN with annotations: on the links connecting diseases and drugs, we add annotations of patient information, which enrich the original network with the relations between patients and drugs. We call it an annotated HIN. On the annotated HIN, we propose a novel node similarity measure, S-PathSim, to calculate patient similarity. As a node similarity measure, S-PathSim has desirable properties such as symmetry and self-maximum.</p>
<p>On the other hand, temporal information is crucial for understanding the dynamics of medical events. To leverage this temporal information for patient similarity evaluation, we propose N-disease, which encodes temporal information into annotated HINs. N-disease is inspired by the N-gram model in natural language processing. Its basic idea is to arrange a patient&#x00027;s diseases into a time series according to when they occurred, collect the N-grams of the resulting disease sequence, and then replace the disease objects in the annotated HIN with these disease N-grams. The N-grams collected from a disease time series are called N-diseases.</p>
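<p>The N-disease construction can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors&#x00027; implementation: the record format (a list of (date, disease) pairs per patient), the function name <monospace>n_diseases</monospace>, and the example diagnoses are all hypothetical.</p>
<preformat>
```python
# Hypothetical sketch of N-disease extraction (not the authors' code).
def n_diseases(records, n):
    """Return the N-diseases (contiguous disease N-grams) of one patient.

    records: list of (date, disease) pairs with ISO dates such as
    "2021-07-03", which sort chronologically as plain strings.
    """
    # Arrange the diseases into a time series by diagnosis date
    # (diagnoses on the same date fall back to alphabetical order).
    sequence = [disease for _, disease in sorted(records)]
    # Slide a window of width n; a sequence shorter than n yields no N-disease.
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


# Hypothetical diagnosis history of one patient.
history = [
    ("2019-05-02", "diabetes"),
    ("2018-03-10", "hypertension"),
    ("2020-11-20", "arteriosclerotic heart disease"),
]
print(n_diseases(history, 2))
# [('hypertension', 'diabetes'), ('diabetes', 'arteriosclerotic heart disease')]
```
</preformat>
<p>With n = 1 the sketch returns the individual diseases in time order, which corresponds to the annotated HIN without temporal encoding. Larger n preserves more ordering information but produces more distinct N-disease objects.</p>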
<p>Finally, based on S-PathSim and N-disease, we define two patient similarity search methods: MBH (method based on annotated HIN) and MBHT (method based on annotated HIN and temporal information).</p>
<p>The remainder of this paper is structured as follows. Section 2 reviews related work on patient similarity analysis and heterogeneous information networks. Section 3 provides preliminaries on HINs and shows their limitation in calculating patient similarity. Section 4 introduces our method in detail. Section 5 presents the experimental results and comparative analysis. Finally, the last section summarizes this paper and discusses possible avenues for future research.</p>
</sec>
<sec id="s2">
<title>2. Related Work</title>
<p>In this section, we review some related works on evaluating patient similarity and heterogeneous information network.</p>
<p>Studying patient similarity has practical significance in many applications (Lee et al., <xref ref-type="bibr" rid="B11">2015</xref>; Li et al., <xref ref-type="bibr" rid="B13">2015</xref>). Ng et al. provided a personalized predictive healthcare model by matching clinically similar patients with a locally supervised metric learning measure (Ng et al., <xref ref-type="bibr" rid="B15">2015</xref>). An integrated method for personalized modeling (IMPM) was proposed to support personalized treatment and personalized drug design (Kasabov and Hu, <xref ref-type="bibr" rid="B10">2010</xref>). Xia et al. combined a data-driven clinical decision support system with patient similarity (Xia et al., <xref ref-type="bibr" rid="B29">2019</xref>).</p>
<p>Many studies have addressed the calculation of patient similarity. Zhang et al. combined patient similarity and drug similarity analysis and proposed a heterogeneous label propagation method to identify which drugs are likely to be effective for a given patient (Zhang et al., <xref ref-type="bibr" rid="B32">2014</xref>). Chan et al. proposed a patient similarity algorithm named SimSvm that uses a support vector machine to weight the similarity measures (Chan et al., <xref ref-type="bibr" rid="B2">2010</xref>). Wang et al. proposed a patient-similarity-based disease prognosis strategy named SimProX (Wang et al., <xref ref-type="bibr" rid="B25">2012a</xref>). This model used a local spline regression-based method to embed patient events into an intrinsic space and then measured patient similarity by the Euclidean distance in the embedded space. However, these methods do not leverage temporal information to evaluate patient similarity, which limits their effectiveness.</p>
<p>Cheng et al. (<xref ref-type="bibr" rid="B5">2016</xref>) took temporal information into consideration and proposed an adjustable temporal fusion scheme using CNN-extracted features. This method is supervised, but labeled data are not easy to obtain, which limits its use; the method also lacks interpretability. Zhu et al. proposed a method to address both high-dimensional vectors and time series (Zhu et al., <xref ref-type="bibr" rid="B33">2016</xref>). They embed medical events from EHRs into fixed-length vectors, but fixed-length vectors can hardly capture complete medical event information.</p>
<p>As discussed above, current methods for measuring patient similarity are limited, and a better method is needed to calculate patient similarity.</p>
<p>Since Sun and Han proposed the concept of HIN (Sun and Han, <xref ref-type="bibr" rid="B21">2010</xref>) and subsequently the concept of meta path (Sun and Han, <xref ref-type="bibr" rid="B22">2011</xref>), HIN analysis has rapidly become a hot topic in data mining, databases, and information retrieval. He et al. incorporated temporal information into similarity search in HINs by assigning different weights to paths built at different times (He et al., <xref ref-type="bibr" rid="B7">2014</xref>); however, this method is not suitable for the annotated HIN proposed in this paper. To evaluate the relevance of objects of different types, Shi et al. (<xref ref-type="bibr" rid="B19">2014</xref>) proposed HeteSim, which measures the relevance of any object pair under an arbitrary meta path. LSH-HeteSim (Li et al., <xref ref-type="bibr" rid="B12">2014</xref>), an adaptation of HeteSim, was proposed to mine drug&#x02013;target interactions in heterogeneous biological networks, where drugs and targets are connected by complicated semantic paths. To overcome the high computation and memory demands of HeteSim, Meng et al. (<xref ref-type="bibr" rid="B14">2014</xref>) proposed the AvgSim measure, which evaluates the similarity score through two random walk processes along the given meta path and the reversed meta path, respectively. To overcome the limitation that a meta path can express only simple information, Cheng et al. (<xref ref-type="bibr" rid="B4">2017</xref>) proposed the meta structure to measure the similarity between objects. HINs have since been widely used in other fields (Wang, <xref ref-type="bibr" rid="B8">2019</xref>; Wang et al., <xref ref-type="bibr" rid="B28">2020</xref>; Zhang et al., <xref ref-type="bibr" rid="B31">2020</xref>).</p>
<p>HIN-based methods rarely produce high-dimensional vectors, and most HIN-based similarity measures are highly interpretable. However, a classic HIN cannot be applied directly to patient similarity calculation, so in this paper we propose an improved model, the annotated HIN, which is well suited to calculating patient similarity.</p>
</sec>
<sec id="s3">
<title>3. Preliminaries</title>
<p>In this section, as preliminaries, we describe the HIN and its limitation in measuring patient similarity.</p>
<sec>
<title>3.1. HIN</title>
<p>An information network is defined as a directed graph <italic>G</italic> &#x0003D; (<italic>V, E</italic>) with an object type mapping function &#x003C8;: <italic>V</italic> &#x02192; <italic>A</italic> and a link type mapping function &#x003C6;: <italic>E</italic> &#x02192; <italic>R</italic>, in which each object <italic>v</italic> &#x02208; <italic>V</italic> belongs to a particular object type &#x003C8; (<italic>v</italic>) &#x02208; <italic>A</italic> while each link <italic>e</italic> &#x02208; <italic>E</italic> belongs to a particular relation &#x003C6; (<italic>e</italic>) &#x02208; <italic>R</italic>.</p>
<p>Unlike the traditional network definition, this definition explicitly distinguishes object types and relation types. When the number of object types |<italic>A</italic>| &#x0003E; 1 or the number of relation types |<italic>R</italic>| &#x0003E; 1, the network is called a heterogeneous information network; otherwise, it is a homogeneous information network.</p>
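<p>The definition can be mirrored by a small Python data structure. This is a hypothetical sketch: the class name <monospace>InformationNetwork</monospace> and the relation name <monospace>diagnosed_with</monospace> are illustrative. The sketch also assumes that <italic>A</italic> and <italic>R</italic> can be read off the types actually present in the network.</p>
<preformat>
```python
from dataclasses import dataclass, field


@dataclass
class InformationNetwork:
    """Minimal sketch of G = (V, E) with type mappings psi and phi."""

    # psi maps each object v in V to its object type in A.
    psi: dict = field(default_factory=dict)
    # phi maps each directed link e = (u, v) in E to its relation type in R.
    phi: dict = field(default_factory=dict)

    def add_object(self, v, object_type):
        self.psi[v] = object_type

    def add_link(self, u, v, relation):
        # Both endpoints must already be typed objects of the network.
        if u not in self.psi or v not in self.psi:
            raise KeyError("add both endpoints before linking them")
        self.phi[(u, v)] = relation

    def is_heterogeneous(self):
        # A and R are taken here as the types actually present in the network.
        object_types = set(self.psi.values())
        relation_types = set(self.phi.values())
        # Heterogeneous when |A| exceeds 1 or |R| exceeds 1.
        return len(object_types) not in (0, 1) or len(relation_types) not in (0, 1)


g = InformationNetwork()
g.add_object("231", "patient")
g.add_object("Arteriosclerotic heart disease", "disease")
g.add_link("231", "Arteriosclerotic heart disease", "diagnosed_with")
print(g.is_heterogeneous())  # True: two object types
```
</preformat>
<p>A co-author network with a single object type and a single relation type would make <monospace>is_heterogeneous()</monospace> return False, which matches the homogeneous case in the definition.</p>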
</sec>
<sec>
<title>3.2. Limitation of HIN</title>
<p>An HIN can link patients, diseases, and drugs. <xref ref-type="fig" rid="F1">Figure 1</xref> shows the network schema of the patient HIN, where P, D, and M represent patient, disease, and medicine, respectively.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Network schema of the patient heterogeneous information network.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcell-09-735687-g0001.tif"/>
</fig>
<p>A disease may be treated with many kinds of drugs, and one drug may treat many diseases. As a result, a traditional heterogeneous information network that connects patients, diseases, and drugs can encode incorrect information. We use a specific example below to illustrate this problem.</p>
<p><xref ref-type="table" rid="T1">Table 1</xref> presents three inpatient records of two patients, both of whom were diagnosed with the same disease; patient 231 was hospitalized twice. From the data in <xref ref-type="table" rid="T1">Table 1</xref>, we obtain the HIN in <xref ref-type="fig" rid="F2">Figure 2</xref>. However, the HIN shown in <xref ref-type="fig" rid="F2">Figure 2</xref> has two problems. First, the network should record that patient 231 was hospitalized twice, but this information cannot be obtained from <xref ref-type="fig" rid="F2">Figure 2</xref>. Second, patient 231 did not use perindopril in treatment, yet the heterogeneous information network implies a relation between patient 231 and perindopril, which introduces misleading information. Therefore, traditional HIN-based measurement methods are not suitable for our problem.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Example of case information.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Hospitalization ID</bold></th>
<th valign="top" align="center"><bold>Patient ID</bold></th>
<th valign="top" align="left"><bold>Disease</bold></th>
<th valign="top" align="left"><bold>Medicine</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">564435</td>
<td valign="top" align="center">231</td>
<td valign="top" align="left">Arteriosclerotic heart disease</td>
<td valign="top" align="left">Atorvastatin, Bisoprolol, Clopidogrel</td>
</tr>
<tr>
<td valign="top" align="left">561657</td>
<td valign="top" align="center">200</td>
<td valign="top" align="left">Arteriosclerotic heart disease</td>
<td valign="top" align="left">Aspirin, Atorvastatin, Perindopril</td>
</tr>
<tr>
<td valign="top" align="left">564677</td>
<td valign="top" align="center">231</td>
<td valign="top" align="left">Arteriosclerotic heart disease</td>
<td valign="top" align="left">Atorvastatin, Clopidogrel</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Heterogeneous information network from <xref ref-type="table" rid="T1">Table 1</xref>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcell-09-735687-g0002.tif"/>
</fig>
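<p>Both problems can be reproduced from <xref ref-type="table" rid="T1">Table 1</xref> with a few lines of Python. This hypothetical sketch uses illustrative helper names. It stores the classic HIN as two plain sets of links, patient-disease and disease-medicine, and follows the path P-D-M to see which medicines each patient is connected to.</p>
<preformat>
```python
# Table 1 as (hospitalization ID, patient ID, disease, medicines).
RECORDS = [
    ("564435", "231", "Arteriosclerotic heart disease",
     ["Atorvastatin", "Bisoprolol", "Clopidogrel"]),
    ("561657", "200", "Arteriosclerotic heart disease",
     ["Aspirin", "Atorvastatin", "Perindopril"]),
    ("564677", "231", "Arteriosclerotic heart disease",
     ["Atorvastatin", "Clopidogrel"]),
]

# Classic HIN: plain sets of patient-disease and disease-medicine links.
patient_disease = set()
disease_medicine = set()
for _, patient, disease, medicines in RECORDS:
    patient_disease.add((patient, disease))
    for medicine in medicines:
        disease_medicine.add((disease, medicine))


def reachable_medicines(patient):
    """Medicines reachable from a patient along the path P-D-M."""
    return {m for (p, d) in patient_disease if p == patient
            for (d2, m) in disease_medicine if d2 == d}


def used_medicines(patient):
    """Medicines actually prescribed to the patient in Table 1."""
    return {m for (_, p, _, meds) in RECORDS if p == patient for m in meds}


# Problem 1: both hospitalizations of patient 231 collapse into one link.
print(sum(1 for (p, _) in patient_disease if p == "231"))  # 1
# Problem 2: the classic HIN links patient 231 to drugs it never used.
print(sorted(reachable_medicines("231") - used_medicines("231")))
# ['Aspirin', 'Perindopril']
```
</preformat>
<p>Besides perindopril, the same path also wrongly connects patient 231 to aspirin, which only patient 200 received.</p>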
</sec>
</sec>
<sec id="s4">
<title>4. The Proposed Method</title>
<sec>
<title>4.1. Annotated HIN</title>
<p>As mentioned in Section 3, a classic HIN is not suitable for our problem. To measure patient similarity, we propose a new graph model, the annotated HIN.</p>
<p><bold>Definition 1</bold>. <bold><italic>Annotated Heterogeneous Information Network</italic>.</bold> An annotated HIN is a special heterogeneous information network <italic>G</italic> &#x0003D; (<italic>V, E, C</italic>) in which one or more link types are annotated with &#x0003C; <italic>key, value</italic> &#x0003E; pairs. In each &#x0003C; <italic>key, value</italic> &#x0003E; pair, the key is an object <italic>key</italic> &#x02208; <italic>V</italic> of a specific type &#x003C8; (<italic>key</italic>) &#x02208; <italic>A</italic>, and the value records the number of links.</p>
<p>As mentioned above, we regard the set of &#x0003C; <italic>key, value</italic> &#x0003E; pairs as the annotations of a heterogeneous information network, denoted by <italic>C</italic>. The number of key-value pairs in an annotation is called the length of the annotation, denoted by <italic>L</italic>. Annotations can be added to one or more link types of a classic heterogeneous information network; they record the source and number of connections and can thus represent more information.</p>
<p><xref ref-type="fig" rid="F3">Figure 3</xref> shows a real example of an annotated HIN, which we call a patient-annotated HIN. The link with &#x0201C;Clopidogrel&#x0201D; has the annotation <italic>C</italic><sub><italic>Clopidogrel</italic></sub> &#x0003D; {&#x0003C; 231, 2 &#x0003E;}, whose length is <italic>L</italic> &#x0003D; 1. This annotated heterogeneous information network can be interpreted as follows: patient 231 was diagnosed with arteriosclerotic heart disease in both hospitalizations, and clopidogrel was used in both treatments. Moreover, the annotation contains no record of patient 200, so clopidogrel was not used in the treatment of patient 200. In this way, the two problems described in the previous section are solved.</p>
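<p>The disease-drug annotations can be derived from <xref ref-type="table" rid="T1">Table 1</xref> as sketched below. This is an illustration under assumptions, not the authors&#x00027; code. It annotates only the disease-drug links, keyed by patient ID, with a <monospace>Counter</monospace> whose values count hospitalizations. How the patient-disease links are stored is left open here.</p>
<preformat>
```python
from collections import Counter, defaultdict

ASHD = "Arteriosclerotic heart disease"

# Table 1 as (hospitalization ID, patient ID, disease, medicines).
RECORDS = [
    ("564435", "231", ASHD, ["Atorvastatin", "Bisoprolol", "Clopidogrel"]),
    ("561657", "200", ASHD, ["Aspirin", "Atorvastatin", "Perindopril"]),
    ("564677", "231", ASHD, ["Atorvastatin", "Clopidogrel"]),
]

# Annotation C on each disease-medicine link: patient key mapped to the
# number of hospitalizations in which that patient's treatment used the link.
annotations = defaultdict(Counter)
for _, patient, disease, medicines in RECORDS:
    for medicine in medicines:
        annotations[(disease, medicine)][patient] += 1

c_clopidogrel = annotations[(ASHD, "Clopidogrel")]
print(dict(c_clopidogrel), len(c_clopidogrel))  # {'231': 2} 1
print("200" in c_clopidogrel)  # False: patient 200 never used clopidogrel
print(dict(annotations[(ASHD, "Atorvastatin")]))  # {'231': 2, '200': 1}
```
</preformat>
<p>The second printed value is the annotation length <italic>L</italic>. A link shared by both patients, such as atorvastatin, keeps separate counts for each patient instead of merging them into one undifferentiated edge.</p>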
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>An annotated HIN from <xref ref-type="table" rid="T1">Table 1</xref>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcell-09-735687-g0003.tif"/>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Schema of patient-annotated HIN.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcell-09-735687-g0004.tif"/>
</fig>
<p>For a given annotated HIN, we provide a meta-description to help readers understand the object types, link types, and annotation types in the network; <xref ref-type="fig" rid="F4">Figure 4</xref> shows the schema of the patient-annotated HIN.</p>
<p><bold>Definition 2</bold>. <bold><italic>AHIN Network Schema</italic>. </bold>The network schema of an annotated HIN (AHIN) is denoted as <italic>SG</italic> &#x0003D; (<italic>A, R, I</italic>). It is a meta template of an AHIN <italic>G</italic> &#x0003D; (<italic>V, E, C</italic>) with object type mapping &#x003C8; (<italic>v</italic>) &#x02208; <italic>A</italic>, relation type mapping &#x003C6; (<italic>e</italic>) &#x02208; <italic>R</italic>, and annotation type mapping &#x003B8;:<italic>C</italic> &#x02192; <italic>I</italic>, and it is defined on the object type set <italic>A</italic>, the relation type set <italic>R</italic>, and the annotation type set <italic>I</italic>.</p>
</sec>
<sec>
<title>4.2. Weighted Meta Path and S-PathSim</title>
<p>The weighted meta path, designed to capture complex relationships between two objects in an annotated HIN, is based on the network expansion structure, which is defined as follows.</p>
<p><bold>Definition 3</bold>. <bold><italic>Network Expansion Structure</italic>. </bold>A network expansion structure <italic>S</italic> is a set of directed weighted graphs defined on an annotated HIN schema <italic>SG</italic> &#x0003D; (<italic>A, R, I</italic>); it expands the annotated heterogeneous information network into an easy-to-process format. Formally, <italic>S</italic> &#x0003D; (<italic>D</italic><sub>1</sub>, <italic>D</italic><sub>2</sub>, &#x02026;, <italic>D</italic><sub><italic>n</italic></sub>), where each <italic>D</italic><sub><italic>k</italic></sub> &#x0003D; (<italic>V</italic><sub><italic>k</italic></sub>, <italic>E</italic><sub><italic>k</italic></sub>) is a directed weighted graph with node set <italic>V</italic><sub><italic>k</italic></sub> and edge set <italic>E</italic><sub><italic>k</italic></sub>. Each edge <italic>e</italic> &#x02208; <italic>E</italic><sub><italic>k</italic></sub> is associated with a weight <italic>w</italic>(<italic>e</italic>), whose default value is 1.</p>
<p>We use an example to illustrate the expansion of the network structure. <xref ref-type="fig" rid="F5">Figure 5A</xref> shows how a given annotated heterogeneous information network is expanded into a network expansion structure. Graph <italic>G</italic> contains the annotations {&#x0003C; <italic>P</italic><sub>1</sub>, 2 &#x0003E;, &#x0003C; <italic>P</italic><sub>2</sub>, 3 &#x0003E;} and {&#x0003C; <italic>P</italic><sub>1</sub>, 3 &#x0003E;}. The key-value pairs &#x0003C; <italic>P</italic><sub>1</sub>, 2 &#x0003E; and &#x0003C; <italic>P</italic><sub>1</sub>, 3 &#x0003E; correspond to the entity <italic>P</italic><sub>1</sub>, so we obtain the graph <italic>D</italic><sub>1</sub> with corresponding edge weights 2 and 3, respectively. The key-value pair &#x0003C; <italic>P</italic><sub>2</sub>, 3 &#x0003E; corresponds to the entity <italic>P</italic><sub>2</sub>, so we obtain the graph <italic>D</italic><sub>2</sub> with corresponding edge weight 3. All other edges have the default weight 1.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Network expansion structure and weighted meta path. <bold>(A)</bold> Network expansion structure. <bold>(B)</bold> Weighted meta path.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcell-09-735687-g0005.tif"/>
</fig>
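<p>The expansion step can be sketched in Python. The link endpoints D1, M1, and M2 are hypothetical, since the text gives only the annotations. The sketch further assumes that unannotated edges need not be stored per graph and that they simply receive the default weight 1 on lookup.</p>
<preformat>
```python
def expand(annotations):
    """Expand annotated links into one weighted graph D_k per annotation key.

    annotations: dict mapping an annotated directed link (u, v) to a dict of
    {key: value} pairs. Returns {key: {link: weight}}, where the annotation
    value becomes the edge weight in that key's graph.
    """
    structure = {}
    for link, pairs in annotations.items():
        for key, value in pairs.items():
            structure.setdefault(key, {})[link] = value
    return structure


def edge_weight(structure, key, link):
    # Assumption: links carrying no annotation for this key get the default weight 1.
    return structure.get(key, {}).get(link, 1)


# Hypothetical links carrying the annotations from the example in the text.
example = {
    ("D1", "M1"): {"P1": 2, "P2": 3},
    ("D1", "M2"): {"P1": 3},
}
S = expand(example)
print(S["P1"])  # {('D1', 'M1'): 2, ('D1', 'M2'): 3}
print(S["P2"])  # {('D1', 'M1'): 3}
print(edge_weight(S, "P2", ("P2", "D1")))  # 1 (default weight)
```
</preformat>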
<p>After introducing the network expansion structure, we propose the concept of weighted meta path.</p>
<p><bold>Definition 4</bold>. <bold><italic>Weighted Meta Path</italic>. </bold>Weighted meta path <italic>P</italic> is a path defined on the network schema <italic>SG</italic> &#x0003D; (<italic>A, R, I</italic>), and based on network expansion structure <italic>S</italic> &#x0003D; (<italic>D</italic><sub>1</sub>, <italic>D</italic><sub>2</sub>, &#x02026;, <italic>D</italic><sub><italic>n</italic></sub>). Weighted meta path is denoted in the form of <inline-formula><mml:math id="M1"><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mover class="stackrel"><mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mover><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mover class="stackrel"><mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mover><mml:mo>&#x02026;</mml:mo><mml:mover class="stackrel"><mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mover><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, which defines a composite relation between the object types <italic>A</italic><sub>1</sub> and <italic>A</italic><sub><italic>l</italic>&#x0002B;1</sub>, where <italic>R</italic><sub><italic>i</italic></sub> denotes the relation between <italic>A</italic><sub><italic>i</italic></sub> and <italic>A</italic><sub><italic>i</italic>&#x0002B;1</sub> and <italic>w</italic>(<italic>e</italic><sub><italic>i</italic></sub>) denotes its weight (1 &#x02264; <italic>i</italic> &#x02264; <italic>l</italic>).</p>
<p>As with meta paths, a weighted meta path <italic>P</italic> is called symmetric if its relation sequence is symmetric. Each weighted meta path follows a template. If there are no multiple relations between the same pair of object types, the template can be written using type names only: <italic>P</italic> &#x0003D; (<italic>A</italic><sub>1</sub><italic>A</italic><sub>2</sub>&#x02026;<italic>A</italic><sub><italic>l</italic>&#x0002B;1</sub>). As shown in <xref ref-type="fig" rid="F5">Figure 5B</xref>, <italic>P</italic><sub>1</sub> and <italic>P</italic><sub>2</sub> share the template <italic>PDMDP</italic>, and both are symmetric weighted meta paths.</p>
<p>When <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, the weighted meta paths <italic>P</italic> &#x0003D; (<italic>A</italic><sub>1</sub><italic>A</italic><sub>2</sub> &#x02026; <italic>A</italic><sub><italic>l</italic>&#x0002B;1</sub>) and <inline-formula><mml:math id="M3"><mml:msup><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02026;</mml:mo><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> are concatenable, so that a new weighted meta path <inline-formula><mml:math id="M4"><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02026;</mml:mo><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02026;</mml:mo><mml:msubsup><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is obtained.</p>
<p>Each weighted meta path <italic>P</italic> has a score <italic>S</italic>(<italic>P</italic>), defined as the product of the weights of the relationships in <italic>P</italic>. For example, for the weighted meta path <italic>P</italic><sub>1</sub>, <italic>S</italic>(<italic>P</italic><sub>1</sub>) &#x0003D; 1 &#x0002A; 2 &#x0002A; 3 &#x0002A; 1 &#x0003D; 6. <italic>S</italic>(<italic>P</italic>) represents the weight of the relationship between the first and last objects of <italic>P</italic>, and can also be interpreted as the number of connection paths between these two objects. For the weighted meta path <italic>P</italic><sub>3</sub> in <xref ref-type="fig" rid="F6">Figure 6</xref>, <italic>W</italic>({<italic>D</italic><sub>1</sub>, <italic>M</italic><sub>1</sub>}<sub><italic>P</italic><sub>3</sub></sub>) &#x0003D; 2 indicates that patient <italic>P</italic><sub>1</sub> used drug <italic>M</italic><sub>1</sub> twice for disease <italic>D</italic><sub>1</sub>. Therefore, <italic>S</italic>(<italic>P</italic><sub>3</sub>) &#x0003D; <italic>W</italic>({<italic>P</italic><sub>1</sub>, <italic>D</italic><sub>1</sub>}<sub><italic>P</italic><sub>3</sub></sub>) &#x0002A; <italic>W</italic>({<italic>D</italic><sub>1</sub>, <italic>M</italic><sub>1</sub>}<sub><italic>P</italic><sub>3</sub></sub>) &#x0003D; 2; that is, there are two connection paths between <italic>P</italic><sub>1</sub> and <italic>M</italic><sub>1</sub>.
In the same way, <italic>S</italic>(<italic>P</italic><sub>4</sub>) &#x0003D; <italic>W</italic>({<italic>M</italic><sub>1</sub>, <italic>D</italic><sub>1</sub>}<sub><italic>P</italic><sub>4</sub></sub>) &#x0002A; <italic>W</italic>({<italic>D</italic><sub>1</sub>, <italic>P</italic><sub>2</sub>}<sub><italic>P</italic><sub>4</sub></sub>) &#x0003D; 3, so there are three connection paths between patient <italic>P</italic><sub>2</sub> and drug <italic>M</italic><sub>1</sub>. Because the weighted meta path <italic>P</italic><sub>1</sub> is obtained by concatenating <italic>P</italic><sub>3</sub> and <italic>P</italic><sub>4</sub>, the number of connection paths between patients <italic>P</italic><sub>1</sub> and <italic>P</italic><sub>2</sub> is <italic>S</italic>(<italic>P</italic><sub>1</sub>) &#x0003D; <italic>S</italic>(<italic>P</italic><sub>3</sub>) &#x0002A; <italic>S</italic>(<italic>P</italic><sub>4</sub>) &#x0003D; 6.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Concatenation of weighted meta path.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcell-09-735687-g0006.tif"/>
</fig>
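<p>The score arithmetic in the Figure 6 example can be checked with a short sketch. The edge weights below are inferred from the text (S(P1) = 1 * 2 * 3 * 1, and S(P3) = 2 with weight 2 on the D1-M1 edge, so the P1-D1 edge has weight 1); the list-of-weights encoding and the function name score are illustrative assumptions.</p>
<preformat preformat-type="code">
```python
from math import prod


def score(weights: list[int]) -> int:
    """S(P): product of the relationship weights along one weighted meta path."""
    return prod(weights)


# Edge weights consistent with the worked example (Figures 5B and 6).
p3 = [1, 2]   # W(P1, D1) = 1, W(D1, M1) = 2
p4 = [3, 1]   # W(M1, D1) = 3, W(D1, P2) = 1
p1 = p3 + p4  # concatenating P3 and P4 gives P1 with weights 1, 2, 3, 1

print(score(p3), score(p4), score(p1))  # 2 3 6
# The score of a concatenated path is the product of the parts' scores.
assert score(p1) == score(p3) * score(p4)
```
</preformat>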
<p>Based on the annotated HIN and weighted meta paths, we propose a new similarity measure named S-PathSim.</p>
<p><bold>Definition 5</bold>. <bold><italic>S-PathSim</italic>. </bold>Given a symmetric weighted meta path, the S-PathSim between two objects <italic>x</italic> and <italic>y</italic> of the same type is:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>s</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>S</italic><sub><italic>sum</italic></sub>(<italic>P</italic><sub><italic>x</italic>&#x02192;<italic>y</italic></sub>) is the sum of the scores of all weighted meta paths between <italic>x</italic> and <italic>y</italic>, and <italic>S</italic><sub><italic>sum</italic></sub>(<italic>P</italic><sub><italic>x</italic>&#x02192;<italic>x</italic></sub>) and <italic>S</italic><sub><italic>sum</italic></sub>(<italic>P</italic><sub><italic>y</italic>&#x02192;<italic>y</italic></sub>) are the corresponding sums from <italic>x</italic> to itself and from <italic>y</italic> to itself. For example, if there are two weighted meta paths <italic>P</italic><sub><italic>a</italic></sub> and <italic>P</italic><sub><italic>b</italic></sub> between <italic>x</italic> and <italic>y</italic>, with <italic>S</italic>(<italic>P</italic><sub><italic>a</italic></sub>) &#x0003D; 4 and <italic>S</italic>(<italic>P</italic><sub><italic>b</italic></sub>) &#x0003D; 3, then <italic>S</italic><sub><italic>sum</italic></sub>(<italic>P</italic><sub><italic>x</italic>&#x02192;<italic>y</italic></sub>) &#x0003D; <italic>S</italic>(<italic>P</italic><sub><italic>a</italic></sub>) &#x0002B; <italic>S</italic>(<italic>P</italic><sub><italic>b</italic></sub>) &#x0003D; 7.</p>
<p>Consider the patients in <xref ref-type="table" rid="T1">Table 1</xref>. Patient 231 has two admissions. During his first hospitalization, he had arteriosclerotic heart disease and received medications including atorvastatin, bisoprolol, and clopidogrel. Patient 200 also had arteriosclerotic heart disease and received aspirin, atorvastatin, and perindopril. From this information, we obtain the heterogeneous information network <italic>G</italic> shown in <xref ref-type="fig" rid="F5">Figure 5A</xref>. According to Definition 5, <italic>S</italic><sub><italic>sum</italic></sub>(<italic>patient</italic>231 &#x02192; <italic>patient</italic>200) &#x0003D; 6, <italic>S</italic><sub><italic>sum</italic></sub>(<italic>patient</italic>231 &#x02192; <italic>patient</italic>231) &#x0003D; 22, and <italic>S</italic><sub><italic>sum</italic></sub>(<italic>patient</italic>200 &#x02192; <italic>patient</italic>200) &#x0003D; 9; therefore, by Equation (1), <italic>s</italic>(<italic>patient</italic>231, <italic>patient</italic>200) &#x0003D; 2 &#x000D7; 6/(22 &#x0002B; 9) &#x0003D; 12/31.</p>
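<p>Equation (1) and the two worked examples above can be reproduced with a minimal sketch. It assumes the path scores have already been enumerated from the annotated HIN (the enumeration is not shown), and returning 0 when both self-sums are zero is an added convention that the definition does not specify. The function names are hypothetical.</p>
<preformat preformat-type="code">
```python
from fractions import Fraction


def s_sum(path_scores: list[int]) -> int:
    """S_sum: sum of the scores S(P) of all weighted meta paths between two objects."""
    return sum(path_scores)


def s_pathsim(s_xy: int, s_xx: int, s_yy: int) -> Fraction:
    """Equation (1): 2 * S_sum(x->y) / (S_sum(x->x) + S_sum(y->y))."""
    denominator = s_xx + s_yy
    if denominator == 0:
        return Fraction(0)  # assumption: objects with no paths get similarity 0
    return Fraction(2 * s_xy, denominator)


# Two paths with S(Pa) = 4 and S(Pb) = 3 give S_sum = 7.
print(s_sum([4, 3]))  # 7

# Patients 231 and 200, using the S_sum values stated in the text.
print(s_pathsim(s_xy=6, s_xx=22, s_yy=9))  # 12/31
```
</preformat>
<p>Exact fractions keep the result in the same form as the hand calculation.</p>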
<p>As mentioned before, <italic>S</italic>(<italic>P</italic>) can be interpreted as the number of connection paths between the first and last objects of the weighted meta path <italic>P</italic>. The more connection paths two objects share, the more similar they can be considered. However, using the raw number of paths as the criterion biases the results toward highly visible objects, that is, objects with many connections. We therefore use the number of connection paths from each of the two objects to itself as a normalization factor. This idea underlies PathSim; here we extend it to the annotated HIN, yielding S-PathSim.</p>
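<p>The bias that this normalization corrects can be illustrated with hypothetical numbers that are not taken from the dataset: a highly visible object y_hub shares more raw paths with x than a less connected object y_peer, yet S-PathSim ranks y_peer higher because the large self-path count of y_hub inflates the denominator.</p>
<preformat preformat-type="code">
```python
from fractions import Fraction


def s_pathsim(s_xy: int, s_xx: int, s_yy: int) -> Fraction:
    """Equation (1): 2 * S_sum(x->y) / (S_sum(x->x) + S_sum(y->y))."""
    return Fraction(2 * s_xy, s_xx + s_yy)


# Hypothetical path-score sums for a query object x and two candidates.
s_xx = 4
candidates = {
    "y_hub": {"s_xy": 6, "s_yy": 100},  # many connections overall
    "y_peer": {"s_xy": 3, "s_yy": 4},   # few connections, mostly shared with x
}

# Ranking by raw path count favours the highly visible object ...
by_raw = max(candidates, key=lambda y: candidates[y]["s_xy"])
# ... whereas S-PathSim normalizes by each object's paths to itself.
by_sim = max(
    candidates,
    key=lambda y: s_pathsim(candidates[y]["s_xy"], s_xx, candidates[y]["s_yy"]),
)

print(by_raw, by_sim)                                   # y_hub y_peer
print(s_pathsim(6, s_xx, 100), s_pathsim(3, s_xx, 4))   # 3/26 3/4
```
</preformat>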
<p>Properties of S-PathSim:</p>
<list list-type="bullet">
<list-item><p>(1) Symmetric: <italic>s</italic>(<italic>x, y</italic>) &#x0003D; <italic>s</italic>(<italic>y, x</italic>). Because the weighted meta path is symmetric, every path from <italic>x</italic> to <italic>y</italic> traversed in reverse is a path from <italic>y</italic> to <italic>x</italic> with the same score, so <italic>S</italic><sub><italic>sum</italic></sub>(<italic>P</italic><sub><italic>x</italic>&#x02192;<italic>y</italic></sub>) &#x0003D; <italic>S</italic><sub><italic>sum</italic></sub>(<italic>P</italic><sub><italic>y</italic>&#x02192;<italic>x</italic></sub>), and hence <italic>s</italic>(<italic>x, y</italic>) &#x0003D; <italic>s</italic>(<italic>y, x</italic>).</p></list-item>
<list-item><p>(2) Self-maximum: <italic>s</italic>(<italic>x, y</italic>) &#x02208; [0, 1], and <italic>s</italic>(<italic>x, x</italic>) &#x0003D; 1. The weighted meta path templates <italic>mn</italic> and <italic>nm</italic> can be concatenated into a new weighted meta path template <italic>mnm</italic>. Let <italic>mnm</italic><sub><italic>i</italic></sub> be the <italic>i</italic>th path of the template <italic>mnm</italic>; as mentioned before, <italic>S</italic>(<italic>mnm</italic><sub><italic>i</italic></sub>) &#x0003D; <italic>S</italic>(<italic>mn</italic><sub><italic>i</italic></sub>) &#x0002A; <italic>S</italic>(<italic>nm</italic><sub><italic>i</italic></sub>). Denote the <italic>k</italic>th path of the template <italic>mn</italic> starting from <italic>x</italic> by <italic>a</italic><sub><italic>k</italic></sub> and the <italic>k</italic>th path of the template <italic>nm</italic> ending at <italic>y</italic> by <italic>b</italic><sub><italic>k</italic></sub>, indexed so that, for <italic>k</italic> &#x0003D; 1, &#x02026;, <italic>p</italic>, <italic>a</italic><sub><italic>k</italic></sub> and <italic>b</italic><sub><italic>k</italic></sub> concatenate into a path from <italic>x</italic> to <italic>y</italic>. Then <inline-formula><mml:math id="M6"><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:munderover><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>*</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>; similarly, <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:munderover><mml:mi>S</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M8"><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:munderover><mml:mi>S</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>. Because every path from <italic>x</italic> to <italic>y</italic> pairs some <italic>a</italic><sub><italic>k</italic></sub> with some <italic>b</italic><sub><italic>k</italic></sub>, <italic>p</italic> &#x02264; <italic>q</italic> and <italic>p</italic> &#x02264; <italic>o</italic>. Since 2<italic>S</italic>(<italic>a</italic><sub><italic>k</italic></sub>)<italic>S</italic>(<italic>b</italic><sub><italic>k</italic></sub>) &#x02264; <italic>S</italic>(<italic>a</italic><sub><italic>k</italic></sub>)<sup>2</sup> &#x0002B; <italic>S</italic>(<italic>b</italic><sub><italic>k</italic></sub>)<sup>2</sup> for every <italic>k</italic>, <inline-formula><mml:math id="M9"><mml:mn>2</mml:mn><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:munderover><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>*</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02264;</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:munderover><mml:mi>S</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:munderover><mml:mi>S</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x02264;</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:munderover><mml:mi>S</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:munderover><mml:mi>S</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, so <italic>s</italic>(<italic>x, y</italic>) &#x02264; 1. Because all scores are non-negative, <italic>s</italic>(<italic>x, y</italic>) &#x02265; 0; hence <italic>s</italic>(<italic>x, y</italic>) &#x02208; [0, 1], and setting <italic>y</italic> &#x0003D; <italic>x</italic> in Equation (1) gives <italic>s</italic>(<italic>x, x</italic>) &#x0003D; 1.
In the above formulas, <italic>p</italic> is the number of weighted meta paths between <italic>x</italic> and <italic>y</italic>, <italic>q</italic> is the number between <italic>x</italic> and itself, and <italic>o</italic> is the number between <italic>y</italic> and itself.</p></list-item>
</list>
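<p>The argument for both properties can be spot-checked numerically. This is a sanity check rather than a proof, under the same indexing assumption as property (2): the first p entries of the score lists a and b are the pairs that join into paths from x to y. The random non-negative integer scores are synthetic, and the helper s_from_scores is hypothetical.</p>
<preformat preformat-type="code">
```python
import random


def s_from_scores(a: list[int], b: list[int], p: int) -> float:
    """Hypothetical helper mirroring the proof of property (2).

    a[k] plays the role of S(a_k) (q entries), b[k] of S(b_k) (o entries),
    and the first p entries of a and b are the pairs that join into x-to-y paths.
    """
    s_xy = sum(a[k] * b[k] for k in range(p))  # S_sum(P_x->y)
    s_xx = sum(v * v for v in a)                # S_sum(P_x->x)
    s_yy = sum(v * v for v in b)                # S_sum(P_y->y)
    return 2 * s_xy / (s_xx + s_yy)


random.seed(0)
for _ in range(1000):
    q, o = random.randint(1, 8), random.randint(1, 8)
    p = random.randint(0, min(q, o))            # p cannot exceed q or o
    a = [random.randint(1, 5) for _ in range(q)]
    b = [random.randint(1, 5) for _ in range(o)]
    s = s_from_scores(a, b, p)
    assert s >= 0.0 and 1.0 >= s                # s(x, y) lies in [0, 1]
    assert s == s_from_scores(b, a, p)          # property (1): symmetry
    assert s_from_scores(a, a, q) == 1.0        # property (2): s(x, x) = 1
print("all checks passed")
```
</preformat>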
</sec>
<sec>
<title>4.3. Temporal Information Encoding</title>
<p>Temporal information is critical to understanding patients&#x00027; dynamics. However, the AHIN described above cannot capture temporal information. To address this problem, we propose N-disease, a method that embeds temporal information into the AHIN.</p>
<p>N-disease is inspired by the N-gram model from natural language processing. Its basic idea is to arrange each patient&#x00027;s diseases into a time series according to when they developed, collect the N-grams sequentially from this disease sequence, and then replace the disease objects in the annotated HIN with the disease N-grams. Suppose that P<sub>1</sub> has the diseases [<italic>D</italic><sub>1</sub>, <italic>D</italic><sub>2</sub>, <italic>D</italic><sub>3</sub>] and P<sub>2</sub> has the diseases [<italic>D</italic><sub>2</sub>, <italic>D</italic><sub>3</sub>, <italic>D</italic><sub>4</sub>]; the results of the 2-disease and 3-disease operations are shown in <xref ref-type="fig" rid="F7">Figure 7</xref>. In fact, the annotated patient HIN in <xref ref-type="fig" rid="F4">Figure 4</xref> is equivalent to the result of the 1-disease operation.</p>
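<p>The N-disease operation amounts to ordinary N-gram extraction over a chronologically ordered disease list. The following is a minimal sketch, assuming each diagnosis comes with a comparable onset timestamp (a hypothetical integer here); the function names are illustrative, and replacing the disease nodes of the annotated HIN with the resulting tuples is not shown.</p>
<preformat preformat-type="code">
```python
def disease_sequence(records: list[tuple[int, str]]) -> list[str]:
    """Order a patient's diagnoses by onset time (hypothetical integer timestamps)."""
    return [disease for _, disease in sorted(records)]


def n_disease(diseases: list[str], n: int) -> list[tuple[str, ...]]:
    """Collect disease N-grams, in order, from a chronologically ordered disease list."""
    if 1 > n:
        raise ValueError("n must be at least 1")
    return [tuple(diseases[i:i + n]) for i in range(len(diseases) - n + 1)]


p1 = disease_sequence([(3, "D3"), (1, "D1"), (2, "D2")])
p2 = ["D2", "D3", "D4"]
print(n_disease(p1, 2))  # [('D1', 'D2'), ('D2', 'D3')]
print(n_disease(p2, 2))  # [('D2', 'D3'), ('D3', 'D4')]
print(n_disease(p1, 3))  # [('D1', 'D2', 'D3')]
```
</preformat>
<p>With n = 1 the function returns the individual diseases, which matches the statement that the HIN in Figure 4 corresponds to the 1-disease operation.</p>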
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Annotated HIN after <bold>(A)</bold> 2-disease and <bold>(B)</bold> 3-disease operations.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcell-09-735687-g0007.tif"/>
</fig>
<p>It should be noted that as <italic>N</italic> increases, the disease-drug connections of each patient in the annotated HIN become less accurate. As shown in <xref ref-type="fig" rid="F7">Figure 7A</xref>, when the node [<italic>D</italic><sub>1</sub>, <italic>D</italic><sub>2</sub>] is connected to a drug, it is unclear whether the drug was used to treat disease <italic>D</italic><sub>1</sub> or disease <italic>D</italic><sub>2</sub>. Fortunately, varying <italic>N</italic> allows a trade-off between accuracy and temporal information.</p>
</sec>
<sec>
<title>4.4. MBH and MBHT</title>
<p>Retrieving the top-k similar patients of a specified patient has practical significance: it allows doctors to analyze similar patients and thereby provide better treatment options. We have introduced S-PathSim, a similarity measure based on the annotated HIN, and N-disease, a method for embedding temporal information. In this section, we use these definitions to define two patient similarity search methods, MBH and MBHT.</p>
<p>MBH is a method based on the annotated HIN. First, the annotated HIN is constructed from the patients&#x00027; medical record information. Then, for a specified patient, S-PathSim is used to calculate patient similarity, and the top-k similar patients are returned.</p>
<p>MBHT is a method based on the annotated HIN and temporal information. Unlike MBH, MBHT applies N-disease when constructing the annotated HIN from the patients&#x00027; medical record information, thereby embedding temporal information into it; S-PathSim is then used to calculate patient similarity, and the top-k similar patients are returned.</p>
<p>MBHT is thus the combination of N-disease and MBH; when <italic>N</italic> = 1, MBHT reduces to MBH. MBHT exploits the temporal information in patients&#x00027; medical records at some cost in accuracy, so a trade-off between temporal information and accuracy is needed.</p>
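<p>Both methods end with the same retrieval step: score every other patient against the query and keep the k highest. A minimal sketch of that step follows. The similarity function is passed in as a parameter (a hypothetical stand-in for S-PathSim), so the same code serves MBH and MBHT, which differ only in how the annotated HIN behind the similarity is built. The default k = 10 mirrors the experimental setting in Section 5.2.</p>
<preformat preformat-type="code">
```python
import heapq
from typing import Callable


def top_k_similar(
    query: str,
    patients: list[str],
    similarity: Callable[[str, str], float],
    k: int = 10,
) -> list[tuple[str, float]]:
    """Return the k patients most similar to `query`, best first.

    `similarity` stands in for S-PathSim computed on the annotated HIN (MBH)
    or on the annotated HIN after N-disease (MBHT).
    """
    scored = ((p, similarity(query, p)) for p in patients if p != query)
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy similarity table (hypothetical values, not from the dataset).
toy = {("u", "a"): 0.5, ("u", "b"): 0.9, ("u", "c"): 0.2}
result = top_k_similar("u", ["u", "a", "b", "c"], lambda x, y: toy[(x, y)], k=2)
print(result)  # [('b', 0.9), ('a', 0.5)]
```
</preformat>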
</sec>
</sec>
<sec id="s5">
<title>5. Simulation Experiments and Results Analysis</title>
<sec>
<title>5.1. Data Description</title>
<p>We perform experiments on a real-world dataset that mainly contains each patient&#x00027;s medical treatments and drug details. Each patient has multiple records (<italic>n</italic> &#x0003E; 2), and each record contains a diagnosis (an ICD-10 code) and information about multiple drugs. To improve the quality of the experiments, we randomly divided the data into four sub-datasets, which are described in <xref ref-type="table" rid="T2">Table 2</xref>. In addition, we did not apply any other preprocessing (such as removing diseases with fewer than five patients), so our experiments are performed on real-world data without any unjustified data manipulation.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Description of datasets.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Sub-dataset</bold></th>
<th valign="top" align="center"><bold>Dataset A</bold></th>
<th valign="top" align="center"><bold>Dataset B</bold></th>
<th valign="top" align="center"><bold>Dataset C</bold></th>
<th valign="top" align="center"><bold>Dataset D</bold></th>
<th valign="top" align="center"><bold>Total</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Number of patients</td>
<td valign="top" align="center">13,461</td>
<td valign="top" align="center">13,461</td>
<td valign="top" align="center">13,461</td>
<td valign="top" align="center">13,460</td>
<td valign="top" align="center">53,853</td>
</tr>
<tr>
<td valign="top" align="left">Disease types</td>
<td valign="top" align="center">946</td>
<td valign="top" align="center">953</td>
<td valign="top" align="center">946</td>
<td valign="top" align="center">943</td>
<td valign="top" align="center">1,928</td>
</tr>
<tr>
<td valign="top" align="left">Drug types</td>
<td valign="top" align="center">1,400</td>
<td valign="top" align="center">1,412</td>
<td valign="top" align="center">1,403</td>
<td valign="top" align="center">1,390</td>
<td valign="top" align="center">2,217</td>
</tr>
<tr>
<td valign="top" align="left">Number of diagnoses per capita</td>
<td valign="top" align="center">6.653</td>
<td valign="top" align="center">6.540</td>
<td valign="top" align="center">6.647</td>
<td valign="top" align="center">6.456</td>
<td valign="top" align="center">6.620</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>5.2. Experimental Settings</title>
<p>In practice, comparative analysis is often performed by retrieving the top-k similar patients of a designated patient to support clinical decision-making. Accordingly, we evaluate the models by retrieving the top-k similar patients of specified patients, with <italic>k</italic> = 10, and use two metrics for quantitative evaluation.</p>
<p><italic>nDCG</italic> (normalized Discounted Cumulative Gain; values range from 0 to 1, and higher is better) (Zhang et al., <xref ref-type="bibr" rid="B31">2020</xref>) measures ranking quality. Its main idea is that items the user prefers should appear near the top of the ranked list rather than further down, which improves the user experience. It is obtained by normalizing <italic>DCG</italic> (Discounted Cumulative Gain), where <italic>rel</italic><sub><italic>i</italic></sub> is the relevance of the result at position <italic>i</italic>, <italic>p</italic> is the number of ranked results, and <italic>IDCG</italic> is the <italic>DCG</italic> of the ideal ranking.</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>D</mml:mi><mml:mi>C</mml:mi><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mfrac><mml:mrow><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>n</mml:mi><mml:mi>D</mml:mi><mml:mi>C</mml:mi><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>D</mml:mi><mml:mi>C</mml:mi><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>D</mml:mi><mml:mi>C</mml:mi><mml:mi>G</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
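<p>Equations (2) and (3) can be computed directly, as in the minimal sketch below. Here IDCG is obtained by re-sorting the retrieved relevance values in descending order, which is one common convention; the article does not state whether the ideal ranking is taken over the retrieved list or over all candidates, so treat that choice as an assumption. The relevance values in the example are made up.</p>
<preformat preformat-type="code">
```python
import math


def dcg(rels: list[float]) -> float:
    """Equation (2): rel_i discounted by log2(i + 1), positions starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))


def ndcg(rels: list[float]) -> float:
    """Equation (3): DCG divided by the DCG of the ideal (descending) ordering."""
    idcg = dcg(sorted(rels, reverse=True))  # assumption: ideal = retrieved list re-sorted
    return dcg(rels) / idcg if idcg > 0 else 0.0


# Relevance of the top-k retrieved patients, in retrieved order (made-up values).
print(ndcg([3, 2, 1]))  # 1.0, already in ideal order
print(ndcg([1, 2, 3]))  # below 1.0, best result ranked last
```
</preformat>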
<p>The <italic>HL</italic> (half-life utility) index (Sarwar et al., <xref ref-type="bibr" rid="B16">2001</xref>) assumes that the probability that a user views an item decreases exponentially with the item&#x00027;s position in the recommendation list. It measures how useful a ranked list is to a user: each retrieved item contributes the amount by which its true rating exceeds a default rating, discounted according to its rank. Hence <italic>HL</italic> can also be used to evaluate top-k search results.</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>H</mml:mi><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mfrac><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>d</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Here, <italic>r</italic><sub><italic>ua</italic></sub> is the true similarity between patients <italic>u</italic> and <italic>a</italic>; <italic>d</italic> is the default score, which we set to the average similarity; <italic>l</italic><sub><italic>ua</italic></sub> is the rank of patient <italic>a</italic> in the list retrieved for patient <italic>u</italic>; and <italic>h</italic> is the half-life of the system, that is, the list position that the user views with 50% probability. We set <italic>h</italic> = 3.</p>
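<p>Equation (4) for a single query patient u can be sketched as follows. The input is the list of true similarities r_ua in the order the method ranked the patients, so the 1-based position gives l_ua; d is passed in (the text sets it to the average similarity, and averaging over the retrieved list is an assumption of this sketch), and h defaults to 3 as in the experiments. The function name is hypothetical.</p>
<preformat preformat-type="code">
```python
def half_life_utility(true_sims: list[float], d: float, h: float = 3.0) -> float:
    """Equation (4) for one query patient u.

    true_sims[j] is r_ua for the patient ranked at position j + 1, so the
    1-based position is l_ua. d is the default score and h the half-life.
    """
    return sum(
        max(r - d, 0.0) / 2 ** ((rank - 1) / (h - 1))
        for rank, r in enumerate(true_sims, start=1)
    )


ranked = [0.9, 0.5, 0.7]          # made-up true similarities in ranked order
d = sum(ranked) / len(ranked)     # assumption: "average similarity" over this list
print(half_life_utility(ranked, d))
```
</preformat>
<p>With h = 3, an item at rank 3 is weighted by one half, which is what makes h the half-life.</p>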
<p>To verify the effectiveness of MBH, which is based on S-PathSim, we compared it with a similarity search method based on PathSim. To explore the effect of N-disease on the results, we set <italic>N</italic> to 1, 2, 3, and 4 and compared the resulting MBHT performance. Finally, we explored the effect of N-disease on algorithm efficiency. All experiments were run on an Intel Core i5 CPU (2.80 GHz) with 4 GB of memory.</p>
</sec>
<sec>
<title>5.3. Comparison of Patient Similarity Search Method</title>
<p>This article proposes the annotated HIN and S-PathSim, and defines MBH, a patient similarity search method based on both. PathSim is a well-established HIN-based measure of similarity between objects of the same type and can also be used to retrieve similar patients. We therefore compare MBH with a PathSim-based method to verify the effectiveness of MBH:</p>
<list list-type="order">
<list-item><p>MBH: Map the patient information to the annotated HIN, whose schema is shown in <xref ref-type="fig" rid="F4">Figure 4</xref>; use S-PathSim along the weighted meta path shown in <xref ref-type="fig" rid="F5">Figure 5B</xref> to measure patient similarity; and retrieve the top-k similar patients of each specified patient.</p></list-item>
<list-item><p>Baseline: Map the patient information to a HIN, whose schema is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>; use PathSim along the meta path (<italic>PDMDP</italic>) to calculate patient similarity; and retrieve the top-k similar patients of each specified patient.</p></list-item>
</list>
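<p>For reference, the PathSim baseline along PDMDP can be computed with a commuting matrix, as in the original PathSim formulation: multiply the adjacency matrices along the meta path and normalize by the diagonal. This sketch assumes unweighted (0/1) patient-disease and disease-drug adjacency matrices; the function name and the toy matrices are made up for illustration and do not come from the dataset.</p>
<preformat preformat-type="code">
```python
import numpy as np


def pathsim_pdmdp(w_pd: np.ndarray, w_dm: np.ndarray) -> np.ndarray:
    """PathSim for all patient pairs along the symmetric meta path PDMDP.

    w_pd: patients x diseases adjacency; w_dm: diseases x drugs adjacency.
    The commuting matrix M = W_PD W_DM W_DM^T W_PD^T counts path instances.
    """
    w_pm = w_pd @ w_dm                  # P-D-M path counts
    m = w_pm @ w_pm.T                   # P-D-M-D-P path counts (commuting matrix)
    diag = np.diag(m)
    denom = diag[:, None] + diag[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, 2 * m / denom, 0.0)


# Toy HIN: 3 patients, 2 diseases, 3 drugs (binary links, made up).
w_pd = np.array([[1, 0], [1, 1], [0, 1]])
w_dm = np.array([[1, 1, 0], [0, 1, 1]])
sim = pathsim_pdmdp(w_pd, w_dm)
print(sim)  # s(p0, p1) = 0.75, s(p0, p2) = 0.5, s(p1, p2) = 0.75
```
</preformat>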
<p>Both methods were run on each of the four sub-datasets, which reduces the influence of chance results from any single split.</p>
<p><xref ref-type="fig" rid="F8">Figure 8</xref> shows the results of the two models on the four sub-datasets. With <italic>nDCG</italic> as the evaluation criterion (<xref ref-type="fig" rid="F8">Figure 8A</xref>), MBH outperforms the baseline on all four datasets. With <italic>HL</italic> as the evaluation criterion (<xref ref-type="fig" rid="F8">Figure 8B</xref>), MBH also scores higher than the baseline, indicating better practical utility.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Comparison result of MBH (method based on annotated HIN) and baseline. <bold>(A)</bold> uses nDCG as the evaluation criterion and <bold>(B)</bold> uses HL as the evaluation criterion.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcell-09-735687-g0008.tif"/>
</fig>
</sec>
<sec>
<title>5.4. The Impact of N-Disease</title>
<p>We propose N-disease to embed temporal information into the annotated HIN; MBH and MBHT differ only in whether N-disease is used. In this section, we compare MBH with MBHT and examine the effect of N-disease on MBHT, setting <italic>N</italic> to 1, 2, 3, and 4. When <italic>N</italic> = 1, the annotated HIN contains no temporal information and MBHT reduces to MBH; when <italic>N</italic> = 4, the annotated HIN contains the most temporal information. However, beyond a certain value of <italic>N</italic>, further increasing <italic>N</italic> adds temporal information but makes the disease-drug connections less accurate, so patient similarity search performance declines. <italic>N</italic> should therefore be chosen carefully.</p>
<p>The experimental results are shown in <xref ref-type="table" rid="T3">Table 3</xref>. On datasets A, B, and D, MBHT performs best when <italic>N</italic> = 2; on dataset C, it performs best when <italic>N</italic> = 3. Averaged over the four datasets, <italic>N</italic> = 2 gives the best result. Overall, <italic>N</italic> = 2 yields the best MBHT performance and offers a good balance between the temporal information captured and the accuracy of the annotated HIN. The results also show that, with a suitable <italic>N</italic>, MBHT outperforms MBH (<italic>N</italic> = 1) on every dataset.</p>
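<p>The per-dataset conclusions above can be read off Table 3 mechanically. The values below are copied from the table; the script only finds the best N for each dataset and checks whether that best MBHT setting beats N = 1 (MBH).</p>
<preformat preformat-type="code">
```python
# nDCG values copied from Table 3 (outer keys: datasets, inner keys: N).
ndcg_table = {
    "A": {1: 0.696, 2: 0.704, 3: 0.695, 4: 0.694},
    "B": {1: 0.855, 2: 0.859, 3: 0.807, 4: 0.761},
    "C": {1: 0.778, 2: 0.767, 3: 0.800, 4: 0.791},
    "D": {1: 0.825, 2: 0.837, 3: 0.799, 4: 0.787},
}

# Best N per dataset, and whether that setting beats MBH (N = 1).
best_n = {ds: max(row, key=row.get) for ds, row in ndcg_table.items()}
beats_mbh = {ds: ndcg_table[ds][n] > ndcg_table[ds][1] for ds, n in best_n.items()}

print(best_n)     # {'A': 2, 'B': 2, 'C': 3, 'D': 2}
print(beats_mbh)  # {'A': True, 'B': True, 'C': True, 'D': True}
```
</preformat>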
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>The effect of N-disease on MBHT, measured by <italic>nDCG</italic> (normalized discounted cumulative gain).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Dataset</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 1</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 2</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 3</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 4</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Dataset A</td>
<td valign="top" align="center">0.696</td>
<td valign="top" align="center"><bold>0.704</bold></td>
<td valign="top" align="center">0.695</td>
<td valign="top" align="center">0.694</td>
</tr>
<tr>
<td valign="top" align="left">Dataset B</td>
<td valign="top" align="center">0.855</td>
<td valign="top" align="center"><bold>0.859</bold></td>
<td valign="top" align="center">0.807</td>
<td valign="top" align="center">0.761</td>
</tr>
<tr>
<td valign="top" align="left">Dataset C</td>
<td valign="top" align="center">0.778</td>
<td valign="top" align="center">0.767</td>
<td valign="top" align="center"><bold>0.800</bold></td>
<td valign="top" align="center">0.791</td>
</tr>
<tr>
<td valign="top" align="left">Dataset D</td>
<td valign="top" align="center">0.825</td>
<td valign="top" align="center"><bold>0.837</bold></td>
<td valign="top" align="center">0.799</td>
<td valign="top" align="center">0.787</td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left">Mean</td>
<td valign="top" align="center">0.788</td>
<td valign="top" align="center"><bold>0.791</bold></td>
<td valign="top" align="center">0.775</td>
<td valign="top" align="center">0.758</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Note: The bold values are the best results</italic>.</p>
</table-wrap-foot>
</table-wrap>
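<p>The per-dataset comparison can be checked directly against <xref ref-type="table" rid="T3">Table 3</xref>. The short script below re-enters the table values as printed. It then selects the best <italic>N</italic> for each dataset and for the column means, and it reports how much the best MBHT setting gains over MBH (<italic>N</italic> = 1). The dictionary and helper names are illustrative. The means are recomputed from the rounded table entries, so they can differ from the published mean row in the last digit.</p>
<preformat preformat-type="code">
```python
# nDCG values from Table 3; list positions correspond to N = 1, 2, 3, 4.
NDCG = {
    "A": [0.696, 0.704, 0.695, 0.694],
    "B": [0.855, 0.859, 0.807, 0.761],
    "C": [0.778, 0.767, 0.800, 0.791],
    "D": [0.825, 0.837, 0.799, 0.787],
}
N_VALUES = [1, 2, 3, 4]


def best_n(scores):
    """Return the N with the highest nDCG."""
    return max(zip(scores, N_VALUES))[1]


for name, scores in NDCG.items():
    gain = max(scores) - scores[0]  # best MBHT setting minus MBH (N = 1)
    print(f"Dataset {name}: best N = {best_n(scores)}, gain over MBH = {gain:.3f}")

column_means = [sum(col) / len(col) for col in zip(*NDCG.values())]
print("Best N on average:", best_n(column_means))  # 2
```
</preformat>
<p>The best <italic>N</italic> is 2 for datasets A, B, and D and 3 for dataset C. Every gain is positive, which is consistent with MBHT outperforming MBH when <italic>N</italic> is chosen well.</p>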
<p>We next examine the effect of N-disease on the efficiency of MBHT. Running times are normalized so that the running time without N-disease (i.e., <italic>N</italic> = 1) equals 1. The results are listed in <xref ref-type="table" rid="T4">Table 4</xref>.</p>
<p>As <xref ref-type="table" rid="T4">Table 4</xref> shows, N-disease improves the efficiency of the algorithm, and the improvement is greatest when <italic>N</italic> = 2. N-disease changes the number of nodes in the annotated HIN and the relationships between them, which in turn changes the efficiency of the algorithm. We explain this effect from an implementation standpoint as follows.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>The effect of N-disease on efficiency, given as running time normalized to that at <italic>N</italic> = 1 (lower is faster).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Dataset</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 1</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 2</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 3</bold></th>
<th valign="top" align="center"><bold><italic>N</italic> &#x0003D; 4</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Dataset A</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.918</td>
<td valign="top" align="center">0.978</td>
<td valign="top" align="center">0.972</td>
</tr>
<tr>
<td valign="top" align="left">Dataset B</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.828</td>
<td valign="top" align="center">0.902</td>
<td valign="top" align="center">0.893</td>
</tr>
<tr>
<td valign="top" align="left">Dataset C</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.767</td>
<td valign="top" align="center">0.957</td>
<td valign="top" align="center">0.893</td>
</tr>
<tr>
<td valign="top" align="left">Dataset D</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.893</td>
<td valign="top" align="center">0.929</td>
<td valign="top" align="center">0.954</td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left">Mean</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0.883</td>
<td valign="top" align="center">0.941</td>
<td valign="top" align="center">0.928</td>
</tr>
</tbody>
</table>
</table-wrap>
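<p>The normalized times in <xref ref-type="table" rid="T4">Table 4</xref> translate directly into time savings relative to MBH. The sketch below re-enters the per-dataset rows as printed. It finds the fastest <italic>N</italic> for each dataset and expresses the saving as a percentage of the running time at <italic>N</italic> = 1. The names <monospace>TIME</monospace> and <monospace>fastest_n</monospace> are illustrative.</p>
<preformat preformat-type="code">
```python
# Normalized running times from Table 4; positions correspond to N = 1, 2, 3, 4.
TIME = {
    "A": [1.0, 0.918, 0.978, 0.972],
    "B": [1.0, 0.828, 0.902, 0.893],
    "C": [1.0, 0.767, 0.957, 0.893],
    "D": [1.0, 0.893, 0.929, 0.954],
}


def fastest_n(times):
    """Return the N (1-based position) with the smallest normalized running time."""
    return min(range(len(times)), key=times.__getitem__) + 1


for name, times in TIME.items():
    n = fastest_n(times)
    saving = 1.0 - times[n - 1]  # fraction of the N = 1 running time saved
    print(f"Dataset {name}: fastest at N = {n}, {saving:.1%} less time than MBH")
```
</preformat>
<p>Every dataset runs fastest at <italic>N</italic> = 2. The savings range from 8.2% on dataset A to 23.3% on dataset C.</p>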
<p>In our implementation, MBHT runs in two steps: data statistics, followed by S-PathSim calculation. MBHT performs more data-statistics work than MBH alone, but in practice the time spent on this step is small enough to be negligible. Calculating S-PathSim involves many multiplications, which account for most of the total running time. We found that when <italic>N</italic> = 2, the number of multiplications is significantly smaller than when MBH is used alone. This explains why the program runs faster with <italic>N</italic> = 2 than with MBH alone.</p>
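<p>The classic PathSim of <xref ref-type="bibr" rid="B22">Sun and Han (2011)</xref> can illustrate how network structure determines the number of multiplications. On a symmetric Patient&#x02013;Disease&#x02013;Patient meta path, the commuting matrix is <italic>M</italic> = <italic>AA</italic><sup>T</sup>, and PathSim(<italic>x</italic>, <italic>y</italic>) = 2<italic>M</italic><sub><italic>xy</italic></sub>/(<italic>M</italic><sub><italic>xx</italic></sub> + <italic>M</italic><sub><italic>yy</italic></sub>). The sketch below implements classic PathSim, not S-PathSim, and the toy network and function names are hypothetical. It computes <italic>M</italic> sparsely and counts the scalar multiplications. Each intermediate disease node linked to <italic>d</italic> patients costs <italic>d</italic><sup>2</sup> products, so changing which nodes exist and how links are spread over them changes the total work.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict
from itertools import product


def commuting_matrix(links):
    """Sparse commuting matrix M = A A^T for the meta path Patient-Disease-Patient.

    links maps each patient to {disease: weight}. Returns (M, n_mult), where
    n_mult counts the scalar multiplications performed.
    """
    # Invert the adjacency: for each disease, the patients linked to it.
    by_disease = defaultdict(dict)
    for patient, diseases in links.items():
        for disease, weight in diseases.items():
            by_disease[disease][patient] = weight

    M = defaultdict(float)
    n_mult = 0
    for patients in by_disease.values():
        # Every ordered pair of patients sharing this disease contributes one product.
        for (x, wx), (y, wy) in product(patients.items(), repeat=2):
            M[(x, y)] += wx * wy
            n_mult += 1
    return dict(M), n_mult


def pathsim(M, x, y):
    """Classic PathSim score for a symmetric meta path."""
    denom = M.get((x, x), 0.0) + M.get((y, y), 0.0)
    return 2 * M.get((x, y), 0.0) / denom if denom else 0.0


links = {  # hypothetical toy network: patient -> {disease: link weight}
    "p1": {"d1": 1, "d2": 1},
    "p2": {"d1": 1},
    "p3": {"d2": 1, "d3": 1},
}
M, n_mult = commuting_matrix(links)
print(n_mult)                            # 9 = 2**2 + 2**2 + 1**2
print(round(pathsim(M, "p1", "p2"), 3))  # 0.667
```
</preformat>
<p>A sparse loop is used here rather than a dense matrix product so that the count reflects only the links that actually exist in the network.</p>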
<p>In short, with <italic>N</italic> = 2 the annotated HIN achieves the best balance between running time and accuracy, and N-disease effectively improves the efficiency of the algorithm.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusion</title>
<p>In this paper, we propose a new method for calculating patient similarity that uses patients&#x00027; disease and drug data and models them with the annotated HIN proposed here. The annotated HIN attaches patient information as annotations to the links between diseases and drugs, which solves the problem that a classic HIN loses the information about these associations. Based on the annotated HIN, we further propose S-PathSim to measure patient similarity and N-disease to encode temporal information into the annotated HIN. Our measure does not rely on high-dimensional, sparse vectors, and it effectively captures both the patient&#x00027;s medical events and the temporal information in EHRs. Finally, based on S-PathSim and N-disease, we propose two patient similarity search methods, MBH and MBHT. The experimental results show that the proposed methods outperform a competitive baseline method.</p>
</sec>
<sec sec-type="data-availability" id="s7">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>H-zH designed and conducted the experiments and wrote the paper. X-dL determined the technical route and methods. WG devised the assessment method for the proposed methods. X-bJ surveyed the related work. Z-mY discussed the methods and ideas. S-pW discussed the methods and ideas and submitted the paper. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>This work was supported by the National Key Research and Development Project of China (No. 2019YFB1705904), National Natural Science Foundation of China (Nos. 91846205, 61772316, 61907026), Innovation Methods Work Special Project (No. 2020IM020100), Science and Technology Development Plan Project of Shandong Province (No. 2019JZZY020505), Key Research &#x00026; Development Project of Shandong Province (No. 2019GGX101009), Shandong-Chongqing Technological Collaboration Plan (No. cstc2020jscx-lyjsAX0010), and Project of Shandong Province Higher Educational Science and Technology Program (No. J18KA392).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec> </body>
<back>

<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barkhordari</surname> <given-names>M.</given-names></name> <name><surname>Niamanesh</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>ScaDiPaSi: an effective scalable and distributable mapreduce-based method to find patient similarity on huge healthcare networks</article-title>. <source>Big Data Res</source>. <volume>2</volume>, <fpage>19</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1016/j.bdr.2015.02.004</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chan</surname> <given-names>L. W.-C.</given-names></name> <name><surname>Chan</surname> <given-names>T.-K.</given-names></name> <name><surname>Cheng</surname> <given-names>L.</given-names></name> <name><surname>Mak</surname> <given-names>W.</given-names></name></person-group> (<year>2010</year>). <article-title>Machine learning of patient similarity: a case study on predicting survival in cancer patient after locoregional chemotherapy,</article-title> in <source>IEEE International Conference on Bioinformatics and Biomedicine Workshops</source> (<publisher-loc>Hong Kong</publisher-loc>), <fpage>467</fpage>&#x02013;<lpage>470</lpage>. <pub-id pub-id-type="doi">10.1109/BIBMW.2010.5703846</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Che</surname> <given-names>C.</given-names></name> <name><surname>Xiao</surname> <given-names>C.</given-names></name> <name><surname>Liang</surname> <given-names>J.</given-names></name> <name><surname>Jin</surname> <given-names>B.</given-names></name> <name><surname>Zhou</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name></person-group> (<year>2017</year>). <article-title>An RNN architecture with dynamic temporal matching for personalized predictions of Parkinson&#x00027;s disease,</article-title> in <source>Proceedings of the 2017 SIAM International Conference on Data Mining (SDM)</source> (<publisher-loc>Houston, TX</publisher-loc>), <fpage>198</fpage>&#x02013;<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1137/1.9781611974973.23</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>R.</given-names></name> <name><surname>Huang</surname> <given-names>Z.</given-names></name> <name><surname>Zheng</surname> <given-names>Y.</given-names></name> <name><surname>Yan</surname> <given-names>J.</given-names></name> <name><surname>Wong</surname> <given-names>K. Y.</given-names></name> <name><surname>Ng</surname> <given-names>E.</given-names></name></person-group> (<year>2017</year>). <article-title>Meta structure: computing relevance in large heterogeneous information networks,</article-title> in <source>Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data</source> (<publisher-loc>Beijing</publisher-loc>), <fpage>3</fpage>&#x02013;<lpage>7</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Zhang</surname> <given-names>P.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Risk prediction with electronic health records: a deep learning approach,</article-title> in <source>Proceedings of the 2016 SIAM International Conference on Data Mining</source> (<publisher-loc>Miami, FL</publisher-loc>), <fpage>432</fpage>&#x02013;<lpage>440</lpage>. <pub-id pub-id-type="doi">10.1137/1.9781611974348.49</pub-id><pub-id pub-id-type="pmid">34342588</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gottlieb</surname> <given-names>A.</given-names></name> <name><surname>Stein</surname> <given-names>G. Y.</given-names></name> <name><surname>Ruppin</surname> <given-names>E.</given-names></name> <name><surname>Altman</surname> <given-names>R. B.</given-names></name> <name><surname>Sharan</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>A method for inferring medical diagnoses from patient similarities</article-title>. <source>BMC Med</source>. <volume>11</volume>:<fpage>194</fpage>. <pub-id pub-id-type="doi">10.1186/1741-7015-11-194</pub-id><pub-id pub-id-type="pmid">24004670</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>J.</given-names></name> <name><surname>Bailey</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Exploiting transitive similarity and temporal dynamics for similarity search in heterogeneous information networks,</article-title> in <source>International Conference on Database Systems for Advanced Applications</source> (<publisher-loc>Bali</publisher-loc>), <fpage>141</fpage>&#x02013;<lpage>155</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-05813-9_10</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>L.</given-names></name> <name><surname>Gong</surname> <given-names>Y.</given-names></name> <name><surname>Xing</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Semantic representation with heterogeneous information network using matrix factorization for clustering in the internet of things</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>31233</fpage>&#x02013;<lpage>31242</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2903310</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huai</surname> <given-names>M.</given-names></name> <name><surname>Miao</surname> <given-names>C.</given-names></name> <name><surname>Suo</surname> <given-names>Q.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>Uncorrelated patient similarity learning,</article-title> in <source>Proceedings of the 2018 SIAM International Conference on Data Mining</source> (<publisher-loc>San Diego, CA</publisher-loc>), <fpage>270</fpage>&#x02013;<lpage>278</lpage>. <pub-id pub-id-type="doi">10.1137/1.9781611975321.31</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kasabov</surname> <given-names>N.</given-names></name> <name><surname>Hu</surname> <given-names>Y.</given-names></name></person-group> (<year>2010</year>). <article-title>Integrated optimization method for personalised modelling and case studies for medical decision support</article-title>. <source>Int. J. Funct. Inform. Pers. Med</source>. <volume>3</volume>, <fpage>236</fpage>&#x02013;<lpage>236</lpage>. <pub-id pub-id-type="doi">10.1504/IJFIPM.2010.039123</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>J.</given-names></name> <name><surname>Maslove</surname> <given-names>D. M.</given-names></name> <name><surname>Dubin</surname> <given-names>J. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Personalized mortality prediction driven by electronic medical data and a patient similarity metric</article-title>. <source>PLoS ONE</source> <volume>10</volume>:<fpage>e0127428</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0127428</pub-id><pub-id pub-id-type="pmid">25978419</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name> <name><surname>Xiong</surname> <given-names>Y.</given-names></name> <name><surname>Zheng</surname> <given-names>G.</given-names></name></person-group> (<year>2014</year>). <article-title>An efficient drug-target interaction mining algorithm in heterogeneous biological networks,</article-title> in <source>Pacific-Asia Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>Tainan</publisher-loc>), <fpage>65</fpage>&#x02013;<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-13186-3_7</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Cheng</surname> <given-names>W.-Y.</given-names></name> <name><surname>Glicksberg</surname> <given-names>B. S.</given-names></name> <name><surname>Gottesman</surname> <given-names>O.</given-names></name> <name><surname>Tamler</surname> <given-names>R.</given-names></name> <name><surname>Chen</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Identification of type 2 diabetes subgroups through topological analysis of patient similarity</article-title>. <source>Sci. Transl. Med</source>. <volume>7</volume>:<fpage>311ra174</fpage>. <pub-id pub-id-type="doi">10.1126/scitranslmed.aaa9364</pub-id><pub-id pub-id-type="pmid">26511511</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Meng</surname> <given-names>X.</given-names></name> <name><surname>Shi</surname> <given-names>C.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Wu</surname> <given-names>B.</given-names></name></person-group> (<year>2014</year>). <article-title>Relevance measure in large-scale heterogeneous networks,</article-title> in <source>Asia-Pacific Web Conference</source> (<publisher-loc>Changsha</publisher-loc>), <fpage>636</fpage>&#x02013;<lpage>643</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-11116-2_61</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ng</surname> <given-names>K.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name></person-group> (<year>2015</year>). <article-title>Personalized predictive modeling and risk factor identification using patient similarity</article-title>. <source>AMIA Summits Transl. Sci. Proc</source>. <volume>2015</volume>, <fpage>132</fpage>&#x02013;<lpage>136</lpage>. <pub-id pub-id-type="pmid">26306255</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sarwar</surname> <given-names>B.</given-names></name> <name><surname>Karypis</surname> <given-names>G.</given-names></name> <name><surname>Konstan</surname> <given-names>J.</given-names></name> <name><surname>Riedl</surname> <given-names>J.</given-names></name></person-group> (<year>2001</year>). <article-title>Item-based collaborative filtering recommendation algorithms,</article-title> in <source>Proceedings of the 10th International Conference on World Wide Web</source> (<publisher-loc>Hong Kong</publisher-loc>), <fpage>285</fpage>&#x02013;<lpage>295</lpage>. <pub-id pub-id-type="doi">10.1145/371920.372071</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sha</surname> <given-names>Y.</given-names></name> <name><surname>Venugopalan</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>M. D.</given-names></name></person-group> (<year>2016</year>). <article-title>A novel temporal similarity measure for patients based on irregularly measured data in electronic health records,</article-title> in <source>Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics</source> (<publisher-loc>Seattle, WA</publisher-loc>), <fpage>337</fpage>&#x02013;<lpage>344</lpage>. <pub-id pub-id-type="doi">10.1145/2975167.2975202</pub-id><pub-id pub-id-type="pmid">32577627</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sharafoddini</surname> <given-names>A.</given-names></name> <name><surname>Dubin</surname> <given-names>J. A.</given-names></name> <name><surname>Lee</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Patient similarity in prediction models based on health data: a scoping review</article-title>. <source>JMIR Med. Inform</source>. <volume>5</volume>:<fpage>e7</fpage>. <pub-id pub-id-type="doi">10.2196/medinform.6730</pub-id><pub-id pub-id-type="pmid">28258046</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shi</surname> <given-names>C.</given-names></name> <name><surname>Kong</surname> <given-names>X.</given-names></name> <name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Yu</surname> <given-names>P. S.</given-names></name> <name><surname>Wu</surname> <given-names>B.</given-names></name></person-group> (<year>2014</year>). <article-title>HeteSim: a general framework for relevance measure in heterogeneous networks</article-title>. <source>IEEE Trans. Knowledge Data Eng</source>. <volume>26</volume>, <fpage>2479</fpage>&#x02013;<lpage>2492</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2013.2297920</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Ebadollahi</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). <article-title>Supervised patient similarity measure of heterogeneous patient records</article-title>. <source>ACM SIGKDD Explorat. Newslett</source>. <volume>14</volume>, <fpage>16</fpage>&#x02013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1145/2408736.2408740</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Han</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Rankclus: integrating clustering with ranking for heterogeneous information network analysis,</article-title> in <source>Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology</source> (<publisher-loc>Saint Petersburg</publisher-loc>), <fpage>565</fpage>&#x02013;<lpage>576</lpage>. <pub-id pub-id-type="doi">10.1145/1516360.1516426</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Han</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Pathsim: meta path-based top-k similarity search in heterogeneous information networks,</article-title> in <source>Proceedings of the VLDB Endowment</source> (<publisher-loc>Seattle, WA</publisher-loc>), <fpage>992</fpage>&#x02013;<lpage>1003</lpage>. <pub-id pub-id-type="doi">10.14778/3402707.3402736</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Suo</surname> <given-names>Q.</given-names></name> <name><surname>Zhong</surname> <given-names>W.</given-names></name> <name><surname>Ma</surname> <given-names>F.</given-names></name> <name><surname>Ye</surname> <given-names>Y.</given-names></name> <name><surname>Huai</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Multi-task sparse metric learning for monitoring patient similarity progression,</article-title> in <source>2018 IEEE International Conference on Data Mining (ICDM)</source> (<publisher-loc>Singapore</publisher-loc>), <fpage>477</fpage>&#x02013;<lpage>486</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2018.00063</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tsevas</surname> <given-names>S.</given-names></name> <name><surname>Iakovidis</surname> <given-names>D. K.</given-names></name></person-group> (<year>2011</year>). <article-title>Fusion of multimodal temporal clinical data for the retrieval of similar patient cases,</article-title> in <source>10th International Workshop on Biomedical Engineering</source> (<publisher-loc>Kos</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1109/IWBE.2011.6079049</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2012a</year>). <article-title>Medical prognosis based on patient similarity and expert feedback,</article-title> in <source>Proceedings of the 21st International Conference on Pattern Recognition</source> (<publisher-loc>Tsukuba</publisher-loc>), <fpage>1799</fpage>&#x02013;<lpage>1802</lpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Lee</surname> <given-names>N.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name> <name><surname>Ebadollahi</surname> <given-names>S.</given-names></name></person-group> (<year>2012b</year>). <article-title>Towards heterogeneous temporal clinical event pattern discovery: a convolutional approach,</article-title> in <source>Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> (<publisher-loc>Beijing</publisher-loc>), <fpage>453</fpage>&#x02013;<lpage>461</lpage>. <pub-id pub-id-type="doi">10.1145/2339530.2339605</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>PSF: a unified patient similarity evaluation framework through metric learning with weak supervision</article-title>. <source>IEEE J. Biomed. Health Inform</source>. <volume>19</volume>, <fpage>1053</fpage>&#x02013;<lpage>1060</lpage>. <pub-id pub-id-type="doi">10.1109/JBHI.2015.2425365</pub-id><pub-id pub-id-type="pmid">25910264</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>R.</given-names></name> <name><surname>Ma</surname> <given-names>X.</given-names></name> <name><surname>Jiang</surname> <given-names>C.</given-names></name> <name><surname>Ye</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Heterogeneous information network-based music recommendation system in mobile networks</article-title>. <source>Comput. Commun</source>. <volume>150</volume>, <fpage>429</fpage>&#x02013;<lpage>437</lpage>. <pub-id pub-id-type="doi">10.1016/j.comcom.2019.12.002</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xia</surname> <given-names>E.</given-names></name> <name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Yu</surname> <given-names>Y.</given-names></name> <name><surname>Mei</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>A data-driven clinical decision support system for acute coronary syndrome patient similarity,</article-title> in <source>2019 IEEE International Conference on Healthcare Informatics</source> (<publisher-loc>Xi&#x00027;an</publisher-loc>), <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/ICHI.2019.8904614</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhan</surname> <given-names>M.</given-names></name> <name><surname>Cao</surname> <given-names>S.</given-names></name> <name><surname>Qian</surname> <given-names>B.</given-names></name> <name><surname>Chang</surname> <given-names>S.</given-names></name> <name><surname>Wei</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Low-rank sparse feature selection for patient similarity learning,</article-title> in <source>2016 IEEE 16th International Conference on Data Mining (ICDM)</source> (<publisher-loc>Barcelona</publisher-loc>), <fpage>1335</fpage>&#x02013;<lpage>1340</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2016.0182</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>G.</given-names></name> <name><surname>Yu</surname> <given-names>B.</given-names></name> <name><surname>Xie</surname> <given-names>Y.</given-names></name> <name><surname>Pan</surname> <given-names>K.</given-names></name></person-group> (<year>2020</year>). <article-title>Proximity-aware heterogeneous information network embedding</article-title>. <source>Knowledge Based Syst</source>. <volume>193</volume>:<fpage>105468</fpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2019.105468</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>P.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Sorrentino</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Towards personalized medicine: leveraging patient similarity and drug similarity analytics</article-title>. <source>AMIA Jt. Summits Transl. Sci. Proc</source>. <volume>2014</volume>, <fpage>132</fpage>&#x02013;<lpage>136</lpage>. <pub-id pub-id-type="pmid">25717413</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>Z.</given-names></name> <name><surname>Yin</surname> <given-names>C.</given-names></name> <name><surname>Qian</surname> <given-names>B.</given-names></name> <name><surname>Cheng</surname> <given-names>Y.</given-names></name> <name><surname>Wei</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name></person-group> (<year>2016</year>). <article-title>Measuring patient similarities via a deep architecture with medical concept embedding,</article-title> in <source>2016 IEEE 16th International Conference on Data Mining</source> (<publisher-loc>Barcelona</publisher-loc>), <fpage>749</fpage>&#x02013;<lpage>758</lpage>. <pub-id pub-id-type="doi">10.1109/ICDM.2016.0086</pub-id></citation>
</ref>
</ref-list>

</back>
</article>