<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Med.</journal-id>
<journal-title>Frontiers in Medicine</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Med.</abbrev-journal-title>
<issn pub-type="epub">2296-858X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmed.2023.1237616</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Medicine</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Automating venous thromboembolism risk assessment: a dual-branch deep learning method using electronic medical records</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Yang</surname> <given-names>Jianhua</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2313772/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>He</surname> <given-names>Jianfeng</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1865618/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Zhang</surname> <given-names>Hongjiang</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2396775/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Faculty of Information Engineering and Automation, Kunming University of Science and Technology</institution>, <addr-line>Kunming</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>First People&#x00027;s Hospital of Anning City (Jinfang Branch)</institution>, <addr-line>Anning</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Bing Yang, Tianjin Medical University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Hairui Wang, China Medical University, China; Peng Zhang, Chinese Academy of Sciences, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Jianfeng He <email>jfenghe&#x00040;kmust.edu.cn</email></corresp>
<corresp id="c002">Hongjiang Zhang <email>m18988283534&#x00040;163.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>10</day>
<month>08</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>10</volume>
<elocation-id>1237616</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>06</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>07</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Yang, He and Zhang.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Yang, He and Zhang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<sec>
<title>Background</title>
<p>Venous thromboembolism (VTE) is a prevalent cardiovascular disease. Although risk assessment and preventive measures are effective, manual assessment is inefficient and reaches only a small proportion of patients in clinical practice. Intelligent methods for VTE risk assessment are therefore needed.</p>
</sec>
<sec>
<title>Methods</title>
<p>The Padua scale is widely used for VTE risk assessment. Based on its structure, we divided the assessment into two parts: judgment of the disease category and judgment of comprehensive clinical information. We propose a dual-branch deep learning (DB-DL) assessment method. In the disease category branch, a deep learning-based Padua disease classification model (PDCM) determines the patient&#x00027;s Padua disease category from the diagnosis, the symptoms, and the symptom weights. In the comprehensive clinical information branch, we use Chinese lexical analysis (LAC) word segmentation, combined with a professional corpus and rules, to extract and judge the comprehensive clinical factors in the electronic medical record (EMR).</p>
</sec>
<sec>
<title>Results</title>
<p>We validated the accuracy of the method against the Padua assessment results of 7,690 Chinese clinical EMRs. The method performs a fully automated assessment, with an average assessment time of only 0.37 s per patient. Compared with the gold standard, it achieved an area under the curve (AUC) of 0.883, a specificity of 0.957, and a sensitivity of 0.816 for classifying patients by Padua risk class.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>Our DB-DL assessment method automates VTE risk assessment. It addresses the time cost of manual evaluation and its limited population coverage, and therefore has considerable clinical value.</p>
</sec></abstract>
<kwd-group>
<kwd>venous thromboembolism</kwd>
<kwd>deep learning</kwd>
<kwd>electronic medical record</kwd>
<kwd>intelligent assessment</kwd>
<kwd>Padua</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="7"/>
<equation-count count="5"/>
<ref-count count="45"/>
<page-count count="13"/>
<word-count count="9290"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Intensive Care Medicine and Anesthesiology</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Venous thromboembolism (VTE) is a disease with high morbidity and a high risk of death (<xref ref-type="bibr" rid="B1">1</xref>). An estimated 10 million cases of VTE occur annually worldwide (<xref ref-type="bibr" rid="B2">2</xref>), and the annual incidence of VTE can be as high as 0.2% (<xref ref-type="bibr" rid="B3">3</xref>). VTE has become the third leading cause of cardiovascular disease-related death (<xref ref-type="bibr" rid="B4">4</xref>). Effective prevention can significantly reduce the incidence of VTE, and VTE risk assessment plays a crucial role in clinical practice (<xref ref-type="bibr" rid="B5">5</xref>). However, only a small proportion of patients in China currently receive VTE prophylaxis (<xref ref-type="bibr" rid="B6">6</xref>). Improving VTE prevention therefore remains an important clinical priority.</p>
<p>The current approach to VTE prevention is to assess the patient&#x00027;s risk level with a scale and then apply preventive measures matched to that risk level (<xref ref-type="bibr" rid="B7">7</xref>). Common risk assessment scales include the Padua, Caprini, and Wells scales. Authorities such as the American College of Chest Physicians recommend the Padua scale for VTE risk assessment (<xref ref-type="bibr" rid="B8">8</xref>, <xref ref-type="bibr" rid="B9">9</xref>). The Padua Risk Assessment Scale evaluates a patient&#x00027;s disease category and overall clinical status, including elements such as medication use, height and weight, and surgical status. It combines these elements by linear weighting into a risk score and a risk level. The Padua scale is shown in <xref ref-type="table" rid="T1">Table 1</xref>. Compared with other scales, the Padua scale is highly accurate, relatively easy to apply, and widely applicable. However, manual assessment of Padua risk by doctors is time-consuming. In addition, doctors may overlook the association between a patient&#x00027;s disease and thrombosis. Some risk factors may then be missed, and the patient may not receive the correct prevention protocol (<xref ref-type="bibr" rid="B10">10</xref>). Furthermore, assessment results can vary between doctors because they understand the disease differently. An intelligent, efficient, and automated method for Padua scale assessment is therefore important for VTE prevention.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Padua scale.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Risk factors</bold></th>
<th valign="top" align="center"><bold>Score</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Prior VTE</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Active cancer</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Heart/respiratory failure</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">Acute MI/ischemic stroke</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">Acute infection/rheumatologic disorder</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">Elderly age (&#x02265;70 years)</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">BMI &#x02265;30 kg/m<sup>2</sup></td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">Ongoing hormonal treatment</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">Thrombophilic condition</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Reduced mobility</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Recent (&#x02264;1 month of) trauma and/or surgery</td>
<td valign="top" align="center">2</td>
</tr>
</tbody>
</table>
</table-wrap>
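<p>The following Python sketch illustrates the linear weighting described above by summing the item weights of Table 1 into a Padua score. It is not the authors&#x00027; implementation. The item identifiers are hypothetical names introduced here. The cut-off of 4 points for high risk is the threshold commonly used with the Padua score, not a value stated in this section.</p>

```python
# Minimal sketch (not the authors' code): Padua score as a weighted sum of binary items.
# Item identifiers are hypothetical; weights follow Table 1.
PADUA_WEIGHTS = {
    "prior_vte": 3,
    "active_cancer": 3,
    "reduced_mobility": 3,
    "thrombophilic_condition": 3,
    "recent_trauma_or_surgery": 2,
    "elderly_age": 1,
    "heart_or_respiratory_failure": 1,
    "acute_mi_or_ischemic_stroke": 1,
    "acute_infection_or_rheumatologic_disorder": 1,
    "obesity": 1,
    "hormonal_treatment": 1,
}

# Assumed cut-off: the commonly used Padua threshold (score of 4 or more = high risk).
HIGH_RISK_THRESHOLD = 4


def padua_score(present_items):
    """Sum the weights of the Padua items flagged as present (duplicates counted once)."""
    items = set(present_items)
    unknown = items - set(PADUA_WEIGHTS)
    if unknown:
        raise ValueError(f"Unknown Padua items: {sorted(unknown)}")
    return sum(PADUA_WEIGHTS[item] for item in items)


def padua_risk_level(present_items):
    """Return (score, risk level) using the assumed high-risk threshold."""
    score = padua_score(present_items)
    level = "high" if score >= HIGH_RISK_THRESHOLD else "low"
    return score, level
```

<p>For example, a patient with prior VTE and recent surgery scores 3 + 2 = 5, and the call returns <monospace>(5, "high")</monospace>.</p>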
<p>Artificial intelligence can learn and extract key features from medical data, which allows the analysis and processing of these data to be automated (<xref ref-type="bibr" rid="B11">11</xref>). Some research has applied artificial intelligence techniques to VTE assessment. However, most studies have focused on exploring risk factors for VTE and constructing assessment scales to improve accuracy in different patient populations (<xref ref-type="bibr" rid="B12">12</xref>&#x02013;<xref ref-type="bibr" rid="B15">15</xref>). Few studies have proposed automatic VTE risk assessment methods based on causative factors. For example, Pierre et al. (<xref ref-type="bibr" rid="B10">10</xref>) used the International Classification of Diseases, 9th Revision (ICD-9) to match billing codes in a data warehouse and assessed the Padua scale automatically. This approach requires building a complete enterprise data warehouse (EDW) and transforming electronic medical record (EMR) text into structured data to determine risk factors. Similarly, Qatawneh et al. (<xref ref-type="bibr" rid="B16">16</xref>) transformed the 35 assessment items of the Caprini scale into numerical variables and fed them into a multilayer perceptron (MLP) to assess patients&#x00027; VTE risk automatically. Chen et al. (<xref ref-type="bibr" rid="B17">17</xref>) designed medical text annotations for the items of the Wells scale and assessed the Wells scale automatically by extracting entities and relationships.</p>
<p>These studies have explored automated VTE risk assessment, but their methods have limitations. They require substantial manual intervention and time. They rely only on the patient&#x00027;s diagnosis when assessing disease category. They also ignore the potential influence of the patient&#x00027;s clinical symptoms on the assessment results. Few studies have addressed automated assessment with the Padua scale, which is widely used in clinical internal medicine. This study develops a deep learning-based method that automates the whole Padua risk assessment process. It also aims to improve the validity and accuracy of automated assessment for practical use in VTE prevention. The work may also offer a new direction for applying deep learning in clinical research.</p>
<p>First, we used each patient&#x00027;s EMR as the assessment target. EMR text is an important resource that records a wide range of information about a patient&#x00027;s medical process and is used at all levels of care (<xref ref-type="bibr" rid="B18">18</xref>). The factors associated with the Padua scale can be extracted from the EMR and assessed automatically with deep learning-based natural language processing (NLP) techniques.</p>
<p>Second, we divided the Padua scale assessment into two branches based on the scale&#x00027;s characteristics: the disease category branch (Branch A) and the comprehensive clinical factor branch (Branch B). In Branch A, we proposed the Padua disease classification model (PDCM). PDCM uses NLP techniques to extract features from diagnostic and symptom texts, with the symptom information supplied by Branch B. We also designed an algorithm that computes a symptom weight matrix (SWM), which assigns different importance to different symptoms and so enriches the feature information. A deep learning model then fuses this information to determine the patient&#x00027;s disease category. In Branch B, we used Chinese lexical analysis (LAC) (<xref ref-type="bibr" rid="B19">19</xref>), combined with a professional corpus and rules, to extract and judge the comprehensive clinical factors recorded in the EMR automatically. These factors include patient symptoms, surgery, medication, and activity status, so Padua-related factors no longer need to be extracted manually. In summary, this study aims to automate the Padua scale assessment process completely with a dual-branch method. The approach saves time and reduces labor costs. Because it also considers the influence of patient symptoms when assessing disease categories, it aims to improve the accuracy and precision of intelligent assessment.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and methods</title>
<sec>
<title>Data</title>
<p>In this study, we used data from the International Classification of Diseases, 10th Revision (ICD-10) (<xref ref-type="bibr" rid="B20">20</xref>), ICD-9-CM3, DiseaseKG (<xref ref-type="bibr" rid="B21">21</xref>), and the World Health Organization&#x00027;s Drug List (<xref ref-type="bibr" rid="B22">22</xref>) to construct the proposed dual-branch deep learning (DB-DL) assessment method. To evaluate and test the method, we used an independent hospital EMR dataset whose physician assessments served as the gold standard. The data fall into three categories: PDCM training data, medical corpus data, and EMR test data. Each category is described in the following sections.</p>
<sec>
<title>Padua disease classification model training data (PDCM training data)</title>
<p>ICD-10 (<xref ref-type="bibr" rid="B20">20</xref>) is a library of medical terminology and corresponding codes developed by the World Health Organization, providing an authoritative and widely used classification and coding system for the medical profession. ICD-10 is widely recognized by the medical community for its broad scope of coverage, which can provide a consistent terminology and coding system for medical practitioners and facilitate information sharing and exchange between different medical institutions.</p>
<p>We collated the ICD-10 diagnostic texts according to the disease categories of the Padua scale. The number of diagnostic texts in each category is shown in <xref ref-type="table" rid="T2">Table 2</xref>. The data were then processed as described below so that they better reflect the clinical situation.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Number of ICD-10 diagnostic texts and matched DiseaseKG symptoms for each Padua disease category.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Category</bold></th>
<th valign="top" align="center"><bold>Diagnosis count</bold></th>
<th valign="top" align="center"><bold>Symptom count</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Prior VTE</td>
<td valign="top" align="center">51</td>
<td valign="top" align="center">42</td>
</tr>
<tr>
<td valign="top" align="left">Active cancer</td>
<td valign="top" align="center">1,097</td>
<td valign="top" align="center">248</td>
</tr>
<tr>
<td valign="top" align="left">Heart/respiratory failure</td>
<td valign="top" align="center">27</td>
<td valign="top" align="center">61</td>
</tr>
<tr>
<td valign="top" align="left">Acute MI/ischemic stroke</td>
<td valign="top" align="center">75</td>
<td valign="top" align="center">51</td>
</tr>
<tr>
<td valign="top" align="left">Acute infection/rheumatologic disorder</td>
<td valign="top" align="center">750</td>
<td valign="top" align="center">862</td>
</tr>
<tr>
<td valign="top" align="left">Others</td>
<td valign="top" align="center">20,007</td>
<td valign="top" align="center">11,726</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec>
<title>Data amplification</title>
<p>In real-life cases, doctors&#x00027; diagnostic conclusions can be uncertain because of the complexity of the disease and the difficulty of diagnosis, and VTE risk cannot be assessed without a clear diagnosis. To enable the model to recognize uncertain diagnoses, we randomly selected 10% of the Padua-category diagnoses. We combined them with frequently occurring uncertainty expressions such as &#x0201C;?&#x0201D; and &#x0201C;undecided&#x0201D; and used the results as negative samples.</p>
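<p>The following sketch shows one possible form of this amplification step. It assumes that negative samples are labelled with the &#x0201C;Others&#x201D; category and that the marker is appended after a space; the article does not specify either choice. The function name and the English markers are hypothetical stand-ins for the Chinese expressions in the real EMRs.</p>

```python
import random

# Hypothetical English stand-ins for the uncertainty expressions found in the EMRs.
UNCERTAIN_MARKERS = ["?", "undecided"]


def amplify_uncertain(samples, fraction=0.1, negative_label="Others", seed=0):
    """Create negative samples by appending an uncertainty marker to a random
    subset of Padua-category diagnoses.

    samples: list of (diagnosis_text, label) pairs.
    Returns the original samples followed by the new negative samples.
    Assumption: negatives are labelled with the "Others" class.
    """
    rng = random.Random(seed)
    padua = [s for s in samples if s[1] != negative_label]
    k = round(len(padua) * fraction)  # e.g. 10% of the Padua-category diagnoses
    chosen = rng.sample(padua, k)
    negatives = [(f"{text} {rng.choice(UNCERTAIN_MARKERS)}", negative_label)
                 for text, _ in chosen]
    return samples + negatives
```

<p>A fixed seed keeps the sampled subset reproducible across training runs.</p>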
</sec>
<sec>
<title>ICD diagnosis and symptom integration</title>
<p>This study used DiseaseKG (<xref ref-type="bibr" rid="B21">21</xref>) data, which were sourced from an authoritative Chinese medical website. The database covers 44,656 medical terms from all aspects of medical care and 312,159 correspondences between medical terms. DiseaseKG has broad coverage and is a reliable data source. Because the symptoms of a disease are one of the keys to determining its category, we used the database to attach the symptoms of each diagnosis as additional feature information. DiseaseKG (<xref ref-type="bibr" rid="B21">21</xref>) contains 54,710 triples that link diseases to their corresponding symptoms. We matched each ICD-10 diagnosis text against the &#x0201C;Disease&#x0201D; element of the triples to attach the corresponding &#x0201C;Symptoms&#x0201D; to that diagnosis. The number of symptoms for each category is shown in the &#x0201C;Symptom count&#x0201D; column of <xref ref-type="table" rid="T2">Table 2</xref>.</p>
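<p>The following sketch shows the diagnosis-to-symptom matching as a simple index lookup. It assumes exact string matching and represents each triple as (disease, relation, symptom) with a hypothetical relation name <monospace>has_symptom</monospace>. The actual DiseaseKG schema and any normalization the authors applied are not described here.</p>

```python
from collections import defaultdict


def build_symptom_index(triples, relation_name="has_symptom"):
    """Index (disease, relation, symptom) triples by disease name.

    relation_name is a hypothetical label for the disease-symptom relation.
    """
    index = defaultdict(list)
    for disease, relation, symptom in triples:
        if relation == relation_name:
            index[disease].append(symptom)
    return index


def attach_symptoms(diagnoses, index):
    """Pair each ICD-10 diagnosis text with its symptoms (exact match; empty if none)."""
    return {dx: index.get(dx, []) for dx in diagnoses}
```

<p>With exact matching, a diagnosis worded differently from the DiseaseKG entry receives no symptoms. A fuzzy or synonym-aware match would be a possible extension.</p>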
</sec>
</sec>
<sec>
<title>Medical corpus data</title>
<p>Medical corpora are an important source of medical terms and are used in clinical decision-making, mortality prediction, and other clinical applications (<xref ref-type="bibr" rid="B23">23</xref>). We used four corpora, drawn from standard terminologies and from summaries of common clinical terms. The surgery/trauma corpus was obtained from ICD-9-CM3 (<xref ref-type="bibr" rid="B24">24</xref>). The hormonal drug corpus was obtained from the World Health Organization&#x00027;s Drug List (<xref ref-type="bibr" rid="B22">22</xref>). The reduced mobility corpus was derived mainly from summarized clinical terms, such as &#x0201C;deep coma.&#x0201D; The symptom corpus consists of the symptoms in DiseaseKG (<xref ref-type="bibr" rid="B21">21</xref>). The size of each corpus is shown in <xref ref-type="table" rid="T3">Table 3</xref>.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Medical corpora and their sizes.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Corpus</bold></th>
<th valign="top" align="center"><bold>Number of entries</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Surgery/trauma (ICD-9-CM3)</td>
<td valign="top" align="center">13,655</td>
</tr>
<tr>
<td valign="top" align="left">Hormonal treatment (World Health Organization&#x00027;s Drug List)</td>
<td valign="top" align="center">452</td>
</tr>
<tr>
<td valign="top" align="left">Reduced mobility</td>
<td valign="top" align="center">34</td>
</tr>
<tr>
<td valign="top" align="left">Symptoms (DiseaseKG)</td>
<td valign="top" align="center">5,598</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>EMR test data</title>
<p>The test data were collected from the clinical EMRs of a hospital in Yunnan Province. They include medical record characteristics, past history, past diagnoses, patient symptoms, patient complaints, examination results, doctors&#x00027; diagnoses, and treatment plans. The test data are used to evaluate the accuracy of the proposed method under real-world conditions. We collected 18,698 EMRs with Padua assessment results and excluded those with missing values in the required fields. This left 7,690 clinical EMRs with their corresponding Padua risk assessment items. <xref ref-type="table" rid="T4">Table 4</xref> shows, for each Padua item, the number of EMRs in which the doctor marked that item as present.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Number of test EMRs with each Padua item present, as assessed by doctors.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Category</bold></th>
<th valign="top" align="center"><bold>Count</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Prior VTE</td>
<td valign="top" align="center">38</td>
</tr>
<tr>
<td valign="top" align="left">Active cancer</td>
<td valign="top" align="center">140</td>
</tr>
<tr>
<td valign="top" align="left">Heart/respiratory failure</td>
<td valign="top" align="center">361</td>
</tr>
<tr>
<td valign="top" align="left">Acute MI/ischemic stroke</td>
<td valign="top" align="center">421</td>
</tr>
<tr>
<td valign="top" align="left">Acute infection/rheumatologic disorder</td>
<td valign="top" align="center">1,334</td>
</tr>
<tr>
<td valign="top" align="left">Elderly age (&#x02265;70 years)</td>
<td valign="top" align="center">1,099</td>
</tr>
<tr>
<td valign="top" align="left">BMI &#x02265;30 kg/m<sup>2</sup></td>
<td valign="top" align="center">495</td>
</tr>
<tr>
<td valign="top" align="left">Ongoing hormonal treatment</td>
<td valign="top" align="center">123</td>
</tr>
<tr>
<td valign="top" align="left">Thrombophilic condition</td>
<td valign="top" align="center">12</td>
</tr>
<tr>
<td valign="top" align="left">Reduced mobility</td>
<td valign="top" align="center">108</td>
</tr>
<tr>
<td valign="top" align="left">Recent (&#x02264;1 month of) trauma and/or surgery</td>
<td valign="top" align="center">60</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>This study was approved by the Institutional Review Board of the First People&#x00027;s Hospital of Anning City (accession number 2017YYLH035) and complied with the Declaration of Helsinki.</p>
</sec>
</sec>
<sec>
<title>Proposed method</title>
<p>The doctor&#x00027;s diagnosis and the patient&#x00027;s symptoms in the EMR are key to judging the patient&#x00027;s Padua disease category. The combination of tests, medications, treatments, and other factors in the medical process is critical to judging the Padua comprehensive clinical factors. In the model-building phase, we therefore propose a dual-branch method for automatic Padua assessment. In the disease category branch (Branch A), we train a deep learning model on ICD-10 diagnosis text combined with symptom text from DiseaseKG. This model judges the Padua disease category items: &#x0201C;active cancer,&#x0201D; &#x0201C;prior VTE,&#x0201D; &#x0201C;acute infection/rheumatologic disorder,&#x0201D; &#x0201C;heart/respiratory failure,&#x0201D; and &#x0201C;acute myocardial infarction (MI)/ischemic stroke.&#x0201D; In the comprehensive clinical factor branch (Branch B), we use a professional corpus to judge the items &#x0201C;recent (&#x02264;1 month of) trauma and/or surgery,&#x0201D; &#x0201C;reduced mobility,&#x0201D; and &#x0201C;ongoing hormonal treatment,&#x0201D; as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
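<p>The following sketch illustrates the corpus-matching idea in Branch B. It assumes that the EMR text has already been segmented into tokens, for example by LAC. The toy corpora are hypothetical except for &#x0201C;deep coma,&#x0201D; which is quoted from the reduced mobility corpus. The rules the method adds on top of matching, such as the one-month window for surgery or the handling of negation, are not described in this section and are not modelled.</p>

```python
# Hypothetical toy corpora; the real corpora are those summarized in Table 3.
CORPORA = {
    "recent_trauma_or_surgery": {"appendectomy", "fracture"},
    "hormonal_treatment": {"prednisone", "dexamethasone"},
    "reduced_mobility": {"deep coma", "bedridden"},
}


def match_items(tokens, corpora=CORPORA):
    """Flag a Padua item as present when any segmented token appears in its corpus.

    tokens: list of words/phrases, assumed to come from a Chinese word segmenter.
    Only exact token-to-term matching is sketched; clinical rules are omitted.
    """
    token_set = set(tokens)
    return {item: bool(token_set.intersection(terms)) for item, terms in corpora.items()}
```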
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Proposed method and overall process of Padua intelligent assessment.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1237616-g0001.tif"/>
</fig>
<p>In the testing phase, we evaluate the accuracy and validity of the proposed method on real EMRs, as described in the section EMR test data. Branch A requires both the patient&#x00027;s diagnosis and the patient&#x00027;s symptoms. The diagnosis information in the EMR is structured and can be used directly, whereas the symptoms are extracted by Branch B. The PDCM then uses the collected diagnostic and symptom information as one input unit for disease classification. In Branch B, we use the LAC model to segment sentences in the EMRs. We then combine a professional corpus with rules to extract and judge information on comprehensive clinical factors such as symptoms, activity, and medication. We further propose automatic assessment of the &#x0201C;thrombophilic condition,&#x0201D; &#x0201C;elderly age,&#x0201D; and &#x0201C;BMI &#x02265; 30 kg/m<sup>2</sup> (obesity)&#x0201D; items of the Padua scale. Age and obesity can be determined simply by extracting the corresponding values and computing them. Following Manderstedt et al. and Di Minno et al. (<xref ref-type="bibr" rid="B25">25</xref>, <xref ref-type="bibr" rid="B26">26</xref>), we extracted the laboratory test results for protein C, protein S, <sc>d</sc>-dimer, and antithrombin III to determine &#x0201C;thrombophilic condition.&#x0201D; The methods used in both branches are detailed below.</p>
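<p>For the age and obesity items, the following sketch shows the computation with the thresholds from Table 1 (age &#x02265; 70 years; BMI &#x02265; 30 kg/m<sup>2</sup>). It assumes that height is recorded in centimetres and weight in kilograms. The function names are hypothetical. The thrombophilia rule is not sketched because the laboratory cut-offs are not given in this section.</p>

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2; assumes height in cm and weight in kg."""
    if not height_cm > 0:
        raise ValueError("height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def age_and_obesity_items(age_years, weight_kg, height_cm):
    """Rule-based Padua items: elderly age (>= 70 years) and obesity (BMI >= 30 kg/m^2)."""
    return {
        "elderly_age": age_years >= 70,
        "obesity": bmi(weight_kg, height_cm) >= 30,
    }
```

<p>For example, a 72-year-old patient weighing 90 kg at 170 cm has a BMI of about 31.1 kg/m<sup>2</sup>, so both items are flagged.</p>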
<sec>
<title>Padua disease category model branch (PDCM, branch A)</title>
<p>This article proposes a classification model for the disease categories of the Padua scale, as shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. A symptom weight matrix (SWM) layer (I), built with an algorithm we developed, computes the weights of the symptoms corresponding to each diagnosis. An ALBERT layer (II) converts the diagnosis and symptom texts into word vectors, and a BiLSTM layer (III) extracts features from them. A concatenate and output layer (IV) joins the diagnosis and symptom features and passes them through a dropout layer to improve the model&#x00027;s generalization. The symptom weights from layer (I) are then concatenated with the dropout output, and a linear layer produces the final prediction.</p>
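<p>The following dependency-free sketch illustrates the order of operations in layer (IV). It is not the authors&#x00027; model. It assumes that the BiLSTM outputs for the diagnosis and symptom texts are already available as plain vectors, so the ALBERT and BiLSTM layers are not reproduced. The softmax applied to the linear outputs is an added assumption to obtain class probabilities. All names are hypothetical.</p>

```python
import math
import random


def dropout(vec, p, training, rng):
    """Inverted dropout: zero each unit with probability p in training, identity otherwise."""
    if not training or p == 0:
        return list(vec)
    keep = 1.0 - p
    return [0.0 if rng.random() >= keep else x / keep for x in vec]


def linear(vec, weights, bias):
    """y = W x + b, with W given as a list of rows (one row per class)."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pdcm_head(diag_feat, symp_feat, swm_weights, weights, bias,
              p_drop=0.5, training=False, seed=0):
    """Fusion order described in the text:
    concat(diagnosis, symptom features) -> dropout -> concat(SWM weights) -> linear.
    The final softmax is an assumption added for readability."""
    rng = random.Random(seed)
    fused = dropout(diag_feat + symp_feat, p_drop, training, rng)
    fused = fused + list(swm_weights)  # SWM weights bypass dropout
    return softmax(linear(fused, weights, bias))
```

<p>In this sketch, the SWM weights are concatenated after dropout, so they are never zeroed during training.</p>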
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Structure of the Padua disease classification model. The Roman numerals I&#x02013;IV in the figure represent the different layers in the PDCM model, which will be described in detail below.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1237616-g0002.tif"/>
</fig>
<p>In summary, combining the ALBERT and BiLSTM layers is intended to capture the semantic information of the diagnosis and symptom texts more fully. Symptoms play a crucial role in disease diagnosis. We therefore incorporate the symptoms corresponding to each diagnosis into the model to enrich the features, which in turn improves accuracy and reliability. In addition, we propose an algorithm that computes the SWM of the symptoms for each Padua disease category. The SWM is fused with the diagnosis and symptom features extracted by the BiLSTM. As a result, symptoms of different importance contribute differently to the classification, and the range of features is expanded. Each component of the PDCM is described in detail below.</p>
<sec>
<title>Symptom weight matrix layer (I)</title>
<p>Some symptoms (e.g., &#x0201C;fever&#x0201D;) are common to many diseases and can introduce noise into the classification task. The symptoms corresponding to a diagnosis differ in their importance to that diagnosis, whereas diseases in the same category tend to share similar symptoms. Term frequency&#x02013;inverse word frequency (TF-IWF) measures how well a word characterizes a particular corpus. We use the TF-IWF (<xref ref-type="bibr" rid="B27">27</xref>) algorithm to compute the importance of each symptom to each disease category, and we build the SWM from the symptoms corresponding to the diagnoses. The SWM is calculated as follows:</p>
<p>Using the diagnoses of each Padua category in <xref ref-type="table" rid="T2">Table 2</xref>, we aggregate the corresponding symptoms into a symptom corpus <italic>N</italic><sub><italic>i</italic></sub>, where <italic>N</italic><sub><italic>i</italic></sub> is the symptom corpus of category <italic>i</italic>. Let <italic>N</italic><sub><italic>i,t</italic></sub> be the number of occurrences of symptom <italic>t</italic> in <italic>N</italic><sub><italic>i</italic></sub>, and let count(<italic>N</italic><sub><italic>i</italic></sub>) be the total number of words in <italic>N</italic><sub><italic>i</italic></sub>. The TF of symptom <italic>t</italic> relative to <italic>N</italic><sub><italic>i</italic></sub> is then:</p>
<disp-formula id="E1"><mml:math id="M1"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mi>T</mml:mi><mml:mi>F</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula><p>The union of all <italic>N</italic><sub><italic>i</italic></sub> forms the total symptom corpus <italic>W</italic>. Let <italic>W</italic><sub><italic>c</italic></sub> be the total number of symptom occurrences in <italic>W</italic>, and let <italic>W</italic><sub><italic>c,t</italic></sub> be the number of occurrences of symptom <italic>t</italic> in <italic>W</italic>. The IWF of symptom <italic>t</italic> relative to the total symptom corpus is then:</p>
<disp-formula id="E2"><mml:math id="M2"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mi>I</mml:mi><mml:mi>W</mml:mi><mml:mi>F</mml:mi><mml:mo>=</mml:mo><mml:mi>log</mml:mi><mml:mfrac><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>&#x000A0;</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Thus, TF&#x02013;IWF of symptom <italic>t</italic> relative to <italic>N</italic><sub><italic>i</italic></sub> is as follows:</p>
<disp-formula id="E3"><mml:math id="M3"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mi>T</mml:mi><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mtext>TF-IWF</mml:mtext><mml:mo>=</mml:mo><mml:mi>T</mml:mi><mml:mi>F</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>I</mml:mi><mml:mi>W</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
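<p>The TF, IWF, and TF-IWF definitions above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes each category corpus is given as a list of already-extracted symptom strings and uses the natural logarithm (the base of log is not specified in the text); the function name <monospace>tf_iwf</monospace> and the toy symptoms are hypothetical.</p>

```python
import math
from collections import Counter

def tf_iwf(corpora):
    """Return TI[i][t] = TF(t, N_i) * IWF(t) for every symptom t of each corpus N_i.

    corpora: one list of symptom strings per Padua disease category (N_0 ... N_5).
    """
    # W: the total symptom corpus obtained by pooling all category corpora.
    total = Counter(t for corpus in corpora for t in corpus)  # W_{c,t} per symptom
    w_c = sum(total.values())                                 # W_c
    ti = []
    for corpus in corpora:
        counts = Counter(corpus)   # N_{i,t}
        size = len(corpus)         # count(N_i)
        ti.append({t: (n / size) * math.log(w_c / total[t]) for t, n in counts.items()})
    return ti

# Toy corpora for two categories (hypothetical symptoms, for illustration only).
weights = tf_iwf([["fever", "cough", "fever"], ["fever", "chest pain"]])
print(round(weights[0]["fever"], 3), round(weights[0]["cough"], 3))  # 0.341 0.536
```

<p>Because "fever" occurs in both toy categories, its IWF is lower than that of "cough", so it receives a lower TF-IWF weight in category 0 even though it occurs there more often. This is the down-weighting of common symptoms that motivates the SWM.</p>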
<p>Finally, we use the TF-IWF values to construct the SWM, which supplies additional feature information to the classifier. The SWM construction algorithm is shown in <xref ref-type="table" rid="T5">Table 5</xref>.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Algorithm: medical symptom weight matrix construction procedure.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Algorithm 1: medical symptom weight matrix (TF-IWF) construction procedure</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Input: <bold>N</bold><sub><bold>k</bold></sub>={<bold>N</bold><sub><bold>0</bold></sub>,<bold>N</bold><sub><bold>1</bold></sub>,<bold>N</bold><sub><bold>2</bold></sub>,<bold>N</bold><sub><bold>3</bold></sub>,<bold>N</bold><sub><bold>4</bold></sub>,<bold>N</bold><sub><bold>5</bold></sub>}, the Padua symptom corpora for the different disease categories; <bold>X</bold><sub><bold>t</bold></sub>={<italic>t</italic><sub>1</sub>,<italic>t</italic><sub>2</sub>,...,<italic>t</italic><sub><italic>n</italic></sub>}, the symptoms corresponding to the diagnosis.</td>
</tr>
<tr>
<td valign="top" align="left">Output: The constructed symptom weighting matrix.</td>
</tr>
<tr>
<td valign="top" align="left">1: &#x00023;&#x00023; Construct an initial one-dimensional matrix of length 6, corresponding to the 6 disease categories of the Padua scale.</td>
</tr>
<tr>
<td valign="top" align="left">2:SWM=[0,0,0,0,0,0]</td>
</tr>
<tr>
<td valign="top" align="left">3: &#x00023;&#x00023; Calculate the weights of the corresponding category for each symptom separately.</td>
</tr>
<tr>
<td valign="top" align="left">4: for <italic>t</italic> in <bold>X</bold><sub><bold>t</bold></sub> do:</td>
</tr>
<tr>
<td valign="top" align="left">5: for <italic>i</italic> in <bold>range</bold><bold>(</bold><bold>k</bold><bold>)</bold> do:</td>
</tr>
<tr>
<td valign="top" align="left">5: &#x000A0;If <italic>i</italic> in <bold>N</bold><sub><bold>k</bold></sub> do:</td>
</tr>
<tr>
<td valign="top" align="left">7: &#x000A0; SWM<sub><italic>k</italic></sub>=(<bold>[</bold><bold>TI</bold><sub><bold>i</bold><bold>, </bold><bold>t</bold></sub><bold>/max</bold><bold>(</bold><bold>TI</bold><sub><bold>i</bold></sub><bold>)</bold>])</td>
</tr>
<tr>
<td valign="top" align="left">8: &#x00023;&#x00023; symptom weight matrix summation for each symptom.</td>
</tr>
<tr>
<td valign="top" align="left">9: &#x000A0;SWM &#x0003D; &#x0002B;SWM<sub><italic>k</italic></sub></td>
</tr>
<tr>
<td valign="top" align="left">10: return SWM</td>
</tr>
</tbody>
</table>
</table-wrap>
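<p>One possible reading of Algorithm 1 is sketched below in Python: for each symptom of a record, look up its TF-IWF weight in every category corpus that contains it, normalize by the largest weight in that category, and sum the resulting vectors over symptoms. The function name, the dict-based input, the zero-division guard, and this interpretation of steps 5 to 9 are our assumptions; the TI values are taken as given (for example, computed with the TF-IWF formulas above).</p>

```python
def build_swm(ti, symptoms, k=6):
    """Sketch of Algorithm 1 (Table 5) under one reading of its steps.

    ti: k dicts with ti[i][t] = TI_{i,t}, the TF-IWF weight of symptom t in category i.
    symptoms: the symptoms X_t corresponding to one diagnosis.
    """
    swm = [0.0] * k                                   # step 2: one entry per Padua category
    # max(TI_i); guard against an empty or all-zero category to avoid dividing by zero
    max_ti = [max(ti[i].values(), default=0.0) or 1.0 for i in range(k)]
    for t in symptoms:                                # step 4
        swm_t = [0.0] * k                             # this symptom's weight vector
        for i in range(k):                            # step 5
            if t in ti[i]:                            # step 6: t occurs in category corpus N_i
                swm_t[i] = ti[i][t] / max_ti[i]       # step 7: normalize to [0, 1]
        swm = [a + b for a, b in zip(swm, swm_t)]     # step 9: sum over symptoms
    return swm

# Hypothetical TI values for two non-empty categories; the other four are empty.
ti = [{"fever": 0.2, "cough": 0.5}, {"fever": 0.1, "chest pain": 0.4}, {}, {}, {}, {}]
print(build_swm(ti, ["fever", "cough"]))
```

<p>Normalizing by the category maximum keeps each symptom's contribution in [0, 1], so a record's SWM entry for a category grows with both the number and the importance of its matching symptoms.</p>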
</sec>
<sec>
<title>ALBERT word embedding layer (II)</title>
<p>The first step of model training is text vectorization. Options include Word2vec (<xref ref-type="bibr" rid="B28">28</xref>), BERT (<xref ref-type="bibr" rid="B29">29</xref>), and ALBERT (<xref ref-type="bibr" rid="B30">30</xref>). BERT, built on the transformer architecture, has become the most widely used pretraining model. Its bidirectional encoding offers stronger feature extraction than Word2vec, and its contextual representations resolve word ambiguities that Word2vec cannot. In 2020, Lan et al. proposed ALBERT, a lightweight pretraining model that simplifies BERT through factorized embedding parameterization, cross-layer parameter sharing, and other techniques that substantially reduce the number of parameters. Fewer parameters greatly reduce memory overhead at deployment.</p>
<p>Depending on whether the input is diagnostic or symptom text, we use ALBERT to vectorize it into a matrix of size (20, 768) or (50, 768), respectively.</p>
</sec>
<sec>
<title>BiLSTM layer (III)</title>
<p>The long short-term memory (LSTM) neural network (<xref ref-type="bibr" rid="B31">31</xref>) is a recurrent neural network (RNN) that mitigates the vanishing gradient problem of traditional RNNs. However, a unidirectional LSTM considers only past information and ignores future information. To use context effectively, BiLSTM combines a forward and a backward LSTM to obtain two separate hidden states: <inline-formula><mml:math id="M4"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. These two hidden states are concatenated to form the final output <inline-formula><mml:math id="M5"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> =[<inline-formula><mml:math id="M6"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msubsup></mml:math></inline-formula>,<italic>h</italic><sub><italic>t</italic></sub>] at time <italic>t</italic>.</p>
<p>We used two BiLSTM layers. The first layer returns the full sequence, with output size (20, 768) or (50, 768), and extracts features from each word vector; the second layer returns the final hidden state, a 768-dimensional feature vector that summarizes all time steps.</p>
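<p>The forward/backward concatenation and the two-layer arrangement can be illustrated with a minimal NumPy forward pass. This sketch is not the authors' TensorFlow model: the weights are random, the gate ordering is our choice, and we assume 384 units per direction so that the concatenated output has the stated dimension of 768. The helper names are hypothetical.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, d, h):
    """Random LSTM weights: W (d, 4h), U (h, 4h), b (4h,); gates ordered i, f, g, o."""
    return rng.normal(0, 0.1, (d, 4 * h)), rng.normal(0, 0.1, (h, 4 * h)), np.zeros(4 * h)

def lstm_forward(x, W, U, b):
    """Run one LSTM over x of shape (T, d) and return all hidden states, shape (T, h)."""
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    states = []
    for x_t in x:
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # forget gate scales the old cell state
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)

def bilstm(x, fwd, bwd, return_sequences=True):
    h_fwd = lstm_forward(x, *fwd)                # reads t = 1 .. T
    h_bwd = lstm_forward(x[::-1], *bwd)[::-1]    # reads t = T .. 1, re-aligned to t
    if return_sequences:
        return np.concatenate([h_fwd, h_bwd], axis=-1)   # [h'_t, h_t] at every t
    # Summary vector: final forward state and final backward state (at t = 1).
    return np.concatenate([h_fwd[-1], h_bwd[0]])

rng = np.random.default_rng(0)
tokens = rng.normal(size=(20, 768))   # stand-in for the ALBERT output of a diagnosis
layer1 = bilstm(tokens, init_params(rng, 768, 384), init_params(rng, 768, 384))
h_diagnosis = bilstm(layer1, init_params(rng, 768, 384), init_params(rng, 768, 384),
                     return_sequences=False)
print(layer1.shape, h_diagnosis.shape)  # (20, 768) (768,)
```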
</sec>
<sec>
<title>Concatenate and output layer (IV)</title>
<p>The features extracted by BiLSTM from the diagnosis and the symptoms are denoted <italic>h</italic><sub><italic>diagnosis</italic></sub> and <italic>h</italic><sub><italic>symptoms</italic></sub>, respectively. We concatenate <italic>h</italic><sub><italic>diagnosis</italic></sub> and <italic>h</italic><sub><italic>symptoms</italic></sub> to obtain <italic>h</italic><sub><italic>Concatenated</italic></sub> as follows:</p>
<disp-formula id="E4"><mml:math id="M7"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>g</mml:mi><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>,</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>y</mml:mi><mml:mi>m</mml:mi><mml:mi>p</mml:mi><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>m</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p><italic>h</italic><sub><italic>Concatenated</italic></sub> thus contains the features of the diagnosis and of its corresponding symptoms. We then pass <italic>h</italic><sub><italic>Concatenated</italic></sub> through a dropout layer to improve the generalization of the neural network; the output of the dropout layer is denoted <italic>h</italic><sub><italic>Dropouted</italic></sub>. Next, we concatenate the output of the symptom weight matrix, SWM, with <italic>h</italic><sub><italic>Dropouted</italic></sub> to obtain:</p>
<disp-formula id="E5"><mml:math id="M8"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>W</mml:mi><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:mi>S</mml:mi><mml:mi>W</mml:mi><mml:mi>M</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>,</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>]</mml:mo><mml:mo>&#x000A0;</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Finally, we feed <italic>h</italic><sub><italic>SWM</italic></sub> into the classification layer, which uses softmax. Because diagnoses are unevenly distributed across the Padua categories, we use the focal loss (<xref ref-type="bibr" rid="B32">32</xref>) function.</p>
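<p>The output layer described above (concatenation, dropout, concatenation with the SWM, softmax, and focal loss) can be sketched in NumPy as follows. This is an illustration under assumptions, not the authors' code: the single dense layer (<monospace>W</monospace>, <monospace>b</monospace>), the inverted-dropout formulation, and the categorical form of focal loss with α = 0.25 and γ = 2 (the defaults proposed with focal loss) are our choices, since the text does not give these details. All inputs are random placeholders.</p>

```python
import numpy as np

def dropout(h, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and rescale the rest."""
    if not training or rate == 0.0:
        return h
    keep = rng.random(h.shape) >= rate
    return h * keep / (1.0 - rate)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def pdcm_head(h_diag, h_symp, swm, W, b, rate, rng, training=True):
    h_cat = np.concatenate([h_diag, h_symp], axis=-1)   # h_Concatenated
    h_drop = dropout(h_cat, rate, rng, training)         # h_Dropouted
    h_swm = np.concatenate([swm, h_drop], axis=-1)       # h_SWM = [SWM, h_Dropouted]
    return softmax(h_swm @ W + b)                        # class probabilities

def focal_loss(y_true, probs, alpha=0.25, gamma=2.0, eps=1e-7):
    """Categorical focal loss averaged over the batch:
    -sum_c alpha * y_c * (1 - p_c)**gamma * log(p_c)."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float((-alpha * y_true * (1.0 - p) ** gamma * np.log(p)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
batch, hidden, k = 4, 768, 6                      # 768-d BiLSTM features, 6 Padua categories
h_diag = rng.normal(size=(batch, hidden))
h_symp = rng.normal(size=(batch, hidden))
swm = rng.random((batch, k))
W, b = rng.normal(0, 0.01, (k + 2 * hidden, k)), np.zeros(k)
probs = pdcm_head(h_diag, h_symp, swm, W, b, rate=0.5, rng=rng)
y = np.eye(k)[[0, 2, 2, 5]]                       # hypothetical one-hot labels
print(probs.shape, focal_loss(y, probs))
```

<p>The factor (1 - p)<sup>γ</sup> shrinks the loss of examples the model already classifies confidently, so training concentrates on harder, typically minority-class, examples; with γ = 0 the loss reduces to α-weighted cross-entropy.</p>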
</sec>
</sec>
<sec>
<title>Clinical comprehensive factor branch (branch B)</title>
<p>In this article, we propose a clinical comprehensive factor branch related to the Padua scale that uses LAC word segmentation and negative word filtering to extract patient symptoms, medication information, and activity from the EMR text, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Clinical comprehensive factor extraction and judgment process. The icons next to the letters <bold>(A&#x02013;F)</bold> represent the following: <bold>(A)</bold> total corpus; <bold>(B)</bold> symptom corpus; <bold>(C)</bold> trauma or surgery corpus; <bold>(D)</bold> hormone corpus; <bold>(E)</bold> activity reduction corpus; <bold>(F)</bold> process. The example shown in the figure is the EMR statement &#x0201C;On April 16, 2021, he was hospitalized Hospital for artificial joint replacement.&#x0201D;</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1237616-g0003.tif"/>
</fig>
<p>The LAC (<xref ref-type="bibr" rid="B19">19</xref>) lexical analysis tool performs automatic word segmentation of sentences and tags each resulting word with its part of speech. Because medical terms are highly specialized, applying LAC's default segmentation directly is likely to split them incorrectly. LAC can load a custom intervention lexicon, which allows it to segment medical terms correctly when they are encountered.</p>
<p>First, to segment and match the medical terms and diagnoses in the EMR accurately, we used the LAC word segmentation tool with the total corpus preloaded as the custom lexicon. We represent the EMR as a collection of sentences and feed each sentence into the LAC model, which returns the segmented words and their corresponding parts of speech.</p>
<p>Applying LAC to each sentence in the EMR yields a set of words and their parts of speech, which form the basis for medical terminology matching. As described in Section Medical corpus data, four professional corpora were selected as references for matching terms: ICD-9-CM3, DiseaseKG-Symptoms, Reduced Mobility, and the World Health Organization&#x00027;s Drug List. These four corpora are standard classification systems widely adopted in the medical field, and together they cover most medical terms and disease diagnoses.</p>
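<p>A minimal sketch of the term-matching step, assuming the sentence has already been segmented (with the LAC package, this would be the word list returned by <monospace>LAC.run</monospace> after <monospace>load_customization</monospace> has loaded the custom lexicon). The corpus names and English terms below are hypothetical placeholders for the four professional corpora, and exact-match lookup is our simplification of the matching procedure.</p>

```python
def match_terms(words, corpora):
    """Map each corpus name to the segmented words that appear in that corpus.

    words: segmented tokens of one EMR sentence.
    corpora: {corpus name: set of terms}.
    """
    hits = {name: [] for name in corpora}
    for w in words:
        for name, terms in corpora.items():
            if w in terms and w not in hits[name]:   # record each term once per corpus
                hits[name].append(w)
    return hits

corpora = {  # hypothetical stand-ins for the symptom, surgery, hormone and mobility corpora
    "symptom": {"fever", "cough"},
    "surgery": {"artificial joint replacement"},
    "hormone": {"prednisone"},
    "reduced_mobility": {"bedridden"},
}
sentence = ["On April 16, 2021", "hospitalized", "artificial joint replacement", "fever"]
print(match_terms(sentence, corpora))
```

<p>Because the custom lexicon keeps multi-word terms such as "artificial joint replacement" as single tokens, a simple set lookup is enough to assign them to the surgery corpus.</p>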
<p>To make the matched results more accurate and reflect the patient&#x00027;s actual condition, we introduced a negative word matching and filtering mechanism. In clinical text, negation often indicates that a symptom or condition has been ruled out. For example, if the statement &#x0201C;patient is not using hormones&#x0201D; were matched directly against the hormone corpus, the result would be &#x0201C;hormone use,&#x0201D; which contradicts the patient&#x00027;s actual condition. Therefore, we added negative words to the matching process: when a negative word appears in the LAC result, we filter the sentence and exclude the negated symptom or condition. Negative words include, but are not limited to, &#x0201C;not used,&#x0201D; &#x0201C;not seen,&#x0201D; &#x0201C;none,&#x0201D; and &#x0201C;not found.&#x0201D; Specifically, we extracted all negative words and matched them against their preceding and following contexts. For example, in the phrase &#x0201C;no hormone use,&#x0201D; the negative word negates &#x0201C;hormone use,&#x0201D; so the phrase is filtered out. Similarly, in &#x0201C;no abnormalities seen,&#x0201D; the negative word &#x0201C;not seen&#x0201D; applies to &#x0201C;abnormalities,&#x0201D; so &#x0201C;abnormalities&#x0201D; is excluded from the results.</p>
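<p>The negative word filter can be sketched as a window check around each matched term. This is a hedged illustration, not the authors' rule set: the English negation tokens stand in for the Chinese negative words in the EMRs, and the two-token window on either side of a term is a hypothetical parameter, since the text only states that the preceding and following contexts are checked.</p>

```python
NEGATION_WORDS = {"not used", "not seen", "none", "not found", "no"}  # illustrative subset

def filter_negated(words, terms, negation_words=NEGATION_WORDS, window=2):
    """Return the terms in `words` that are not within `window` tokens of a negation word."""
    neg_positions = [j for j, w in enumerate(words) if w in negation_words]
    kept = []
    for i, w in enumerate(words):
        negated = any(abs(i - j) <= window for j in neg_positions)  # before or after the term
        if w in terms and not negated:
            kept.append(w)
    return kept

print(filter_negated(["patient", "not used", "hormone"], {"hormone"}))  # []
print(filter_negated(["surgery", "yesterday", ",", "fever", "not seen"],
                     {"surgery", "fever"}))                              # ['surgery']
```

<p>In the second call, "fever" is excluded because "not seen" follows it within the window, while "surgery" is far enough away to be kept.</p>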
<p>Using the negative word matching and filtering mechanism, we can more accurately extract the symptoms and conditions that reflect the patient&#x00027;s actual condition in each sentence, which gives clinicians a more reliable reference and provides more accurate data for medical research. Notably, the item &#x0201C;recent (&#x02264;1 month of) trauma and/or surgery&#x0201D; requires a time judgment. Because LAC tags time expressions in each sentence, we compare the date in the time expression extracted by LAC with the current date to judge this item. In <xref ref-type="fig" rid="F3">Figure 3</xref>, LAC tags &#x0201C;On April 16, 2021&#x0201D; in the EMR sentence &#x0201C;On April 16, 2021, he was hospitalized Hospital for artificial joint replacement&#x0201D; as &#x0201C;TIME,&#x0201D; and this date is compared with the current date.</p>
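<p>The time judgment can be sketched with Python's standard <monospace>datetime</monospace> module. Assumptions: the TIME expression has already been normalized to a string such as "April 16, 2021" (the Chinese EMR dates would need a matching format string), "1 month" is taken as 30 days, and the function name is hypothetical. A fixed "current" date is passed in so the example is reproducible; in deployment it would be <monospace>date.today()</monospace>.</p>

```python
from datetime import date, datetime

def within_one_month(time_text, today, fmt="%B %d, %Y", max_days=30):
    """True if the date in a TIME expression lies no more than `max_days` before `today`."""
    event = datetime.strptime(time_text, fmt).date()
    elapsed = (today - event).days
    return 0 <= elapsed <= max_days   # future dates and older events do not count

# "On April 16, 2021" tagged as TIME, compared with a fixed current date.
print(within_one_month("April 16, 2021", today=date(2021, 5, 10)))  # True
```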
<p>In summary, our method uses the LAC word segmentation tool and several professional corpora, together with matching and filtering mechanisms, to accurately extract and recognize medical terms and disease diagnoses in EMRs.</p>
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Evaluation index and experimental environment</title>
<p>In this article, the AUC (<xref ref-type="bibr" rid="B33">33</xref>) and F1 (<xref ref-type="bibr" rid="B34">34</xref>) were selected as the main evaluation indexes. In addition, four common multi-label classification evaluation indexes were used as secondary indexes: Hamming Distance (HD) (<xref ref-type="bibr" rid="B35">35</xref>), One-Error (OE) (<xref ref-type="bibr" rid="B36">36</xref>), Label Ranking Loss (RL) (<xref ref-type="bibr" rid="B37">37</xref>), and Coverage (Cov) (<xref ref-type="bibr" rid="B38">38</xref>). Higher AUC and F1 values and lower HD, OE, RL, and Cov values indicate better model performance.</p>
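<p>For reference, the four secondary indexes can be computed as below using their common textbook definitions. This is a sketch, not the evaluation code used here: the exact variants in the cited works (for example, tie handling or whether Coverage is divided by the number of labels) may differ, so normalization is left as an option. The toy labels and scores are hypothetical.</p>

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) entries predicted incorrectly."""
    return float(np.mean(y_true != y_pred))

def one_error(y_true, scores):
    """Fraction of samples whose top-scored label is not a true label."""
    top = scores.argmax(axis=1)
    return float(np.mean(y_true[np.arange(len(y_true)), top] == 0))

def ranking_loss(y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs ranked in the wrong order."""
    losses = []
    for y, s in zip(y_true, scores):
        rel, irr = s[y == 1], s[y == 0]
        if rel.size and irr.size:
            wrong = (irr[None, :] >= rel[:, None]).sum()   # irrelevant scored at or above relevant
            losses.append(wrong / (rel.size * irr.size))
    return float(np.mean(losses)) if losses else 0.0

def coverage(y_true, scores, normalize=False):
    """Average depth in the ranking needed to cover all true labels (rank 1 = top)."""
    depths = []
    for y, s in zip(y_true, scores):
        if y.any():
            ranks = (s[None, :] >= s[:, None]).sum(axis=1)  # ties counted pessimistically
            depths.append(ranks[y == 1].max() - 1)
    cov = float(np.mean(depths)) if depths else 0.0
    return cov / y_true.shape[1] if normalize else cov

y = np.array([[1, 0, 0], [0, 1, 1]])               # toy labels (hypothetical)
s = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])   # toy scores
print(hamming_loss(y, (s >= 0.5).astype(int)), one_error(y, s), ranking_loss(y, s), coverage(y, s))
# 0.5 0.5 0.5 1.0
```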
<p>Training, inference, and evaluation were performed on a computer running Windows 10 with an Intel<sup>&#x000AE;</sup> Core&#x02122; i7-11700KF CPU, an NVIDIA GeForce RTX 3080 graphics processing unit with 10 GB of memory, and 32 GB of RAM, using Python 3.7 and TensorFlow 2.7.0.</p>
<p>Our proposed approach consists of a deep learning model for disease diagnosis and an assessment of clinical situations. We evaluated the two parts separately.</p>
</sec>
<sec>
<title>Experiment of branch A</title>
<p>The items &#x0201C;active cancer,&#x0201D; &#x0201C;prior VTE,&#x0201D; &#x0201C;acute infection/rheumatologic disorder,&#x0201D; &#x0201C;heart/respiratory failure,&#x0201D; and &#x0201C;acute MI/ischemic stroke&#x0201D; are assessed using the proposed PDCM.</p>
<p>We split the PDCM training data described in Section Padua disease classification model training data (PDCM training data) into training and validation sets at a ratio of 7:3. We trained for up to 100 epochs and monitored model performance on the validation set. We applied early stopping (<xref ref-type="bibr" rid="B39">39</xref>): training was terminated if validation performance did not improve for 10 consecutive epochs, and the model with the highest validation F1 was saved.</p>
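<p>The early-stopping rule can be sketched as a plain training loop. In TensorFlow 2.7, a similar effect is usually obtained with <monospace>tf.keras.callbacks.EarlyStopping(monitor=..., mode="max", patience=10, restore_best_weights=True)</monospace>, provided an F1 metric is logged under the monitored name. The loop below is a framework-free illustration; the callables and the simulated F1 curve are hypothetical.</p>

```python
def train_with_early_stopping(train_one_epoch, evaluate_f1, max_epochs=100, patience=10):
    """Train for up to `max_epochs`; stop once validation F1 has not improved for `patience` epochs.

    train_one_epoch(epoch) -> model state; evaluate_f1(state) -> validation F1.
    Returns (best state, best F1, number of epochs run).
    """
    best_state, best_f1, best_epoch = None, float("-inf"), -1
    epoch = -1
    for epoch in range(max_epochs):
        state = train_one_epoch(epoch)
        f1 = evaluate_f1(state)
        if f1 > best_f1:                       # new best: keep it (checkpoint)
            best_state, best_f1, best_epoch = state, f1, epoch
        elif epoch - best_epoch >= patience:   # `patience` epochs without improvement
            break
    return best_state, best_f1, epoch + 1

# Simulated validation F1 curve (hypothetical numbers): peaks at epoch 2, then plateaus.
history = [0.5, 0.6, 0.7] + [0.65] * 20
print(train_with_early_stopping(lambda e: e, lambda s: history[s]))  # (2, 0.7, 13)
```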
<sec>
<title>Model comparison results and analysis</title>
<p>We tested our proposed method using the EMR data described in Section EMR test data. To further evaluate the effectiveness of the proposed PDCM, we compared it with the following baseline models, which are commonly used for text classification: IDCNN (<xref ref-type="bibr" rid="B40">40</xref>), BiRNN (<xref ref-type="bibr" rid="B41">41</xref>), Transformer (<xref ref-type="bibr" rid="B42">42</xref>), TEXTCNN (<xref ref-type="bibr" rid="B43">43</xref>), and BiLSTM (<xref ref-type="bibr" rid="B44">44</xref>). The baseline methods receive the same diagnostic and symptom text inputs. We chose ALBERT as the vectorization technique for both PDCM and the baselines because of its rich embedding features, lower parameter count, and suitability for clinical deployment, as described in Section Ablation experiment. The results are presented in <xref ref-type="table" rid="T6">Table 6</xref>, and the following subsections describe the prediction process of each approach.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Comparative experimental results.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Category</bold></th>
<th valign="top" align="center"><bold>AUC</bold></th>
<th valign="top" align="center"><bold>HD</bold></th>
<th valign="top" align="center"><bold>RL</bold></th>
<th valign="top" align="center"><bold>Cov</bold></th>
<th valign="top" align="center"><bold>OE</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">TEXTCNN</td>
<td valign="top" align="center">0.700</td>
<td valign="top" align="center">0.047</td>
<td valign="top" align="center">0.132</td>
<td valign="top" align="center">0.349</td>
<td valign="top" align="center">0.223</td>
<td valign="top" align="center">0.722</td>
</tr>
<tr>
<td valign="top" align="left">BiRNN</td>
<td valign="top" align="center">0.735</td>
<td valign="top" align="center">0.049</td>
<td valign="top" align="center">0.151</td>
<td valign="top" align="center">0.404</td>
<td valign="top" align="center">0.225</td>
<td valign="top" align="center">0.769</td>
</tr>
<tr>
<td valign="top" align="left">IDCNN</td>
<td valign="top" align="center">0.769</td>
<td valign="top" align="center">0.052</td>
<td valign="top" align="center">0.135</td>
<td valign="top" align="center">0.360</td>
<td valign="top" align="center">0.234</td>
<td valign="top" align="center">0.752</td>
</tr>
<tr>
<td valign="top" align="left">Transformer</td>
<td valign="top" align="center">0.779</td>
<td valign="top" align="center">0.044</td>
<td valign="top" align="center">0.145</td>
<td valign="top" align="center">0.381</td>
<td valign="top" align="center">0.197</td>
<td valign="top" align="center">0.791</td>
</tr>
<tr>
<td valign="top" align="left">BiLSTM</td>
<td valign="top" align="center">0.791</td>
<td valign="top" align="center">0.039</td>
<td valign="top" align="center">0.117</td>
<td valign="top" align="center">0.390</td>
<td valign="top" align="center">0.179</td>
<td valign="top" align="center">0.800</td>
</tr>
<tr>
<td valign="top" align="left">PDCM (Ours)</td>
<td valign="top" align="center">0.836</td>
<td valign="top" align="center">0.036</td>
<td valign="top" align="center">0.100</td>
<td valign="top" align="center">0.282</td>
<td valign="top" align="center">0.167</td>
<td valign="top" align="center">0.838</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec>
<title>TEXTCNN</title>
<p>TEXTCNN uses convolutional neural networks for text classification. In TEXTCNN, we use ALBERT to vectorize the diagnostic and symptom text. Following the original paper, we use convolutional kernels of sizes 3, 4, and 5 to capture features from the text and apply max pooling to extract the most salient feature produced by each kernel. The pooled outputs for the diagnosis and the symptoms are then concatenated. Finally, the merged features are passed through a dropout layer and a classification layer to predict the patient&#x00027;s disease category. Compared with TEXTCNN, our method improves the AUC by 13.6 percentage points (0.836 vs. 0.700).</p>
</sec>
<sec>
<title>BiRNN</title>
<p>BiRNN is an RNN model that processes the input sequence in both the forward and backward directions. We input the vectorized diagnostic and symptom information separately into a BiRNN, treating each token as a time step, to extract per-time-step features. These time-step features are then input into a second BiRNN, whose final hidden state captures the overall features. Finally, we merge the extracted diagnostic and symptom features, apply a dropout layer, and classify them with a linear layer. PDCM achieves an AUC 10.1 percentage points higher (0.836 vs. 0.735).</p>
</sec>
<sec>
<title>IDCNN</title>
<p>IDCNN introduces a dilation rate, which enlarges the model&#x00027;s receptive field without adding parameters and thus captures longer-range dependencies. In IDCNN, iterated dilated convolutions capture contextual features at different scales from the vectorized diagnostic and symptom data. Dropout is applied for regularization, and the diagnostic and symptom features are then merged. Finally, the merged features are passed through the classification layer to predict disease categories. The AUC of IDCNN is 6.7 percentage points lower than that of PDCM (Ours) (0.769 vs. 0.836).</p>
</sec>
<sec>
<title>Transformer</title>
<p>The Transformer consists of an encoder and a decoder. The encoder receives the vectors and positional information of the diagnoses and symptoms separately. The model uses stacked self-attention and encoder&#x02013;decoder attention mechanisms to capture correlations within sequences. Finally, the diagnosis and symptom features are concatenated, and a linear layer outputs the patient&#x00027;s disease category. PDCM achieves an AUC 5.7 percentage points higher (0.836 vs. 0.779).</p>
</sec>
<sec>
<title>BiLSTM</title>
<p>BiLSTM improves on BiRNN: its LSTM units use gates, including a forget gate, that mitigate the vanishing gradient problem of RNNs. We vectorize the diagnosis and symptom information and input each into two stacked BiLSTMs, which extract temporal and global features, respectively. Finally, we concatenate these features, apply a dropout layer for regularization, and use a linear layer to predict the patient&#x00027;s disease category. Compared with BiLSTM, our method improves the AUC by 4.5 percentage points (0.836 vs. 0.791).</p>
</sec>
<sec>
<title>PDCM (Ours)</title>
<p>Compared with BiLSTM, our PDCM model incorporates the SWM, which weights each symptom by its importance to each disease category. This symptom weighting matrix strengthens the association between symptoms and disease categories, and adding it as an extra input feature lets PDCM achieve the best performance among the compared models.</p>
<p>As <xref ref-type="table" rid="T6">Table 6</xref> shows, the PDCM proposed in this article achieved the best results on all six metrics, including AUC and F1.</p>
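<p>The AUC improvements quoted for each baseline are absolute differences (percentage points) between the Table 6 values. The short calculation below reproduces them from the table and, for comparison, also reports the corresponding relative gains; the function name is hypothetical.</p>

```python
table6_auc = {"TEXTCNN": 0.700, "BiRNN": 0.735, "IDCNN": 0.769,
              "Transformer": 0.779, "BiLSTM": 0.791}
PDCM_AUC = 0.836

def auc_gains(baselines, ours):
    """Return {model: (absolute gain in percentage points, relative gain in %)}."""
    return {name: (round((ours - auc) * 100, 1), round((ours - auc) / auc * 100, 1))
            for name, auc in baselines.items()}

for name, (pp, rel) in auc_gains(table6_auc, PDCM_AUC).items():
    print(f"{name}: +{pp} pp absolute, +{rel}% relative")
```

<p>The absolute gaps (13.6, 10.1, 6.7, 5.7, and 4.5 points) match the figures given in the text; the relative gains are larger, for example about 19.4% over TEXTCNN.</p>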
</sec>
</sec>
<sec>
<title>Ablation experiment</title>
<p>In Section Model comparison results and analysis, the experiments showed that the PDCM performs best. To determine the contribution of each component, we designed ablation experiments comparing four configurations: Diagnose (Only), Diagnose &#x0002B; symptom, PDCM, and PDCM (Without). Diagnose (Only) is trained with the diagnosis alone; Diagnose &#x0002B; symptom is trained with the diagnosis and its corresponding symptoms; and PDCM uses diagnosis &#x0002B; symptom inputs together with the symptom weighting matrix presented in Section Symptom weight matrix layer (I). In Section Padua disease classification model training data (PDCM training data), we described data augmentation that incorporates uncertain diagnostic text; to verify its effect, PDCM (Without) removes this augmentation. The experimental results are shown in <xref ref-type="table" rid="T7">Table 7</xref>.</p>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>Ablation experiment.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left"><bold>Category</bold></th>
<th valign="top" align="center"><bold>AUC</bold></th>
<th valign="top" align="center"><bold>HD</bold></th>
<th valign="top" align="center"><bold>RL</bold></th>
<th valign="top" align="center"><bold>Cov</bold></th>
<th valign="top" align="center"><bold>OE</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">PDCM (Without)</td>
<td valign="top" align="center">0.709</td>
<td valign="top" align="center">0.046</td>
<td valign="top" align="center">0.170</td>
<td valign="top" align="center">0.451</td>
<td valign="top" align="center">0.216</td>
<td valign="top" align="center">0.745</td>
</tr>
<tr>
<td valign="top" align="left">Diagnose (Only)</td>
<td valign="top" align="center">0.782</td>
<td valign="top" align="center">0.041</td>
<td valign="top" align="center">0.140</td>
<td valign="top" align="center">0.389</td>
<td valign="top" align="center">0.190</td>
<td valign="top" align="center">0.798</td>
</tr>
<tr>
<td valign="top" align="left">Diagnose &#x0002B; symptom</td>
<td valign="top" align="center">0.791</td>
<td valign="top" align="center">0.039</td>
<td valign="top" align="center">0.117</td>
<td valign="top" align="center">0.390</td>
<td valign="top" align="center">0.179</td>
<td valign="top" align="center">0.800</td>
</tr>
<tr>
<td valign="top" align="left">PDCM (Ours)</td>
<td valign="top" align="center">0.836</td>
<td valign="top" align="center">0.036</td>
<td valign="top" align="center">0.100</td>
<td valign="top" align="center">0.282</td>
<td valign="top" align="center">0.167</td>
<td valign="top" align="center">0.838</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The experimental results show that the proposed PDCM achieved the best performance. Compared with PDCM (Without), F1 and AUC improved by 9.3 and 12.7 percentage points, respectively. Comparing Diagnose (Only) with Diagnose &#x0002B; symptom, AUC increased by 0.9 percentage points after symptoms were incorporated.</p>
<p>To compare the methods in detail, we calculated the AUC of each method for each Padua item; the results are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>AUC values of multiple methods for ablation experiments.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1237616-g0004.tif"/>
</fig>
<p>In summary, the proposed PDCM achieved the best performance. First, comparing Diagnose (Only) with Diagnose &#x0002B; symptom, we found that incorporating symptoms improved the test results. The Diagnose &#x0002B; symptom method performed better on the &#x0201C;active cancer&#x0201D; item (AUC: 0.837 vs. 0.824) and the &#x0201C;acute infection and/or rheumatologic disorder&#x0201D; item (AUC: 0.789 vs. 0.749). Comparing PDCM with PDCM (Without), we found that performance improved on the &#x0201C;active cancer&#x0201D; (AUC: 0.842 vs. 0.641), &#x0201C;acute infection and/or rheumatologic disorder&#x0201D; (AUC: 0.808 vs. 0.791), and &#x0201C;acute MI and/or ischemic stroke&#x0201D; (AUC: 0.804 vs. 0.643) items, most markedly for active cancer and acute MI and/or ischemic stroke.</p>
</sec>
</sec>
<sec>
<title>Experiment of branch B (clinical comprehensive factor extraction judgment results)</title>
<p>We tested our proposed method using the EMR data described in Section EMR test data. The items &#x0201C;Reduced mobility,&#x0201D; &#x0201C;Recent (&#x02264;1 month of) trauma and/or surgery,&#x0201D; and &#x0201C;Ongoing treatment&#x0201D; are assessed with the Branch B method. The items &#x0201C;Elderly age (&#x02265;70 years),&#x0201D; &#x0201C;BMI&#x0003E;30 kg/m<sup>2</sup>,&#x0201D; and &#x0201C;thrombophilic&#x0201D; are determined by computerized numerical calculation. We treated each item as an independent binary classification task and evaluated it with AUC. The AUC values of these items are shown in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Clinical comprehensive factor branch AUC.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1237616-g0005.tif"/>
</fig>
<p>Items of &#x0201C;Reduced mobility,&#x0201D; &#x0201C;Thrombophilic,&#x0201D; &#x0201C;Ongoing treatment,&#x0201D; &#x0201C;Recent (&#x02264;1 month of) trauma and/or surgery,&#x0201D; &#x0201C;Elderly age (&#x02265;70 years),&#x0201D; and &#x0201C;BMI&#x0003E;30 kg/m<sup>2</sup>&#x0201D; had AUCs of 0.809, 0.95, 0.888, 0.995, 0.827, and 0.960, respectively.</p>
</sec>
<sec>
<title>Padua overall evaluation results</title>
<p>We integrated Branches A and B, presented in Section Proposed method, into the VTE software system for practical use. The scoring interface used in real-world practice is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. According to the standards of the American College of Chest Physicians, a Padua total score of 3 or below is considered low risk, and a score above 3 (i.e., 4 or more) is considered high risk. The risk level is crucial because it directly determines the patient&#x00027;s follow-up treatment plan, so we used the scores to assign each patient a risk level. As is common in healthcare, high-risk patients are far fewer than low-risk patients. In this section, we evaluate our method using AUC, sensitivity, specificity (<xref ref-type="bibr" rid="B33">33</xref>), and precision (<xref ref-type="bibr" rid="B34">34</xref>); of these, AUC, sensitivity, and specificity do not depend on class prevalence and are therefore robust to this imbalance.</p>
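<p>The Padua total score that the system derives from the two branches is a weighted sum of item indicators. The sketch below uses the point values of the published Padua Prediction Score and the high-risk cutoff of a score above 3; the item keys, the boolean input format, and the function name are hypothetical.</p>

```python
# Points of the Padua Prediction Score; dictionary keys are hypothetical item names.
PADUA_POINTS = {
    "active_cancer": 3,
    "previous_vte": 3,
    "reduced_mobility": 3,
    "thrombophilic_condition": 3,
    "recent_trauma_or_surgery": 2,          # within the last month
    "elderly_age": 1,                       # >= 70 years
    "heart_or_respiratory_failure": 1,
    "acute_mi_or_ischemic_stroke": 1,
    "acute_infection_or_rheumatologic_disorder": 1,
    "obesity": 1,                           # BMI criterion as defined in the text
    "ongoing_hormonal_treatment": 1,
}

def padua_risk(items):
    """Sum the points of the items judged present; above 3 is high risk, otherwise low."""
    score = sum(PADUA_POINTS[name] for name, present in items.items() if present)
    return score, "high" if score >= 4 else "low"

# Branch A and Branch B outputs for one hypothetical EMR.
print(padua_risk({"active_cancer": True, "elderly_age": True, "obesity": False}))  # (4, 'high')
```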
<p>Of the 7,690 EMRs tested, doctors assessed 7,548 as low risk and 142 as high risk. Of the 7,548 low-risk EMRs, 4,341 had a score of 0; although a score of 0 also falls into the low-risk category, it represents a much lower level of risk. Given the class imbalance, we used the AUC to evaluate the assessment across three levels: a score of 0, low risk, and high risk. The AUC of 0.883 for assessing patients&#x00027; VTE risk indicates that the model discriminates well between low-risk and high-risk EMRs.</p>
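<p>The degree of imbalance follows directly from the counts above. The short Python calculation below shows that high-risk EMRs make up less than 2% of the test set, so a trivial rule that labels every EMR as low risk would already reach about 98% accuracy while detecting no high-risk patient. This is why plain accuracy is not informative for this task.</p>

```python
# Counts reported for the EMR test set in this section.
TOTAL, LOW_RISK, HIGH_RISK, ZERO_SCORE = 7690, 7548, 142, 4341
assert LOW_RISK + HIGH_RISK == TOTAL

prevalence = HIGH_RISK / TOTAL             # share of high-risk EMRs
all_low_accuracy = LOW_RISK / TOTAL        # accuracy of always predicting "low risk"
zero_share_of_low = ZERO_SCORE / LOW_RISK  # share of low-risk EMRs scoring 0

print(f"prevalence of high risk: {prevalence:.2%}")           # 1.85%
print(f"accuracy of all-low rule: {all_low_accuracy:.2%}")    # 98.15%
print(f"score-0 share of low risk: {zero_share_of_low:.1%}")  # 57.5%
```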
<p>Precision can be used to further evaluate the false predictions of our method. A precision of 0.87 means that most of the method&#x00027;s positive predictions are correct, that is, false positives make up only a small proportion of them, which supports the reliability of our method.</p>
<p>We further used sensitivity and specificity to evaluate the accuracy for high- and low-risk EMRs. Neither indicator is affected by class imbalance. Sensitivity and specificity measure the ability of the method to detect positive samples and to exclude negative samples, respectively. The specificity of 0.957 suggests that the model correctly assesses a large proportion of low-risk EMRs, and the sensitivity of 0.816 indicates that the model is also effective at identifying high-risk EMRs.</p>
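<p>The three metrics can be computed from a confusion matrix in which high risk is the positive class. The Python sketch below uses two hypothetical confusion matrices, not the study&#x00027;s results, with identical sensitivity (0.8) and specificity (0.9) but ten times as many low-risk cases in the second. It illustrates that precision, unlike sensitivity and specificity, changes with class prevalence.</p>

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion matrix with high risk as the positive class."""
    tp: int  # high-risk EMRs predicted high risk
    fn: int  # high-risk EMRs predicted low risk
    tn: int  # low-risk EMRs predicted low risk
    fp: int  # low-risk EMRs predicted high risk

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)


# Hypothetical counts: same classifier behaviour, 10x more low-risk cases in the second set.
balanced = Confusion(tp=80, fn=20, tn=90, fp=10)
imbalanced = Confusion(tp=80, fn=20, tn=900, fp=100)

for name, cm in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(name, cm.sensitivity, cm.specificity, round(cm.precision, 3))
# balanced 0.8 0.9 0.889
# imbalanced 0.8 0.9 0.444
```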
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>Our proposed dual-branch model automatically predicts VTE risk from EMRs without doctor input, which can substantially reduce doctors&#x00027; workload. Currently, most patients do not receive effective VTE risk assessment and prevention (<xref ref-type="bibr" rid="B6">6</xref>), so automatic VTE risk assessment can help improve the rate of VTE prevention. Furthermore, automated assessment helps eliminate the heterogeneity that arises when different doctors perform the assessment. Moreover, our DB-DL assessment method achieves higher accuracy than the other automated methods discussed below.</p>
<p>In terms of automating VTE risk assessment, both Elias et al. (<xref ref-type="bibr" rid="B10">10</xref>) and Qatawneh et al. (<xref ref-type="bibr" rid="B16">16</xref>) automate the assessment to some extent, but their methods require additional steps. Qatawneh et al.&#x00027;s approach relies on information that exists primarily in textual form, so considerable time must be spent converting this text into numerical values. Elias et al. require the construction of a complete EDW and the subsequent transformation of EMR text into structured data to determine risk factors, which may not be feasible for healthcare organizations without the resources to build a large-scale structured data warehouse. In contrast, our DB-DL assessment method operates directly on EMR text, and because EMR systems are widely used at all levels of the healthcare system, our method is more broadly applicable. As detailed in Section Proposed method, the DB-DL assessment method has two branches. Branch A uses our PDCM deep learning model to determine the patient&#x00027;s disease category, mainly from diagnosis and symptom text. Branch B uses LAC combined with negative-word filtering to extract and determine the patient&#x00027;s symptoms, hormone use, and activity. Together, the two branches automate the entire assessment process without the manual data transformation or doctor input required by previous work.</p>
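<p>The idea of negative-word filtering can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our implementation: plain substring search stands in for LAC word segmentation, and the English negation cues (&#x0201C;no,&#x0201D; &#x0201C;denies,&#x0201D; and so on) are hypothetical stand-ins for the Chinese negation words used on the real EMRs. A term counts as present only if it appears in some clause with no negation cue before it.</p>

```python
import re

# Hypothetical English stand-ins for the Chinese negation words filtered in Branch B.
NEGATION_RE = re.compile(r"\b(?:no|not|denies|denied|without)\b")
# Clause boundaries; in this sketch a negation cue only scopes over its own clause.
CLAUSE_SPLIT_RE = re.compile(r"[.;,]")


def mentions_affirmed(text: str, term: str) -> bool:
    """Return True if `term` appears in a clause with no negation cue before it."""
    term = term.lower()
    for clause in CLAUSE_SPLIT_RE.split(text.lower()):
        pos = clause.find(term)
        if pos == -1:
            continue  # term not in this clause
        if not NEGATION_RE.search(clause[:pos]):
            return True  # found an un-negated mention
    return False


print(mentions_affirmed("Patient denies chest pain; bedridden for 5 days.", "bedridden"))  # True
print(mentions_affirmed("No history of hormone therapy.", "hormone therapy"))              # False
```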
<p>Accurate automated assessment of Padua high- and low-risk levels is crucial because the risk level determines the next preventive measure or treatment for the patient. To evaluate the accuracy of our DB-DL assessment method on the Padua scale, we tested it on the EMR test data as an independent data source. Elias et al. (<xref ref-type="bibr" rid="B10">10</xref>), who also studied automated assessment with the Padua scale, reported an AUC of 0.81 for distinguishing Padua high and low risk, compared with 0.883 for our DB-DL assessment method. Elias et al. used structured medical records from a processed data warehouse, matched to ICD-9 codes, to determine the patient&#x00027;s VTE risk items. Such features are less informative and ignore the influence of the patient&#x00027;s symptoms and test results on the disease category; for example, the symptom &#x0201C;Precordial pain&#x0201D; is correlated (<xref ref-type="bibr" rid="B45">45</xref>) with the item &#x0201C;Acute MI/ischemic stroke&#x0201D; in the Padua scale. Our DB-DL assessment method combines the PDCM deep learning model in one branch with clinical comprehensive factors in the other, and it draws on the patient&#x00027;s diagnosis together with multiple other texts, including symptoms, physical examination, tests, and medications, to achieve higher accuracy.</p>
<p>In our DB-DL assessment method, Branches A and B cover different items of the Padua scale. We now discuss the accuracy of the items covered by each branch. In Branch A, our PDCM deep learning model takes the patient&#x00027;s diagnosis &#x0002B; symptoms as input and predicts the disease category. On this task, we compared PDCM with common deep learning models for disease classification: IDCNN (<xref ref-type="bibr" rid="B40">40</xref>), BiRNN (<xref ref-type="bibr" rid="B41">41</xref>), Transformer (<xref ref-type="bibr" rid="B42">42</xref>), TEXTCNN (<xref ref-type="bibr" rid="B43">43</xref>), and BiLSTM (<xref ref-type="bibr" rid="B44">44</xref>). BiRNN is suited to sequential data but is prone to vanishing and exploding gradients, and our method outperforms it (AUC: 0.837 vs. 0.735). The Transformer avoids the gradient problems of recurrent networks and better captures long-range dependencies in the input sequence, but it gives up the sequential processing of RNNs and is weaker at capturing local features; our model also outperforms it (AUC: 0.837 vs. 0.779). IDCNN and TEXTCNN have their own advantages, but IDCNN may lose information, and the fixed window size of TEXTCNN limits its ability to incorporate all of the textual information. Our model outperforms both (AUC: 0.837 vs. 0.769 and 0.837 vs. 0.700, respectively). BiLSTM reads the sequence in both directions with gated LSTM units and thus incorporates context better, but it still falls short of PDCM (AUC: 0.837 vs. 0.791), which additionally accounts for symptom weights.</p>
<p>To further validate the contribution of each module of PDCM to the accuracy of Branch A, we designed ablation experiments (Section Ablation experiment) comparing four methods: Diagnose (Only), Diagnose &#x0002B; symptom, PDCM, and PDCM (Without). Diagnose (Only) trains the model and predicts patients&#x00027; disease categories using the diagnosis alone. Diagnose &#x0002B; symptom uses the diagnosis together with symptoms. PDCM, our proposed deep learning model, uses the diagnosis and symptoms together with fused symptom weights. In Section Padua disease classification model training data (PDCM training data), we performed data amplification that incorporated uncertain diagnostic descriptions to improve the model&#x00027;s generalization performance; PDCM (Without) denotes PDCM trained with this amplified data removed. First, to test the value of integrating symptoms with the diagnosis, we compared Diagnose (Only) with Diagnose &#x0002B; symptom and found that the test results after integrating symptoms were substantially higher. This indicates that incorporating symptom information gives the model more feature information and better generalization in the actual test. Second, comparing PDCM (Without) with PDCM showed that, without data amplification, the model incorrectly judged many uncertain diagnoses as confirmed. Third, comparing Diagnose &#x0002B; symptom with PDCM verifies the value of incorporating the symptom weighting matrix: the accuracy of determining patients&#x00027; disease categories improved substantially after symptom weights were added. PDCM uses the TF-IWF algorithm to calculate the weight of each symptom&#x00027;s influence on each disease category and achieved the best results by integrating the corresponding diagnoses according to these symptom weights.</p>
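<p>A minimal Python sketch of TF-IWF symptom weighting, following the formulation of Tian and Wu (<xref ref-type="bibr" rid="B27">27</xref>) as we understand it, is given below. It treats each disease category as a document and each symptom as a word: TF is a symptom&#x00027;s share of all symptom mentions in the category, and IWF is the logarithm of the total number of symptom mentions in the corpus divided by the mentions of that symptom. The category and symptom names are hypothetical toy data, and the exact normalization used in PDCM (Section Proposed method) may differ.</p>

```python
from __future__ import annotations

import math
from collections import Counter


def tf_iwf(category_symptoms: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Weight of each symptom within each disease category.

    tf  = mentions of symptom in category / all symptom mentions in category
    iwf = log(all symptom mentions in corpus / mentions of symptom in corpus)
    """
    corpus_counts = Counter()
    for symptoms in category_symptoms.values():
        corpus_counts.update(symptoms)
    corpus_total = sum(corpus_counts.values())

    weights = {}
    for category, symptoms in category_symptoms.items():
        counts = Counter(symptoms)
        n = sum(counts.values())
        weights[category] = {
            s: (c / n) * math.log(corpus_total / corpus_counts[s])
            for s, c in counts.items()
        }
    return weights


if __name__ == "__main__":
    toy = {  # hypothetical symptom mentions per disease category
        "heart_failure": ["dyspnea", "edema", "dyspnea", "fatigue"],
        "acute_infection": ["fever", "cough", "fatigue", "fever"],
    }
    w = tf_iwf(toy)
    print(round(w["heart_failure"]["dyspnea"], 3))  # 0.693
```

<p>In this toy example, &#x0201C;dyspnea&#x0201D; receives a higher weight than &#x0201C;fatigue&#x0201D; in the heart-failure category because it occurs more often within that category, while both symptoms have the same corpus-wide frequency.</p>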
<p>The Branch B assessment method achieved good results. The items &#x0201C;thrombophilic,&#x0201D; &#x0201C;Elderly age (&#x02265;70 years),&#x0201D; and &#x0201C;BMI &#x0003E;30 kg/m<sup>2</sup>&#x0201D; were calculated from physical examination values, and their AUCs were higher than 0.95, indicating high accuracy. However, the items &#x0201C;Reduced mobility&#x0201D; and &#x0201C;Ongoing hormonal treatment&#x0201D; had AUCs of 0.809 and 0.827, respectively, indicating only moderate accuracy. &#x0201C;Reduced mobility&#x0201D; is more complicated to determine, and its sparse documentation in the EMR may be one of the main reasons for its weaker prediction. The weaker matching of &#x0201C;Ongoing hormonal treatment&#x0201D; may be attributable to differences among doctors in the criteria for judging whether hormonal drugs were used. Taken together, although some items had only moderate predictive performance, the overall assessment results remained reliable. In Branch B, we extracted medical data for rule-based determination using LAC combined with negative-word filtering. The professional corpora we used, such as ICD-9-CM3, DiseaseKG, and the Organization Scripted Drug List, have wide coverage and support accurate determination of clinical conditions.</p>
<p>Our proposed DB-DL assessment method automatically assesses the Padua risk class of patients through Branches A and B. Branch A uses our PDCM deep learning model to assess the patient&#x00027;s disease category, and PDCM outperformed the other deep learning models compared. Branch B uses a variety of professional corpora with wide coverage to extract and determine the clinical comprehensive factors and achieved good accuracy. Ultimately, the DB-DL assessment method built from Branches A and B demonstrated good accuracy in classifying patients as high or low risk, as well as good accuracy for the individual Padua scale items.</p>
<p>In terms of practical application, the Padua scale is widely used. Our method relies only on textual information in EMRs: it extracts diagnoses, symptoms, and other medical terms from the EMR system to perform an automatic Padua risk assessment, so it can be embedded in different EMR systems. We measured the average time our method takes per medical record, which is only 0.37 s. Elias et al. (<xref ref-type="bibr" rid="B10">10</xref>) report that manual assessment of the Padua scale by doctors takes &#x0007E;2&#x02013;14 min, so our method is far faster than manual assessment.</p>
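<p>The size of the speed-up follows from simple arithmetic on the reported times. The Python snippet below divides the manual range of 2&#x02013;14 min reported by Elias et al. (<xref ref-type="bibr" rid="B10">10</xref>) by the mean of 0.37 s per EMR, and also estimates the total automated time for the 7,690 test EMRs, assuming the 0.37 s mean applies uniformly across records.</p>

```python
MANUAL_SECONDS = (2 * 60, 14 * 60)  # manual Padua assessment range reported by Elias et al.
AUTO_SECONDS = 0.37                 # mean automated time per EMR reported in this study
N_TEST_EMRS = 7690                  # size of the EMR test set


def speedup(manual_s: float, auto_s: float = AUTO_SECONDS) -> float:
    """How many times faster the automated assessment is than the manual one."""
    return manual_s / auto_s


for t in MANUAL_SECONDS:
    print(f"{t // 60} min manual -> about {speedup(t):.0f}x faster")

# Assumes the mean time per EMR holds for every record.
total_min = N_TEST_EMRS * AUTO_SECONDS / 60
print(f"all {N_TEST_EMRS} test EMRs: about {total_min:.0f} min")
```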
</sec>
<sec sec-type="conclusions" id="s5">
<title>Conclusions</title>
<p>In this article, we propose a dual-branch method that combines a deep learning model with clinical comprehensive factors to assess patients&#x00027; risk of VTE automatically. With doctors&#x00027; assessments as the gold standard, our method achieves an AUC of 0.883 for distinguishing high- and low-risk levels and takes only 0.37 s to assess an EMR. The method can therefore be used to implement automated assessment of the Padua scale and has practical engineering value for assisting doctors in assessing the risk of VTE. Future work should incorporate more diverse clinical data, validate the method in larger patient populations, and explore more advanced models and algorithms to improve assessment accuracy.</p>
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec sec-type="ethics-statement" id="s7">
<title>Ethics statement</title>
<p>Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p>
</sec>
<sec sec-type="author-contributions" id="s8">
<title>Author contributions</title>
<p>JH and JY designed the study, reviewed the design and results, and drafted and submitted the manuscript. HZ provided expertise in VTE assessment and processed and managed the data together with JH. JH and HZ also secured funding and managed the project. All authors have read and approved the final manuscript.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>This study received funding from the National Natural Science Foundation of China (No. 82160347) and the Yunnan Key Laboratory of Smart City in Cyberspace Security (No. 202102AE090031).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moumneh</surname> <given-names>T</given-names></name> <name><surname>Riou</surname> <given-names>J</given-names></name> <name><surname>Douillet</surname> <given-names>D</given-names></name> <name><surname>Henni</surname> <given-names>S</given-names></name> <name><surname>Mottier</surname> <given-names>D</given-names></name> <name><surname>Tritschler</surname> <given-names>T</given-names></name> <etal/></person-group>. <article-title>Validation of risk assessment models predicting venous thromboembolism in acutely ill medical inpatients: a cohort study</article-title>. <source>J Thromb Haemost.</source> (<year>2020</year>) <volume>18</volume>:<fpage>1398</fpage>&#x02013;<lpage>407</lpage>. <pub-id pub-id-type="doi">10.1111/jth.14796</pub-id><pub-id pub-id-type="pmid">32168402</pub-id></citation></ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Di Nisio</surname> <given-names>M</given-names></name> <name><surname>van Es</surname> <given-names>N</given-names></name> <name><surname>B&#x000FC;ller</surname> <given-names>HR</given-names></name></person-group>. <article-title>Deep vein thrombosis and pulmonary embolism</article-title>. <source>Lancet.</source> (<year>2016</year>) <volume>388</volume>:<fpage>3060</fpage>&#x02013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1016/S0140-6736(16)30514-1</pub-id><pub-id pub-id-type="pmid">27375038</pub-id></citation></ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scheres</surname> <given-names>LJJ</given-names></name> <name><surname>Lijfering</surname> <given-names>WM</given-names></name> <name><surname>Cannegieter</surname> <given-names>SC</given-names></name></person-group>. <article-title>Current and future burden of venous thrombosis: not simply predictable</article-title>. <source>Res Pract Thromb Haemost.</source> (<year>2018</year>) <volume>2</volume>:<fpage>199</fpage>&#x02013;<lpage>208</lpage>. <pub-id pub-id-type="doi">10.1002/rth2.12101</pub-id><pub-id pub-id-type="pmid">30046722</pub-id></citation></ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duffett</surname> <given-names>L</given-names></name></person-group>. <article-title>Deep venous thrombosis</article-title>. <source>Ann Intern Med.</source> (<year>2022</year>) <volume>175</volume>:<fpage>C129</fpage>&#x02013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.7326/AITC202209200</pub-id><pub-id pub-id-type="pmid">36095313</pub-id></citation></ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abboud</surname> <given-names>J</given-names></name> <name><surname>Abdel Rahman</surname> <given-names>A</given-names></name> <name><surname>Kahale</surname> <given-names>L</given-names></name> <name><surname>Dempster</surname> <given-names>M</given-names></name> <name><surname>Adair</surname> <given-names>P</given-names></name></person-group>. <article-title>Prevention of health care associated venous thromboembolism through implementing VTE prevention clinical practice guidelines in hospitalized medical patients: a systematic review and meta-analysis</article-title>. <source>Implement Sci</source>. (<year>2020</year>) <volume>15</volume>:<fpage>49</fpage>. <pub-id pub-id-type="doi">10.1186/s13012-020-01008-9</pub-id><pub-id pub-id-type="pmid">32580777</pub-id></citation></ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>C</given-names></name> <name><surname>Yi</surname> <given-names>Q</given-names></name> <name><surname>Ge</surname> <given-names>H</given-names></name> <name><surname>Wei</surname> <given-names>H</given-names></name> <name><surname>Liu</surname> <given-names>H</given-names></name> <name><surname>Zhang</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>Validation of risk assessment models predicting venous thromboembolism in inpatients with acute exacerbation of chronic obstructive pulmonary disease: A multicenter cohort study in China</article-title>. <source>Thromb Haemostasis.</source> (<year>2022</year>) <volume>122</volume>:<fpage>1177</fpage>&#x02013;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.1055/a-1693-0063</pub-id><pub-id pub-id-type="pmid">34758489</pub-id></citation></ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shargall</surname> <given-names>Y</given-names></name> <name><surname>Litle</surname> <given-names>VR</given-names></name></person-group>. <article-title>European perspectives in thoracic surgery, the ESTS venous thromboembolism (VTE) working group</article-title>. <source>J Thorac Dis.</source> (<year>2018</year>) <volume>10</volume>:<fpage>S963</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.21037/jtd.2018.04.70</pub-id><pub-id pub-id-type="pmid">29744223</pub-id></citation></ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stevens</surname> <given-names>SM</given-names></name> <name><surname>Woller</surname> <given-names>SC</given-names></name> <name><surname>Kreuziger</surname> <given-names>LB</given-names></name> <name><surname>Bounameaux</surname> <given-names>H</given-names></name> <name><surname>Doerschug</surname> <given-names>K</given-names></name> <name><surname>Geersing</surname> <given-names>G</given-names></name> <etal/></person-group>. <article-title>Antithrombotic therapy for VTE disease</article-title>. <source>Chest.</source> (<year>2021</year>) <volume>160</volume>:<fpage>e545</fpage>&#x02013;<lpage>608</lpage>. <pub-id pub-id-type="doi">10.1016/j.chest.2021.07.055</pub-id><pub-id pub-id-type="pmid">34352278</pub-id></citation></ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Group</surname> <given-names>CDOT</given-names></name></person-group>. <article-title>Chinese Guidelines for the Prevention and Management of Perioperative Venous thromboembolism in Thoracic malignancies (2022 edition)</article-title>. (<year>2022</year>) <volume>8</volume>:<fpage>721</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.3760/cma.j.cn112139-20220430-00194</pub-id><pub-id pub-id-type="pmid">35790523</pub-id></citation></ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elias</surname> <given-names>P</given-names></name> <name><surname>Khanna</surname> <given-names>R</given-names></name> <name><surname>Dudley</surname> <given-names>A</given-names></name> <name><surname>Davies</surname> <given-names>J</given-names></name> <name><surname>Jacolbia</surname> <given-names>R</given-names></name> <name><surname>McArthur</surname> <given-names>K</given-names></name> <etal/></person-group>. <article-title>Automating venous thromboembolism risk calculation using electronic health record data upon hospital admission: the automated Padua prediction score</article-title>. <source>J Hosp Med</source>. (<year>2017</year>) <volume>12</volume>:<fpage>231</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.12788/jhm.2714</pub-id><pub-id pub-id-type="pmid">28411291</pub-id></citation></ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kulkarni</surname> <given-names>S</given-names></name> <name><surname>Seneviratne</surname> <given-names>N</given-names></name> <name><surname>Baig</surname> <given-names>MS</given-names></name> <name><surname>Khan</surname> <given-names>AHA</given-names></name></person-group>. <article-title>Artificial intelligence in medicine: where are we now?</article-title> <source>Acad Radiol.</source> (<year>2020</year>) <volume>27</volume>:<fpage>62</fpage>&#x02013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1016/j.acra.2019.10.001</pub-id><pub-id pub-id-type="pmid">31636002</pub-id></citation></ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferroni</surname> <given-names>P</given-names></name> <name><surname>Zanzotto</surname> <given-names>FM</given-names></name> <name><surname>Scarpato</surname> <given-names>N</given-names></name> <name><surname>Riondino</surname> <given-names>S</given-names></name> <name><surname>Nanni</surname> <given-names>U</given-names></name> <name><surname>Roselli</surname> <given-names>M</given-names></name> <etal/></person-group>. <article-title>Risk assessment for venous thromboembolism in chemotherapy-treated ambulatory cancer patients</article-title>. <source>Med Decis Making.</source> (<year>2017</year>) <volume>37</volume>:<fpage>234</fpage>&#x02013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1177/0272989X16662654</pub-id><pub-id pub-id-type="pmid">27491558</pub-id></citation></ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>H</given-names></name> <name><surname>Sheng</surname> <given-names>W</given-names></name> <name><surname>Li</surname> <given-names>J</given-names></name> <name><surname>Hou</surname> <given-names>L</given-names></name> <name><surname>Yang</surname> <given-names>J</given-names></name> <name><surname>Cai</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>A novel hierarchical machine learning model for hospital-acquired venous thromboembolism risk assessment among multiple-departments</article-title>. <source>J Biomed Inform.</source> (<year>2021</year>) <volume>122</volume>:<fpage>103892</fpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2021.103892</pub-id><pub-id pub-id-type="pmid">34454079</pub-id></citation></ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>JI</given-names></name> <name><surname>Kim</surname> <given-names>D</given-names></name> <name><surname>Lee</surname> <given-names>JA</given-names></name> <name><surname>Zheng</surname> <given-names>K</given-names></name> <name><surname>Amin</surname> <given-names>A</given-names></name></person-group>. <article-title>Personalized risk prediction for 30-day readmissions with venous thromboembolism using machine learning</article-title>. <source>J Nurs Scholarship.</source> (<year>2021</year>) <volume>53</volume>:<fpage>278</fpage>&#x02013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1111/jnu.12637</pub-id><pub-id pub-id-type="pmid">33617689</pub-id></citation></ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>C</given-names></name> <name><surname>Song</surname> <given-names>J</given-names></name> <name><surname>Li</surname> <given-names>H</given-names></name> <name><surname>Yu</surname> <given-names>W</given-names></name> <name><surname>Hao</surname> <given-names>Y</given-names></name> <name><surname>Xu</surname> <given-names>K</given-names></name> <etal/></person-group>. <article-title>Predicting venous thrombosis in osteoarthritis using a machine learning algorithm: a population-based cohort study</article-title>. <source>J Pers Med.</source> (<year>2022</year>) <volume>12</volume>:<fpage>114</fpage>. <pub-id pub-id-type="doi">10.3390/jpm12010114</pub-id><pub-id pub-id-type="pmid">35055429</pub-id></citation></ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qatawneh</surname> <given-names>Z</given-names></name> <name><surname>Alshraideh</surname> <given-names>M</given-names></name> <name><surname>Almasri</surname> <given-names>N</given-names></name> <name><surname>Tahat</surname> <given-names>L</given-names></name> <name><surname>Awidi</surname> <given-names>A</given-names></name></person-group>. <article-title>Clinical decision support system for venous thromboembolism risk classification</article-title>. <source>Appl Comput Inform</source>. 2017:S1698315181. <pub-id pub-id-type="doi">10.1016/j.aci.2017.09.003</pub-id></citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J</given-names></name> <name><surname>Yang</surname> <given-names>J</given-names></name> <name><surname>He</surname> <given-names>J</given-names></name></person-group>. <article-title>Prediction of venous thrombosis Chinese electronic medical records based on deep learning and rule reasoning</article-title>. <source>Appl Sci.</source> (<year>2022</year>) <volume>12</volume>:<fpage>10824</fpage>. <pub-id pub-id-type="doi">10.3390/app122110824</pub-id></citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>Z</given-names></name> <name><surname>Bai</surname> <given-names>K</given-names></name> <name><surname>Yang</surname> <given-names>L</given-names></name> <name><surname>Wang</surname> <given-names>Y</given-names></name> <name><surname>Tian</surname> <given-names>Y</given-names></name></person-group>. <article-title>Review on text mining of electronic medical record</article-title>. <source>Journal of Computer Research and Development.</source> (<year>2021</year>) <volume>58</volume>:<fpage>513</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.7544/issn1000-1239.2021.20200402</pub-id></citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiao</surname> <given-names>Z</given-names></name> <name><surname>Sun</surname> <given-names>S</given-names></name> <name><surname>Sun</surname> <given-names>K</given-names></name></person-group>. <article-title>Chinese lexical analysis with deep Bi-GRU-CRF network</article-title>. <source>arXiv [Preprint]</source>. (<year>2018</year>). arXiv:1807.01882. <pub-id pub-id-type="doi">10.48550/arXiv.1807.01882</pub-id></citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>WHO</surname></name> <name><surname>Hospital</surname> <given-names>PUMC</given-names></name> <name><surname>Jingwu</surname> <given-names>D</given-names></name></person-group>. <source>International Statistical Classification of Diseases and Related Health Problems: Tenth Revision.</source> <publisher-loc>Beijing</publisher-loc>: <publisher-name>People&#x00027;s Medical Publishing House</publisher-name> (<year>2008</year>).</citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="web"><person-group person-group-type="author"><name><surname>OpenKG</surname> <given-names>PCZJ</given-names></name></person-group> (<year>2021</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.openkg.cn/dataset/disease-information">http://www.openkg.cn/dataset/disease-information</ext-link></citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal"><person-group person-group-type="author"><collab>WHO</collab></person-group>. <source>WHO Model Lists of Essential Medicines, 17th edition</source> (<month>March</month> <year>2011</year>).</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>Y</given-names></name> <name><surname>Sun</surname> <given-names>W</given-names></name> <name><surname>Rumshisky</surname> <given-names>A</given-names></name></person-group>. <article-title>MCN: a comprehensive corpus for medical concept normalization</article-title>. <source>J Biomed Inform.</source> (<year>2019</year>) <volume>92</volume>:<fpage>103132</fpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2019.103132</pub-id><pub-id pub-id-type="pmid">30802545</pub-id></citation></ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aimin</surname> <given-names>L</given-names></name></person-group>. <article-title>Surgery and operation, ninth clinical revision of the international classification of diseases, ICD-9-CM-3</article-title> (<year>2013</year>).</citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Manderstedt</surname> <given-names>E</given-names></name> <name><surname>Lind Halld&#x000E9;n</surname> <given-names>C</given-names></name> <name><surname>Halld&#x000E9;n</surname> <given-names>C</given-names></name> <name><surname>Elf</surname> <given-names>J</given-names></name> <name><surname>Svensson</surname> <given-names>PJ</given-names></name> <name><surname>Dahlb&#x000E4;ck</surname> <given-names>B</given-names></name> <etal/></person-group>. <article-title>Classic thrombophilias and thrombotic risk among middle-aged and older adults: a population-based cohort study</article-title>. <source>J Am Heart Assoc</source>. (<year>2022</year>) <volume>11</volume>:<fpage>e023018</fpage>. <pub-id pub-id-type="doi">10.1161/JAHA.121.023018</pub-id><pub-id pub-id-type="pmid">35112923</pub-id></citation></ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Di Minno</surname> <given-names>MND</given-names></name> <name><surname>Calcaterra</surname> <given-names>I</given-names></name> <name><surname>Papa</surname> <given-names>A</given-names></name> <name><surname>Lupoli</surname> <given-names>R</given-names></name> <name><surname>Di Minno</surname> <given-names>A</given-names></name> <name><surname>Maniscalco</surname> <given-names>M</given-names></name> <etal/></person-group>. <article-title>Diagnostic accuracy of D-Dimer testing for recurrent venous thromboembolism: a systematic review with meta-analysis</article-title>. <source>Eur J Intern Med.</source> (<year>2021</year>) <volume>89</volume>:<fpage>39</fpage>&#x02013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.ejim.2021.04.004</pub-id><pub-id pub-id-type="pmid">33933338</pub-id></citation></ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname> <given-names>H</given-names></name> <name><surname>Wu</surname> <given-names>L</given-names></name></person-group>. <article-title>Microblog emotional analysis based on TF-IWF weighted Word2vec model</article-title>. In: <source>2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS)</source>. (<year>2018</year>). p. <fpage>893</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/ICSESS.2018.8663837</pub-id></citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sivakumar</surname> <given-names>S</given-names></name> <name><surname>Videla</surname> <given-names>LS</given-names></name> <name><surname>Kumar</surname> <given-names>TR</given-names></name> <name><surname>Nagaraj</surname> <given-names>J</given-names></name> <name><surname>Itnal</surname> <given-names>S</given-names></name> <name><surname>Haritha</surname> <given-names>D</given-names></name></person-group>. <article-title>Review on Word2Vec word embedding neural net</article-title>. In: <source>2020 International Conference on Smart Electronics and Communication (ICOSEC)</source>. (<year>2020</year>). p. <fpage>282</fpage>&#x02013;<lpage>90</lpage>.</citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Devlin</surname> <given-names>J</given-names></name> <name><surname>Chang</surname> <given-names>MW</given-names></name> <name><surname>Lee</surname> <given-names>K</given-names></name> <name><surname>Toutanova</surname> <given-names>K</given-names></name></person-group>. <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>. <source>arXiv preprint arXiv:1810.04805</source>. (<year>2018</year>). <pub-id pub-id-type="doi">10.48550/arXiv.1810.04805</pub-id></citation>
</ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lan</surname> <given-names>Z</given-names></name> <name><surname>Chen</surname> <given-names>M</given-names></name> <name><surname>Goodman</surname> <given-names>S</given-names></name> <name><surname>Gimpel</surname> <given-names>K</given-names></name> <name><surname>Sharma</surname> <given-names>P</given-names></name> <name><surname>Soricut</surname> <given-names>R</given-names></name></person-group>. <article-title>ALBERT: a lite BERT for self-supervised learning of language representations</article-title>. <source>arXiv preprint arXiv:1909.11942</source>. (<year>2019</year>). <pub-id pub-id-type="doi">10.48550/arXiv.1909.11942</pub-id></citation>
</ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Greff</surname> <given-names>K</given-names></name> <name><surname>Srivastava</surname> <given-names>RK</given-names></name> <name><surname>Koutn&#x000ED;k</surname> <given-names>J</given-names></name> <name><surname>Steunebrink</surname> <given-names>BR</given-names></name> <name><surname>Schmidhuber</surname> <given-names>J</given-names></name></person-group>. <article-title>LSTM: a search space odyssey</article-title>. <source>IEEE Trans Neural Netw Learn Syst.</source> (<year>2017</year>) <volume>28</volume>:<fpage>2222</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2016.2582924</pub-id><pub-id pub-id-type="pmid">27411231</pub-id></citation></ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>TY</given-names></name> <name><surname>Goyal</surname> <given-names>P</given-names></name> <name><surname>Girshick</surname> <given-names>R</given-names></name> <name><surname>He</surname> <given-names>K</given-names></name> <name><surname>Doll&#x000E1;r</surname> <given-names>P</given-names></name></person-group>. <article-title>Focal loss for dense object detection</article-title>. <source>IEEE Trans Pattern Anal Mach Intell.</source> (<year>2020</year>) <volume>42</volume>:<fpage>318</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2018.2858826</pub-id><pub-id pub-id-type="pmid">35679384</pub-id></citation></ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trebach</surname> <given-names>J</given-names></name> <name><surname>Su</surname> <given-names>MK</given-names></name></person-group>. <article-title>Biostatistics and epidemiology for the toxicologist: rock the ROC curve</article-title>. <source>J Med Toxicol.</source> (<year>2022</year>) <volume>18</volume>:<fpage>163</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1007/s13181-022-00879-2</pub-id><pub-id pub-id-type="pmid">35119595</pub-id></citation></ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sokolova</surname> <given-names>M</given-names></name> <name><surname>Lapalme</surname> <given-names>G</given-names></name></person-group>. <article-title>A systematic analysis of performance measures for classification tasks</article-title>. <source>Inform Process Manag.</source> (<year>2009</year>) <volume>45</volume>:<fpage>427</fpage>&#x02013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.1016/j.ipm.2009.03.002</pub-id></citation>
</ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>W</given-names></name> <name><surname>Zhou</surname> <given-names>ZH</given-names></name></person-group>. <article-title>On the consistency of multi-label learning</article-title>. <source>Artif Intell.</source> (<year>2013</year>) <volume>199</volume>:<fpage>22</fpage>&#x02013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1016/j.artint.2013.03.001</pub-id></citation>
</ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>W</given-names></name> <name><surname>Liu</surname> <given-names>TY</given-names></name> <name><surname>Lan</surname> <given-names>Y</given-names></name> <name><surname>Ma</surname> <given-names>ZM</given-names></name> <name><surname>Li</surname> <given-names>H</given-names></name></person-group>. <article-title>Ranking measures and loss functions in learning to rank</article-title>. <source>Adv Neural Inf Process Syst.</source> (<year>2009</year>) <volume>22</volume>:<fpage>315</fpage>&#x02013;<lpage>23</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://proceedings.neurips.cc/paper/2009/file/2f55707d4193dc27118a0f19a1985716-Paper.pdf">https://proceedings.neurips.cc/paper/2009/file/2f55707d4193dc27118a0f19a1985716-Paper.pdf</ext-link></citation>
</ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jain</surname> <given-names>H</given-names></name> <name><surname>Prabhu</surname> <given-names>Y</given-names></name> <name><surname>Varma</surname> <given-names>M</given-names></name></person-group>. <article-title>Extreme multi-label loss functions for recommendation, tagging, ranking &#x00026; other missing label applications</article-title>. In: <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>. (<year>2016</year>). p. <fpage>935</fpage>&#x02013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1145/2939672.2939756</pub-id></citation>
</ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>ML</given-names></name> <name><surname>Zhang</surname> <given-names>K</given-names></name></person-group>. <article-title>Multi-label learning by exploiting label dependency</article-title>. In: <source>Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>. <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name> (<year>2010</year>). <pub-id pub-id-type="doi">10.1145/1835804.1835930</pub-id></citation>
</ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>H</given-names></name> <name><surname>Choi</surname> <given-names>D</given-names></name> <name><surname>Lee</surname> <given-names>H</given-names></name></person-group>. <article-title>Early stopping based on unlabeled samples in text classification</article-title>. In: <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>. (<year>2022</year>). p. <fpage>708</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2022.acl-long.52</pub-id></citation>
</ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strubell</surname> <given-names>E</given-names></name> <name><surname>Verga</surname> <given-names>P</given-names></name> <name><surname>Belanger</surname> <given-names>D</given-names></name> <name><surname>McCallum</surname> <given-names>A</given-names></name></person-group>. <article-title>Fast and accurate entity recognition with iterated dilated convolutions</article-title>. <source>arXiv preprint arXiv:1702.02098</source>. (<year>2017</year>). <pub-id pub-id-type="doi">10.18653/v1/D17-1283</pub-id></citation>
</ref>
<ref id="B41">
<label>41.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>C</given-names></name> <name><surname>Yao</surname> <given-names>C</given-names></name> <name><surname>Chen</surname> <given-names>P</given-names></name> <name><surname>Shi</surname> <given-names>J</given-names></name> <name><surname>Gu</surname> <given-names>Z</given-names></name> <name><surname>Zhou</surname> <given-names>Z</given-names></name></person-group>. <article-title>Artificial intelligence algorithm with ICD coding technology guided by the embedded electronic medical record system in medical record information management</article-title>. <source>J Healthc Eng.</source> (<year>2021</year>) <volume>2021</volume>:<fpage>1</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1155/2021/3293457</pub-id><pub-id pub-id-type="pmid">34497706</pub-id></citation></ref>
<ref id="B42">
<label>42.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vaswani</surname> <given-names>A</given-names></name> <name><surname>Shazeer</surname> <given-names>N</given-names></name> <name><surname>Parmar</surname> <given-names>N</given-names></name> <name><surname>Uszkoreit</surname> <given-names>J</given-names></name> <name><surname>Jones</surname> <given-names>L</given-names></name> <name><surname>Gomez</surname> <given-names>AN</given-names></name> <etal/></person-group>. <article-title>Attention is all you need</article-title>. <source>arXiv preprint arXiv:1706.03762</source>. (<year>2017</year>). <pub-id pub-id-type="doi">10.48550/arXiv.1706.03762</pub-id></citation>
</ref>
<ref id="B43">
<label>43.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>Y</given-names></name></person-group>. <article-title>Convolutional neural networks for sentence classification</article-title>. In: <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).</source> (<year>2014</year>). p. <fpage>1746</fpage>&#x02013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.3115/v1/D14-1181</pub-id></citation>
</ref>
<ref id="B44">
<label>44.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dhaka</surname> <given-names>P</given-names></name> <name><surname>Nagpal</surname> <given-names>B</given-names></name></person-group>. <article-title>WoM-based deep BiLSTM: smart disease prediction model using WoM-based deep BiLSTM classifier</article-title>. <source>Multimed Tools Appl.</source> (<year>2023</year>) <volume>82</volume>:<fpage>25061</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1007/s11042-023-14336-x</pub-id></citation>
</ref>
<ref id="B45">
<label>45.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saito</surname> <given-names>Y</given-names></name> <name><surname>Oyama</surname> <given-names>K</given-names></name> <name><surname>Tsujita</surname> <given-names>K</given-names></name> <name><surname>Yasuda</surname> <given-names>S</given-names></name> <name><surname>Kobayashi</surname> <given-names>Y</given-names></name></person-group>. <article-title>Treatment strategies of acute myocardial infarction: updates on revascularization, pharmacological therapy, and beyond</article-title>. <source>J Cardiol</source>. (<year>2023</year>) <volume>81</volume>:<fpage>168</fpage>&#x02013;<lpage>78</lpage>. <pub-id pub-id-type="doi">10.1016/j.jjcc.2022.07.003</pub-id><pub-id pub-id-type="pmid">35882613</pub-id></citation></ref>
</ref-list> 
</back>
</article>