<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="brief-report" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Digit. Health</journal-id>
<journal-title>Frontiers in Digital Health</journal-title><abbrev-journal-title abbrev-type="pubmed">Front. Digit. Health</abbrev-journal-title>
<issn pub-type="epub">2673-253X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdgth.2022.841853</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Digital Health</subject>
<subj-group>
<subject>Brief Research Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Measuring the impact of anonymization on real-world consolidated health datasets engineered for secondary research use: Experiments in the context of MODELHealth project</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Pitoglou</surname><given-names>Stavros</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="cor1">&#x002A;</xref><uri xlink:href="https://loop.frontiersin.org/people/1610453/overview"/></contrib>
<contrib contrib-type="author"><name><surname>Filntisi</surname><given-names>Arianna</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref><uri xlink:href="https://loop.frontiersin.org/people/1713106/overview"/></contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Anastasiou</surname><given-names>Athanasios</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref><uri xlink:href="https://loop.frontiersin.org/people/1017304/overview"/></contrib>
<contrib contrib-type="author"><name><surname>Matsopoulos</surname><given-names>George K.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref><uri xlink:href="https://loop.frontiersin.org/people/1922868/overview"/></contrib>
<contrib contrib-type="author"><name><surname>Koutsouris</surname><given-names>Dimitrios</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib>
</contrib-group>
<aff id="aff1"><label><sup>1</sup></label><addr-line>Computer Solutions SA</addr-line>, <institution>Research &#x0026; Development Dpt.</institution>, <addr-line>Athens</addr-line>, <country>Greece</country></aff>
<aff id="aff2"><label><sup>2</sup></label><addr-line>School of Electrical and Computer Engineering</addr-line>, <institution>National Technical University of Athens</institution>, <addr-line>Athens</addr-line>, <country>Greece</country></aff>
<author-notes>
<fn fn-type="edited-by"><p><bold>Edited by:</bold> Constantinos S. Pattichis, University of Cyprus, Cyprus</p></fn>
<fn fn-type="edited-by"><p><bold>Reviewed by:</bold> Brijesh Mehta, Automaton AI Infosystems Pvt Ltd, India; Manisha Mantri, Center for Development of Advanced Computing (C-DAC), India</p></fn>
<corresp id="cor1"><label>&#x002A;</label><bold>Correspondence:</bold> Stavros Pitoglou <email>spitoglou@biomed.ntua.gr</email></corresp>
<fn fn-type="other" id="fn001"><p><bold>Specialty Section:</bold> This article was submitted to Connected Health, a section of the journal Frontiers in Digital Health</p></fn>
</author-notes>
<pub-date pub-type="epub"><day>01</day><month>09</month><year>2022</year></pub-date>
<pub-date pub-type="collection"><year>2022</year></pub-date>
<volume>4</volume><elocation-id>841853</elocation-id>
<history>
<date date-type="received"><day>22</day><month>12</month><year>2021</year></date>
<date date-type="accepted"><day>10</day><month>08</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Pitoglou, Filntisi, Anastasiou, Matsopoulos and Koutsouris.</copyright-statement>
<copyright-year>2022</copyright-year><copyright-holder>Pitoglou, Filntisi, Anastasiou, Matsopoulos and Koutsouris</copyright-holder><license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License (CC BY)</ext-link>. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<sec><title>Introduction</title>
<p>Electronic Health Records (EHRs) are essential data structures that enable the sharing of valuable medical care information for a diverse patient population and can be reused as input to predictive models for clinical research. However, issues such as the heterogeneity of EHR data and the potential compromise of patient privacy inhibit the secondary use of EHR data in clinical research.</p>
</sec>
<sec><title>Objectives</title>
<p>This study aims to present the main elements of the MODELHealth project implementation and the evaluation method that was followed to assess the efficiency of its mechanism.</p>
</sec>
<sec><title>Methods</title>
<p>The MODELHealth project was implemented as an Extract-Transform-Load system that collects data from the hospital databases, performs harmonization to the HL7 FHIR standard and anonymization using the k-anonymity method, before loading the transformed data to a central repository. The integrity of the anonymization process was validated by developing a database query tool. The information loss occurring due to the anonymization was estimated with the metrics of generalized information loss, discernibility and average equivalence class size for various values of k.</p>
</sec>
<sec><title>Results</title>
<p>The average values of generalized information loss, discernibility and average equivalence class size obtained across all tested datasets and k values were 0.008473&#x2009;&#x00B1;&#x2009;0.006216252886, 115,145,464.3&#x2009;&#x00B1;&#x2009;79,724,196.11 and 12.1346&#x2009;&#x00B1;&#x2009;6.76096647, respectively. The values of those metrics appear correlated with factors such as the k value and the dataset characteristics, as expected.</p>
</sec>
<sec><title>Conclusion</title>
<p>The experimental results of the study demonstrate that it is feasible to perform effective harmonization and anonymization on EHR data while preserving essential patient information.</p>
</sec>
</abstract>
<kwd-group>
<kwd>electronic health records</kwd>
<kwd>harmonization</kwd>
<kwd>anonymization</kwd>
<kwd>information loss</kwd>
<kwd>real data</kwd>
</kwd-group>
<contract-num rid="cn001">T1EDK-04066</contract-num>
<contract-sponsor id="cn001">This research has been co-funded by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call Research-Create-Innovate</contract-sponsor>
<counts>
<fig-count count="2"/>
<table-count count="2"/><equation-count count="3"/><ref-count count="43"/><page-count count="0"/><word-count count="0"/></counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro"><title>Introduction</title>
<p>Electronic Health Record (EHR) systems are being increasingly adopted to represent various data types, such as patient medical histories, laboratory test results, medication, demographics, billing records and diagnosis codes. EHR systems are the building blocks of Health Information Exchange (HIE) networks, enabling the sharing of data and information about patients&#x0027; medical and health history (<xref ref-type="bibr" rid="B1">1</xref>&#x2013;<xref ref-type="bibr" rid="B3">3</xref>).</p>
<p>EHRs surpass many existing registries and data repositories in volume, offering a window into the medical care information of a diverse population. Their effectiveness when reused for clinical research has been demonstrated in various instances (<xref ref-type="bibr" rid="B4">4</xref>&#x2013;<xref ref-type="bibr" rid="B7">7</xref>). However, their reuse has been limited by issues such as their high dimensionality, heterogeneity, incompleteness, noise and errors, and redundant terminology (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B5">5</xref>).</p>
<p>Interoperability is a crucial requirement for the efficiency of healthcare information systems and the utilization of health data for clinical research. The related concept of data harmonization aims to transform heterogeneous data into a standard format using computational approaches such as lexical and semantic mapping, enabling the integrative analysis of the data and, therefore, enhancing the statistical power of the clinical studies which make use of such data. Health Level Seven (HL7) is currently the most widely used set of standards for the structure and exchange of clinical data (<xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>Anonymization is another essential issue regarding the secondary use of clinical data. Patient data must be disseminated without compromising patient privacy against threats such as identity, membership and attribute disclosure (<xref ref-type="bibr" rid="B2">2</xref>). Data privacy protection can be pursued with methods such as encryption, authentication, and de-identification, which, however, can be inapplicable or insufficient for preserving confidential information. For example, the removal of direct identifiers such as each individual&#x0027;s name and social security number does not prevent reidentification through the linkage of other data attributes. To prevent such attacks, the concept of k-anonymity, as well as its extensions l-diversity and t-closeness, have been proposed (<xref ref-type="bibr" rid="B9">9</xref>, <xref ref-type="bibr" rid="B10">10</xref>).</p>
<p>The k-anonymity concept, introduced by Samarati and Sweeney (<xref ref-type="bibr" rid="B11">11</xref>), focuses on reducing data granularity. A dataset is k-anonymous if each record is indistinguishable from at least k&#x2212;1 other records with respect to specific identifying attributes. A quasi-identifier (QI) set is a minimal set of dataset attributes that can be joined with external information to re-identify individual records. K-anonymity requires that each equivalence class EQ (i.e., a set of records that are indistinguishable from each other with respect to the QI set) contains at least k records. K-anonymity can be achieved using suppression and generalization techniques. Suppression involves replacing a portion of the original data with a special value that signals its nondisclosure, while generalization focuses on replacing the values of an attribute with less specific but consistent values. K-anonymity is considered the &#x201C;bedrock&#x201D; anonymization approach and is used as a foundation process, even in the rare case that the overall privacy it provides could be considered inadequate because sensitive attributes that lack diversity may be disclosed through the use of background knowledge (<xref ref-type="bibr" rid="B11">11</xref>&#x2013;<xref ref-type="bibr" rid="B16">16</xref>).</p>
<p>Given the sensitive nature and complexity of clinical data, a systematic overall approach is needed for their secondary use, examples of which can be found in the literature. Ciampi et al. (<xref ref-type="bibr" rid="B17">17</xref>) proposed an architecture for the extraction, transformation and loading of clinical data, which incorporates de-identification and standardization to the HL7 CDA and FHIR formats (<xref ref-type="bibr" rid="B17">17</xref>). Somolinos et al. (<xref ref-type="bibr" rid="B18">18</xref>) proposed a pseudonymizing system developed according to the ISO/EN 13606 standard for facilitating the exchange and secondary use of data, allowing the total or partial anonymization of EHR extracts (<xref ref-type="bibr" rid="B18">18</xref>, <xref ref-type="bibr" rid="B19">19</xref>). Quiroz et al. (<xref ref-type="bibr" rid="B20">20</xref>) developed an SQL-based ETL framework for the conversion of health databases to the OMOP CDM (<xref ref-type="bibr" rid="B20">20</xref>, <xref ref-type="bibr" rid="B21">21</xref>). Ong et al. (<xref ref-type="bibr" rid="B22">22</xref>) developed a GUI-based ETL system for the conversion of data to the OMOP CDM (<xref ref-type="bibr" rid="B22">22</xref>).</p>
<p>This paper proposes an integrated solution to the problem of clinical data reuse that has been implemented in the context of the MODELHealth project. The project is based on an ETL system that extracts EHR data from several hospital databases (Section 2.1), transforms the data by performing harmonization to the HL7 FHIR standard and anonymization with the k-anonymity method (Sections 2.2, 2.2.1, 2.2.2), and loads the transformed data to a central, document-based repository (Section 2.3) (<xref ref-type="bibr" rid="B23">23</xref>, <xref ref-type="bibr" rid="B24">24</xref>). The data used are raw EHRs from selected Greek hospital databases regarding patients, hospitalization encounters, medical procedures and observations, diagnostic reports and locations. An essential objective of the MODELHealth project has been the utilization of the transformed clinical data as input to predictive models. This goal was met by developing two public-facing REST Application Programming Interfaces (Data API, Machine Learning API) and client software (Data Client, ML Client). The Data API and Client make the transformed data stored in the central repository available to interested users, while the Machine Learning API exposes the functionality of trained and validated machine learning models. The information loss that occurred due to the anonymization was evaluated using three metrics, described in Section 2.4. The components of the MODELHealth project were developed in the Python programming language, and are depicted in <xref ref-type="fig" rid="F1">Figure&#x00A0;1</xref>.</p>
<fig id="F1" position="float"><label>Figure 1</label>
<caption><p>The components of the MODELHealth project.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="fdgth-04-841853-g001.tif"/>
</fig>
</sec>
<sec id="s2" sec-type="methods"><title>Methods</title>
<sec id="s2a"><title>Extraction</title>
<p>The data extraction process involves the automated extraction of data from three hospital databases and their mapping to relational objects that reflect the database schema, using the SQLAlchemy Object Relational Mapper. The implementation of the MODELHealth ETL process included versioning, allowing the additive extraction, processing and loading of the data at several points in time. Each version includes all the data extracted from a health unit database until that time point. The primary key value of the last extracted record is stored for every version and every database table, so that future executions of the ETL process take into account only the new records. The detailed ER diagrams of the relational database tables from which the EHR data originated can be seen in <xref ref-type="sec" rid="s10">Supplementary Figure S1</xref>.</p>
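<p>The checkpoint-based, additive extraction described above can be illustrated with a minimal sketch. It is not the project&#x0027;s code: it uses Python&#x0027;s built-in sqlite3 module in place of SQLAlchemy so that it runs without dependencies, and the column names (person_id, birth_date) and the function name extract_new_records are hypothetical. Only the table name CARE_PERSON is taken from the evaluated databases.</p>
<preformat>
```python
import sqlite3


def extract_new_records(conn, table, pk_column, last_pk):
    """Return the rows whose primary key exceeds the stored checkpoint,
    together with the updated checkpoint value."""
    # Table and column names are assumed to come from trusted configuration.
    query = f"SELECT * FROM {table} WHERE {pk_column} > ? ORDER BY {pk_column}"
    rows = conn.execute(query, (last_pk,)).fetchall()
    new_last_pk = rows[-1][pk_column] if rows else last_pk
    return rows, new_last_pk


# An in-memory database stands in for a hospital database (assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE CARE_PERSON (person_id INTEGER PRIMARY KEY, birth_date TEXT)")
conn.executemany(
    "INSERT INTO CARE_PERSON VALUES (?, ?)",
    [(1, "1950-02-11"), (2, "1984-07-30")],
)

# First ETL run: everything is new, so the checkpoint starts at 0.
version_1, checkpoint = extract_new_records(conn, "CARE_PERSON", "person_id", 0)

# A later run only picks up the record added since the first run.
conn.execute("INSERT INTO CARE_PERSON VALUES (3, '1999-12-01')")
version_2, checkpoint = extract_new_records(conn, "CARE_PERSON", "person_id", checkpoint)
```
</preformat>
<p>In this sketch the checkpoint is kept in a variable; a persistent store keyed by version and table would be needed to match the behaviour described in the text.</p>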
</sec>
<sec id="s2b"><title>Transformation</title>
<sec id="s2b1"><title>Harmonization</title>
<p>The harmonization process refers to mapping the extracted data from the form of relational objects to FHIR (Fast Healthcare Interoperability Resources) ontology objects. FHIR is an HL7 standard that specifies a RESTful API over the HTTP protocol and leverages the HL7 Reference Information Model (RIM). FHIR defines a system of clinical, administrative, financial and infrastructure resources, with its ontologies organized in the clinical, financial, specialized, base and foundation categories (<xref ref-type="bibr" rid="B25">25</xref>&#x2013;<xref ref-type="bibr" rid="B30">30</xref>).</p>
<p>The harmonization of the extracted data has been achieved with in-house software. First, the relational data are converted to the corresponding FHIR ontologies through custom specialized programming libraries and transformative functions related to the database schema from which the data originated. The FHIR data are then converted to the JSON (JavaScript Object Notation) format, as this is the preferred representation of the standard. The main FHIR entities incorporated were the Patient, Observation, DiagnosticReport, Encounter and Location ontologies. <xref ref-type="sec" rid="s10">Supplementary Figure S2</xref> depicts the FHIR entities according to which the relational data were harmonized.</p>
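<p>As an illustration of the two harmonization steps, the following sketch maps one relational patient row to a FHIR Patient resource and serializes it to JSON. It is a simplified assumption rather than the in-house software: the source column names (person_id, sex, birth_date, address) and the function row_to_fhir_patient are hypothetical, and only a handful of Patient elements are populated.</p>
<preformat>
```python
import json


def row_to_fhir_patient(row):
    """Map one relational patient row (a dict) to a FHIR Patient resource (a dict)."""
    patient = {
        "resourceType": "Patient",
        "id": str(row["person_id"]),
        # FHIR administrative gender codes: male, female, other, unknown.
        "gender": {"M": "male", "F": "female"}.get(row.get("sex"), "unknown"),
        "birthDate": row["birth_date"],  # expected in YYYY-MM-DD form
    }
    # Null addresses are kept out of the resource instead of rejecting the row.
    if row.get("address"):
        patient["address"] = [{"text": row["address"]}]
    return patient


# Hypothetical source row; real column names depend on the hospital schema.
row = {
    "person_id": 42,
    "sex": "F",
    "birth_date": "1984-07-30",
    "address": "Example Street 1, Athens",
}
patient_json = json.dumps(row_to_fhir_patient(row), ensure_ascii=False)
```
</preformat>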
</sec>
<sec id="s2b2"><title>Anonymization</title>
<p>The anonymization process involves modifying several fields in a given dataset to prevent the reidentification of individuals. In the scope of this project, anonymization of the harmonized EHRs was carried out using Mondrian, a greedy algorithm that implements k-anonymity through multidimensional recoding and applies to both categorical and numeric data. Mondrian performs k-anonymization of a given dataset in O(n log n) time, where n is the number of records, in two stages. The first stage partitions the dataset into several multidimensional regions that cover its domain space by applying a recursive algorithm similar to the ones used to construct kd-trees. The second stage applies recoding functions to the dataset, formulated using summary statistics from each region (<xref ref-type="bibr" rid="B31">31</xref>).</p>
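<p>The two stages can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the project: it handles numeric QI attributes only, uses strict median cuts, and recodes each value to the (minimum, maximum) range of its region. The function names mondrian and recode and the sample attributes are hypothetical.</p>
<preformat>
```python
def mondrian(records, qi, k):
    """Stage 1: partition records (dicts) into regions of at least k records
    by recursive median cuts, in the spirit of kd-tree construction."""

    def spread(part, attr):
        values = [r[attr] for r in part]
        return max(values) - min(values)

    # Ranges over the whole dataset, used to normalise per-region ranges.
    full = {a: spread(records, a) or 1 for a in qi}

    def split(part):
        # Try the attributes from widest to narrowest normalised range.
        for attr in sorted(qi, key=lambda a: spread(part, a) / full[a], reverse=True):
            ordered = sorted(part, key=lambda r: r[attr])
            median = ordered[len(ordered) // 2][attr]
            left = [r for r in ordered if median > r[attr]]
            right = [r for r in ordered if r[attr] >= median]
            # A cut is allowable only if both sides keep at least k records.
            if len(left) >= k and len(right) >= k:
                return split(left) + split(right)
        return [part]  # no allowable cut: the region is an equivalence class

    return split(records)


def recode(region, qi):
    """Stage 2: replace each QI value with the (min, max) range of its region."""
    ranges = {a: (min(r[a] for r in region), max(r[a] for r in region)) for a in qi}
    return [{**r, **ranges} for r in region]


# Synthetic records (assumption): 20 patients with an age and a latitude.
qi = ["age", "lat"]
records = [{"age": a, "lat": 37.9 + a / 1000} for a in range(20, 40)]
groups = mondrian(records, qi, k=5)
anonymized = [r for g in groups for r in recode(g, qi)]
```
</preformat>
<p>Because a cut is rejected whenever one side would hold fewer than k records, every returned region satisfies the k-anonymity condition by construction.</p>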
<p>The data fields subjected to anonymization were the birthDate and address attributes of the Patient FHIR ontology and the longitude and latitude corresponding to the address. Each address was translated to longitude and latitude coordinates through the OpenStreetMap API, which were then added as numerical fields to the patient record and were included in the anonymization process (<xref ref-type="bibr" rid="B32">32</xref>). <xref ref-type="sec" rid="s10">Supplementary Figure S3</xref> depicts an example of the anonymization of a sample subset of male patient records, which was subjected to the ETL process and stored in the document-based database MongoDB (see Section 2.3). A sample harmonized, non-anonymized record is depicted at the top, with the FHIR id and maritalStatus fields, as well as the _id field, which serves as a primary key for MongoDB, omitted for clarity. A sample record anonymized with k&#x2009;&#x003D;&#x2009;5 is displayed at the bottom; the FHIR fields &#x201C;address&#x201D; and &#x201C;birthDate&#x201D;, as well as the added fields &#x201C;ord_latitude&#x201D; and &#x201C;ord_longitute&#x201D;, were used as QI attributes.</p>
</sec>
</sec>
<sec id="s2c"><title>Loading</title>
<p>The loading process involved the transmission of the transformed data through a streaming process and their subsequent storage in the central repository. Data were streamed in packages of a predefined size through a TCP/IP connection. The central repository was implemented with the non-relational database MongoDB, in which every record is stored in the BSON format. MongoDB is a fitting choice for storing and retrieving JSON documents, as it is designed to effectively handle document-oriented, semi-structured data (<xref ref-type="bibr" rid="B33">33</xref>).</p>
</sec>
<sec id="s2d"><title>Information loss evaluation</title>
<p>The impact of the anonymization on the harmonized EHR data was estimated using the metrics of generalized information loss, discernibility and average equivalence class size.</p>
<p>Generalized information loss (GIL) captures the penalty incurred when generalizing a specific attribute by quantifying the fraction of the attribute&#x0027;s domain that is covered by the generalized value. GIL for an anonymized table T&#x002A; was calculated according to Equation (<xref rid="e1" ref-type="disp-formula">1</xref>), where T is the original table, i&#x2009;&#x003D;&#x2009;1,&#x2026;,n corresponds to an attribute, j&#x2009;&#x003D;&#x2009;1,&#x2026;,&#x007C;T&#x007C; corresponds to a table record, U<sub>i</sub>, L<sub>i</sub> are the upper and lower values of each arithmetic attribute i, U<sub>ij</sub>, L<sub>ij</sub> are the upper and lower values of arithmetic attribute i for the equivalence class the record j belongs to, N<sub>i</sub> is the number of different values for each categorical attribute i and N<sub>ij</sub> is the number of different values for categorical attribute i in the equivalence class the record j belongs to (<xref ref-type="bibr" rid="B34">34</xref>&#x2013;<xref ref-type="bibr" rid="B36">36</xref>).</p>
<p>The discernibility metric (DM) measures how indistinguishable a record is from others by assigning a penalty to each record, equal to the size of the equivalence class in which it belongs. DM for an anonymized table T&#x002A; was calculated according to Equation (<xref rid="e2" ref-type="disp-formula">2</xref>), where &#x007C;EQ&#x007C; is the number of records of the equivalence class EQ (<xref ref-type="bibr" rid="B31">31</xref>, <xref ref-type="bibr" rid="B36">36</xref>, <xref ref-type="bibr" rid="B37">37</xref>).</p>
<p>The average equivalence class size (C<sub>AVG</sub>) measures how well the created equivalence classes approach the best case, where each record is generalized in an equivalence class of k records. It was calculated according to Equation (<xref rid="e3" ref-type="disp-formula">3</xref>), where &#x007C;T&#x007C; is the number of table records, &#x007C;EQs&#x007C; is the total number of equivalence classes created in the anonymized table T&#x002A;, and k is the minimum equivalence class size allowed (<xref ref-type="bibr" rid="B31">31</xref>, <xref ref-type="bibr" rid="B37">37</xref>).</p>
<disp-formula id="e1"><label>(1)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block" id="DM1"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mi mathvariant="normal">GIL</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>T</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi>T</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x00D7;</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi>T</mml:mi><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:munderover></mml:mstyle></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" columnspacing="1em"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mtext>if i is arithmetic</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mtext>if i is categorical</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="e2"><label>(2)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block" id="DM2"><mml:mrow><mml:mi mathvariant="normal">DM</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>T</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mo>&#x2200;</mml:mo><mml:mi mathvariant="normal">EQ</mml:mi><mml:mspace width="0.3em"/><mml:mtext>s.t.</mml:mtext><mml:mspace width="0.3em"/><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="normal">EQ</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mo>&#x2265;</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:munder><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="normal">EQ</mml:mi><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math></disp-formula>
<disp-formula id="e3"><label>(3)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block" id="DM3"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">AVG</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>T</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi>T</mml:mi><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi mathvariant="normal">EQs</mml:mi><mml:mo stretchy="false">|</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
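<p>To make the three metrics concrete, the sketch below computes them for a table that has already been partitioned into equivalence classes. It is an illustrative assumption, not the evaluation code of the study: only the arithmetic branch of Equation (1) is implemented, each attribute is assumed to take at least two distinct values over the whole table, and the function names and the toy data are hypothetical.</p>
<preformat>
```python
def gil_numeric(classes, qi):
    """Generalized information loss (Equation 1), arithmetic attributes only.

    `classes` is a list of equivalence classes, each a list of records (dicts)
    holding the original, non-generalized attribute values."""
    records = [r for eq in classes for r in eq]
    total = 0.0
    for attr in qi:
        lower = min(r[attr] for r in records)  # L_i
        upper = max(r[attr] for r in records)  # U_i
        for eq in classes:
            eq_lower = min(r[attr] for r in eq)  # L_ij
            eq_upper = max(r[attr] for r in eq)  # U_ij
            # Every record j of the class contributes the same fraction.
            total += len(eq) * (eq_upper - eq_lower) / (upper - lower)
    return total / (len(records) * len(qi))


def discernibility(classes):
    """Discernibility metric (Equation 2): sum of squared class sizes."""
    return sum(len(eq) ** 2 for eq in classes)


def c_avg(classes, k):
    """Average equivalence class size (Equation 3): |T| / (|EQs| * k)."""
    n_records = sum(len(eq) for eq in classes)
    return n_records / (len(classes) * k)


# Toy table (assumption): four records in two equivalence classes, k = 2.
classes = [[{"age": 20}, {"age": 24}], [{"age": 30}, {"age": 40}]]
gil = gil_numeric(classes, ["age"])
dm = discernibility(classes)
cavg = c_avg(classes, k=2)
```
</preformat>
<p>A C<sub>AVG</sub> value of 1 corresponds to the best case described above, in which every equivalence class contains exactly k records; larger values indicate coarser classes.</p>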
<p>The information loss evaluation has been applied to experimental datasets originating from three hospital databases. More specifically, the patient data populating the table CARE_PERSON of three hospital databases were subjected to the ETL process for the k values 5, 10, 15, 20. The transformed datasets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub> correspond to the three origin database schemas, while the dataset S<sub>123</sub> constitutes the union of S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>. The four datasets were evaluated in terms of the information loss that occurred during the anonymization stage using Equations (<xref rid="e1" ref-type="disp-formula">1</xref>&#x2013;<xref rid="e3" ref-type="disp-formula">3</xref>). The technical characteristics of the datasets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub> are presented in <xref ref-type="table" rid="T1">Table&#x00A0;1</xref>.</p>
<table-wrap id="T1" position="float"><label>Table 1</label>
<caption><p>The number of records (&#x007C;T&#x007C;) and the size in GBs of the tested datasets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub> for all tested k values.</p></caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th valign="top" align="left" rowspan="2">Dataset\k</th>
<th valign="top" align="center" colspan="4">&#x007C;T&#x007C; after ETL<hr/></th>
<th valign="top" align="center" colspan="4">Dataset Size (GB) after ETL<hr/></th>
</tr>
<tr>
<th valign="top" align="center">5</th>
<th valign="top" align="center">10</th>
<th valign="top" align="center">15</th>
<th valign="top" align="center">20</th>
<th valign="top" align="center">5</th>
<th valign="top" align="center">10</th>
<th valign="top" align="center">15</th>
<th valign="top" align="center">20</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">S<sub>1</sub></td>
<td valign="top" align="center" colspan="4">54,003</td>
<td valign="top" align="left">0.009</td>
<td valign="top" align="left">0.012</td>
<td valign="top" align="left">0.014</td>
<td valign="top" align="left">0.016</td>
</tr>
<tr>
<td valign="top" align="left">S<sub>2</sub></td>
<td valign="top" align="center" colspan="4">91,838</td>
<td valign="top" align="left">0.008</td>
<td valign="top" align="left">0.008</td>
<td valign="top" align="left">0.009</td>
<td valign="top" align="left">0.009</td>
</tr>
<tr>
<td valign="top" align="left">S<sub>3</sub></td>
<td valign="top" align="center" colspan="4">76,043</td>
<td valign="top" align="left">0.007</td>
<td valign="top" align="left">0.007</td>
<td valign="top" align="left">0.008</td>
<td valign="top" align="left">0.008</td>
</tr>
<tr>
<td valign="top" align="left">S<sub>123</sub></td>
<td valign="top" align="center" colspan="4">221,884</td>
<td valign="top" align="left">0.024</td>
<td valign="top" align="left">0.027</td>
<td valign="top" align="left">0.031</td>
<td valign="top" align="left">0.033</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s3" sec-type="results"><title>Results</title>
<sec id="s3a"><title>Data quality evaluation</title>
<p>The result of the ETL process regarding the data stored in the central repository was evaluated in terms of data quality. No duplicate entries were found, which can be attributed to the design of the source relational databases as well as the absence of corresponding defects in the ETL process. Null address values were present; these were intentionally not rejected during the transform stage, since the patient address field underwent anonymization (<xref ref-type="bibr" rid="B38">38</xref>, <xref ref-type="bibr" rid="B39">39</xref>).</p>
</sec>
<sec id="s3b"><title>Anonymity validation</title>
<p>The integrity of the data anonymization process was validated with a simple validation tool that queries the central repository for the anonymized data, groups the records by their QI attributes to obtain the equivalence classes, and checks whether any equivalence class is smaller than the k value chosen during the extraction stage. Applying this method showed that the contents of the central repository do not violate the k-anonymity condition, since no equivalence class consisting of fewer than k documents was found.</p>
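<p>The check described above can be illustrated with a minimal Python sketch. It is not the project's validation tool: the in-memory list of records, the attribute names and the function names <monospace>equivalence_class_sizes</monospace> and <monospace>violating_classes</monospace> are assumptions made for illustration only, whereas the actual tool queries the central repository.</p>
<preformat><![CDATA[
```python
from collections import Counter


def equivalence_class_sizes(records, qi_attributes):
    """Group records by their quasi-identifier (QI) values and count each group."""
    return Counter(tuple(record[a] for a in qi_attributes) for record in records)


def violating_classes(records, qi_attributes, k):
    """Return the equivalence classes that contain fewer than k records."""
    sizes = equivalence_class_sizes(records, qi_attributes)
    return {key: size for key, size in sizes.items() if size < k}


# Hypothetical, already generalized records (not project data).
records = [
    {"age": "30-39", "zip": "115**", "dx": "A"},
    {"age": "30-39", "zip": "115**", "dx": "B"},
    {"age": "40-49", "zip": "116**", "dx": "C"},
]

# An empty result means the k-anonymity condition holds for the chosen k.
print(violating_classes(records, ["age", "zip"], k=2))
```
]]></preformat>
<p>In this toy input the class (40-49, 116**) holds a single record, so it is reported as a violation for k&#x2009;=&#x2009;2.</p>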
</sec>
<sec id="s3c"><title>Information loss evaluation</title>
<p>The generalized information loss (GIL), discernibility metric (DM) and average equivalence class size (C<sub>AVG</sub>) metrics (Section 2.4) were applied on the ETL output of the experimental datasets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub> for all tested k values. The results of the evaluation can be seen in <xref ref-type="table" rid="T2">Table&#x00A0;2</xref> and <xref ref-type="fig" rid="F2">Figure&#x00A0;2</xref>.</p>
<fig id="F2" position="float"><label>Figure 2</label>
<caption><p>Results of the information loss metrics (<bold>A</bold>) Generalized Information Loss (GIL), (<bold>B</bold>) Discernibility Metric (DM) and (<bold>C</bold>) Average Equivalence Class size (C<sub>AVG</sub>), as well as (<bold>D</bold>) the number of Equivalence Classes (&#x007C;EQs&#x007C;) of the harmonized, anonymized data sets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub> for the tested k values. The results are depicted in scientific format.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="fdgth-04-841853-g002.tif"/>
</fig>
<table-wrap id="T2" position="float"><label>Table 2</label>
<caption><p>Results of the generalized information loss (GIL), discernibility metric (DM) and average equivalence class size (C<sub>AVG</sub>) on data sets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub> for the chosen k values. The average values (Avg) and the standard deviation (Std) of the results have been also included.</p></caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th valign="top" align="left" rowspan="2">k</th>
<th valign="top" align="center" colspan="4">GIL<hr/></th>
<th valign="top" align="center" colspan="4">DM<hr/></th>
<th valign="top" align="center" colspan="4">C<sub>AVG</sub><hr/></th>
</tr>
<tr>
<th valign="top" align="center">S<sub>1</sub></th>
<th valign="top" align="center">S<sub>2</sub></th>
<th valign="top" align="center">S<sub>3</sub></th>
<th valign="top" align="center">S<sub>123</sub></th>
<th valign="top" align="center">S<sub>1</sub></th>
<th valign="top" align="center">S<sub>2</sub></th>
<th valign="top" align="center">S<sub>3</sub></th>
<th valign="top" align="center">S<sub>123</sub></th>
<th valign="top" align="center">S<sub>1</sub></th>
<th valign="top" align="center">S<sub>2</sub></th>
<th valign="top" align="center">S<sub>3</sub></th>
<th valign="top" align="center">S<sub>123</sub></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">0.0103</td>
<td valign="top" align="center">0.0033</td>
<td valign="top" align="center">0.00299</td>
<td valign="top" align="center">0.0042</td>
<td valign="top" align="center">24,269,339</td>
<td valign="top" align="center">134,915,704</td>
<td valign="top" align="center">70,783,239</td>
<td valign="top" align="center">229,968,282</td>
<td valign="top" align="center">4.5921</td>
<td valign="top" align="center">26.277</td>
<td valign="top" align="center">16.7311</td>
<td valign="top" align="center">11.2063</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="center">0.0161</td>
<td valign="top" align="center">0.0044</td>
<td valign="top" align="center">0.0046</td>
<td valign="top" align="center">0.0061</td>
<td valign="top" align="center">24,400,773</td>
<td valign="top" align="center">134,945,050</td>
<td valign="top" align="center">70,828,393</td>
<td valign="top" align="center">230,174,216</td>
<td valign="top" align="center">4.1382</td>
<td valign="top" align="center">21.6089</td>
<td valign="top" align="center">14.2938</td>
<td valign="top" align="center">9.8092</td>
</tr>
<tr>
<td valign="top" align="left">15</td>
<td valign="top" align="center">0.0198</td>
<td valign="top" align="center">0.0054</td>
<td valign="top" align="center">0.0059</td>
<td valign="top" align="center">0.0075</td>
<td valign="top" align="center">24,523,747</td>
<td valign="top" align="center">134,987,178</td>
<td valign="top" align="center">70,877,395</td>
<td valign="top" align="center">230,388,320</td>
<td valign="top" align="center">3.7269</td>
<td valign="top" align="center">19.6866</td>
<td valign="top" align="center">12.7696</td>
<td valign="top" align="center">8.8365</td>
</tr>
<tr>
<td valign="top" align="left">20</td>
<td valign="top" align="center">0.0242</td>
<td valign="top" align="center">0.0057</td>
<td valign="top" align="center">0.0065</td>
<td valign="top" align="center">0.0087</td>
<td valign="top" align="center">24,691,295</td>
<td valign="top" align="center">135,010,136</td>
<td valign="top" align="center">70,931,465</td>
<td valign="top" align="center">230,632,896</td>
<td valign="top" align="center">3.7295</td>
<td valign="top" align="center">16.5176</td>
<td valign="top" align="center">11.8447</td>
<td valign="top" align="center">8.3856</td>
</tr>
<tr>
<td valign="top" align="left">Avg</td>
<td valign="top" align="center">0.0176</td>
<td valign="top" align="center">0.0047</td>
<td valign="top" align="center">0.00501</td>
<td valign="top" align="center">0.0066</td>
<td valign="top" align="center">24,471,289</td>
<td valign="top" align="center">134,964,517</td>
<td valign="top" align="center">70,855,123</td>
<td valign="top" align="center">230,290,929</td>
<td valign="top" align="center">4.0467</td>
<td valign="top" align="center">21.0225</td>
<td valign="top" align="center">13.9098</td>
<td valign="top" align="center">9.5594</td>
</tr>
<tr>
<td valign="top" align="left">Std</td>
<td valign="top" align="center">0.0059</td>
<td valign="top" align="center">0.00105</td>
<td valign="top" align="center">0.0016</td>
<td valign="top" align="center">0.00195</td>
<td valign="top" align="center">179,732.014</td>
<td valign="top" align="center">42,254.338</td>
<td valign="top" align="center">63,785.958</td>
<td valign="top" align="center">285,277.319</td>
<td valign="top" align="center">0.41179</td>
<td valign="top" align="center">4.0838</td>
<td valign="top" align="center">2.1348</td>
<td valign="top" align="center">1.2483</td>
</tr>
</tbody>
</table>
</table-wrap>
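<p>The Avg and Std rows of Table 2 can be reproduced from the four per-k values of each column. The following minimal Python sketch does so for the C<sub>AVG</sub> column of S<sub>2</sub>, assuming that Std denotes the sample standard deviation (n&#x2009;&#x2212;&#x2009;1 denominator), which is the variant that matches the tabulated values; the variable names are illustrative.</p>
<preformat><![CDATA[
```python
from statistics import mean, stdev

# C_AVG of dataset S2 for k = 5, 10, 15, 20, copied from Table 2.
c_avg_s2 = [26.277, 21.6089, 19.6866, 16.5176]

avg = mean(c_avg_s2)   # arithmetic mean over the four k values
std = stdev(c_avg_s2)  # sample standard deviation (n - 1 denominator)

print(f"Avg = {avg:.4f}, Std = {std:.4f}")
```
]]></preformat>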
<p>GIL, DM and C<sub>AVG</sub> each follow the same trend as k increases, regardless of the experimental dataset. More specifically, increasing k increases GIL and DM and decreases C<sub>AVG</sub> for all tested datasets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub>; the only exception is a marginal rise of C<sub>AVG</sub> for S<sub>1</sub> between k&#x2009;=&#x2009;15 and k&#x2009;=&#x2009;20 (<xref ref-type="table" rid="T2">Table&#x00A0;2</xref>).</p>
<p>GIL depends on the dataset QI values and the record number &#x007C;T&#x007C; of a given dataset (<xref rid="e1" ref-type="disp-formula">Equation 1</xref>), meaning that a smaller &#x007C;T&#x007C; can lead to a larger GIL value. Indeed, in <xref ref-type="fig" rid="F2">Figure&#x00A0;2A</xref>, it can be observed that GIL takes the highest values in the smallest dataset S<sub>1</sub> and lower values in the larger datasets S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub>. The average and standard deviation GIL values obtained for datasets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub> were 0.0176&#x2009;&#x00B1;&#x2009;0.0059, 0.0047&#x2009;&#x00B1;&#x2009;0.00105, 0.00501&#x2009;&#x00B1;&#x2009;0.0016, 0.0066&#x2009;&#x00B1;&#x2009;0.00195, respectively.</p>
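<p>To make the dependence on &#x007C;T&#x007C; concrete, the sketch below computes GIL for numerical QI attributes. It assumes the form commonly used in the literature, in which each generalized value contributes the width of its interval divided by the width of the attribute domain, and the sum is normalized by &#x007C;T&#x007C; times the number of QI attributes. Equation 1 of this article may differ in detail, and the function and parameter names are hypothetical.</p>
<preformat><![CDATA[
```python
def generalized_information_loss(intervals, domains):
    """Compute GIL for numerical quasi-identifiers (assumed literature form).

    intervals: one list per record, holding a (low, high) pair per QI attribute.
    domains:   one (minimum, maximum) pair per QI attribute in the original data.
    """
    n_records = len(intervals)
    n_attributes = len(domains)
    total = 0.0
    for record in intervals:
        for (low, high), (d_min, d_max) in zip(record, domains):
            # 0 for an ungeneralized value, 1 for a fully generalized one.
            total += (high - low) / (d_max - d_min)
    return total / (n_records * n_attributes)


# Hypothetical example: a single QI attribute (age) with domain 0-100.
domains = [(0, 100)]
intervals = [[(20, 30)], [(20, 30)], [(40, 40)], [(0, 100)]]
print(generalized_information_loss(intervals, domains))
```
]]></preformat>
<p>In this example the four records contribute 0.1, 0.1, 0 and 1, so the result is approximately 0.3.</p>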
<p>DM depends on the number of records in each EQ, as well as the number of EQs (&#x007C;EQs&#x007C;) created (<xref rid="e2" ref-type="disp-formula">Equation 2</xref>). As record number &#x007C;T&#x007C; increases, anonymization can result in more and larger EQs increasing DM, as can be seen in <xref ref-type="fig" rid="F2">Figure&#x00A0;2B</xref>. The average and standard deviation DM values obtained for datasets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub> were 24,471,289&#x2009;&#x00B1;&#x2009;179,732.014, 134,964,517&#x2009;&#x00B1;&#x2009;42,254.338, 70,855,123&#x2009;&#x00B1;&#x2009;63,785.958, 230,290,929&#x2009;&#x00B1;&#x2009;285,277.319, respectively.</p>
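<p>The sensitivity of DM to class sizes can be sketched as follows, assuming the common definition of DM as the sum of the squared equivalence class sizes when no records are suppressed. Equation 2 of this article is not reproduced here and may include further terms; the function name is hypothetical.</p>
<preformat><![CDATA[
```python
def discernibility_metric(class_sizes):
    """Sum of squared equivalence class sizes (assumed definition, no suppression)."""
    return sum(size * size for size in class_sizes)


# Two hypothetical partitions of the same 20 records into 4 classes.
balanced = [5, 5, 5, 5]
skewed = [2, 2, 2, 14]

print(discernibility_metric(balanced))  # 100
print(discernibility_metric(skewed))    # 208
```
]]></preformat>
<p>Both partitions have the same &#x007C;T&#x007C; and &#x007C;EQs&#x007C;, yet the skewed one scores higher, because a few large classes dominate the sum of squares.</p>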
<p>C<sub>AVG</sub> is proportional to the record number &#x007C;T&#x007C; but inversely proportional to &#x007C;EQs&#x007C; and k (<xref rid="e3" ref-type="disp-formula">Equation 3</xref>). In <xref ref-type="fig" rid="F2">Figure&#x00A0;2C</xref>, it can be observed that C<sub>AVG</sub> takes the smallest values in dataset S<sub>1</sub>, which has the lowest record number. The highest values occur in dataset S<sub>2</sub>, which is second in terms of record number and at the same time has a rather low number of equivalence classes &#x007C;EQs&#x007C; (<xref ref-type="fig" rid="F2">Figure&#x00A0;2D</xref>). The fact that C<sub>AVG</sub> does not take the highest values in the largest dataset, S<sub>123</sub>, coincides with the high &#x007C;EQs&#x007C; value of S<sub>123</sub> (<xref ref-type="fig" rid="F2">Figure&#x00A0;2D</xref>). The average and standard deviation C<sub>AVG</sub> values obtained for the datasets S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>, S<sub>123</sub> were 4.0467&#x2009;&#x00B1;&#x2009;0.41179, 21.0225&#x2009;&#x00B1;&#x2009;4.0838, 13.9098&#x2009;&#x00B1;&#x2009;2.1348, 9.5594&#x2009;&#x00B1;&#x2009;1.2483, respectively.</p>
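<p>The stated proportionalities correspond to the usual form C<sub>AVG</sub>&#x2009;=&#x2009;&#x007C;T&#x007C;&#x2009;/&#x2009;(&#x007C;EQs&#x007C;&#x2009;&#x00B7;&#x2009;k). The sketch below assumes this form; the function name and the input values are hypothetical and are not taken from the experiments.</p>
<preformat><![CDATA[
```python
def average_class_size(total_records, n_classes, k):
    """C_AVG = |T| / (|EQs| * k); 1.0 means classes hold k records on average."""
    if n_classes <= 0 or k <= 0:
        raise ValueError("n_classes and k must be positive")
    return total_records / (n_classes * k)


# Same record number and k, but twice as many equivalence classes.
print(average_class_size(1000, 100, 5))  # 2.0
print(average_class_size(1000, 200, 5))  # 1.0
```
]]></preformat>
<p>Under this assumption, doubling &#x007C;EQs&#x007C; at fixed &#x007C;T&#x007C; and k halves C<sub>AVG</sub>, which is consistent with a dataset with few equivalence classes taking the highest values.</p>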
</sec>
</sec>
<sec id="s4" sec-type="discussion"><title>Discussion</title>
<p>In this paper, an integrated architecture for the facilitation of the secondary usage of clinical data has been proposed. The MODELHealth project has aimed to enable an organization to access real health record data in a universally accepted format and carry out research at a low cost. Data was harmonized to the HL7 FHIR standard, and anonymized according to the k-anonymity principle through the Mondrian algorithm. The effect of anonymization was quantified using the generalized information loss, discernibility metric and average class size metrics. In future work and subsequent versions of the platform, extensions of k-anonymity will be considered in order to add more privacy features to the central data repository, as well as other state-of-the-art approaches, such as differential privacy.</p>
<p>A noteworthy challenge that was met at the stage of transformation concerned the quality of EHR data, which were characterized by high dimensionality, heterogeneity, noise and sparseness. Different codes, measure units and terminologies were often used to represent the same clinical phenotype. Therefore, the harmonization of these EHR data, initially stored in relational health center databases, to the FHIR scheme required extensive transformations through custom software.</p>
<p>The development of predictive models utilizing EHRs has been proposed as a promising means towards the improvement of personalized medicine and health care quality. Numerous machine learning methods have been successfully applied to patient hospitalization metadata to accomplish meaningful prediction of medical-related outcomes. Deep neural networks, in particular, have proven their ability to handle large volumes of relatively messy clinical data and have emerged as a preferred method (<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B40">40</xref>&#x2013;<xref ref-type="bibr" rid="B44">44</xref>). The applicability of the MODELHealth data as input to predictive models was confirmed through the development of proof-of-concept machine learning models that utilized the transformed clinical data.</p>
</sec>
<sec id="s5" sec-type="conclusions"><title>Conclusions</title>
<p>The secondary research use of EHR data without compromising patients&#x0027; rights to privacy is one of the most discussed topics in health IT, and a source of great controversy at every level (academic, technical, administrative, political) at which the discussion takes place. The results of this study add experimental evidence to the argument that health datasets can be adequately anonymized while preserving actionable and meaningful information, <italic>via</italic> proper utilization of network and data flow architectures and algorithmic tools already available in the respective literature.</p>
</sec>
</body>
<back>
<sec id="s6" sec-type="data-availability"><title>Data availability statement</title>
<p>The project datasets and code cannot be made publicly available, because the submitted paper is part of the MODELHealth project, which has been co-funded by the European Regional Development Fund of the European Union and Greek national funds.</p>
</sec>
<sec id="s7"><title>Ethics statement</title>
<p>This article does not contain any studies involving human participants or animals performed by any of the authors.</p>
</sec>
<sec id="s8"><title>Author contributions</title>
<p>SP, AF, and AA contributed to the writing of the paper. AF wrote the main part of the computer code and conducted the experiments. DK and GKM provided scientific supervision. SP and DK were the coordinator and scientific director of the MODELHealth project, respectively. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s9" sec-type="funding-information"><title>Funding</title>
<p>This research has been co-funded by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call Research-Create-Innovate (Project Code: T1EDK-04066).</p>
</sec>
<sec id="s11" sec-type="COI-statement"><title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s12" sec-type="disclaimer"><title>Publisher&#x0027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s10" sec-type="supplementary-material"><title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fdgth.2022.841853/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fdgth.2022.841853/full&#x0023;supplementary-material</ext-link>.</p>
<supplementary-material id="SD1" content-type="local-data">
<media mimetype="application" mime-subtype="zip" xlink:href="Data_Sheet_1_v1.zip"/>
</supplementary-material>
</sec>
<ref-list><title>References</title>
<ref id="B1"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heart</surname><given-names>T</given-names></name><name><surname>Ben-Assuli</surname><given-names>O</given-names></name><name><surname>Shabtai</surname><given-names>I</given-names></name></person-group>. <article-title>A review of PHR, EMR and EHR integration: a more personalized healthcare and public health policy</article-title>. <source>Health Policy Technol</source>. (<year>2017</year>) <volume>6</volume>(<issue>1</issue>):<fpage>20</fpage>&#x2013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1016/j.hlpt.2016.08.002</pub-id></citation></ref>
<ref id="B2"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gkoulalas-Divanis</surname><given-names>A</given-names></name><name><surname>Loukides</surname><given-names>G</given-names></name><name><surname>Sun</surname><given-names>J</given-names></name></person-group>. <article-title>Publishing data from electronic health records while preserving privacy: a survey of algorithms</article-title>. <source>J Biomed Inform</source>. (<year>2014</year>) <volume>50</volume>:<fpage>4</fpage>&#x2013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2014.06.002</pub-id><pub-id pub-id-type="pmid">24936746</pub-id></citation></ref>
<ref id="B3"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khokhar</surname><given-names>RH</given-names></name><name><surname>Chen</surname><given-names>R</given-names></name><name><surname>Fung</surname><given-names>BCM</given-names></name><name><surname>Lui</surname><given-names>SM</given-names></name></person-group>. <article-title>Quantifying the costs and benefits of privacy-preserving health data publishing</article-title>. <source>J Biomed Inform</source>. (<year>2014</year>) <volume>50</volume>(<issue>August</issue>):<fpage>107</fpage>&#x2013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1016/J.JBI.2014.04.012</pub-id><pub-id pub-id-type="pmid">24768775</pub-id></citation></ref>
<ref id="B4"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weiskopf</surname><given-names>NG</given-names></name><name><surname>Weng</surname><given-names>C</given-names></name></person-group>. <article-title>Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research</article-title>. <source>J Am Med Inform Assoc</source>. (<year>2013</year>) <volume>20</volume>(<issue>1</issue>):<fpage>144</fpage>&#x2013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1136/amiajnl-2011-000681</pub-id><pub-id pub-id-type="pmid">22733976</pub-id></citation></ref>
<ref id="B5"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miotto</surname><given-names>R</given-names></name><name><surname>Li</surname><given-names>L</given-names></name><name><surname>Kidd</surname><given-names>BA</given-names></name><name><surname>Dudley</surname><given-names>JT</given-names></name></person-group>. <article-title>Deep patient: an unsupervised representation to predict the future of patients from the electronic health records</article-title>. <source>Sci Rep</source>. (<year>2016</year>) <volume>6</volume>(<issue>May</issue>). <pub-id pub-id-type="doi">10.1038/srep26094</pub-id></citation></ref>
<ref id="B6"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bean</surname><given-names>DM</given-names></name><name><surname>Wu</surname><given-names>H</given-names></name><name><surname>Dzahini</surname><given-names>O</given-names></name><name><surname>Broadbent</surname><given-names>M</given-names></name><name><surname>Stewart</surname><given-names>R</given-names></name><name><surname>Dobson</surname><given-names>RJB</given-names></name></person-group>. <article-title>Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records</article-title>. <source>Sci Rep</source>. (<year>2017</year>) <volume>7</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1038/s41598-017-16674-x</pub-id><pub-id pub-id-type="pmid">28127051</pub-id></citation></ref>
<ref id="B7"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname><given-names>J</given-names></name><name><surname>Henriksson</surname><given-names>A</given-names></name><name><surname>Asker</surname><given-names>L</given-names></name><name><surname>Bostr&#x00F6;m</surname><given-names>H</given-names></name></person-group>. <article-title>Predictive modeling of structured electronic health records for adverse drug event detection</article-title>. <source>BMC Med Inform Decis Mak</source>. (<year>2015</year>) <volume>15</volume>(<issue>4</issue>):<fpage>S1</fpage>. <pub-id pub-id-type="doi">10.1186/1472-6947-15-S4-S1</pub-id><pub-id pub-id-type="pmid">26606038</pub-id></citation></ref>
<ref id="B8"><label>8.</label><citation citation-type="other">&#x201C;<collab>Health Level Seven International &#x007C; HL7 International</collab>.&#x201D; <year>n.d</year>. <ext-link ext-link-type="uri" xlink:href="https://www.hl7.org/">https://www.hl7.org/</ext-link> (<comment>Accessed July 29, 2022</comment>.).</citation></ref>
<ref id="B9"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abouelmehdi</surname><given-names>K</given-names></name><name><surname>Beni-Hssane</surname><given-names>A</given-names></name><name><surname>Khaloufi</surname><given-names>H</given-names></name><name><surname>Saadi</surname><given-names>M</given-names></name></person-group>. <article-title>Big data security and privacy in healthcare: a review</article-title>. <source>Procedia Comput Sci</source>. (<year>2017</year>) <volume>113</volume>:<fpage>73</fpage>&#x2013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2017.08.292</pub-id></citation></ref>
<ref id="B10"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname><given-names>H</given-names></name><name><surname>Shim</surname><given-names>K</given-names></name></person-group>. <article-title>Approximate algorithms with generalizing attribute values for K-anonymity</article-title>. <source>Inf Syst</source>. (<year>2010</year>) <volume>35</volume>(<issue>8</issue>):<fpage>933</fpage>&#x2013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1016/j.is.2010.06.002</pub-id></citation></ref>
<ref id="B11"><label>11.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Samarati</surname><given-names>P</given-names></name><name><surname>Sweeney</surname><given-names>L</given-names></name></person-group>. &#x201C;<conf-name>Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression</conf-name>.&#x201D; In: <conf-name>Proceedings of the IEEE symposium on research in security and privacy</conf-name> (<year>1998</year>). p. <fpage>384</fpage>&#x2013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1145/1150402.1150499</pub-id></citation></ref>
<ref id="B12"><label>12.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Aggarwal</surname><given-names>CC</given-names></name><name><surname>Yu</surname><given-names>PS</given-names></name></person-group>. <comment>&#x201C;A General Survey of Privacy-Preserving Data Mining Models and Algorithms.&#x201D; In, 11&#x2013;52. doi: 10.1007/978-0-387-70992-5_2.</comment> (<year>2008</year>).</citation></ref>
<ref id="B13"><label>13.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Li</surname><given-names>N</given-names></name><name><surname>Li</surname><given-names>T</given-names></name><name><surname>Venkatasubramanian</surname><given-names>S</given-names></name></person-group>. &#x201C;<conf-name>T-Closeness: privacy beyond k-anonymity and l-diversity</conf-name>.&#x201D; In: <conf-name>2007 IEEE 23rd international conference on data engineering</conf-name>. <publisher-name>IEEE</publisher-name>. (<year>2007</year>). p. <fpage>106</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1109/ICDE.2007.367856</pub-id></citation></ref>
<ref id="B14"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Machanavajjhala</surname><given-names>A</given-names></name><name><surname>Kifer</surname><given-names>D</given-names></name><name><surname>Gehrke</surname><given-names>J</given-names></name><name><surname>Venkitasubramaniam</surname><given-names>M</given-names></name></person-group>. <article-title>&#x2113;-Diversity: privacy beyond k-anonymity</article-title>. <source>ACM Trans Knowl Discov Data</source>. (<year>2007</year>) <volume>1</volume>(<issue>1</issue>):<fpage>24</fpage>. <pub-id pub-id-type="doi">10.1145/1217299.1217302</pub-id></citation></ref>
<ref id="B15"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Emam</surname><given-names>KE</given-names></name><name><surname>Dankar</surname><given-names>FK</given-names></name></person-group>. <article-title>Protecting privacy using K-anonymity</article-title>. <source>J Am Med Inform Assoc</source>. (<year>2008</year>) <volume>15</volume>(<issue>5</issue>):<fpage>627</fpage>&#x2013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.1197/jamia.M2716</pub-id><pub-id pub-id-type="pmid">18579830</pub-id></citation></ref>
<ref id="B16"><label>16.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Truta</surname><given-names>TM</given-names></name><name><surname>Vinay</surname><given-names>B</given-names></name></person-group>. &#x201C;<conf-name>Privacy protection: p-sensitive k-anonymity property</conf-name>.&#x201D; In: <conf-name>ICDEW 2006 - Proceedings of the 22nd international conference on data engineering workshops</conf-name>. <publisher-name>Institute of Electrical and Electronics Engineers Inc</publisher-name>. (<year>2006</year>). <pub-id pub-id-type="doi">10.1109/ICDEW.2006.116</pub-id></citation></ref>
<ref id="B17"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ciampi</surname><given-names>M</given-names></name><name><surname>Sicuranza</surname><given-names>M</given-names></name><name><surname>Silvestri</surname><given-names>S</given-names></name></person-group>. <article-title>A privacy-preserving and standard-based architecture for secondary use of clinical data</article-title>. <source>Information</source>. (<year>2022</year>) <volume>13</volume>(<issue>2</issue>):<fpage>87</fpage>. <pub-id pub-id-type="doi">10.3390/info13020087</pub-id></citation></ref>
<ref id="B18"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Somolinos</surname><given-names>R</given-names></name><name><surname>Mu&#x00F1;oz</surname><given-names>A</given-names></name><name><surname>Elena Hernando</surname><given-names>M</given-names></name><name><surname>Pascual</surname><given-names>M</given-names></name><name><surname>C&#x00E1;ceres</surname><given-names>J</given-names></name><name><surname>S&#x00E1;nchez-De-madariaga</surname><given-names>R</given-names></name><etal/></person-group> <article-title>Service for the pseudonymization of electronic healthcare records based on ISO/EN 13606 for the secondary use of information</article-title>. <source>IEEE J Biomed Health Inform</source>. (<year>2015</year>) <volume>19</volume>(<issue>6</issue>):<fpage>1937</fpage>&#x2013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1109/JBHI.2014.2360546</pub-id><pub-id pub-id-type="pmid">25265637</pub-id></citation></ref>
<ref id="B19"><label>19.</label><citation citation-type="other"><collab>&#x201C;ISO - ISO 13606-1</collab>. <comment>Health Informatics &#x2014; Electronic Health Record Communication &#x2014; Part 1: Reference Model.&#x201D; n.d. Accessed July 29, 2022. https://www.iso.org/standard/67868.html</comment> (<year>2019</year>).</citation></ref>
<ref id="B20"><label>20.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Quiroz</surname><given-names>JC</given-names></name><name><surname>Chard</surname><given-names>T</given-names></name><name><surname>Sa</surname><given-names>Z</given-names></name><name><surname>Ritchie</surname><given-names>A</given-names></name><name><surname>Jorm</surname><given-names>L</given-names></name><name><surname>Gallego</surname><given-names>B</given-names></name></person-group>. <comment>&#x201C;Extract, Transform, Load Framework for the Conversion of Health Databases to OMOP.&#x201D; Edited by Thomas Martin Deserno. <italic>PLOS ONE</italic> 17 (4): e0266911. doi: 10.1371/journal.pone.0266911</comment> (<year>2022</year>).</citation></ref>
<ref id="B21"><label>21.</label><citation citation-type="other"><collab>&#x201C;OMOP Common Data Model &#x2013; OHDSI.&#x201D;</collab> . <comment>Accessed July 29, 2022. https://www.ohdsi.org/data-standardization/the-common-data-model/</comment> (<year>n.d</year>).</citation></ref>
<ref id="B22"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ong</surname><given-names>TC</given-names></name><name><surname>Kahn</surname><given-names>MG</given-names></name><name><surname>Kwan</surname><given-names>BM</given-names></name><name><surname>Yamashita</surname><given-names>T</given-names></name><name><surname>Brandt</surname><given-names>E</given-names></name><name><surname>Hosokawa</surname><given-names>P</given-names></name><etal/></person-group> <article-title>Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading</article-title>. <source>BMC Med Inform Decis Mak</source>. (<year>2017</year>) <volume>17</volume>(<issue>1</issue>):<fpage>134</fpage>. <pub-id pub-id-type="doi">10.1186/s12911-017-0532-3</pub-id><pub-id pub-id-type="pmid">28903729</pub-id></citation></ref>
<ref id="B23"><label>23.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Anastasiou</surname><given-names>A</given-names></name><name><surname>Pitoglou</surname><given-names>S</given-names></name><name><surname>Androutsou</surname><given-names>T</given-names></name><name><surname>Kostalas</surname><given-names>E</given-names></name><name><surname>Matsopoulos</surname><given-names>G</given-names></name><name><surname>Koutsouris</surname><given-names>D</given-names></name></person-group>. &#x201C;<conf-name>MODELHealth: an innovative software platform for machine learning in healthcare leveraging indoor localization services</conf-name>.&#x201D; In: <conf-name>Proceedings - IEEE international conference on mobile data management; 2019-June</conf-name>. <publisher-name>Institute of Electrical and Electronics Engineers Inc</publisher-name>. (<year>2019</year>). p. <fpage>443</fpage>&#x2013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1109/MDM.2019.000-5</pub-id></citation></ref>
<ref id="B24"><label>24.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Pitoglou</surname><given-names>S</given-names></name><name><surname>Anastasiou</surname><given-names>A</given-names></name><name><surname>Androutsou</surname><given-names>T</given-names></name><name><surname>Giannouli</surname><given-names>D</given-names></name><name><surname>Kostalas</surname><given-names>E</given-names></name><name><surname>Matsopoulos</surname><given-names>G</given-names></name><etal/></person-group> &#x201C;<conf-name>MODELHealth: facilitating machine learning on big health data networks</conf-name>.&#x201D; In: <conf-name>Proceedings of the annual international conference of the IEEE Engineering in medicine and biology society, EMBS</conf-name>. <publisher-name>Institute of Electrical and Electronics Engineers (IEEE)</publisher-name>. (<year>2019</year>). p. <fpage>2174</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1109/EMBC.2019.8857394</pub-id></citation></ref>
<ref id="B25"><label>25.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bender</surname><given-names>D</given-names></name><name><surname>Sartipi</surname><given-names>K</given-names></name></person-group>. &#x201C;<conf-name>HL7 FHIR: an Agile and RESTful approach to healthcare information exchange</conf-name>.&#x201D; In: <conf-name>Proceedings of CBMS 2013 - 26th IEEE international symposium on computer-based medical systems</conf-name>. (<year>2013</year>). p. <fpage>326</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1109/CBMS.2013.6627810</pub-id></citation></ref>
<ref id="B26"><label>26.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Pezoulas</surname><given-names>VC</given-names></name><name><surname>Exarchos</surname><given-names>TP</given-names></name><name><surname>Fotiadis</surname><given-names>DI</given-names></name></person-group>. &#x201C;<article-title>Medical data harmonization</article-title>.&#x201D; In: <source>Medical data sharing, harmonization and analytics</source>. <publisher-name>Elsevier</publisher-name>. (<year>2020</year>). p. <fpage>137</fpage>&#x2013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1016/b978-0-12-816507-2.00005-0</pub-id>. <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/book/9780128165072/medical-data-sharing-harmonization-and-analytics?via=ihub=">https://www.sciencedirect.com/book/9780128165072/medical-data-sharing-harmonization-and-analytics?via=ihub=</ext-link></citation></ref>
<ref id="B27"><label>27.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saripalle</surname><given-names>R</given-names></name><name><surname>Runyan</surname><given-names>C</given-names></name><name><surname>Russell</surname><given-names>M</given-names></name></person-group>. <article-title>Using HL7 FHIR to achieve interoperability in patient health record</article-title>. <source>J Biomed Inform</source>. (<year>2019</year>) <volume>94</volume>:<fpage>103188</fpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2019.103188</pub-id><pub-id pub-id-type="pmid">31063828</pub-id></citation></ref>
<ref id="B28"><label>28.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Silva</surname><given-names>RJ</given-names></name><name><surname>Sloane</surname><given-names>EB</given-names></name><name><surname>Cooper</surname><given-names>T</given-names></name></person-group>. &#x201C;<article-title>Application of HL7&#x00AE; FHIR for device and health information system interoperability</article-title>.&#x201D; In: Iadanza E, editor. <source>Clinical engineering handbook</source>. <publisher-name>Elsevier</publisher-name>. (<year>2020</year>). p. <fpage>611</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1016/b978-0-12-813467-2.00086-9</pub-id>. <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/book/9780128134672/clinical-engineering-handbook">https://www.sciencedirect.com/book/9780128134672/clinical-engineering-handbook</ext-link></citation></ref>
<ref id="B29"><label>29.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kiourtis</surname><given-names>A</given-names></name><name><surname>Mavrogiorgou</surname><given-names>A</given-names></name><name><surname>Kyriazis</surname><given-names>D</given-names></name></person-group>. &#x201C;<conf-name>FHIR Ontology mapper (FOM): aggregating structural and semantic similarities of ontologies towards their alignment to HL7 FHIR</conf-name>.&#x201D; In: <conf-name>2018 IEEE 20th international conference on E-health networking, applications and services, Healthcom 2018</conf-name>. <publisher-name>Institute of Electrical and Electronics Engineers Inc</publisher-name>. (<year>2018</year>). <pub-id pub-id-type="doi">10.1109/HealthCom.2018.8531149</pub-id></citation></ref>
<ref id="B30"><label>30.</label><citation citation-type="other"><person-group person-group-type="author"><name><surname>Neumann</surname><given-names>A</given-names></name><name><surname>Laranjeiro</surname><given-names>N</given-names></name><name><surname>Bernardino</surname><given-names>J</given-names></name></person-group>. <comment>&#x201C;An Analysis of Public REST Web Service APIs.&#x201D; <italic>IEEE Transactions on Services Computing</italic>, June 13, 2018. doi: 10.1109/TSC.2018.2847344</comment> (<year>2018</year>).</citation></ref>
<ref id="B31"><label>31.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>LeFevre</surname><given-names>K</given-names></name><name><surname>DeWitt</surname><given-names>DJ</given-names></name><name><surname>Ramakrishnan</surname><given-names>R</given-names></name></person-group>. &#x201C;<conf-name>Mondrian multidimensional K-anonymity</conf-name>.&#x201D; In: <conf-name>Proceedings - international conference on data engineering</conf-name>. (<year>2006</year>). p. <fpage>25</fpage>. <pub-id pub-id-type="doi">10.1109/ICDE.2006.101</pub-id></citation></ref>
<ref id="B32"><label>32.</label><citation citation-type="other"><collab>&#x201C;OpenStreetMap &#x2014; Geocoder 1.38.1.&#x201D;</collab> (<year>n.d</year>). <comment>Accessed April 21, 2020.</comment> <ext-link ext-link-type="uri" xlink:href="https://geocoder.readthedocs.io/providers/OpenStreetMap.html">https://geocoder.readthedocs.io/providers/OpenStreetMap.html</ext-link></citation></ref>
<ref id="B33"><label>33.</label><citation citation-type="other"><collab>&#x201C;MongoDB.&#x201D;</collab> (<year>n.d</year>). <comment>Accessed April 21, 2020.</comment> <ext-link ext-link-type="uri" xlink:href="https://www.mongodb.com/">https://www.mongodb.com/</ext-link></citation></ref>
<ref id="B34"><label>34.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ayala-Rivera</surname><given-names>V</given-names></name><name><surname>McDonagh</surname><given-names>P</given-names></name><name><surname>Cerqueus</surname><given-names>T</given-names></name><name><surname>Murphy</surname><given-names>L</given-names></name></person-group>. <article-title>A systematic comparison and evaluation of K-anonymization algorithms for practitioners</article-title>. <source>Trans Data Privacy</source>. (<year>2014</year>) <volume>7</volume>(<issue>3</issue>):<fpage>337</fpage>&#x2013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.5555/2870614.2870620</pub-id></citation></ref>
<ref id="B35"><label>35.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Iyengar</surname><given-names>VS</given-names></name></person-group>. &#x201C;<conf-name>Transforming data to satisfy privacy constraints</conf-name>.&#x201D; In: <conf-name>Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining - KDD &#x2019;02</conf-name>. <publisher-loc>New York</publisher-loc>, <publisher-loc>NY</publisher-loc>, <publisher-loc>USA</publisher-loc>: <publisher-name>ACM Press</publisher-name>. (<year>2002</year>). p. <fpage>279</fpage>. <pub-id pub-id-type="doi">10.1145/775047.775089</pub-id></citation></ref>
<ref id="B36"><label>36.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Nergiz</surname><given-names>ME</given-names></name><name><surname>Clifton</surname><given-names>C</given-names></name></person-group>. &#x201C;<conf-name>Thoughts on K-anonymization</conf-name>.&#x201D; In: <conf-name>22nd international conference on data engineering workshops (ICDEW&#x2019;06)</conf-name>. <publisher-name>IEEE</publisher-name>. (<year>2006</year>). p. <fpage>96</fpage>. <pub-id pub-id-type="doi">10.1109/ICDEW.2006.147</pub-id></citation></ref>
<ref id="B37"><label>37.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bayardo</surname><given-names>RJ</given-names></name><name><surname>Agrawal</surname><given-names>R</given-names></name></person-group>. &#x201C;<conf-name>Data privacy through optimal K-anonymization</conf-name>.&#x201D; In: <conf-name>Proceedings - international conference on data engineering</conf-name>. <publisher-name>IEEE</publisher-name>. (<year>2005</year>). p. <fpage>217</fpage>&#x2013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1109/ICDE.2005.42</pub-id></citation></ref>
<ref id="B38"><label>38.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Souibgui</surname><given-names>M</given-names></name><name><surname>Atigui</surname><given-names>F</given-names></name><name><surname>Zammali</surname><given-names>S</given-names></name><name><surname>Cherfi</surname><given-names>S</given-names></name><name><surname>Yahia</surname><given-names>SB</given-names></name></person-group>. <article-title>Data quality in ETL process: a preliminary study</article-title>. <source>Procedia Comput Sci</source>. (<year>2019</year>) <volume>159</volume>:<fpage>676</fpage>&#x2013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2019.09.223</pub-id></citation></ref>
<ref id="B39"><label>39.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Theodorou</surname><given-names>V</given-names></name><name><surname>Abell&#x00F3;</surname><given-names>A</given-names></name><name><surname>Lehner</surname><given-names>W</given-names></name><name><surname>Thiele</surname><given-names>M</given-names></name></person-group>. <article-title>Quality measures for ETL processes: from goals to implementation</article-title>. <source>Concurrency Comput Pract Exp</source>. (<year>2016</year>) <volume>28</volume>(<issue>15</issue>):<fpage>3969</fpage>&#x2013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1002/cpe.3729</pub-id></citation></ref>
<ref id="B40"><label>40.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Gangwar</surname><given-names>PS</given-names></name><name><surname>Hasija</surname><given-names>Y</given-names></name></person-group>. &#x201C;<conf-name>Deep learning for analysis of electronic health records (EHR)</conf-name>.&#x201D; In: <conf-name>Deep learning techniques for biomedical and health informatics</conf-name>. (<year>2020</year>). p. <fpage>149</fpage>&#x2013;<lpage>66</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-33966-1_8</pub-id></citation></ref>
<ref id="B41"><label>41.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Pitoglou</surname><given-names>S</given-names></name><name><surname>Koumpouros</surname><given-names>Y</given-names></name><name><surname>Anastasiou</surname><given-names>A</given-names></name></person-group>. &#x201C;<conf-name>Using electronic health records and machine learning to make medical-related predictions from non-medical data</conf-name>.&#x201D; In: <conf-name>Institute of electrical and electronics engineers (IEEE)</conf-name>. (<year>2019</year>). p. <fpage>56</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1109/icmlde.2018.00021</pub-id></citation></ref>
<ref id="B42"><label>42.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rajkomar</surname><given-names>A</given-names></name><name><surname>Oren</surname><given-names>E</given-names></name><name><surname>Chen</surname><given-names>K</given-names></name><name><surname>Dai</surname><given-names>AM</given-names></name><name><surname>Hajaj</surname><given-names>N</given-names></name><name><surname>Hardt</surname><given-names>M</given-names></name><etal/></person-group> <article-title>Scalable and accurate deep learning with electronic health records</article-title>. <source>Npj Digit Med</source>. (<year>2018</year>) <volume>1</volume>(<issue>1</issue>):<fpage>18</fpage>. <pub-id pub-id-type="doi">10.1038/s41746-018-0029-1</pub-id><pub-id pub-id-type="pmid">31304302</pub-id></citation></ref>
<ref id="B43"><label>43.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ravi</surname><given-names>D</given-names></name><name><surname>Wong</surname><given-names>C</given-names></name><name><surname>Deligianni</surname><given-names>F</given-names></name><name><surname>Berthelot</surname><given-names>M</given-names></name><name><surname>Andreu-Perez</surname><given-names>J</given-names></name><name><surname>Lo</surname><given-names>B</given-names></name><etal/></person-group> <article-title>Deep learning for health informatics</article-title>. <source>IEEE J Biomed Health Inform</source>. (<year>2017</year>) <volume>21</volume>(<issue>1</issue>):<fpage>4</fpage>&#x2013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1109/JBHI.2016.2636665</pub-id><pub-id pub-id-type="pmid">28055930</pub-id></citation></ref>
<ref id="B44"><label>44.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nguyen</surname><given-names>P</given-names></name><name><surname>Tran</surname><given-names>T</given-names></name><name><surname>Wickramasinghe</surname><given-names>N</given-names></name><name><surname>Venkatesh</surname><given-names>S</given-names></name></person-group>. <article-title>Deepr: a convolutional net for medical records</article-title>. <source>IEEE J Biomed Health Inform</source>. (<year>2017</year>) <volume>21</volume>(<issue>1</issue>):<fpage>22</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1109/JBHI.2016.2633963</pub-id><pub-id pub-id-type="pmid">27913366</pub-id></citation></ref></ref-list>
</back>
</article>