AUTHOR=Sepas Ali , Bangash Ali Haider , Alraoui Omar , El Emam Khaled , El-Hussuna Alaa TITLE=Algorithms to anonymize structured medical and healthcare data: A systematic review JOURNAL=Frontiers in Bioinformatics VOLUME=Volume 2 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/bioinformatics/articles/10.3389/fbinf.2022.984807 DOI=10.3389/fbinf.2022.984807 ISSN=2673-7647 ABSTRACT=Introduction: Utilizing medical health data for secondary purposes such as research is paramount for developing better pharmaceuticals for patients and improving the quality of health care. Although many anonymization algorithms/software have been developed for structured medical health data (MHD) in the last decade, it is not clear how one anonymization approach compares with another in terms of data utility and risk of reidentification. The aim of this systematic review was to analyze the strengths and weaknesses of algorithms/software that anonymize structured MHD with regard to risk of reidentification and data utility. Methods: This systematic review was conducted in accordance with the PRISMA guidelines for systematic reviews. A comprehensive systematic search was performed in the following databases: PubMed, ACM Digital Library, Medline, IEEE, Embase, Web of Science Collection, Scopus, and ProQuest Dissertations and Theses Global. To find additional eligible articles, a manual search was conducted. The following parameters were extracted from eligible studies: author, year of publication, sample size, relevant algorithms and/or software applied to anonymize MHD, and summary of outcomes. Results: From 1516 initial hits, 55 records including research articles, reviews, and books were included. Sixty-seven approaches were built for anonymization of demographics data, 17 for diagnosis codes, and 3 for genomic data. One of the most common approaches was k-anonymity, which is used mainly for demographics data, frequently in combination with another algorithm, e.g., l-diversity. 
No approaches have yet been developed for protection against membership disclosure attacks on diagnosis codes. Conclusion: Anonymization of demographics data, diagnosis codes, and genomic data with sufficient data utility for secondary use while minimizing the risk of reidentification is feasible; however, it is not possible to eliminate the risk of reidentification completely.