<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Public Health</journal-id>
<journal-title>Frontiers in Public Health</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Public Health</abbrev-journal-title>
<issn pub-type="epub">2296-2565</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpubh.2022.846118</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Public Health</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Machine Learning Based Framework to Identify and Classify Non-alcoholic Fatty Liver Disease in a Large-Scale Population</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Ji</surname> <given-names>Weidong</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1616974/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Xue</surname> <given-names>Mingyue</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhang</surname> <given-names>Yushan</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Yao</surname> <given-names>Hua</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/766554/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Wang</surname> <given-names>Yushan</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University</institution>, <addr-line>Guangzhou</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University</institution>, <addr-line>Urumqi</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Maternal and Child Health, School of Public Health, Sun Yat-sen University</institution>, <addr-line>Guangzhou</addr-line>, <country>China</country></aff>
<aff id="aff4"><sup>4</sup><institution>Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University</institution>, <addr-line>Urumqi</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Redhwan Ahmed Al-Naggar, National University of Malaysia, Malaysia</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Julia Wattacheril, Columbia University Irving Medical Center, United States; Cristina Elena Singer, University of Medicine and Pharmacy of Craiova, Romania</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Yushan Wang <email>wangyus8877&#x00040;163.com</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Family Medicine and Primary Care, a section of the journal Frontiers in Public Health</p></fn>
<fn fn-type="equal" id="fn002"><p>&#x02020;These authors have contributed equally to this work</p></fn></author-notes>
<pub-date pub-type="epub">
<day>04</day>
<month>04</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>10</volume>
<elocation-id>846118</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>12</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>02</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Ji, Xue, Zhang, Yao and Wang.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Ji, Xue, Zhang, Yao and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Non-alcoholic fatty liver disease (NAFLD) is a common and serious health problem worldwide that lacks effective medical treatment. We aimed to develop and validate machine learning (ML) models that could be used for accurate screening of large populations. This study included 304,145 adults who took part in the national physical examination and used their questionnaire and physical measurement parameters as candidate covariates. The least absolute shrinkage and selection operator (LASSO) was used to select features from the candidate covariates, four ML algorithms were then used to build screening models for NAFLD, and the best-performing classifier was used to output the importance score of each covariate. Among the four ML algorithms, XGBoost performed best (accuracy = 0.880, precision = 0.801, recall = 0.894, F-1 = 0.882, and AUC = 0.951), and the covariates ranked by importance were BMI, age, waist circumference, gender, type 2 diabetes, gallbladder disease, smoking, hypertension, dietary status, physical activity, oil-loving and salt-loving. ML classifiers could help medical agencies achieve early identification and classification of NAFLD, which is particularly useful in economically underdeveloped areas, and the importance ranking of the covariates may be helpful for the prevention and treatment of NAFLD.</p></abstract>
<kwd-group>
<kwd>machine learning</kwd>
<kwd>screening model</kwd>
<kwd>LASSO</kwd>
<kwd>non-alcoholic fatty liver disease (NAFLD)</kwd>
<kwd>predictive models</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="3"/>
<equation-count count="2"/>
<ref-count count="57"/>
<page-count count="10"/>
<word-count count="6038"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Non-alcoholic fatty liver disease (NAFLD) has become a severe public health problem worldwide (<xref ref-type="bibr" rid="B1">1</xref>, <xref ref-type="bibr" rid="B2">2</xref>). The prevalence of NAFLD is around 20&#x02013;30% and is increasing steadily; in the past 10 years it has doubled (<xref ref-type="bibr" rid="B3">3</xref>). NAFLD is closely related to overweight or obesity, hyperlipidemia, type 2 diabetes mellitus (T2DM) and other chronic metabolic diseases: the prevalence of NAFLD is 60&#x02013;90%, 27&#x02013;92%, and 28&#x02013;70% in people with obesity, hyperlipidemia and T2DM, respectively (<xref ref-type="bibr" rid="B4">4</xref>). NAFLD is a disease spectrum that progresses from hepatic steatosis through non-alcoholic steatohepatitis (NASH) and liver fibrosis to cirrhosis and even liver cancer. NAFLD is the fastest-growing cause of liver cancer, and NASH has become the leading cause of liver failure in the United States (<xref ref-type="bibr" rid="B5">5</xref>&#x02013;<xref ref-type="bibr" rid="B8">8</xref>). In recent years, the prevalence of NAFLD in China has gradually increased and patients have become younger: in 2014, a large-sample meta-analysis reported that the prevalence of NAFLD in adults in mainland China was 20.09% (<xref ref-type="bibr" rid="B9">9</xref>). Large-scale cohort or epidemiological studies of NAFLD are therefore of great significance. The national physical examination makes large-scale research possible, but a simple method is still needed to classify NAFLD patients in the population.</p>
<p>Histologic biopsy is the gold standard for the diagnosis of NAFLD, but it is invasive and technically demanding. Ultrasound, CT and MRI are the common diagnostic methods, but imaging is costly for large-scale population screening. To facilitate the diagnosis of NAFLD, several predictive models have been introduced. The fatty liver index is an algorithm based on serum triglyceride and gamma glutamyl transferase (GGT) levels, body mass index (BMI) and waist circumference, which can predict liver steatosis in the general population (<xref ref-type="bibr" rid="B10">10</xref>, <xref ref-type="bibr" rid="B11">11</xref>). The NAFLD liver fat score uses a formula including metabolic syndrome, T2DM, fasting serum insulin, aspartate aminotransferase (AST) and alanine aminotransferase (ALT) levels to estimate the percentage of liver fat content (<xref ref-type="bibr" rid="B12">12</xref>). SteatoTest is a logistic regression model of 12 parameters, including &#x003B1;2-macroglobulin (A2M), apolipoprotein A1 (ApoA1), haptoglobin, total bilirubin, GGT, cholesterol, triglycerides, glucose, age, gender and BMI (<xref ref-type="bibr" rid="B13">13</xref>). A laboratory-based prediction model includes six parameters: ALT, high-density lipoprotein cholesterol, triglyceride, hemoglobin A1c (HbA1c), white blood cell count and the presence of hypertension; this model is used to screen for NAFLD in the general population (<xref ref-type="bibr" rid="B14">14</xref>). However, the parameters these models require are difficult to obtain. Although the existing NAFLD prediction models have been widely used, their application in large-scale epidemiological research and in many areas of developing countries such as China is limited.</p>
<p>Machine learning has been applied in medicine to build accurate prediction models. Compared with conventional statistical methods, machine learning can better identify variables relevant to clinical outcomes, achieve better predictive performance, model complex relationships, learn from multiple data modalities, and tolerate noisy data. These tools have been used to diagnose fatty liver, meningitis, glaucoma, coronary heart disease, cancer and other diseases (<xref ref-type="bibr" rid="B14">14</xref>&#x02013;<xref ref-type="bibr" rid="B21">21</xref>). Our purpose was to use machine learning to analyze the data of 304,145 physical examinees and to establish a simple NAFLD screening model that does not rely on laboratory-tested indicators.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and Methods</title>
<sec>
<title>Study Population</title>
<p>The Chinese government provides free medical examinations for the people of Xinjiang. The data come from the 2018 medical examination in Urumqi, comprising 643,439 cases. People who signed written informed consent were eligible to participate in the study. Potential participants were excluded if they (1) were self-reported drinkers; (2) had specific diseases that can lead to fatty liver; or (3) were aged &#x0003C; 20 years. After strict data filtering, 304,145 subjects were included in further analysis.</p>
</sec>
<sec>
<title>Definition of NAFLD</title>
<p>The diagnosis of NAFLD was made by professionals at the various physical examination institutions according to the standard of the China Association of liver diseases (<xref ref-type="bibr" rid="B22">22</xref>). Subjects were diagnosed with NAFLD when they met the following three criteria: no current drinking or drinking history; no specific disease leading to fatty liver, such as viral hepatitis, drug-induced liver disease, total parenteral nutrition, hepatolenticular degeneration, or autoimmune liver disease; and liver imaging consistent with the diagnostic criteria for diffuse fatty liver. After all physical examination results had been compiled, two doctors from the hepatology department of a third-class hospital in Urumqi checked the fatty liver diagnoses, which were consistent with the preliminary diagnoses.</p>
</sec>
<sec>
<title>Variable Characteristics</title>
<p>The national physical examination (NPE) variables fall into three parts: questionnaire, physical examination and laboratory testing. The questionnaire covers medical history, socioeconomic status, and lifestyle (smoking, drinking, diet and exercise habits). Physical measurement indexes include height, body weight, heart rate and waist circumference. Laboratory test indicators include blood glucose, blood biochemistry and B-mode ultrasound examination. In this study, we wanted to establish a simple model that can predict the risk of NAFLD without laboratory test variables. Because the NPE data contained many missing values, we selected 17 variables with good data quality from the questionnaire and physical measurement parameters as candidate covariates (<xref ref-type="table" rid="T1">Table 1</xref>).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Characteristics of variables.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Characteristic</bold></th>
<th valign="top" align="center"><bold>NAFLD</bold><break/><bold>(<italic>N</italic> &#x0003D; 58,654)</bold></th>
<th valign="top" align="center"><bold>Normal</bold><break/><bold>(<italic>N</italic> &#x0003D; 245,490)</bold></th>
<th valign="top" align="center"><bold><italic>p</italic>-value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><bold>Age (years)</bold></td>
<td valign="top" align="center">62 (50&#x02013;71)</td>
<td valign="top" align="center">50 (40&#x02013;65)</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left"><bold>BMI (kg/m</bold><sup><bold>2</bold></sup><bold>)</bold></td>
<td valign="top" align="center">27.27(25.15&#x02013;29.64)</td>
<td valign="top" align="center">23.71(21.91&#x02013;25.80)</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Waist circumference (cm)</bold></td>
<td valign="top" align="center">92(85.55&#x02013;99)</td>
<td valign="top" align="center">84(78&#x02013;90)</td>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Ethnicity</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Han</td>
<td valign="top" align="center">38,132(65.01)</td>
<td valign="top" align="center">160,708(65.46)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Uygur</td>
<td valign="top" align="center">8,973(15.30)</td>
<td valign="top" align="center">42,775(17.42)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Kazak</td>
<td valign="top" align="center">1,317(2.25)</td>
<td valign="top" align="center">7,898(3.22)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Hui</td>
<td valign="top" align="center">9,151(15.60)</td>
<td valign="top" align="center">27,843(11.34)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Mongolian</td>
<td valign="top" align="center">98(0.17)</td>
<td valign="top" align="center">481(0.20)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">other nationalities</td>
<td valign="top" align="center">983(1.68)</td>
<td valign="top" align="center">5,785(2.36)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Gender</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Female</td>
<td valign="top" align="center">23,069(39.33)</td>
<td valign="top" align="center">104,083(42.40)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Male</td>
<td valign="top" align="center">35,585(60.67)</td>
<td valign="top" align="center">141,407(57.60)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Physical activity,</bold><break/><bold><italic>n</italic> (%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Inactive</td>
<td valign="top" align="center">43,876(74.80)</td>
<td valign="top" align="center">149,349(60.84)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Active</td>
<td valign="top" align="center">14,778(25.20)</td>
<td valign="top" align="center">96,141(39.16)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Career</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Trader or service people</td>
<td valign="top" align="center">35,124(59.88)</td>
<td valign="top" align="center">180,260(73.43)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Agriculture workers</td>
<td valign="top" align="center">19,268(32.85)</td>
<td valign="top" align="center">48,766(19.86)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Factory workers</td>
<td valign="top" align="center">1,839(3.14)</td>
<td valign="top" align="center">6,230(2.54)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Soldier</td>
<td valign="top" align="center">597(1.02)</td>
<td valign="top" align="center">1,058(0.43)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Others</td>
<td valign="top" align="center">1,826(3.11)</td>
<td valign="top" align="center">9,176(3.74)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Smoking</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No smoking</td>
<td valign="top" align="center">50,571(86.22)</td>
<td valign="top" align="center">225,638(91.91)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">0&#x02013;20 cigarettes per day</td>
<td valign="top" align="center">6,119(10.43)</td>
<td valign="top" align="center">16,981(6.92)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">&#x0003E;20 cigarettes per day</td>
<td valign="top" align="center">1,964(3.35)</td>
<td valign="top" align="center">2,871(1.17)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Dietary status</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">Meat based</td>
<td valign="top" align="center">55,034(93.83)</td>
<td valign="top" align="center">233,163(94.98)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Meat balanced</td>
<td valign="top" align="center">1,980(3.38)</td>
<td valign="top" align="center">7,255(2.96)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Vegetarian based</td>
<td valign="top" align="center">1,640(2.80)</td>
<td valign="top" align="center">5,072(2.07)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Sugar loving</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No</td>
<td valign="top" align="center">53,524(91.25)</td>
<td valign="top" align="center">233,709(95.20)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Yes</td>
<td valign="top" align="center">5,130(8.75)</td>
<td valign="top" align="center">11,781(4.80)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Oil loving</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No</td>
<td valign="top" align="center">50,144(85.49)</td>
<td valign="top" align="center">232,123(94.55)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Yes</td>
<td valign="top" align="center">8,510(14.51)</td>
<td valign="top" align="center">13,367(5.45)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Salt loving</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No</td>
<td valign="top" align="center">53,363(90.98)</td>
<td valign="top" align="center">235,452(95.91)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Yes</td>
<td valign="top" align="center">5,291(9.02)</td>
<td valign="top" align="center">10,038(4.09)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Mental disease</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No</td>
<td valign="top" align="center">57,187(97.50)</td>
<td valign="top" align="center">240,826(98.10)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Yes</td>
<td valign="top" align="center">1,467(2.50)</td>
<td valign="top" align="center">4,664(1.90)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Eye diseases</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No</td>
<td valign="top" align="center">55,545(94.70)</td>
<td valign="top" align="center">234,197(95.40)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Yes</td>
<td valign="top" align="center">3,109(5.30)</td>
<td valign="top" align="center">11,293(4.60)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Gallbladder disease</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No</td>
<td valign="top" align="center">47,176(80.43)</td>
<td valign="top" align="center">227,057(92.49)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Yes</td>
<td valign="top" align="center">11,478(19.57)</td>
<td valign="top" align="center">18,433(7.51)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>T2DM</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No</td>
<td valign="top" align="center">38,136(65.02)</td>
<td valign="top" align="center">223,951(91.23)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Yes</td>
<td valign="top" align="center">20,518(34.98)</td>
<td valign="top" align="center">21,539(8.77)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left"><bold>Hypertension</bold>, <italic><bold>n</bold></italic> <bold>(%)</bold></td>
<td/>
<td/>
<td valign="top" align="center">&#x0003C;0.001</td>
</tr>
<tr>
<td valign="top" align="left">No</td>
<td valign="top" align="center">37,127(63.30)</td>
<td valign="top" align="center">189,935(89.81)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Yes</td>
<td valign="top" align="center">21,527(36.70)</td>
<td valign="top" align="center">55,555(10.19)</td>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>BMI, Body Mass Index; T2DM, type 2 diabetes mellitus</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Variable Definitions</title>
<p>Potential risk factors for NAFLD included age, body mass index (BMI), waist circumference, ethnicity, gender, physical activity, career, smoking, eating habits and some comorbidities.</p>
<p>Sociodemographic information included age (years); gender (&#x0201C;male&#x0201D; or &#x0201C;female&#x0201D;); ethnic group, divided into six categories (&#x0201C;Han&#x0201D;, &#x0201C;Uygur&#x0201D;, &#x0201C;Kazak&#x0201D;, &#x0201C;Hui&#x0201D;, &#x0201C;Mongolian&#x0201D; and &#x0201C;other nationalities&#x0201D;); and career (&#x0201C;trader or service people&#x0201D;, &#x0201C;agriculture workers&#x0201D;, &#x0201C;factory workers&#x0201D;, &#x0201C;soldier&#x0201D; and &#x0201C;others&#x0201D;). The baseline comorbidities were mental diseases, eye diseases, gallbladder disease, T2DM, and hypertension (yes or no). Eye disease was defined as the presence of retinal hemorrhage, papilledema or cataract.</p>
<p>Lifestyle information included smoking, physical activity and eating habits. Physical activity was defined as at least 20 min of physical activity per day (yes or no) in leisure time during the past 6 months (<xref ref-type="bibr" rid="B23">23</xref>). Individuals were defined as smokers if they had smoked at least one cigarette a day for at least 6 months (<xref ref-type="bibr" rid="B24">24</xref>). We also included the daily smoking amount (0, 0&#x02013;20 cigarettes, and &#x0003E;20 cigarettes). Dietary status had three options, &#x0201C;meat based&#x0201D;, &#x0201C;meat balanced&#x0201D; and &#x0201C;vegetarian based&#x0201D;, of which participants could choose one or more. Dietary preference refers to whether participants reported a fondness for sugar, oil, or salt.</p>
</sec>
<sec>
<title>Statistical Analysis</title>
<p>Data cleaning was performed first, followed by a descriptive analysis of the basic characteristics of the cleaned data. Categorical variables were expressed as numbers (percentages). Continuous variables conforming to a normal distribution were expressed as mean &#x000B1; standard deviation; otherwise, the median and interquartile range were used. The chi-square or Fisher&#x00027;s exact test was used as appropriate to compare differences in categorical variables. A two-sided <italic>P</italic> &#x0003C; 0.05 was considered statistically significant. Second, the least absolute shrinkage and selection operator (LASSO) was used to filter variables, and the retained variables were used for subsequent model building. Third, because more normal subjects than NAFLD subjects were included in this study (an imbalanced class problem), the synthetic minority over-sampling technique (SMOTE) was used to address the imbalance. Fourth, four machine learning models were constructed using the class-balanced data, and their performance was compared. Finally, variable importance was ranked using the best-performing algorithm.</p>
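<p>As an illustration of the comparison of categorical variables between groups, the following minimal Python sketch computes the Pearson chi-square statistic for a single 2 &#x000D7; 2 table, using the T2DM counts reported in Table 1. The function name chi_square_2x2 is hypothetical, no continuity correction is applied, and the code is not the authors&#x00027; analysis script; 3.841 is the 5% critical value of the chi-square distribution with one degree of freedom.</p>
<preformat><![CDATA[
```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Counts from Table 1: T2DM (yes, no) in the NAFLD group and in the normal group.
statistic = chi_square_2x2(20518, 38136, 21539, 223951)
# 3.841 is the 5% critical value for one degree of freedom.
significant = statistic > 3.841
print(round(statistic, 1), significant)
```
]]></preformat>
<p>A statistic above the critical value corresponds to <italic>P</italic> &#x0003C; 0.05 for that table; the exact <italic>P</italic>-value would require the chi-square distribution function, which this sketch leaves out.</p>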
<p>The demographic and disease characteristics of the two groups are given in <xref ref-type="table" rid="T1">Table 1</xref>. The main objective of the ML techniques was to classify NAFLD. An overview of the proposed ML workflow is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Machine learning flowchart of this study. LR, logistic regression; RF, random forest; NB, Naive Bayesian; ML, machine learning; LASSO, least absolute shrinkage and selection operator.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpubh-10-846118-g0001.tif"/>
</fig>
<sec>
<title>Data Cleaning</title>
<p>The NPE data set is large and its variables are disorganized, with many missing values and outliers. Data preprocessing is therefore an essential step (<xref ref-type="bibr" rid="B25">25</xref>). First, we deleted nearly 200 variables that were not meaningful to this study. Second, we handled missing values and outliers: variables with more than 20% missing values were deleted, and the remaining variables were imputed, with categorical variables filled with the mode and continuous variables filled with the mean.</p>
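<p>The cleaning rule (drop variables with more than 20% missing values, then fill categorical variables with the mode and continuous variables with the mean) can be sketched as follows. This is a minimal illustration that uses only the Python standard library rather than the pandas code used in the study; the function name clean_columns and the column names are hypothetical, and outlier handling is not shown.</p>
<preformat><![CDATA[
```python
from statistics import mean, mode


def clean_columns(columns, categorical, max_missing=0.20):
    """Drop columns with too many missing values and impute the rest.

    columns: dict mapping a column name to a list of values (None = missing).
    categorical: set of column names filled with the mode; others use the mean.
    """
    cleaned = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        missing_share = (len(values) - len(observed)) / len(values)
        if missing_share > max_missing:
            continue  # more than 20% missing: the variable is dropped
        fill = mode(observed) if name in categorical else mean(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Toy data with hypothetical column names.
toy = {
    "waist_cm": [80.0, None, 90.0, 100.0, 90.0],   # 20% missing: kept, mean-filled
    "smoking": ["no", "no", None, "yes", "no"],    # 20% missing: kept, mode-filled
    "heart_rate": [None, None, 72.0, 75.0, 70.0],  # 40% missing: dropped
}
result = clean_columns(toy, categorical={"smoking"})
print(result)
```
]]></preformat>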
</sec>
<sec>
<title>Feature Selection</title>
<p>LASSO penalized logistic regression was applied to screen the risk factors. The purpose of this method is to minimize the LASSO cost function and retain all features with non-zero coefficients. The objective function to be minimized is:</p>
<disp-formula id="E1"><mml:math id="M1"><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:munder><mml:mrow><mml:mtext>min</mml:mtext></mml:mrow><mml:mi>w</mml:mi></mml:munder><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>2</mml:mn><mml:mi>n</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x02016;</mml:mo><mml:mi>X</mml:mi><mml:mi>w</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>y</mml:mi><mml:msubsup><mml:mo>&#x02016;</mml:mo><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mo>&#x02016;</mml:mo><mml:mi>w</mml:mi><mml:msub><mml:mo>&#x02016;</mml:mo><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>where <italic>X</italic> is a matrix of subject features, <italic>y</italic> is a vector of sample labels, <italic>n</italic> is the number of samples, <italic>w</italic> is a coefficient vector of the regression model, and &#x003B1;||<italic>w</italic>||<sub>1</sub> is the LASSO penalty with the constant &#x003B1; and the &#x02113;<sub>1</sub>-norm of the coefficient vector ||<italic>w</italic>||<sub>1</sub> (<xref ref-type="bibr" rid="B26">26</xref>).</p>
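<p>To show how the &#x02113;<sub>1</sub> penalty drives some coefficients exactly to zero, the following minimal Python sketch minimizes the squared-error objective as printed above by cyclic coordinate descent with soft-thresholding. It is an illustration under stated assumptions, not the glmnet procedure used in the study: the function names are hypothetical, the toy design matrix is invented, and the logistic loss of a penalized logistic regression is not implemented here.</p>
<preformat><![CDATA[
```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||Xw - y||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that leaves feature j out.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            w[j] = soft_threshold(rho, alpha) / scale
    return w


# Invented toy data: y depends on the first feature only.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
print(w, selected)
```
]]></preformat>
<p>Features whose coefficients remain non-zero are the ones retained; a larger &#x003B1; shrinks more coefficients to zero and therefore keeps fewer features.</p>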
</sec>
<sec>
<title>Data Imbalance Processing</title>
<p>There were more normal subjects than subjects with NAFLD (an imbalanced class problem). Generally, classes with few subjects are more difficult to predict than those with numerous subjects (<xref ref-type="bibr" rid="B27">27</xref>&#x02013;<xref ref-type="bibr" rid="B30">30</xref>). The SMOTE algorithm, an over-sampling method, was used to reduce the negative impact of class imbalance. It increases the number of minority-class samples by synthesizing new ones until the classes are balanced, and it is widely used because it preserves the important information in the samples.</p>
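<p>The core idea of SMOTE, creating a synthetic minority sample on the line segment between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration written with the Python standard library, not the imbalanced-learn implementation used in the study; the function name smote, the toy values and the choice of k are assumptions, and all features are treated as numeric.</p>
<preformat><![CDATA[
```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Invented toy data: four minority samples (BMI, waist circumference) and
# ten majority samples, so six synthetic samples balance the classes.
minority = [[27.0, 92.0], [28.5, 95.0], [26.0, 90.0], [29.0, 99.0]]
n_majority = 10
new_points = smote(minority, n_new=n_majority - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```
]]></preformat>
<p>Because every synthetic point is a convex combination of two observed minority samples, it stays within the range spanned by the observed minority data.</p>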
</sec>
<sec>
<title>Classifier Comparison</title>
<p>Classification models were based on four popular supervised ML methods. The linear model was logistic regression (LR) (<xref ref-type="bibr" rid="B31">31</xref>). For the decision tree approach, the random forest (RF) model used bagging to combine multiple trees, and the XGBoost model used boosting to combine tree stumps (<xref ref-type="bibr" rid="B32">32</xref>). Finally, the Naive Bayesian (NB) model is based on probability (<xref ref-type="bibr" rid="B33">33</xref>).</p>
</sec>
<sec>
<title>Model Evaluation</title>
<p>The data set balanced by the SMOTE algorithm was randomly divided into a training set (70%) and a validation set (30%) (<xref ref-type="bibr" rid="B34">34</xref>, <xref ref-type="bibr" rid="B35">35</xref>). The algorithms were compared using the confusion matrix and indicators derived from it, including accuracy, precision, recall and F-1, together with the receiver operating characteristic (ROC) curve (<xref ref-type="bibr" rid="B36">36</xref>). In the formulas below, TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.</p>
<disp-formula id="E2"><mml:math id="M2"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mi>u</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>y</mml:mi></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac><mml:mtext>&#x000A0;</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>F</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi
><mml:mo>&#x000D7;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
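<p>As a minimal sketch of how these four metrics follow from confusion-matrix counts, the Python function below uses the standard definitions, in which precision is TP/(TP + FP) and recall is TP/(TP + FN). The function name and the example counts are hypothetical and are not taken from the study.</p>
<preformat preformat-type="code"><![CDATA[
```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F-1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are detected
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only for illustration.
example = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```
]]></preformat>
<p>Because F-1 is the harmonic mean of precision and recall, it can never exceed the larger of the two, which gives a quick consistency check on any reported set of metrics.</p>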
</sec>
<sec>
<title>Feature Importance Ranking</title>
<p>Tree-based models provide measures of variable importance. Unlike regression models, however, ML algorithms fit complex relationships that cannot be summarized by a single, easily interpreted parameter, and the resulting importance measures carry no causal or formal statistical interpretation (<xref ref-type="bibr" rid="B37">37</xref>). Instead, they are best understood as a ranking of which variables are most &#x0201C;important&#x0201D; to the fitted model (<xref ref-type="bibr" rid="B38">38</xref>). Although variable importance ranking is not a substitute for targeted hypothesis testing of a given parameter, it can serve as a means of hypothesis generation, helping to identify factors worthy of further study and providing some insight into the factors that influence the prediction (<xref ref-type="bibr" rid="B39">39</xref>).</p>
<p>Analyses were performed in Python version 3.7.2. The &#x0201C;pandas,&#x0201D; &#x0201C;NumPy,&#x0201D; and &#x0201C;Matplotlib&#x0201D; libraries were used to identify missing values and outliers and to perform interpolation, the &#x0201C;imblearn&#x0201D; (imbalanced-learn) library was used to address class imbalance, and the &#x0201C;sklearn&#x0201D; (scikit-learn) library was used to build and validate the ML models. LASSO penalized logistic regression was performed with the &#x0201C;glmnet&#x0201D; package in R statistical software version 3.3.2.</p>
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Patients and Variables</title>
<p>Of the pool of 304,135 subjects, 58,654 (19.3%) had NAFLD. Each subject was described by 17 variables (<xref ref-type="table" rid="T1">Table 1</xref>), all of which were significantly associated with NAFLD (<italic>p</italic> &#x0003C; 0.001).</p>
</sec>
<sec>
<title>Feature Selection</title>
<p>LASSO regression yielded 12 features with non-zero coefficients, reducing the 17 candidate variables to 12 (<xref ref-type="fig" rid="F2">Figure 2</xref>). These features were age, gender, physical activity, smoking, BMI, waist circumference, dietary status, oil preference, salt preference, T2DM, gallbladder disease, and hypertension. These 12 variables were used for the subsequent construction of the models.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>LASSO algorithm for feature selection. <bold>(A)</bold> Mean-squared error (10-fold cross-validation criterion) of the LASSO penalized logistic regression algorithm. <bold>(B)</bold> The vertical line is drawn at the value selected using 10-fold cross-validation, where the optimal lambda resulted in 12 features with non-zero coefficients.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpubh-10-846118-g0002.tif"/>
</fig>
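<p>To illustrate why a LASSO penalty leaves some coefficients exactly at zero, the sketch below applies the soft-thresholding operator, which gives the LASSO solution in the simplified case of a linear model with orthonormal predictors. This is a simplifying assumption and not the penalized logistic regression procedure used in this study; the feature names, coefficient values, and penalty strength are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Hypothetical unpenalized coefficients for five made-up features.
raw_coefficients = {"f1": 0.90, "f2": -0.40, "f3": 0.05, "f4": -0.02, "f5": 0.30}
lam = 0.10  # hypothetical penalty strength

shrunk = {name: soft_threshold(beta, lam) for name, beta in raw_coefficients.items()}
selected = [name for name, beta in shrunk.items() if beta != 0.0]
```
]]></preformat>
<p>In this toy example the two smallest coefficients are set to zero and the remaining three are kept, which mirrors how a larger penalty retains fewer features.</p>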
</sec>
<sec>
<title>Performance on the Validation Set</title>
<p>After applying the SMOTE algorithm, we obtained a balanced data set of 490,980 records with 12 variables (<xref ref-type="table" rid="T2">Table 2</xref>), of which 343,686 formed the training set and 147,294 formed the validation set. We built four ML classifiers; <xref ref-type="table" rid="T3">Table 3</xref> shows their performance. The confusion matrices are displayed as heatmaps in which larger counts are shown in darker colors: the darker (closer to orange) the TN and TP regions and the lighter the FN and FP regions, the higher the accuracy of the classification model. XGBoost achieved the best overall performance (accuracy = 0.880, precision = 0.801, recall = 0.894, F-1 = 0.882, and AUC = 0.951). <xref ref-type="fig" rid="F3">Figure 3</xref> presents the ROC curves of all classifiers.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Dataset description.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Dataset</bold></th>
<th valign="top" align="center"><bold>Samples distribution</bold></th>
<th valign="top" align="center"><bold>Ratio</bold></th>
<th valign="top" align="left"><bold>Description</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Original data</td>
<td valign="top" align="center">245,490/<break/>58,654</td>
<td valign="top" align="center">4:1</td>
<td valign="top" align="left">Original data with full instances</td>
</tr>
<tr>
<td valign="top" align="left">SMOTE data</td>
<td valign="top" align="center">245,490/<break/>245,490</td>
<td valign="top" align="center">1:1</td>
<td valign="top" align="left">Dataset is balanced utilizing SMOTE oversampling</td>
</tr>
</tbody>
</table>
</table-wrap>
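<p>The core idea of SMOTE oversampling can be sketched as follows: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its minority-class neighbors. The sketch below omits the nearest-neighbor search and is not the library implementation used in the study; the function name and the two example points are hypothetical, while the class counts are those reported in Table 2.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def smote_like_sample(point, neighbor, rng):
    """Create one synthetic point on the segment between point and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [p + gap * (n - p) for p, n in zip(point, neighbor)]


# Class counts reported in Table 2.
majority, minority = 245_490, 58_654
synthetic_needed = majority - minority  # minority samples to add for a 1:1 ratio
balanced_total = 2 * majority

# Hypothetical two-feature minority samples (for example, BMI and waist circumference).
rng = random.Random(0)
a, b = [27.0, 92.0], [29.0, 96.0]
new_point = smote_like_sample(a, b, rng)
```
]]></preformat>
<p>With the counts in Table 2, 186,836 synthetic minority samples are needed to reach the 1:1 ratio, giving the balanced total of 490,980 records.</p>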
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>The results of classification algorithms.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Confusion matrix</bold></th>
<th valign="top" align="center"><bold>Accuracy</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>F-1</bold></th>
<th valign="top" align="center"><bold>AUC</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><bold>LR</bold></td>
<td valign="top" align="center"><inline-graphic xlink:href="fpubh-10-846118-i0001.tif"/></td>
<td valign="top" align="center">0.778</td>
<td valign="top" align="center">0.783</td>
<td valign="top" align="center">0.768</td>
<td valign="top" align="center">0.775</td>
<td valign="top" align="center">0.857</td>
</tr>
<tr>
<td valign="top" align="left"><bold>RF</bold></td>
<td valign="top" align="center"><inline-graphic xlink:href="fpubh-10-846118-i0002.tif"/></td>
<td valign="top" align="center">0.862</td>
<td valign="top" align="center">0.851</td>
<td valign="top" align="center">0.878</td>
<td valign="top" align="center">0.864</td>
<td valign="top" align="center">0.937</td>
</tr>
<tr>
<td valign="top" align="left"><bold>XGBoost</bold></td>
<td valign="top" align="center"><inline-graphic xlink:href="fpubh-10-846118-i0003.tif"/></td>
<td valign="top" align="center">0.880</td>
<td valign="top" align="center">0.801</td>
<td valign="top" align="center">0.894</td>
<td valign="top" align="center">0.882</td>
<td valign="top" align="center">0.951</td>
</tr>
<tr>
<td valign="top" align="left"><bold>NB</bold></td>
<td valign="top" align="center"><inline-graphic xlink:href="fpubh-10-846118-i0004.tif"/></td>
<td valign="top" align="center">0.716</td>
<td valign="top" align="center">0.762</td>
<td valign="top" align="center">0.626</td>
<td valign="top" align="center">0.687</td>
<td valign="top" align="center">0.814</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>AUC, area under the receiver operating characteristic (ROC) curve</italic>.</p>
<p><italic>LR, logistic regression; RF, random forest; NB, na&#x000EF;ve Bayesian</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>ROC curve of all algorithms. LR, logistic regression; RF, random forest; NB, Naive Bayesian; XGB, XGBoost.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpubh-10-846118-g0003.tif"/>
</fig>
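<p>The AUC summarized by these ROC curves can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The following sketch computes it directly from that pairwise definition; the function name, labels, and scores are hypothetical, and the quadratic loop is meant for illustration rather than for data sets of the size used in this study.</p>
<preformat preformat-type="code"><![CDATA[
```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = NAFLD) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = auc_from_scores(labels, scores)
```
]]></preformat>
<p>In this toy example, eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9.</p>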
</sec>
<sec>
<title>Variables Importance Ranking by XGBoost</title>
<p>We report variable importance for the XGBoost model, which had the best classification performance. XGBoost provides an importance score for each variable and can compute it in three ways. We chose the default method, which counts the relative number of times a variable is used to split the data across all trees. The importance scores from the three methods differed only slightly, and the differences did not affect the relative influence of the variables. The importance scores of the 12 variables are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Feature importance in the XGBoost model, measured by F-score.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpubh-10-846118-g0004.tif"/>
</fig>
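<p>A split-count importance measure of this kind (called &#x0201C;weight&#x0201D; in XGBoost) can be sketched by counting how often each feature appears as a split across all trees. The nested-dictionary tree representation, the feature names, and the tree shapes below are hypothetical and serve only to illustrate the counting; they do not reproduce XGBoost&#x00027;s internal data structures or the fitted model.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def count_splits(node, counts):
    """Recursively count how often each feature is used as a split in one tree."""
    if "feature" not in node:  # leaf node
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


# Two hypothetical trees; feature names and structure are invented for illustration.
trees = [
    {"feature": "BMI",
     "left": {"feature": "waist", "left": {"leaf": -0.2}, "right": {"leaf": 0.1}},
     "right": {"feature": "age", "left": {"leaf": 0.3}, "right": {"leaf": 0.6}}},
    {"feature": "waist",
     "left": {"leaf": -0.1},
     "right": {"feature": "BMI", "left": {"leaf": 0.2}, "right": {"leaf": 0.5}}},
]

split_counts = Counter()
for tree in trees:
    count_splits(tree, split_counts)

ranking = split_counts.most_common()  # features ordered by split count, highest first
```
]]></preformat>
<p>A count of this kind reflects how often a variable is used, not how much each split improves the fit, which is why the alternative importance measures can yield somewhat different scores.</p>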
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>Non-alcoholic fatty liver disease (NAFLD) is the most common liver disease in the world and a major cause of liver cirrhosis and liver cancer. No effective drug treatment is available, so early identification and early prevention are the most effective means of limiting the disease. In this study, using 12 questionnaire and physical measurement variables, we established four ML screening models for NAFLD in a large-scale physical examination population of 304,145 subjects. XGBoost achieved the best performance on the validation set (accuracy = 0.880, precision = 0.801, recall = 0.894, F-1 = 0.882, and AUC = 0.951).</p>
<p>Detailed analysis of existing epidemiological data shows that the risk factors for NAFLD in China are similar to those in the West and other parts of Asia, and that metabolic syndrome (MetS) is associated with a higher risk of non-alcoholic steatohepatitis and more progressive disease. In this study, BMI, waist circumference, hypertension, gallbladder disease, and T2DM were all risk factors related to MetS (<xref ref-type="bibr" rid="B40">40</xref>&#x02013;<xref ref-type="bibr" rid="B42">42</xref>). On the one hand, MetS is a strong predictor of NAFLD; on the other hand, NAFLD is a good predictor of the clustering of MetS components (<xref ref-type="bibr" rid="B43">43</xref>). In addition, a number of other risk factors for NAFLD have been identified in Chinese studies, including advancing age, male gender, physical inactivity, high-fat intake, high-sugar intake, overeating, smoking, increasing waist circumference, and rising BMI (<xref ref-type="bibr" rid="B41">41</xref>, <xref ref-type="bibr" rid="B42">42</xref>, <xref ref-type="bibr" rid="B44">44</xref>). The conclusions of these studies are consistent with those of this study.</p>
<p>Our research has several advantages. First, some existing NAFLD prediction models rely on laboratory and clinical parameters whose collection requires substantial human and financial resources, which limits their application in large-scale epidemiological research and in areas with limited health care resources (<xref ref-type="bibr" rid="B10">10</xref>, <xref ref-type="bibr" rid="B14">14</xref>, <xref ref-type="bibr" rid="B45">45</xref>). All the variables in this study are non-invasive, easily obtained physical measurements and questionnaire items. The model can therefore be applied to early, non-invasive prediction of NAFLD without expensive laboratory tests, especially in areas with high epidemiological risk and low socio-economic status.</p>
<p>Second, this study is based on a large Chinese population sample with broad coverage, which makes the results more generalizable and representative. In addition, our data set covers many of the major ethnic groups in China and thus better reflects the characteristics of China&#x00027;s population.</p>
<p>Third, the occurrence and development of NAFLD are closely related to lifestyle, so lifestyle improvement is an effective treatment (<xref ref-type="bibr" rid="B46">46</xref>). Our model can serve as a screening model for NAFLD and also includes modifiable indicators such as diet, smoking, and exercise, which can guide people to prevent or delay the disease through a healthy lifestyle. Although it is not clear whether exercise has independent benefits for NAFLD, exercise can improve cardiovascular health, reduce weight, and reduce peripheral, adipose, and hepatic insulin resistance.</p>
<p>Fourth, the analysis of NAFLD data is challenging because most medical data are nonlinear, non-normal, correlated, and complex in nature. This study combined LASSO penalized logistic regression with ML algorithms. LASSO shrinks the estimates of the regression coefficients and thereby prevents overfitting due to collinearity of the covariates; it combines the advantages of variable selection (easy to explain) and stable estimation (robust), which is particularly useful in large data sets that require efficient and fast algorithms (<xref ref-type="bibr" rid="B47">47</xref>&#x02013;<xref ref-type="bibr" rid="B49">49</xref>). The outstanding performance of ML algorithms in processing complex data structures and big data has made them dominant in healthcare and medical imaging, and XGBoost has been reported to run more than 10 times faster than other machine learning methods (<xref ref-type="bibr" rid="B25">25</xref>, <xref ref-type="bibr" rid="B50">50</xref>&#x02013;<xref ref-type="bibr" rid="B53">53</xref>).</p>
<p>In previous studies, NAFLD patients tended to eat a higher-calorie diet than people without NAFLD, especially in the form of carbohydrates and fats. Zelber-Sagi et al. showed that NAFLD patients consumed more soft drinks and meat than the control group (<xref ref-type="bibr" rid="B54">54</xref>). Soft drinks contain a lot of sugar, and sugar intake is related to NAFLD (<xref ref-type="bibr" rid="B55">55</xref>). Musso et al. found that the diets of NAFLD patients contained more saturated fat and cholesterol and less unsaturated fatty acid than those of healthy people (<xref ref-type="bibr" rid="B56">56</xref>). Although the ideal diet for NAFLD patients has not been determined, the data indicate that diet is important (<xref ref-type="bibr" rid="B57">57</xref>). In our study, however, we found only a weak effect of meat and vegetable combination, salt preference, and oil preference on NAFLD (<xref ref-type="fig" rid="F4">Figure 4</xref>), and no effect of sugar preference. A possible reason is that the NPE diet survey was cross-sectional and no professional evaluated the diet of the examined population: the self-reported eating habits of people undergoing physical examination were highly subjective and lacked professional evaluation indicators. More accurate results may therefore be obtained in future studies through follow-up of participants&#x00027; lifestyles.</p>
<p>This study has several limitations. First, previous studies confirmed that education and family history are important determinants of NAFLD, but we could not obtain this information for the participants. Second, we lacked objective and unified evaluation standards for some indicators, such as dietary status, which may reduce the accuracy of the prediction model. Third, the data used in this study were physical examination data from China, which might limit the extrapolation of the results. However, this study, based on a large government-collected sample, is one of the few to provide comprehensive epidemiological data on NAFLD for model development. Finally, the parameters in the data set are insufficient for comparison with the scores of existing NAFLD prediction models. However, the purpose of this study was to provide a convenient and easily accessible model for the diagnosis of NAFLD through questionnaires and physical measurements. The results show that our model has high diagnostic accuracy and predictive ability.</p>
</sec>
<sec sec-type="conclusions" id="s5">
<title>Conclusion</title>
<p>This study developed a simple NAFLD screening model based on a large sample of 304,145 Chinese subjects. The model achieves high accuracy without relying on laboratory measurements and is especially suited to areas with poor economic conditions and high epidemiological risk.</p>
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s7">
<title>Ethics Statement</title>
<p>This study was performed in accordance with the principles outlined in the Declaration of Helsinki and approved by the Xinjiang Uygur Autonomous Region CDC Ethical Committee and the Institutional Review Board. Participants who provided written informed consent were eligible to participate in the study.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>MX and HY conceived the study. YZ and YW collected the data. MX and WJ performed the statistical analyses and drafted the manuscript. HY critically reviewed and edited the manuscript. All authors contributed to data analysis, drafting and revising the article, gave final approval of the version to be published, have agreed on the journal to which the article has been submitted, and agree to be accountable for all aspects of the work.</p>
</sec>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>This work was supported by the Region Social Science Foundation of Xinjiang (Grant No. 2021D01C238).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack><p>We thank the Health Commission of Xinjiang Uygur Autonomous Region and the Health Management Institute of Xinjiang Medical University for data support, and Professor Wang Kai of the School of Medical Engineering and Technology of Xinjiang Medical University for guidance. We also thank all the participants for their help.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Younossi</surname> <given-names>ZM</given-names></name> <name><surname>Koenig</surname> <given-names>AB</given-names></name> <name><surname>Abdelatif</surname> <given-names>D</given-names></name> <name><surname>Fazel</surname> <given-names>Y</given-names></name> <name><surname>Henry</surname> <given-names>L</given-names></name> <name><surname>Wymer</surname> <given-names>M</given-names></name></person-group>. <article-title>Global epidemiology of nonalcoholic fatty liver disease-Meta-analytic assessment of prevalence, incidence, and outcomes</article-title>. <source>Hepatology.</source> (<year>2016</year>) <volume>64</volume>:<fpage>73</fpage>&#x02013;<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1002/hep.28431</pub-id><pub-id pub-id-type="pmid">26707365</pub-id></citation></ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rinella</surname> <given-names>ME</given-names></name></person-group>. <article-title>Nonalcoholic fatty liver disease: a systematic review</article-title>. <source>JAMA.</source> (<year>2015</year>) <volume>313</volume>:<fpage>2263</fpage>&#x02013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1001/jama.2015.5370</pub-id><pub-id pub-id-type="pmid">26057287</pub-id></citation></ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wesolowski</surname> <given-names>SR</given-names></name> <name><surname>Kasmi</surname> <given-names>KC</given-names></name> <name><surname>Jonscher</surname> <given-names>KR</given-names></name> <name><surname>Friedman</surname> <given-names>JE</given-names></name></person-group>. <article-title>Developmental origins of NAFLD: a womb with a clue</article-title>. <source>Nat Rev Gastroenterol Hepatol.</source> (<year>2017</year>) <volume>14</volume>:<fpage>81</fpage>&#x02013;<lpage>96</lpage>. <pub-id pub-id-type="doi">10.1038/nrgastro.2016.160</pub-id><pub-id pub-id-type="pmid">27780972</pub-id></citation></ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bellentani</surname> <given-names>S</given-names></name> <name><surname>Scaglioni</surname> <given-names>F</given-names></name> <name><surname>Marino</surname> <given-names>M</given-names></name> <name><surname>Bedogni</surname> <given-names>G</given-names></name></person-group>. <article-title>Epidemiology of non-alcoholic fatty liver disease</article-title>. <source>Dig Dis.</source> (<year>2010</year>) <volume>28</volume>:<fpage>155</fpage>&#x02013;<lpage>61</lpage>. <pub-id pub-id-type="doi">10.1159/000282080</pub-id><pub-id pub-id-type="pmid">20460905</pub-id></citation></ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marengo</surname> <given-names>A</given-names></name> <name><surname>Rosso</surname> <given-names>C</given-names></name> <name><surname>Bugianesi</surname> <given-names>E</given-names></name></person-group>. <article-title>Liver cancer: connections with obesity, fatty liver, and cirrhosis</article-title>. <source>Annu Rev Med.</source> (<year>2016</year>) <volume>67</volume>:<fpage>103</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-med-090514-013832</pub-id><pub-id pub-id-type="pmid">26473416</pub-id></citation></ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Diehl</surname> <given-names>AM</given-names></name> <name><surname>Day</surname> <given-names>C</given-names></name></person-group>. <article-title>Cause, pathogenesis, and treatment of nonalcoholic steatohepatitis</article-title>. <source>N Engl J Med.</source> (<year>2017</year>) <volume>377</volume>:<fpage>2063</fpage>&#x02013;<lpage>72</lpage>. <pub-id pub-id-type="doi">10.1056/NEJMra1503519</pub-id><pub-id pub-id-type="pmid">29166236</pub-id></citation></ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Doycheva</surname> <given-names>I</given-names></name> <name><surname>Issa</surname> <given-names>D</given-names></name> <name><surname>Watt</surname> <given-names>KD</given-names></name> <name><surname>Lopez</surname> <given-names>R</given-names></name> <name><surname>Rifai</surname> <given-names>G</given-names></name> <name><surname>Alkhouri</surname> <given-names>N</given-names></name></person-group>. <article-title>Nonalcoholic steatohepatitis is the most rapidly increasing indication for liver transplantation in young adults in the United States</article-title>. <source>J Clin Gastroenterol.</source> (<year>2018</year>) <volume>52</volume>:<fpage>339</fpage>&#x02013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1097/MCG.0000000000000925</pub-id><pub-id pub-id-type="pmid">28961576</pub-id></citation></ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wong</surname> <given-names>RJ</given-names></name> <name><surname>Aguilar</surname> <given-names>M</given-names></name> <name><surname>Cheung</surname> <given-names>R</given-names></name> <name><surname>Perumpail</surname> <given-names>RB</given-names></name> <name><surname>Harrison</surname> <given-names>SA</given-names></name> <name><surname>Younossi</surname> <given-names>ZM</given-names></name> <etal/></person-group>. <article-title>Nonalcoholic steatohepatitis is the second leading etiology of liver disease among adults awaiting liver transplantation in the United States</article-title>. <source>Gastroenterology.</source> (<year>2015</year>) <volume>148</volume>:<fpage>547</fpage>&#x02013;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.1053/j.gastro.2014.11.039</pub-id><pub-id pub-id-type="pmid">25461851</pub-id></citation></ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Z</given-names></name> <name><surname>Xue</surname> <given-names>J</given-names></name> <name><surname>Chen</surname> <given-names>P</given-names></name> <name><surname>Chen</surname> <given-names>L</given-names></name> <name><surname>Yan</surname> <given-names>S</given-names></name> <name><surname>Liu</surname> <given-names>L</given-names></name></person-group>. <article-title>Prevalence of nonalcoholic fatty liver disease in mainland of China: a meta-analysis of published studies</article-title>. <source>J Gastroenterol Hepatol.</source> (<year>2014</year>) <volume>29</volume>:<fpage>42</fpage>&#x02013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1111/jgh.12428</pub-id><pub-id pub-id-type="pmid">24219010</pub-id></citation></ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kwok</surname> <given-names>R</given-names></name> <name><surname>Tse</surname> <given-names>YK</given-names></name> <name><surname>Wong</surname> <given-names>GL</given-names></name> <name><surname>Ha</surname> <given-names>Y</given-names></name> <name><surname>Lee</surname> <given-names>AU</given-names></name> <name><surname>Ngu</surname> <given-names>MC</given-names></name> <etal/></person-group>. <article-title>Systematic review with meta-analysis: non-invasive assessment of non-alcoholic fatty liver disease&#x02013;the role of transient elastography and plasma cytokeratin-18 fragments</article-title>. <source>Aliment Pharmacol Ther.</source> (<year>2014</year>) <volume>39</volume>:<fpage>254</fpage>&#x02013;<lpage>69</lpage>. <pub-id pub-id-type="doi">10.1111/apt.12569</pub-id><pub-id pub-id-type="pmid">24308774</pub-id></citation></ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wong</surname> <given-names>VW</given-names></name> <name><surname>Wong</surname> <given-names>GL</given-names></name></person-group>. <article-title>When and how to use steatosis biomarkers?</article-title> <source>Aliment Pharmacol Ther.</source> (<year>2014</year>) <volume>40</volume>:<fpage>1359</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1111/apt.12983</pub-id><pub-id pub-id-type="pmid">25376197</pub-id></citation></ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bedogni</surname> <given-names>G</given-names></name> <name><surname>Bellentani</surname> <given-names>S</given-names></name> <name><surname>Miglioli</surname> <given-names>L</given-names></name> <name><surname>Masutti</surname> <given-names>F</given-names></name> <name><surname>Passalacqua</surname> <given-names>M</given-names></name> <name><surname>Castiglione</surname> <given-names>A</given-names></name> <etal/></person-group>. <article-title>The fatty liver index: a simple and accurate predictor of hepatic steatosis in the general population</article-title>. <source>BMC Gastroenterol.</source> (<year>2006</year>) <volume>6</volume>:<fpage>33</fpage>. <pub-id pub-id-type="doi">10.1186/1471-230X-6-33</pub-id><pub-id pub-id-type="pmid">17081293</pub-id></citation></ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kotronen</surname> <given-names>A</given-names></name> <name><surname>Peltonen</surname> <given-names>M</given-names></name> <name><surname>Hakkarainen</surname> <given-names>A</given-names></name> <name><surname>Sevastianova</surname> <given-names>K</given-names></name> <name><surname>Bergholm</surname> <given-names>R</given-names></name> <name><surname>Johansson</surname> <given-names>LM</given-names></name> <etal/></person-group>. <article-title>Prediction of non-alcoholic fatty liver disease and liver fat using metabolic and genetic factors</article-title>. <source>Gastroenterology.</source> (<year>2009</year>) <volume>137</volume>:<fpage>865</fpage>&#x02013;<lpage>72</lpage>. <pub-id pub-id-type="doi">10.1053/j.gastro.2009.06.005</pub-id><pub-id pub-id-type="pmid">19524579</pub-id></citation></ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yip</surname> <given-names>TC</given-names></name> <name><surname>Ma</surname> <given-names>AJ</given-names></name> <name><surname>Wong</surname> <given-names>VW</given-names></name> <name><surname>Tse</surname> <given-names>YK</given-names></name> <name><surname>Chan</surname> <given-names>HL</given-names></name> <name><surname>Yuen</surname> <given-names>PC</given-names></name> <etal/></person-group>. <article-title>Laboratory parameter-based machine learning model for excluding non-alcoholic fatty liver disease (NAFLD) in the general population</article-title>. <source>Aliment Pharmacol Ther.</source> (<year>2017</year>) <volume>46</volume>:<fpage>447</fpage>&#x02013;<lpage>56</lpage>. <pub-id pub-id-type="doi">10.1111/apt.14172</pub-id><pub-id pub-id-type="pmid">28585725</pub-id></citation></ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>H</given-names></name> <name><surname>Xu</surname> <given-names>CF</given-names></name> <name><surname>Shen</surname> <given-names>Z</given-names></name> <name><surname>Yu</surname> <given-names>CH</given-names></name> <name><surname>Li</surname> <given-names>YM</given-names></name></person-group>. <article-title>Application of machine learning techniques for clinical predictive modeling: a cross-sectional study on nonalcoholic fatty liver disease in China</article-title>. <source>Biomed Res Int.</source> (<year>2018</year>) <volume>2018</volume>:<fpage>4304376</fpage>. <pub-id pub-id-type="doi">10.1155/2018/4304376</pub-id><pub-id pub-id-type="pmid">30402478</pub-id></citation></ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perveen</surname> <given-names>S</given-names></name> <name><surname>Shahbaz</surname> <given-names>M</given-names></name> <name><surname>Keshavjee</surname> <given-names>K</given-names></name> <name><surname>Guergachi</surname> <given-names>A</given-names></name></person-group>. <article-title>A systematic machine learning based approach for the diagnosis of non-alcoholic fatty liver disease risk and progression</article-title>. <source>Sci Rep.</source> (<year>2018</year>) <volume>8</volume>:<fpage>2112</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-018-20166-x</pub-id><pub-id pub-id-type="pmid">29391513</pub-id></citation></ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>L&#x000E9;lis</surname> <given-names>VM</given-names></name> <name><surname>Guzm&#x000E1;n</surname> <given-names>E</given-names></name> <name><surname>Belmonte</surname> <given-names>MV</given-names></name></person-group>. <article-title>A statistical classifier to support diagnose meningitis in less developed areas of Brazil</article-title>. <source>J Med Syst.</source> (<year>2017</year>) <volume>41</volume>:<fpage>145</fpage>. <pub-id pub-id-type="doi">10.1007/s10916-017-0785-5</pub-id><pub-id pub-id-type="pmid">28801740</pub-id></citation></ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>ML</given-names></name> <name><surname>Chen</surname> <given-names>HY</given-names></name></person-group>. <article-title>Glaucoma classification model based on GDx VCC measured parameters by decision tree</article-title>. <source>J Med Syst.</source> (<year>2010</year>) <volume>34</volume>:<fpage>1141</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1007/s10916-009-9333-2</pub-id><pub-id pub-id-type="pmid">20703593</pub-id></citation></ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gregori</surname> <given-names>D</given-names></name> <name><surname>Bigi</surname> <given-names>R</given-names></name> <name><surname>Cortigiani</surname> <given-names>L</given-names></name> <name><surname>Bovenzi</surname> <given-names>F</given-names></name> <name><surname>Fiorentini</surname> <given-names>C</given-names></name> <name><surname>Picano</surname> <given-names>E</given-names></name></person-group>. <article-title>Non-invasive risk stratification of coronary artery disease: an evaluation of some commonly used statistical classifiers in terms of predictive accuracy and clinical usefulness</article-title>. <source>J Eval Clin Pract.</source> (<year>2009</year>) <volume>15</volume>:<fpage>777</fpage>&#x02013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2753.2008.01034.x</pub-id><pub-id pub-id-type="pmid">19811588</pub-id></citation></ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chao</surname> <given-names>CM</given-names></name> <name><surname>Yu</surname> <given-names>YW</given-names></name> <name><surname>Cheng</surname> <given-names>BW</given-names></name> <name><surname>Kuo</surname> <given-names>YL</given-names></name></person-group>. <article-title>Construction the model on the breast cancer survival analysis use support vector machine, logistic regression and decision tree</article-title>. <source>J Med Syst.</source> (<year>2014</year>) <volume>38</volume>:<fpage>106</fpage>. <pub-id pub-id-type="doi">10.1007/s10916-014-0106-1</pub-id><pub-id pub-id-type="pmid">25119239</pub-id></citation></ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kate</surname> <given-names>RJ</given-names></name> <name><surname>Nadig</surname> <given-names>R</given-names></name></person-group>. <article-title>Stage-specific predictive models for breast cancer survivability</article-title>. <source>Int J Med Inform.</source> (<year>2017</year>) <volume>97</volume>:<fpage>304</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijmedinf.2016.11.001</pub-id><pub-id pub-id-type="pmid">27919388</pub-id></citation></ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fan</surname> <given-names>JG</given-names></name> <name><surname>Jia</surname> <given-names>JD</given-names></name> <name><surname>Li</surname> <given-names>YM</given-names></name> <name><surname>Wang</surname> <given-names>BY</given-names></name> <name><surname>Lu</surname> <given-names>LG</given-names></name> <name><surname>Shi</surname> <given-names>JP</given-names></name> <etal/></person-group>. <article-title>Guidelines for the diagnosis and management of nonalcoholic fatty liver disease: update 2010: (published in Chinese on Chinese Journal of Hepatology 2010; 18:163&#x02013;166)</article-title>. <source>J Dig Dis.</source> (<year>2011</year>) <volume>12</volume>:<fpage>38</fpage>&#x02013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1111/j.1751-2980.2010.00476.x</pub-id><pub-id pub-id-type="pmid">21276207</pub-id></citation></ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>L</given-names></name> <name><surname>Yan</surname> <given-names>K</given-names></name> <name><surname>Zeng</surname> <given-names>D</given-names></name> <name><surname>Lai</surname> <given-names>X</given-names></name> <name><surname>Chen</surname> <given-names>X</given-names></name> <name><surname>Fang</surname> <given-names>Q</given-names></name> <etal/></person-group>. <article-title>Association of polycyclic aromatic hydrocarbons metabolites and risk of diabetes in coke oven workers</article-title>. <source>Environ Pollut.</source> (<year>2017</year>) <volume>223</volume>:<fpage>305</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1016/j.envpol.2017.01.027</pub-id><pub-id pub-id-type="pmid">28131481</pub-id></citation></ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>L</given-names></name> <name><surname>Zhou</surname> <given-names>Y</given-names></name> <name><surname>Sun</surname> <given-names>H</given-names></name> <name><surname>Lai</surname> <given-names>H</given-names></name> <name><surname>Liu</surname> <given-names>C</given-names></name> <name><surname>Yan</surname> <given-names>K</given-names></name> <etal/></person-group>. <article-title>Dose-response relationship between polycyclic aromatic hydrocarbon metabolites and risk of diabetes in the general Chinese population</article-title>. <source>Environ Pollut.</source> (<year>2014</year>) <volume>195</volume>:<fpage>24</fpage>&#x02013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1016/j.envpol.2014.08.012</pub-id><pub-id pub-id-type="pmid">25194268</pub-id></citation></ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ngiam</surname> <given-names>KY</given-names></name> <name><surname>Khor</surname> <given-names>IW</given-names></name></person-group>. <article-title>Big data and machine learning algorithms for health-care delivery</article-title>. <source>Lancet Oncol.</source> (<year>2019</year>) <volume>20</volume>:<fpage>e262</fpage>&#x02013;<lpage>3</lpage>. <pub-id pub-id-type="doi">10.1016/S1470-2045(19)30149-4</pub-id><pub-id pub-id-type="pmid">31044724</pub-id></citation></ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>J</given-names></name> <name><surname>Sun</surname> <given-names>D</given-names></name> <name><surname>Chen</surname> <given-names>L</given-names></name> <name><surname>Fang</surname> <given-names>Z</given-names></name> <name><surname>Song</surname> <given-names>W</given-names></name> <name><surname>Guo</surname> <given-names>D</given-names></name> <etal/></person-group>. <article-title>Radiomics analysis of dynamic contrast-enhanced magnetic resonance imaging for the prediction of sentinel lymph node metastasis in breast cancer</article-title>. <source>Front Oncol.</source> (<year>2019</year>) <volume>9</volume>:<fpage>980</fpage>. <pub-id pub-id-type="doi">10.3389/fonc.2019.00980</pub-id><pub-id pub-id-type="pmid">31632912</pub-id></citation></ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>BJ</given-names></name> <name><surname>Ku</surname> <given-names>B</given-names></name> <name><surname>Nam</surname> <given-names>J</given-names></name> <name><surname>Pham</surname> <given-names>DD</given-names></name> <name><surname>Kim</surname> <given-names>JY</given-names></name></person-group>. <article-title>Prediction of fasting plasma glucose status using anthropometric measures for diagnosing type 2 diabetes</article-title>. <source>IEEE J Biomed Health Inform.</source> (<year>2014</year>) <volume>18</volume>:<fpage>555</fpage>&#x02013;<lpage>61</lpage>. <pub-id pub-id-type="doi">10.1109/JBHI.2013.2264509</pub-id><pub-id pub-id-type="pmid">24608055</pub-id></citation></ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>BJ</given-names></name> <name><surname>Kim</surname> <given-names>JY</given-names></name></person-group>. <article-title>A comparison of the predictive power of anthropometric indices for hypertension and hypotension risk</article-title>. <source>PLoS ONE.</source> (<year>2014</year>) <volume>9</volume>:<fpage>e84897</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0084897</pub-id><pub-id pub-id-type="pmid">24465449</pub-id></citation></ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>H</given-names></name> <name><surname>Yang</surname> <given-names>X</given-names></name> <name><surname>Zheng</surname> <given-names>S</given-names></name> <name><surname>Sun</surname> <given-names>C</given-names></name></person-group>. <article-title>Active learning from imbalanced data: a solution of online weighted extreme learning machine</article-title>. <source>IEEE Trans Neural Netw Learn Syst.</source> (<year>2019</year>) <volume>30</volume>:<fpage>1088</fpage>&#x02013;<lpage>103</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2018.2855446</pub-id><pub-id pub-id-type="pmid">30137013</pub-id></citation></ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname> <given-names>Y</given-names></name> <name><surname>Zhang</surname> <given-names>YQ</given-names></name> <name><surname>Chawla</surname> <given-names>NV</given-names></name> <name><surname>Krasser</surname> <given-names>S</given-names></name></person-group>. <article-title>SVMs modeling for highly imbalanced classification</article-title>. <source>IEEE Trans Syst Man Cybern B Cybern.</source> (<year>2009</year>) <volume>39</volume>:<fpage>281</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1109/TSMCB.2008.2002909</pub-id><pub-id pub-id-type="pmid">19068445</pub-id></citation></ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meadows</surname> <given-names>K</given-names></name> <name><surname>Gibbens</surname> <given-names>R</given-names></name> <name><surname>Gerrard</surname> <given-names>C</given-names></name> <name><surname>Vuylsteke</surname> <given-names>A</given-names></name></person-group>. <article-title>Prediction of patient length of stay on the intensive care unit following cardiac surgery: a logistic regression analysis based on the cardiac operative mortality risk calculator, EuroSCORE</article-title>. <source>J Cardiothorac Vasc Anesth.</source> (<year>2018</year>) <volume>32</volume>:<fpage>2676</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1053/j.jvca.2018.03.007</pub-id><pub-id pub-id-type="pmid">29678435</pub-id></citation></ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname> <given-names>L</given-names></name></person-group>. <article-title>Random forests</article-title>. <source>Mach Learn.</source> (<year>2001</year>) <volume>45</volume>:<fpage>5</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></citation>
</ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berliner</surname> <given-names>LM</given-names></name></person-group>. <article-title>Bayesian statistics: an introduction</article-title>. <source>Technometrics.</source> (<year>1998</year>) <volume>34</volume>:<fpage>115</fpage>. <pub-id pub-id-type="doi">10.2307/1269580</pub-id></citation>
</ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ramezankhani</surname> <given-names>A</given-names></name> <name><surname>Pournik</surname> <given-names>O</given-names></name> <name><surname>Shahrabi</surname> <given-names>J</given-names></name> <name><surname>Khalili</surname> <given-names>D</given-names></name> <name><surname>Azizi</surname> <given-names>F</given-names></name> <name><surname>Hadaegh</surname> <given-names>F</given-names></name></person-group>. <article-title>Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study</article-title>. <source>Diabetes Res Clin Pract.</source> (<year>2014</year>) <volume>105</volume>:<fpage>391</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1016/j.diabres.2014.07.003</pub-id><pub-id pub-id-type="pmid">25085758</pub-id></citation></ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>CP</given-names></name> <name><surname>Zhi</surname> <given-names>XY</given-names></name> <name><surname>Ma</surname> <given-names>J</given-names></name> <name><surname>Cui</surname> <given-names>Z</given-names></name> <name><surname>Zhu</surname> <given-names>ZL</given-names></name> <name><surname>Zhang</surname> <given-names>C</given-names></name> <etal/></person-group>. <article-title>Performance comparison between logistic regression, decision trees, and multilayer perceptron in predicting peripheral neuropathy in type 2 diabetes mellitus</article-title>. <source>Chin Med J.</source> (<year>2012</year>) <volume>125</volume>:<fpage>851</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="pmid">22490586</pub-id></citation></ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lavrac</surname> <given-names>N</given-names></name></person-group>. <article-title>Selected techniques for data mining in medicine</article-title>. <source>Artif Intell Med.</source> (<year>1999</year>) <volume>16</volume>:<fpage>3</fpage>&#x02013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1016/S0933-3657(98)00062-1</pub-id><pub-id pub-id-type="pmid">10225344</pub-id></citation></ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldstein</surname> <given-names>BA</given-names></name> <name><surname>Navar</surname> <given-names>AM</given-names></name> <name><surname>Carter</surname> <given-names>RE</given-names></name></person-group>. <article-title>Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges</article-title>. <source>Eur Heart J.</source> (<year>2017</year>) <volume>38</volume>:<fpage>1805</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1093/eurheartj/ehw302</pub-id><pub-id pub-id-type="pmid">27436868</pub-id></citation></ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldstein</surname> <given-names>BA</given-names></name> <name><surname>Polley</surname> <given-names>EC</given-names></name> <name><surname>Briggs</surname> <given-names>FB</given-names></name></person-group>. <article-title>Random forests for genetic association studies</article-title>. <source>Stat Appl Genet Mol Biol.</source> (<year>2011</year>) <volume>10</volume>:<fpage>32</fpage>. <pub-id pub-id-type="doi">10.2202/1544-6115.1691</pub-id><pub-id pub-id-type="pmid">22889876</pub-id></citation></ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taylor</surname> <given-names>J</given-names></name> <name><surname>Tibshirani</surname> <given-names>RJ</given-names></name></person-group>. <article-title>Statistical learning and selective inference</article-title>. <source>Proc Natl Acad Sci USA.</source> (<year>2015</year>) <volume>112</volume>:<fpage>7629</fpage>&#x02013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1507583112</pub-id><pub-id pub-id-type="pmid">26100887</pub-id></citation></ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liew</surname> <given-names>PL</given-names></name> <name><surname>Lee</surname> <given-names>WJ</given-names></name> <name><surname>Wang</surname> <given-names>W</given-names></name> <name><surname>Lee</surname> <given-names>YC</given-names></name> <name><surname>Chen</surname> <given-names>WY</given-names></name> <name><surname>Fang</surname> <given-names>CL</given-names></name> <etal/></person-group>. <article-title>Fatty liver disease: predictors of nonalcoholic steatohepatitis and gallbladder disease in morbid obesity</article-title>. <source>Obes Surg.</source> (<year>2008</year>) <volume>18</volume>:<fpage>847</fpage>&#x02013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1007/s11695-007-9355-0</pub-id><pub-id pub-id-type="pmid">18459024</pub-id></citation></ref>
<ref id="B41">
<label>41.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fan</surname> <given-names>JG</given-names></name> <name><surname>Farrell</surname> <given-names>GC</given-names></name></person-group>. <article-title>Epidemiology of non-alcoholic fatty liver disease in China</article-title>. <source>J Hepatol.</source> (<year>2009</year>) <volume>50</volume>:<fpage>204</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1016/j.jhep.2008.10.010</pub-id><pub-id pub-id-type="pmid">19014878</pub-id></citation></ref>
<ref id="B42">
<label>42.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fan</surname> <given-names>JG</given-names></name> <name><surname>Saibara</surname> <given-names>T</given-names></name> <name><surname>Chitturi</surname> <given-names>S</given-names></name> <name><surname>Kim</surname> <given-names>BI</given-names></name> <name><surname>Sung</surname> <given-names>JJ</given-names></name> <name><surname>Chutaputti</surname> <given-names>A</given-names></name> <etal/></person-group>. <article-title>What are the risk factors and settings for non-alcoholic fatty liver disease in Asia-Pacific?</article-title> <source>J Gastroenterol Hepatol.</source> (<year>2007</year>) <volume>22</volume>:<fpage>794</fpage>&#x02013;<lpage>800</lpage>. <pub-id pub-id-type="doi">10.1111/j.1440-1746.2007.04952.x</pub-id><pub-id pub-id-type="pmid">17498218</pub-id></citation></ref>
<ref id="B43">
<label>43.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fan</surname> <given-names>JG</given-names></name> <name><surname>Zhu</surname> <given-names>J</given-names></name> <name><surname>Li</surname> <given-names>XJ</given-names></name> <name><surname>Chen</surname> <given-names>L</given-names></name> <name><surname>Lu</surname> <given-names>YS</given-names></name> <name><surname>Li</surname> <given-names>L</given-names></name> <etal/></person-group>. <article-title>Fatty liver and the metabolic syndrome among Shanghai adults</article-title>. <source>J Gastroenterol Hepatol.</source> (<year>2005</year>) <volume>20</volume>:<fpage>1825</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1111/j.1440-1746.2005.04058.x</pub-id><pub-id pub-id-type="pmid">16336439</pub-id></citation></ref>
<ref id="B44">
<label>44.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jung</surname> <given-names>HS</given-names></name> <name><surname>Chang</surname> <given-names>Y</given-names></name> <name><surname>Kwon</surname> <given-names>MJ</given-names></name> <name><surname>Sung</surname> <given-names>E</given-names></name> <name><surname>Yun</surname> <given-names>KE</given-names></name> <name><surname>Cho</surname> <given-names>YK</given-names></name> <etal/></person-group>. <article-title>Smoking and the risk of non-alcoholic fatty liver disease: a cohort study</article-title>. <source>Am J Gastroenterol.</source> (<year>2019</year>) <volume>114</volume>:<fpage>453</fpage>&#x02013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1038/s41395-018-0283-5</pub-id><pub-id pub-id-type="pmid">33365321</pub-id></citation></ref>
<ref id="B45">
<label>45.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>J</given-names></name> <name><surname>Chan</surname> <given-names>HL</given-names></name> <name><surname>Wong</surname> <given-names>GL</given-names></name> <name><surname>Chan</surname> <given-names>AW</given-names></name> <name><surname>Choi</surname> <given-names>PC</given-names></name> <name><surname>Chan</surname> <given-names>HY</given-names></name> <etal/></person-group>. <article-title>Assessment of non-alcoholic fatty liver disease using serum total cell death and apoptosis markers</article-title>. <source>Aliment Pharmacol Ther.</source> (<year>2012</year>) <volume>36</volume>:<fpage>1057</fpage>&#x02013;<lpage>66</lpage>. <pub-id pub-id-type="doi">10.1111/apt.12091</pub-id><pub-id pub-id-type="pmid">23252777</pub-id></citation></ref>
<ref id="B46">
<label>46.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Romero-G&#x000F3;mez</surname> <given-names>M</given-names></name> <name><surname>Zelber-Sagi</surname> <given-names>S</given-names></name> <name><surname>Trenell</surname> <given-names>M</given-names></name></person-group>. <article-title>Treatment of NAFLD with diet, physical activity and exercise</article-title>. <source>J Hepatol.</source> (<year>2017</year>) <volume>67</volume>:<fpage>829</fpage>&#x02013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1016/j.jhep.2017.05.016</pub-id><pub-id pub-id-type="pmid">28545937</pub-id></citation></ref>
<ref id="B47">
<label>47.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tibshirani</surname> <given-names>R</given-names></name></person-group>. <article-title>The lasso method for variable selection in the Cox model</article-title>. <source>Stat Med.</source> (<year>1997</year>) <volume>16</volume>:<fpage>385</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="pmid">9044528</pub-id></citation></ref>
<ref id="B48">
<label>48.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mueller-Using</surname> <given-names>S</given-names></name> <name><surname>Feldt</surname> <given-names>T</given-names></name> <name><surname>Sarfo</surname> <given-names>FS</given-names></name> <name><surname>Eberhardt</surname> <given-names>KA</given-names></name></person-group>. <article-title>Factors associated with performing tuberculosis screening of HIV-positive patients in Ghana: LASSO-based predictor selection in a large public health data set</article-title>. <source>BMC Public Health.</source> (<year>2016</year>) <volume>16</volume>:<fpage>563</fpage>. <pub-id pub-id-type="doi">10.1186/s12889-016-3239-y</pub-id><pub-id pub-id-type="pmid">27412114</pub-id></citation></ref>
<ref id="B49">
<label>49.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friedman</surname> <given-names>J</given-names></name> <name><surname>Hastie</surname> <given-names>T</given-names></name> <name><surname>Tibshirani</surname> <given-names>R</given-names></name></person-group>. <article-title>Regularization paths for generalized linear models <italic>via</italic> coordinate descent</article-title>. <source>J Stat Softw.</source> (<year>2010</year>) <volume>33</volume>:<fpage>1</fpage>&#x02013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.18637/jss.v033.i01</pub-id><pub-id pub-id-type="pmid">20808728</pub-id></citation></ref>
<ref id="B50">
<label>50.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>G</given-names></name></person-group>. <article-title>MLBCD: a machine learning tool for big clinical data</article-title>. <source>Health Inf Sci Syst.</source> (<year>2015</year>) <volume>3</volume>:<fpage>3</fpage>. <pub-id pub-id-type="doi">10.1186/s13755-015-0011-0</pub-id><pub-id pub-id-type="pmid">26417431</pub-id></citation></ref>
<ref id="B51">
<label>51.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srivastava</surname> <given-names>SK</given-names></name> <name><surname>Singh</surname> <given-names>SK</given-names></name> <name><surname>Suri</surname> <given-names>JS</given-names></name></person-group>. <article-title>Healthcare text classification system and its performance evaluation: a source of better intelligence by characterizing healthcare text</article-title>. <source>J Med Syst.</source> (<year>2018</year>) <volume>42</volume>:<fpage>97</fpage>. <pub-id pub-id-type="doi">10.1007/s10916-018-0941-6</pub-id><pub-id pub-id-type="pmid">29654417</pub-id></citation></ref>
<ref id="B52">
<label>52.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuppili</surname> <given-names>V</given-names></name> <name><surname>Biswas</surname> <given-names>M</given-names></name> <name><surname>Sreekumar</surname> <given-names>A</given-names></name> <name><surname>Suri</surname> <given-names>HS</given-names></name> <name><surname>Saba</surname> <given-names>L</given-names></name> <name><surname>Edla</surname> <given-names>DR</given-names></name> <etal/></person-group>. <article-title>Extreme learning machine framework for risk stratification of fatty liver disease using ultrasound tissue characterization</article-title>. <source>J Med Syst.</source> (<year>2017</year>) <volume>41</volume>:<fpage>152</fpage>. <pub-id pub-id-type="doi">10.1007/s10916-017-0797-1</pub-id><pub-id pub-id-type="pmid">29218604</pub-id></citation></ref>
<ref id="B53">
<label>53.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Banchhor</surname> <given-names>SK</given-names></name> <name><surname>Londhe</surname> <given-names>ND</given-names></name> <name><surname>Araki</surname> <given-names>T</given-names></name> <name><surname>Saba</surname> <given-names>L</given-names></name> <name><surname>Radeva</surname> <given-names>P</given-names></name> <name><surname>Khanna</surname> <given-names>NN</given-names></name> <etal/></person-group>. <article-title>Calcium detection, its quantification, and grayscale morphology-based risk stratification using machine learning in multimodality big data coronary and carotid scans: a review</article-title>. <source>Comput Biol Med.</source> (<year>2018</year>) <volume>101</volume>:<fpage>184</fpage>&#x02013;<lpage>98</lpage>. <pub-id pub-id-type="doi">10.1016/j.compbiomed.2018.08.017</pub-id><pub-id pub-id-type="pmid">30149250</pub-id></citation></ref>
<ref id="B54">
<label>54.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zelber-Sagi</surname> <given-names>S</given-names></name> <name><surname>Nitzan-Kaluski</surname> <given-names>D</given-names></name> <name><surname>Goldsmith</surname> <given-names>R</given-names></name> <name><surname>Webb</surname> <given-names>M</given-names></name> <name><surname>Blendis</surname> <given-names>L</given-names></name> <name><surname>Halpern</surname> <given-names>Z</given-names></name> <etal/></person-group>. <article-title>Long term nutritional intake and the risk for non-alcoholic fatty liver disease (NAFLD): a population based study</article-title>. <source>J Hepatol.</source> (<year>2007</year>) <volume>47</volume>:<fpage>711</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1016/j.jhep.2007.06.020</pub-id><pub-id pub-id-type="pmid">17850914</pub-id></citation></ref>
<ref id="B55">
<label>55.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abid</surname> <given-names>A</given-names></name> <name><surname>Taha</surname> <given-names>O</given-names></name> <name><surname>Nseir</surname> <given-names>W</given-names></name> <name><surname>Farah</surname> <given-names>R</given-names></name> <name><surname>Grosovski</surname> <given-names>M</given-names></name> <name><surname>Assy</surname> <given-names>N</given-names></name></person-group>. <article-title>Soft drink consumption is associated with fatty liver disease independent of metabolic syndrome</article-title>. <source>J Hepatol.</source> (<year>2009</year>) <volume>51</volume>:<fpage>918</fpage>&#x02013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1016/j.jhep.2009.05.033</pub-id><pub-id pub-id-type="pmid">20385429</pub-id></citation></ref>
<ref id="B56">
<label>56.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Musso</surname> <given-names>G</given-names></name> <name><surname>Cassader</surname> <given-names>M</given-names></name> <name><surname>Gambino</surname> <given-names>R</given-names></name></person-group>. <article-title>Non-alcoholic steatohepatitis: emerging molecular targets and therapeutic strategies</article-title>. <source>Nat Rev Drug Discov.</source> (<year>2016</year>) <volume>15</volume>:<fpage>249</fpage>&#x02013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1038/nrd.2015.3</pub-id><pub-id pub-id-type="pmid">26794269</pub-id></citation></ref>
<ref id="B57">
<label>57.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCarthy</surname> <given-names>EM</given-names></name> <name><surname>Rinella</surname> <given-names>ME</given-names></name></person-group>. <article-title>The role of diet and nutrient composition in nonalcoholic fatty liver disease</article-title>. <source>J Acad Nutr Diet.</source> (<year>2012</year>) <volume>112</volume>:<fpage>401</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1016/j.jada.2011.10.007</pub-id><pub-id pub-id-type="pmid">22717200</pub-id></citation></ref>
</ref-list>
</back>
</article>