<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2021.632385</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An Improved Genome-Wide Polygenic Score Model for Predicting the Risk of Type 2 Diabetes</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Wei</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1218100/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhuang</surname> <given-names>Zhenhuang</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1217187/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Wenxiu</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1216587/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Huang</surname> <given-names>Tao</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1184280/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Liu</surname> <given-names>Zhonghua</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1149850/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Statistics and Actuarial Science, The University of Hong Kong</institution>, <addr-line>Hong Kong</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Epidemiology and Biostatistics, School of Public Health, Peking University</institution>, <addr-line>Beijing</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>Center for Intelligent Public Health, Institute for Artificial Intelligence, Peking University</institution>, <addr-line>Beijing</addr-line>, <country>China</country></aff>
<aff id="aff4"><sup>4</sup><institution>Key Laboratory of Molecular Cardiovascular Diseases, Ministry of Education, Peking University</institution>, <addr-line>Beijing</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Guolian Kang, St. Jude Children&#x2019;s Research Hospital, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Yufang Pei, Soochow University Medical College, China; Lei Sun, University of Toronto, Canada</p></fn>
<corresp id="c001">&#x002A;Correspondence: Zhonghua Liu, <email>zhhliu@hku.hk</email></corresp>
<corresp id="c002">Tao Huang, <email>huangtaotao@pku.edu.cn</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Statistical Genetics and Methodology, a section of the journal Frontiers in Genetics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>11</day>
<month>02</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>632385</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>11</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>01</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 Liu, Zhuang, Wang, Huang and Liu.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Liu, Zhuang, Wang, Huang and Liu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>The polygenic risk score (PRS) has been shown to be predictive of the risk of diseases such as type 2 diabetes (T2D). However, existing studies on genetic prediction of T2D have had only limited predictive power. To further improve the capability of the PRS model to identify individuals at high T2D risk, we proposed a new three-step filtering procedure that aims to include truly predictive single-nucleotide polymorphisms (SNPs) in the PRS model and to exclude non-predictive ones. First, we filtered SNPs according to the marginal association <italic>p</italic>-values (<italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup>) from large-scale genome-wide association studies. Second, we set linkage disequilibrium (LD) pruning thresholds (<italic>r</italic><sup>2</sup>) as 0.2, 0.4, 0.6, and 0.8. Third, we set <italic>p</italic>-value thresholds as 5&#x00D7;&#x2004;10<sup>&#x2212;2</sup>, 5&#x00D7;&#x2004;10<sup>&#x2212;4</sup>, 5&#x00D7;&#x2004;10<sup>&#x2212;6</sup>, and 5&#x00D7;&#x2004;10<sup>&#x2212;8</sup>. Then, we constructed multiple candidate PRS models with the PRSice-2 software and tested them among 182,422 individuals in the UK Biobank (UKB) testing dataset. We validated the capability of the optimal PRS model chosen in the testing process to identify individuals at high T2D risk in the UKB validation dataset (<italic>n</italic> = 274,029). The prediction accuracy of the PRS model, evaluated by the adjusted area under the receiver operating characteristics curve (AUC), showed that our PRS model had good prediction performance [AUC = 0.795, 95% confidence interval (CI): (0.790, 0.800)]. Specifically, our PRS model identified 30, 12, and 7% of the population at greater than five-, six-, and seven-fold risk for T2D, respectively. After adjusting for sex, age, physical measurements, and clinical factors, the AUC increased to 0.901 [95% CI: (0.897, 0.904)].
Therefore, our PRS model could be useful for population-level preventive T2D screening.</p>
</abstract>
<kwd-group>
<kwd>type 2 diabetes</kwd>
<kwd>UK Biobank</kwd>
<kwd>screening</kwd>
<kwd>prediction model</kwd>
<kwd>polygenic risk score</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="6"/>
<equation-count count="8"/>
<ref-count count="30"/>
<page-count count="9"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1">
<title>Introduction</title>
<p>Type 2 diabetes (T2D) is a global public health problem. Identifying individuals at high risk for T2D for early targeted detection, prevention, and intervention is of great public health significance. Besides the well-known behavioral and environmental factors, T2D has a strong genetic component (<xref ref-type="bibr" rid="B30">Zimmet et al., 2014</xref>). Genome-wide association studies (GWASs) have successfully identified many common genetic variants that confer T2D susceptibility (<xref ref-type="bibr" rid="B1">Burton et al., 2007</xref>; <xref ref-type="bibr" rid="B24">Scott et al., 2007</xref>; <xref ref-type="bibr" rid="B18">Palmer et al., 2012</xref>; <xref ref-type="bibr" rid="B27">Visscher et al., 2017</xref>; <xref ref-type="bibr" rid="B19">P&#x00E4;rna et al., 2020</xref>). However, the common genetic variants discovered by GWASs account for only a small proportion of the total heritability (<xref ref-type="bibr" rid="B15">McCarthy, 2010</xref>; <xref ref-type="bibr" rid="B8">Herder and Roden, 2011</xref>; <xref ref-type="bibr" rid="B20">Prasad and Groop, 2015</xref>) and thus yield low predictive power. The polygenic risk score (PRS), which aggregates the information of many common single-nucleotide polymorphisms (SNPs) weighted by effect sizes obtained from a large-scale discovery GWAS, has been used to predict T2D risk. The PRS is expected to have better predictive power and the potential to improve T2D risk assessment (<xref ref-type="bibr" rid="B29">Wray et al., 2013</xref>; <xref ref-type="bibr" rid="B10">Khera et al., 2019</xref>).</p>
<p>The most commonly used method for constructing a PRS is the clumping and thresholding (C + T) [or pruning and thresholding (P + T)] method, which applies two filtering steps. To retain SNPs that are only weakly correlated with each other, it first forms clumps around index SNPs by using a linkage disequilibrium (LD)-driven clumping procedure (<xref ref-type="bibr" rid="B21">Priv&#x00E9; et al., 2019</xref>). Each clump contains all SNPs within 250 kb of the index SNP, and the degree of LD is determined by a provided pairwise correlation threshold (<italic>r</italic><sup>2</sup>). Then, it removes SNPs whose <italic>p</italic>-values obtained from a disease-related GWAS are larger than a given threshold. C + T is regarded as the most intuitive and easiest method to generate a PRS. Two common software programs (i.e., PLINK and PRSice) can be used to implement the C + T method. Recently, Choi et al. developed a new software, PRSice-2, available from <ext-link ext-link-type="uri" xlink:href="https://www.prsice.info">https://www.prsice.info</ext-link> (<xref ref-type="bibr" rid="B4">Choi and O&#x2019;Reilly, 2019</xref>), which has been demonstrated to be more computationally efficient and scalable than alternative PRS software while maintaining comparable predictive power.</p>
<p>Several researchers have tried to construct PRS models based on the C + T method for predicting T2D risk with the PLINK or PRSice software. The earliest PRS model assessed the combined risk of only three variants that had been reported to predispose to T2D in 6,078 individuals. The area under the receiver operating characteristics curve (AUC) of that PRS model was 0.571 (<xref ref-type="bibr" rid="B28">Weedon et al., 2006</xref>). Thereafter, other researchers have attempted various strategies to improve the predictive ability of the PRS model, including increasing the number of SNPs and adjusting for sex, age, some physical measurements [e.g., body mass index (BMI), diastolic blood pressure (DBP), and systolic blood pressure (SBP)] (<xref ref-type="bibr" rid="B12">Lango et al., 2008</xref>), and clinical factors [e.g., triglyceride level (TL), glucose level (GL), and cholesterol level (CL)] (<xref ref-type="bibr" rid="B13">Lyssenko et al., 2008</xref>; <xref ref-type="bibr" rid="B16">Meigs et al., 2008</xref>; <xref ref-type="bibr" rid="B26">Vassy et al., 2014</xref>). The AUCs of those improved PRS models increased to some extent (ranging from 0.600 to 0.800). However, several limitations remain. First, their sample sizes are not large (ranging from 2,776 to 39,117). Second, they take into account only a small number of SNPs (ranging from 3 to 1,000) that passed the &#x201C;GWAS significant variant&#x201D; derivation strategy (<italic>p</italic>&#x2264;&#x2004;1&#x00D7;&#x2004;10<sup>&#x2212;8</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.2), which is too strict and might miss predictive SNPs. <xref ref-type="bibr" rid="B9">Khera et al. (2018)</xref> constructed a PRS model across the whole genome and finally included a total of 409,258 individuals with 6,917,436 SNPs from the UK Biobank (UKB) project. The AUC was 0.730 after adjusting for age, sex, and the first four principal components of ancestry.
This strategy slightly improved prediction accuracy; however, its computational burden is relatively large.</p>
<p>To further explore the capability of the PRS model to identify individuals at high risk for T2D, we proposed a new strategy that constructs the PRS model by the following three-step filtering procedure, which seeks a statistical compromise between signal and noise. First, rather than including SNPs across the whole genome, we selected a subset of SNPs by a lenient significance threshold (<italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup>) from the large number of SNPs included in large-scale GWASs. Second, we set <italic>r</italic><sup>2</sup> equal to 0.2, 0.4, 0.6, and 0.8 as candidate LD pruning thresholds according to <xref ref-type="bibr" rid="B9">Khera et al. (2018)</xref>. Third, we set <italic>p</italic>-value thresholds as 5&#x00D7;&#x2004;10<sup>&#x2212;2</sup>, 5&#x00D7;&#x2004;10<sup>&#x2212;4</sup>, 5&#x00D7;&#x2004;10<sup>&#x2212;6</sup>, and 5&#x00D7;&#x2004;10<sup>&#x2212;8</sup>. After applying the above thresholds to the GWAS summary data, a total of 16 candidate PRS models were generated with the PRSice-2 software in the target samples. We conducted testing using the UKB testing dataset (<italic>n</italic> = 182,422) to avoid model overfitting. Finally, we chose the best predictive PRS model among the set of candidate PRS models and evaluated it in the UKB validation dataset (<italic>n</italic> = 274,029). We also considered non-genetic risk factors, including sex, age, physical measurements, and clinical factors, to further increase prediction accuracy. Real data analysis showed that our PRS model outperforms previous prediction models for T2D.</p>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="S2.SS1">
<title>Study Design and Population</title>
<p>Our study was based on the UKB project<sup><xref ref-type="fn" rid="footnote1">1</xref></sup>, one of the largest prospective cohort studies (<xref ref-type="bibr" rid="B5">Conroy et al., 2019</xref>). Nearly half a million participants aged 40&#x2013;69 years were enrolled from the United Kingdom at baseline assessment visits from 2006 to 2010 (<xref ref-type="bibr" rid="B25">Sudlow et al., 2015</xref>). A wide range of physical measures (e.g., height, weight, blood pressure, and spirometry) and biological samples (e.g., blood, urine, and saliva) were collected. The project then converted the limited information contained in the biological samples into widely shared cohort-wide genotyping (<xref ref-type="bibr" rid="B2">Bycroft et al., 2018</xref>) and whole-exome sequencing data (<xref ref-type="bibr" rid="B10">Khera et al., 2019</xref>). More details about the study design, methods, and participants of the UKB project have been provided elsewhere (<xref ref-type="bibr" rid="B25">Sudlow et al., 2015</xref>).</p>
<p>A total of 487,409 individuals with available genotyping array data and 625,394 variants were originally obtained from UKB. We conducted the strict quality control (QC) steps described by <xref ref-type="bibr" rid="B14">Marees et al. (2018)</xref> using PLINK 2.0 from <ext-link ext-link-type="uri" xlink:href="https://www.cog-genomics.org/plink2">https://www.cog-genomics.org/plink2</ext-link>. Specifically, we first filtered out SNPs and individuals with very high levels of missingness. Based on a relaxed missingness threshold of 0.2 (&#x003E;20%), we removed 89,752 variants and 30,855 subjects. We also removed 262,751 SNPs with minor allele frequency &#x003C;0.03 and 1,204 SNPs with a Hardy&#x2013;Weinberg equilibrium Fisher&#x2019;s exact test <italic>p</italic>-value &#x003C; 1&#x00D7;10<sup>&#x2212;6</sup>. Finally, 456,451 individuals and 271,687 variants passed QC and were included in the following analysis.</p>
<p>The ascertainment of T2D was based on a composite of self-report, the International Classification of Diseases, Ninth Revision (ICD-9) codes of 25000 and 25010, and the International Classification of Diseases, Tenth Revision (ICD-10) code of E11. The individual-level data of T2D-related risk factors, including sex, age, physical measures [e.g., BMI, waist circumference (WC), DBP, and SBP], and clinical factors [e.g., GL, CL, TL, high-density lipoprotein (HDL), and low-density lipoprotein (LDL)], were also collected from the UKB project. We imputed missing values of these factors with their means. To analyze individuals with a relatively homogeneous ancestry, the population was constructed centrally based on a combination of self-reported ancestry and genetically confirmed ancestry using the first 10 principal components (i.e., PC<sub>1</sub>,&#x2026;,PC<sub>10</sub>). To construct, test, and further validate the robustness of the polygenic predictor of T2D, we randomly divided the overall data into two parts, i.e., the testing and validation datasets. We assigned 40% of all individuals to the UKB testing dataset (<italic>n</italic> = 182,422) and the remaining 60% to the UKB validation dataset (<italic>n</italic> = 274,029). We also tried other ratios for dividing the testing and validation datasets, i.e., 30&#x2013;70%, 50&#x2013;50%, 60&#x2013;40%, and 70&#x2013;30%. Individuals in the UKB validation dataset were distinct from those in the UKB testing dataset. Details of the study design are shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Flowchart for the polygenic risk score (PRS) model for type 2 diabetes.</p></caption>
<graphic xlink:href="fgene-12-632385-g001.tif"/>
</fig>
</sec>
<sec id="S2.SS2">
<title>Genome-Wide Polygenic Score Construction, Testing, and Validation</title>
<p>The PRS model provides a quantitative metric of an individual&#x2019;s inherited risk based on the cumulative impact of many SNPs. Generally, the PRS model can be unweighted or weighted. Suppose that we have <italic>n</italic> subjects and <italic>K</italic> SNPs that passed the first-step filtering procedure. The unweighted PRS model is defined as,</p>
<disp-formula id="S2.Ex1">
<mml:math id="M1">
<mml:mrow>
<mml:msub>
<mml:mtext>PRS</mml:mtext>
<mml:mi>u</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">&#x2026;</mml:mi>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>K</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>G</italic><sub><italic>k</italic></sub> (<italic>k</italic> = 1,&#x2026;,<italic>K</italic>) denotes the number of risk alleles at the <italic>k</italic>th genetic variant, coded as 0, 1, or 2 under the additive genetic model. For the weighted PRS model, weights are generally assigned to each genetic variant according to the strength of its association with a given disease. The weighted PRS model can be written as,</p>
<disp-formula id="S2.Ex2">
<mml:math id="M2">
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mtext>PRS</mml:mtext>
<mml:mi>w</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mo stretchy="false">^</mml:mo>
</mml:mover>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo rspace="4.2pt">+</mml:mo>
<mml:mtext>&#x2026;</mml:mtext>
</mml:mrow>
<mml:mo rspace="4.2pt">+</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mo stretchy="false">^</mml:mo>
</mml:mover>
<mml:mi>K</mml:mi>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>K</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula><mml:math id="INEQ17"><mml:mrow><mml:mpadded width="+1.7pt"><mml:msub><mml:mover accent="true"><mml:mi mathvariant="normal">&#x03B2;</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub></mml:mpadded><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo rspace="4.2pt">,</mml:mo><mml:mpadded width="+1.7pt"><mml:mtext>&#x2026;</mml:mtext></mml:mpadded><mml:mo>,</mml:mo><mml:mi>K</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is the estimated marginal genetic effect of the <italic>k</italic>th variant from the external large-scale GWAS. Both unweighted and weighted PRS models can be implemented with the PRSice-2 software (<xref ref-type="bibr" rid="B4">Choi and O&#x2019;Reilly, 2019</xref>).</p>
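<p>The two definitions above can be illustrated with a minimal sketch for a single individual. It is not the PRSice-2 implementation; the function names and the toy genotypes and effect sizes are hypothetical and serve only to show the arithmetic.</p>
<preformat>
```python
def unweighted_prs(genotypes):
    """Sum of risk-allele counts (each coded 0, 1, or 2) across K SNPs."""
    return sum(genotypes)


def weighted_prs(genotypes, betas):
    """Sum of risk-allele counts, each weighted by its GWAS effect estimate."""
    if len(genotypes) != len(betas):
        raise ValueError("genotypes and betas must have the same length")
    return sum(beta * g for beta, g in zip(betas, genotypes))


# Hypothetical toy values for one individual and K = 4 SNPs (not study data).
toy_genotypes = [0, 1, 2, 1]
toy_betas = [0.10, 0.05, 0.20, 0.00]

prs_u = unweighted_prs(toy_genotypes)
prs_w = weighted_prs(toy_genotypes, toy_betas)
```
</preformat>
<p>In this sketch a SNP with a zero weight still adds to the unweighted score but contributes nothing to the weighted score, which is the practical difference between the two definitions.</p>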
<p>For PRS model construction, we used summary statistics from a T2D GWAS conducted among 60,786 participants of European ancestry with 12,056,346 SNPs<sup><xref ref-type="fn" rid="footnote2">2</xref></sup> (<xref ref-type="bibr" rid="B17">Morris et al., 2012</xref>). Note that the UKB samples did not overlap with the samples from the discovery GWAS. We first selected SNPs according to their association <italic>p</italic>-values (<italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup>) obtained from the above GWAS, and 50,224 SNPs remained. We then considered multiple <italic>r</italic><sup>2</sup> thresholds (0.2, 0.4, 0.6, and 0.8) according to <xref ref-type="bibr" rid="B9">Khera et al. (2018)</xref> and multiple <italic>p</italic>-value thresholds (5&#x00D7;&#x2004;10<sup>&#x2212;2</sup>, 5&#x00D7;&#x2004;10<sup>&#x2212;4</sup>, 5&#x00D7;&#x2004;10<sup>&#x2212;6</sup>, and 5&#x00D7;&#x2004;10<sup>&#x2212;8</sup>) to conduct the second and third filtering procedures, also on the DIAGRAM summary dataset. Combining the four <italic>r</italic><sup>2</sup> thresholds with the four <italic>p</italic>-value thresholds, a total of 16 candidate PRS models were created for T2D based on the UKB testing dataset with 182,422 participants.</p>
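<p>The grid of candidate settings and the <italic>p</italic>-value filtering step can be sketched as follows. This is only an illustration under stated assumptions: the SNP identifiers and <italic>p</italic>-values are hypothetical, the helper name is invented for this example, and the LD clumping step (which requires genotype correlations) is not shown.</p>
<preformat>
```python
from itertools import product

# Candidate thresholds named in the text.
R2_THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
P_THRESHOLDS = [5e-2, 5e-4, 5e-6, 5e-8]

# Every (r2, p) pair defines one candidate PRS model: 4 x 4 = 16 settings.
candidate_settings = list(product(R2_THRESHOLDS, P_THRESHOLDS))


def filter_by_p(snp_pvalues, threshold):
    """Keep SNPs whose GWAS p-value does not exceed the threshold."""
    return {snp: p for snp, p in snp_pvalues.items() if threshold >= p}


# Hypothetical summary statistics (SNP id mapped to p-value), not study data.
toy_pvalues = {"rsA": 3e-9, "rsB": 2e-5, "rsC": 0.03, "rsD": 0.2}

# Number of toy SNPs retained at each p-value threshold.
retained_counts = {t: len(filter_by_p(toy_pvalues, t)) for t in P_THRESHOLDS}
```
</preformat>
<p>Stricter thresholds retain fewer SNPs, so the grid spans models from a few strongly associated variants to many weakly associated ones.</p>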
<p>The PRS model with the best discriminative accuracy was determined based on the maximal AUC in the following logistic regression model adjusting for sex, age, and the first 10 principal components of ancestry. We use <italic>X</italic><sub>1</sub>, <italic>X</italic><sub>2</sub>, and <bold>PC</bold> = (PC<sub>1</sub>,&#x2026;,PC<sub>10</sub>)<sup><italic>T</italic></sup> to represent the values of sex, age, and the first 10 principal components of ancestry, respectively, where <italic>T</italic> denotes the transpose of a vector or matrix. Let <italic>Y</italic> be the T2D status, with 0 and 1 representing control and case, respectively. The predictive model for T2D can be represented as,</p>
<disp-formula id="S2.Ex4">
<mml:math id="M3">
<mml:mrow>
<mml:mtext class="ltx_markedasmath">Logit</mml:mtext>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>Y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo rspace="4.2pt">,</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo rspace="4.2pt">,</mml:mo>
<mml:mtext mathvariant="bold">PC</mml:mtext>
<mml:mo rspace="4.2pt">,</mml:mo>
<mml:msub>
<mml:mtext>PRS</mml:mtext>
<mml:mi>w</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mrow>
<mml:mtext>PC</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mtext mathvariant="bold">PC</mml:mtext>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>g</mml:mi>
</mml:msub>
<mml:msub>
<mml:mtext>PRS</mml:mtext>
<mml:mi>w</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where &#x03B2;<sub><italic>0</italic></sub> is the intercept, and &#x03B2;<sub>1</sub>, &#x03B2;<sub>2</sub>, &#x03B2;<sub>PC</sub> = (&#x03B2;<sub><italic>PC1</italic></sub>,&#x2026;,&#x03B2;<sub><italic>PC10</italic></sub>), and &#x03B2;<sub><italic>g</italic></sub> are the regression coefficients for <italic>X</italic><sub>1</sub>, <italic>X</italic><sub>2</sub>, <bold>PC</bold>, and PRS<sub><italic>w</italic></sub>, respectively. The AUCs were calculated with the trapezoidal rule (<xref ref-type="bibr" rid="B7">Fawcett, 2006</xref>), and their 95% confidence intervals (CIs) were computed by DeLong&#x2019;s method (<xref ref-type="bibr" rid="B6">DeLong et al., 1988</xref>). Both the AUCs and their CIs were obtained directly with the &#x201C;pROC&#x201D; package<sup><xref ref-type="fn" rid="footnote3">3</xref></sup> within R 3.6.3<sup><xref ref-type="fn" rid="footnote4">4</xref></sup>. More details about this package are provided elsewhere (<xref ref-type="bibr" rid="B23">Robin et al., 2011</xref>). The best score created in the testing dataset was carried forward into the subsequent validation step.</p>
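<p>The trapezoidal AUC calculation can be sketched in a few lines. This is a generic illustration, not the pROC code used in the study: the function name and the toy labels and scores are hypothetical, the scores stand in for fitted probabilities from the logistic model, and DeLong confidence intervals are not computed here.</p>
<preformat>
```python
def roc_auc_trapezoid(labels, scores):
    """AUC of the empirical ROC curve, integrated with the trapezoidal rule."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    # Sort by descending score so the threshold sweeps from strict to lenient.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    auc = 0.0
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    i = 0
    while i != len(ranked):
        # Consume all observations tied at this score before adding a ROC point.
        current = ranked[i][0]
        while i != len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        # Area of the trapezoid between the previous and current ROC points.
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
    return auc


# Hypothetical toy data (1 = case, 0 = control), not study data.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc_trapezoid(toy_labels, toy_scores)
```
</preformat>
<p>Grouping tied scores into one ROC point makes a completely uninformative score (all values equal) integrate to 0.5, as expected.</p>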
</sec>
<sec id="S2.SS3">
<title>Statistical Analysis in Validation Dataset</title>
<p>Baseline characteristics of the study population were described as means &#x00B1; standard deviations (M &#x00B1; SD) or percentages. A two-independent-sample <italic>t</italic>-test or chi-square test was used to compare the baseline characteristics between the UKB testing and validation datasets. Wilcoxon signed-rank test was applied to compare the PRSs between individuals with T2D and individuals without T2D. The relationship between PRS and T2D was determined in the UKB validation dataset based on a logistic regression model adjusting for sex, age, and the first 10 principal components of ancestry (<italic>model</italic><sub><italic>1</italic></sub>), which can be represented as,</p>
<disp-formula id="S2.Ex5">
<mml:math id="M4">
<mml:mrow>
<mml:mrow>
<mml:mi>T2D</mml:mi>
<mml:mo>&#x223C;</mml:mo>
<mml:mrow>
<mml:mpadded width="+1.7pt">
<mml:mtext>PRS</mml:mtext>
</mml:mpadded>
<mml:mo rspace="4.2pt">+</mml:mo>
<mml:mpadded width="+1.7pt">
<mml:mtext>sex</mml:mtext>
</mml:mpadded>
<mml:mo rspace="4.2pt">+</mml:mo>
<mml:mi>age</mml:mi>
<mml:mo rspace="4.2pt">+</mml:mo>
<mml:mtext mathvariant="bold">PC</mml:mtext>
</mml:mrow>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>We stratified the 274,029 participants in the UKB validation dataset into 100 groups according to the percentiles of the PRS and then determined the prevalence of T2D within each group.</p>
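<p>The stratification step can be sketched as follows. This is a minimal illustration that assumes rank-based assignment to equally sized groups; the function name and toy values are hypothetical, and the study's own implementation may assign ties or group boundaries differently.</p>
<preformat>
```python
def prevalence_by_prs_group(prs_values, t2d_status, n_groups=100):
    """Rank individuals by PRS, split into n_groups bins, return case fraction per bin."""
    n = len(prs_values)
    order = sorted(range(n), key=lambda i: prs_values[i])
    cases = [0] * n_groups
    totals = [0] * n_groups
    for rank, idx in enumerate(order):
        # Map rank 0..n-1 onto group 0..n_groups-1 in (near) equal-sized bins.
        group = rank * n_groups // n
        cases[group] += t2d_status[idx]
        totals[group] += 1
    # Empty bins (possible when n is smaller than n_groups) are reported as None.
    return [c / t if t else None for c, t in zip(cases, totals)]


# Hypothetical toy data: 8 individuals split into 4 groups (not study data).
toy_prs = [0.1, 0.9, 0.4, 0.3, 0.8, 0.2, 0.7, 0.6]
toy_status = [0, 1, 0, 0, 1, 0, 1, 0]
toy_prevalence = prevalence_by_prs_group(toy_prs, toy_status, n_groups=4)
```
</preformat>
<p>With the toy data, the prevalence rises from the lowest to the highest PRS group, which is the pattern such a stratified table is meant to reveal.</p>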
<p>To further examine the contributions of PRS, sex, age, physical measurements, and other clinical risk factors to T2D, we fitted four additional prediction models:</p>
<disp-formula id="S2.E1">
<label>(1)</label>
<mml:math id="M5">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo rspace="4.2pt">:</mml:mo>
<mml:mrow>
<mml:mi>T2D</mml:mi>
<mml:mo>&#x223C;</mml:mo>
<mml:mrow>
<mml:mpadded width="+1.7pt">
<mml:mtext>sex</mml:mtext>
</mml:mpadded>
<mml:mo rspace="4.2pt">+</mml:mo>
<mml:mi>age</mml:mi>
<mml:mo rspace="4.2pt">+</mml:mo>
<mml:mtext mathvariant="bold">PC</mml:mtext>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S2.E2">
<label>(2)</label>
<mml:math id="M6">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mpadded width="+1.7pt">
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mpadded>
</mml:mrow>
<mml:mo rspace="4.2pt">:</mml:mo>
<mml:mrow>
<mml:mi>T2D</mml:mi>
<mml:mo>&#x223C;</mml:mo>
<mml:mtext>PRS</mml:mtext>
</mml:mrow>
</mml:mrow>
<mml:mo>;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S2.Ex6">
<label>(3)</label>
<mml:math id="M7">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mtext>4</mml:mtext>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mtext>T2D</mml:mtext>
<mml:mo>~</mml:mo>
<mml:mtext>sex</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>age</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="bold">PC</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>BMI</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>GL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>CL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>HDL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>LDL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>TL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>WC</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>DBP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>SBP</mml:mtext>
<mml:mo>;</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S2.Ex7">
<label>(4)</label>
<mml:math id="M8">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mtext>5</mml:mtext>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mtext>T2D</mml:mtext>
<mml:mo>~</mml:mo>
<mml:mtext>PRS</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>sex</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>age</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="bold">PC</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>BMI</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>GL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>CL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>HDL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>LDL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>TL</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>WC</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>DBP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext>SBP</mml:mtext>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>We checked for collinearity among the above variables and found none. All of the above statistical analyses were conducted using R version 3.6.3.</p>
</sec>
</sec>
<sec id="S3">
<title>Results</title>
<p>A total of 456,451 UKB participants were randomly divided into the UKB testing dataset (<italic>n</italic> = 182,422) and the validation dataset (<italic>n</italic> = 274,029). The mean age of participants was 57 years, and 54% were female in both the testing and validation datasets. T2D cases accounted for 5.494% (<italic>n</italic> = 10,023) of participants in the testing dataset and 5.575% (<italic>n</italic> = 15,277) in the validation dataset. All of these factors were comparable at baseline. The details of baseline characteristics are shown in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Baseline characteristics of the UK Biobank (UKB) testing dataset and the UKB validation dataset (<italic>M</italic> &#x00B1; <italic>SD</italic> or %).</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Variable</td>
<td valign="top" align="center">UKB testing (<italic>n</italic> = 182,422)</td>
<td valign="top" align="center">UKB validation (<italic>n</italic> = 274,029)</td>
<td valign="top" align="center">Statistics and <italic>p-</italic>value</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><bold>Sex</bold></td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Male (%)</td>
<td valign="top" align="center">83,200 (45.609)</td>
<td valign="top" align="center">125,670 (45.860)</td>
<td valign="top" align="center"><italic>x</italic><sup>2</sup> = 2.783, <italic>p</italic> = 0.095</td>
</tr>
<tr>
<td valign="top" align="left">Female (%)</td>
<td valign="top" align="center">99,222 (54.391)</td>
<td valign="top" align="center">148,359 (54.140)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Age (years)</td>
<td valign="top" align="center">56.777 &#x00B1; 8.020</td>
<td valign="top" align="center">56.809 &#x00B1; 8.009</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;1.341, <italic>p</italic> = 0.179</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Physical measurements</bold></td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">BMI (kg/m<sup>2</sup>)</td>
<td valign="top" align="center">27.388 &#x00B1; 4.758</td>
<td valign="top" align="center">27.404 &#x00B1; 4.765</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;1.087, <italic>p</italic> = 0.277</td>
</tr>
<tr>
<td valign="top" align="left">WC (cm)</td>
<td valign="top" align="center">90.250 &#x00B1; 13.485</td>
<td valign="top" align="center">90.306 &#x00B1; 13.505</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;1.135, <italic>p</italic> = 0.175</td>
</tr>
<tr>
<td valign="top" align="left">DBP (mmHg)</td>
<td valign="top" align="center">82.174 &#x00B1; 10.311</td>
<td valign="top" align="center">82.171 &#x00B1; 10.313</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;0.118, <italic>p</italic> = 0.906</td>
</tr>
<tr>
<td valign="top" align="left">SBP (mmHg)</td>
<td valign="top" align="center">139.924 &#x00B1; 19.000</td>
<td valign="top" align="center">139.917 &#x00B1; 19.000</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;0.116, <italic>p</italic> = 0.908</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Clinical factors</bold></td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">CL (mmol/L)</td>
<td valign="top" align="center">5.711 &#x00B1; 1.115</td>
<td valign="top" align="center">5.710 &#x00B1; 1.117</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;0.314, <italic>p</italic> = 0.753</td>
</tr>
<tr>
<td valign="top" align="left">GL (mmol/L)</td>
<td valign="top" align="center">5.119 &#x00B1; 1.134</td>
<td valign="top" align="center">5.118 &#x00B1; 1.132</td>
<td valign="top" align="center"><italic>t</italic> = 0.150, <italic>p</italic> = 0.881</td>
</tr>
<tr>
<td valign="top" align="left">TL (mmol/L)</td>
<td valign="top" align="center">1.753 &#x00B1; 1.002</td>
<td valign="top" align="center">1.753 &#x00B1; 1.000</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;0.010, <italic>p</italic> = 0.992</td>
</tr>
<tr>
<td valign="top" align="left">HDL (mmol/L)</td>
<td valign="top" align="center">1.452 &#x00B1; 0.357</td>
<td valign="top" align="center">1.453 &#x00B1; 0.358</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;0.625, <italic>p</italic> = 0.532</td>
</tr>
<tr>
<td valign="top" align="left">LDL (mmol/L)</td>
<td valign="top" align="center">3.556 &#x00B1; 0.839</td>
<td valign="top" align="center">3.556 &#x00B1; 0.841</td>
<td valign="top" align="center"><italic>t</italic> = &#x2212;0.083, <italic>p</italic> = 0.934</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Type 2 diabetes</bold></td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Case (%)</td>
<td valign="top" align="center">10,023 (5.494)</td>
<td valign="top" align="center">15,277 (5.575)</td>
<td valign="top" align="center"><italic>x</italic><sup>2</sup> = 1.342, <italic>p</italic> = 0.247</td>
</tr>
<tr>
<td valign="top" align="left">Control (%)</td>
<td valign="top" align="center">172,399 (94.506)</td>
<td valign="top" align="center">258,752 (94.425)</td>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>BMI, body mass index; CL, cholesterol level; DBP, diastolic blood pressure; GL, glucose level; HDL, high-density lipoprotein; LDL, low-density lipoprotein; SBP, systolic blood pressure; TL, triglyceride level; WC, waist circumference.</italic></attrib>
</table-wrap-foot>
</table-wrap>
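<p>The between-dataset comparisons of categorical variables in <xref ref-type="table" rid="T1">Table 1</xref> can be checked directly from the reported counts. The following Python sketch is not the authors' code (the analyses in this study were run in R); it computes a 2 &#x00D7; 2 chi-square statistic with Yates' continuity correction. The use of the continuity correction and the function name <italic>chi2_yates</italic> are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
def chi2_yates(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]] with Yates' correction."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Counts from Table 1: (cases, controls) in the testing and validation datasets.
chi2_t2d = chi2_yates(10023, 172399, 15277, 258752)
# Counts from Table 1: (male, female) in the testing and validation datasets.
chi2_sex = chi2_yates(83200, 99222, 125670, 148359)

print(round(chi2_t2d, 3), round(chi2_sex, 3))
```
]]></preformat>
<p>Under these assumptions the sketch gives approximately 1.342 for T2D status and approximately 2.78 for sex, in line with the statistics reported in <xref ref-type="table" rid="T1">Table 1</xref>.</p>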
<p>To obtain an optimal PRS model, we generated 16 candidate PRS models with PRSice-2. We evaluated the performance of these 16 models in the UKB testing dataset and chose the best one for further validation. The AUCs of the candidate models ranged from 0.691 to 0.792 (<xref ref-type="table" rid="T2">Table 2</xref>). The best model, with the highest AUC [AUC = 0.792, 95% CI: (0.787, 0.796)], was based on 25,454 SNPs selected with <italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.2. The AUCs obtained with different split ratios of the testing and validation datasets are shown in <xref ref-type="table" rid="T3">Table 3</xref>. They were very close to each other, ranging from 0.791 to 0.795; in the validation dataset, the 40&#x2013;60% ratio performed best [AUC = 0.795, 95% CI: (0.790, 0.800)]. Additional details of PRS model construction, testing, and validation are provided in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>The predictive power of candidate polygenic risk score (PRS) models for type 2 diabetes (T2D).</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Tuning parameter</td>
<td valign="top" align="center">SNP number</td>
<td valign="top" align="center">AUC (95% CI)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;8</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.2</td>
<td valign="top" align="center">363</td>
<td valign="top" align="center">0.706 (0.701&#x2013;0.711)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;8</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.4</td>
<td valign="top" align="center">486</td>
<td valign="top" align="center">0.702 (0.697&#x2013;0.707)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;8</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.6</td>
<td valign="top" align="center">670</td>
<td valign="top" align="center">0.696 (0.691&#x2013;0.701)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;8</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.8</td>
<td valign="top" align="center">957</td>
<td valign="top" align="center">0.691 (0.686&#x2013;0.697)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;6</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.2</td>
<td valign="top" align="center">750</td>
<td valign="top" align="center">0.715 (0.710&#x2013;0.720)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;6</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.4</td>
<td valign="top" align="center">1,013</td>
<td valign="top" align="center">0.709 (0.704&#x2013;0.714)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;6</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.6</td>
<td valign="top" align="center">1,335</td>
<td valign="top" align="center">0.701 (0.696&#x2013;0.706)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;6</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.8</td>
<td valign="top" align="center">1,853</td>
<td valign="top" align="center">0.696 (0.691&#x2013;0.701)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;4</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.2</td>
<td valign="top" align="center">2,616</td>
<td valign="top" align="center">0.736 (0.732&#x2013;0.741)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;4</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.4</td>
<td valign="top" align="center">3,394</td>
<td valign="top" align="center">0.726 (0.721&#x2013;0.731)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;4</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.6</td>
<td valign="top" align="center">4,299</td>
<td valign="top" align="center">0.715 (0.710&#x2013;0.720)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;4</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.8</td>
<td valign="top" align="center">5,690</td>
<td valign="top" align="center">0.708 (0.703&#x2013;0.713)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.2</td>
<td valign="top" align="center"><bold>25,454</bold></td>
<td valign="top" align="center"><bold>0.792 (0.787&#x2013;0.796)</bold></td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.4</td>
<td valign="top" align="center">32,600</td>
<td valign="top" align="center">0.782 (0.777&#x2013;0.787)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.6</td>
<td valign="top" align="center">40,001</td>
<td valign="top" align="center">0.771 (0.766&#x2013;0.776)</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.8</td>
<td valign="top" align="center">50,224</td>
<td valign="top" align="center">0.760 (0.755&#x2013;0.765)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>AUC was determined using a logistic regression model adjusted for sex, age, and the first 10 principal components of ancestry. The highest AUC is denoted by the bold values.</italic></attrib>
</table-wrap-foot>
</table-wrap>
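<p>The model selection step over the 16 candidates in <xref ref-type="table" rid="T2">Table 2</xref> amounts to a grid search over two tuning parameters. As a minimal sketch, assuming the AUCs have already been computed for each parameter pair, the selection can be written in Python as follows. The dictionary layout and the helper names <italic>best_model</italic> and <italic>auc_falls_with_r2</italic> are hypothetical and are not part of PRSice-2.</p>
<preformat><![CDATA[
```python
# AUCs from Table 2, keyed by (p-value threshold, r2 clumping threshold).
candidate_auc = {
    (5e-8, 0.2): 0.706, (5e-8, 0.4): 0.702, (5e-8, 0.6): 0.696, (5e-8, 0.8): 0.691,
    (5e-6, 0.2): 0.715, (5e-6, 0.4): 0.709, (5e-6, 0.6): 0.701, (5e-6, 0.8): 0.696,
    (5e-4, 0.2): 0.736, (5e-4, 0.4): 0.726, (5e-4, 0.6): 0.715, (5e-4, 0.8): 0.708,
    (5e-2, 0.2): 0.792, (5e-2, 0.4): 0.782, (5e-2, 0.6): 0.771, (5e-2, 0.8): 0.760,
}

def best_model(auc_by_params):
    """Return the (parameters, AUC) pair with the highest AUC."""
    return max(auc_by_params.items(), key=lambda item: item[1])

def auc_falls_with_r2(auc_by_params, p_threshold):
    """True if, at a fixed p-value threshold, AUC strictly decreases as r2 grows."""
    r2_values = sorted(r2 for p, r2 in auc_by_params if p == p_threshold)
    aucs = [auc_by_params[(p_threshold, r2)] for r2 in r2_values]
    return all(x > y for x, y in zip(aucs, aucs[1:]))

best_params, best_auc = best_model(candidate_auc)
trend_holds = all(auc_falls_with_r2(candidate_auc, p) for p in (5e-8, 5e-6, 5e-4, 5e-2))
print(best_params, best_auc, trend_holds)
```
]]></preformat>
<p>On the tabulated values, the sketch selects the most liberal <italic>p</italic>-value threshold combined with the strictest clumping threshold, and it also shows that, at every <italic>p</italic>-value threshold, the AUC falls as the <italic>r</italic><sup>2</sup> threshold is relaxed.</p>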
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Area under the receiver operating characteristics curves (AUCs) of different ratios of the testing and validation dataset when <italic>p</italic>&#x2264;&#x2004;5&#x00D7;&#x2004;10<sup>&#x2212;2</sup> and <italic>r</italic><sup>2</sup> &#x003C; 0.2.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Dataset</td>
<td valign="top" align="center">30&#x2013;70%</td>
<td valign="top" align="center">40&#x2013;60%</td>
<td valign="top" align="center">50&#x2013;50%</td>
<td valign="top" align="center">60&#x2013;40%</td>
<td valign="top" align="center">70&#x2013;30%</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Testing</td>
<td valign="top" align="center">0.791</td>
<td valign="top" align="center">0.792</td>
<td valign="top" align="center">0.794</td>
<td valign="top" align="center">0.795</td>
<td valign="top" align="center">0.794</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">(0.781&#x2013;0.791)</td>
<td valign="top" align="center">(0.787&#x2013;0.796)</td>
<td valign="top" align="center">(0.790&#x2013;0.800)</td>
<td valign="top" align="center">(0.791&#x2013;0.799)</td>
<td valign="top" align="center">(0.790&#x2013;0.799)</td>
</tr>
<tr>
<td valign="top" align="left">Validation</td>
<td valign="top" align="center">0.794</td>
<td valign="top" align="center">0.795</td>
<td valign="top" align="center">0.793</td>
<td valign="top" align="center">0.792</td>
<td valign="top" align="center">0.791</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">(0.790&#x2013;0.799)</td>
<td valign="top" align="center">(0.790&#x2013;0.800)</td>
<td valign="top" align="center">(0.789&#x2013;0.797)</td>
<td valign="top" align="center">(0.787&#x2013;0.796)</td>
<td valign="top" align="center">(0.781&#x2013;0.791)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>AUC was determined using a logistic regression model adjusted for sex, age, and first 10 principal components of ancestry.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<p>To facilitate interpretation, we scaled the PRS to have zero mean and unit standard deviation. We then investigated whether our PRS model could identify individuals at high T2D risk. <xref ref-type="fig" rid="F2">Figure 2</xref> shows that the median standardized PRS was 0.941 for individuals with T2D versus &#x2212;0.056 for individuals without T2D, a difference of 0.997 (<italic>p</italic> &#x003C; 0.00001). As shown in <xref ref-type="fig" rid="F3">Figure 3A</xref>, the standardized PRS was approximately normally distributed across the population, with the empirical risk of T2D rising sharply in the right tail of the distribution. The PRS model identified nearly 30% of the population at greater than or equal to fivefold risk, 12% at greater than or equal to sixfold risk, and the top 7% at greater than or equal to sevenfold increased risk for T2D (<xref ref-type="fig" rid="F3">Figure 3A</xref>). We then stratified the population by PRS percentile and defined the top 10 percentiles as the &#x201C;high risk&#x201D; group and the bottom 10 percentiles as the &#x201C;low risk&#x201D; group. <xref ref-type="fig" rid="F3">Figure 3B</xref> shows that the prevalence of T2D increases with the PRS percentile. There were 5,642 (18.698%) cases among the 30,174 individuals in the &#x201C;high risk&#x201D; group, compared with only 282 (0.935%) cases in the &#x201C;low risk&#x201D; group, corresponding to a nearly 20-fold increase in the risk of T2D for the top 10 percentiles versus the bottom 10 percentiles.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Polygenic risk score (PRS) among type 2 diabetes (T2D) cases versus controls in the UK Biobank (UKB) validation dataset.</p></caption>
<graphic xlink:href="fgene-12-632385-g002.tif"/>
</fig>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Risk for type 2 diabetes (T2D) according to polygenic risk score (PRS). <bold>(A)</bold> Distribution of PRS for T2D in the UK Biobank (UKB) validation dataset (<italic>n</italic> = 301,736). The <italic>x</italic>-axis represents PRS for T2D, which was scaled to have zero mean and one standard deviation. Dotted lines reflect the proportion of the population with five-, six-, and seven-fold increased risk versus the remainder of the population, respectively. The odds ratio was assessed in a logistic regression model adjusting for sex, age, and the first 10 principal components of ancestry. <bold>(B)</bold> Prevalence of T2D according to 100 groups of the UKB validation dataset stratified according to the percentile of the PRS for T2D.</p></caption>
<graphic xlink:href="fgene-12-632385-g003.tif"/>
</fig>
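<p>The standardization of the PRS and the comparison between the &#x201C;high risk&#x201D; and &#x201C;low risk&#x201D; groups reduce to simple arithmetic. The Python sketch below is illustrative only: the raw scores are hypothetical toy values rather than UKB data, the function names are assumptions, and the fold difference is computed as an unadjusted ratio of prevalences from the counts reported in the text.</p>
<preformat><![CDATA[
```python
import statistics

def standardize(scores):
    """Rescale raw scores to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

def prevalence(cases, group_size):
    """Proportion of a group that are cases."""
    return cases / group_size

# Hypothetical raw PRS values, used only to show the rescaling step.
toy_raw = [0.0012, 0.0015, 0.0009, 0.0021, 0.0018]
toy_z = standardize(toy_raw)

# Counts reported in the text for the top and bottom 10 percentiles.
high = prevalence(5642, 30174)
low = prevalence(282, 30174)
fold = high / low  # unadjusted prevalence ratio

print(f"{100 * high:.3f}% vs {100 * low:.3f}%, ratio {fold:.1f}")
```
]]></preformat>
<p>The prevalence ratio obtained this way is close to 20, which agrees with the nearly 20-fold difference described for the two groups.</p>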
<p>We further investigated the contribution of the polygenic predictor, sex, age, physical measurements, and clinical factors to identifying individuals at high risk of T2D. <xref ref-type="table" rid="T4">Table 4</xref> shows that the AUC of <italic>model</italic><sub><italic>3</italic></sub>, which included only the PRS without adjusting for any other covariates, was 0.749 [95% CI: (0.744, 0.754)] in the testing dataset and 0.755 [95% CI: (0.752, 0.755)] in the validation dataset. Interestingly, when only sex, age, and the first 10 principal components of ancestry were included in the model, the AUC in the validation dataset was 0.667 [95% CI: (0.663, 0.672)]. After adding the PRS, the AUC reached 0.795 [95% CI: (0.790, 0.800)], an absolute increase of about 0.13 over <italic>model</italic><sub><italic>2</italic></sub>. The AUC of <italic>model</italic><sub><italic>4</italic></sub> (i.e., considering sex, age, <bold><italic>PC</italic></bold>, BMI, WC, DBP, SBP, GL, CL, HDL, LDL, and TL simultaneously) was 0.882 [95% CI: (0.878, 0.888)] and rose to 0.901 [95% CI: (0.897, 0.904)] in the validation dataset when the PRS was added to the model. In brief, the polygenic score helps to identify individuals at high risk of T2D, and T2D-related covariates further increase prediction accuracy. As shown in <xref ref-type="table" rid="T5">Table 5</xref>, PRS, sex, age, physical measurements, and most clinical factors were all significantly associated with T2D (<italic>p</italic> &#x003C; 0.0001).</p>
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Area under the receiver operating characteristics curve (AUC) of different models in the testing and validation dataset.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Dataset</td>
<td valign="top" align="center">Mean</td>
<td valign="top" align="center"><italic>model</italic><sub><italic>2</italic></sub></td>
<td valign="top" align="center"><italic>model</italic><sub><italic>3</italic></sub></td>
<td valign="top" align="center"><italic>model</italic><sub><italic>1</italic></sub></td>
<td valign="top" align="center"><italic>model</italic><sub><italic>4</italic></sub></td>
<td valign="top" align="center"><italic>model</italic><sub><italic>5</italic></sub></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Testing</td>
<td valign="top" align="center">&#x2212;0.003</td>
<td valign="top" align="center">0.671 (0.666&#x2013;0.676)</td>
<td valign="top" align="center">0.749 (0.744&#x2013;0.754)</td>
<td valign="top" align="center">0.792 (0.787&#x2013;0.796)</td>
<td valign="top" align="center">0.886 (0.882&#x2013;0.889)</td>
<td valign="top" align="center">0.902 (0.899&#x2013;0.905)</td>
</tr>
<tr>
<td valign="top" align="left">Validation</td>
<td valign="top" align="center">&#x2212;0.003</td>
<td valign="top" align="center">0.667 (0.663&#x2013;0.672)</td>
<td valign="top" align="center">0.755 (0.752&#x2013;0.755)</td>
<td valign="top" align="center">0.795 (0.790&#x2013;0.800)</td>
<td valign="top" align="center">0.882 (0.878&#x2013;0.888)</td>
<td valign="top" align="center">0.901 (0.897&#x2013;0.904)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic><italic>model</italic><sub><italic>1</italic></sub>: AUC was determined using a logistic regression model adjusted for sex, age, and the first 10 principal components of ancestry. <italic>model</italic><sub><italic>2</italic></sub>: AUC was determined using a logistic regression model only considering sex and age. <italic>model</italic><sub><italic>3</italic></sub>: AUC was determined using a logistic regression model only considering genome-wide polygenic score. <italic>model</italic><sub><italic>4</italic></sub>: AUC was determined using a logistic regression model considering demographic factors, physical measurements, and clinical factors. <italic>model</italic><sub><italic>5</italic></sub>: AUC was determined using a logistic regression model adjusted for sex, age, body mass index, waist circumference, diastolic blood pressure, systolic blood pressure, glucose level, cholesterol level, high-density lipoprotein, low-density lipoprotein, triglyceride level, and the first 10 principal components of ancestry.</italic></attrib>
</table-wrap-foot>
</table-wrap>
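<p>The AUC used to compare the models in <xref ref-type="table" rid="T4">Table 4</xref> can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The following Python sketch illustrates that definition on hypothetical toy scores (not UKB data) and then expresses the gain from adding the PRS to sex, age, and principal components in both absolute and relative terms, using the validation AUCs from the table. The pairwise loop is quadratic and is meant for illustration, not for datasets of this size.</p>
<preformat><![CDATA[
```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical toy scores, used only to illustrate the definition.
toy_cases = [0.9, 1.4, 0.2, 2.1]
toy_controls = [-0.5, 0.3, -1.2, 0.9, 0.0]
toy_auc = auc(toy_cases, toy_controls)

# Validation AUCs reported in Table 4 without and with the PRS.
auc_without_prs, auc_with_prs = 0.667, 0.795
absolute_gain = auc_with_prs - auc_without_prs
relative_gain = absolute_gain / auc_without_prs

print(toy_auc, round(absolute_gain, 3), round(relative_gain, 3))
```
]]></preformat>
<p>For the tabulated values, the absolute gain is about 0.13 AUC units, which corresponds to a relative gain of roughly 19%.</p>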
<table-wrap position="float" id="T5">
<label>TABLE 5</label>
<caption><p>Parameter estimations under <italic>model</italic><sub><italic>5</italic></sub> in validation dataset.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Variables</td>
<td valign="top" align="center">Estimate beta</td>
<td valign="top" align="center">Stand error</td>
<td valign="top" align="right"><italic>Z</italic></td>
<td valign="top" align="center"><italic>p</italic>-value</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">(Intercept)</td>
<td valign="top" align="center">24.500</td>
<td valign="top" align="center">0.495</td>
<td valign="top" align="right">49.474</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">PRS</td>
<td valign="top" align="center">12370.000</td>
<td valign="top" align="center">167.400</td>
<td valign="top" align="right">73.943</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">CL</td>
<td valign="top" align="center">&#x2212;0.591</td>
<td valign="top" align="center">0.057</td>
<td valign="top" align="right">&#x2212;10.377</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">HDL</td>
<td valign="top" align="center">0.051</td>
<td valign="top" align="center">0.063</td>
<td valign="top" align="right">0.876</td>
<td valign="top" align="center">0.381</td>
</tr>
<tr>
<td valign="top" align="left">LDL</td>
<td valign="top" align="center">0.010</td>
<td valign="top" align="center">0.068</td>
<td valign="top" align="right">0.140</td>
<td valign="top" align="center">0.888</td>
</tr>
<tr>
<td valign="top" align="left">TL</td>
<td valign="top" align="center">0.285</td>
<td valign="top" align="center">0.013</td>
<td valign="top" align="right">21.826</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">Sex</td>
<td valign="top" align="center">&#x2212;0.214</td>
<td valign="top" align="center">0.028</td>
<td valign="top" align="right">&#x2212;7.731</td>
<td valign="top" align="center">1.070&#x00D7;&#x2004;10<sup>&#x2212;14</sup></td>
</tr>
<tr>
<td valign="top" align="left">WC</td>
<td valign="top" align="center">0.045</td>
<td valign="top" align="center">0.002</td>
<td valign="top" align="right">28.356</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">BMI</td>
<td valign="top" align="center">0.036</td>
<td valign="top" align="center">0.004</td>
<td valign="top" align="right">9.325</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">Age</td>
<td valign="top" align="center">0.060</td>
<td valign="top" align="center">0.002</td>
<td valign="top" align="right">38.401</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">DBP</td>
<td valign="top" align="center">&#x2212;0.018</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="right">&#x2212;13.928</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">SBP</td>
<td valign="top" align="center">0.005</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="right">7.626</td>
<td valign="top" align="center">2.410&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">GL</td>
<td valign="top" align="center">0.449</td>
<td valign="top" align="center">0.006</td>
<td valign="top" align="right">69.917</td>
<td valign="top" align="center">&#x003C; 2&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
<tr>
<td valign="top" align="left">PC10</td>
<td valign="top" align="center">0.020</td>
<td valign="top" align="center">0.004</td>
<td valign="top" align="right">4.726</td>
<td valign="top" align="center">2.280&#x00D7;&#x2004;10<sup>&#x2212;16</sup></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic>BMI, body mass index; CL, cholesterol level; DBP, diastolic blood pressure; GL, glucose level; PRS, genome-wide polygenic score; HDL, high-density lipoprotein; LDL, low-density lipoprotein; SBP, systolic blood pressure; TL, triglyceride level; WC, waist circumference.</italic></attrib>
</table-wrap-foot>
</table-wrap>
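<p>The <italic>p</italic>-values in <xref ref-type="table" rid="T5">Table 5</xref> follow from the <italic>Z</italic> statistics under a standard normal reference distribution. As a minimal sketch, assuming two-sided Wald tests (the usual output of a logistic regression fit), the conversion can be written in Python as follows; the function name <italic>two_sided_p</italic> is hypothetical.</p>
<preformat><![CDATA[
```python
import math

def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate in the tails.
    return math.erfc(abs(z) / math.sqrt(2))

# Z statistics taken from Table 5.
p_hdl = two_sided_p(0.876)
p_sex = two_sided_p(-7.731)

print(f"HDL: {p_hdl:.3f}, Sex: {p_sex:.2e}")
```
]]></preformat>
<p>Under this assumption the sketch gives a <italic>p</italic>-value of about 0.381 for HDL and a value on the order of 10<sup>&#x2212;14</sup> for sex, consistent with the corresponding rows of the table. Small discrepancies in the last digit are expected because the tabulated <italic>Z</italic> values are rounded.</p>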
</sec>
<sec id="S4">
<title>Discussion</title>
<p>Our results showed that the AUC of the best PRS model was 0.795 after adjusting for sex, age, and the first 10 principal components of ancestry, demonstrating that the PRS helps to identify individuals at high risk of developing T2D. The distributions of the PRS in cases and controls also differed substantially: the median PRS of cases (0.941) was much higher than that of controls (&#x2212;0.056). Moreover, about 30% of participants were at greater than or equal to fivefold increased risk of developing T2D, 12% were at greater than or equal to sixfold risk, and the top 7% were at greater than or equal to sevenfold increased risk. In particular, stratifying the population by PRS percentile showed that membership in the &#x201C;high-risk&#x201D; group is strongly associated with T2D.</p>
<p>The above results suggest that our PRS model can be used as a powerful tool for identifying individuals at high risk of T2D, and that it improves on previous studies, which are summarized in <xref ref-type="table" rid="T6">Table 6</xref>. A PRS model built from only three SNPs previously reported to predispose to T2D had an AUC of 0.571 in 6,078 individuals (<xref ref-type="bibr" rid="B28">Weedon et al., 2006</xref>). After including more SNPs, <xref ref-type="bibr" rid="B12">Lango et al. (2008)</xref> constructed a PRS model with 18 SNPs and obtained an AUC of 0.600. A later study with 22 SNPs had an AUC of 0.570 (<xref ref-type="bibr" rid="B3">Chatterjee et al., 2013</xref>) and allowed for the identification of 3.0% of the population at twofold or higher than average risk for T2D. Notably, these three studies, which had smaller sample sizes (ranging from 4,907 to 39,117) and fewer SNPs (ranging from 3 to 22), had relatively poor predictive performance compared with our study (AUC = 0.755), which used 25,454 SNPs among 274,029 individuals.</p>
<table-wrap position="float" id="T6">
<label>TABLE 6</label>
<caption><p>Comparison with previous studies.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Year</td>
<td valign="top" align="center">SNP</td>
<td valign="top" align="center">N</td>
<td valign="top" align="center">Case\Control</td>
<td valign="top" align="center">Case/N (%)</td>
<td valign="top" align="center">Dataset</td>
<td valign="top" align="center">AUC</td>
<td valign="top" align="center">Ethnicity</td>
<td valign="top" align="left">Covariates</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B28">Weedon et al., 2006</xref></td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">6,078</td>
<td valign="top" align="center">2,409\3,669</td>
<td valign="top" align="center">39</td>
<td valign="top" align="center">UKCS</td>
<td valign="top" align="center">0.571</td>
<td valign="top" align="center">British</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B12">Lango et al., 2008</xref></td>
<td valign="top" align="center">18</td>
<td valign="top" align="center">4,907</td>
<td valign="top" align="center">2,309\2598</td>
<td valign="top" align="center">47</td>
<td valign="top" align="center">GoDARTS</td>
<td valign="top" align="center">0.600</td>
<td valign="top" align="center">Scotland</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B12">Lango et al., 2008</xref></td>
<td valign="top" align="center">18</td>
<td valign="top" align="center">4,907</td>
<td valign="top" align="center">2,309\2598</td>
<td valign="top" align="center">47</td>
<td valign="top" align="center">GoDARTS</td>
<td valign="top" align="center">0.800</td>
<td valign="top" align="center">Scotland</td>
<td valign="top" align="left">Age, BMI, and sex</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B13">Lyssenko et al., 2008</xref></td>
<td valign="top" align="center">16</td>
<td valign="top" align="center">18,831</td>
<td valign="top" align="center">2,201\16,630</td>
<td valign="top" align="center">11.68</td>
<td valign="top" align="center">MPP and BS</td>
<td valign="top" align="center">0.750</td>
<td valign="top" align="center">Finland</td>
<td valign="top" align="left">Sex, age, family history, BMI, BP, TL, and GL</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B3">Chatterjee et al., 2013</xref></td>
<td valign="top" align="center">22</td>
<td valign="top" align="center">39,117</td>
<td valign="top" align="center">130\38,987</td>
<td valign="top" align="center">0.3</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.570</td>
<td valign="top" align="center">Caucasian</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B3">Chatterjee et al., 2013</xref></td>
<td valign="top" align="center">22</td>
<td valign="top" align="center">39,117</td>
<td valign="top" align="center">130\38,987</td>
<td valign="top" align="center">0.3</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.740</td>
<td valign="top" align="center">Caucasian</td>
<td valign="top" align="left">Sex, age, and family history</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B11">L&#x00E4;ll et al., 2017</xref></td>
<td valign="top" align="center">1,000</td>
<td valign="top" align="center">10,273</td>
<td valign="top" align="center">1,181\9,092</td>
<td valign="top" align="center">11.5</td>
<td valign="top" align="center">EBC</td>
<td valign="top" align="center">0.74</td>
<td valign="top" align="center">Estonia</td>
<td valign="top" align="left">Sex and age</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B11">L&#x00E4;ll et al., 2017</xref></td>
<td valign="top" align="center">1,000</td>
<td valign="top" align="center">10,273</td>
<td valign="top" align="center">1,181\9,092</td>
<td valign="top" align="center">11.5</td>
<td valign="top" align="center">EBC</td>
<td valign="top" align="center">0.767</td>
<td valign="top" align="center">Estonia</td>
<td valign="top" align="left">Sex, age, and BMI</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B11">L&#x00E4;ll et al., 2017</xref></td>
<td valign="top" align="center">1,000</td>
<td valign="top" align="center">10,273</td>
<td valign="top" align="center">1,181\9,092</td>
<td valign="top" align="center">11.5</td>
<td valign="top" align="center">EBC</td>
<td valign="top" align="center">0.790</td>
<td valign="top" align="center">Estonia</td>
<td valign="top" align="left">Sex, age, BMI, BP, GL, physical activity, smoking, and food consumption</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B9">Khera et al., 2018</xref></td>
<td valign="top" align="center">6,917,436</td>
<td valign="top" align="center">288,978</td>
<td valign="top" align="center">5,853\283,125</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">UKB</td>
<td valign="top" align="center">0.730</td>
<td valign="top" align="center">British</td>
<td valign="top" align="left">Sex and age</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013;</td>
<td valign="top" align="center">25,454</td>
<td valign="top" align="center">274,029</td>
<td valign="top" align="center">18,176\283,560</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">UKB</td>
<td valign="top" align="center">0.755</td>
<td valign="top" align="center">British</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013;</td>
<td valign="top" align="center">25,454</td>
<td valign="top" align="center">274,029</td>
<td valign="top" align="center">18,176\283,560</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">UKB</td>
<td valign="top" align="center">0.795</td>
<td valign="top" align="center">British</td>
<td valign="top" align="left">Sex and age</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013;</td>
<td valign="top" align="center">25,454</td>
<td valign="top" align="center">274,029</td>
<td valign="top" align="center">18,176\283,560</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">UKB</td>
<td valign="top" align="center">0.901</td>
<td valign="top" align="center">British</td>
<td valign="top" align="left">Sex, age, WC, BMI, SBP, DBP, GL, CL, TL, HDL, and LDL</td>
</tr>
</tbody>
</table></table-wrap>
<p>In addition, we highlight the role of non-genetic risk factors, i.e., sex, age, physical measurements, and clinical factors. After adjusting for sex and age, <xref ref-type="bibr" rid="B16">Meigs et al. (2008)</xref> obtained an AUC of 0.581 among 2,776 individuals, <xref ref-type="bibr" rid="B26">Vassy et al. (2014)</xref> reported an AUC of 0.726 among 11,883 individuals, and <xref ref-type="bibr" rid="B11">L&#x00E4;ll et al. (2017)</xref> reached an AUC of 0.740. Interestingly, the study that used nearly 7 million variants in 288,978 individuals achieved an AUC of only 0.730 after adding sex and age (<xref ref-type="bibr" rid="B9">Khera et al., 2018</xref>), lower than ours (0.795), which was based on only 25,454 SNPs. They further reported that 3.5% of the population had inherited a genetic predisposition that conferred at least a threefold increased risk for T2D, 0.2% at least a fourfold increased risk, and 0.05% at least a fivefold increased risk. Their study differs from ours in four respects. First, our study has a larger sample size (456,451 versus 409,258). Second, we first selected SNPs based on genome-wide association <italic>p</italic>-values (<italic>p</italic>&#x2004;&#x2264;&#x2004;5&#x2004;&#x00D7;&#x2004;10<sup>&#x2212;2</sup>), so that we included more predictive SNPs (25,454) and kept spurious SNPs out of our PRS model. Third, they used the first 4 principal components of ancestry, whereas we used the first 10 to better control for population stratification. Fourth, we generated the PRS with the more computationally efficient and scalable PRSice-2 software, whereas they used the LDpred program (<xref ref-type="bibr" rid="B22">Ripke et al., 2015</xref>), which is much slower than PRSice-2. These differences may explain why our PRS model has better predictive power. We also incorporated additional non-genetic risk factors, which increased the AUC from 0.755 to 0.901. Our study is thus more accurate in identifying individuals at low and high risk of developing T2D.</p>
<p>Our study has multiple strengths. First, we constructed the PRS model using the UKB dataset, one of the largest prospective cohort studies in the world, with comprehensive personal information and high-quality genotyping data. Second, we selected SNPs for our PRS model using our proposed three-step filtering procedure, which is simple to implement and performs well in prediction. Third, we included new physical measurements and clinical factors (i.e., WC, DBP, HDL, and LDL) in our predictive model to increase prediction accuracy. Fourth, we adopted the new PRS software PRSice-2, which has been shown to outperform competing methods and software in terms of prediction accuracy and computational speed (<xref ref-type="bibr" rid="B4">Choi and O&#x2019;Reilly, 2019</xref>).</p>
<p>Although the present study makes important contributions to identifying individuals at increased risk of developing T2D, it has one major limitation. Individuals in the UKB dataset are primarily of European ancestry; the specific PRS calculated here may not have optimal predictive power for other ethnic groups because the allele frequencies, LD patterns, and effect sizes of common SNPs may differ across populations with different ethnic backgrounds.</p>
<p>In conclusion, our findings show that the PRS model is highly predictive of T2D risk even when based on genetic data alone, and that prediction accuracy improves after including non-genetic risk factors, suggesting that our PRS model can be used as a powerful tool for preventive T2D screening.</p>
</sec>
<sec id="S5">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.</p>
</sec>
<sec id="S6">
<title>Author Contributions</title>
<p>ZL and TH initiated the study. WL developed the strategy, performed the data analysis, and completed the manuscript writing. ZZ and WW contributed to the data collection and manuscript revision. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> The work was supported by grants from the start-up research fund at University of Hong Kong and the National Key Research and Development Project (2019YFC2003400).</p>
</fn>
</fn-group>
<ack>
<p>The authors thank the UK Biobank project for providing the individual-level data to support our analysis and the DIAGRAM consortium for freely sharing their summary-level data for type 2 diabetes. The authors also thank the associate editor and reviewers for their constructive comments.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burton</surname> <given-names>P. R.</given-names></name> <name><surname>Clayton</surname> <given-names>D. G.</given-names></name> <name><surname>Cardon</surname> <given-names>L. R.</given-names></name> <name><surname>Craddock</surname> <given-names>N.</given-names></name> <name><surname>Deloukas</surname> <given-names>P.</given-names></name> <name><surname>Duncanson</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2007</year>). <article-title>Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.</article-title> <source><italic>Nature</italic></source> <volume>447</volume> <fpage>661</fpage>&#x2013;<lpage>678</lpage>. <pub-id pub-id-type="doi">10.1038/nature05911</pub-id> <pub-id pub-id-type="pmid">17554300</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bycroft</surname> <given-names>C.</given-names></name> <name><surname>Freeman</surname> <given-names>C.</given-names></name> <name><surname>Petkova</surname> <given-names>D.</given-names></name> <name><surname>Band</surname> <given-names>G.</given-names></name> <name><surname>Elliott</surname> <given-names>L. T.</given-names></name> <name><surname>Sharp</surname> <given-names>K.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>The UK Biobank resource with deep phenotyping and genomic data.</article-title> <source><italic>Nature</italic></source> <volume>562</volume> <fpage>203</fpage>&#x2013;<lpage>209</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-018-0579-z</pub-id> <pub-id pub-id-type="pmid">30305743</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chatterjee</surname> <given-names>N.</given-names></name> <name><surname>Wheeler</surname> <given-names>B.</given-names></name> <name><surname>Sampson</surname> <given-names>J.</given-names></name> <name><surname>Hartge</surname> <given-names>P.</given-names></name> <name><surname>Chanock</surname> <given-names>S. J.</given-names></name> <name><surname>Park</surname> <given-names>J.-H.</given-names></name></person-group> (<year>2013</year>). <article-title>Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies.</article-title> <source><italic>Nat. Genet.</italic></source> <volume>45</volume> <fpage>400</fpage>&#x2013;<lpage>405</lpage>. <pub-id pub-id-type="doi">10.1038/ng.2579</pub-id> <pub-id pub-id-type="pmid">23455638</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>S. W.</given-names></name> <name><surname>O&#x2019;Reilly</surname> <given-names>P. F.</given-names></name></person-group> (<year>2019</year>). <article-title>PRSice-2: polygenic risk score software for biobank-scale data.</article-title> <source><italic>GigaScience</italic></source> <volume>8</volume>:giz082. <pub-id pub-id-type="doi">10.1093/gigascience/giz082</pub-id> <pub-id pub-id-type="pmid">31307061</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Conroy</surname> <given-names>M.</given-names></name> <name><surname>Sellors</surname> <given-names>J.</given-names></name> <name><surname>Effingham</surname> <given-names>M.</given-names></name> <name><surname>Littlejohns</surname> <given-names>T. J.</given-names></name> <name><surname>Boultwood</surname> <given-names>C.</given-names></name> <name><surname>Gillions</surname> <given-names>L.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>The advantages of UK Biobank&#x2019;s open-access strategy for health research.</article-title> <source><italic>J. Intern. Med.</italic></source> <volume>286</volume> <fpage>389</fpage>&#x2013;<lpage>397</lpage>. <pub-id pub-id-type="doi">10.1111/joim.12955</pub-id> <pub-id pub-id-type="pmid">31283063</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>DeLong</surname> <given-names>E. R.</given-names></name> <name><surname>DeLong</surname> <given-names>D. M.</given-names></name> <name><surname>Clarke-Pearson</surname> <given-names>D. L.</given-names></name></person-group> (<year>1988</year>). <article-title>Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.</article-title> <source><italic>Biometrics</italic></source> <volume>44</volume> <fpage>837</fpage>&#x2013;<lpage>845</lpage>. <pub-id pub-id-type="doi">10.2307/2531595</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fawcett</surname> <given-names>T.</given-names></name></person-group> (<year>2006</year>). <article-title>An introduction to ROC analysis.</article-title> <source><italic>Pattern Recogn. Lett.</italic></source> <volume>27</volume> <fpage>861</fpage>&#x2013;<lpage>874</lpage>. <pub-id pub-id-type="doi">10.1016/j.patrec.2005.10.010</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Herder</surname> <given-names>C.</given-names></name> <name><surname>Roden</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>Genetics of type 2 diabetes: pathophysiologic and clinical relevance.</article-title> <source><italic>Eur. J. Clin. Invest.</italic></source> <volume>41</volume> <fpage>679</fpage>&#x2013;<lpage>692</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2362.2010.02454.x</pub-id> <pub-id pub-id-type="pmid">21198561</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khera</surname> <given-names>A. V.</given-names></name> <name><surname>Chaffin</surname> <given-names>M.</given-names></name> <name><surname>Aragam</surname> <given-names>K. G.</given-names></name> <name><surname>Haas</surname> <given-names>M. E.</given-names></name> <name><surname>Roselli</surname> <given-names>C.</given-names></name> <name><surname>Choi</surname> <given-names>S. H.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations.</article-title> <source><italic>Nat. Genet.</italic></source> <volume>50</volume> <fpage>1219</fpage>&#x2013;<lpage>1224</lpage>. <pub-id pub-id-type="doi">10.1038/s41588-018-0183-z</pub-id> <pub-id pub-id-type="pmid">30104762</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khera</surname> <given-names>A. V.</given-names></name> <name><surname>Chaffin</surname> <given-names>M.</given-names></name> <name><surname>Wade</surname> <given-names>K. H.</given-names></name> <name><surname>Zahid</surname> <given-names>S.</given-names></name> <name><surname>Brancale</surname> <given-names>J.</given-names></name> <name><surname>Xia</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Polygenic prediction of weight and obesity trajectories from birth to adulthood.</article-title> <source><italic>Cell</italic></source> <volume>177</volume> <fpage>587</fpage>&#x2013;<lpage>596.e9</lpage>.</citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>L&#x00E4;ll</surname> <given-names>K.</given-names></name> <name><surname>M&#x00E4;gi</surname> <given-names>R.</given-names></name> <name><surname>Morris</surname> <given-names>A.</given-names></name> <name><surname>Metspalu</surname> <given-names>A.</given-names></name> <name><surname>Fischer</surname> <given-names>K.</given-names></name></person-group> (<year>2017</year>). <article-title>Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores.</article-title> <source><italic>Genet. Med.</italic></source> <volume>19</volume> <fpage>322</fpage>&#x2013;<lpage>329</lpage>. <pub-id pub-id-type="doi">10.1038/gim.2016.103</pub-id> <pub-id pub-id-type="pmid">27513194</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lango</surname> <given-names>H.</given-names></name> <name><surname>Palmer</surname> <given-names>C. N.</given-names></name> <name><surname>Morris</surname> <given-names>A. D.</given-names></name> <name><surname>Zeggini</surname> <given-names>E.</given-names></name> <name><surname>Hattersley</surname> <given-names>A. T.</given-names></name> <name><surname>McCarthy</surname> <given-names>M. I.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Assessing the combined impact of 18 common genetic variants of modest effect sizes on type 2 diabetes risk.</article-title> <source><italic>Diabetes</italic></source> <volume>57</volume> <fpage>3129</fpage>&#x2013;<lpage>3135</lpage>. <pub-id pub-id-type="doi">10.2337/db08-0504</pub-id> <pub-id pub-id-type="pmid">18591388</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lyssenko</surname> <given-names>V.</given-names></name> <name><surname>Jonsson</surname> <given-names>A.</given-names></name> <name><surname>Almgren</surname> <given-names>P.</given-names></name> <name><surname>Pulizzi</surname> <given-names>N.</given-names></name> <name><surname>Isomaa</surname> <given-names>B.</given-names></name> <name><surname>Tuomi</surname> <given-names>T.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Clinical risk factors, DNA variants, and the development of type 2 diabetes.</article-title> <source><italic>N. Engl. J. Med.</italic></source> <volume>359</volume> <fpage>2220</fpage>&#x2013;<lpage>2232</lpage>. <pub-id pub-id-type="doi">10.1056/nejmoa0801869</pub-id> <pub-id pub-id-type="pmid">19020324</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marees</surname> <given-names>A. T.</given-names></name> <name><surname>de Kluiver</surname> <given-names>H.</given-names></name> <name><surname>Stringer</surname> <given-names>S.</given-names></name> <name><surname>Vorspan</surname> <given-names>F.</given-names></name> <name><surname>Curis</surname> <given-names>E.</given-names></name> <name><surname>Marie-Claire</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>A tutorial on conducting genome-wide association studies: quality control and statistical analysis.</article-title> <source><italic>Int. J. Methods Psychiatr. Res.</italic></source> <volume>27</volume>:e1608. <pub-id pub-id-type="doi">10.1002/mpr.1608</pub-id> <pub-id pub-id-type="pmid">29484742</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCarthy</surname> <given-names>M. I.</given-names></name></person-group> (<year>2010</year>). <article-title>Genomics, type 2 diabetes, and obesity.</article-title> <source><italic>N. Engl. J. Med.</italic></source> <volume>363</volume> <fpage>2339</fpage>&#x2013;<lpage>2350</lpage>.</citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meigs</surname> <given-names>J. B.</given-names></name> <name><surname>Shrader</surname> <given-names>P.</given-names></name> <name><surname>Sullivan</surname> <given-names>L. M.</given-names></name> <name><surname>McAteer</surname> <given-names>J. B.</given-names></name> <name><surname>Fox</surname> <given-names>C. S.</given-names></name> <name><surname>Dupuis</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>Genotype score in addition to common risk factors for prediction of type 2 diabetes.</article-title> <source><italic>N. Engl. J. Med.</italic></source> <volume>359</volume> <fpage>2208</fpage>&#x2013;<lpage>2219</lpage>. <pub-id pub-id-type="doi">10.1056/nejmoa0804742</pub-id> <pub-id pub-id-type="pmid">19020323</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morris</surname> <given-names>A. P.</given-names></name> <name><surname>Voight</surname> <given-names>B. F.</given-names></name> <name><surname>Teslovich</surname> <given-names>T. M.</given-names></name> <name><surname>Ferreira</surname> <given-names>T.</given-names></name> <name><surname>Segre</surname> <given-names>A. V.</given-names></name> <name><surname>Steinthorsdottir</surname> <given-names>V.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes.</article-title> <source><italic>Nat. Genet.</italic></source> <volume>44</volume>:981. <pub-id pub-id-type="doi">10.1038/ng.2383</pub-id> <pub-id pub-id-type="pmid">22885922</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Palmer</surname> <given-names>N. D.</given-names></name> <name><surname>McDonough</surname> <given-names>C. W.</given-names></name> <name><surname>Hicks</surname> <given-names>P. J.</given-names></name> <name><surname>Roh</surname> <given-names>B. H.</given-names></name> <name><surname>Wing</surname> <given-names>M. R.</given-names></name> <name><surname>An</surname> <given-names>S. S.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>A genome-wide association search for type 2 diabetes genes in African Americans.</article-title> <source><italic>PLoS One</italic></source> <volume>7</volume>:e29202. <pub-id pub-id-type="doi">10.1371/journal.pone.0029202</pub-id> <pub-id pub-id-type="pmid">22238593</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>P&#x00E4;rna</surname> <given-names>K.</given-names></name> <name><surname>Snieder</surname> <given-names>H.</given-names></name> <name><surname>L&#x00E4;ll</surname> <given-names>K.</given-names></name> <name><surname>Fischer</surname> <given-names>K.</given-names></name> <name><surname>Nolte</surname> <given-names>I.</given-names></name></person-group> (<year>2020</year>). <article-title>Validating the doubly weighted genetic risk score for the prediction of type 2 diabetes in the lifelines and estonian biobank cohorts.</article-title> <source><italic>Genet. Epidemiol.</italic></source> <volume>44</volume> <fpage>589</fpage>&#x2013;<lpage>600</lpage>. <pub-id pub-id-type="doi">10.1002/gepi.22327</pub-id> <pub-id pub-id-type="pmid">32537749</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prasad</surname> <given-names>R. B.</given-names></name> <name><surname>Groop</surname> <given-names>L.</given-names></name></person-group> (<year>2015</year>). <article-title>Genetics of type 2 diabetes&#x2014;pitfalls and possibilities.</article-title> <source><italic>Genes</italic></source> <volume>6</volume> <fpage>87</fpage>&#x2013;<lpage>123</lpage>. <pub-id pub-id-type="doi">10.3390/genes6010087</pub-id> <pub-id pub-id-type="pmid">25774817</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Priv&#x00E9;</surname> <given-names>F.</given-names></name> <name><surname>Vilhj&#x00E1;lmsson</surname> <given-names>B. J.</given-names></name> <name><surname>Aschard</surname> <given-names>H.</given-names></name> <name><surname>Blum</surname> <given-names>M. G. B.</given-names></name></person-group> (<year>2019</year>). <article-title>Making the most of clumping and thresholding for polygenic scores.</article-title> <source><italic>Am. J. Hum. Genet.</italic></source> <volume>105</volume> <fpage>1213</fpage>&#x2013;<lpage>1221</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajhg.2019.11.001</pub-id> <pub-id pub-id-type="pmid">31761295</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ripke</surname> <given-names>S.</given-names></name> <name><surname>Neale</surname> <given-names>B.</given-names></name> <name><surname>Corvin</surname> <given-names>A.</given-names></name> <name><surname>Walters</surname> <given-names>J. R.</given-names></name> <name><surname>Farh</surname> <given-names>K. H.</given-names></name> <name><surname>Holmans</surname> <given-names>P.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Modeling linkage disequilibrium increases accuracy of polygenic risk scores.</article-title> <source><italic>Am. J. Hum. Genet.</italic></source> <volume>97</volume> <fpage>576</fpage>&#x2013;<lpage>592</lpage>.</citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Robin</surname> <given-names>X.</given-names></name> <name><surname>Turck</surname> <given-names>N.</given-names></name> <name><surname>Hainard</surname> <given-names>A.</given-names></name> <name><surname>Tiberti</surname> <given-names>N.</given-names></name> <name><surname>Lisacek</surname> <given-names>F.</given-names></name> <name><surname>Sanchez</surname> <given-names>J.-C.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>pROC: an open-source package for R and S+ to analyze and compare ROC curves.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>12</volume>:77. <pub-id pub-id-type="doi">10.1186/1471-2105-12-77</pub-id> <pub-id pub-id-type="pmid">21414208</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scott</surname> <given-names>L. J.</given-names></name> <name><surname>Mohlke</surname> <given-names>K. L.</given-names></name> <name><surname>Bonnycastle</surname> <given-names>L. L.</given-names></name> <name><surname>Willer</surname> <given-names>C. J.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Duren</surname> <given-names>W. L.</given-names></name><etal/></person-group> (<year>2007</year>). <article-title>A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants.</article-title> <source><italic>Science</italic></source> <volume>316</volume> <fpage>1341</fpage>&#x2013;<lpage>1345</lpage>.</citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sudlow</surname> <given-names>C.</given-names></name> <name><surname>Gallacher</surname> <given-names>J.</given-names></name> <name><surname>Allen</surname> <given-names>N.</given-names></name> <name><surname>Beral</surname> <given-names>V.</given-names></name> <name><surname>Burton</surname> <given-names>P.</given-names></name> <name><surname>Danesh</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.</article-title> <source><italic>PLoS Med.</italic></source> <volume>12</volume>:e1001779. <pub-id pub-id-type="doi">10.1371/journal.pmed.1001779</pub-id> <pub-id pub-id-type="pmid">25826379</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vassy</surname> <given-names>J. L.</given-names></name> <name><surname>Hivert</surname> <given-names>M.-F.</given-names></name> <name><surname>Porneala</surname> <given-names>B.</given-names></name> <name><surname>Dauriz</surname> <given-names>M.</given-names></name> <name><surname>Florez</surname> <given-names>J. C.</given-names></name> <name><surname>Dupuis</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Polygenic type 2 diabetes prediction at the limit of common variant detection.</article-title> <source><italic>Diabetes</italic></source> <volume>63</volume> <fpage>2172</fpage>&#x2013;<lpage>2182</lpage>. <pub-id pub-id-type="doi">10.2337/db13-1663</pub-id> <pub-id pub-id-type="pmid">24520119</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Visscher</surname> <given-names>P. M.</given-names></name> <name><surname>Wray</surname> <given-names>N. R.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Sklar</surname> <given-names>P.</given-names></name> <name><surname>McCarthy</surname> <given-names>M. I.</given-names></name> <name><surname>Brown</surname> <given-names>M. A.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>10 years of GWAS discovery: biology, function, and translation.</article-title> <source><italic>Am. J. Hum. Genet.</italic></source> <volume>101</volume> <fpage>5</fpage>&#x2013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajhg.2017.06.005</pub-id> <pub-id pub-id-type="pmid">28686856</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weedon</surname> <given-names>M. N.</given-names></name> <name><surname>McCarthy</surname> <given-names>M. I.</given-names></name> <name><surname>Hitman</surname> <given-names>G.</given-names></name> <name><surname>Walker</surname> <given-names>M.</given-names></name> <name><surname>Groves</surname> <given-names>C. J.</given-names></name> <name><surname>Zeggini</surname> <given-names>E.</given-names></name><etal/></person-group> (<year>2006</year>). <article-title>Combining information from common type 2 diabetes risk polymorphisms improves disease prediction.</article-title> <source><italic>PLoS Med.</italic></source> <volume>3</volume>:e374. <pub-id pub-id-type="doi">10.1371/journal.pmed.0030374</pub-id> <pub-id pub-id-type="pmid">17020404</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wray</surname> <given-names>N. R.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Hayes</surname> <given-names>B. J.</given-names></name> <name><surname>Price</surname> <given-names>A. L.</given-names></name> <name><surname>Goddard</surname> <given-names>M. E.</given-names></name> <name><surname>Visscher</surname> <given-names>P. M.</given-names></name></person-group> (<year>2013</year>). <article-title>Pitfalls of predicting complex traits from SNPs.</article-title> <source><italic>Nat. Rev. Genet.</italic></source> <volume>14</volume> <fpage>507</fpage>&#x2013;<lpage>515</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3457</pub-id> <pub-id pub-id-type="pmid">23774735</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zimmet</surname> <given-names>P. Z.</given-names></name> <name><surname>Magliano</surname> <given-names>D. J.</given-names></name> <name><surname>Herman</surname> <given-names>W. H.</given-names></name> <name><surname>Shaw</surname> <given-names>J. E.</given-names></name></person-group> (<year>2014</year>). <article-title>Diabetes: a 21st century challenge.</article-title> <source><italic>Lancet Diabetes Endocrinol.</italic></source> <volume>2</volume> <fpage>56</fpage>&#x2013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1016/s2213-8587(13)70112-8</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="footnote1">
<label>1</label>
<p><ext-link ext-link-type="uri" xlink:href="http://biobank.ctsu.ox.ac.uk/crystal/">http://biobank.ctsu.ox.ac.uk/crystal/</ext-link></p></fn>
<fn id="footnote2">
<label>2</label>
<p><ext-link ext-link-type="uri" xlink:href="http://diagram-consortium.org/">http://diagram-consortium.org/</ext-link></p></fn>
<fn id="footnote3">
<label>3</label>
<p><ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/pROC/index.html">https://cran.r-project.org/web/packages/pROC/index.html</ext-link></p></fn>
<fn id="footnote4">
<label>4</label>
<p><ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/bin/macosx/">https://cran.r-project.org/bin/macosx/</ext-link></p></fn>
</fn-group>
</back>
</article>