<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1230579</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2023.1230579</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A genome-wide association study coupled with machine learning approaches to identify influential demographic and genomic factors underlying Parkinson&#x2019;s disease</article-title>
<alt-title alt-title-type="left-running-head">Rahman and Liu</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fgene.2023.1230579">10.3389/fgene.2023.1230579</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Rahman</surname>
<given-names>Md Asad</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2305597/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Liu</surname>
<given-names>Jinling</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<xref ref-type="fn" rid="fn1">
<sup>&#x2020;</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2327005/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Engineering Management and Systems Engineering</institution>, <institution>Missouri University of Science and Technology</institution>, <addr-line>Rolla</addr-line>, <addr-line>MO</addr-line>, <country>United States</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Department of Biological Sciences</institution>, <institution>Missouri University of Science and Technology</institution>, <addr-line>Rolla</addr-line>, <addr-line>MO</addr-line>, <country>United States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/23877/overview">Richard D. Emes</ext-link>, Nottingham Trent University, United Kingdom</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/591779/overview">Stanislaw Szlufik</ext-link>, Medical University of Warsaw, Poland</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1797416/overview">Arya Ashok</ext-link>, Tempus Labs, United States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Jinling Liu, <email>jinling.liu@ufl.edu</email>
</corresp>
<fn fn-type="present-address" id="fn1">
<label>
<sup>&#x2020;</sup>
</label>
<p>
<bold>Present address:</bold> Jinling Liu, Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>29</day>
<month>09</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1230579</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>06</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>09</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2023 Rahman and Liu.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Rahman and Liu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>
<bold>Background:</bold> Despite the recent success of genome-wide association studies (GWAS) in identifying 90 independent risk loci for Parkinson&#x2019;s disease (PD), the genomic underpinning of PD is still largely unknown. At the same time, accurate and reliable predictive models utilizing genomic or demographic features are desired in the clinic for predicting the risk of Parkinson&#x2019;s disease.</p>
<p>
<bold>Methods:</bold> To identify influential demographic and genomic factors associated with PD and to further develop predictive models, we utilized demographic data, incorporating 200 variables across 33,473 participants, along with genomic data involving 447,089 SNPs across 8,840 samples, both derived from the Fox Insight online study. We first applied correlation and GWAS analyses to find the top demographic and genomic factors associated with PD, respectively. We further developed and compared a variety of machine learning (ML) models for predicting PD. From the developed ML models, we performed feature importance analysis to reveal the predictability of each demographic or the genomic input feature for PD. Finally, we performed gene set enrichment analysis on our GWAS results to identify PD-associated pathways.</p>
<p>
<bold>Results:</bold> In our study, we identified both novel and well-known demographic and genetic factors (along with the enriched pathways) related to PD. In addition, we developed predictive models that performed robustly, with AUC &#x3d; 0.89 for demographic data and AUC &#x3d; 0.74 for genomic data. Our GWAS analysis identified several novel and significant variants and gene loci, including three intron variants in <italic>LMNA</italic> (<italic>p</italic>-values smaller than 4.0e-21) and one missense variant in <italic>SEMA4A</italic> (<italic>p</italic>-value &#x3d; 1.11e-26). Our feature importance analysis from the PD-predictive ML models highlighted some significant and novel variants from our GWAS analysis (e.g., the intron variant rs1749409 in the <italic>RIT1</italic> gene) and helped identify potentially causative variants that were missed by GWAS, such as rs11264300, a missense variant in the gene <italic>DCST1</italic>, and rs11584630, an intron variant in the gene <italic>KCNN3</italic>.</p>
<p>
<bold>Conclusion:</bold> In summary, by combining a GWAS with advanced machine learning models, we identified both known and novel demographic and genomic factors as well as built well-performing ML models for predicting Parkinson&#x2019;s disease.</p>
</abstract>
<kwd-group>
<kwd>Parkinson&#x2019;s disease</kwd>
<kwd>genome-wide association studies</kwd>
<kwd>machine learning</kwd>
<kwd>prediction model</kwd>
<kwd>feature importance analysis</kwd>
</kwd-group>
<contract-num rid="cn001">K01HL161538</contract-num>
<contract-sponsor id="cn001">National Heart, Lung, and Blood Institute<named-content content-type="fundref-id">10.13039/100000050</named-content>
</contract-sponsor>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Computational Genomics</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Parkinson&#x2019;s disease (PD) is a complex neurodegenerative disorder often linked to aging (<xref ref-type="bibr" rid="B14">Dauer and Przedborski, 2003</xref>). Symptoms of Parkinson&#x2019;s can be broadly divided into motor and non-motor categories (<xref ref-type="bibr" rid="B53">Sveinbjornsdottir, 2016</xref>). Primary motor symptoms of PD include bradykinesia, tremor, and rigidity (<xref ref-type="bibr" rid="B60">Xia and Mao, 2012</xref>). Other manifestations involve gait disturbances, impaired handwriting, grip force (related to the strength and control of hand grasping), and speech deficits (<xref ref-type="bibr" rid="B30">Moustafa et al., 2016</xref>). In PD, non-motor symptoms are categorized into sensory symptoms, neuropsychiatric dysfunctions, autonomic dysfunction, and sleep disorders (<xref ref-type="bibr" rid="B40">Poewe, 2008</xref>). Among these, sensory symptoms may include olfactory dysfunction, abnormal sensations, and pain. Neuropsychiatric dysfunctions can encompass mood disorders, frontal executive dysfunction, apathy, and anhedonia. Autonomic dysfunction might present symptoms like orthostatic hypotension, urogenital dysfunction, and constipation. Lastly, sleep disturbances can involve sleep fragmentation, insomnia, and rapid eye movement sleep behavior disorder.</p>
<p>In addition to age, environmental and genomic factors also contribute to the development of PD (<xref ref-type="bibr" rid="B36">Noyce et al., 2012</xref>; <xref ref-type="bibr" rid="B20">Kieburtz and Wunderle, 2013</xref>; <xref ref-type="bibr" rid="B4">Blauwendraat et al., 2020</xref>). Specific environmental factors, such as exposure to pesticides and smoking, are associated with an increased risk of PD; conversely, caffeine intake is linked to a decreased risk of PD. Advancements in high-throughput technologies have enabled genome-wide association studies (GWAS) to detect significant associations between genomic variants and various diseases, including PD. Following the identification of the first PD GWAS loci in 2009, 90 distinct risk loci have been discovered thus far (<xref ref-type="bibr" rid="B32">Nalls et al., 2014</xref>; <xref ref-type="bibr" rid="B8">Chang et al., 2017</xref>; <xref ref-type="bibr" rid="B57">Visscher et al., 2017</xref>; <xref ref-type="bibr" rid="B31">Nalls et al., 2019</xref>). Despite these successes, many more significant variants remain to be discovered to explain the genomic heritability of PD.</p>
<p>The increasing number of risk loci identified by GWAS has helped improve PD prediction and intervention (<xref ref-type="bibr" rid="B7">Chairta et al., 2021</xref>; <xref ref-type="bibr" rid="B21">Kim et al., 2021</xref>; <xref ref-type="bibr" rid="B45">Salas-Leal et al., 2021</xref>; <xref ref-type="bibr" rid="B63">Zheng et al., 2021</xref>; <xref ref-type="bibr" rid="B16">Dehestani et al., 2022</xref>). In prior studies, polygenic risk scores (PRS) were used to predict the risk of PD. These scores captured the cumulative effect of various PD genetic variants. Typically, the effectiveness of this PRS method for PD prediction was indicated by the value of an area under the receiver operating characteristics curve (AUC) ranging from 0.61 to 0.69 (<xref ref-type="bibr" rid="B31">Nalls et al., 2019</xref>; <xref ref-type="bibr" rid="B7">Chairta et al., 2021</xref>; <xref ref-type="bibr" rid="B21">Kim et al., 2021</xref>; <xref ref-type="bibr" rid="B45">Salas-Leal et al., 2021</xref>; <xref ref-type="bibr" rid="B63">Zheng et al., 2021</xref>; <xref ref-type="bibr" rid="B16">Dehestani et al., 2022</xref>). The prediction performance needs further improvement for the genomic prediction of PD to have clinical use. In addition, the PRS model lacks knowledge of specific variants&#x2019; involvement and their magnitude of impact for predicting PD risks. At the same time, many other studies have explored the utility of the existing demographic and clinical data (e.g., motor and non-motor symptoms) for predicting PD risks (<xref ref-type="bibr" rid="B34">Nielsen et al., 2017</xref>; <xref ref-type="bibr" rid="B62">Zham et al., 2017</xref>; <xref ref-type="bibr" rid="B48">Shah et al., 2018</xref>; <xref ref-type="bibr" rid="B47">Senturk, 2020</xref>). The application of advanced machine learning (ML) models with a combined feature space including genomic, demographic, and clinical data may further improve the accuracy of PD prediction.</p>
<p>In this study, the main purpose was to examine the key factors influencing PD by utilizing a large dataset containing demographic, clinical, and genetic data from the Fox Insight online study (<xref ref-type="bibr" rid="B51">Smolensky et al., 2020</xref>). This aim contained three key components: examining demographic and clinical variables through correlation and feature importance analyses, studying genomic factors using GWAS and feature importance analysis, and developing machine learning models for PD prediction. To find demographic and clinical variables associated with PD, we conducted a comprehensive analysis involving correlation assessment and feature importance analyses. To identify the potential genomic causes, we initially applied GWAS to the newly released genetic data by the Fox Insight study to search for novel and significant genomic variants for PD. Subsequently, we selected the top GWAS variants as input features to develop ML models for PD prediction; we applied and compared the performance of four popular ML models: artificial neural networks (ANNs), random forest (RF), support vector machine (SVM), and logistic regression (LR) (<xref ref-type="bibr" rid="B54">Svozil et al., 1997</xref>; <xref ref-type="bibr" rid="B25">Liaw and Wiener, 2002</xref>; <xref ref-type="bibr" rid="B35">Noble, 2006</xref>; <xref ref-type="bibr" rid="B52">Sperandei, 2014</xref>). Our strategy involved constructing three different kinds of ML predictive models: a demographic model (using demographic and clinical data only), a genetic model (using genetic data only), and a combined model (using both genetic and demographic/clinical data). Furthermore, we investigated and identified the most predictive demographic variables and genomic variants using two different feature importance methods: expected gradients applied to ANNs and feature importance score given by RF (<xref ref-type="bibr" rid="B29">Louppe et al., 2013</xref>). 
Lastly, we performed GWAS-based gene set enrichment analysis (GSEA) using our GWAS results and identified novel and known PD pathways.</p>
</sec>
<sec sec-type="materials|methods" id="s2">
<title>2 Materials and methods</title>
<sec id="s2-1">
<title>2.1 Data and data preprocessing</title>
<p>In Fox Insight, participants were genotyped on the V3, V4, and V5 platforms. The V5 platform consisted of a customized Illumina Infinium Global Screening Array containing approximately 690,000 SNPs. Roughly 80.4% of participants were genotyped on this platform. We used the V5 platform of Fox Insight Genetic Data. We applied the following PLINK options to each chromosome for further filtration and quality control: --mind 0.05 --geno 0.03 --maf 0.01 --hwe 1e-6. We then imputed the missing SNP values (0.63% missing values) with the most frequent value for that particular SNP across the entire dataset. Dominant coding was then performed, and thus, the final SNP values are 0 or 1. After combining all 22 chromosomes, we obtained a total of 447,089 SNPs and 8,840 samples. Phenotype data included the &#x2018;CurrPDDiag&#x2019; variable, which was downloaded using the Fox DEN tool. This variable records each participant&#x2019;s answer to the registration question &#x201c;Do you currently have a diagnosis of Parkinson&#x2019;s disease, or Parkinsonism, by a physician or other healthcare professional?&#x201d;</p>
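<p>As a minimal sketch of the last two preprocessing steps (most-frequent-value imputation followed by dominant coding), the Python snippet below processes one SNP at a time. It assumes that genotypes are stored as minor-allele counts (0, 1, or 2) with missing calls encoded as None; this encoding, the function names, and the toy genotype vector are illustrative assumptions rather than the pipeline used in this study.</p>
<preformat><![CDATA[
```python
from collections import Counter


def impute_mode(column):
    """Replace missing genotypes (None) with the most frequent observed value."""
    observed = [g for g in column if g is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if g is None else g for g in column]


def dominant_code(column):
    """Dominant coding: 1 if at least one minor allele is carried, else 0."""
    return [1 if g > 0 else 0 for g in column]


# Hypothetical minor-allele counts for a single SNP across eight samples.
snp = [0, 1, None, 2, 0, 0, None, 1]
imputed = impute_mode(snp)
coded = dominant_code(imputed)
```
]]></preformat>
<p>In this toy vector, the two missing calls are filled with the most frequent genotype (0), and every genotype carrying at least one minor allele is coded as 1.</p>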
<p>We also processed the demographic and clinical data (one-time questionnaires and routine longitudinal assessment data, referred to as demographic data hereafter for convenience), which were also downloaded from the Fox DEN tool. The routine longitudinal assessment dataset was generated through routine longitudinal health and medical questionnaires, and the one-time questionnaire dataset covered environmental exposure and healthcare preferences (<xref ref-type="bibr" rid="B51">Smolensky et al., 2020</xref>). Initially, all downloaded demographic data from the Fox DEN tool had 53k samples and 5,877 demographic variables. We kept the demographic variables shared between PD and non-PD individuals. We used the most recent record for each of these variables. Furthermore, we selected the subjects who also had genotype data available (&#x223c;8k samples). Among the subjects with demographic and genetic data, we identified and removed demographic variables with missingness &#x3e;5% in these individuals; we also removed variables that leak PD information and are therefore unsuitable for prediction, which left us with 200 demographic variables. We further removed from the &#x223c;53k samples the individuals who had missingness &#x3e;5% in these selected 200 variables, which left us with 33,473 samples and 200 variables.</p>
</sec>
<sec id="s2-2">
<title>2.2 Genome-wide association studies</title>
<p>GWAS is the standard approach for identifying the significant variants associated with traits at the population level. GWAS was performed using logistic regression adjusting for age (age at the onset for cases and age of last reported for controls), sex, and 10 principal components. We performed GWAS using R software (<ext-link ext-link-type="uri" xlink:href="http://www.r-project.org/">http://www.r-project.org/</ext-link>). The <italic>p</italic>-values from GWAS were used to evaluate whether corresponding SNPs were genome-wide significant or not. We used the Bonferroni correction method for selecting a threshold <italic>p</italic>-value of genome-wide significance (<xref ref-type="bibr" rid="B19">Kaler and Purcell, 2019</xref>).</p>
</sec>
<sec id="s2-3">
<title>2.3 Feature selection and machine learning model development</title>
<p>We divided the whole genetic dataset into an 80% training set, a 10% validation set, and a 10% test set containing 7,072, 884, and 884 subjects, respectively. We used the training set for feature selection through GWAS analysis and for training the model. The top SNPs with the lowest <italic>p</italic>-values from GWAS analysis were selected as potentially informative input features for ML models to predict the PD status. We reserved an intact validation set for tuning hyperparameters and finding the best ML model and an intact test set for the performance evaluation of the final ML model. Fox Insight studies had a highly unbalanced case-control ratio of around 30:1, so we applied random oversampling for the minority class in the training set to make a 2:1 (case-control) ratio for training the ML models. The oversampling method was not applied to GWAS analyses, which were performed using the original data. The random oversampling method was not used in the validation or the test set either; thus, these sets consisted of actual data from Fox Insight, both to avoid overfitting and to reflect the actual performance. Both the validation and test sets were unseen during GWAS analyses and training of the models to avoid potential information leakage. We used artificial neural networks (ANNs), random forest (RF), support vector machine (SVM), and logistic regression (LR) to predict the risk status of PD. The ANN was implemented using Keras, while RF, SVM, and LR were implemented using the scikit-learn package (<xref ref-type="bibr" rid="B39">Pedregosa et al., 2011</xref>; <xref ref-type="bibr" rid="B12">Chollet, 2015</xref>).</p>
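<p>The random oversampling step can be sketched as follows. This is an illustrative standard-library example, not the implementation used in this study: the function name, the fixed seed, and the rounding rule are assumptions. The class counts are those reported for the discovery (training) set in Section 3.2 (6,868 cases and 204 controls); controls, the minority class, are resampled with replacement until the case-control ratio is approximately 2:1.</p>
<preformat><![CDATA[
```python
import random


def oversample_minority(labels, target_ratio=2.0, seed=0):
    """Return sample indices in which controls (label 0) are drawn with
    replacement until cases:controls is approximately target_ratio:1."""
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    n_controls_needed = round(len(cases) / target_ratio)
    extra = max(0, n_controls_needed - len(controls))
    # Every original control is kept; only the extra draws are duplicates.
    resampled = controls + [rng.choice(controls) for _ in range(extra)]
    indices = cases + resampled
    rng.shuffle(indices)
    return indices


# Training-set class counts taken from Section 3.2.
train_labels = [1] * 6868 + [0] * 204
idx = oversample_minority(train_labels)
n_cases = sum(train_labels[i] == 1 for i in idx)
n_controls = len(idx) - n_cases
```
]]></preformat>
<p>Only the training indices are resampled; validation and test indices would be left untouched so that evaluation reflects the original class distribution.</p>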
<p>We also developed RF and ANN models to predict PD using demographic data. From the aforementioned processed demographic data containing 33,473 samples and 200 demographic variables, we performed a stratified random split to produce an 80% training set (<italic>n</italic> &#x3d; 26,765), a 10% validation set (<italic>n</italic> &#x3d; 3,354), and a 10% test set (<italic>n</italic> &#x3d; 3,354). Within the training data, we employed multiple correlation techniques on the 200 variables to determine the most relevant features for our analysis. We applied the Matthews correlation coefficient to 188 binary variables, Cramer&#x2019;s V to 11 categorical variables with more than two discrete values, and the point-biserial correlation to one continuous variable (<xref ref-type="bibr" rid="B24">Kornbrot, 2014</xref>; <xref ref-type="bibr" rid="B2">Akoglu, 2018</xref>; <xref ref-type="bibr" rid="B10">Chicco et al., 2021</xref>). A correlation threshold of 0.01 allowed us to identify a total of 139 variables that met our inclusion criteria. We further used the training set to tune the hyperparameters for both ANN and RF models based on the prediction performance on the validation set; we then used both the training and validation sets to train the final model that was used to predict the unseen test dataset. Furthermore, we developed a combined prediction model using both demographic and genetic features from subjects who have both demographic and genetic data. To comprehensively evaluate the prediction performance of the developed ML models in an unseen test dataset, we examined multiple metrics including the area under the ROC curve (AUC), precision, recall, and the F1-score (the harmonic mean of precision and recall).</p>
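<p>For the binary variables, the correlation-based filter can be illustrated with the short sketch below. It computes the Matthews correlation coefficient from the 2 &#xd7; 2 contingency counts and keeps variables whose absolute coefficient reaches the threshold. Applying the threshold to the absolute value is an assumption, and the variable names and toy data are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def matthews_corr(x, y):
    """Matthews correlation coefficient (phi) between two binary 0/1 sequences."""
    tp = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    fp = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    fn = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def select_features(features, target, threshold=0.01):
    """Keep variables with |MCC| >= threshold (absolute value is an assumption)."""
    return [name for name, values in features.items()
            if abs(matthews_corr(values, target)) >= threshold]


# Hypothetical toy data: PD status and two binary questionnaire variables.
pd_status = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "variable_a": [1, 1, 1, 0, 0, 0, 0, 1],  # agrees with PD status in 6 of 8 samples
    "variable_b": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to PD status
}
mcc_a = matthews_corr(features["variable_a"], pd_status)
selected = select_features(features, pd_status)
```
]]></preformat>
<p>The same filtering pattern would apply to the categorical and continuous variables, with Cramer&#x2019;s V or the point-biserial correlation substituted for the Matthews coefficient.</p>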
</sec>
<sec id="s2-4">
<title>2.4 Interpretation using feature importance and expected gradients</title>
<p>The mean decrease in impurity (MDI) feature importance score is one of the methods used with the RF model to measure the relative importance of each input feature (<xref ref-type="bibr" rid="B29">Louppe et al., 2013</xref>). We used the &#x201c;feature_importances_&#x201d; (FI) attribute of the fitted RF model to identify top features, later referred to as &#x201c;RF FI.&#x201d;</p>
<p>The Shapley value is one of the best-known methods for interpreting complex ML models and identifying the most impactful features. We applied the expected gradient (EG) method to the ANN model, later referred to as ANN EG. EG, an extension of the integrated gradient method, has a strong theoretical justification for finding the input features (e.g., SNPs and demographic factors) that contribute most to the model&#x2019;s prediction by approximating the Shapley value (<xref ref-type="bibr" rid="B17">Erion et al., 2021</xref>). It satisfies a set of axioms: implementation invariance, sensitivity, completeness, linearity, and symmetry preservation (<xref ref-type="bibr" rid="B17">Erion et al., 2021</xref>). We implemented EG using the SHAP (SHapley Additive exPlanations) Python package. The SHAP value from EG indicates the overall impact on predictions as well as the directionality of that impact, indicated by positive or negative values. The mean absolute SHAP value for each feature across all of the data highlights the features that matter most for prediction, regardless of their directionality.</p>
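<p>The aggregation step described above (ranking features by their mean absolute attribution) can be sketched as follows. The snippet does not compute expected gradients; it assumes that a per-sample, per-feature attribution matrix has already been obtained from an explainer, and the matrix values and feature names are hypothetical.</p>
<preformat><![CDATA[
```python
def rank_by_mean_abs(attributions, feature_names):
    """Rank features by mean absolute attribution across samples.

    attributions: list of rows, one per sample, each holding one signed
    attribution value per feature (e.g., SHAP values from an explainer).
    """
    n_samples = len(attributions)
    scores = {
        name: sum(abs(row[j]) for row in attributions) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Largest mean |attribution| first; sign (direction) is ignored here.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical attribution matrix: three samples, three features.
names = ["feature_a", "feature_b", "feature_c"]
attributions = [
    [0.30, -0.05, 0.10],
    [-0.20, 0.05, 0.10],
    [0.40, -0.05, -0.10],
]
ranking = rank_by_mean_abs(attributions, names)
```
]]></preformat>
<p>Because the absolute value is taken before averaging, a feature whose attributions are large but alternate in sign (such as feature_a here) still ranks highly.</p>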
</sec>
<sec id="s2-5">
<title>2.5 Gene set enrichment analysis</title>
<p>GSEA was used to identify KEGG pathways significantly associated with PD. We used the minimum <italic>p</italic>-value among all SNPs near a gene to represent the significance of that gene (<xref ref-type="bibr" rid="B58">Wang et al., 2007</xref>). GSEA software was then used to calculate the enrichment score (ES) and false discovery rate (FDR) q-value. The ES is the maximum deviation from zero of the running-sum statistic observed during the walk down the ranked gene list, and the FDR is used to control the rate of false-positive findings when many hypotheses are tested. We used &#x2018;GSEAPreranked,&#x2019; a module of the GSEA software, and provided it with a list of genes ordered by &#x2212;log10 (<italic>p</italic>-value). For multiple hypothesis testing correction, 1,000 random gene-set permutations were carried out. To generate a normalized enrichment score (NES), the ES for each gene set was normalized to account for the size of the gene set, and the FDR was then calculated for each NES.</p>
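<p>To make the gene-level scoring and the running-sum walk concrete, the following sketch maps SNP <italic>p</italic>-values to gene scores using the minimum <italic>p</italic>-value per gene and then computes a weighted enrichment score for one gene set. It is a simplified illustration, not the GSEA software itself: the SNP-to-gene mapping, gene names, <italic>p</italic>-values, and the weight of 1 are assumptions, and permutation-based normalization and FDR estimation are omitted.</p>
<preformat><![CDATA[
```python
import math


def gene_scores(snp_pvalues, snp_to_gene):
    """Score each gene by -log10 of the minimum p-value among its mapped SNPs."""
    min_p = {}
    for snp, p in snp_pvalues.items():
        gene = snp_to_gene[snp]
        min_p[gene] = min(p, min_p.get(gene, 1.0))
    return {gene: -math.log10(p) for gene, p in min_p.items()}


def enrichment_score(scores, gene_set, weight=1.0):
    """Maximum deviation from zero of the weighted running sum over the
    gene list ranked by decreasing score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g in gene_set for g in ranked]
    hit_norm = sum(abs(scores[g]) ** weight for g, h in zip(ranked, hits) if h)
    miss_step = 1.0 / (len(ranked) - sum(hits))
    running, es = 0.0, 0.0
    for g, h in zip(ranked, hits):
        # Step up (weighted by score) on a hit, step down uniformly on a miss.
        running += abs(scores[g]) ** weight / hit_norm if h else -miss_step
        if abs(running) > abs(es):
            es = running
    return es


# Hypothetical SNP p-values and SNP-to-gene mapping.
snp_pvalues = {"snp1": 1e-8, "snp2": 1e-3, "snp3": 1e-6,
               "snp4": 0.5, "snp5": 0.1, "snp6": 0.01}
snp_to_gene = {"snp1": "GENE_A", "snp2": "GENE_A", "snp3": "GENE_B",
               "snp4": "GENE_C", "snp5": "GENE_D", "snp6": "GENE_E"}
scores = gene_scores(snp_pvalues, snp_to_gene)
es = enrichment_score(scores, {"GENE_A", "GENE_B"})
```
]]></preformat>
<p>In this toy case, both members of the gene set sit at the top of the ranked list, so the running sum reaches its maximum after the second gene.</p>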
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>3 Results</title>
<sec id="s3-1">
<title>3.1 Predictive ML models for PD developed from demographic data</title>
<p>We developed a set of ML models to predict the PD status from demographic data. As was described in <xref ref-type="sec" rid="s2">Section 2</xref>, we obtained a short list of 139 demographic variables from the initial 5,700 variables in 33,473 subjects. We developed from the training (80%; <italic>n</italic> &#x3d; 26,765) and validation sets (10%; <italic>n</italic> &#x3d; 3,354) both an RF model and an ANN model; the ANN model was trained using the SGD algorithm (batch size: 8, sigmoid activation functions, learning rate: 0.01, and 16 neurons in one hidden layer). The prediction performance of the final model was evaluated in the unseen test dataset (10%; <italic>n</italic> &#x3d; 3,354) using multiple metrics including AUC, precision, recall, and F1-score (<xref ref-type="table" rid="T1">Table 1</xref>). With this relatively large demographic dataset, both the RF and the ANN showed very good performance in predicting the PD status from the 139 demographic variables: both models achieved a high AUC of 0.89; for precision, the RF had 0.82 while the ANN had 0.81; for recall, the RF had 0.77 while the ANN had 0.79; and for F1-score, the RF had 0.79 while the ANN had 0.80.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Performance metrics of the demographic ML models for predicting PD.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">ML model</th>
<th align="left">AUC</th>
<th align="left">Precision</th>
<th align="left">Recall</th>
<th align="left">F1-score</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">RF</td>
<td align="left">0.89</td>
<td align="left">0.82</td>
<td align="left">0.77</td>
<td align="left">0.79</td>
</tr>
<tr>
<td align="left">ANN</td>
<td align="left">0.89</td>
<td align="left">0.81</td>
<td align="left">0.79</td>
<td align="left">0.80</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To understand the predictive performance of each of the 139 demographic variables, we acquired the feature importance score from the RF demographic model as well as performed EG analysis for the ANN demographic model. From the RF model, we listed the top 14 demographic variables that lead to the highest mean decrease in impurity and, thus, the most important features in predicting PD ranked by the RF model (<xref ref-type="fig" rid="F1">Figure 1A</xref>). Similarly, from the EG analysis for the ANN model, we identified the top 14 demographic variables that show higher mean absolute SHAP values and, thus, more predictive power in predicting PD in the ANN model (<xref ref-type="fig" rid="F1">Figure 1B</xref>). Interestingly, the feature importance and predictability ranked by these two methods from analyzing two different ML models were highly consistent for the top 14 demographic variables, with 12 being overlapped with each other (<xref ref-type="fig" rid="F1">Figures 1A, B</xref>). The three variables of &#x201c;sex,&#x201d; &#x201c;problems in mobility,&#x201d; and &#x201c;problems in activity&#x201d; were ranked within the top 5 predictive variables for PD by both RF FI and ANN EG. Other top variables included constipation, loss of smell, dribbling of saliva, work in last 7 days, urgency to pass urine, engage in household activity, exercise in past 7 days, self-care, difficulty swallowing food or drink, talking or moving about in sleep, and unpleasant sensations in legs (<xref ref-type="fig" rid="F1">Figure 1A</xref>). 
Multiple previous studies together identified most of these or very similar variables as significant variables associated with PD (<xref ref-type="bibr" rid="B34">Nielsen et al., 2017</xref>; <xref ref-type="bibr" rid="B41">Prashanth and Roy, 2018</xref>; <xref ref-type="bibr" rid="B28">Lo et al., 2019</xref>; <xref ref-type="bibr" rid="B49">Shah et al., 2020</xref>; <xref ref-type="bibr" rid="B61">Yu et al., 2022</xref>); this comprehensive list of top demographic/clinical variables identified in our study, based on their capability in predicting PD, adds further support to the influence of these factors in PD prediction. One of the top variables that has not been studied much is &#x201c;unpleasant sensations in legs,&#x201d; which is ranked 14th among all the 139 demographic variables by both RF FI and ANN EG. Notably, the exact question in the online survey for collecting information for this variable is &#x201c;have you experienced unpleasant sensations in your legs at night or while resting, and a feeling that you need to move in the last month?&#x201d; and this question is generally used as the first of the three questions in identifying restless legs syndrome (RLS), which is associated with PD (<xref ref-type="bibr" rid="B59">Wong et al., 2014</xref>). This suggests that the variable &#x201c;unpleasant sensations in legs&#x201d; could be used as a predictor for PD even before people are diagnosed with RLS.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Top predictive demographic or clinical variables for PD. <bold>(A)</bold> Top 14 demographic variables by RF feature importance scores and <bold>(B)</bold> top 14 demographic variables by the EG method.</p>
</caption>
<graphic xlink:href="fgene-14-1230579-g001.tif"/>
</fig>
<p>All the non-overlapping variables between the top 14 lists of the two models were ranked very similarly by RF FI and ANN EG, except for &#x201c;age.&#x201d; Specifically, &#x2018;sleep&#x2019;, &#x2018;engagement in household activity&#x2019;, and &#x2018;difficulty in swallowing food&#x2019; were ranked 12th, 16th, and 17th, respectively, by RF FI, while they were ranked 15th, 9th, and 12th, respectively, by ANN EG; &#x201c;age&#x201d; was ranked first by RF FI and 93rd by ANN EG. The much lower ranking of &#x201c;age&#x201d; by the EG method for the ANN model is likely because the ANN model does not handle a mixture of categorical and continuous variables well, with the EG method being biased for the continuous variable of age.</p>
</sec>
<sec id="s3-2">
<title>3.2 GWAS in the discovery (training) dataset, identifying both novel and well-known variants and genes of significance</title>
<p>To explore significant genomic variants, we applied GWAS to the preprocessed discovery (training) dataset that included 6,868 PD cases and 204 controls with 447,089 SNPs. Males comprised 55% (<italic>n</italic> &#x3d; 3,885) of the discovery dataset, and the remainder were female (<italic>n</italic> &#x3d; 3,187). A quantile&#x2013;quantile plot was constructed for all variants by comparing expected vs. observed genome-wide <italic>p</italic>-values as a quality control for the GWAS analysis (<xref ref-type="fig" rid="F2">Figure 2A</xref>). Using 0.05/447,089 &#x3d; 1.12e-07 as the Bonferroni-corrected significance level for the <italic>p</italic>-value (<xref ref-type="bibr" rid="B43">Ranstam, 2016</xref>), 14 SNPs reached genome-wide significance (<xref ref-type="sec" rid="s12">Supplementary Table S1</xref>). Among these 14 SNPs, two variants, rs76763715 (alias, i4000415), a missense variant in <italic>GBA</italic>, and rs1630500, an intergenic variant in <italic>GBA</italic>, as well as three gene loci, <italic>GBA</italic>, <italic>ARHGEF2</italic>, and <italic>LMNA</italic>, were previously reported for PD association (<xref ref-type="bibr" rid="B44">Reden&#x161;ek et al., 2017</xref>; <xref ref-type="bibr" rid="B18">Ferrari et al., 2018</xref>; <xref ref-type="bibr" rid="B38">Oyston et al., 2018</xref>). The well-known variant rs76763715 showed the most significant association with PD in our GWAS analysis. The regional association plot revealed that within a &#xb1;400-kb window, several significant SNPs (green) on chromosome 1 had at least a moderate squared correlation (<italic>r</italic><sup>2</sup> &#x2265; 0.2) with rs76763715 (purple) (<xref ref-type="fig" rid="F2">Figure 2B</xref>).</p>
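<p>The Bonferroni threshold quoted above follows directly from dividing the family-wise significance level by the number of tested SNPs, as the short worked example below shows. The SNP identifiers and <italic>p</italic>-values in the filtering step are hypothetical, and the use of a strict inequality is an assumption.</p>
<preformat><![CDATA[
```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error at alpha."""
    return alpha / n_tests


# 447,089 SNPs were tested in the discovery set.
threshold = bonferroni_threshold(0.05, 447_089)
formatted = f"{threshold:.2e}"

# Hypothetical p-values, used only to illustrate the filtering step.
pvalues = {"snp_1": 3.0e-9, "snp_2": 5.0e-7, "snp_3": 1.0e-7}
significant = [snp for snp, p in pvalues.items() if p < threshold]
```
]]></preformat>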
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>GWAS in the discovery dataset. <bold>(A)</bold> Quantile&#x2013;quantile (QQ) plot of observed against expected <italic>p</italic>-values from the genome-wide association analysis and <bold>(B)</bold> a regional association plot of the rs76763715 locus.</p>
</caption>
<graphic xlink:href="fgene-14-1230579-g002.tif"/>
</fig>
<p>In addition to the previously reported variants or gene loci, our GWAS analysis also identified novel and significant variants or gene loci that could have a potential influence on PD. Three novel intron variants in the PD-associated <italic>LMNA</italic> gene locus were among the 14 significant SNPs, with <italic>p</italic>-values smaller than 4.0e-21 (<xref ref-type="sec" rid="s12">Supplementary Table S1</xref>); this finding further supports previous reports on the involvement of <italic>LMNA</italic> in PD. A novel missense variant in the <italic>SEMA4A</italic> locus was identified with a very low <italic>p</italic>-value of 1.11e-26 (<xref ref-type="sec" rid="s12">Supplementary Table S1</xref>). <italic>SEMA4A</italic> encodes one class of semaphorin, a protein family that is often involved in immune responses and neurological diseases (<xref ref-type="bibr" rid="B55">Takegahara and Kumanogoh, 2010</xref>). For example, the SNP rs7702187 within <italic>SEMA5A</italic> (encoding another class of semaphorins) was associated with PD (<xref ref-type="bibr" rid="B13">Clarimon et al., 2006</xref>). Other novel and significant gene loci (<xref ref-type="sec" rid="s12">Supplementary Table S1</xref>) included <italic>TRIM46</italic>, <italic>ASH1L</italic>, <italic>PBXIP1</italic>, <italic>RIT1</italic>, and <italic>PMF1-BGLAP</italic>. The <italic>RIT1</italic> gene belongs to the Ras family related to neurodegenerative disorders (<xref ref-type="bibr" rid="B42">Qu et al., 2019</xref>).</p>
</sec>
<sec id="s3-3">
<title>3.3 Predictive ML models for PD developed using genetic data</title>
<p>Based on the GWAS results from the discovery (training) set, we further evaluated the capability of the top SNPs with the lowest <italic>p</italic>-values in predicting PD in an unseen test dataset. Within this context, GWAS served as a feature selection method for building our PD-predicting ML models. We experimented with various <italic>p</italic>-value thresholds (i.e., different numbers of top SNPs with the lowest <italic>p</italic>-values) and assessed model performance using an independent validation set. Among the tested thresholds, the <italic>p</italic>-value threshold of 1e-5, which retained the top 37 SNPs (<xref ref-type="sec" rid="s12">Supplementary Table S1</xref>), provided the best model performance (i.e., the highest AUC) in the validation set. This threshold is also commonly used for selecting SNPs in the development of PRS (<xref ref-type="bibr" rid="B11">Choi et al., 2020</xref>). Among these 37 SNPs, three (rs76763715, rs1630500, and rs2049805; <xref ref-type="table" rid="T2">Table 2</xref>; <xref ref-type="sec" rid="s12">Supplementary Table S1</xref>) were previously published as PD variants in other studies (<xref ref-type="bibr" rid="B27">Liu et al., 2011</xref>; <xref ref-type="bibr" rid="B56">Vacic et al., 2014</xref>; <xref ref-type="bibr" rid="B15">Davis et al., 2016</xref>). We further performed LD pruning using the &#x2018;corr&#x2019; (correlation coefficient) method on the 37 SNPs and acquired 15 independent SNPs with a correlation coefficient threshold of 0.2. <xref ref-type="table" rid="T2">Table 2</xref> lists these 15 independent SNPs and their nearest gene loci and variant types, minor-allele frequencies (MAFs), GWAS ranks, <italic>p</italic>-values, beta coefficients, and standard errors (SEs).</p>
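<p>The idea behind correlation-based LD pruning can be sketched as a greedy filter. The Python example below shows one common strategy and is not necessarily the exact procedure of the &#x2018;corr&#x2019; method used in this study; the function name, SNP names, and correlation values are hypothetical and serve only as an illustration.</p>
<preformat>
```python
# Illustrative sketch only: greedy correlation-based pruning of ranked SNPs.
# This is an assumed strategy, not the exact tool or method used in the study.
def prune_by_correlation(snp_ids, corr, threshold=0.2):
    """Keep a SNP only if its absolute correlation with every kept SNP
    does not exceed the threshold.

    snp_ids must already be sorted by ascending GWAS p-value;
    corr maps frozenset((a, b)) to the pairwise correlation coefficient.
    """
    kept = []
    for snp in snp_ids:
        correlated = any(
            abs(corr[frozenset((snp, other))]) > threshold for other in kept
        )
        if not correlated:
            kept.append(snp)
    return kept


# Hypothetical SNP names and correlations, for illustration only.
ranked = ["snpA", "snpB", "snpC"]
corr = {
    frozenset(("snpA", "snpB")): 0.8,
    frozenset(("snpA", "snpC")): 0.05,
    frozenset(("snpB", "snpC")): 0.1,
}
pruned = prune_by_correlation(ranked, corr)
print(pruned)  # prints ['snpA', 'snpC']
```
</preformat>
<p>Processing SNPs in order of increasing <italic>p</italic>-value means that, within each correlated group, the most strongly associated SNP is the one retained.</p>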
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Gene loci of potential influence on PD.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">Variant</th>
<th align="center">Gene: variant type</th>
<th align="center">MAF</th>
<th align="center">GWAS rank</th>
<th align="center">GWAS <italic>p</italic>-value</th>
<th align="center">GWAS beta</th>
<th align="center">GWAS SE</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">rs76763715<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
<td align="center">
<italic>GBA</italic>: missense variant</td>
<td align="center">0.016</td>
<td align="center">1</td>
<td align="center">3.03E-90</td>
<td align="center">&#x2212;4.113</td>
<td align="center">0.204</td>
</tr>
<tr>
<td align="center">rs1749409</td>
<td align="center">
<italic>RIT1</italic>: intron variant</td>
<td align="center">0.091</td>
<td align="center">7</td>
<td align="center">2.55E-34</td>
<td align="center">&#x2212;1.824</td>
<td align="center">0.149</td>
</tr>
<tr>
<td align="center">rs1800247</td>
<td align="center">
<italic>PMF1-BGLAP</italic>: intron variant</td>
<td align="center">0.212</td>
<td align="center">12</td>
<td align="center">8.47E-10</td>
<td align="center">&#x2212;0.898</td>
<td align="center">0.146</td>
</tr>
<tr>
<td align="center">i709741</td>
<td align="center">None</td>
<td align="center">0.107</td>
<td align="center">15</td>
<td align="center">3.39E-07</td>
<td align="center">&#x2212;0.777</td>
<td align="center">0.152</td>
</tr>
<tr>
<td align="center">rs11264300</td>
<td align="center">
<italic>DCST1</italic>: missense variant</td>
<td align="center">0.366</td>
<td align="center">17</td>
<td align="center">8.31E-07</td>
<td align="center">&#x2212;0.836</td>
<td align="center">0.170</td>
</tr>
<tr>
<td align="center">rs4072037</td>
<td align="center">
<italic>MUC1</italic>: synonymous variant</td>
<td align="center">0.472</td>
<td align="center">22</td>
<td align="center">1.30E-06</td>
<td align="center">&#x2212;1.065</td>
<td align="center">0.220</td>
</tr>
<tr>
<td align="center">rs75337321</td>
<td align="center">
<italic>CACNA2D3</italic>: intron variant</td>
<td align="center">0.061</td>
<td align="center">26</td>
<td align="center">1.64E-06</td>
<td align="center">&#x2212;0.830</td>
<td align="center">0.173</td>
</tr>
<tr>
<td align="center">rs17377936</td>
<td align="center">None</td>
<td align="center">0.434</td>
<td align="center">27</td>
<td align="center">2.34E-06</td>
<td align="center">0.682</td>
<td align="center">0.145</td>
</tr>
<tr>
<td align="center">rs58519469</td>
<td align="center">
<italic>NTRK1</italic>: intron variant</td>
<td align="center">0.042</td>
<td align="center">28</td>
<td align="center">2.51E-06</td>
<td align="center">&#x2212;0.896</td>
<td align="center">0.190</td>
</tr>
<tr>
<td align="center">rs111408331</td>
<td align="center">None</td>
<td align="center">0.034</td>
<td align="center">31</td>
<td align="center">3.45E-06</td>
<td align="center">&#x2212;0.944</td>
<td align="center">0.203</td>
</tr>
<tr>
<td align="center">rs79372348</td>
<td align="center">None</td>
<td align="center">0.032</td>
<td align="center">32</td>
<td align="center">4.08E-06</td>
<td align="center">&#x2212;1.007</td>
<td align="center">0.219</td>
</tr>
<tr>
<td align="center">rs186852039</td>
<td align="center">
<italic>GBA2</italic>: intron variant</td>
<td align="center">0.033</td>
<td align="center">33</td>
<td align="center">4.09E-06</td>
<td align="center">&#x2212;0.921</td>
<td align="center">0.200</td>
</tr>
<tr>
<td align="center">rs11772125</td>
<td align="center">
<italic>AMZ1</italic>: intron variant</td>
<td align="center">0.069</td>
<td align="center">34</td>
<td align="center">4.30E-06</td>
<td align="center">&#x2212;0.799</td>
<td align="center">0.174</td>
</tr>
<tr>
<td align="center">rs11584630</td>
<td align="center">
<italic>KCNN3</italic>: intron variant</td>
<td align="center">0.352</td>
<td align="center">35</td>
<td align="center">4.58E-06</td>
<td align="center">&#x2212;0.747</td>
<td align="center">0.163</td>
</tr>
<tr>
<td align="center">rs72792300</td>
<td align="center">
<italic>ALK</italic>: intron variant</td>
<td align="center">0.015</td>
<td align="center">37</td>
<td align="center">7.46E-06</td>
<td align="center">&#x2212;1.178</td>
<td align="center">0.263</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn1">
<label>
<sup>a</sup>
</label>
<p>Symbol next to variant ID indicates previously reported SNPs.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>We used these 15 SNPs as the input features to train several ML models, including SVM, RF, LR, and ANN models. We tuned the hyperparameters for all four models based on their prediction performance in the validation set. In particular, the ANN model was trained using the stochastic gradient descent algorithm, with a batch size of 8, sigmoid activation functions, and a learning rate of 0.01. A three-layered feed-forward ANN was used, consisting of one input layer, one hidden layer with four neurons, and one output layer. Test set performance metrics for all the developed ML models are listed in <xref ref-type="table" rid="T3">Table 3</xref>. As expected, when utilizing 15 randomly selected SNPs as the input features, the developed ANN model produced poor results, with an AUC of 0.50 and an F1-score of 0.49. When using the 15 independent SNPs identified by GWAS, the prediction performance of all the developed ML models (SVM, LR, RF, and ANN) greatly improved, with much higher AUCs and F1-scores. Among these, the ANN model performed the best overall, with the highest AUC of 0.74 and an F1-score of 0.64. We also derived a PRS from the 15 independent SNPs for each subject as the sum of their minor-allele SNP values, weighted by the log of their specific odds ratio from the GWAS analysis. The LR and ANN models developed from this single input feature of the PRS showed similar performance (the same AUC of 0.78, with an F1-score of 0.68 for ANN_PRS and 0.67 for LR_PRS), as expected. Compared with the best-performing ANN model using 15 independent SNPs, the ANN model using the PRS as its single input feature had better performance in PD prediction, with an improved AUC of 0.78 and a higher F1-score of 0.68; this is likely because the weights of these 15 SNPs used in calculating the PRS provided additional and useful information to help predict PD.</p>
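<p>The PRS definition given above (a sum of minor-allele counts weighted by the log of the odds ratio) can be sketched in a few lines of Python. The function name, odds ratios, and genotype in this example are hypothetical and are not values from this study.</p>
<preformat>
```python
# Illustrative sketch only: polygenic risk score as a log-odds-weighted sum.
# The odds ratios and genotype below are made up for demonstration.
import math


def polygenic_risk_score(minor_allele_counts, odds_ratios):
    """Sum of minor-allele counts (0, 1, or 2) weighted by log(odds ratio)."""
    if len(minor_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(
        count * math.log(odds)
        for count, odds in zip(minor_allele_counts, odds_ratios)
    )


odds_ratios = [2.0, 0.5, 1.0]  # hypothetical per-SNP odds ratios
genotype = [1, 2, 0]           # hypothetical minor-allele counts for one subject
score = polygenic_risk_score(genotype, odds_ratios)
print(round(score, 4))  # prints -0.6931
```
</preformat>
<p>In this sketch, an odds ratio of 1.0 contributes nothing to the score, and odds ratios below 1.0 lower it.</p>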
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Performance metrics of the genetic ML models for predicting PD.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="center">ML model</th>
<th align="center">AUC</th>
<th align="center">Precision</th>
<th align="center">Recall</th>
<th align="center">F1-score</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center">ANN_Random</td>
<td align="center">0.50</td>
<td align="center">0.49</td>
<td align="center">0.50</td>
<td align="center">0.49</td>
</tr>
<tr>
<td align="center">SVM</td>
<td align="center">0.67</td>
<td align="center">0.58</td>
<td align="center">0.70</td>
<td align="center">0.60</td>
</tr>
<tr>
<td align="center">LR</td>
<td align="center">0.68</td>
<td align="center">0.60</td>
<td align="center">0.72</td>
<td align="center">0.64</td>
</tr>
<tr>
<td align="center">RF</td>
<td align="center">0.68</td>
<td align="center">0.57</td>
<td align="center">0.65</td>
<td align="center">0.59</td>
</tr>
<tr>
<td align="center">ANN</td>
<td align="center">0.74</td>
<td align="center">0.69</td>
<td align="center">0.61</td>
<td align="center">0.64</td>
</tr>
<tr>
<td align="center">ANN_PRS</td>
<td align="center">0.78</td>
<td align="center">0.65</td>
<td align="center">0.72</td>
<td align="center">0.68</td>
</tr>
<tr>
<td align="center">LR_PRS</td>
<td align="center">0.78</td>
<td align="center">0.63</td>
<td align="center">0.73</td>
<td align="center">0.67</td>
</tr>
<tr>
<td align="center">ANN_Combined</td>
<td align="center">0.78</td>
<td align="center">0.66</td>
<td align="center">0.74</td>
<td align="center">0.69</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In addition to developing these genetic models, we further developed a combined ANN model using both genetic and demographic data. For this combined dataset, we had 12,070 subjects with 15 SNPs and 139 demographic variables as input features. We trained this combined ANN model similarly to the genetic ANN model. Interestingly, with both genetic and demographic variables, the predictive performance of the ANN model improved over that of the SNP-only ANN model (AUC of 0.74 and F1-score of 0.64), reaching an AUC of 0.78 and an F1-score of 0.69 (<xref ref-type="table" rid="T3">Table 3</xref>).</p>
<p>The EG method was used to determine the top predictive SNPs in the ANN model, whereas the feature importance score was used in the RF model. Among the top seven SNPs ranked by each method (roughly the top half of the 15 SNPs), five were shared between the two sets, suggesting a degree of agreement between the results generated by the two different methods applied to the two different ML models. The missense variant rs76763715 located inside <italic>GBA</italic> was ranked first by RF FI and third by ANN EG, suggesting its high influence on PD prediction. This is consistent with the evidence that it is the most significant SNP with the lowest <italic>p</italic>-value in our GWAS analysis, and its association with PD has been validated in different studies. The intron variant rs1749409 in the <italic>RIT1</italic> gene, which was ranked seventh by GWAS <italic>p</italic>-value (2.55e-34), was ranked second by RF FI and first by ANN EG for its magnitude in influencing the PD prediction.</p>
<p>In addition to providing further PD prediction evidence for some of the significant GWAS SNPs, RF FI and ANN EG also identified variants with decent predictive capability that were missed by GWAS (i.e., not reaching the significance level after the Bonferroni correction). The missense variant rs11264300 located in the <italic>DCST1</italic> gene was ranked 17th by GWAS (only the top 14 are significant) and third and fourth by RF FI and ANN EG, respectively (<xref ref-type="fig" rid="F3">Figures 3A, B</xref>). Interestingly, a previous study identified an SNP in this <italic>DCST1</italic> gene as one of the most relevant PD polygenic risk score SNPs (<xref ref-type="bibr" rid="B22">Koch et al., 2021</xref>). These results suggest that this missense variant and the <italic>DCST1</italic> gene may have an important role in the development and progression of Parkinson&#x2019;s disease. Similarly, RF FI identified rs11584630, an intron variant in the <italic>KCNN3</italic> gene, as a highly predictive variant (ranked fifth) for PD; interestingly, <italic>KCNN3</italic> was previously reported to be associated with PD pathogenesis (<xref ref-type="bibr" rid="B50">Simunovic et al., 2010</xref>).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Top predictive genetic variants for PD. <bold>(A)</bold> The predictability of the 15 independent variants in the RF model explained by the feature importance scores and <bold>(B)</bold> the predictability of the 15 independent variants in the ANN model suggested by the SHAP values.</p>
</caption>
<graphic xlink:href="fgene-14-1230579-g003.tif"/>
</fig>
</sec>
<sec id="s3-4">
<title>3.4 Gene set enrichment analysis identified pathways associated with PD</title>
<p>We used GWAS-based pathway analysis to further examine the potential PD pathways from the ranked gene list obtained by our GWAS analysis. GSEA was used to identify KEGG pathways significantly associated with PD. We used the minimum <italic>p</italic>-value among all SNPs near a gene to represent the significance of that gene (<xref ref-type="bibr" rid="B58">Wang et al., 2007</xref>). Initially, 166 gene sets (i.e., pathways) were identified by GSEA (<xref ref-type="fig" rid="F4">Figure 4A</xref>) from which a total of 17 gene sets with relatively high NES were considered significant, reaching both the FDR (&#x3c;0.25) and nominal <italic>p</italic>-value (&#x3c;0.05) threshold (<xref ref-type="sec" rid="s12">Supplementary Table S2</xref>). <xref ref-type="fig" rid="F4">Figure 4B</xref> showed the top 12 (ranked by NES) significant pathways and their statistics, including the number of core enrichment genes, gene ratio, and nominal <italic>p</italic>-value. Gene ratio was calculated using the count of core enrichment genes divided by the count of pathway genes, whereas core enrichment genes were those that contribute most significantly (indicated by their <italic>p</italic>-values) to the observed enrichment of the gene set. Among the top 12 pathways, we identified six pathways whose functions were previously reported to be associated with PD (marked with a &#x2018;&#x2a;&#x2019; in <xref ref-type="fig" rid="F4">Figure 4B</xref>): long-term depression, gap junction, long-term potentiation, axon guidance, calcium signaling pathway, and tight junction. One study found that corticostriatal long-term potentiation (LTP) and long-term depression (LTD) were altered in PD models (<xref ref-type="bibr" rid="B5">Calabresi et al., 2007</xref>). <xref ref-type="bibr" rid="B46">Schwab et al. (2014)</xref> showed that the gap junction protein Cx36 was upregulated in PD patients. 
Variations in axon guidance pathway genes were predictive of three PD outcomes (<xref ref-type="bibr" rid="B26">Lin et al., 2009</xref>). <xref ref-type="bibr" rid="B6">Cal&#xec; et al. (2014)</xref> observed that calcium signaling was one of the earliest events in the pathogenesis of PD. The tight junction proteins occludin and ZO-1 were associated with the mouse model of Parkinson&#x2019;s disease (<xref ref-type="bibr" rid="B9">Chen et al., 2008</xref>). The aforementioned literature supports the PD association for half of the top 12 GSEA pathways; this further strengthens the potential involvement of our top GWAS SNPs (or gene loci) in PD. In addition to these six previously reported pathways, our GSEA also identified novel pathways potentially associated with PD. Interestingly, although there is not yet strong evidence in the literature for their involvement in PD, three of the six novel pathways, namely vascular smooth muscle (VSM) contraction, extracellular matrix (ECM) receptor interaction, and gonadotropin-releasing hormone (GnRH) signaling, were reported to be linked with other neural diseases such as Alzheimer&#x2019;s disease (AD), which lends support to the validity of our findings. For instance, the dysfunction of VSM cells (whose activity and responsiveness determine the dynamics of VSM contraction) was found to contribute to AD development by promoting neuroinflammation and Tau hyperphosphorylation (<xref ref-type="bibr" rid="B1">Aguilar-Pineda et al., 2021</xref>). Similarly, significant changes in ECM components occur during the early stages of AD (<xref ref-type="bibr" rid="B3">Anwar et al., 2022</xref>). Furthermore, increased mRNA levels of GnRH and its receptor were observed in plaque-bearing AD transgenic mice (<xref ref-type="bibr" rid="B37">Nuruddin et al., 2014</xref>). 
These pathways may serve as common pathways involved in different types of neural diseases, such as AD and PD.</p>
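<p>The gene ratio and the significance criteria described above can be expressed as a short Python sketch. The function names and the example counts are hypothetical; only the cutoffs (FDR below 0.25 and nominal <italic>p</italic>-value below 0.05) are taken from the text.</p>
<preformat>
```python
# Illustrative sketch only: gene ratio and significance filter for GSEA results.
# Function names and example numbers are assumptions made for this example.
def gene_ratio(n_core_enrichment_genes, n_pathway_genes):
    """Count of core enrichment genes divided by the count of pathway genes."""
    if not n_pathway_genes > 0:
        raise ValueError("n_pathway_genes must be positive")
    return n_core_enrichment_genes / n_pathway_genes


def is_significant(fdr_q, nominal_p, fdr_cutoff=0.25, p_cutoff=0.05):
    """True when both the FDR q-value and the nominal p-value fall below
    their respective cutoffs."""
    return fdr_cutoff > fdr_q and p_cutoff > nominal_p


print(gene_ratio(12, 48))          # prints 0.25
print(is_significant(0.10, 0.01))  # prints True
print(is_significant(0.30, 0.01))  # prints False
```
</preformat>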
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Gene set enrichment analysis. <bold>(A)</bold> Normalized enrichment score vs. significance plot. The red line represents the nominal <italic>p</italic>-value for gene sets, while the blue line represents the FDR <italic>q</italic>-value for gene sets. The yellow rectangle indicates the FDR levels within 0.25, and the green rectangle indicates the nominal <italic>p</italic>-value levels within 0.05; <bold>(B)</bold> a dot plot of top KEGG pathways. The size of each dot represents the number of core enrichment genes, while the color represents the nominal <italic>p</italic>-value. The names with the &#x2a; symbol represent the pathways previously associated with PD, while the others represent the potentially novel pathways associated with PD.</p>
</caption>
<graphic xlink:href="fgene-14-1230579-g004.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>4 Discussion</title>
<p>In this paper, we utilized a large collection of demographic and clinical variables together with the corresponding genomic data from the Fox Insight online study. We identified both novel and well-known demographic and genomic factors via correlation and GWAS analyses. From the top demographic and genomic factors, we further developed and compared a variety of ML models for predicting PD using demographic features alone, genomic features alone, and combined features considering both demographic and genomic factors. To understand the importance and predictability of the demographic and genomic factors, we performed EG analysis for the ANN demographic and genetic models and acquired feature importance scores from the RF demographic and genetic models. These input feature analyses, not yet widely adopted in the PD domain, allowed us to interpret the ML models and identify the most predictive demographic and genomic factors for PD. Finally, we applied GSEA based on our GWAS results and found both novel and previously reported PD pathways.</p>
<p>In the relatively large demographic dataset, both RF and ANN models did well, with the same AUC of 0.89 and similar F1-scores of 0.79 and 0.80, respectively. The large overlap of the top 14 demographic variables ranked by RF and ANN using two different feature analysis methods (RF feature importance and EG) strongly suggested the robustness of the models as well as the importance of these top demographic variables in PD prediction. As another line of evidence, most of these top demographic variables were reported previously for their association with PD. In the relatively small genetic dataset, the ANN model performed the best, and it also revealed the influence of each SNP feature on PD prediction; when additional demographic features were included in the ANN model, the AUC and F1-score further increased to 0.78 and 0.69, respectively. The top predictive demographic and genetic features, together with the developed ML models, can potentially be used in the clinical setting to predict the PD risk before its onset for early intervention.</p>
<p>In this study, we designed our experiments to avoid potential overfitting and to ensure a fair evaluation and analysis of the ML models as follows: 1) we strictly tuned hyperparameters for all the ML models based on their performance on a separate validation set and further evaluated the performance of the final model on an unseen test set; 2) we performed random bootstrapping for the control (non-PD) samples to get a more balanced dataset for the training set only, to avoid potential information leakage and overfitting in the validation or test set; 3) in addition to looking at the evaluation metric of AUC, we also examined the precision, recall, and F1-score of all the ML models for a more comprehensive and less biased evaluation; and 4) we developed and compared different ML models for PD prediction, where we used two different methods (RF feature importance and EG) to analyze and understand the feature importance from two different models (RF and ANN). Despite this rigorous experimental design, it would be ideal to obtain additional data and further validate these ML models in an independent study.</p>
<p>Through correlation analysis, GWAS, and feature importance analysis, we identified both novel and well-known demographic and genetic factors related to PD. For example, in our GWAS analysis, we identified well-known variants in the <italic>GBA</italic> gene, which encodes the glucocerebrosidase enzyme implicated in Gaucher&#x2019;s disease, a lysosomal storage disorder. It has been established that lysosomal dysfunction, associated with <italic>GBA</italic> gene mutations, is linked to neurodegeneration and, particularly, to Parkinson&#x2019;s disease (<xref ref-type="bibr" rid="B33">Navarro-Romero et al., 2020</xref>). Our findings reinforce the importance of the <italic>GBA</italic> gene and lysosomal pathways in the pathophysiology of PD. Other than identifying well-known variants within <italic>GBA</italic>, our GWAS analysis also identified several novel and significant variants and gene loci; among these, three novel intron variants in <italic>LMNA</italic> (<italic>p</italic>-values smaller than 4.0e-21) and one novel missense variant in <italic>SEMA4A</italic> (<italic>p</italic>-value &#x3d; 1.11e-26) are particularly interesting, since <italic>LMNA</italic> and semaphorins were reported to be associated with PD by other studies. The gene <italic>SEMA4A</italic> had previously been linked to Th17 cell-mediated neuroinflammation (<xref ref-type="bibr" rid="B23">Koda et al., 2020</xref>). Given that neuroinflammation is a well-recognized component of PD pathology, our findings suggest a potential role of <italic>SEMA4A</italic> in the progression of PD, potentially via modulating neuroinflammatory processes. Our feature importance analysis from the PD-predicting ANN and RF models provided another set of evidence to show the capability of the variants in predicting PD. 
These analyses highlighted some of the significant variants identified by GWAS, such as the well-known missense variant rs76763715 located inside <italic>GBA</italic> and the intron variant rs1749409 in the <italic>RIT1</italic> gene, both of which were ranked within the top three most predictive variants by both RF FI and ANN EG; these ML feature importance analyses also helped identify rs11264300, a missense variant in the <italic>DCST1</italic> gene, and rs11584630, an intron variant in the <italic>KCNN3</italic> gene. Although these two variants did not reach GWAS significance, their corresponding genes were reported to be associated with PD by other studies (<xref ref-type="bibr" rid="B50">Simunovic et al., 2010</xref>; <xref ref-type="bibr" rid="B22">Koch et al., 2021</xref>). Overall, this coupling of ML approaches with the GWAS analysis is beneficial in validating the significance of GWAS-identified PD variants with additional PD prediction evidence and identifying potential PD variants that could have been missed by GWAS due to limited power.</p>
</sec>
<sec sec-type="conclusion" id="s5">
<title>5 Conclusion</title>
<p>In summary, by performing GWAS analysis coupled with ML approaches, we identified impactful demographic and genomic factors and developed ML models that may help predict PD. The new loci identified from GWAS or ML input feature importance analysis warrant further investigation.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The data analyzed in this study were obtained from the Fox Insight study via the Fox Insight Data Exploration Network (Fox DEN; <ext-link ext-link-type="uri" xlink:href="https://foxden.michaeljfox.org/insight/explore/insight.jsp">https://foxden.michaeljfox.org/insight/explore/insight.jsp</ext-link>), and the following licenses/restrictions were applied: qualified researchers may apply for access to Fox Insight datasets. Requests to access these datasets should be directed to Fox DEN: <ext-link ext-link-type="uri" xlink:href="https://foxden.michaeljfox.org/insight/register/genetic">https://foxden.michaeljfox.org/insight/register/genetic</ext-link>.</p>
</sec>
<sec id="s7">
<title>Ethics statement</title>
<p>Ethical review and approval were not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent from the patients/participants or patients/participants&#x2019; legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.</p>
</sec>
<sec id="s8">
<title>Author contributions</title>
<p>JL conceived and designed the experiments; MR carried out the data collection and modeling; and JL and MR analyzed the results and wrote the paper. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>This research was funded in part by the grant K01HL161538 from the National Heart, Lung, and Blood Institute (NHLBI).</p>
</sec>
<ack>
<p>The authors appreciate the shared data from the Fox Insight online study sponsored by the Michael J. Fox Foundation (MJFF).</p>
</ack>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s12">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2023.1230579/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2023.1230579/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="Table2.xlsx" id="SM1" mimetype="application/xlsx" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table1.xlsx" id="SM2" mimetype="application/xlsx" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aguilar-Pineda</surname>
<given-names>J. A.</given-names>
</name>
<name>
<surname>Vera-Lopez</surname>
<given-names>K. J.</given-names>
</name>
<name>
<surname>Shrivastava</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Ch&#xe1;vez-Fumagalli</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Nieto-Montesinos</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Alvarez-Fernandez</surname>
<given-names>K. L.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Vascular smooth muscle cell dysfunction contribute to neuroinflammation and Tau hyperphosphorylation in Alzheimer disease</article-title>. <source>iScience</source> <volume>24</volume>, <fpage>102993</fpage>. <pub-id pub-id-type="doi">10.1016/j.isci.2021.102993</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akoglu</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>User&#x2019;s guide to correlation coefficients</article-title>. <source>Turk J. Emerg. Med.</source> <volume>18</volume>, <fpage>91</fpage>&#x2013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1016/j.tjem.2018.08.001</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Anwar</surname>
<given-names>M. M.</given-names>
</name>
<name>
<surname>&#xd6;zkan</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>G&#xfc;rsoy-&#xd6;zdemir</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>The role of extracellular matrix alterations in mediating astrocyte damage and pericyte dysfunction in Alzheimer&#x2019;s disease: A comprehensive review</article-title>. <source>Eur. J. Neurosci.</source> <volume>56</volume>, <fpage>5453</fpage>&#x2013;<lpage>5475</lpage>. <pub-id pub-id-type="doi">10.1111/ejn.15372</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blauwendraat</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Nalls</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Singleton</surname>
<given-names>A. B.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>The genetic architecture of Parkinson&#x2019;s disease</article-title>. <source>Lancet Neurol.</source> <volume>19</volume>, <fpage>170</fpage>&#x2013;<lpage>178</lpage>. <pub-id pub-id-type="doi">10.1016/S1474-4422(19)30287-X</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Calabresi</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Galletti</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Saggese</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Ghiglieri</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Picconi</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Neuronal networks and synaptic plasticity in Parkinson&#x2019;s disease: beyond motor deficits</article-title>. <source>Park. Relat. Disord.</source> <volume>13</volume>, <fpage>S259</fpage>&#x2013;<lpage>S262</lpage>. <pub-id pub-id-type="doi">10.1016/S1353-8020(08)70013-0</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cal&#xec;</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ottolini</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Brini</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Calcium signaling in Parkinson&#x2019;s disease</article-title>. <source>Cell Tissue Res.</source> <volume>357</volume>, <fpage>439</fpage>&#x2013;<lpage>454</lpage>. <pub-id pub-id-type="doi">10.1007/s00441-014-1866-0</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chairta</surname>
<given-names>P. P.</given-names>
</name>
<name>
<surname>Hadjisavvas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Georgiou</surname>
<given-names>A. N.</given-names>
</name>
<name>
<surname>Loizidou</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Yiangou</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Demetriou</surname>
<given-names>C. A.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Prediction of Parkinson&#x2019;s disease risk based on genetic profile and established risk factors</article-title>. <source>Genes (Basel)</source> <volume>12</volume>, <fpage>1278</fpage>. <pub-id pub-id-type="doi">10.3390/genes12081278</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Nalls</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Hallgr&#xed;msd&#xf3;ttir</surname>
<given-names>I. B.</given-names>
</name>
<name>
<surname>Hunkapiller</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Van Der Brug</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>A meta-analysis of genome-wide association studies identifies 17 new Parkinson&#x2019;s disease risk loci</article-title>. <source>Nat. Genet.</source> <volume>49</volume>, <fpage>1511</fpage>&#x2013;<lpage>1516</lpage>. <pub-id pub-id-type="doi">10.1038/ng.3955</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Lan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Roche</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Geiger</surname>
<given-names>J. D.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Caffeine protects against MPTP&#x2010;induced blood&#x2010;brain barrier dysfunction in mouse striatum</article-title>. <source>J. Neurochem.</source> <volume>107</volume>, <fpage>1147</fpage>&#x2013;<lpage>1157</lpage>. <pub-id pub-id-type="doi">10.1111/j.1471-4159.2008.05697.x</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chicco</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>T&#xf6;tsch</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Jurman</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation</article-title>. <source>BioData Min.</source> <volume>14</volume>, <fpage>13</fpage>. <pub-id pub-id-type="doi">10.1186/s13040-021-00244-z</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>S. W.</given-names>
</name>
<name>
<surname>Mak</surname>
<given-names>T. S. H.</given-names>
</name>
<name>
<surname>O&#x2019;Reilly</surname>
<given-names>P. F.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Tutorial: A guide to performing polygenic risk score analyses</article-title>. <source>Nat. Protoc.</source> <volume>15</volume>, <fpage>2759</fpage>&#x2013;<lpage>2772</lpage>. <pub-id pub-id-type="doi">10.1038/s41596-020-0353-1</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Chollet</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>keras. GitHub</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/fchollet/keras">https://github.com/fchollet/keras</ext-link>
</comment>.</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clarimon</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Scholz</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fung</surname>
<given-names>H. C.</given-names>
</name>
<name>
<surname>Hardy</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Eerola</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hellstr&#xf6;m</surname>
<given-names>O.</given-names>
</name>
<etal/>
</person-group> (<year>2006</year>). <article-title>Conflicting results regarding the semaphorin gene (SEMA5A) and the risk for Parkinson disease</article-title>. <source>Am. J. Hum. Genet.</source> <volume>78</volume>, <fpage>1082</fpage>&#x2013;<lpage>1084</lpage>. <pub-id pub-id-type="doi">10.1086/504727</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dauer</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Przedborski</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Parkinson&#x2019;s disease: mechanisms and models</article-title>. <source>Neuron</source> <volume>39</volume>, <fpage>889</fpage>&#x2013;<lpage>909</lpage>. <pub-id pub-id-type="doi">10.1016/S0896-6273(03)00568-3</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davis</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Andruska</surname>
<given-names>K. M.</given-names>
</name>
<name>
<surname>Benitez</surname>
<given-names>B. A.</given-names>
</name>
<name>
<surname>Racette</surname>
<given-names>B. A.</given-names>
</name>
<name>
<surname>Perlmutter</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Cruchaga</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Variants in GBA, SNCA, and MAPT influence Parkinson disease risk, age at onset, and progression</article-title>. <source>Neurobiol. Aging</source> <volume>37</volume>, <fpage>209.e1</fpage>&#x2013;<lpage>209.e5</lpage>. <pub-id pub-id-type="doi">10.1016/j.neurobiolaging.2015.09.014</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dehestani</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Sreelatha</surname>
<given-names>A. A. K.</given-names>
</name>
<name>
<surname>Schulte</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bansal</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Gasser</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Mitochondrial and autophagy-lysosomal pathway polygenic risk scores predict Parkinson&#x2019;s disease</article-title>. <source>Mol. Cell. Neurosci.</source> <volume>121</volume>, <fpage>103751</fpage>. <pub-id pub-id-type="doi">10.1016/j.mcn.2022.103751</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Erion</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Janizek</surname>
<given-names>J. D.</given-names>
</name>
<name>
<surname>Sturmfels</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Lundberg</surname>
<given-names>S. M.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S-I.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Improving performance of deep learning models with axiomatic attribution priors and expected gradients</article-title>. <source>Nat. Mach. Intell.</source> <volume>3</volume>, <fpage>620</fpage>&#x2013;<lpage>631</lpage>. <pub-id pub-id-type="doi">10.1038/s42256-021-00343-w</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ferrari</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Kia</surname>
<given-names>D. A.</given-names>
</name>
<name>
<surname>Tomkins</surname>
<given-names>J. E.</given-names>
</name>
<name>
<surname>Hardy</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wood</surname>
<given-names>N. W.</given-names>
</name>
<name>
<surname>Lovering</surname>
<given-names>R. C.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Stratification of candidate genes for Parkinson&#x2019;s disease using weighted protein-protein interaction network analysis</article-title>. <source>BMC Genomics</source> <volume>19</volume>, <fpage>452</fpage>&#x2013;<lpage>458</lpage>. <pub-id pub-id-type="doi">10.1186/s12864-018-4804-9</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kaler</surname>
<given-names>A. S.</given-names>
</name>
<name>
<surname>Purcell</surname>
<given-names>L. C.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Estimation of a significance threshold for genome-wide association studies</article-title>. <source>BMC Genomics</source> <volume>20</volume>, <fpage>618</fpage>. <pub-id pub-id-type="doi">10.1186/s12864-019-5992-7</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kieburtz</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wunderle</surname>
<given-names>K. B.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Parkinson&#x2019;s disease: evidence for environmental risk factors</article-title>. <source>Mov. Disord.</source> <volume>28</volume>, <fpage>8</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1002/mds.25150</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>J-Y.</given-names>
</name>
<name>
<surname>Kwon</surname>
<given-names>N-J.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>C-U.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C. S.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Evaluation of low-pass genome sequencing in polygenic risk score calculation for Parkinson&#x2019;s disease</article-title>. <source>Hum. Genomics</source> <volume>15</volume>, <fpage>58</fpage>. <pub-id pub-id-type="doi">10.1186/s40246-021-00357-w</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koch</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Laabs</surname>
<given-names>B. H.</given-names>
</name>
<name>
<surname>Kasten</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Vollstedt</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Becktepe</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Br&#xfc;ggemann</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Validity and prognostic value of a polygenic risk score for Parkinson&#x2019;s disease</article-title>. <source>Genes (Basel)</source> <volume>12</volume>, <fpage>1859</fpage>. <pub-id pub-id-type="doi">10.3390/genes12121859</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koda</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Namba</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kinoshita</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nakatsuji</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Sugimoto</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sakakibara</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Sema4A is implicated in the acceleration of Th17 cell-mediated neuroinflammation in the effector phase</article-title>. <source>J. Neuroinflammation</source> <volume>17</volume>, <fpage>82</fpage>. <pub-id pub-id-type="doi">10.1186/s12974-020-01757-w</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kornbrot</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Point biserial correlation</article-title>,&#x201d; in <source>Wiley StatsRef: Statistics reference online</source> (<publisher-loc>Hoboken</publisher-loc>: <publisher-name>Wiley</publisher-name>). <pub-id pub-id-type="doi">10.1002/9781118445112.stat06227</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liaw</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wiener</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Classification and regression by randomForest</article-title>. <source>R News</source> <volume>2</volume>, <fpage>3</fpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lesnick</surname>
<given-names>T. G.</given-names>
</name>
<name>
<surname>Maraganore</surname>
<given-names>D. M.</given-names>
</name>
<name>
<surname>Isacson</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Axon guidance and synaptic maintenance: preclinical markers for neurodegenerative disease and therapeutics</article-title>. <source>Trends Neurosci.</source> <volume>32</volume>, <fpage>142</fpage>&#x2013;<lpage>149</lpage>. <pub-id pub-id-type="doi">10.1016/j.tins.2008.11.006</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Verbitsky</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kisselev</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Browne</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mejia-Santana</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Genome-wide association study identifies candidate genes for Parkinson&#x2019;s disease in an Ashkenazi Jewish population</article-title>. <source>BMC Med. Genet.</source> <volume>12</volume>, <fpage>104</fpage>&#x2013;<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1186/1471-2350-12-104</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lo</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Arora</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Baig</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lawton</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>El Mouden</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Barber</surname>
<given-names>T. R.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Predicting motor, cognitive and functional impairment in Parkinson&#x2019;s</article-title>. <source>Ann. Clin. Transl. Neurol.</source> <volume>6</volume>, <fpage>1498</fpage>&#x2013;<lpage>1509</lpage>. <pub-id pub-id-type="doi">10.1002/acn3.50853</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Louppe</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Wehenkel</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sutera</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Geurts</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>Understanding variable importances in forests of randomized trees</article-title>,&#x201d; in <conf-name>Advances in Neural Information Processing Systems</conf-name>, <conf-loc>Lake Tahoe, Nevada</conf-loc>, <conf-date>December 5 - 10, 2013</conf-date>.</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moustafa</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Chakravarthy</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Phillips</surname>
<given-names>J. R.</given-names>
</name>
<name>
<surname>Gupta</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Keri</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Polner</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Motor symptoms in Parkinson&#x2019;s disease: A unified framework</article-title>. <source>Neurosci. Biobehav. Rev.</source> <volume>68</volume>, <fpage>727</fpage>&#x2013;<lpage>740</lpage>. <pub-id pub-id-type="doi">10.1016/j.neubiorev.2016.07.010</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nalls</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Blauwendraat</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Vallerga</surname>
<given-names>C. L.</given-names>
</name>
<name>
<surname>Heilbron</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Bandres-Ciga</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Identification of novel risk loci, causal insights, and heritable risk for Parkinson&#x2019;s disease: A meta-analysis of genome-wide association studies</article-title>. <source>Lancet Neurol.</source> <volume>18</volume>, <fpage>1091</fpage>&#x2013;<lpage>1102</lpage>. <pub-id pub-id-type="doi">10.1016/S1474-4422(19)30320-5</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nalls</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Pankratz</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Lill</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Do</surname>
<given-names>C. B.</given-names>
</name>
<name>
<surname>Hernandez</surname>
<given-names>D. G.</given-names>
</name>
<name>
<surname>Saad</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson&#x2019;s disease</article-title>. <source>Nat. Genet.</source> <volume>46</volume>, <fpage>989</fpage>&#x2013;<lpage>993</lpage>. <pub-id pub-id-type="doi">10.1038/ng.3043</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Navarro-Romero</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Montpey&#xf3;</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Martinez-Vicente</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>The emerging role of the lysosome in Parkinson&#x2019;s disease</article-title>. <source>Cells</source> <volume>9</volume>, <fpage>2399</fpage>. <pub-id pub-id-type="doi">10.3390/cells9112399</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nielsen</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>Warden</surname>
<given-names>M. N.</given-names>
</name>
<name>
<surname>Camacho-Soto</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Willis</surname>
<given-names>A. W.</given-names>
</name>
<name>
<surname>Wright</surname>
<given-names>B. A.</given-names>
</name>
<name>
<surname>Racette</surname>
<given-names>B. A.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>A predictive model to identify Parkinson disease from administrative claims data</article-title>. <source>Neurology</source> <volume>89</volume>, <fpage>1448</fpage>&#x2013;<lpage>1456</lpage>. <pub-id pub-id-type="doi">10.1212/WNL.0000000000004536</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Noble</surname>
<given-names>W. S.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>What is a support vector machine?</article-title> <source>Nat. Biotechnol.</source> <volume>24</volume>, <fpage>1565</fpage>&#x2013;<lpage>1567</lpage>. <pub-id pub-id-type="doi">10.1038/nbt1206-1565</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Noyce</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Bestwick</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Silveira-Moriyama</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hawkes</surname>
<given-names>C. H.</given-names>
</name>
<name>
<surname>Giovannoni</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Lees</surname>
<given-names>A. J.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Meta-analysis of early nonmotor features and risk factors for Parkinson disease</article-title>. <source>Ann. Neurol.</source> <volume>72</volume>, <fpage>893</fpage>&#x2013;<lpage>901</lpage>. <pub-id pub-id-type="doi">10.1002/ana.23687</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nuruddin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Syverstad</surname>
<given-names>G. H. E.</given-names>
</name>
<name>
<surname>Lillehaug</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Leergaard</surname>
<given-names>T. B.</given-names>
</name>
<name>
<surname>Nilsson</surname>
<given-names>L. N. G.</given-names>
</name>
<name>
<surname>Ropstad</surname>
<given-names>E.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Elevated mRNA-levels of gonadotropin-releasing hormone and its receptor in plaque-bearing Alzheimer&#x2019;s disease transgenic mice</article-title>. <source>PLoS One</source> <volume>9</volume>, <fpage>e103607</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0103607</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oyston</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>Y. Q.</given-names>
</name>
<name>
<surname>Khuong</surname>
<given-names>T. M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Q-P.</given-names>
</name>
<name>
<surname>Lau</surname>
<given-names>M. T.</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Neuronal Lamin regulates motor circuit integrity and controls motor function and lifespan</article-title>. <source>Cell Stress</source> <volume>2</volume>, <fpage>225</fpage>&#x2013;<lpage>232</lpage>. <pub-id pub-id-type="doi">10.15698/cst2018.09.152</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pedregosa</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Varoquaux</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Gramfort</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Michel</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Thirion</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Grisel</surname>
<given-names>O.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Scikit-learn: machine learning in Python</article-title>. <source>J. Mach. Learn. Res.</source> <volume>12</volume>, <fpage>2825</fpage>&#x2013;<lpage>2830</lpage>.</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Poewe</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Non&#x2010;motor symptoms in Parkinson&#x2019;s disease</article-title>. <source>Eur. J. Neurol.</source> <volume>15</volume>, <fpage>14</fpage>&#x2013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1111/j.1468-1331.2008.02056.x</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Prashanth</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Roy</surname>
<given-names>S. D.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Early detection of Parkinson&#x2019;s disease through patient questionnaire and predictive modelling</article-title>. <source>Int. J. Med. Inf.</source> <volume>119</volume>, <fpage>75</fpage>&#x2013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijmedinf.2018.09.008</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>S-M.</given-names>
</name>
<name>
<surname>Lang</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>G-D.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X-L.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>The Ras superfamily of small GTPases in non-neoplastic cerebral diseases</article-title>. <source>Front. Mol. Neurosci.</source> <volume>12</volume>, <fpage>121</fpage>. <pub-id pub-id-type="doi">10.3389/fnmol.2019.00121</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ranstam</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Multiple P-values and Bonferroni correction</article-title>. <source>Osteoarthr. Cartil.</source> <volume>24</volume>, <fpage>763</fpage>&#x2013;<lpage>764</lpage>. <pub-id pub-id-type="doi">10.1016/j.joca.2016.01.008</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reden&#x161;ek</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tro&#x161;t</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Dol&#x17e;an</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Genetic determinants of Parkinson&#x2019;s disease: can they help to stratify the patients based on the underlying molecular defect?</article-title> <source>Front. Aging Neurosci.</source> <volume>9</volume>, <fpage>20</fpage>. <pub-id pub-id-type="doi">10.3389/fnagi.2017.00020</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Salas-Leal</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Salas-Pacheco</surname>
<given-names>S. M.</given-names>
</name>
<name>
<surname>Gavilan-Ceniceros</surname>
<given-names>J. A. P.</given-names>
</name>
<name>
<surname>Castellanos-Juarez</surname>
<given-names>F. X.</given-names>
</name>
<name>
<surname>Mendez-Hernandez</surname>
<given-names>E. M.</given-names>
</name>
<name>
<surname>La Llave-Leon</surname>
<given-names>O.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>&#x3b1;-syn and SNP rs356219 as a potential biomarker in blood for Parkinson&#x2019;s disease in Mexican Mestizos</article-title>. <source>Neurosci. Lett.</source> <volume>754</volume>, <fpage>135901</fpage>. <pub-id pub-id-type="doi">10.1016/j.neulet.2021.135901</pub-id>
</citation>
</ref>
<ref id="B46">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Schwab</surname>
<given-names>B. C.</given-names>
</name>
<name>
<surname>Meijer</surname>
<given-names>H. G. E.</given-names>
</name>
<name>
<surname>van Wezel</surname>
<given-names>R. J. A.</given-names>
</name>
<name>
<surname>van Gils</surname>
<given-names>S. A.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Gap junctions as modulators of synchrony in Parkinson&#x2019;s disease</article-title>,&#x201d; in <conf-name>Society for Neuroscience Annual Meeting, Neuroscience</conf-name>, <conf-loc>Washington, DC</conf-loc>, <conf-date>November 15 - 19, 2014</conf-date>.</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Senturk</surname>
<given-names>Z. K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Early diagnosis of Parkinson&#x2019;s disease using machine learning algorithms</article-title>. <source>Med. Hypotheses</source> <volume>138</volume>, <fpage>109603</fpage>. <pub-id pub-id-type="doi">10.1016/j.mehy.2020.109603</pub-id>
</citation>
</ref>
<ref id="B48">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Shah</surname>
<given-names>P. M.</given-names>
</name>
<name>
<surname>Zeb</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Shafi</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Zaidi</surname>
<given-names>S. F. A.</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Detection of Parkinson disease in brain MRI using convolutional neural network</article-title>,&#x201d; in <conf-name>2018 24th International Conference on Automation and Computing (ICAC)</conf-name>, <conf-loc>Newcastle Upon Tyne, UK</conf-loc>, <conf-date>06&#x2013;07 September 2018</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.23919/IConAC.2018.8749023</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shah</surname>
<given-names>V. V.</given-names>
</name>
<name>
<surname>McNames</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mancini</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Carlson-Kuhta</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Nutt</surname>
<given-names>J. G.</given-names>
</name>
<name>
<surname>El-Gohary</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Digital biomarkers of mobility in Parkinson&#x2019;s disease during daily living</article-title>. <source>J. Park. Dis.</source> <volume>10</volume>, <fpage>1099</fpage>&#x2013;<lpage>1111</lpage>. <pub-id pub-id-type="doi">10.3233/JPD-201914</pub-id>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Simunovic</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Yi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Stephens</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sohn</surname>
<given-names>K. C.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Evidence for gender-specific transcriptional profiles of nigral dopamine neurons in Parkinson disease</article-title>. <source>PLoS One</source> <volume>5</volume>, <fpage>e8856</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0008856</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smolensky</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Amondikar</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Crawford</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Neu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kopil</surname>
<given-names>C. M.</given-names>
</name>
<name>
<surname>Daeschler</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Fox Insight collects online, longitudinal patient-reported outcomes and genetic data on Parkinson&#x2019;s disease</article-title>. <source>Sci. Data</source> <volume>7</volume>, <fpage>67</fpage>&#x2013;<lpage>69</lpage>. <pub-id pub-id-type="doi">10.1038/s41597-020-0401-2</pub-id>
</citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sperandei</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Understanding logistic regression analysis</article-title>. <source>Biochem. Med. Zagreb.</source> <volume>24</volume>, <fpage>12</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.11613/BM.2014.003</pub-id>
</citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sveinbjornsdottir</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>The clinical symptoms of Parkinson&#x2019;s disease</article-title>. <source>J. Neurochem.</source> <volume>139</volume>, <fpage>318</fpage>&#x2013;<lpage>324</lpage>. <pub-id pub-id-type="doi">10.1111/jnc.13691</pub-id>
</citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Svozil</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kvasni&#x10d;ka</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Posp&#xed;chal</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Introduction to multi-layer feed-forward neural networks</article-title>. <source>Chemom. Intelligent Laboratory Syst.</source> <volume>39</volume>, <fpage>43</fpage>&#x2013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.1016/S0169-7439(97)00061-0</pub-id>
</citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Takegahara</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Kumanogoh</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Involvement of semaphorins and their receptors in neurological diseases</article-title>. <source>Clin. Exp. Neuroimmunol.</source> <volume>1</volume>, <fpage>33</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1111/j.1759-1961.2009.00004.x</pub-id>
</citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vacic</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Ozelius</surname>
<given-names>L. J.</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>L. N.</given-names>
</name>
<name>
<surname>Bar-Shira</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gana-Weisz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gurevich</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>Genome-wide mapping of IBD segments in an Ashkenazi PD cohort identifies associated haplotypes</article-title>. <source>Hum. Mol. Genet.</source> <volume>23</volume>, <fpage>4693</fpage>&#x2013;<lpage>4702</lpage>. <pub-id pub-id-type="doi">10.1093/hmg/ddu158</pub-id>
</citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Visscher</surname>
<given-names>P. M.</given-names>
</name>
<name>
<surname>Wray</surname>
<given-names>N. R.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Sklar</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>McCarthy</surname>
<given-names>M. I.</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>M. A.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>10 years of GWAS discovery: biology, function, and translation</article-title>. <source>Am. J. Hum. Genet.</source> <volume>101</volume>, <fpage>5</fpage>&#x2013;<lpage>22</lpage>. <pub-id pub-id-type="doi">10.1016/j.ajhg.2017.06.005</pub-id>
</citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bucan</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Pathway-based approaches for analysis of genomewide association studies</article-title>. <source>Am. J. Hum. Genet.</source> <volume>81</volume>, <fpage>1278</fpage>&#x2013;<lpage>1283</lpage>. <pub-id pub-id-type="doi">10.1086/522374</pub-id>
</citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wong</surname>
<given-names>J. C.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Schwarzschild</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Ascherio</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Restless legs syndrome: an early clinical feature of Parkinson disease in men</article-title>. <source>Sleep</source> <volume>37</volume>, <fpage>369</fpage>&#x2013;<lpage>372</lpage>. <pub-id pub-id-type="doi">10.5665/sleep.3416</pub-id>
</citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xia</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>Z. H.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Progression of motor symptoms in Parkinson&#x2019;s disease</article-title>. <source>Neurosci. Bull.</source> <volume>28</volume>, <fpage>39</fpage>&#x2013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.1007/s12264-012-1050-z</pub-id>
</citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>Y-W.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>C-H.</given-names>
</name>
<name>
<surname>Su</surname>
<given-names>H-C.</given-names>
</name>
<name>
<surname>Chien</surname>
<given-names>C-Y.</given-names>
</name>
<name>
<surname>Sung</surname>
<given-names>P-S.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>T-Y.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>A new instrument combines cognitive and social functioning items for detecting mild cognitive impairment and dementia in Parkinson&#x2019;s disease</article-title>. <source>Front. Aging Neurosci.</source> <volume>14</volume>, <fpage>913958</fpage>. <pub-id pub-id-type="doi">10.3389/fnagi.2022.913958</pub-id>
</citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zham</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Arjunan</surname>
<given-names>S. P.</given-names>
</name>
<name>
<surname>Raghav</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>D. K.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Efficacy of guided spiral drawing in the classification of Parkinson&#x2019;s disease</article-title>. <source>IEEE J. Biomed. Health Inf.</source> <volume>22</volume>, <fpage>1648</fpage>&#x2013;<lpage>1652</lpage>. <pub-id pub-id-type="doi">10.1109/JBHI.2017.2762008</pub-id>
</citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Qiao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Association analysis and polygenic risk score evaluation of 38 GWAS-identified Loci in a Chinese population with Parkinson&#x2019;s disease</article-title>. <source>Neurosci. Lett.</source> <volume>762</volume>, <fpage>136150</fpage>. <pub-id pub-id-type="doi">10.1016/j.neulet.2021.136150</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>