AUTHOR=Wang Ziyuan, Yan Junqiang
TITLE=Interpretable machine learning for cognitive impairment prediction in Parkinson’s disease: a multicenter validation study with SHAP analysis
JOURNAL=Frontiers in Aging Neuroscience
VOLUME=Volume 17 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/aging-neuroscience/articles/10.3389/fnagi.2025.1688653
DOI=10.3389/fnagi.2025.1688653
ISSN=1663-4365
ABSTRACT=Introduction: Parkinson’s disease (PD)-related cognitive impairment (PD-CI) is a common and impactful complication of PD, yet current predictive models often rely on specialized resources, lack interpretability, or have limited cross-population validation. This study aimed to develop an interpretable machine learning framework for PD-CI detection using only routine clinical data, addressing unmet needs in accessible and generalizable PD care. Methods: We analyzed 1,279 participants from the Parkinson’s Progression Markers Initiative (PPMI) as the discovery cohort and 197 patients from an independent validation cohort. PD-CI was defined as a Montreal Cognitive Assessment (MoCA) score ≤26 together with a Unified Parkinson’s Disease Rating Scale Part I (UPDRS-I) score ≥1. Twenty-one clinical features (hematological parameters, metabolic markers, and demographics) were preprocessed with the Synthetic Minority Over-sampling Technique (SMOTE). Four machine learning models were trained and optimized via nested 5-fold cross-validation. Results: Random Forest achieved the best performance in the discovery cohort (AUC = 0.83), outperforming CatBoost (AUC = 0.82), XGBoost (AUC = 0.79), and neural networks (AUC = 0.66). In the external validation cohort, the framework achieved 71.57% accuracy. SHAP interpretability analysis identified age, neutrophil-to-lymphocyte ratio (NLR), and serum uric acid as critical predictors and revealed synergistic risk effects between elevated inflammatory markers and reduced antioxidant levels. Discussion: This framework demonstrates diagnostic accuracy comparable to advanced neuroimaging while using readily available clinical data, enhancing accessibility in resource-limited settings. It highlights neuroinflammation and oxidative stress as key mechanistic drivers of PD-CI, advancing pathophysiological understanding. Multicenter validation confirms the model’s robustness across ethnic populations, supporting its utility as a clinically actionable tool for PD-CI screening and monitoring.
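The abstract names three concrete preprocessing ingredients: a rule-based PD-CI label (MoCA ≤26 and UPDRS-I ≥1), the neutrophil-to-lymphocyte ratio as a feature, and synthetic minority over-sampling. The following is a minimal plain-Python sketch of those ingredients under stated assumptions; it is not the authors' code. The function names `pd_ci_label`, `nlr`, and `smote`, the toy records, and the default of k = 5 neighbours are hypothetical choices for illustration. The oversampler follows the standard SMOTE idea of placing each synthetic point on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random
from typing import List, Sequence


def pd_ci_label(moca: int, updrs_i: int) -> int:
    """Hypothetical helper: 1 if MoCA <= 26 and UPDRS-I >= 1 (the abstract's PD-CI rule), else 0."""
    return int(moca <= 26 and updrs_i >= 1)


def nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-to-lymphocyte ratio from absolute counts given in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """SMOTE-style oversampling sketch (hypothetical helper, not a library API).

    Each synthetic point lies at a random position on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy records: (age, MoCA, UPDRS-I, neutrophils, lymphocytes)
    records = [
        (61, 28, 0, 4.0, 2.0),
        (72, 25, 2, 6.0, 1.5),
        (68, 24, 1, 5.0, 1.25),
        (59, 30, 3, 3.6, 1.8),
        (64, 27, 1, 4.2, 2.1),
    ]
    labels = [pd_ci_label(moca, updrs) for _, moca, updrs, _, _ in records]
    features = [[float(age), nlr(neu, lym)] for age, _, _, neu, lym in records]
    minority = [x for x, y in zip(features, labels) if y == 1]
    # Oversample the minority class up to a 1:1 ratio with the majority class.
    synthetic = smote(minority, n_new=labels.count(0) - labels.count(1))
    print(labels, len(synthetic))  # → [0, 1, 1, 0, 0] 1
```

Oversampling is usually restricted to the training folds of each cross-validation split so that synthetic points never appear in an evaluation fold. For real work, imbalanced-learn's `SMOTE` placed inside an `imblearn.pipeline.Pipeline` handles this when the pipeline is passed to scikit-learn's cross-validation utilities.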