AUTHOR=Kim Seungmi, Choi Byung Kwan, Cho Jeong Su, Huh Up, Shin Myung-Jun, Obradovic Zoran, Rubin Daniel J., Lee Jae Il, Park Jong-Hwan

TITLE=Development of machine learning models with explainable AI for frailty risk prediction and their web-based application in community public health

JOURNAL=Frontiers in Public Health

VOLUME=Volume 13 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1698062

DOI=10.3389/fpubh.2025.1698062

ISSN=2296-2565

ABSTRACT=Background: Frailty is a public health concern linked to falls, disability, and mortality. Early screening and tailored interventions can mitigate adverse outcomes, but community settings require tools that are accurate and explainable. Korea is entering a super-aged phase, yet few approaches have used nationally representative survey data. Objective: This study aimed to identify key predictors of frailty risk, as measured by the K-FRAIL scale, using explainable machine learning (ML) on data from the 2023 National Survey of Older Koreans (NSOK). It also sought to develop and internally validate prediction models. To demonstrate the potential applicability of these models in community public health and clinical practice, a web-based application was implemented. Methods: Data from 10,078 older adults were analyzed, with frailty defined by the K-FRAIL scale (robust = 0, pre-frail = 1–2, and frail = 3–5). A total of 132 candidate variables were constructed through selection and derivation. Using CatBoost with out-of-fold (OOF) SHapley Additive exPlanations (SHAP, a game-theoretic approach to quantify feature contributions), 15 key predictors were identified and applied across 10 algorithms under nested cross-validation (CV). Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision–recall curve (PR-AUC), F1-score, balanced accuracy, and the Brier score. To assess feasibility, a single-page bilingual web application was developed, integrating the CatBoost inference pipeline for offline use. Results: SHAP analysis identified depression score, age, instrumental activities of daily living (IADL) count, sleep quality, and cognition as the leading predictors, followed by smartphone use, number of medications, province, driving status, hospital use, physical activity, osteoporosis, eating alone, digital adaptation difficulty, and sex, yielding 15 key predictors across the mental, functional, lifestyle, social, and digital domains. Using these predictors, boosting models outperformed other algorithms, with CatBoost achieving the best performance (ROC-AUC = 0.813 ± 0.014; PR-AUC = 0.748 ± 0.019). Conclusion: An explainable machine learning model with strong discrimination performance and adequate calibration was developed, accompanied by a lightweight web application for potential use in community and clinical settings. However, external validation, recalibration, and subgroup fairness assessments are needed to ensure generalizability and clinical adoption.
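
The K-FRAIL cut-offs and two of the evaluation metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and the binary outcome coding (1 = at risk, 0 = not at risk) is an assumption, since the abstract does not state how the prediction target was dichotomized.

```python
from typing import Sequence


def kfrail_category(score: int) -> str:
    """Map a K-FRAIL total score (0-5) to the categories given in the abstract."""
    if not 0 <= score <= 5:
        raise ValueError("K-FRAIL score must be between 0 and 5")
    if score == 0:
        return "robust"
    if score <= 2:
        return "pre-frail"
    return "frail"


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and binary outcomes.

    Assumes y_true is coded 0/1 (an assumption, not stated in the abstract).
    Lower is better; 0 means perfect probabilistic predictions.
    """
    if len(y_true) == 0 or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average of sensitivity and specificity for binary 0/1 labels."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    positives = sum(1 for y in y_true if y == 1)
    negatives = sum(1 for y in y_true if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / positives + tn / negatives)
```

Balanced accuracy weights both classes equally, which is useful when the outcome classes are of unequal size. The Brier score measures the accuracy of the predicted probabilities themselves, which is why it is commonly reported alongside discrimination metrics such as ROC-AUC.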