AUTHOR=Zhou Feiyue, Zhou Bin, Qu Yuan, Zhong Shuai, Liu Ting, Liu Yuan, Zhao Xiaohu, Tian Xuanhe, Hao Xiaojing, Jiang Ping

TITLE=Development and validation of an interpretable machine learning model for predicting low muscle mass in patients with rheumatoid arthritis: a multicenter study

JOURNAL=Frontiers in Medicine

VOLUME=Volume 12 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1694320

DOI=10.3389/fmed.2025.1694320

ISSN=2296-858X

ABSTRACT=Background: This study aims to develop a model that uses easily obtainable clinical indicators to identify patients with rheumatoid arthritis (RA) at risk of low muscle mass. The goal is to enable targeted screening of individuals at high risk of sarcopenia, optimize diagnostic strategies, reduce the burden of additional testing, and improve the efficiency of early identification and intervention. Methods: This study analyzed data from 1,260 RA patients drawn from the National Health and Nutrition Examination Survey (NHANES) database and the Affiliated Hospital of Shandong University of Traditional Chinese Medicine (SHUTCM). Eight machine learning models were developed: Random Forest, LightGBM, XGBoost, CatBoost, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression, and a weighted ensemble model. Model performance was evaluated using accuracy, area under the receiver operating characteristic curve (AUC), F1 score, precision, recall, and Brier score. The SHapley Additive exPlanations (SHAP) method was used to rank feature importance and interpret the final model. Results: Among all machine learning models, the tree-based weighted ensemble model performed best, achieving an AUC of 0.921 and outperforming every individual model. It showed good calibration and a higher net clinical benefit in decision curve analysis, particularly at threshold probabilities between 0.2 and 0.8, and achieved an AUC of 0.848 on the test set, indicating some degree of generalizability. SHAP analysis identified BMI, albumin, hemoglobin, age, and creatinine as the most important features for predicting the risk of low muscle mass. SHAP dependence and waterfall plots further illustrated how the model arrives at its predictions.
Finally, we developed an online risk calculator with the FastAPI framework that automatically generates an individualized low muscle mass risk score from user input. The tool has been deployed on the Hugging Face platform and is accessible online. Conclusion: Using a large multicenter dataset, we developed and validated an explainable machine learning model that identifies patients with rheumatoid arthritis at high risk of low muscle mass. This model may serve as a decision-support tool to help clinicians guide further screening for and diagnosis of sarcopenia.
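The abstract names a weighted ensemble and reports AUC, Brier score, and decision curve net benefit without giving formulas. The following is a minimal Python sketch of how these quantities are conventionally computed, not the authors' implementation. It assumes the ensemble is a weighted soft vote (a weighted average of each base model's predicted probability); the base models, the weights, the outcomes, and the probabilities are all hypothetical, and the helper functions (weighted_soft_vote, brier_score, roc_auc, net_benefit, treat_all_net_benefit) are illustrative names written for this example. Net benefit uses the standard decision-curve formula NB = TP/n - (FP/n) * pt / (1 - pt) at threshold probability pt.

```python
from typing import Sequence

# All names, weights and data below are hypothetical; they illustrate standard
# formulas, not the study's actual models, weights or results.


def weighted_soft_vote(prob_lists: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> list[float]:
    """Weighted average of each model's positive-class probability (soft voting)."""
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(n)]


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance a random positive scores above a random negative (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit at threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)


def treat_all_net_benefit(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy that flags every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical outcomes (1 = low muscle mass) and positive-class
# probabilities from three hypothetical tree-based base models.
y = [1, 0, 1, 1, 0, 0, 1, 0]
rf = [0.80, 0.30, 0.65, 0.70, 0.20, 0.40, 0.55, 0.10]
xgb = [0.90, 0.20, 0.60, 0.75, 0.35, 0.30, 0.50, 0.15]
cat = [0.85, 0.25, 0.70, 0.60, 0.25, 0.45, 0.60, 0.05]

# Hypothetical weights; the abstract does not say how the study chose its weights.
ens = weighted_soft_vote([rf, xgb, cat], weights=[1.0, 1.0, 2.0])
auc = roc_auc(y, ens)
brier = brier_score(y, ens)
print(f"AUC={auc:.3f}  Brier={brier:.4f}")

# Compare the model with the treat-all strategy across the 0.2 to 0.8 threshold range.
for pt in (0.2, 0.5, 0.8):
    print(f"pt={pt:.1f}  model NB={net_benefit(y, ens, pt):.3f}  "
          f"treat-all NB={treat_all_net_benefit(y, pt):.3f}")
```

A model adds clinical value at a given threshold when its net benefit exceeds both the treat-all curve and the treat-none line (net benefit 0). In practice, scikit-learn's `roc_auc_score` and `brier_score_loss` compute the first two metrics directly; the pure-Python versions here only make the arithmetic explicit.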