AUTHOR=Wang Yufen, Bi Jian, Song Shunzhe, Sun Ying, Gong Aixia

TITLE=Identifying gastric intestinal metaplasia risk based on clinical indicators: a machine learning predictive model based on the SHAP methodology

JOURNAL=Frontiers in Pharmacology

VOLUME=Volume 16 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2025.1602191

DOI=10.3389/fphar.2025.1602191

ISSN=1663-9812

ABSTRACT=Background: Screening for gastric intestinal metaplasia (GIM) is important for the early detection of gastric cancer. To help clinicians identify patients at high risk of GIM and decide when to perform gastric mucosal biopsy, we aimed to develop a model that predicts the occurrence of GIM.
Methods: Patients were enrolled at the First Affiliated Hospital of Dalian Medical University according to strict inclusion and exclusion criteria. The VarSelRF algorithm was first used to identify independent variables associated with GIM development. Eight machine learning algorithms, namely Decision Tree (DT), Elastic Net (ENet), K-Nearest Neighbors (KNN), LightGBM, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP), were used to construct predictive models. Their performance was compared using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) values were applied to interpret the RF model by quantifying each feature's contribution to its predictions. In addition, a web-based calculator based on the RF model was developed to support clinical use.
Results: Of the 975 patients examined, 322 had pathologically confirmed GIM. Eleven independent variables contributed significantly to GIM occurrence: gastric mucosal atrophy, H. pylori infection, direct bilirubin (DBIL), creatinine (Crea), smoking history, alcohol history, gender, alanine aminotransferase (ALT), age, albumin/globulin ratio (ALB/GLO), and gamma-glutamyltransferase (GGT). Among the eight algorithms tested, the RF model performed strongly, achieving an AUC of 0.8167 in the testing dataset with a specificity of 85.5% and a sensitivity of 57.0%. SHAP values made the model's predictions interpretable, helping clinicians understand its decision-making process. The resulting web-based calculator serves as a practical tool for clinicians.
Conclusion: This study demonstrates a novel use of serological biomarkers to assess GIM risk. Several markers of liver and kidney function were found to be strong predictors of GIM development. SHAP values clarify how each feature contributes to the model's predictions, and the web-based calculator offers clinicians a practical way to evaluate GIM risk.
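The abstract compares models by ROC AUC, by sensitivity and specificity at an operating threshold, and by decision curve analysis. The Python below is a minimal sketch of how those three quantities are defined, computed from predicted GIM risks and binary pathology labels. It is not the authors' code (the abstract does not describe their implementation), and the function names and the eight-patient toy data are hypothetical. AUC is computed as the Mann-Whitney probability that a randomly chosen GIM case receives a higher score than a randomly chosen non-GIM case, and net benefit uses the standard decision-curve formula TP/n − FP/n × pt/(1 − pt), where pt is the risk threshold for recommending biopsy.

```python
from typing import List, Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC = P(score of random positive > score of random negative); ties count 0.5."""
    pos: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    neg: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) when 'predicted GIM' means score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) at the given threshold."""
    tp, fp, tn, fn = confusion_at(scores, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(scores: Sequence[float], labels: Sequence[int], pt: float) -> float:
    """Decision-curve net benefit of 'biopsy if predicted risk >= pt'."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_at(scores, labels, pt)
    n = len(labels)
    # False positives are penalised by the odds pt / (1 - pt).
    return tp / n - fp / n * pt / (1.0 - pt)


if __name__ == "__main__":
    # Hypothetical toy data: predicted GIM risks and pathology labels (1 = GIM).
    risks = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    gim = [1, 1, 0, 1, 0, 0, 1, 0]
    print(auc_mann_whitney(risks, gim))              # 0.75
    print(sensitivity_specificity(risks, gim, 0.5))  # (0.75, 0.75)
    print(net_benefit(risks, gim, 0.5))              # 0.25
```

Because sensitivity and specificity depend on the chosen threshold, the reported pair (57.0% and 85.5%) describes one operating point of the RF model, whereas the AUC and the decision curve summarise performance across thresholds.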