AUTHOR=Zhao Shulin, Nan Baoyun, Guo Jun, Xu Wenkai, Li Zhen TITLE=Coronary heart disease risk prediction based on GAIN imputation and interpretable machine learning JOURNAL=Frontiers in Genetics VOLUME=Volume 16 - 2025 YEAR=2026 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2025.1752811 DOI=10.3389/fgene.2025.1752811 ISSN=1664-8021 ABSTRACT=Introduction: Coronary atherosclerotic heart disease (CHD) is a leading cause of morbidity and mortality worldwide, making timely identification critical for improving patient prognosis. However, traditional imaging examinations are limited by high costs and patient selection bias, while existing prediction models often lack interpretability and generalization ability. This study aimed to develop a robust, interpretable machine learning approach to address these challenges. Methods: This retrospective study analyzed hospitalized patients at Quzhou People’s Hospital from July 2021 to March 2025. Patients diagnosed with CHD were categorized as positive samples, while those without cardiovascular disease served as negative controls. The dataset integrated demographic data, blood biomarkers, and vital signs. A Generative Adversarial Imputation Network (GAIN) was utilized to handle missing values, and multiple machine learning models were constructed and compared for prediction performance. Results: Among the evaluated algorithms, the XGBoost model achieved superior performance on the test set with an Area Under the Curve (AUC) of 0.9053. To enhance clinical utility, the integration of SHAP (SHapley Additive exPlanations) values enabled both global and local interpretation of model decisions. Key predictive factors identified included mean respiratory rate during hospitalization, age, high-sensitivity troponin I (hs-cTnI), and hypertension. Discussion: The developed model demonstrates robust prediction performance combined with high clinical interpretability.
Unlike traditional “black box” models, this approach clarifies the contribution of specific risk factors. Crucially, the tool is well-suited for dual deployment: serving as an automated screening tool integrated into hospital electronic health records (EHRs) and as a self-monitoring aid for individuals with underlying health conditions via mobile health applications.
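The distinction between local and global interpretation mentioned in the abstract can be illustrated with a minimal sketch of exact Shapley values, the quantity that SHAP approximates. Everything below is an assumption made for illustration: the feature names only loosely echo the predictors named above, `toy_risk` is a hypothetical hand-written score (not the study's XGBoost model), the patient values are invented, and a single baseline patient stands in for the background data a SHAP explainer would normally use.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names, loosely echoing the predictors named in the abstract.
FEATURES = ["resp_rate", "age", "hs_ctni", "hypertension"]


def toy_risk(x):
    """Hypothetical risk score for illustration; NOT the study's model."""
    score = 0.02 * x["resp_rate"] + 0.01 * x["age"] + 0.3 * x["hs_ctni"]
    # Interaction term, so the attribution is not trivially per-feature.
    score += 0.2 * x["hypertension"] * (x["age"] > 60)
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = set(subset)
                total += weight * (value(without_f | {f}) - value(without_f))
        phi[f] = total
    return phi


# Invented reference patient and cohort (assumptions, not study data).
baseline = {"resp_rate": 18, "age": 50, "hs_ctni": 0.1, "hypertension": 0}
patient = {"resp_rate": 22, "age": 70, "hs_ctni": 0.5, "hypertension": 1}
cohort = [
    patient,
    {"resp_rate": 16, "age": 45, "hs_ctni": 0.05, "hypertension": 0},
    {"resp_rate": 20, "age": 66, "hs_ctni": 0.2, "hypertension": 1},
]

# Local interpretation: how each feature moves this one patient's score
# away from the baseline score.
local = shapley_values(toy_risk, patient, baseline)

# Global interpretation: mean absolute attribution across the cohort.
per_patient = [shapley_values(toy_risk, p, baseline) for p in cohort]
global_importance = {
    f: sum(abs(phi[f]) for phi in per_patient) / len(per_patient)
    for f in FEATURES
}

for f in FEATURES:
    print(f, round(local[f], 4), round(global_importance[f], 4))
```

By construction, the local attributions sum to the difference between the patient's score and the baseline score, which is what lets each patient's prediction be broken down into per-feature contributions. Exact enumeration visits every feature subset, so it is only practical for a handful of features; for tree ensembles such as XGBoost, SHAP tooling typically relies on faster tree-specific algorithms instead.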