AUTHOR=He Changli, Wang Yin, Zhang Han, Li Sitian, Kang Fengjiao, Cai Fengqun, Han Lizhu, Yin Qinan, Li Gang, Song Xuewu, Bian Yuan
TITLE=A study on a real-world data-based VTE risk prediction model for lymphoma patients
JOURNAL=Frontiers in Pharmacology
VOLUME=Volume 16 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2025.1691271
DOI=10.3389/fphar.2025.1691271
ISSN=1663-9812
ABSTRACT=Background: Patients diagnosed with malignant tumors exhibit a markedly elevated risk of venous thromboembolism (VTE), which has a negative impact on their prognosis. Currently, there is no reliable predictive model specifically for thrombosis risk in lymphoma patients. This study aims to develop and validate a machine learning model leveraging real-world data, offering a dependable risk assessment tool for the early identification of VTE in lymphoma patients.
Methods: We retrospectively analyzed 605 hospitalized patients with lymphoma between January 2019 and June 2024. Candidate predictors included demographic characteristics, comorbidities and medical history, tumor-related factors, treatment-related factors, and laboratory parameters. The primary endpoint was the occurrence of VTE within 6 months after hospitalization for confirmed lymphoma. Model development incorporated three imputation methods, three sampling strategies, three feature selection approaches, and nine machine learning algorithms. Predictive performance was compared across all models.
Results: Combining different imputation, sampling, and feature selection strategies yielded 27 datasets, which were trained across nine algorithms to generate 243 models. The optimal model, Simp-SMOTE_rf_GBM (constructed using random forest imputation, SMOTE oversampling, and gradient boosting machine), achieved the highest predictive performance (AUC = 0.954).
SHAP-based model interpretation identified nine key predictors ranked by importance: anticoagulant use, D-dimer, lactate dehydrogenase, central venous catheterization, carcinoembryonic antigen (CEA), Eastern Cooperative Oncology Group (ECOG) score, serum total protein (TP), total cholesterol (TC), and infectious disease.
Conclusion: This study established and validated a machine learning model for predicting VTE risk in lymphoma patients, with the optimal model demonstrating excellent discriminatory ability (AUC = 0.954). The model provides evidence to guide the timing and strategy of anticoagulation, supporting early VTE screening and risk stratification in clinical practice. Its implementation has important implications for improving patient outcomes and advancing public health.
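
The model counts reported in the abstract (3 x 3 x 3 = 27 datasets, 27 x 9 = 243 models) can be reproduced with a short enumeration. This is only an illustrative sketch, not the authors' code. The abstract names just four labels, all inside the model name Simp-SMOTE_rf_GBM; every other label below is a hypothetical placeholder, and the assumed naming order (imputation-sampling_selection_algorithm) is a guess based on that single name.

```python
from itertools import product

# Only "Simp", "SMOTE", "rf" and "GBM" appear in the abstract (inside the
# model name). All other labels are hypothetical placeholders, and which
# pipeline stage each name segment refers to is an assumption.
IMPUTERS = ["Simp", "imputer_B", "imputer_C"]        # three imputation methods
SAMPLERS = ["SMOTE", "sampler_B", "sampler_C"]       # three sampling strategies
SELECTORS = ["rf", "selector_B", "selector_C"]       # three feature selection approaches
ALGORITHMS = ["GBM"] + [f"algorithm_{i}" for i in range(2, 10)]  # nine algorithms


def dataset_names():
    """One name per imputation x sampling x feature-selection combination."""
    return [
        f"{imp}-{samp}_{sel}"
        for imp, samp, sel in product(IMPUTERS, SAMPLERS, SELECTORS)
    ]


def model_names():
    """One name per dataset x algorithm combination."""
    return [f"{ds}_{alg}" for ds, alg in product(dataset_names(), ALGORITHMS)]


if __name__ == "__main__":
    print(len(dataset_names()), "datasets,", len(model_names()), "models")
```

Under these assumptions the grid contains the name of the reported optimal model, which matches the way the abstract describes the 243 candidates being compared.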