AUTHOR=Tang Quanqing, Li Yutong, Liu Kaifeng, Huang Gaozhen, Gao Liangmeng, Tang Yiqi, Liu Hongwei
TITLE=Development and validation of a risk prediction model for distant metastasis in muscle-invasive bladder cancer: a retrospective study integrating SEER data with external validation cohort and biomarker analysis
JOURNAL=Frontiers in Oncology
VOLUME=Volume 15 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1607173
DOI=10.3389/fonc.2025.1607173
ISSN=2234-943X
ABSTRACT=Background: Bladder cancer (BCa) ranks among the most prevalent cancers in men, with a subset of patients developing distant metastases (DM), resulting in poor prognosis. This study aims to develop and validate a nomogram to predict DM in patients with BCa, utilizing machine learning techniques to identify potential biomarkers.
Methods: Clinical data from patients with BCa diagnosed between January 2010 and December 2015 were retrospectively retrieved from the Surveillance, Epidemiology, and End Results (SEER) database and randomly split into a training cohort (n = 1,619) and an internal validation cohort (n = 694). An external validation cohort (n = 112) was obtained from the Affiliated Hospital of Guangdong Medical University between January 2021 and December 2023. Independent risk factors for DM were identified using univariate and multivariate logistic regression analyses and incorporated into the nomogram. Predictive accuracy was evaluated using calibration curves, and the nomogram's discriminative ability was compared with traditional staging systems by calculating the area under the curve (AUC).
Results: Tumor size ≥ 3 cm, N stage (N1–N3), and lack of surgery were found to be independent risk factors for DM, all of which were included in the nomogram. ROC curve analysis demonstrated robust predictive performance, with AUC values of 0.732 in the training cohort, 0.750 in the internal validation cohort, and 0.968 in the external validation cohort.
Additionally, calibration curves consistently showed good predictive accuracy across all cohorts. Machine learning methods, including LASSO and Random Forest, identified ADH1B as a potential biomarker for BCa, displaying exceptional diagnostic and prognostic performance (AUC = 0.983).
Conclusion: This study, based on the SEER database and an external validation cohort, identified independent risk factors for DM in BCa and revealed ADH1B as a novel biomarker, offering new perspectives for clinical prediction and personalized treatment.
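NOTE: The abstract reports discrimination as the area under the ROC curve (AUC). As a minimal sketch of what that statistic measures, and not the authors' code, the Python below computes AUC as the probability that a randomly chosen patient with DM receives a higher predicted risk than a randomly chosen patient without DM. The function name `roc_auc` and the toy labels and scores are hypothetical and are not study data. Only the cohort sizes (1,619 and 694) come from the abstract; they appear consistent with a roughly 7:3 random split, though the abstract does not state the ratio.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (NOT from the study): 1 = distant metastasis, 0 = none.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.80, 0.20, 0.40]
toy_auc = roc_auc(toy_labels, toy_scores)

# Cohort sizes reported in the abstract; the split ratio itself is inferred.
train_n, internal_n = 1619, 694
train_fraction = train_n / (train_n + internal_n)

print(f"toy AUC = {toy_auc:.3f}, training fraction = {train_fraction:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values (0.732, 0.750, 0.968) are read.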