AUTHOR=Kang Landan, Luo Dan, Xie Wenchi, Luo Xiaojing, Mei Jie, He Jing

TITLE=An explainable machine learning model for predicting preterm birth in pregnant women with gestational diabetes mellitus and hypertensive disorders of pregnancy: development and external validation

JOURNAL=Frontiers in Endocrinology

VOLUME=Volume 16 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2025.1665935

DOI=10.3389/fendo.2025.1665935

ISSN=1664-2392

ABSTRACT=Background: Gestational diabetes mellitus (GDM) and hypertensive disorders of pregnancy (HDP) often coexist and share pathophysiological features such as insulin resistance and endothelial dysfunction, increasing the risk of preterm birth. However, few predictive models have focused specifically on women with both conditions. This study aimed to develop and externally validate a machine learning model for this high-risk population and to assess its clinical utility and interpretability. Methods: This retrospective dual-center study used electronic medical records from 121 and 136 pregnant women with comorbid GDM and HDP, who formed the development and external validation cohorts, respectively. Multiple machine learning algorithms, including Least Absolute Shrinkage and Selection Operator (LASSO) regression, Random Forest (RF), and Naive Bayes (NB), were applied to construct predictive models. To address class imbalance and enhance model robustness, the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic minority-class samples to balance the dataset, was employed. Model interpretability was further assessed using Shapley Additive Explanations (SHAP). Results: Thirteen variables with univariate significance were entered into Elastic Net regression, yielding five key predictors: alanine transaminase (ALT), aspartate transaminase (AST), albumin, lactate dehydrogenase (LDH), and systolic blood pressure at 32–36 weeks (SBP_32_36).
While the LASSO model achieved the highest area under the receiver operating characteristic curve (AUC, 0.802), the NB model demonstrated greater clinical net benefit, better reclassification performance as measured by the Net Reclassification Improvement (NRI, which evaluates whether patients are more accurately assigned to higher- or lower-risk groups) and the Integrated Discrimination Improvement (IDI, which reflects the average improvement in distinguishing high-risk from low-risk patients), and greater robustness in SMOTE-based sensitivity analyses. In the external validation cohort (n = 136), the NB model maintained strong generalization with an AUC of 0.777 (95% confidence interval [CI]: 0.645–0.887), accuracy of 0.801 (95% CI: 0.735–0.860), sensitivity of 0.792, and specificity of 0.804, supporting its selection as the optimal model for this high-risk population. Conclusions: The Naive Bayes model exhibited robust predictive ability and interpretability for identifying preterm birth risk in pregnancies with comorbid GDM and HDP, and may serve as a transparent, clinically applicable tool for individualized obstetric risk management.
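
The two reclassification metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say which NRI variant was used, so the category-free (continuous) form is assumed here, and the function names and toy probabilities are hypothetical and unrelated to the study data.

```python
def idi(y_true, p_old, p_new):
    """Integrated Discrimination Improvement.

    Difference in discrimination slope (mean predicted risk in events
    minus mean predicted risk in non-events) between two models.
    """
    def slope(p):
        events = [pi for pi, yi in zip(p, y_true) if yi == 1]
        nonevents = [pi for pi, yi in zip(p, y_true) if yi == 0]
        return sum(events) / len(events) - sum(nonevents) / len(nonevents)

    return slope(p_new) - slope(p_old)


def continuous_nri(y_true, p_old, p_new):
    """Category-free NRI (assumed variant, see lead-in).

    Events should move up in predicted risk, non-events should move down.
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for y, po, pn in zip(y_true, p_old, p_new):
        if y == 1:
            n_e += 1
            up_e += pn > po
            down_e += pn < po
        else:
            n_n += 1
            up_n += pn > po
            down_n += pn < po
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Toy example (hypothetical values): 1 = preterm birth, 0 = term birth.
y = [1, 1, 0, 0]
p_reference = [0.6, 0.4, 0.5, 0.3]   # predicted risks from a reference model
p_candidate = [0.8, 0.5, 0.4, 0.2]   # predicted risks from a candidate model

idi_value = idi(y, p_reference, p_candidate)
nri_value = continuous_nri(y, p_reference, p_candidate)
```

Positive values of either metric indicate that the candidate model separates events from non-events better than the reference model; the continuous NRI ranges from -2 to 2.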