AUTHOR=He Yang, Huang Jiali, Li Na, Zhou Gaosheng, Liu Jinglan

TITLE=Artificial intelligence-driven prediction and interpretation of central line-associated bloodstream infections in ICU: insights from the MIMIC-IV database

JOURNAL=Frontiers in Public Health

VOLUME=Volume 13 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1675077

DOI=10.3389/fpubh.2025.1675077

ISSN=2296-2565

ABSTRACT=Objective: To develop and internally validate interpretable machine learning (ML) models for predicting individual central line-associated bloodstream infection (CLABSI) risk in adult ICU patients with central venous catheters (CVCs) using the MIMIC-IV database. Methods: We conducted a retrospective observational cohort study using the MIMIC-IV database. Adult ICU patients with both CVC placement and blood culture evaluation were included. Patients were classified into CLABSI and non-CLABSI cohorts based on CVC tip culture results. A comprehensive set of demographic, physiological, laboratory, therapeutic, and nursing variables was extracted. Feature selection employed Least Absolute Shrinkage and Selection Operator (LASSO) regression. Seven ML models (logistic regression, decision tree, random forest, XGBoost, support vector machine, neural network, and gradient boosting) were developed and compared. Discrimination and calibration were assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, and Brier score. The best-performing model was interpreted with SHAP (SHapley Additive exPlanations) values to elucidate feature contributions. Results: Among 11,999 ICU patients, 519 (4.3%) developed CLABSI. CLABSI patients were younger (61.0 vs. 66.0 years) and had higher rates of multi-lumen catheters (91.3 vs. 63.6%), mechanical ventilation (90.9 vs. 74.0%), and dialysis (34.9 vs. 7.2%; all p < 0.001). The random forest model achieved the best performance (AUC 0.950, 95% CI 0.931–0.966; sensitivity 0.904, specificity 0.865), outperforming traditional models. SHAP analysis identified ICU length of stay, number of unique caregivers, and arterial catheterization as the top predictors. CLABSI cases exhibited prolonged ICU stays, increased caregiver exposure, and elevated inflammatory markers. Decision curve analysis confirmed clinical utility, and performance remained robust in sensitivity analyses. Conclusion: Machine learning models, particularly the random forest model, accurately predict CLABSI risk in ICU patients. Interpretable AI techniques such as SHAP enhance transparency and provide actionable insights for clinical practice. These findings support the development of early warning systems to reduce CLABSI incidence and improve patient outcomes.
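
The evaluation metrics named in the abstract (AUC, accuracy, sensitivity, specificity, F1 score, and Brier score) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration, and the abstract does not state which threshold or software the study used.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold-based metrics; the 0.5 default is an assumption, not the paper's value."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Sensitivity: fraction of true cases that are flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of non-cases that are correctly not flagged.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation). This pairwise
    loop is O(n_pos * n_neg), which is fine for a toy example only.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = CLABSI, 0 = no CLABSI.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]

if __name__ == "__main__":
    print(classification_metrics(toy_labels, toy_probs))
    print("Brier:", brier_score(toy_labels, toy_probs))
    print("AUC:", roc_auc(toy_labels, toy_probs))
```

With an outcome prevalence as low as the reported 4.3%, accuracy alone can look high for a model that flags no one, which is one reason to read it alongside sensitivity, specificity, and the Brier score.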