AUTHOR=Hao Xiaoxi, Liang Dingjian, Shen Yimin, Sun Cuimin, Lan Wei
TITLE=PreBP: an interpretable, optimized ensemble framework using routine complete blood count for rapid pathogen identification in bacterial pneumonia
JOURNAL=Frontiers in Bioinformatics
VOLUME=Volume 5 - 2025
YEAR=2026
URL=https://www.frontiersin.org/journals/bioinformatics/articles/10.3389/fbinf.2025.1769816
DOI=10.3389/fbinf.2025.1769816
ISSN=2673-7647
ABSTRACT=Introduction: Bacterial pneumonia remains a major global health challenge, and early pathogen identification is important for timely and targeted treatment. However, conventional microbiological diagnostics such as sputum or blood culture are labor-intensive and time-consuming. Methods: We propose an interpretable ensemble learning framework (PreBP) for rapid pathogen identification using routinely available complete blood count (CBC) parameters. We analyzed 1,334 CBC samples from patients with culture-confirmed bacterial pneumonia caused by four major pathogens: Pseudomonas aeruginosa, Escherichia coli, Staphylococcus aureus, and Streptococcus pneumoniae. Pathogen labels were determined based on clinical culture results. Five machine learning models (extreme gradient boosting (XGBoost), multilayer perceptron neural network (MLPNN), adaptive boosting (AdaBoost), random forest (RF), and extremely randomized trees (ExtraTrees)) were trained as comparators, and PreBP was developed with metaheuristic-optimized hyperparameters. Key CBC biomarkers were refined using a dual-phase feature selection strategy combining Lasso and Boruta.
To enhance transparency, SHapley Additive exPlanations (SHAP) were applied to provide both global biomarker importance and local, case-level explanations. Results: PreBP achieved the best overall performance, with an AUC of 0.920, a precision of 87.1%, and an accuracy and sensitivity of 86.7% each. Discussion: Because the framework relies on routine CBC measurements, it can generate interpretable predictions as soon as CBC results are available, which may provide supplementary evidence for earlier pathogen-oriented clinical decision-making alongside culture-dependent workflows. Overall, PreBP offers an interpretable computational approach for pathogen identification in bacterial pneumonia based on routine laboratory data.
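
The abstract reports the same value for accuracy and sensitivity. One arithmetic reason this can happen in a four-class problem is support-weighted averaging: when per-class sensitivity (recall) is weighted by class frequency, the result always equals overall accuracy. The abstract does not state which averaging scheme was used, so that reading is an assumption. The sketch below illustrates the identity with a hypothetical helper, `multiclass_metrics`, and made-up toy labels that are not taken from the study.

```python
from collections import Counter

# The four pathogen classes named in the abstract (abbreviated).
PATHOGENS = ["P. aeruginosa", "E. coli", "S. aureus", "S. pneumoniae"]


def multiclass_metrics(y_true, y_pred, labels):
    """Accuracy plus weighted and macro-averaged sensitivity/precision.

    Hypothetical helper for illustration only; not the authors' code.
    """
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predicted samples per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(tp.values()) / n
    recall = {c: tp[c] / support[c] if support[c] else 0.0 for c in labels}
    precision = {c: tp[c] / predicted[c] if predicted[c] else 0.0 for c in labels}

    return {
        "accuracy": accuracy,
        # Weighting each class by its share of the true labels.
        "weighted_sensitivity": sum(support[c] / n * recall[c] for c in labels),
        "weighted_precision": sum(support[c] / n * precision[c] for c in labels),
        # Unweighted mean over classes, shown for contrast.
        "macro_sensitivity": sum(recall[c] for c in labels) / len(labels),
    }


# Toy labels (assumed, imbalanced on purpose): 4 / 3 / 2 / 1 samples per class.
P, E, SA, SP = PATHOGENS
y_true = [P, P, P, P, E, E, E, SA, SA, SP]
y_pred = [P, P, P, E, E, E, E, SA, P, SP]

metrics = multiclass_metrics(y_true, y_pred, PATHOGENS)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On these toy labels the weighted sensitivity matches the accuracy, while the macro-averaged sensitivity and the weighted precision differ from it. That pattern resembles the reported figures, but it does not establish how the authors computed them.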