AUTHOR=Wu Sirui , Huang Yike , Luo Lan , Deng Jielun , Wang Yuanfang , Ye Fei , Li Dongdong TITLE=Latent class analysis and machine learning for clinical subtyping prediction and differentiation in suspected neurosyphilis patients JOURNAL=Frontiers in Cellular and Infection Microbiology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/cellular-and-infection-microbiology/articles/10.3389/fcimb.2025.1665468 DOI=10.3389/fcimb.2025.1665468 ISSN=2235-2988 ABSTRACT=Objective: Neurosyphilis presents significant diagnostic and therapeutic challenges because of its heterogeneous clinical manifestations, the absence of a gold-standard diagnostic criterion, and variable treatment responses. This study aims to identify clinically homogeneous subtypes of suspected neurosyphilis patients and to develop a machine learning-based subtyping model to support clinical decision-making. Methods: Data from 451 suspected neurosyphilis patients were retrospectively collected from West China Hospital of Sichuan University. Patients were divided chronologically into a model development cohort (n=369) and an external validation cohort (n=82). Latent class analysis (LCA) was performed to identify subtypes, with the optimal number of classes determined by model fit indices. Key predictive variables were selected using LASSO regression and the Boruta algorithm. Six machine learning algorithms were used to build LCA subtype prediction models. Feature importance was interpreted via SHAP analysis, and model generalizability was assessed using the external cohort. Results: LCA classified patients into three homogeneous subtypes: “typical neurosyphilis” (43.7%; predominantly male, high serum TRUST titer, significant CSF abnormalities, and robust intrathecal immune activation), “atypical neurosyphilis” (17.9%; absence of elevated CSF protein, mild intrathecal IgG synthesis), and “non-neurosyphilis” (38.5%; normal CSF parameters). 
Six variables (age, serum TRUST titer, CSF protein, CSF nucleated cells, IgG index, CSF TTs) were used for model construction. The XGBoost model performed best, achieving an AUC of 0.966 (accuracy: 87.3%) on the internal test set and 0.970 (accuracy: 91.5%) on the external validation set. Key predictors included CSF nucleated cells, CSF TTs, and IgG index. Conclusion: This study defines three clinically meaningful latent subtypes among patients with suspected neurosyphilis. The developed XGBoost model effectively distinguishes the typical and atypical neurosyphilis subtypes from each other and from non-neurosyphilis in clinical settings, facilitating timely diagnosis and treatment.