AUTHOR=Diaz Alexander J., Centurioni Dominick A., Lasek-Nesselquist Erica, Lapierre Pascal, Egan Christina T., Perry Michael J. TITLE=Whole genome sequencing of neurotoxin-producing Clostridium species in New York state to bolster epidemiological investigations and reveal patterns of diversity and distribution JOURNAL=Frontiers in Public Health VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1651032 DOI=10.3389/fpubh.2025.1651032 ISSN=2296-2565 ABSTRACT=Neurotoxin-producing clostridia are highly relevant to public health. While cases of botulism [caused by C. botulinum and other organisms that produce botulinum neurotoxin (BoNT)] are rare, the severity of the disease necessitates robust epidemiologic surveillance to promptly identify and mitigate outbreaks. Next generation sequencing (NGS) can support these investigations through single nucleotide polymorphism (SNP)-based analysis, phylogenetic reconstruction, toxin subtyping, and structural analysis. Until recently, testing for this disease was restricted to traditional culture or to molecular methods such as polymerase chain reaction (PCR) that detect bont genes, while mouse bioassay and endopeptidase-mass spectrometry (Endopep-MS) confirmed the presence of enzymatically active toxin. The New York State Department of Health (NYSDOH) Wadsworth Center Biodefense Laboratory performed a retrospective whole genome sequencing (WGS) analysis of approximately 240 Clostridium spp. isolates collected over the past 40 years to supplement traditional test results and further characterize these organisms. Genomic analyses identified seven BoNT serotypes/serotype combinations, including A4(B5), A5(B2’), and B5F2, which are uncommon among the samples the laboratory typically receives. 
Additionally, SNP-based analysis and de novo genome assemblies retrospectively validated several epidemiological links or differentiated samples previously tested only with traditional methods. Our work highlights the clinical utility of supplementing conventional data with NGS to further characterize BoNT-producing organisms and underscores the importance of incorporating WGS into laboratory workflows to support epidemiologic investigations. However, several obstacles may still hinder implementation: the expertise needed to execute bioinformatic analyses and interpret the resulting data, a lack of standardized bioinformatic workflows, and the difficulty of defining SNP-based thresholds for identifying linked samples without additional data and analyses. Supplementing or replacing short-read sequencing with long-read sequencing (LRS), and applying metagenomic or capture-based enrichment to primary specimens, could increase the value of WGS in epidemiological investigations.