AUTHOR=Forth Leonie F. , Malorny Burkhard , Bönn Markus , Brinks Erik , Denay Grégoire , Deneke Carlus , El-Adawy Hosny , Fischer Jennie , Fuchs Jannika , Hiller Ekkehard , Bretschneider Nancy , Kleta Sylvia , Lüth Stefanie , Schultze Tilman , Petersen Henning , Projahn Michaela , Schäfers Christian , Stingl Kerstin , Stroehlein Andreas J. , Uelze Laura , Szabo Kathrin , Wöhlke Anne , Linde Jörg 

TITLE=An inter-laboratory study characterizes the impact of bioinformatic approaches on genome-based cluster detection for foodborne bacterial pathogens

JOURNAL=Frontiers in Microbiology

VOLUME=Volume 16 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2025.1629731

DOI=10.3389/fmicb.2025.1629731

ISSN=1664-302X

ABSTRACT=Accurate assignment of whole-genome sequences to clusters in foodborne outbreak investigations remains challenging. Variability in bioinformatics tools and quality metrics significantly impacts clustering outcomes. This study assessed inter-laboratory variance in cluster identification by providing four datasets of 50 raw Illumina paired-end sequences covering Shiga toxin-producing Escherichia coli, Listeria monocytogenes, Salmonella enterica, and Campylobacter jejuni. Following general rules of a specified guideline, participants applied in-house protocols for read quality assessment, 7-gene MLST, cgMLST, and SNP calling, then assigned samples to predefined focus clusters based on allele distance (AD) and mutations. Results revealed that differences in the interpretation of raw sequence and genome assembly quality influenced sample inclusion and finally cluster composition. Here, intra-species contamination was the most significant factor driving variability in decisions on whether to include or exclude samples. With one exception, 7-gene Multilocus-Sequence Typing (MLST) yielded consistent sequence types using different bioinformatics tools. The largest influence on cgMLST-defined clusters was the inclusion or exclusion of samples. Regarding bioinformatics, cgMLST was mainly reproducible. For S. enterica, discrepancies due to different software (Ridom SeqSphere+ vs. ChewieSnake) were larger than discrepancies due to different schemas. For other species, different schemas introduced larger discrepancies than different software. Most notably, C. jejuni cluster assignment was strongly affected by cgMLST schemas differing by a factor of two in the number of loci. SNP calling using Snippy produced concordant results across participants, except for C. jejuni when recombination filtering was used. This study highlights the impact caused by different interpretations of quality values when assessing clusters. Low-resolution cgMLST schemas were unsuitable for Campylobacter jejuni, and clustering near cut-off values was sensitive to bioinformatics tool selection. Standardized protocols are essential for reliable inter-laboratory comparison in foodborne pathogen surveillance.