AUTHOR=Srisuradetchai Patchanok
TITLE=Posterior averaging with Gaussian naive Bayes and the R package RandomGaussianNB for big-data classification
JOURNAL=Frontiers in Big Data
VOLUME=Volume 8 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2025.1706417
DOI=10.3389/fdata.2025.1706417
ISSN=2624-909X
ABSTRACT=RandomGaussianNB is an open-source R package implementing the posterior-averaging Gaussian naive Bayes (PAV-GNB) algorithm, a scalable ensemble extension of the classical GNB classifier. The method introduces posterior averaging to mitigate correlation bias and enhance stability in high-dimensional settings while maintaining interpretability and computational efficiency. Theoretical results establish the variance of the ensemble posterior, which decreases inversely with ensemble size, and a margin-based generalization bound that connects posterior variance with classification error. Together, these results provide a principled understanding of the bias–variance trade-off in PAV-GNB. The package delivers a fully parallel, reproducible framework for large-scale classification. Simulation studies under big-data conditions (large samples, many features, and multiple classes) show consistent accuracy, low variance, and agreement with theoretical predictions. Scalability experiments demonstrate near-linear runtime improvement with multi-core execution, and a real-world application on the Pima Indians Diabetes dataset validates PAV-GNB's reliability and computational efficiency as an interpretable, statistically grounded approach for ensemble naive Bayes classification.
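
The posterior-averaging idea in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R, it is not the RandomGaussianNB API, and every function name in it (fit_gnb, posterior, fit_ensemble, predict_proba) is hypothetical. The abstract does not say how ensemble members are built, so the sketch assumes each member is a Gaussian naive Bayes model fitted on a bootstrap sample restricted to a random feature subset; only the final step, averaging the members' class posteriors, is taken from the abstract.

```python
import math
import random


def fit_gnb(X, y, features):
    """Fit class log-priors plus per-feature means and variances on a feature subset."""
    model = {}
    n = len(y)
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in features:
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mu, var))
        model[c] = (math.log(len(rows) / n), stats)
    return model


def posterior(model, features, x):
    """Normalized class posteriors of one member for a single observation x."""
    log_post = {}
    for c, (log_prior, stats) in model.items():
        ll = log_prior
        for j, (mu, var) in zip(features, stats):
            ll += -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
        log_post[c] = ll
    top = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in log_post.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def fit_ensemble(X, y, n_models=25, n_features=2, seed=0):
    """ASSUMPTION: members use bootstrap rows and random feature subsets."""
    rng = random.Random(seed)
    p = len(X[0])
    members = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = sorted(rng.sample(range(p), n_features))
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        members.append((fit_gnb(Xb, yb, feats), feats))
    return members


def predict_proba(members, x, classes):
    """Posterior averaging: mean of the members' class posteriors."""
    avg = {c: 0.0 for c in classes}
    for model, feats in members:
        post = posterior(model, feats, x)
        for c in classes:
            # A class missing from a member's bootstrap sample contributes 0.
            avg[c] += post.get(c, 0.0) / len(members)
    return avg


# Synthetic two-class data (illustrative only): class means 0 and 2, unit variance.
data_rng = random.Random(1)
X = [[data_rng.gauss(2.0 * label, 1.0) for _ in range(4)]
     for label in (0, 1) for _ in range(100)]
y = [label for label in (0, 1) for _ in range(100)]

members = fit_ensemble(X, y)
probs = predict_proba(members, [2.0, 2.0, 2.0, 2.0], classes=[0, 1])
print(probs)
```

Because each member's posterior sums to one, the averaged posterior does too. Raising n_models in this sketch is the knob that corresponds to the ensemble size in the abstract's variance result; the members are independent of one another once their random draws are fixed, which is what would make a parallel fit straightforward.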