AUTHOR=Ihenetu Francis Chukwuebuka, Okoro Chinyere Ihuarulam, Ozoude Makuochukwu Maryann, Okechukwu Emeka H., Nwokah Easter Godwin TITLE=Comparative analysis of frequentist, Bayesian, and machine learning models for predicting SARS-CoV-2 PCR positivity JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 8 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1668477 DOI=10.3389/frai.2025.1668477 ISSN=2624-8212 ABSTRACT=Background: Prediction of infection status is critical for effective disease management and timely intervention. Traditional diagnostic methods for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) are challenged by varying sensitivities and specificities, necessitating the evaluation of advanced statistical approaches. This study compared frequentist logistic regression, Bayesian logistic regression, and a random forest classifier for predicting PCR positivity from clinical and demographic predictors. Methodology: A total of 950 participants were analyzed using three modeling approaches. To address class imbalance, the data were balanced using the Synthetic Minority Oversampling Technique (SMOTE) before training the random forest classifier. Predictors included IgG serostatus, travel history (international and domestic), self-reported symptoms (such as loss of smell, fatigue, and sore throat), sex, and age. Three models were developed: (1) frequentist logistic regression; (2) Bayesian logistic regression with a moderately informative Normal (mean = 1, SD = 2) prior and a weakly informative Cauchy (0, 2.5) prior; and (3) machine learning (ML) using a random forest classifier. Missing data were minimal (<2%) and handled through imputation, with sensitivity analyses confirming no material impact on model performance.
Performance was evaluated using odds ratios, posterior means with credible intervals, and area under the ROC curve (AUC). Results: Of the 950 participants, 74.8% tested positive for SARS-CoV-2. The frequentist logistic regression identified recent international travel (odds ratio [OR] = 4.8), loss of smell (OR = 2.3), and domestic travel (OR = 1.5) as the strongest predictors of PCR positivity. The Bayesian model yielded similar posterior estimates, confirming the robustness of these associations across prior assumptions. The random forest classifier achieved the highest discriminative performance (AUC = 0.947–0.963). Notably, age and sex were not significant in the regression models but emerged as influential predictors in the random forest model, suggesting possible nonlinear or interaction effects. Conclusion: The machine learning approach (random forest) outperformed the logistic regression models in predictive accuracy. Bayesian regression confirmed the reliability of key predictors and allowed quantification of uncertainty. These findings highlight that simple, routinely collected symptom and exposure data can support rapid, resource-conscious screening for SARS-CoV-2, particularly when laboratory testing capacity is limited.
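
The two performance measures named in the abstract, odds ratios and AUC, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the odds ratios are the three values quoted above, and the toy scores and labels are invented purely for illustration and are not study data. The sketch assumes the standard relations that a logistic regression coefficient equals the natural log of its odds ratio, and that AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import math

# Odds ratios quoted in the abstract for the frequentist logistic regression.
reported_odds_ratios = {
    "international_travel": 4.8,
    "loss_of_smell": 2.3,
    "domestic_travel": 1.5,
}


def log_odds_coefficient(odds_ratio):
    """Return the log-odds coefficient implied by an odds ratio (beta = ln OR)."""
    return math.log(odds_ratio)


def auc(scores, labels):
    """Compute AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair
    (the Mann-Whitney U formulation of the area under the ROC curve).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Coefficients on the log-odds scale implied by the reported odds ratios.
coefficients = {
    name: log_odds_coefficient(value)
    for name, value in reported_odds_ratios.items()
}

# Hypothetical toy data (NOT from the study): predicted probabilities of PCR
# positivity and the corresponding observed labels (1 = positive, 0 = negative).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)

if __name__ == "__main__":
    for name, beta in coefficients.items():
        print(f"{name}: beta = {beta:.3f}")
    print(f"toy AUC = {toy_auc:.3f}")
```

On the toy data, eight of the nine positive/negative pairs are ordered correctly, so the toy AUC is 8/9. The pairwise loop is quadratic in the number of cases and is chosen here for readability; a rank-based computation would be the more usual choice for larger samples.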