AUTHOR=Cristea Daniela-Maria, Sima Ioan, Iantovics Laszlo Barna

TITLE=Comparative analysis of optimized logistic regression with state-of-the-art models for complex gastroenterological image analysis

JOURNAL=Frontiers in Medicine

VOLUME=Volume 12 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1655612

DOI=10.3389/fmed.2025.1655612

ISSN=2296-858X

ABSTRACT=Introduction: Classifying gastrointestinal (GI) polyps detected in colonoscopy images is a critical task in colorectal cancer prevention. Given the diagnostic ambiguity of serrated polyps, which share morphological features with both hyperplastic and adenomatous lesions, this study focuses on multiclass classification using machine learning (ML) techniques. Multiclass Logistic Regression (LR), a model favored by clinicians for its interpretability, was initially optimized and evaluated. Methods: A structured dataset comprising 152 instances and 698 extracted features was used. We conducted a statistical analysis of 88 LR configurations, varying solvers, penalties, and regularization strengths. To improve classification performance, four additional ML algorithms were implemented: k-Nearest Neighbors (kNN), Support Vector Machine (SVM), Random Forest (RF), and XGBoost. For each classifier, parameter tuning was applied using grid search and stratified cross-validation. Results: The best-performing LR model (liblinear solver, L1 penalty, C = 0.01) achieved an accuracy of 70.39%, outperforming physician benchmarks (experts: 65.00%, beginners: 58.42%). In the multiclass setting, XGBoost achieved the highest macro-average F1-score (0.88) and overall accuracy (89.34%), followed by Random Forest (F1 = 0.85, accuracy = 86.05%), SVM (F1 = 0.83, accuracy = 84.21%), and kNN (F1 = 0.56, accuracy = 66.38%). Discussion: While LR remains valuable for its interpretability, ensemble methods such as XGBoost and Random Forest demonstrated superior performance and robustness. These findings support the integration of advanced ML models into clinical decision support systems, particularly in low-data scenarios where deep learning may be impractical.
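
The abstract ranks the classifiers by macro-average F1-score and overall accuracy. As a reminder of how those two metrics are defined for a multiclass problem, here is a minimal pure-Python sketch. The function names and the ten toy labels are illustrative assumptions and are not taken from the study's data or code; the class names only echo the three polyp types mentioned in the abstract.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro average)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels, for illustration only (not the study's data).
y_true = ["adenoma"] * 4 + ["hyperplastic"] * 3 + ["serrated"] * 3
y_pred = [
    "adenoma", "adenoma", "adenoma", "serrated",
    "hyperplastic", "hyperplastic", "adenoma",
    "serrated", "serrated", "hyperplastic",
]

print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
```

Because the macro average weights every class equally, a model that performs poorly on a minority class is penalized more heavily in macro F1 than in accuracy, which may help explain gaps such as the one reported for kNN (F1 = 0.56 versus accuracy = 66.38%).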