AUTHOR=Chang Li-Jen, Lee Kun-Hua, Mukundan Arvind, Karmakar Riya, Bauravindah Achmad, Chen Tsung-Hsien, Huang Chien-Wei, Wang Hsiang-Chen TITLE=Investigating object detection errors in endoscopic imaging of esophageal SCC and dysplasia through precision–recall analysis JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1707854 DOI=10.3389/fonc.2025.1707854 ISSN=2234-943X ABSTRACT=Introduction: Esophageal squamous cell carcinoma (ESCC) is difficult to detect early on white-light endoscopy (WLI) because lesions are subtle and artifacts (such as glare, bubbles, text, and tools) mimic pathology. Methods: This study benchmarked five object detectors, namely two You Only Look Once models (YOLOv5, YOLOv8), Faster Region-based Convolutional Neural Network (Faster R-CNN), Single Shot MultiBox Detector (SSD), and Real-time Detection Transformer (RT-DETR), on a WLI dataset using harmonized training (from scratch, 150 epochs, identical hyperparameters) and two label configurations: a 4-label configuration of major categories (SCC, Dysplasia, Bleeding, Inflammation) and an 11-label configuration that adds artifact classes. Evaluation used macro precision, recall, and F1 at an IoU threshold of 0.50 on a fixed 310-image test set. Results: Incorporating artifact classes improved overall macro metrics, with YOLOv5 and YOLOv8 providing the strongest performance in the 11-label scenario; however, class-wise findings revealed persistent recall limitations for early disease. In the 11-label analysis, Dysplasia detection remained low (YOLOv5: 88/201, 43.8%; YOLOv8: 82/201, 40.8%), and SCC detection was only moderate (YOLOv5: 25/44, 56.8%; YOLOv8: 24/44, 54.5%). Confusion analyses showed that errors were dominated by non-detections (“background”) rather than misclassification as benign or artifact labels, while approximately one in five lesion predictions was a spurious, unmatched false positive, implicating both sensitivity and specificity constraints.
Discussion: These results indicate that labeling artifacts reduces non-lesion confusion but does not, by itself, recover subtle early lesions. Limitations include single-center, WLI-only data and training from scratch; future work should prioritize endoscopy-specific pretraining, explicit artifact suppression or joint segmentation, and external validation.
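
The evaluation described in the abstract (macro precision, recall, and F1 at IoU 0.50) can be illustrated with a minimal Python sketch. The abstract does not state the matching protocol, so the greedy, score-ordered one-to-one matching below is an assumption, and the names `iou`, `match_counts`, and `macro_prf` are hypothetical helpers, not the authors' code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_counts(gt: List[Box], preds: List[Tuple[Box, float]],
                 thr: float = 0.5) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for one class in one image.

    Assumption: predictions are matched greedily in descending score order,
    and each ground-truth box can be matched at most once.
    """
    matched = set()
    tp = fp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt):
            if j in matched:
                continue
            v = iou(box, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1  # unmatched prediction: a spurious false positive
    fn = len(gt) - len(matched)  # unmatched ground truth: a non-detection
    return tp, fp, fn


def macro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Unweighted mean of per-class precision, recall, and F1."""
    ps, rs, fs = [], [], []
    for tp, fp, fn in counts.values():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f)
    n = len(counts)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n
```

Per-class recall is TP / (TP + FN), so the reported Dysplasia figure for YOLOv5 follows directly from the counts: `round(100 * 88 / 201, 1)` gives 43.8. Because macro averaging weights every class equally, adding well-detected artifact classes can raise the macro scores even when lesion recall stays low, which is consistent with the pattern the abstract reports.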