AUTHOR=Stawarz Katarzyna, Gorzelnik Anna, Klos Wojciech, Korzon Jacek, Kissin Filip, Bieńkowska-Pluta Karolina, Stawarz Grzegorz, Rusetska Natalia, Zwolinski Jakub TITLE=Systematic review of artificial intelligence and radiomics for preoperative prediction of extranodal extension and lymph node metastasis in oropharyngeal cancer JOURNAL=Frontiers in Oncology VOLUME=Volume 15 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2025.1717641 DOI=10.3389/fonc.2025.1717641 ISSN=2234-943X ABSTRACT=Background: Preoperative identification of extranodal extension (ENE) and cervical lymph node metastasis (LNM) in oropharyngeal cancer guides treatment escalation and de-escalation. Artificial intelligence (AI) and radiomics offer promise for nodal assessment, but clinical utility and reporting quality remain variable. Methods: This systematic review followed PRISMA guidelines. We systematically searched PubMed, Scopus, and Web of Science for studies published between 2020 and 2025. Eleven studies were eligible (4 core, 7 supportive): the core studies addressed ENE (n=2) or LNM prediction (n=2), and the supportive studies covered segmentation, lymphatic spread modeling, MRI radiomics, and outcomes modeling. Extracted variables included study characteristics, performance metrics, validation, calibration, and unit of analysis. Risk of bias was assessed using PROBAST; reporting quality was evaluated with TRIPOD. Because of heterogeneity and the limited number of studies, no meta-analysis was performed; results were synthesized narratively. For ENE, we report study-level accuracy, decision-curve analysis (DCA), and per-1,000 management impact. Results: All core studies were CT-based. The task-specific deep-learning (DL) ENE model achieved AUC 0.86 with balanced operating points, while the generalist large vision-language model (LVLM) reached sensitivity 1.00 with specificity 0.34.
DCA favored the DL model across thresholds 0.10–0.40, showing fewer unnecessary dissections per 1,000 patients than either a treat-all strategy or the LVLM. For LNM, discrimination was high (AUC 0.865–0.919), calibration was reported, and one study included external validation, though threshold-level sensitivity and specificity were not reported. External validation was reported in 25% of core studies and calibration in 50%; TRIPOD adherence was 74.5% overall, with frequent under-reporting of blinding and missing-data handling. Conclusions: AI and radiomics show promising potential for preoperative prediction of ENE and LNM in oropharyngeal cancer. Task-specific deep-learning models achieve balanced discrimination, while generalist LVLMs provide high recall at lower specificity. For LNM, encouraging performance is reported, but limited external validation and the absence of standardized thresholds still preclude clinical use. Broader validation and harmonized reporting are essential before translation into practice. Registration/Protocol: Not registered; methods followed PRISMA/TRIPOD/PROBAST guidance.
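
The decision-curve and per-1,000 quantities named in the abstract can be illustrated with a minimal Python sketch. It uses the standard net-benefit formula, NB = TP/N - (FP/N) * pt / (1 - pt), where pt is the threshold probability. The LVLM sensitivity and specificity (1.00 and 0.34) are taken from the abstract; the ENE prevalence of 0.30 and the function names are assumptions chosen only for illustration, so the printed numbers do not reproduce the review's results.

```python
# Illustrative sketch of decision-curve quantities.
# ASSUMPTION: prevalence = 0.30 is hypothetical; the abstract does not state it.
# Sensitivity 1.00 / specificity 0.34 are the LVLM figures quoted in the abstract.


def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit per patient at a given threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence                   # TP / N
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # FP / N
    return true_pos - false_pos * threshold / (1.0 - threshold)


def false_positives_per_1000(specificity, prevalence):
    """Patients per 1,000 flagged positive without having the condition."""
    return 1000.0 * (1.0 - specificity) * (1.0 - prevalence)


ASSUMED_PREVALENCE = 0.30  # hypothetical value, for illustration only

for pt in (0.10, 0.20, 0.30, 0.40):
    # Treat-all corresponds to sensitivity 1 and specificity 0.
    nb_treat_all = net_benefit(1.0, 0.0, ASSUMED_PREVALENCE, pt)
    nb_lvlm = net_benefit(1.00, 0.34, ASSUMED_PREVALENCE, pt)
    print(f"pt={pt:.2f}  treat-all NB={nb_treat_all:.4f}  LVLM NB={nb_lvlm:.4f}")

print("LVLM false positives per 1,000:",
      round(false_positives_per_1000(0.34, ASSUMED_PREVALENCE)))
```

A model with a known operating point can be compared the same way by passing its sensitivity and specificity to `net_benefit`. The abstract gives no operating point for the DL model, so none is shown here.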