AUTHOR=Wei Mu-Yang, Li Yu-Lin, Liu Shu-Yan, Li Guang-Yu TITLE=Evaluating the competence of large language models in ophthalmology clinical practice: a multi-scenario quantitative study JOURNAL=Frontiers in Cell and Developmental Biology VOLUME=13 YEAR=2025 URL=https://www.frontiersin.org/journals/cell-and-developmental-biology/articles/10.3389/fcell.2025.1704762 DOI=10.3389/fcell.2025.1704762 ISSN=2296-634X
ABSTRACT=
Background and objectives: A comparative evaluation of large language models (LLMs) is crucial for their application in specialized fields such as ophthalmology. This study systematically assesses five prominent LLMs (ChatGPT 4, Claude 3 Opus, Gemini 1.5 Flash, ERNIE 3.5, and iFLY Healthcare) to quantify their performance across key clinical domains and provide evidence-based guidance for their integration.
Methods: We evaluated the LLMs across three simulated ophthalmic scenarios. For clinical assistance, the models responded to 50 questions, which were assessed for accuracy, completeness, and readability. For diagnosis and treatment, the models answered 375 qualification exam questions to assess clinical reasoning. For doctor-patient communication, the models responded to 20 SPIKES-based scenarios, which were analyzed for emotional and social engagement.
Results: In clinical assistance, Gemini 1.5 Flash demonstrated superior accuracy and completeness, while Claude 3 Opus produced the most readable text. In diagnosis and treatment, all models surpassed the passing threshold for the qualification exam, with Claude 3 Opus achieving the highest overall accuracy (81.07%). In doctor-patient communication, Gemini 1.5 Flash showed the strongest performance in positive emotional expression and social engagement.
Conclusion: This study innovatively evaluates LLMs in ophthalmic practice.
Gemini 1.5 Flash excels at generating accurate clinical content and engaging with patients, whereas Claude 3 Opus demonstrates exceptional clinical reasoning and produces the most readable text. These findings validate the clinical potential of LLMs and provide evidence-based selection criteria for ophthalmic AI applications. The results establish a practical foundation for optimizing ophthalmic AI model development and for systematically building intelligent ophthalmic hospital systems.