AUTHOR=Wang Zhao , Liang Lin , Xu Hao , Huang Yuhui , He Chen , Xu Weiran , Zhu Haojie TITLE=Evaluating the efficacy of large language models in cardio-oncology patient education: a comparative analysis of accuracy, readability, and prompt engineering strategies JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 8 - 2025 YEAR=2026 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1693446 DOI=10.3389/frai.2025.1693446 ISSN=2624-8212 ABSTRACT=Background: The integration of large language models (LLMs) into cardio-oncology patient education holds promise for addressing the critical gap in accessible, accurate, and patient-friendly information. However, the performance of publicly available LLMs in this specialized domain remains underexplored. Objectives: This study evaluates the performance of three LLMs (ChatGPT-4, Kimi, DouBao) as assistants to physicians in cardio-oncology patient education and examines the impact of prompt engineering on response quality. Methods: Twenty standardized questions spanning cardio-oncology topics were posed twice to each of the three LLMs: once without prompts and once with a directive to simplify language, generating 240 responses. These responses were evaluated by four cardio-oncology specialists for accuracy, comprehensiveness, helpfulness, and practicality. Readability and complexity were assessed using a Chinese text analysis framework. Results: Among 240 responses, 63.3% were rated “correct,” 35.0% “partially correct,” and 1.7% “incorrect.” No significant differences in accuracy were observed between models (p = 0.26). Kimi produced no incorrect responses. Significant declines in comprehensiveness (p = 0.03) and helpfulness (p < 0.01) occurred post-prompt, particularly for DouBao (accuracy: 57.5% vs. 7.5%, p < 0.01).
Readability metrics (readability age, difficulty score, total word count, sentence length) showed no inter-model differences, but prompts reduced complexity (e.g., DouBao’s readability age decreased from 12.9 ± 0.8 to 10.1 ± 1.2 years, p < 0.01). Conclusion: Publicly available LLMs provide largely accurate responses to cardio-oncology questions, yet their utility is constrained by inconsistent comprehensiveness and sensitivity to prompt design. While simplifying language improves readability, it risks compromising clinical relevance. Tailored fine-tuning and specialized evaluation frameworks are essential to optimize LLMs for patient education in cardio-oncology.