AUTHOR=De Cristofaro Elena, Zorzi Francesca, Abreu Maria, Colella Alice, Blanco Giovanna Del Vecchio, Fiorino Gionata, Lolli Elisabetta, Noor Nurulamin, Lopetuso Loris Riccardo, Pioche Mathieu, Grimaldi Jean, Paoluzi Omero Alessandro, Roseira Joana, Sena Giorgia, Troncone Edoardo, Calabrese Emma, Monteleone Giovanni, Marafini Irene
TITLE=When AI speaks like a specialist: ChatGPT-4 in the management of inflammatory bowel disease
JOURNAL=Frontiers in Artificial Intelligence
VOLUME=Volume 8 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1678320
DOI=10.3389/frai.2025.1678320
ISSN=2624-8212
ABSTRACT=
Background: Artificial intelligence (AI) is gaining traction in healthcare, especially for patient education. Inflammatory bowel diseases (IBD) require continuous engagement, yet the quality of online information accessed by patients is inconsistent. ChatGPT, a generative AI model, has shown promise in medical scenarios, but its role in IBD communication needs further evaluation. The objective of this study was to assess the quality of ChatGPT-4's responses to common patient questions about IBD, compared with those provided by experienced IBD specialists.
Methods: Twenty-five frequently asked questions were collected during routine IBD outpatient visits and categorized into five themes: pregnancy/breastfeeding, diet, vaccinations, lifestyle, and medical therapy/surgery. Each question was answered by ChatGPT-4 and by two expert gastroenterologists. Responses were anonymized and evaluated by 12 physicians (six IBD experts and six non-experts) using a 5-point Likert scale across four dimensions: accuracy, reliability, comprehensibility, and actionability. Evaluators also attempted to identify whether responses were AI- or human-generated.
Results: ChatGPT-4 responses received significantly higher overall scores than those from human experts (mean 4.28 vs. 4.05; p < 0.001). The best-rated scenarios were medical therapy and surgery; the diet scenario consistently received lower scores. Only 33% of AI-generated responses were correctly identified as such, indicating strong similarity to human-written answers. Both expert and non-expert evaluators rated AI responses highly, though IBD specialists gave higher ratings overall.
Conclusion: ChatGPT-4 generated high-quality, clear, and actionable responses to IBD-related patient questions, often outperforming human experts. Its outputs were frequently indistinguishable from those written by physicians, suggesting potential as a supportive tool for patient education. Nonetheless, further studies are needed to assess real-world application and ensure appropriate use in personalized clinical care.