AUTHOR=Liu Yan, Xie Xiaodong, Wan Xin, Pan Yi, Wang Cheng TITLE=Enhancing RAPTOR with semantic chunking and adaptive graph clustering JOURNAL=Frontiers in Computer Science VOLUME=7 YEAR=2025 URL=https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2025.1710121 DOI=10.3389/fcomp.2025.1710121 ISSN=2624-9898 ABSTRACT=Introduction: While Retrieval-Augmented Generation (RAG) enhances language models, its application to long documents is often hampered by simplistic retrieval strategies that fail to capture hierarchical context. Although the RAPTOR framework addresses this through a recursive tree-structured approach, its effectiveness is constrained by semantic fragmentation from fixed-token chunking and by a static clustering methodology that is suboptimal for organizing the hierarchy. Methods: In this paper, we propose a comprehensive two-stage enhancement framework to address these limitations. We first employ semantic segmentation to generate coherent foundational leaf nodes, and subsequently introduce an Adaptive Graph Clustering (AGC) strategy. This strategy leverages the Leiden algorithm with a novel layer-aware dual-adaptive parameter mechanism to dynamically tailor clustering granularity. Results: Extensive experiments on the narrative QuALITY benchmark and the scientific Qasper dataset demonstrate the robustness and domain generalization of our framework. Our full model achieves a peak accuracy of 65.5% on QuALITY and demonstrates superior semantic validity on Qasper, significantly outperforming the baseline. Comparative ablation studies further reveal that our graph-topological approach outperforms traditional distance-based, density-based, and distribution-based clustering methods.
Additionally, our approach constructs a dramatically more compact hierarchy, reducing the number of required summary nodes by up to 76%. Discussion: This work underscores the critical importance of a holistic, semantic-first approach to building more effective and efficient retrieval trees for complex RAG tasks.
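The semantic-segmentation stage described in the abstract can be illustrated with a minimal sketch: instead of cutting a document every N tokens, adjacent sentences are merged into one chunk while their pairwise similarity stays above a threshold, so chunk boundaries fall at topic shifts. This is a toy stand-in only: the `embed`, `cosine`, and `semantic_chunks` names and the bag-of-words embedding are illustrative assumptions, not the paper's actual pipeline, which would use a real sentence encoder (and, for the second stage, Leiden clustering via a graph library such as python-igraph).

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words embedding; a real system would use a sentence encoder.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_chunks(sentences, threshold=0.3):
    # Merge each sentence into the current chunk while it remains
    # similar to its predecessor; start a new chunk at a topic shift.
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) >= threshold:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return [" ".join(chunk) for chunk in chunks]
```

For example, two consecutive sentences about the same subject end up in one leaf chunk, while an unrelated sentence opens a new one, yielding the semantically coherent leaf nodes the tree is then built over.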