AUTHOR=Liu Yan, Xie Xiaodong, Wan Xin, Pan Yi, Wang Cheng TITLE=Enhancing RAPTOR with semantic chunking and adaptive graph clustering JOURNAL=Frontiers in Computer Science VOLUME=7 YEAR=2025 URL=https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2025.1710121 DOI=10.3389/fcomp.2025.1710121 ISSN=2624-9898 ABSTRACT=Introduction: While Retrieval-Augmented Generation (RAG) enhances language models, its application to long documents is often hampered by simplistic retrieval strategies that fail to capture hierarchical context. Although the RAPTOR framework addresses this through a recursive tree-structured approach, its effectiveness is constrained by semantic fragmentation from fixed-token chunking and by a static clustering methodology that is suboptimal for organizing the hierarchy. Methods: In this paper, we propose a comprehensive two-stage enhancement framework to address these limitations. We first employ semantic segmentation to generate coherent foundational leaf nodes, and subsequently introduce an Adaptive Graph Clustering (AGC) strategy. This strategy leverages the Leiden algorithm with a novel layer-aware dual-adaptive parameter mechanism to dynamically tailor clustering granularity. Results: Extensive experiments on the narrative QuALITY benchmark and the scientific Qasper dataset demonstrate the robustness and domain generalization of our framework. Our full model achieves a peak accuracy of 65.5% on QuALITY and demonstrates superior semantic validity on Qasper, significantly outperforming the baseline. Comparative ablation studies further reveal that our graph-topological approach outperforms traditional distance-based, density-based, and distribution-based clustering methods.
Additionally, our approach constructs a dramatically more compact hierarchy, reducing the number of required summary nodes by up to 76%. Discussion: This work underscores the critical importance of a holistic, semantic-first approach to building more effective and efficient retrieval trees for complex RAG tasks.
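The semantic-segmentation stage described in the abstract can be illustrated with a minimal sketch: instead of cutting a document every N tokens, adjacent sentences are merged into one chunk while their pairwise similarity stays above a threshold, so chunk boundaries fall at topic shifts. This is a toy stand-in only: the `embed`, `cosine`, and `semantic_chunks` names and the bag-of-words embedding are illustrative assumptions, not the paper's actual pipeline, which would use a real sentence encoder (and, for the second stage, Leiden clustering via a graph library such as python-igraph).

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words embedding; a real system would use a sentence encoder.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_chunks(sentences, threshold=0.3):
    # Merge each sentence into the current chunk while it remains
    # similar to its predecessor; start a new chunk at a topic shift.
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) >= threshold:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return [" ".join(chunk) for chunk in chunks]
```

For example, two consecutive sentences about the same subject end up in one leaf chunk, while an unrelated sentence opens a new one, yielding the semantically coherent leaf nodes the tree is then built over.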