ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation
Pith reviewed 2026-05-23 03:25 UTC · model grok-4.3
The pith
ArchRAG retrieves from attributed communities in a hierarchical graph index to raise RAG accuracy and cut token use.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ArchRAG combines four components: question augmentation with attributed communities, an LLM-based hierarchical clustering method that builds those communities from the graph, a hierarchical index structure over the communities, and an online retrieval method. Together, the paper claims, these identify relevant information more accurately while consuming fewer tokens than prior graph RAG systems.
What carries the argument
The attributed community: a group of graph entities that share semantic attributes and link relationships. Communities are arranged in a hierarchical index that supports layered retrieval.
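To make the idea concrete, here is a minimal sketch of how an attributed community and its hierarchy could be represented. The class names, fields, and toy entities are assumptions made for illustration; the paper's actual data structures are not described in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    description: str  # semantic attribute text attached to the graph node


@dataclass
class AttributedCommunity:
    level: int    # assumed convention: 0 is the finest layer
    summary: str  # text describing what the members have in common
    entities: list[Entity] = field(default_factory=list)
    children: list["AttributedCommunity"] = field(default_factory=list)

    def all_entities(self) -> list[Entity]:
        """Collect entities from this community and every descendant."""
        found = list(self.entities)
        for child in self.children:
            found.extend(child.all_entities())
        return found


# Hypothetical toy hierarchy: two leaf communities under one parent.
leaf_a = AttributedCommunity(
    0,
    "researchers on radioactivity",
    [Entity("Marie Curie", "physicist"), Entity("Pierre Curie", "physicist")],
)
leaf_b = AttributedCommunity(
    0, "research institutions", [Entity("Sorbonne", "university in Paris")]
)
root = AttributedCommunity(1, "early radioactivity research", children=[leaf_a, leaf_b])

names = [e.name for e in root.all_entities()]
```

Each level stores its own summary, so a retriever can score a coarse community before deciding whether to look at its members.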
If this is right
- Graph data can be used more efficiently in QA because only relevant communities reach the model.
- Token consumption drops during online retrieval because the hierarchical index prunes irrelevant sections early.
- Existing graph RAG methods that lack community structure or layering can be outperformed on both accuracy and cost.
- The same indexing approach can scale to larger graphs by keeping retrieval local to the relevant hierarchy levels.
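The pruning behaviour described in these points can be sketched as a generic top-down search that keeps only the best-scoring communities at each layer. This is not the paper's retrieval algorithm: the bag-of-words `embed` function stands in for a real embedding model, and the hierarchy, function names, and `k` parameter are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical two-level hierarchy: each node is (summary, children).
hierarchy = [
    ("physics and radioactivity research", [
        ("radium discovery experiments", []),
        ("nobel prize in physics history", []),
    ]),
    ("medieval european architecture", [
        ("gothic cathedral construction", []),
        ("romanesque church design", []),
    ]),
]


def retrieve(query, nodes, k=1):
    """Keep the top-k nodes in this layer and descend only into those."""
    q = embed(query)
    ranked = sorted(nodes, key=lambda n: cosine(q, embed(n[0])), reverse=True)[:k]
    results = []
    for summary, children in ranked:
        results.append(summary)
        results.extend(retrieve(query, children, k))
    return results


hits = retrieve("who led the radium discovery research", hierarchy, k=1)
```

With `k=1`, the architecture branch is discarded at the top layer, so none of its children are ever scored or passed to the model. That is the mechanism by which early pruning could reduce token use.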
Where Pith is reading between the lines
- The technique might generalize to non-graph structured data if similar community detection can be applied to tables or trees.
- Targeted context from communities could lower the rate of unsupported claims in generated answers.
- Dynamic graphs might require periodic re-clustering, which could be tested as an extension.
Load-bearing premise
The LLM clustering step forms communities that stay both semantically coherent and directly relevant to the input question without introducing retrieval errors or extra token overhead.
What would settle it
Run ArchRAG and a baseline on the same graph QA dataset, then replace the LLM clustering with random partitioning and measure whether accuracy falls or token count rises.
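A minimal sketch of the random-partition control, assuming communities are available as lists of entity identifiers. The function name and toy data are hypothetical. Community sizes are kept fixed so that only membership changes between the two conditions.

```python
import random


def random_partition(communities, seed=0):
    """Shuffle members across communities while keeping each community's size."""
    members = [m for community in communities for m in community]
    rng = random.Random(seed)  # seeded so the control condition is reproducible
    rng.shuffle(members)
    shuffled, start = [], 0
    for community in communities:
        shuffled.append(members[start:start + len(community)])
        start += len(community)
    return shuffled


# Hypothetical communities, as some clustering step might produce them.
clustered = [["a", "b", "c"], ["d", "e"], ["f"]]
control = random_partition(clustered, seed=42)
```

Running the same index construction and retrieval over `clustered` and `control` would then isolate how much of any accuracy or token-cost difference comes from the clustering itself.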
Original abstract
Retrieval-Augmented Generation (RAG) has proven effective in integrating external knowledge into large language models (LLMs) for solving question-answer (QA) tasks. The state-of-the-art RAG approaches often use the graph data as the external data since they capture the rich semantic information and link relationships between entities. However, existing graph-based RAG approaches cannot accurately identify the relevant information from the graph and also consume large numbers of tokens in the online retrieval process. To address these issues, we introduce a novel graph-based RAG approach, called Attributed Community-based Hierarchical RAG (ArchRAG), by augmenting the question using attributed communities, and also introducing a novel LLM-based hierarchical clustering method. To retrieve the most relevant information from the graph for the question, we build a novel hierarchical index structure for the attributed communities and develop an effective online retrieval method. Experimental results demonstrate that ArchRAG outperforms existing methods in both accuracy and token cost.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ArchRAG, a graph-based RAG framework for QA tasks. It augments questions using attributed communities derived via a novel LLM-based hierarchical clustering method on graph data, constructs a hierarchical index over these communities, and performs targeted online retrieval. The central claim is that this yields higher accuracy and lower token cost than prior graph RAG methods.
Significance. If the empirical superiority holds under rigorous controls, the work would offer a practical advance in reducing token overhead while improving relevance in graph-augmented LLM systems, addressing two recurring pain points in current RAG literature.
major comments (2)
- [§4] §4 (Experiments): the abstract and introduction assert outperformance on accuracy and token cost, yet no quantitative results, baselines, datasets, or statistical controls are referenced in the provided front matter; the central empirical claim therefore cannot be evaluated without the full experimental tables and methodology details.
- [§3.2] §3.2 (LLM-based hierarchical clustering): the method is load-bearing for the attributed-community construction, yet the description supplies no formal definition of community attribution, no guarantee against semantic drift across hierarchy levels, and no ablation isolating its contribution to the reported gains; this leaves the weakest assumption untested.
minor comments (2)
- [§3] Notation for the hierarchical index structure should be introduced with a single diagram or pseudocode block early in §3 to avoid repeated textual descriptions.
- [§3.3] The online retrieval algorithm description would benefit from a complexity analysis (time and token) to make the claimed efficiency gains explicit rather than qualitative.
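As a rough illustration of the kind of token analysis the last comment asks for, consider a back-of-the-envelope comparison. Every symbol here is an assumption introduced for this sketch and does not come from the paper: $n$ leaf communities, branching factor $b$, $L = \lceil \log_b n \rceil$ layers, $k$ communities kept per layer, and $t$ tokens per community summary. It also assumes that candidate scoring uses embeddings and so consumes no LLM tokens.

```latex
% Assumed model, not the paper's analysis.
% T_flat: every leaf community summary is passed to the LLM.
% T_hier: at most k summaries are kept in each of the L layers.
\begin{align*}
T_{\text{flat}} &= n \, t, \\
T_{\text{hier}} &\le k \, L \, t = k \, t \, \lceil \log_b n \rceil .
\end{align*}
% Example: n = 1000, b = 10 (so L = 3), k = 5, t = 200
% gives T_flat = 200,000 tokens and T_hier <= 3,000 tokens.
```

Under these assumptions the hierarchical cost grows logarithmically in the number of communities, which is the sort of explicit statement that would replace a qualitative efficiency claim.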
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment point by point below, indicating the revisions we will make to strengthen the manuscript.
- Referee: [§4] §4 (Experiments): the abstract and introduction assert outperformance on accuracy and token cost, yet no quantitative results, baselines, datasets, or statistical controls are referenced in the provided front matter; the central empirical claim therefore cannot be evaluated without the full experimental tables and methodology details.
  Authors: The full manuscript contains Section 4 with the complete experimental details, including the specific datasets used, the baseline methods compared, quantitative tables reporting accuracy and token-cost metrics, and the statistical controls applied. To address the concern about traceability from the front matter, we will revise the introduction to include explicit forward references to the relevant tables and figures in Section 4.
  Revision: yes
- Referee: [§3.2] §3.2 (LLM-based hierarchical clustering): the method is load-bearing for the attributed-community construction, yet the description supplies no formal definition of community attribution, no guarantee against semantic drift across hierarchy levels, and no ablation isolating its contribution to the reported gains; this leaves the weakest assumption untested.
  Authors: We agree that the current description of the LLM-based hierarchical clustering can be strengthened. In the revised manuscript we will add a formal mathematical definition of community attribution. We will also include a short analysis (or empirical check) of semantic consistency across hierarchy levels to address potential drift. Finally, we will add an ablation study that isolates the contribution of the hierarchical clustering component to the observed accuracy and token-cost improvements.
  Revision: yes
Circularity Check
No significant circularity
The paper presents an empirical system description of ArchRAG, a graph-based RAG pipeline using LLM hierarchical clustering into attributed communities, a hierarchical index, question augmentation, and online retrieval. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the abstract or described structure. The central claim of outperformance on accuracy and token cost is an experimental result, not a derivation that reduces to its own inputs by construction. This is the expected non-finding for a constructive engineering paper without mathematical self-reference.
Forward citations
Cited by 5 Pith papers
- EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval
  EHRAG constructs structural hyperedges from sentence co-occurrence and semantic hyperedges from entity embedding clusters, then applies hybrid diffusion plus topic-aware PPR to retrieve top-k documents, outperforming ...
- MG$^2$-RAG: Multi-Granularity Graph for Multimodal Retrieval-Augmented Generation
  MG²-RAG proposes a multi-granularity graph RAG framework that constructs hierarchical multimodal nodes via entity-driven visual grounding and performs structured retrieval, delivering SOTA results on four multimodal t...
- Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation
  Ψ-RAG improves cross-document multi-hop QA performance using an adaptive hierarchical abstract tree and agent-powered hybrid retrieval, outperforming RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1.
- G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
  G-reasoner uses QuadGraph abstraction and a 34M-parameter graph foundation model integrated with LLMs to enable scalable reasoning over diverse graph-structured knowledge, outperforming baselines on six benchmarks.
- LLM+Graph@VLDB'2025 Workshop Summary
  The report summarizes key research directions, challenges, and solutions from the LLM+Graph workshop at VLDB 2025.