RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA

Flora D. Salim; Hakim Hacid; Hao Xue; Imran Razzak; Ruiyi Yang

arxiv: 2510.20505 · v4 · submitted 2025-10-23 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-18 04:51 UTC · model grok-4.3

keywords: RELOOP, hierarchical sequence, HSEQ, multi-hop retrieval, heterogeneous QA, agentic RAG, structure-aware iteration

The pith

RELOOP converts heterogeneous evidence into reversible hierarchical sequences to guide budget-aware multi-hop retrieval across formats.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RELOOP as a way to fix brittleness in retrieval-augmented generation when questions need multiple steps and draw from mixed sources such as plain text, tables, and knowledge graphs. It converts these sources into a single reversible hierarchical sequence marked by lightweight structural tags, then uses a head agent to direct an iteration agent that expands the sequence through structure-respecting moves like parent or child hops and table neighbors. The process stops once enough evidence is gathered, followed by canonicalization and optional contradiction checks before answer generation. A reader would care because the approach claims to raise accuracy on standard benchmarks while lowering token use and unnecessary retrieval steps compared with prior single-pass or agentic methods.

Core claim

RELOOP linearizes documents, tables, and knowledge graphs into a reversible hierarchical sequence with lightweight structural tags and uses a head agent plus iteration agent to perform structure-respecting actions in a guided, budget-aware loop that collects just enough evidence before canonicalized answer synthesis.
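
A minimal sketch of what a guided, budget-aware loop of this kind could look like. Every name here is a hypothetical stand-in, not the paper's implementation: `iterate`, the `neighbors` map (structure-respecting moves), the `start` list (playing the role of Head Agent guidance), and the term-coverage rule that substitutes for the paper's sufficiency check.

```python
def iterate(question_terms, segments, neighbors, start,
            max_steps=4, max_tokens=200, needed=2):
    """Greedy evidence collection under step and token budgets.

    segments:  dict mapping segment id -> text
    neighbors: dict mapping segment id -> ids reachable by one structural move
    start:     ids suggested by guidance (assumed to be given)
    Sufficiency (an assumption): at least `needed` question terms are covered.
    """
    collected, covered, seen = [], set(), set()
    frontier = list(start)
    steps = tokens_used = 0
    while frontier and steps < max_steps:
        sid = frontier.pop(0)
        if sid in seen:
            continue
        seen.add(sid)
        cost = len(segments[sid].split())      # whitespace tokens as a cost proxy
        if tokens_used + cost > max_tokens:    # token budget exhausted
            break
        tokens_used += cost
        steps += 1
        hits = {t for t in question_terms if t in segments[sid].lower()}
        if hits - covered:                     # keep only segments that add coverage
            collected.append(sid)
            covered |= hits
        if len(covered) >= needed:             # stop as soon as evidence suffices
            break
        frontier.extend(neighbors.get(sid, []))
    stats = {"steps": steps, "tokens": tokens_used,
             "sufficient": len(covered) >= needed}
    return collected, stats
```

The early `break` on sufficiency is the part that corresponds to collecting "just enough" evidence; a real system would replace the substring test with a learned or model-based check.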

What carries the argument

Hierarchical Sequence (HSEQ), a linear representation of mixed evidence sources that carries reversible lightweight structural tags enabling actions such as parent/child hops, table row or column neighbors, and knowledge-graph relations.
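
To make the representation concrete, here is a small sketch of segments carrying the four fields named in the Figure 2 caption (level tag, parent pointer, content, metadata). The class name, field names, id scheme, and helper functions are all assumptions made for illustration; the page does not give the paper's actual tag vocabulary.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Segment:
    sid: str                  # hypothetical segment id
    level: str                # level tag, e.g. "table", "row", "graph", "triplet"
    parent: Optional[str]     # parent pointer (None for a root)
    content: str              # compact human-readable content
    meta: dict = field(default_factory=dict)  # metadata with fixed keys

def linearize_table(table_id, header, rows):
    """Emit a root segment for the table, then one segment per row."""
    segs = [Segment(table_id, "table", None, " | ".join(header),
                    {"n_rows": len(rows)})]
    for i, row in enumerate(rows):
        segs.append(Segment(f"{table_id}/r{i}", "row", table_id,
                            " | ".join(row), {"row_index": i}))
    return segs

def linearize_triples(graph_id, triples):
    """Emit a root segment for the graph, then one segment per triple."""
    segs = [Segment(graph_id, "graph", None, graph_id,
                    {"n_triples": len(triples)})]
    for i, (h, r, t) in enumerate(triples):
        segs.append(Segment(f"{graph_id}/t{i}", "triplet", graph_id,
                            f"{h} -[{r}]-> {t}",
                            {"head": h, "relation": r, "tail": t}))
    return segs

def children(seq, sid):
    """Child hop: all segments whose parent pointer is `sid`."""
    return [s for s in seq if s.parent == sid]

def parent(seq, sid):
    """Parent hop: follow the parent pointer, or None at a root."""
    by_id = {s.sid: s for s in seq}
    p = by_id[sid].parent
    return by_id[p] if p is not None else None

def row_neighbors(seq, sid):
    """Table-neighbor move: rows directly above and below `sid`."""
    me = {s.sid: s for s in seq}[sid]
    if me.level != "row":
        return []
    idx = me.meta["row_index"]
    sibs = [s for s in children(seq, me.parent) if s.level == "row"]
    return [s for s in sibs if abs(s.meta["row_index"] - idx) == 1]
```

Because every move is defined only in terms of `level`, `parent`, and `meta`, the same functions work on a sequence that concatenates tables and graphs, which is the property the single-policy claim depends on.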

If this is right

A single policy can operate on text, tables, and knowledge graphs without dataset-specific retraining or separate tools.
Budget-aware guided iteration reduces extra hops, tool calls, and tokens while holding or improving answer accuracy.
Evidence canonicalization produces more consistent and auditable final answers.
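
One way to picture evidence canonicalization is as de-duplication that keeps provenance. The sketch below is an assumption about what such a step might do, not the paper's procedure: `canonicalize` and its output shape are hypothetical, and matching is done on case-folded, whitespace-collapsed text only.

```python
def canonicalize(evidence):
    """Merge duplicate evidence and keep the ids of every source segment.

    evidence: list of (segment_id, text) pairs in collection order.
    Returns a list of {"text": ..., "sources": [...]} records.
    """
    merged = {}
    for sid, text in evidence:
        clean = " ".join(text.split())   # collapse runs of whitespace
        key = clean.lower()              # case-insensitive match key
        record = merged.setdefault(key, {"text": clean, "sources": []})
        if sid not in record["sources"]:
            record["sources"].append(sid)
    return list(merged.values())
```

Keeping the `sources` list on each record is what would make a final answer auditable: each statement used in synthesis can be traced back to the segments that supplied it.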

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tag-based unification could be tried on additional data types such as code repositories or image captions if analogous lightweight markers can be defined.
Pairing RELOOP with stronger base models might further widen the efficiency margin on long evidence chains.
Scalability tests on larger or noisier collections would show whether the agent loop remains efficient when evidence volume grows beyond current benchmarks.

Load-bearing premise

Lightweight structural tags added to the hierarchical sequence are enough to let the iteration agent perform useful structure-respecting actions across text, tables, and graphs without losing critical relations or requiring per-dataset changes.

What would settle it

Run the same iteration policy on HybridQA or MetaQA after stripping the structural tags from HSEQ and check whether exact-match and F1 scores drop sharply relative to the tagged version.
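
Scoring such a tagged-versus-untagged comparison needs only exact match and token-level F1. The sketch below assumes SQuAD-style answer normalization (lowercasing, dropping punctuation and English articles); the page does not confirm that the paper's scoring script uses exactly these rules, and the function names are illustrative.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def mean_scores(preds, golds):
    """Average EM and F1 over aligned lists of predictions and gold answers."""
    n = len(golds)
    em = sum(exact_match(p, g) for p, g in zip(preds, golds)) / n
    f = sum(f1(p, g) for p, g in zip(preds, golds)) / n
    return em, f
```

Calling `mean_scores` once on predictions from the tagged run and once on predictions from the tag-stripped run, against the same gold answers, gives the two pairs of numbers the test asks for.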

Figures

Figures reproduced from arXiv: 2510.20505 by Flora D. Salim, Hakim Hacid, Hao Xue, Imran Razzak, Ruiyi Yang.

**Figure 1.** HSEQ overview. (i) HSEQ-A linearizes heterogeneous sources into Sh with level tags, parent pointers, and standardized metadata; (ii) HSEQ-I iterates over a windowed stream of segments under budgets, guided by g, and queries Φ for sufficiency; (iii) κ compacts Mt into provenance-preserving evidence; (iv) HSEQ-H produces the final answer and optionally triggers a brief refinement if inconsistencies are detec…

**Figure 2.** HSEQ Sh construction: different modalities of data are transformed into a unified sequence by HSEQ-A. Here ℓ(s) is a level tag matching the raw content, including sentence, paragraph, table, triplet, etc., while p(s) is a parent pointer recording the roots. c(s) is compact human-readable content, and µ(s) is metadata with fixed keys to record content attributes. The single, modality-aware adapter converts he…

**Figure 3.** HSEQ-I is built from training questions from multiple datasets. After guidance sets are generated, LoRA is applied for finetuning. Training tuples and supervision: supervision is organized as tuples (q, type, Sh, A⋆). Besides the above-mentioned query q and HSEQ Sh, an optional question label type is added during training. To ensure the Iteration module πθ can trace sufficient data while limiting usage, a target t…

Original abstract

Retrieval-augmented generation (RAG) remains brittle on multi-step questions and heterogeneous evidence sources, trading accuracy against latency and token/tool budgets. This paper introduces RELOOP, a structure aware framework using Hierarchical Sequence (HSEQ) that (i) linearize documents, tables, and knowledge graphs into a reversible hierarchical sequence with lightweight structural tags, and (ii) perform structure-aware iteration to collect just-enough evidence before answer synthesis. A Head Agent provides guidance that leads retrieval, while an Iteration Agent selects and expands HSeq via structure-respecting actions (e.g., parent/child hops, table row/column neighbors, KG relations); Finally the head agent composes canonicalized evidence to genearte the final answer, with an optional refinement loop to resolve detected contradictions. Experiments on HotpotQA (text), HybridQA/TAT-QA (table+text), and MetaQA (KG) show consistent EM/F1 gains over strong single-pass, multi-hop, and agentic RAG baselines with high efficiency. Besides, RELOOP exhibits three key advantages: (1) a format-agnostic unification that enables a single policy to operate across text, tables, and KGs without per-dataset specialization; (2) guided, budget-aware iteration that reduces unnecessary hops, tool calls, and tokens while preserving accuracy; and (3) evidence canonicalization for reliable QA, improving answers consistency and auditability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RELOOP unifies heterogeneous retrieval via HSEQ linearization and dual agents, but the abstract gives too little data to judge whether the gains are real or the structural tags sufficient.

RELOOP's core idea is to flatten text, tables, and knowledge graphs into one reversible hierarchical sequence using lightweight structural tags, then let a Head Agent steer and an Iteration Agent pick structure-respecting actions such as parent/child hops or table neighbors. This produces a single policy that runs across HotpotQA, HybridQA/TAT-QA, and MetaQA without per-dataset rewrites, plus a final canonicalization step and optional refinement loop. The approach builds on existing RAG and agent work but packages the pieces into a format-agnostic loop that claims both higher accuracy and lower token/tool cost through guided, budget-aware iteration. That combination is the clearest new element on offer.

The reported consistent EM/F1 lifts over single-pass, multi-hop, and agentic baselines are the main empirical hook, and the efficiency angle is worth checking if the full numbers back it up. The stress-test worry about lightweight tags discarding adjacency or relational detail in tables and graphs is reasonable to raise; if the tags are truly minimal, the Iteration Agent's action space may not recover the needed connections on HybridQA or MetaQA without hidden specialization. The abstract supplies no metrics, baselines, or ablations, so those questions stay open until the full evaluation is read.

This is aimed at people already working on multi-hop RAG and agentic retrieval who want a practical unification trick. It is coherent enough on its own terms to deserve a serious referee, mainly to get proper numbers and checks on whether the linearization actually preserves what the claims require. I would send it to review with requests for quantitative details and tag ablations rather than desk-reject it.

Referee Report

2 major / 3 minor

Summary. The paper introduces RELOOP, a recursive retrieval framework for heterogeneous question answering. It linearizes text, tables, and knowledge graphs into a Hierarchical Sequence (HSEQ) using lightweight structural tags, enabling a Head Agent for guidance and an Iteration Agent for structure-respecting actions such as parent/child hops, table row/column neighbors, and KG relations. Evidence is canonicalized before final answer generation, with an optional refinement loop for contradictions. Experiments on HotpotQA (text), HybridQA/TAT-QA (table+text), and MetaQA (KG) report consistent EM/F1 gains over single-pass, multi-hop, and agentic RAG baselines together with efficiency improvements through guided, budget-aware iteration.

Significance. If the empirical gains hold and the single-policy unification across formats is robust, RELOOP could meaningfully simplify RAG pipelines for multi-source, multi-hop QA by reducing the need for format-specific components. The budget-aware iteration and evidence canonicalization address practical concerns of token cost and answer reliability. Credit is due for targeting heterogeneous evidence unification, a persistent challenge in the field.

major comments (2)

§3.2 (HSEQ linearization and tag set): The central claim of format-agnostic unification rests on the assertion that lightweight structural tags suffice for the Iteration Agent to execute effective structure-respecting actions (parent/child hops, table neighbors, KG relations) without per-dataset specialization or significant relational information loss. The manuscript should supply the precise tag vocabulary and an ablation (or qualitative trace) on HybridQA and MetaQA demonstrating that adjacency and multi-hop connectivity are recovered at rates sufficient to support the reported gains; without this, the unification claim remains under-supported.
§5 (Experimental results on HybridQA and MetaQA): The reported consistent EM/F1 improvements are load-bearing for the overall contribution. The paper should clarify whether the action space of the Iteration Agent was held identical across all three datasets or whether any dataset-specific action masking occurred, and should report variance or statistical tests for the gains over the strongest agentic baselines.

minor comments (3)

Abstract: 'genearte' is a typo and should read 'generate'.
Abstract: The transition 'Besides, RELOOP exhibits three key advantages' is informal; consider 'In addition, RELOOP offers three key advantages'.
Notation (throughout): Ensure that 'HSEQ' and the roles of 'Head Agent' and 'Iteration Agent' are defined at first use with consistent capitalization throughout.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing RELOOP's potential to unify heterogeneous retrieval. We address each major comment below and have revised the manuscript accordingly to strengthen the claims.

Referee: §3.2 (HSEQ linearization and tag set): The central claim of format-agnostic unification rests on the assertion that lightweight structural tags suffice for the Iteration Agent to execute effective structure-respecting actions (parent/child hops, table neighbors, KG relations) without per-dataset specialization or significant relational information loss. The manuscript should supply the precise tag vocabulary and an ablation (or qualitative trace) on HybridQA and MetaQA demonstrating that adjacency and multi-hop connectivity are recovered at rates sufficient to support the reported gains; without this, the unification claim remains under-supported.

Authors: We agree that explicit documentation of the tag set and supporting evidence for structural recovery would strengthen the unification argument. In the revised manuscript we have expanded §3.2 to list the complete lightweight tag vocabulary for text, tables, and KGs. We have also added a new qualitative trace together with quantitative recovery metrics on HybridQA and MetaQA that show adjacency and multi-hop connectivity are preserved at rates sufficient to explain the observed gains, confirming that no per-dataset specialization is required. revision: yes

Referee: §5 (Experimental results on HybridQA and MetaQA): The reported consistent EM/F1 improvements are load-bearing for the overall contribution. The paper should clarify whether the action space of the Iteration Agent was held identical across all three datasets or whether any dataset-specific action masking occurred, and should report variance or statistical tests for the gains over the strongest agentic baselines.

Authors: The Iteration Agent's action space is identical across all three datasets; actions are defined uniformly over the HSEQ structure with no dataset-specific masking. In the revised §5 we now report standard deviations across runs and include paired t-test results comparing RELOOP against the strongest agentic baselines on HybridQA and MetaQA, establishing that the EM/F1 gains are statistically significant. revision: yes
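
For readers unfamiliar with the test named in the second response, a paired t-test over per-run scores can be sketched with the standard library alone. The function name is illustrative, and nothing here reproduces the paper's numbers; any scores passed in are the caller's own.

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired samples.

    scores_a[i] and scores_b[i] must come from the same run or seed,
    e.g. the two systems evaluated under identical conditions.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)                     # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

The statistic is then compared against a Student t distribution with the returned degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic together with a p-value.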

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents RELOOP as an engineering framework that linearizes heterogeneous sources into HSEQ with lightweight tags and applies structure-aware iteration via Head and Iteration Agents. No equations, fitted parameters, or first-principles derivations appear that reduce to their own inputs by construction. Claims of format-agnostic unification, budget-aware iteration, and evidence canonicalization are positioned as outcomes of the described architecture and are evaluated empirically on HotpotQA, HybridQA/TAT-QA, and MetaQA rather than being self-referential or dependent on self-citations that bear the central load. The derivation chain is therefore self-contained as an independent system design.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim rests on the unverified assumption that a single lightweight tagging scheme can reversibly capture structure across text, tables, and KGs sufficiently for agent actions, plus the effectiveness of the two-agent loop without additional training or specialization details.

axioms (1)

Domain assumption: Documents, tables, and knowledge graphs can be linearized into a reversible hierarchical sequence using lightweight structural tags that preserve all necessary relations for retrieval actions.
This is the core unification step invoked in the abstract to enable format-agnostic operation.
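
What "reversible" would have to mean in practice can be shown with a round trip on a toy table. The bracketed tag syntax below (`[table id=...]`, `[row parent=... idx=...]`) is invented for this sketch and is not the paper's tag vocabulary; the functions also assume cells contain no tab characters and ids contain no `] ` sequence.

```python
def encode_table(table_id, header, rows):
    """Linearize a table into tagged lines: one header line, one line per row."""
    lines = [f"[table id={table_id}] " + "\t".join(header)]
    for i, row in enumerate(rows):
        lines.append(f"[row parent={table_id} idx={i}] " + "\t".join(row))
    return lines

def decode_table(lines):
    """Rebuild (table_id, header, rows) from tagged lines.

    Row order is recovered from the idx tag, so the row lines may arrive
    in any order after the header line.
    """
    head_tag, _, head = lines[0].partition("] ")
    table_id = head_tag.split("id=")[1]
    header = head.split("\t")
    rows = {}
    for line in lines[1:]:
        tag, _, body = line.partition("] ")
        rows[int(tag.split("idx=")[1])] = body.split("\t")
    return table_id, header, [rows[i] for i in sorted(rows)]
```

If the `idx` tag were dropped, the decoder could no longer restore row order from shuffled lines, which is a small instance of the information-loss concern raised in the referee report.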

invented entities (3)

HSEQ (no independent evidence)
purpose: Linearize heterogeneous data sources into a single reversible hierarchical sequence for unified retrieval.
New representation introduced by the framework.

Head Agent (no independent evidence)
purpose: Provide high-level guidance to lead the retrieval process.
Core component of the multi-agent architecture.

Iteration Agent (no independent evidence)
purpose: Select and expand evidence via structure-respecting actions such as hops and neighbor traversals.
Handles the iterative collection loop.

pith-pipeline@v0.9.0 · 5802 in / 1602 out tokens · 58210 ms · 2026-05-18T04:51:32.463067+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Passage: "linearize documents, tables, and knowledge graphs into a reversible hierarchical sequence with lightweight structural tags... structure-respecting actions (e.g., parent/child hops, table row/column neighbors, KG relations)"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Passage: "format-agnostic unification that enables a single policy to operate across text, tables, and KGs"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
cs.AI · 2026-05 · unverdicted · novelty 6.0
STAR combines expert nominal routes with trace-learned recovery transitions in a failure-typed routing matrix, improving multi-agent spatiotemporal reasoning over baselines especially on error-deviating queries.

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
cs.AI · 2026-05 · unverdicted · novelty 6.0
STAR presents a failure-aware routing framework using a state-conditioned transition policy and an agent routing matrix combining expert routes with learned recoveries from execution traces to improve multi-agent spat...

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
cs.AI · 2026-05 · unverdicted · novelty 5.0
STAR is a failure-aware Markovian router that learns recovery transitions from both successful and unsuccessful execution traces to improve multi-agent performance on spatiotemporal benchmarks.

[1] [1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Ale- man, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774,

work page internal anchor Pith review Pith/arXiv arXiv

[2] Ankush Agarwal, Chaitanya Devaguptapu, et al. Hybrid graphs for table-and-text based question answering using LLMs. arXiv preprint arXiv:2501.17767.

[3] Jayetri Bardhan, Bushi Xiao, and Daisy Zhe Wang. TTQA-RS: A break-down prompting approach for multi-hop table-text question answering with reasoning and summarization. arXiv preprint arXiv:2406.14732.

[4] Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, and Jie Fu. RQ-RAG: Learning to refine queries for retrieval augmented generation. arXiv preprint arXiv:2404.00610.

[5] Wenhu Chen, Hanwen Zha, Zhiyu Chen, Wenhan Xiong, Hong Wang, and William Wang. HybridQA: A dataset of multi-hop question answering over tabular and textual data. arXiv preprint arXiv:2004.07347.

[6] Yiqun Chen, Lingyong Yan, Weiwei Sun, Xinyu Ma, Yi Zhang, Shuaiqiang Wang, Dawei Yin, Yiming Yang, and Jiaxin Mao. Improving retrieval-augmented generation through multi-agent reinforcement learning. arXiv preprint arXiv:2501.15228.

[7] Philipp Christmann and Gerhard Weikum. RAG-based question answering over heterogeneous data and text. arXiv preprint arXiv:2412.07420.

[8] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130.

[9] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.

[10] Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Ankita Rajaram Naik, Pengshan Cai, and Alfio Gliozzo. Re2G: Retrieve, rerank, generate. arXiv preprint arXiv:2207.06300.

[11] Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680, 2024a.

Yue Guo and Yi Yang. EconNLI: Evaluating large language models on economics reasoning. arXiv preprint arXiv:2407.01212.

[12] Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. LightRAG: Simple and fast retrieval-augmented generation. arXiv preprint arXiv:2410.05779, 2024b.

Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. MetaGPT: Meta programming for a multi-agent collabora...

[13] Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, and Liang Zhao. GRAG: Graph retrieval-augmented generation. arXiv preprint arXiv:2405.16506.

[14] Md Ashraful Islam, Mohammed Eunus Ali, and Md Rizwan Parvez. MapCoder: Multi-agent code generation for competitive problem solving. arXiv preprint arXiv:2405.11403.

[15] Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C Park. Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity. arXiv preprint arXiv:2403.14403.

[16] Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 7969–7992.

[17] Jiajie Jin, Yutao Zhu, Zhicheng Dou, Guanting Dong, Xinyu Yang, Chenghao Zhang, Tong Zhao, Zhao Yang, and Ji-Rong Wen. FlashRAG: A modular toolkit for efficient retrieval-augmented generation research. In Companion Proceedings of the ACM on Web Conference 2025, pp. 737–740.

[18] Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190.

[19] Pei Liu, Xin Liu, Ruoyu Yao, Junming Liu, Siyuan Meng, Ding Wang, and Jun Ma. HM-RAG: Hierarchical multi-agent multimodal retrieval augmented generation. arXiv preprint arXiv:2504.12330.

[20] Haoran Luo, Zichen Tang, Shiyao Peng, Yikai Guo, Wentai Zhang, Chenghao Ma, Guanting Dong, Meina Song, Wei Lin, Yifan Zhu, et al. ChatKBQA: A generate-then-retrieve framework for knowledge base question answering with fine-tuned large language models. arXiv preprint arXiv:2310.08975.

[21] Linhao Luo, Zicheng Zhao, Gholamreza Haffari, Yuan-Fang Li, Chen Gong, and Shirui Pan. Graph-constrained reasoning: Faithful reasoning on knowledge graphs with large language models. arXiv preprint arXiv:2410.13080.

[22] Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Cehao Yang, Jiaxin Mao, and Jian Guo. Think-on-graph 2.0: Deep and faithful large language model reasoning with knowledge-guided retrieval augmented generation. arXiv preprint arXiv:2407.10805.

[23] Costas Mavromatis and George Karypis. GNN-RAG: Graph neural retrieval for large language model reasoning. arXiv preprint arXiv:2405.20139.

[24] Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey. arXiv preprint arXiv:2408.08921.

[25] Haritz Puerto, Gözde Gül Şahin, and Iryna Gurevych. MetaQA: Combining expert agents for multi-skill question answering. arXiv preprint arXiv:2112.01922.

[26] Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Agentic retrieval-augmented generation: A survey on agentic RAG. arXiv preprint arXiv:2501.09136.

[27] Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O’Sullivan, and Hoang D Nguyen. Multi-agent collaboration mechanisms: A survey of LLMs. arXiv preprint arXiv:2501.06322.

[28] Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, and Flora D Salim. Beyond single pass, looping through time: KG-IRAG with iterative knowledge retrieval. arXiv preprint arXiv:2503.14234.

[29] Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W Cohen, Ruslan Salakhutdinov, and Christopher D Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600.

[30] Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, and Zhaofeng Liu. Evaluation of retrieval-augmented generation: A survey. arXiv preprint arXiv:2405.07437.

[31] Wenhao Yu. Retrieval-augmented generation across heterogeneous knowledge. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pp. 52–58.

[32] Xiaohan Yu, Pu Jian, and Chong Chen. TableRAG: A retrieval augmented generation framework for heterogeneous document reasoning. arXiv preprint arXiv:2506.10380.

[33] Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, and Bin Cui. Retrieval-augmented generation for AI-generated content: A survey. arXiv preprint arXiv:2402.19473.

[34] Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, and Tat-Seng Chua. TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance. arXiv preprint arXiv:2105.07624, 2021a.

Fengbin Zhu, Wenqiang Lei, Chao Wang, Jianming Zheng, Soujanya Poria, and Tat-Seng Chua. Retrieving and reading: A...

[35] Jingwei Zuo, Maksim Velikanov, Ilyas Chahed, Younes Belkada, Dhia Eddine Rhayem, Guillaume Kunsch, Hakim Hacid, Hamza Yous, Brahim Farhat, Ibrahim Khadraoui, et al. Falcon-H1: A family of hybrid-head language models redefining efficiency and performance. arXiv preprint arXiv:2507.22448, 2025.

[36] relies on invariants (T1)–(T3), which are satisfied by construction in the HSEQ adapters (offsets and row indices/ordering are recorded; triplets are stored verbatim). Admissibility is a regularity condition stating that an order ρ exists (often paragraph/row-first) placing supporting segments early; in practice this is further improved by guidance. Assum...

[37] Which style is the building located on the East Side of Midtown Manhattan that Robert Von Ancken appraised?

Takeaway. Guidance steers the iterator to a high-yield paragraph in the first step, which already contains the sufficient evidence (film identity and source novel). Subsequent steps provide corroboration from structured rows. The provenance in κ(M_τ) makes the final answer auditable: the paragraph p_6df9c849 explicitly ties Night Watch (2004, Bekmambetov) to t...