Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning

Ho-fung Leung; Irwin King; Jiaming Zhou; Jianye Hao; Jian-Yun Nie; Lei Ding; Liheng Ma; Muzhi Li; Yihong Wu; Yingxue Zhang

arxiv: 2505.17086 · v4 · submitted 2025-05-20 · 💻 cs.CL


Pith reviewed 2026-05-22 13:30 UTC · model grok-4.3

classification 💻 cs.CL

keywords multi-agent RAG · reinforcement learning · long context · question answering · retrieval augmented generation · policy gradient optimization · multi-hop reasoning


The pith

Multi-agent decomposition paired with minimalist reinforcement learning overcomes long-context limits in RAG systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Mujica-MyGO as a framework that lets large language models manage complex multi-turn reasoning in retrieval-augmented generation without letting context length grow out of control. It splits multi-turn interactions into smaller cooperative sub-tasks handled by multiple agents, following a divide-and-conquer strategy. A lightweight reinforcement learning procedure called MyGO then post-trains the models directly, removing the need to include few-shot examples in every prompt. The authors state theoretical guarantees that the RL step converges to the optimal policy and report stronger results than prior systems on question-answering benchmarks built on both plain text corpora and knowledge graphs. Readers would care because the method aims to keep effective context short while still supporting deep reasoning chains.

Core claim

Mujica-MyGO combines Mujica, a multi-agent RAG workflow that decomposes multi-turn interactions into cooperative sub-interactions to reduce context length, with MyGO, a minimalist policy gradient optimization algorithm that enables effective post-training of LLMs in RAG pipelines without in-context learning and supplies theoretical guarantees of convergence to the optimal policy, yielding superior empirical performance across text-corpus and knowledge-graph question-answering benchmarks.
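
The abstract describes MyGO only as a lightweight policy-gradient method with convergence guarantees; a passage quoted in the Lean-citation section of this page adds that it samples trajectories from an asymptotically approximate optimal policy and uses maximum-likelihood estimation (MLE) for policy updates. The toy sketch below illustrates that general pattern and is not MyGO itself. It assumes a tabular categorical policy, a hypothetical reward table, and sampling-importance-resampling with weights exp(r/β) as a stand-in for "approximate optimal policy" sampling (under the common KL-regularized objective, the optimal policy is proportional to π(a)·exp(r(a)/β)), followed by an MLE refit.

```python
import math
import random
from collections import Counter


def approx_optimal_samples(policy, reward, beta, n, rng):
    """Sampling-importance-resampling: draw n actions from `policy`, then
    resample them with weights exp(reward / beta).

    Assumption (not from the paper): the target is the KL-regularized optimum
    pi*(a) proportional to policy(a) * exp(reward(a) / beta).
    MyGO's real target may differ.
    """
    actions = list(policy)
    draws = rng.choices(actions, weights=[policy[a] for a in actions], k=n)
    weights = [math.exp(reward[a] / beta) for a in draws]
    return rng.choices(draws, weights=weights, k=n)


def mle_update(samples, actions, smoothing=1e-3):
    """Maximum-likelihood fit of a categorical policy: smoothed empirical frequencies."""
    counts = Counter(samples)
    total = len(samples) + smoothing * len(actions)
    return {a: (counts[a] + smoothing) / total for a in actions}


def train(policy, reward, beta=0.5, n=2000, iters=5, seed=0):
    """Alternate approximate-optimal sampling with an MLE refit.

    Each round tilts the current policy by exp(reward / beta), so repeated
    rounds concentrate probability on the highest-reward action.
    """
    rng = random.Random(seed)
    for _ in range(iters):
        samples = approx_optimal_samples(policy, reward, beta, n, rng)
        policy = mle_update(samples, list(policy))
    return policy


# Hypothetical three-action problem with a uniform starting policy.
start = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
reward = {"a": 0.0, "b": 1.0, "c": 0.2}
final = train(start, reward)
print({a: round(p, 3) for a, p in final.items()})
```

In an LLM setting the "actions" would be whole trajectories and the MLE step a supervised fine-tuning pass on the resampled trajectories; both substitutions are assumptions about how the quoted description maps onto practice.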

What carries the argument

Mujica multi-agent decomposition workflow together with MyGO minimalist policy gradient optimization for post-training.

If this is right

Multi-turn RAG reasoning no longer forces exponential growth in prompt length.
LLMs can be post-trained for RAG tasks using lightweight RL rather than prompt-based few-shot demonstrations.
The same workflow applies to both text-based corpora and structured knowledge graphs.
Theoretical convergence of the RL component supports stable optimization inside complex agent pipelines.
Overall accuracy on diverse question-answering tasks improves without larger context windows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar decomposition patterns could shorten context demands in other multi-step LLM applications such as planning or summarization.
The minimalist RL approach may lower the cost of adapting models to specialized retrieval pipelines.
Pairing the method with shorter-context base models could produce still more efficient end-to-end systems.

Load-bearing premise

Decomposing multi-turn interactions into cooperative sub-interactions will sufficiently reduce the long-context limitations that LLMs face in RAG pipelines.
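
To make this premise concrete, here is a toy cost model whose parameters (branching factor, depth, tokens retrieved per hop, summary size) are all hypothetical and not taken from the paper. It contrasts a single agent that appends every hop's retrieval to one shared prompt with a decomposed workflow in which each sub-agent sees only its own hop and a planner keeps a short summary per answered subquestion.

```python
def tree_nodes(branching, depth):
    """Number of subquestions in a full search tree: branching**i nodes at level i."""
    return sum(branching ** i for i in range(depth + 1))


def monolithic_context(branching, depth, tokens_per_hop):
    """Tokens in one shared prompt that accumulates every hop's retrieved text."""
    return tree_nodes(branching, depth) * tokens_per_hop


def decomposed_peak_context(branching, depth, tokens_per_hop, summary_tokens):
    """Largest single prompt when each worker sees only its own hop's retrieval
    and the planner stores one short summary per answered subquestion."""
    worker = tokens_per_hop
    planner = tree_nodes(branching, depth) * summary_tokens
    return max(worker, planner)


# Hypothetical settings: 2 subqueries per hop, 1,000 retrieved tokens per hop,
# 50-token summaries kept by the planner.
for depth in (1, 3, 5):
    print(depth,
          monolithic_context(2, depth, 1000),
          decomposed_peak_context(2, depth, 1000, 50))
```

The planner's share still scales with the number of subquestions, only at a smaller per-node cost, which is why measured token counts on real runs say more than a toy model like this one.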

What would settle it

A controlled run of the same benchmarks in which the multi-agent decomposition is used but performance remains no better than strong single-agent baselines or long-context errors persist.

Figures

Figures reproduced from arXiv: 2505.17086 by Ho-fung Leung, Irwin King, Jiaming Zhou, Jianye Hao, Jian-Yun Nie, Lei Ding, Liheng Ma, Muzhi Li, Yihong Wu, Yingxue Zhang.

**Figure 2.** … there might be two conditionally independent subquestions, S2,1 and S2,2, on which the later subquestion S4,1 depends. The dependency relations form a directed acyclic graph (DAG). Directly applying Eq. 1 to such a reasoning process can be both inefficient and suboptimal. Therefore, to effectively handle DAG dependency graphs, we allow our Mujica planner to ask subquestions in multiple iterations …
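
The caption says the planner may ask subquestions over multiple iterations when their dependencies form a DAG, but the visible text does not say how each iteration's batch is chosen. One natural reading, assumed here rather than taken from the paper, is level-by-level topological scheduling (Kahn's algorithm): every subquestion whose prerequisites have all been answered is asked in the same iteration. The subquestion names and edges in the example are hypothetical, modeled loosely on the caption.

```python
from collections import defaultdict


def iteration_batches(deps):
    """Group subquestions into planner iterations.

    `deps` maps each subquestion to the set of subquestions it depends on.
    Every returned batch can be asked in one iteration because all of its
    prerequisites were answered in earlier batches.
    """
    missing = set().union(*deps.values()) - set(deps)
    if missing:
        raise ValueError(f"unknown prerequisites: {sorted(missing)}")

    indegree = {q: len(parents) for q, parents in deps.items()}
    children = defaultdict(list)
    for q, parents in deps.items():
        for p in parents:
            children[p].append(q)

    batch = sorted(q for q, d in indegree.items() if d == 0)
    batches = []
    while batch:
        batches.append(batch)
        ready = []
        for q in batch:
            for child in children[q]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        batch = sorted(ready)

    if sum(len(b) for b in batches) != len(deps):
        raise ValueError("dependency graph has a cycle")
    return batches


# Hypothetical DAG: two independent subquestions both feed a later one.
deps = {
    "S1": set(),
    "S2_1": {"S1"},
    "S2_2": {"S1"},
    "S3": {"S2_1", "S2_2"},
}
batches = iteration_batches(deps)
print(batches)
```

Batching this way lets conditionally independent subquestions run in parallel while keeping the number of planner iterations equal to the longest dependency chain.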

**Figure 3.** The proposed Minimalist Policy Gradient Optimization framework.

**Figure 4.** F1 score on 2Wiki-KG training (x-axis: sample iteration; y-axis: average F1 score, pass@1; series: Offline).
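
Figure 4 reports average F1 (pass@1). The paper's exact scoring script is not shown on this page; a common choice for multi-hop QA benchmarks is SQuAD-style token-level F1 after answer normalization, sketched below as an assumption. Here pass@1 is taken to mean scoring one sampled answer per question and averaging over questions.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, ground_truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred = normalize_answer(prediction).split()
    gold = normalize_answer(ground_truth).split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_f1(predictions, references):
    """pass@1-style average: one prediction per question."""
    if len(predictions) != len(references):
        raise ValueError("need exactly one prediction per reference")
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


print(token_f1("Paris, France", "Paris"))
```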

Original abstract

Large Language Models (LLMs) equipped with modern Retrieval-Augmented Generation (RAG) systems often employ multi-turn interaction pipelines to interface with search engines for complex reasoning tasks. However, such multi-turn interactions inevitably produce long intermediate contexts, as context length grows exponentially with exploration depth. This leads to a well-known limitation of LLMs: their difficulty in effectively leveraging information from long contexts. This problem is further amplified in RAG systems that depend on in-context learning, where few-shot demonstrations must also be included in the prompt, compounding the context-length bottleneck. To address these challenges, we propose Mujica-MyGo, a unified framework for efficient multi-turn reasoning in RAG. Inspired by the divide-and-conquer principle, we introduce Mujica (Multi-hop Joint Intelligence for Complex Question Answering), a multi-agent RAG workflow that decomposes multi-turn interactions into cooperative sub-interactions, thereby mitigating long-context issues. To eliminate the dependency on in-context learning, we further develop MyGO (Minimalist Policy Gradient Optimization), a lightweight and efficient reinforcement learning algorithm that enables effective post-training of LLMs within complex RAG pipelines. We provide theoretical guarantees for MyGO's convergence to the optimal policy. Empirical evaluations across diverse question-answering benchmarks, covering both text corpora and knowledge graphs, show that Mujica-MyGO achieves superior performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

Mujica-MyGO pairs multi-agent decomposition with minimalist RL for RAG but does not directly show reduced context lengths.


The main thing to know about this paper is that it introduces Mujica-MyGO, which uses a multi-agent setup called Mujica to decompose multi-turn RAG interactions and a minimalist RL method called MyGO to train the model without depending on in-context learning. It reports superior results on various QA benchmarks and provides convergence guarantees for the RL algorithm. The paper does a good job of describing the real-world issue with context length growing too fast in these systems and how applying divide-and-conquer to split the tasks among agents could address it in theory. Introducing a lightweight RL approach to replace few-shot prompting is a sensible direction for making RAG more efficient. Where it falls short is in demonstrating that the multi-agent decomposition actually leads to shorter effective contexts. There are no reported measurements of token usage or context sizes before and after applying Mujica, nor ablations that isolate the effect of the decomposition from the RL training. Without that, it's difficult to rule out that the benchmark improvements stem primarily from the post-training rather than the context mitigation. The citation pattern seems standard for the area, and the empirical evaluations cover both text corpora and knowledge graphs, which adds some breadth. This paper is aimed at practitioners and researchers working on multi-agent systems and RAG enhancements. It shows clear thinking on a practical bottleneck, so it deserves a serious referee even if some claims need more supporting data. I recommend putting it through peer review with feedback focused on adding context length analysis and stronger ablations.

Referee Report

2 major / 2 minor

Summary. The paper proposes Mujica-MyGO, a unified multi-agent RAG framework for complex question answering. Mujica applies a divide-and-conquer decomposition to break multi-turn interactions into cooperative sub-interactions, aiming to mitigate exponential context growth in LLMs. MyGO introduces a lightweight policy-gradient RL algorithm for post-training that removes reliance on in-context learning, with claimed theoretical convergence guarantees to the optimal policy. Experiments across text-corpus and knowledge-graph QA benchmarks report superior performance over baselines.

Significance. If the central claims hold, the work offers a practical route to scalable multi-turn RAG by combining agent decomposition with minimalist RL, potentially reducing both context-length bottlenecks and few-shot prompting overhead while providing convergence assurances. The emphasis on lightweight, theoretically grounded RL distinguishes it from heavier fine-tuning approaches and could influence future multi-agent retrieval systems.

major comments (2)

[§4] §4 (Experiments) and associated figures/tables: No quantitative evidence—such as average token counts, context-length histograms, or ablation on input length—is provided comparing the Mujica workflow against standard multi-turn RAG baselines. Without these measurements it remains unclear whether the divide-and-conquer decomposition actually shortens effective contexts or whether reported gains arise from the RL component or other factors.
[§3.2–3.3] §3.2–3.3 (MyGO algorithm and theory): The convergence guarantee is stated to hold for the minimalist policy gradient, yet the manuscript supplies neither the full proof nor the precise assumptions on the reward function and policy parameterization needed to verify that the result is independent of fitted quantities or in-context demonstrations.
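
The context-length measurements requested in the first major comment are straightforward to produce from logged prompts. A minimal sketch, assuming prompts are available as strings: whitespace splitting stands in for the model's real tokenizer (a real study should count tokens with the tokenizer of the model under test), and both prompt logs in the example are made up for illustration.

```python
from collections import Counter
from statistics import mean, median


def context_lengths(prompts, tokenize=str.split):
    """Token count per prompt; `tokenize` is a placeholder for a real tokenizer."""
    return [len(tokenize(p)) for p in prompts]


def histogram(lengths, bin_width):
    """Counts per bin [k * bin_width, (k + 1) * bin_width), keyed by bin start."""
    bins = Counter(length // bin_width for length in lengths)
    return {k * bin_width: bins[k] for k in sorted(bins)}


def summarize(prompts, bin_width=4):
    lengths = context_lengths(prompts)
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "max": max(lengths),
        "hist": histogram(lengths, bin_width),
    }


# Hypothetical logs: one agent whose prompt accumulates history across turns,
# versus sub-agents that each see only their own subquestion and evidence.
single_agent = ["q d1", "q d1 a1 d2", "q d1 a1 d2 a2 d3"]
decomposed = ["q1 d1", "q2 d2", "q3 d3"]
baseline_stats = summarize(single_agent)
decomposed_stats = summarize(decomposed)
print(baseline_stats)
print(decomposed_stats)
```

Reporting the full histogram rather than only the mean matters here, because the long-context failures the paper targets sit in the tail of the distribution.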

minor comments (2)

[Abstract] Notation for the two components is inconsistent between “Mujica-MyGo” and “Mujica-MyGO” in the abstract and section headings; standardize capitalization.
[§3.1] The description of cooperative sub-interactions in Mujica would benefit from an explicit diagram or pseudocode showing message passing between sub-agents to clarify how history accumulation is avoided.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's thoughtful review and constructive feedback on our manuscript. We have carefully considered each major comment and provide detailed responses below. Where appropriate, we will revise the manuscript to address the concerns raised.


Referee: [§4] §4 (Experiments) and associated figures/tables: No quantitative evidence—such as average token counts, context-length histograms, or ablation on input length—is provided comparing the Mujica workflow against standard multi-turn RAG baselines. Without these measurements it remains unclear whether the divide-and-conquer decomposition actually shortens effective contexts or whether reported gains arise from the RL component or other factors.

Authors: We acknowledge that the original manuscript lacks direct quantitative measurements of context lengths for the Mujica workflow compared to baselines. To address this, in the revised version we will include average token usage statistics, context length histograms, and an ablation study on varying input lengths. These additions will demonstrate the effectiveness of the divide-and-conquer decomposition in reducing context growth and help distinguish its impact from that of the MyGO reinforcement learning component. revision: yes
Referee: [§3.2–3.3] §3.2–3.3 (MyGO algorithm and theory): The convergence guarantee is stated to hold for the minimalist policy gradient, yet the manuscript supplies neither the full proof nor the precise assumptions on the reward function and policy parameterization needed to verify that the result is independent of fitted quantities or in-context demonstrations.

Authors: We agree with the referee that the full proof and precise assumptions were not supplied in the manuscript. In the revised version, we will include the complete proof of convergence for the minimalist policy gradient along with the detailed assumptions regarding the reward function and policy parameterization. This will enable verification that the result holds independently of fitted quantities or in-context demonstrations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation chain remains self-contained


The paper presents Mujica as a multi-agent decomposition workflow inspired by divide-and-conquer to address long-context growth in RAG, and MyGO as a new minimalist RL method with independently claimed theoretical convergence guarantees. No equations, fitted parameters renamed as predictions, or load-bearing self-citations are exhibited that reduce the central performance claims or mitigation assertions to tautological inputs by construction. The divide-and-conquer motivation and RL convergence statement stand as external premises rather than self-referential reductions. The derivation is therefore not forced by definition or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim depends on the effectiveness of the proposed decomposition strategy and the convergence properties of the new RL algorithm; no numerical free parameters are mentioned.

axioms (1)

domain assumption Decomposing multi-turn RAG interactions into cooperative sub-interactions via divide-and-conquer mitigates long-context limitations.
This premise is invoked to justify the Mujica multi-agent workflow.

invented entities (2)

Mujica no independent evidence
purpose: Multi-agent RAG workflow that decomposes interactions
New framework introduced to address context length growth.
MyGO no independent evidence
purpose: Minimalist policy gradient optimization for post-training LLMs in RAG
New RL algorithm claimed to provide convergence guarantees without in-context learning.

pith-pipeline@v0.9.0 · 5802 in / 1396 out tokens · 61127 ms · 2026-05-22T13:30:14.867333+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon; each entry names a source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

Inspired by the divide-and-conquer principle, we introduce Mujica... that decomposes multi-turn interactions into cooperative sub-interactions, thereby mitigating long-context issues... MyGO... samples trajectories from an asymptotically approximate optimal policy... MLE for policy updates.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

We provide theoretical guarantees for MyGO's convergence to the optimal policy.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

124 extracted references · 124 canonical work pages · 16 internal anchors

[1]

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report.arXiv preprint arXiv:2303.08774 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Prerna Agarwal, Nishant Kumar, and Srikanta Bedathur. 2024. SymKGQA: Few- Shot Knowledge Graph Question Answering via Symbolic Program Generation and Execution. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computation...

work page doi:10.18653/v1/2024.acl-long.545 2024
[3]

Cecilia Aguerrebere, Ishwar Bhati, Mark Hildebrand, Mariano Tepper, and Ted Willke. 2023. Similarity search in the blink of an eye with compressed indices. Proceedings of the VLDB Endowment16, 11 (2023), 3433–3446

work page 2023
[4]

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, and Sara Hooker. 2024. Back to basics: Revisiting reinforce style optimization for learning from human feedback in llms.arXiv preprint arXiv:2402.14740(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi

work page
[6]

InThe Twelfth International Conference on Learning Representations

Self-RAG: Learning to Retrieve, Generate, and Critique through Self- Reflection. InThe Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=hSyW5go0v8

work page
[7]

Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. arXiv:arXiv:1606.01540

work page internal anchor Pith review Pith/arXiv arXiv 2016
[8]

Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, and Jie Fu. 2024. RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation. InFirst Conference on Language Modeling. https://openreview.net/ forum?id=tzE7VqsaJ4

work page 2024
[9]

Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu

work page
[10]

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. arXiv:2402.03216 [cs.CL] https://arxiv.org/abs/2402.03216

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Liyi Chen, Panrong Tong, Zhongming Jin, Ying Sun, Jieping Ye, and Hui Xiong

work page
[12]

InThe Thirty-eighth Annual Conference on Neural Information Processing Systems

Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=CwCUEr6wO5

work page
[13]

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Tianpeng Li, Haoze Sun, Yijie Zhou, Chenzheng Zhu, Haofen Wang, Jeff Z. Pan, Wen Zhang, Huajun Chen, Fan Yang, Zenan Zhou, and Weipeng Chen. 2025. ReSearch: Learning to Reason with Search for LLMs via Reinforce- ment Learning. arXiv:2503.19470 [cs.AI] https://arxiv.org/abs/2503.19470

work page internal anchor Pith review Pith/arXiv arXiv 2025
[14]

Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, and Benjamin Van Durme. 2024. Dated data: Tracing knowledge cutoffs in large language models.arXiv preprint arXiv:2403.12958(2024)

work page arXiv 2024
[15]

Xiangxiang Chu, Hailang Huang, Xiao Zhang, Fei Wei, and Yong Wang. 2025. Gpg: A simple and strong reinforcement learning baseline for model reasoning. arXiv preprint arXiv:2504.02546(2025)

work page arXiv 2025
[16]

Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, SHUM KaShun, and Tong Zhang. 2023. RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment.Transactions on Machine Learning Research(2023)

work page 2023
[17]

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. 2024. A Survey on In-context Learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Comput...

work page doi:10.18653/v1/2024.emnlp-main.64 2024
[18]

Siyuan Fang, Kaijing Ma, Tianyu Zheng, Xinrun Du, Ningxuan Lu, Ge Zhang, and Qingkun Tang. 2024. KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model’s Reasoning Path Aggregation. arXiv:2412.20995 [cs.CL] https://arxiv.org/abs/2412.20995

work page arXiv 2024
[19]

Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su

work page
[20]

InThe Thirty-eighth Annual Conference on Neural Information Processing Systems

HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems

work page
[21]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. InInternational conference on machine learning. PMLR, 3929–3938

work page 2020
[22]

Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. 2020. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reason- ing Steps. InProceedings of the 28th International Conference on Computational Linguistics, Donia Scott, Nuria Bel, and Chengqing Zong (Eds.). International Committee on Computational Linguistics, Barcelona, Sp...

work page doi:10.18653/v1/2020.coling-main.580 2020
[23]

Jian Hu. 2025. Reinforce++: A simple and efficient approach for aligning large language models.arXiv preprint arXiv:2501.03262(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[24]

Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. 2025. A survey on hallucination in large language models: Principles, taxonomy, chal- lenges, and open questions.ACM Transactions on Information Systems43, 2 (2025), 1–55

work page 2025
[25]

Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Xin Zhao, Yang Song, and Tao Zhang. 2025. RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement. InProceedings of the 2025 Con- ference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies ...

work page 2025
[26]

Zhouyu Jiang, Mengshu Sun, Lei Liang, and Zhiqiang Zhang. 2025. Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach. arXiv:2407.13101 [cs.CL] https://arxiv.org/abs/2407.13101

work page arXiv 2025
[27]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick SH Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering.. InEMNLP (1). 6769–6781

work page 2020
[28]

Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christo- pher Potts, and Matei Zaharia. 2022. Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP.arXiv preprint arXiv:2212.14024(2022)

work page arXiv 2022
[29]

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. 2017. Overcoming catastrophic forgetting in neural networks.Proceedings of the national academy of sciences114, 13 (2017), 3521– 3526

work page 2017
[30]

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th Symposium on Operating Systems Principles. 611–626

work page 2023
[31]

Yunshi Lan and Jing Jiang. 2020. Query Graph Generation for Answering Multi- hop Complex Questions from Knowledge Bases. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 969–974. https://doi.or...

work page doi:10.18653/v1/2020.acl-main.91 2020
[32]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems33 (2020), 9459–9474

work page 2020
[33]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. InProceedings of the 34th International Conference on Neural Information Processing Systems(Van...

work page 2020
[34]

Kun Li, Tianhua Zhang, Xixin Wu, Hongyin Luo, James Glass, and Helen Meng

work page
[35]

arXiv:2410.18415 [cs.CL] https: //arxiv.org/abs/2410.18415

Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains. arXiv:2410.18415 [cs.CL] https: //arxiv.org/abs/2410.18415

work page arXiv
[36]

Mufei Li, Siqi Miao, and Pan Li. 2025. Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation. InThe Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=JvkuZZ04O7

work page 2025
[37]

Muzhi Li, Cehao Yang, Chengjin Xu, Xuhui Jiang, Yiyan Qi, Jian Guo, Ho-fung Leung, and Irwin King. 2025. Retrieval, Reasoning, Re-ranking: A Context- Enriched Framework for Knowledge Graph Completion. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (V...

work page 2025
[38]

Shaobo Li, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Chengjie Sun, Zhen- zhou Ji, and Bingquan Liu. 2021. HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions.Proceedings of the AAAI Conference on Artificial Intel- ligence35, 15 (May 2021), 13279–13287. https://doi.org/10.1609/aaai.v35i15.17568

work page doi:10.1609/aaai.v35i15.17568 2021
[39]

Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou. 2025. Search-o1: Agentic Search-Enhanced Large Reasoning Models. arXiv:2501.05366 [cs.AI] https://arxiv.org/abs/2501.05366

work page internal anchor Pith review Pith/arXiv arXiv 2025
[40]

Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, and Zhi- Quan Luo. 2024. ReMax: a simple, effective, and efficient reinforcement learning method for aligning large language models. InProceedings of the 41st International Conference on Machine Learning. 29128–29163

work page 2024
[41]

Xujian Liang and Zhaoquan Gu. 2025. Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph.Proceedings of the AAAI Conference on Artificial Intelligence39, 23 (Apr. 2025), 24558–24566. https://doi.org/10.1609/aaai.v39i23.34635

work page doi:10.1609/aaai.v39i23.34635 2025
[42]

Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, and Xilun Chen. 2023. How to Train Your Dragon: Diverse Augmentation Towards Generalizable Dense Retrieval. InFindings of the Asso- ciation for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational L...

work page doi:10.18653/v1/2023.findings-emnlp.423 2023
[43]

Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, et al. 2025. Advances and chal- lenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint arXiv:2504.01990(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[44]

Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts.Transactions of the Association for Computational Linguistics 12 (2024), 157–173

work page 2024
[45]

Zhuang Liu and Kaiming He. 2024. A Decade’s Battle on Dataset Bias: Are We There Yet?arXiv preprint arXiv:2403.08632(2024)

work page arXiv 2024
[46]

Ilya Loshchilov and Frank Hutter. 2017. SGDR: Stochastic Gradient Descent with Warm Restarts. InInternational Conference on Learning Representations

work page 2017
[47]

Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. InInternational Conference on Learning Representations

work page 2019
[48]

LINHAO LUO, Yuan-Fang Li, Gholamreza Haffari, and Shirui Pan. 2024. Rea- soning on Graphs: Faithful and Interpretable Large Language Model Reasoning. InThe Twelfth International Conference on Learning Representations. https: //openreview.net/forum?id=ZGNWW7xZ6Q

work page 2024
[49]

LINHAO LUO, Zicheng Zhao, Gholamreza Haffari, Chen Gong, and Shirui Pan

work page
[50]

https://openreview.net/forum?id=6embY8aclt

Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models. https://openreview.net/forum?id=6embY8aclt

work page
[51]

Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Cehao Yang, Jiaxin Mao, and Jian Guo. 2025. Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation. InThe Thirteenth International Conference on Learning Representations. https: //openreview.net/forum?id=oFBu7qaZpS

work page 2025
[52]

Shengyu Mao, Yong Jiang, Boli Chen, Xiao Li, Peng Wang, Xinyu Wang, Pengjun Xie, Fei Huang, Huajun Chen, and Ningyu Zhang. 2024. RaFe: Ranking Feedback Improves Query Rewriting for RAG. InFindings of the Association for Computa- tional Linguistics: EMNLP 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguist...

work page doi:10.18653/v1/2024.findings-emnlp.49 2024
[53]

Vaibhav Mavi, Anubhav Jangra, Adam Jatowt, et al. 2024. Multi-hop question answering.Foundations and Trends®in Information Retrieval17, 5 (2024), 457– 586

work page 2024
[54]

Costas Mavromatis and George Karypis. 2024. GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning. arXiv:2405.20139 [cs.CL] https://arxiv. org/abs/2405.20139

work page arXiv 2024
[55]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback.Advances in neural information processing systems35 (2022), 27730–27744

work page 2022
[56]

Sukannya Purkayastha, Saswati Dana, Dinesh Garg, Dinesh Khandelwal, and G.P Shrivatsa Bhargav. 2022. A Deep Neural Approach to KGQA via SPARQL Silhouette Generation. In2022 International Joint Conference on Neural Networks (IJCNN). 1–8. https://doi.org/10.1109/IJCNN55064.2022.9892263

work page doi:10.1109/ijcnn55064.2022.9892263 2022
[57]

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al . [n.d.]. Improving language understanding by generative pre-training. ([n. d.])

[58]

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36 (2023), 53728–53741

[59]

Stephen Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3, 4 (April 2009), 333–389. https://doi.org/10.1561/1500000019

[60]

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning. PMLR, 1889–1897

[62]

John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel. 2016. High-Dimensional Continuous Control Using Generalized Advantage Estimation. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. http://arxiv.org/abs/1506.02438

[63]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

[65]

Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, and Weizhu Chen. 2023. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy. In Findings of the Association for Computational Linguistics: EMNLP 2023. 9248–9274

[66]

Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, and Weizhu Chen. 2023. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy. In Findings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Sing...

[67]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Y Wu, et al. 2024. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300 (2024)

[68]

Yiheng Shu, Zhiwei Yu, Yuhan Li, Börje Karlsson, Tingting Ma, Yuzhong Qu, and Chin-Yew Lin. 2022. TIARA: Multi-grained Retrieval for Robust Question Answering over Large Knowledge Base. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computati...

[69]

Huatong Song, Jinhao Jiang, Yingqian Min, Jie Chen, Zhipeng Chen, Wayne Xin Zhao, Lei Fang, and Ji-Rong Wen. 2025. R1-searcher: Incentivizing the search capability in llms via reinforcement learning. arXiv preprint arXiv:2503.05592 (2025)

[70]

Yuan Sui, Yufei He, Nian Liu, Xiaoxin He, Kun Wang, and Bryan Hooi. 2025. FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering. arXiv:2405.13873 [cs.AI] https://arxiv.org/abs/2405.13873

[71]

Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, and Yan Zhang. 2025. ZEROSEARCH: Incentivize the Search Capability of LLMs without Searching. arXiv preprint arXiv:2505.04588 (2025)

[72]

Jiashuo Sun, Chengjin Xu, Lumingyuan Tang, Saizhuo Wang, Chen Lin, Yeyun Gong, Lionel Ni, Heung-Yeung Shum, and Jian Guo. 2024. Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=nnVO1PvbTv

[73]

Xingyu Tan, Xiaoyang Wang, Qing Liu, Xiwei Xu, Xin Yuan, and Wenjie Zhang. Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning. arXiv:2410.14211 [cs.CL] https://arxiv.org/abs/2410.14211

[75]

Yunhao Tang, Daniel Zhaohan Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, et al. 2024. Understanding the performance gap between online and offline alignment algorithms. arXiv preprint arXiv:2405.08448 (2024)

[77]

MuSiQue: Multihop Questions via Single-hop Question Composition. Transactions of the Association for Computational Linguistics (2022)

[78]

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. 2023. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canada, 10014–10037. https:/...

[80]

Ricardo Usbeck, Xi Yan, Aleksandr Perevalov, Longquan Jiang, Julius Schulz, Angelie Kraft, Cedric Möller, Junbo Huang, Jan Reineke, Axel-Cyrille Ngonga Ngomo, Muhammad Saleem, and Andreas Both. 2024. QALD-10 – The 10th challenge on question answering over linked data: Shifting from DBpedia to Wikidata as a KG for KGQA. Semantic Web 15, 6 (2024), 2193–2207. ...

[81]

Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A Free Collaborative Knowledgebase. Commun. ACM 57, 10 (sep 2014), 78–85. https://doi.org/10.1145/2629489
