arXiv: 2505.17086 · v4 · submitted 2025-05-20 · 💻 cs.CL

Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning

Yihong Wu, Liheng Ma, Muzhi Li, Jiaming Zhou, Lei Ding, Jianye Hao, Ho-fung Leung, Irwin King, Yingxue Zhang, Jian-Yun Nie



keywords: learning, multi-turn, complex, LLMs, systems, contexts, efficient, further


Abstract

Large Language Models (LLMs) equipped with modern Retrieval-Augmented Generation (RAG) systems often employ multi-turn interaction pipelines to interface with search engines for complex reasoning tasks. However, such multi-turn interactions inevitably produce long intermediate contexts, as context length grows exponentially with exploration depth. This leads to a well-known limitation of LLMs: their difficulty in effectively leveraging information from long contexts. This problem is further amplified in RAG systems that depend on in-context learning, where few-shot demonstrations must also be included in the prompt, compounding the context-length bottleneck. To address these challenges, we propose Mujica-MyGO, a unified framework for efficient multi-turn reasoning in RAG. Inspired by the divide-and-conquer principle, we introduce Mujica (Multi-hop Joint Intelligence for Complex Question Answering), a multi-agent RAG workflow that decomposes multi-turn interactions into cooperative sub-interactions, thereby mitigating long-context issues. To eliminate the dependency on in-context learning, we further develop MyGO (Minimalist Policy Gradient Optimization), a lightweight and efficient reinforcement learning algorithm that enables effective post-training of LLMs within complex RAG pipelines. We provide theoretical guarantees for MyGO's convergence to the optimal policy. Empirical evaluations across diverse question-answering benchmarks, covering both text corpora and knowledge graphs, show that Mujica-MyGO achieves superior performance.
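To make the divide-and-conquer idea concrete, here is a minimal sketch, not the authors' Mujica implementation, that compares the peak prompt size of a single agent keeping every retrieved passage in one growing transcript against a planner/worker split in which each worker sees only its own sub-question and passages, and only a short answer flows back to the planner. The functions `retrieve`, `answer`, `single_agent`, and `multi_agent`, the fixed 50-word passages, and the hand-written sub-questions are all hypothetical stand-ins for a real search engine, LLM reader, and planner; context size is approximated by word count rather than tokens.

```python
PASSAGE_WORDS = 50  # hypothetical fixed passage length, in words


def retrieve(query: str, k: int = 3) -> list[str]:
    """Hypothetical search-engine stub: returns k dummy passages of PASSAGE_WORDS words."""
    return [" ".join([f"doc{i}"] * PASSAGE_WORDS) for i in range(k)]


def answer(sub_question: str, passages: list[str]) -> str:
    """Hypothetical reader stub; a real system would call an LLM on the passages here."""
    return f"answer to {sub_question}"


def n_words(texts: list[str]) -> int:
    """Approximate context size as the total number of whitespace-separated words."""
    return sum(len(t.split()) for t in texts)


def single_agent(question: str, sub_questions: list[str]) -> int:
    """One transcript accumulates every sub-question, passage, and answer; returns peak size."""
    context = [question]
    peak = n_words(context)
    for sq in sub_questions:
        passages = retrieve(sq)
        context += [sq, *passages, answer(sq, passages)]
        peak = max(peak, n_words(context))
    return peak


def multi_agent(question: str, sub_questions: list[str]) -> int:
    """Planner keeps only sub-questions and answers; each worker context is isolated."""
    planner = [question]
    peak = n_words(planner)
    for sq in sub_questions:
        passages = retrieve(sq)
        worker = [sq, *passages]  # worker sees only its own sub-question and passages
        peak = max(peak, n_words(worker))
        planner += [sq, answer(sq, passages)]  # only the short answer returns
        peak = max(peak, n_words(planner))
    return peak


question = "Which university did the director of Film X attend?"
subs = ["Who directed Film X?", "Which university did that director attend?"]
print(single_agent(question, subs))  # 333
print(multi_agent(question, subs))   # 156
```

In this toy two-hop case the single-agent transcript peaks at 333 words while the largest multi-agent context is 156 words. The saving comes from returning only each worker's short answer to the planner: adding hops keeps enlarging the single transcript by a full set of passages per retrieval, whereas the planner grows only by short sub-question/answer pairs.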
