APEX-Searcher: Refining Credit Assignment with Subgoaling for Agentic Retrieval-Augmented Generation

Kun Chen; Qingchao Kong; Wenji Mao; Zhao Feifei

arxiv: 2603.13853 · v3 · pith:7B67UJGEnew · submitted 2026-03-14 · 💻 cs.CL · cs.AI



keywords: retrieval, credit, execution, planning, apex-searcher, assignment, complex, end-to-end


Retrieval-augmented generation (RAG) connects large language models (LLMs) to external knowledge, but single-round retrieval is often insufficient for complex multi-hop questions. To enhance search capabilities for complex tasks, most existing works integrate multi-round iterative retrieval with reasoning processes via end-to-end training. While these approaches improve problem-solving performance, they still face challenges in task reasoning and model training, especially ambiguous retrieval execution paths and sparse rewards in end-to-end reinforcement learning (RL), which can lead to inaccurate retrieval results and lower performance. We attribute these failures to hierarchical credit entanglement: a single final reward updates planning and execution together, so the model cannot clearly separate plan errors from retrieval errors. We propose APEX-Searcher, which uses a Refining Credit Assignment paradigm: planning is optimized by RL with a plan-level reward, while execution is learned by SFT. Extensive experiments show consistent gains in both multi-hop RAG and task planning across benchmarks.
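
The credit entanglement described in the abstract can be illustrated with a toy sketch. The abstract does not define the paper's actual reward functions, so everything here is an assumption made for illustration: the `Episode` fields, `final_reward`, and `plan_level_reward` are hypothetical names, and the binary outcomes stand in for whatever signals the authors use.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical flags, not taken from the paper.
    plan_ok: bool  # did the planner decompose the question into sound subgoals?
    exec_ok: bool  # did retrieval and answering of the subgoals succeed?


def final_reward(ep: Episode) -> float:
    """Single end-to-end reward: 1 only if both the plan and its execution succeed."""
    return 1.0 if (ep.plan_ok and ep.exec_ok) else 0.0


def plan_level_reward(ep: Episode) -> float:
    """Assumed plan-level reward: scores the plan alone, ignoring execution outcome."""
    return 1.0 if ep.plan_ok else 0.0


# All four combinations of plan quality and execution quality.
episodes = [
    Episode(plan_ok=True, exec_ok=True),
    Episode(plan_ok=True, exec_ok=False),
    Episode(plan_ok=False, exec_ok=True),
    Episode(plan_ok=False, exec_ok=False),
]

entangled = [final_reward(ep) for ep in episodes]
separated = [plan_level_reward(ep) for ep in episodes]

for ep, r_final, r_plan in zip(episodes, entangled, separated):
    print(f"plan_ok={ep.plan_ok!s:5} exec_ok={ep.exec_ok!s:5} "
          f"final={r_final} plan_level={r_plan}")
```

Under the single final reward, a sound plan whose execution fails receives the same signal as an unsound plan, so the planner cannot tell the two apart. The assumed plan-level reward distinguishes them, which is the separation the abstract argues for when it trains planning with RL and execution with SFT.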



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLM-Guided Planning for Multi-hop Reasoning over Multimodal Nuclear Regulatory Documents
cs.AI 2026-06 unverdicted novelty 4.0

LLM planning agent with dynamic KG state achieves 81.5% accuracy on 200 multi-hop questions from NuScale FSAR documents, outperforming non-planning RAG baselines by up to 38pp.