R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

Qingfei Zhao, Ruobing Wang, Dingling Xu, Daren Zha, Limin Liu · 2025 · arXiv 2506.04185

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.

Learning to Trust: Dynamic Utilization of Retrieval-Augmented Generation for E-commerce Search Relevance

cs.IR · 2025-10-13 · unverdicted · novelty 6.0

DyKnow-RAG uses Group Relative Policy Optimization with dual-group rollouts and posterior-driven advantage scaling to optimize context utilization in RAG for e-commerce relevance, showing offline gains and production lifts when deployed at Taobao.

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

cs.LG · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

SLIM dynamically optimizes the active external skill set in agentic RL via leave-one-skill-out marginal contribution estimates and lifecycle operations, delivering a 7.1% average gain over baselines on ALFWorld and SearchQA while showing some skills remain externally useful.

Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

cs.CL · 2025-10-01 · unverdicted · novelty 5.0

ERL trains LLMs to erase faulty reasoning steps and regenerate them in place, yielding gains of up to 8.48% EM on multi-hop QA benchmarks like HotpotQA.

citing papers explorer

LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG cs.CL · 2026-05-07 · unverdicted · none · ref 50
Learning to Trust: Dynamic Utilization of Retrieval-Augmented Generation for E-commerce Search Relevance cs.IR · 2025-10-13 · unverdicted · none · ref 26
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning cs.LG · 2026-05-11 · unverdicted · none · ref 74 · 2 links
Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs cs.CL · 2025-10-01 · unverdicted · none · ref 37
