LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

2026 · cs.AI · arXiv 2604.17931

1 Pith paper cites this work.

abstract

Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and real-world search dependency during RL training introduces instability and prohibitive cost, which limits the scalability of Agentic RL. LiteResearcher is a training framework that makes Agentic RL scalable: by constructing a lite virtual world that mirrors real-world search dynamics, we enable a continuously improving training recipe that empowers a tiny search agent to outperform large-scale open-source and commercial models (e.g., Tongyi DeepResearch and Claude-4.5 Sonnet). Specifically, on common benchmarks such as GAIA and Xbench, our LiteResearcher-4B achieves open-source state-of-the-art results of 71.3% and 78.0% respectively, demonstrating that scalable RL training is a key enabler for Deep Research Agents.

representative citing papers

AgenticRL: Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

cs.RO · 2026-06-02 · unverdicted · novelty 6.0

AgenticRL deploys a multimodal GPT agent in a closed-loop process to autonomously design and refine reward functions for PPO-trained vision-conditioned UAV navigation policies, reporting 71% policy improvement and 91% real-world success.
