Sfr-deepresearch: Towards effective reinforcement learning for autonomously reasoning single agents

Xuan-Phi Nguyen, Shrey Pandit, Revanth Gangi Reddy, Austin Xu, Silvio Savarese, Caiming Xiong, Shafiq Joty · 2025 · arXiv 2509.06283

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

cs.AI · 2026-06-02 · unverdicted · novelty 6.0

EvoDS adds autonomous skill acquisition via synthesis-validation-reuse and adaptive context compression via learned control within a two-stage multi-agent RL scheme, claiming 28.9% average gains over prior agents on four benchmarks plus elimination of out-of-token failures.

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

cs.CL · 2025-11-14 · unverdicted · novelty 6.0

MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

citing papers explorer

Showing 2 of 2 citing papers after filters.

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management cs.AI · 2026-06-02 · unverdicted · none · ref 44
EvoDS adds autonomous skill acquisition via synthesis-validation-reuse and adaptive context compression via learned control within a two-stage multi-agent RL scheme, claiming 28.9% average gains over prior agents on four benchmarks plus elimination of out-of-token failures.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 289
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

Sfr-deepresearch: Towards effective reinforcement learning for autonomously reasoning single agents

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer