SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
3 Pith papers cite this work.
Citation roles: background (1)
Citation polarities: background (1)
Representative citing papers:
MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach the performance of commercial agents on research benchmarks.
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
Citing papers explorer:
- EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management
EvoDS adds autonomous skill acquisition through skill synthesis, validation, and reuse, and adaptive context compression through learned control, within a two-stage multi-agent RL scheme; it claims a 28.9% average gain over prior agents on four benchmarks and the elimination of out-of-token failures.
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.