Agent-r1: Training powerful LLM agents with end-to-end reinforcement learning. arXiv preprint

Agent-r1: Training powerful LLM agents with end-to-end reinforcement learning · 2025 · arXiv 2511.14460

8 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 3 · extension: 1

citation-polarity summary

background: 2 · extend: 1 · unclear: 1

representative citing papers

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.

Tools as Continuous Flow for Evolving Agentic Reasoning

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

FlowAgent models tool chaining as continuous latent trajectory generation with conditional flow matching to deliver global planning, formal utility bounds, and better robustness on long-horizon tasks, plus a new plan-level benchmark.

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.

TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

TEC is a new public dataset of detailed human trial-and-error trajectories and reflections on web tasks, with humans showing substantially higher accuracy than LLMs.

SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models

cs.RO · 2026-03-26 · unverdicted · novelty 6.0

SABER uses a trained ReAct agent to produce bounded adversarial edits to robot instructions, cutting task success by 20.6% and increasing execution length and violations on the LIBERO benchmark across six VLA models.

StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning

cs.CL · 2026-04-20 · unverdicted · novelty 4.0

StepPO argues that LLM agents should optimize at the step level rather than token level to better handle delayed rewards and long contexts in agentic RL.

EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools

cs.AI · 2026-04-09 · unverdicted · novelty 4.0

Structured query and evidence tools added to an AI research agent improve benchmark accuracy by 0.6 to 3.8 percentage points.

Toward a Safe Internet of Agents

cs.MA · 2025-11-29 · unverdicted · novelty 4.0

The paper proposes a bottom-up framework for safe agentic AI systems that treats each component as a dual-use interface where added capabilities also expand attack surfaces across single agents, multi-agent systems, and interoperable ecosystems.

citing papers explorer

Showing 8 of 8 citing papers.

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents cs.AI · 2026-05-13 · unverdicted · none · ref 39 · 2 links
Tools as Continuous Flow for Evolving Agentic Reasoning cs.AI · 2026-05-08 · unverdicted · none · ref 16
Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents cs.LG · 2026-04-12 · unverdicted · none · ref 7
TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving cs.CL · 2026-04-08 · unverdicted · none · ref 7
SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models cs.RO · 2026-03-26 · unverdicted · none · ref 32
StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning cs.CL · 2026-04-20 · unverdicted · none · ref 7
EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools cs.AI · 2026-04-09 · unverdicted · none · ref 5
Toward a Safe Internet of Agents cs.MA · 2025-11-29 · unverdicted · none · ref 5
