pith. sign in

hub Mixed citations

arXiv preprint arXiv:2511.14460 , year=

Mixed citation behavior. Most common role is background (60%).

14 Pith papers citing it
Background 60% of classified citations
abstract

Large language models (LLMs) have rapidly evolved from single-turn text generators into the foundation of increasingly capable agents. As these agents take on more complex reasoning, decision making, tool use, and long-horizon tasks, reinforcement learning (RL) is becoming increasingly important for shaping their behavior. This shift is especially visible in agentic RL, where models must interact with tools and environments across multiple rounds rather than produce a single standalone response. In this regime, the usual view of a trajectory as one ever-growing token sequence becomes increasingly inadequate: it makes context evolution rigid and creates representation mismatches between rollout and training. This paper presents Agent-R1, a unified and modular framework for agentic RL built around step-level trajectory representation, flexible context management, and layered interfaces for workflows, environments and optimization. The key idea is to treat each interaction step as the basic reinforcement-learning transition, while keeping the optimization layer flexible: once the interaction is modeled at the step level, the framework can support token-level credit assignment, step-level credit assignment, or other compatible designs. These design choices make the framework compatible with a range of optimization strategies rather than tying it to a single algorithm. Together, these components provide a principled, extensible, and reusable substrate for agentic RL.

hub tools

citation-role summary

background 4 extension 1

citation-polarity summary

years

2026 13 2025 1

clear filters

representative citing papers

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.

Tools as Continuous Flow for Evolving Agentic Reasoning

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

FlowAgent models tool chaining as continuous latent trajectory generation with conditional flow matching to deliver global planning, formal utility bounds, and better robustness on long-horizon tasks, plus a new plan-level benchmark.

RaMem: Contextual Reinstatement for Long-term Agentic Memory

cs.AI · 2026-06-22 · unverdicted · novelty 5.0

RaMem improves LLM agent memory by grounding fragments in original conditions like time and participants, then using validity-aware retrieval, yielding >10% average F1 gains over baselines.

Toward a Safe Internet of Agents

cs.MA · 2025-11-29 · unverdicted · novelty 4.0

The paper proposes a bottom-up framework for safe agentic AI systems that treats each component as a dual-use interface where added capabilities also expand attack surfaces across single agents, multi-agent systems, and interoperable ecosystems.

citing papers explorer

Showing 1 of 1 citing paper after filters.