pith. machine review for the scientific record.

arxiv: 2512.04949 · v3 · submitted 2025-12-04 · 💻 cs.LG · cs.AI · cs.CL

Recognition: unknown

CARL: Criticality-Aware Agentic Reinforcement Learning

classification 💻 cs.LG cs.AI cs.CL
keywords carl states achieves actions agentic algorithm criticality-aware learning

Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conventional group-level policy optimization algorithm becomes suboptimal because of its underlying assumption that each step holds equal contribution, which deviates significantly from reality. Our analysis reveals that only the action choices on a small fraction of states are critical in determining the final outcome. Building on this insight, we propose CARL, a criticality-aware reinforcement learning algorithm tailored for long-horizon agentic reasoning. CARL leverages entropy as a heuristic proxy for state criticality and achieves focused training by assigning rewards to actions taken from high-criticality states while excluding actions taken from low-criticality states from model updates, avoiding noisy credit assignment and redundant computation. Extensive experiments demonstrate that CARL achieves both stronger performance and higher efficiency across diverse evaluation settings. The source code will be publicly available.
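The focused-training idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that entropy serves as a proxy for criticality and that low-criticality actions are excluded from updates. The top-fraction selection rule, the `keep_fraction` parameter, the use of a single trajectory-level advantage, and all function names below are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def action_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def criticality_mask(
    step_probs: Sequence[Sequence[float]], keep_fraction: float = 0.25
) -> List[bool]:
    """Mark the highest-entropy steps of a trajectory as critical.

    Assumption: "high criticality" means the top `keep_fraction` of steps
    ranked by policy entropy. The abstract does not specify the rule.
    """
    entropies = [action_entropy(p) for p in step_probs]
    n_keep = max(1, math.ceil(keep_fraction * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(len(entropies))]


def masked_advantages(
    step_probs: Sequence[Sequence[float]],
    trajectory_advantage: float,
    keep_fraction: float = 0.25,
) -> List[float]:
    """Give the trajectory-level advantage only to critical steps.

    Steps outside the mask get 0.0, so they contribute nothing to a
    policy-gradient update (hypothetical credit-assignment scheme).
    """
    mask = criticality_mask(step_probs, keep_fraction)
    return [trajectory_advantage if m else 0.0 for m in mask]


# Toy trajectory: the policy is uncertain only at the second step.
trajectory = [
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.99, 0.005, 0.005],
    [0.90, 0.05, 0.05],
]
mask = criticality_mask(trajectory, keep_fraction=0.25)
advantages = masked_advantages(trajectory, trajectory_advantage=1.0)
```

In this toy trajectory only the second step, where the policy is close to uniform, keeps its advantage; the three near-deterministic steps are masked out of the update.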

This paper has not been read by Pith yet.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

    cs.LG 2026-05 conditional novelty 6.0

    ActFocus resolves the action bottleneck in agentic RL by reweighting token gradients toward action tokens using observed reward variance and an energy-based uncertainty term, outperforming PPO and GRPO by up to 65 per...

  2. A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

    cs.CL 2026-05 unverdicted novelty 6.0

    A²TGPO improves RL policy optimization for multi-turn agentic LLMs by normalizing information gain within same-depth turn groups, rescaling cumulative advantages by sqrt of term count, and modulating clipping ranges p...

  3. From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models

    cs.CL 2026-04 unverdicted novelty 5.0

    A survey of credit assignment techniques in LLM reinforcement learning that distinguishes maturing methods for reasoning from new approaches needed for agentic settings and provides supporting resources.

  4. Medical Reasoning with Large Language Models: A Survey and MR-Bench

    cs.CL 2026-03 accept novelty 5.0

    LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.