Carl: Critical action focused reinforcement learning for multi-step agent

Leyang Shen, Yang Zhang, Chun Kai Ling, Xiaoyan Zhao, Tat-Seng Chua · 2025 · cs.LG · arXiv 2512.04949

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open full Pith review browse 4 citing papers arXiv PDF

abstract

Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conventional group-level policy optimization algorithm becomes suboptimal because of its underlying assumption that each step holds equal contribution, which deviates significantly from reality. Our analysis reveals that only the action choices on a small fraction of states are critical in determining the final outcome. Building on this insight, we propose CARL, a criticality-aware reinforcement learning algorithm tailored for long-horizon agentic reasoning. CARL leverages entropy as a heuristic proxy for state criticality and achieves focused training by assigning rewards to actions taken from high-criticality states while excluding actions taken from low-criticality states from model updates, avoiding noisy credit assignment and redundant computation. Extensive experiments demonstrate that CARL achieves both stronger performance and higher efficiency across diverse evaluation settings. The source code will be publicly available.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

cs.LG · 2026-05-14 · conditional · novelty 6.0

ActFocus resolves the action bottleneck in agentic RL by reweighting token gradients toward action tokens using observed reward variance and an energy-based uncertainty term, outperforming PPO and GRPO by up to 65 percentage points.

A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

A²TGPO improves RL policy optimization for multi-turn agentic LLMs by normalizing information gain within same-depth turn groups, rescaling cumulative advantages by sqrt of term count, and modulating clipping ranges per turn's normalized IG.

From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

A survey of credit assignment techniques in LLM reinforcement learning that distinguishes maturing methods for reasoning from new approaches needed for agentic settings and provides supporting resources.

Medical Reasoning with Large Language Models: A Survey and MR-Bench

cs.CL · 2026-03-17 · accept · novelty 5.0

LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

citing papers explorer

Showing 4 of 4 citing papers.

Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy cs.LG · 2026-05-14 · conditional · none · ref 24 · internal anchor
ActFocus resolves the action bottleneck in agentic RL by reweighting token gradients toward action tokens using observed reward variance and an energy-based uncertainty term, outperforming PPO and GRPO by up to 65 percentage points.
A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping cs.CL · 2026-05-07 · unverdicted · none · ref 13 · internal anchor
A²TGPO improves RL policy optimization for multi-turn agentic LLMs by normalizing information gain within same-depth turn groups, rescaling cumulative advantages by sqrt of term count, and modulating clipping ranges per turn's normalized IG.
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models cs.CL · 2026-04-10 · unverdicted · none · ref 22 · internal anchor
A survey of credit assignment techniques in LLM reinforcement learning that distinguishes maturing methods for reasoning from new approaches needed for agentic settings and provides supporting resources.
Medical Reasoning with Large Language Models: A Survey and MR-Bench cs.CL · 2026-03-17 · accept · none · ref 98 · internal anchor
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

Carl: Critical action focused reinforcement learning for multi-step agent

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer