Longer action horizons bottleneck LLM agent training through instability, but training with reduced horizons stabilizes learning and enables better generalization to longer horizons.
arXiv preprint arXiv:2507.04103 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Sequential SFT followed by RL, guided by the Plasticity-Ceiling Framework, achieves higher performance ceilings in LLM mathematical reasoning than synchronized methods by optimizing data scale and transition timing.
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.
citing papers explorer
-
On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length
Longer action horizons bottleneck LLM agent training through instability, but training with reduced horizons stabilizes learning and enables better generalization to longer horizons.
-
Rethinking Expert Trajectory Utilization in LLM Post-training for Mathematical Reasoning
Sequential SFT followed by RL, guided by the Plasticity-Ceiling Framework, achieves higher performance ceilings in LLM mathematical reasoning than synchronized methods by optimizing data scale and transition timing.
-
Rethinking Agentic Reinforcement Learning In Large Language Models
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.