Codebase Building upon the existing codebase verl (Sheng et al., 2024), our codebase introduces targeted modifications to both the vLLM (Kwon et al

execute actions in en · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

T²PO improves stability and performance in multi-turn agentic RL by using uncertainty dynamics at token and turn levels to guide exploration and avoid wasted rollouts.

citing papers explorer

Showing 1 of 1 citing paper.

T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

cs.AI · 2026-05-04 · unverdicted · none · ref 31

T²PO improves stability and performance in multi-turn agentic RL by using uncertainty dynamics at token and turn levels to guide exploration and avoid wasted rollouts.
