Title resolution pending

Each reward component evaluates a distinct aspect of task completion, including correct product title identification (Title Score), accurate category matching (reward type), attribute fulfillment (reward attribute), final option selecti · 2039

1 Pith paper cites this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

T²PO improves stability and performance in multi-turn agentic RL by using uncertainty dynamics at token and turn levels to guide exploration and avoid wasted rollouts.

citing papers explorer

Showing 1 of 1 citing paper.

T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

cs.AI · 2026-05-04 · unverdicted · none · ref 30

T²PO improves stability and performance in multi-turn agentic RL by using uncertainty dynamics at token and turn levels to guide exploration and avoid wasted rollouts.
