arXiv preprint arXiv:2512.19126

Awpo: Enhancing tool-use of large language models through adaptive integration of reasoning rewards · arXiv 2512.19126

1 Pith paper cites this work.

representative citing papers

Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration

cs.AI · 2026-04-03 · conditional · novelty 7.0

Iterative Reward Calibration with MT-GRPO and GTPO enables effective multi-turn RL for tool-calling agents, raising Tau-Bench success from 63.8% to 66.7% for a 4B model and from 58.0% to 69.5% for a 30B model.

citing papers explorer

Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration cs.AI · 2026-04-03 · conditional · none · ref 4
