arXiv 2501.10005

Lucen Zhong, Zhengxiao Du, Akari Asai, Jie Tang · arXiv 2501.10005

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

PROVE trains LLMs on multi-step tool calls using 20 live MCP servers with 343 tools, state-grounded synthesis, and adaptive efficiency rewards, delivering gains of up to 10.2 points on BFCL Multi-Turn and similar gains on other benchmarks.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments

cs.CL · 2026-06-02 · unverdicted · none · ref 4
