pith:EYMHDPD5
ToolRL: Reward is All Tool Learning Needs
A principled reward design for tool-use tasks lets reinforcement learning outperform supervised fine-tuning in training LLMs to use tools.
arxiv:2504.13958 v1 · 2025-04-16 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EYMHDPD5YM3H6XWTQ4LIE3APIW}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Empirical evaluations across diverse benchmarks demonstrate that our approach yields robust, scalable, and stable training, achieving a 17% improvement over base models and a 15% gain over SFT models.
The explored reward strategies and the proposed principled design are assumed to transfer to tool-use scenarios outside the specific benchmarks and tool sets used in the experiments.
A principled reward design for tool selection and application in RL-trained LLMs delivers 17% gains over base models and 15% over SFT across benchmarks.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:22:05.942883Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
261871bc7dc3367f5ed38716826c0f459c73573a005c87b75c51d4dcf1edc70c
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EYMHDPD5YM3H6XWTQ4LIE3APIW \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 261871bc7dc3367f5ed38716826c0f459c73573a005c87b75c51d4dcf1edc70c
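The Python one-liner in the command above can be unpacked into a small function for readability. This is a minimal sketch, assuming the canonical form is exactly what that one-liner produces: keys sorted, compact separators, non-ASCII characters left unescaped, UTF-8 encoded. The name `canonical_sha256` is hypothetical and not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same serialization as the one-liner above.

    Assumption: canonical form = sorted keys, no extra whitespace,
    non-ASCII kept as-is, UTF-8 bytes.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order in the input must not matter
        separators=(",", ":"),   # compact form, no spaces
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Key order and whitespace in the input do not affect the digest.
a = json.loads('{"source": {"id": "2504.13958", "kind": "arxiv", "version": 1}}')
b = json.loads('{"source":{"version":1,"kind":"arxiv","id":"2504.13958"}}')
assert canonical_sha256(a) == canonical_sha256(b)
```

To check a real record, pass the parsed `canonical_record` object to the function and compare the result with the published canonical hash.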
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "554a6c4040adfff956401c0dcf839c06f1adf4c031130927d473224b6450fda5",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-04-16T21:45:32Z",
    "title_canon_sha256": "a6a3d8cbe619dc8a2acd102e7ff2545163a89ac48f1be9b07f337af429f6db69"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.13958",
    "kind": "arxiv",
    "version": 1
  }
}