Pith Number

pith:EYMHDPD5

pith:2025:EYMHDPD5YM3H6XWTQ4LIE3APIW

not attested · not anchored · not stored · refs resolved

ToolRL: Reward is All Tool Learning Needs

Cheng Qian, Dilek Hakkani-Tür, Emre Can Acikgoz, Gokhan Tur, Heng Ji, Hongru Wang, Qi He, Xiusi Chen

A principled reward design for tool-use tasks lets reinforcement learning outperform supervised fine-tuning in training LLMs to use tools.

arxiv:2504.13958 v1 · 2025-04-16 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{EYMHDPD5YM3H6XWTQ4LIE3APIW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Empirical evaluations across diverse benchmarks demonstrate that our approach yields robust, scalable, and stable training, achieving a 17% improvement over base models and a 15% gain over SFT models.

C2 · weakest assumption

The explored reward strategies and the proposed principled design are assumed to transfer to tool-use scenarios outside the specific benchmarks and tool sets used in the experiments.

C3 · one-line summary

A principled reward design for tool selection and application in RL-trained LLMs delivers 17% gains over base models and 15% over SFT across benchmarks.

References

46 extracted · 46 resolved · 19 Pith anchors

[1] Can a single model master both multi-turn conversations and tool use? CoALM: A unified conversational agentic language model. Preprint, arXiv:2502.08820. Jinheon Baek, Sujay Kumar Jauhar, Silviu Cuc

[2] Researchagent: Iterative research idea generation over scientific literature with large language models,

[3] Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks · arXiv:2211.12588

[4] In Findings of the Association for Computational Linguistics: ACL 2024, pages 9354–9366, Bangkok, Thailand, 2024

[5] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training · arXiv:2501.17161

Formal links

2 machine-checked theorem links

Cited by

39 papers in Pith

What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents

Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions

Receipt and verification

First computed	2026-05-18T03:22:05.942883Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

261871bc7dc3367f5ed38716826c0f459c73573a005c87b75c51d4dcf1edc70c

Aliases

arxiv: 2504.13958 · arxiv_version: 2504.13958v1 · doi: 10.48550/arxiv.2504.13958 · pith_short_12: EYMHDPD5YM3H · pith_short_16: EYMHDPD5YM3H6XWT · pith_short_8: EYMHDPD5
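
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the standard (RFC 4648) base32 encoding of the canonical hash, and the short aliases are prefixes of that string. The sketch below checks this relationship for this record. It is an observation drawn from the values shown here, not a documented specification, and the function name `pith_from_hash` is hypothetical.

```python
import base64


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex SHA-256 digest and keep the first `length` characters.

    Assumption: this rule is inferred from the values on this page,
    not taken from a published Pith specification.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


# Canonical hash as listed on this page.
canonical_hash = "261871bc7dc3367f5ed38716826c0f459c73573a005c87b75c51d4dcf1edc70c"

full = pith_from_hash(canonical_hash)
# Short aliases are taken as prefixes of the full identifier.
aliases = {n: full[:n] for n in (8, 12, 16)}

print(full)
print(aliases)
```

If the inferred rule holds, `full` equals the Pith Number at the top of the page and the three prefixes equal the `pith_short_8`, `pith_short_12`, and `pith_short_16` aliases.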

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/EYMHDPD5YM3H6XWTQ4LIE3APIW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 261871bc7dc3367f5ed38716826c0f459c73573a005c87b75c51d4dcf1edc70c

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "554a6c4040adfff956401c0dcf839c06f1adf4c031130927d473224b6450fda5",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-04-16T21:45:32Z",
    "title_canon_sha256": "a6a3d8cbe619dc8a2acd102e7ff2545163a89ac48f1be9b07f337af429f6db69"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.13958",
    "kind": "arxiv",
    "version": 1
  }
}
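
The hashing step of the verification command can also be run offline against the record shown above. The following is a minimal sketch, assuming the served `canonical_record` is identical to the JSON displayed here; it mirrors the serialization in the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8). The helper name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the verification one-liner does."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # compact form, no extra whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "554a6c4040adfff956401c0dcf839c06f1adf4c031130927d473224b6450fda5",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-04-16T21:45:32Z",
        "title_canon_sha256": "a6a3d8cbe619dc8a2acd102e7ff2545163a89ac48f1be9b07f337af429f6db69",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.13958", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Compare the printed digest with the canonical hash listed on this page. A mismatch would suggest that the displayed JSON differs from the served record or that the canonicalization rules differ from this sketch.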