pith:RYVQEAUD
Agentic Reinforced Policy Optimization
ARPO improves LLM agent performance on long-horizon tasks by sampling more at high-entropy steps right after each tool call.
arxiv:2507.19849 v1 · 2025-07-26 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{RYVQEAUDGW65ZGD55QAOJNHSIW}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
ARPO achieves improved performance using only half of the tool-use budget required by existing methods, offering a scalable solution for aligning LLM-based agents with real-time dynamic environments.
The preliminary observation that LLMs exhibit highly uncertain behavior (increased entropy) immediately following tool interactions is general enough to guide adaptive sampling across tasks, and this mechanism reliably improves long-horizon performance.
ARPO adds entropy-based adaptive rollouts and stepwise advantage attribution to RL for LLM agents, outperforming prior trajectory-level methods on 13 benchmarks with half the tool budget.
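To make the entropy idea in these claims concrete, here is a minimal sketch of how a rise in next-token entropy after a tool call could trigger extra sampling. It is an illustration only, not the paper's implementation: the function names, the use of a simple entropy difference, and the `threshold` value are all assumptions introduced here.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_branch(entropy_before, entropy_after, threshold=0.2):
    """Hypothetical rule: branch extra rollouts when entropy rises by more
    than `threshold` after a tool call. The threshold is an assumed value."""
    return (entropy_after - entropy_before) > threshold


# A confident distribution before the tool call, a flatter one after it.
before = token_entropy([0.97, 0.01, 0.01, 0.01])
after = token_entropy([0.25, 0.25, 0.25, 0.25])
print(should_branch(before, after))
```

A uniform distribution over four tokens has the maximum entropy for that vocabulary size, so this toy example branches; a real agent would compute the distributions from model logits.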
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:15.333245Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
8e2b02028335bddc987dec00e4b4f2459a73c968a690cb2f56fc3280e364f4d7
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/RYVQEAUDGW65ZGD55QAOJNHSIW \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8e2b02028335bddc987dec00e4b4f2459a73c968a690cb2f56fc3280e364f4d7
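The hashing step of the one-liner above can be unpacked into a small standalone function. This sketch mirrors the same `json.dumps` arguments (sorted keys, compact separators, non-ASCII preserved) and hashes the UTF-8 bytes with SHA-256. The function name `canonical_sha256` is introduced here for illustration, and the sketch assumes the record has already been fetched and parsed into a Python dict.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize a parsed JSON record deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Key order in the input does not change the digest.
print(canonical_sha256({"b": 2, "a": 1}) == canonical_sha256({"a": 1, "b": 2}))
```

Comparing the returned digest with the published canonical hash is then a plain string equality check.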
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2d063dcb52d9088260070a91f280b9064b4539cd1d082dfcb0de4de283df80a3",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-07-26T07:53:11Z",
    "title_canon_sha256": "c6efe2ebcc3ed7ebb55512d4066b4de04d544066275cf02ab776cb1a95f4a0df"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2507.19849",
    "kind": "arxiv",
    "version": 1
  }
}