pith:DGUU7JQN
Reinforcement Learning for Self-Improving Agent with Skill Library
A reinforcement learning method lets LLM agents accumulate skills across task chains to improve accuracy and efficiency without retraining.
arxiv:2512.17102 v2 · 2025-12-18 · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DGUU7JQN3PALDNWOJKNRATEKIL}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Experimental results on AppWorld demonstrate that SAGE, when applied to a supervised-finetuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.
Assumption: skills generated and stored during sequential rollouts remain accurate and relevant when reused on later tasks, without introducing compounding errors or requiring expensive validation.
SAGE combines sequential rollouts across task chains with skill-integrated rewards inside a GRPO RL loop so agents accumulate and reuse skills, yielding 8.9% higher goal completion, 26% fewer steps, and 59% fewer tokens on AppWorld.
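The claim above names GRPO as the RL loop. As a rough illustration of the group-relative part only, the sketch below normalizes rewards within one group of rollouts for the same task. The function `skill_integrated_reward` and its `bonus` parameter are hypothetical stand-ins, since this record does not specify how SAGE's skill-integrated reward is defined, and the choice of population standard deviation is also an assumption.

```python
import statistics


def skill_integrated_reward(task_success: bool, skill_reused: bool, bonus: float = 0.5) -> float:
    """Hypothetical reward: 1.0 for success, plus a bonus when a stored skill was reused.

    This is an illustrative assumption, not the reward defined in the paper.
    """
    reward = 1.0 if task_success else 0.0
    if task_success and skill_reused:
        reward += bonus
    return reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each rollout's reward against the mean and std of its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts of one task: (task_success, skill_reused)
    rollouts = [(True, True), (True, False), (False, False), (False, True)]
    rewards = [skill_integrated_reward(s, k) for s, k in rollouts]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to produce a learning signal.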
Receipt and verification
| First computed | 2026-05-17T23:38:13.223167Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
19a94fa60ddbc0b1b6ce4a9b104c8a42e0728b0105de5d127e3695ba2b910e45
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DGUU7JQN3PALDNWOJKNRATEKIL \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 19a94fa60ddbc0b1b6ce4a9b104c8a42e0728b0105de5d127e3695ba2b910e45
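The one-liner above can be unpacked into a small Python function, which may be easier to read and reuse. This is a minimal sketch that mirrors the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8) and hashes the result with SHA-256. The function name `canonical_sha256` is an illustrative choice. The record literal is copied from the canonical record shown on this page, and whether it reproduces the published hash depends on the served record matching it exactly.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


if __name__ == "__main__":
    # Record copied from the "Canonical record JSON" section of this page.
    record = {
        "metadata": {
            "abstract_canon_sha256": "ed42ae9780a8415ab89f8a1815bd287f58e88011fc99816db90d034fb5cc9a89",
            "cross_cats_sorted": [],
            "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
            "primary_cat": "cs.AI",
            "submitted_at": "2025-12-18T21:58:19Z",
            "title_canon_sha256": "035f1e969de9e82bc99e6bd287691bd16c98e1658801ee9673817d4dac7f2104",
        },
        "schema_version": "1.0",
        "source": {"id": "2512.17102", "kind": "arxiv", "version": 2},
    }
    expected = "19a94fa60ddbc0b1b6ce4a9b104c8a42e0728b0105de5d127e3695ba2b910e45"
    digest = canonical_sha256(record)
    print(digest)
    print("match:", digest == expected)
```

Sorting keys and fixing the separators makes the serialization independent of key order and whitespace, so two parties holding the same record compute the same digest.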
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ed42ae9780a8415ab89f8a1815bd287f58e88011fc99816db90d034fb5cc9a89",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-12-18T21:58:19Z",
    "title_canon_sha256": "035f1e969de9e82bc99e6bd287691bd16c98e1658801ee9673817d4dac7f2104"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.17102",
    "kind": "arxiv",
    "version": 2
  }
}