Pith Number

pith:DGUU7JQN

pith:2025:DGUU7JQN3PALDNWOJKNRATEKIL
not attested · not anchored · not stored · refs resolved

Reinforcement Learning for Self-Improving Agent with Skill Library

Jiongxiao Wang, Lin Lee Cheong, Megha Gandhi, Panpan Xu, Qiaojing Yan, Soumya Smruti Mishra, Yawei Wang, Yijun Tian, Zhichao Xu

A reinforcement learning method lets LLM agents accumulate skills across task chains to improve accuracy and efficiency without retraining.

arxiv:2512.17102 v2 · 2025-12-18 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DGUU7JQN3PALDNWOJKNRATEKIL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
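
The page does not specify the merge algorithm, so the following is only a minimal sketch of how a deterministic merge could work, under assumed names. The `Event` shape, its fields, and the last-writer-wins rule are all hypothetical. The point it illustrates is that sorting events into a fixed total order before folding them makes the result independent of the order in which a mirror received them.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    # Hypothetical event shape; the real bundle schema is not shown on this page.
    event_id: str
    timestamp: str  # ISO 8601 UTC in one uniform format, so string order matches time order
    field: str
    value: str


def merge_state(record: dict, events: Iterable[Event]) -> dict:
    """Fold events into a copy of the record using a fixed total order."""
    # Drop duplicate deliveries; assumes equal event_id implies identical content.
    unique = {e.event_id: e for e in events}
    # Ties on timestamp are broken by event_id, so the order is total.
    ordered = sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id))
    state = dict(record)
    for e in ordered:
        state[e.field] = e.value  # last writer wins under the fixed order
    return state
```

Two mirrors that hold the same set of events, in any arrival order and with any duplicates, would compute the same state with this function.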

Claims

C1 strongest claim

Experimental results on AppWorld demonstrate that SAGE, when applied to a supervised-finetuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.

C2 weakest assumption

That skills generated and stored during sequential rollouts remain accurate and relevant when reused on later tasks without introducing compounding errors or requiring expensive validation.

C3 one line summary

SAGE combines sequential rollouts across task chains with skill-integrated rewards inside a GRPO RL loop so agents accumulate and reuse skills, yielding 8.9% higher goal completion, 26% fewer steps, and 59% fewer tokens on AppWorld.
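
To make the "skill-integrated rewards inside a GRPO RL loop" phrase concrete, here is a small sketch under stated assumptions. The group-relative advantage (each reward minus the group mean, divided by the group standard deviation) is the standard GRPO normalization. The reward function, its `bonus` weight, and the use of population standard deviation are illustrative assumptions, not the paper's definitions, which this page does not give.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def skill_integrated_reward(goal_completed: bool, skills_reused: int, bonus: float = 0.1) -> float:
    # Hypothetical shaping: task success plus a small bonus per reused skill.
    # The actual reward used by SAGE is not specified on this page.
    return float(goal_completed) + bonus * skills_reused


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: standardize rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    # eps keeps the division defined when every rollout earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four rollouts for the same task chain (made-up outcomes).
group = [
    skill_integrated_reward(True, 2),
    skill_integrated_reward(False, 0),
    skill_integrated_reward(True, 0),
    skill_integrated_reward(False, 1),
]
advantages = group_relative_advantages(group)
```

Rollouts that both complete the goal and reuse stored skills receive the largest advantage in the group, which is the behavior the summary attributes to the method.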

Formal links

2 machine-checked theorem links

Cited by

25 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:13.223167Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

19a94fa60ddbc0b1b6ce4a9b104c8a42e0728b0105de5d127e3695ba2b910e45

Aliases

arxiv: 2512.17102 · arxiv_version: 2512.17102v2 · doi: 10.48550/arxiv.2512.17102 · pith_short_12: DGUU7JQN3PAL · pith_short_16: DGUU7JQN3PALDNWO · pith_short_8: DGUU7JQN
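
The short aliases listed above are each a leading slice of the full Pith Number. The sketch below reproduces them by prefix slicing; the function name `short_aliases` is hypothetical, and treating the aliases as plain prefixes is inferred from the values on this page rather than from a published specification.

```python
from typing import Dict, Sequence


def short_aliases(pith_number: str, lengths: Sequence[int] = (8, 12, 16)) -> Dict[str, str]:
    """Build pith_short_N aliases as the first N characters of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


aliases = short_aliases("DGUU7JQN3PALDNWOJKNRATEKIL")
```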
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DGUU7JQN3PALDNWOJKNRATEKIL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 19a94fa60ddbc0b1b6ce4a9b104c8a42e0728b0105de5d127e3695ba2b910e45
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ed42ae9780a8415ab89f8a1815bd287f58e88011fc99816db90d034fb5cc9a89",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-12-18T21:58:19Z",
    "title_canon_sha256": "035f1e969de9e82bc99e6bd287691bd16c98e1658801ee9673817d4dac7f2104"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.17102",
    "kind": "arxiv",
    "version": 2
  }
}
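
The Python one-liner in the verification command can also be written as a small standalone function, which makes the canonicalization steps easier to read: keys sorted, compact separators, non-ASCII characters left unescaped, UTF-8 encoding, then SHA-256. The function name `canonical_sha256` is an assumption. Whether the JSON shown above is byte-for-byte the input that was hashed is also an assumption, so the sketch makes no claim about reproducing the published hash offline.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record with the same serialization flags as the one-liner above."""
    payload = json.dumps(
        record,
        sort_keys=True,         # key order in the input does not matter
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # keep non-ASCII characters as UTF-8
    ).encode()                  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# The "source" object from the record above, written in two key orders.
digest_a = canonical_sha256({"id": "2512.17102", "kind": "arxiv", "version": 2})
digest_b = canonical_sha256({"version": 2, "kind": "arxiv", "id": "2512.17102"})
```

Because keys are sorted before hashing, `digest_a` and `digest_b` are equal, which is what lets independent mirrors agree on one canonical hash.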