Pith Number

pith:DGUU7JQN

pith:2025:DGUU7JQN3PALDNWOJKNRATEKIL

not attested · not anchored · not stored · refs resolved

Reinforcement Learning for Self-Improving Agent with Skill Library

Jiongxiao Wang, Lin Lee Cheong, Megha Gandhi, Panpan Xu, Qiaojing Yan, Soumya Smruti Mishra, Yawei Wang, Yijun Tian, Zhichao Xu

A reinforcement learning method lets LLM agents accumulate skills across task chains to improve accuracy and efficiency without retraining.

arxiv:2512.17102 v2 · 2025-12-18 · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{DGUU7JQN3PALDNWOJKNRATEKIL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Experimental results on AppWorld demonstrate that SAGE, when applied to a supervised-finetuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.

C2 · weakest assumption

That skills generated and stored during sequential rollouts remain accurate and relevant when reused on later tasks without introducing compounding errors or requiring expensive validation.

C3 · one-line summary

SAGE combines sequential rollouts across task chains with skill-integrated rewards inside a GRPO RL loop so agents accumulate and reuse skills, yielding 8.9% higher goal completion, 26% fewer steps, and 59% fewer tokens on AppWorld.
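
For readers unfamiliar with GRPO, the group-relative step named in the summary can be sketched as follows. This is a minimal illustration of the usual GRPO advantage normalization only: the reward values are made up, the function name `group_relative_advantages` is hypothetical, and this record does not describe how SAGE computes its skill-integrated reward.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own sampling group."""
    mu = mean(rewards)
    # Population standard deviation; some implementations use the sample std instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Illustrative rewards for four rollouts of the same task (made-up numbers).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline.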

References

3 extracted · 3 resolved · 1 Pith anchor

[1] Rossi, Handong Zhao, Ruiyi Zhang, Puneet Mathur, Nedim Lipka, Yu Wang, Trung Bui, Franck Dernoncourt, and Tianyi Zhou 2025

[2] RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning 2025 · arXiv:2504.20073

[3] as our retrieval model and keep the top 5 retrieved skills for usage. This model differs from the general text-embedding model used for Query Embedding because it is specifically trained for document

Formal links

2 machine-checked theorem links

Cited by

25 papers in Pith

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory

Receipt and verification

First computed	2026-05-17T23:38:13.223167Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

19a94fa60ddbc0b1b6ce4a9b104c8a42e0728b0105de5d127e3695ba2b910e45

Aliases

arxiv: 2512.17102 · arxiv_version: 2512.17102v2 · doi: 10.48550/arxiv.2512.17102 · pith_short_12: DGUU7JQN3PAL · pith_short_16: DGUU7JQN3PALDNWO · pith_short_8: DGUU7JQN
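
For this record, the identifier and its short aliases line up with the Base32 encoding of the canonical hash. The sketch below checks that relationship. Treat it as an observation about the values on this page, not as the documented derivation; the helper name `base32_prefix` is hypothetical.

```python
import base64

CANONICAL_HASH = "19a94fa60ddbc0b1b6ce4a9b104c8a42e0728b0105de5d127e3695ba2b910e45"
PITH_NUMBER = "DGUU7JQN3PALDNWOJKNRATEKIL"

def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the RFC 4648 Base32 encoding."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

# The 26-character identifier, as observed for this record.
derived = base32_prefix(CANONICAL_HASH, 26)

# The short aliases are prefixes of the same string.
aliases = {n: derived[:n] for n in (8, 12, 16)}
```

Here `derived` equals `PITH_NUMBER`, and `aliases[8]`, `aliases[12]`, and `aliases[16]` reproduce the `pith_short_8`, `pith_short_12`, and `pith_short_16` values listed above.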

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/DGUU7JQN3PALDNWOJKNRATEKIL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 19a94fa60ddbc0b1b6ce4a9b104c8a42e0728b0105de5d127e3695ba2b910e45

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ed42ae9780a8415ab89f8a1815bd287f58e88011fc99816db90d034fb5cc9a89",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-12-18T21:58:19Z",
    "title_canon_sha256": "035f1e969de9e82bc99e6bd287691bd16c98e1658801ee9673817d4dac7f2104"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.17102",
    "kind": "arxiv",
    "version": 2
  }
}
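
The hashing step of the verification command above can also be run offline against the record shown here. The following is a minimal sketch that mirrors the same canonicalization (sorted keys, compact separators, UTF-8 bytes). The function name `canonical_sha256` is hypothetical, and the sketch assumes the JSON printed on this page is exactly what the resolver returns; the resulting digest should be compared with the canonical hash rather than taken on trust.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Hash a record after serializing it with sorted keys and no extra whitespace."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(payload).hexdigest()

record = {
    "metadata": {
        "abstract_canon_sha256": "ed42ae9780a8415ab89f8a1815bd287f58e88011fc99816db90d034fb5cc9a89",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-12-18T21:58:19Z",
        "title_canon_sha256": "035f1e969de9e82bc99e6bd287691bd16c98e1658801ee9673817d4dac7f2104",
    },
    "schema_version": "1.0",
    "source": {"id": "2512.17102", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.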