{"paper":{"title":"Reinforcement Learning for Self-Improving Agent with Skill Library","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A reinforcement learning method lets LLM agents accumulate skills across task chains to improve accuracy and efficiency without retraining.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Jiongxiao Wang, Lin Lee Cheong, Megha Gandhi, Panpan Xu, Qiaojing Yan, Soumya Smruti Mishra, Yawei Wang, Yijun Tian, Zhichao Xu","submitted_at":"2025-12-18T21:58:19Z","abstract_excerpt":"Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions but struggle to continuously improve and adapt when deployed in new environments. One promising approach is implementing skill libraries that allow agents to learn, validate, and apply new skills. However, current skill library approaches rely primarily on LLM prompting, making consistent skill library implementation challenging. To overcome these challenges, we propose a Reinforcement Learning (RL)-based approach to enhance agents' self-improvement capabilities wi"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experimental results on AppWorld demonstrate that SAGE, when applied to supervised-finetuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That skills generated and stored during sequential rollouts remain accurate and relevant when reused on later tasks without introducing compounding errors or requiring expensive validation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SAGE combines sequential rollouts across task chains with skill-integrated rewards inside a GRPO RL loop so agents accumulate and reuse skills, yielding 8.9% higher goal completion, 26% fewer steps, and 59% fewer tokens on AppWorld.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A reinforcement learning method lets LLM agents accumulate skills across task chains to improve accuracy and efficiency without retraining.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3ab7283ffb05bd966f255f5caa1258261018e5a2364eb02b2b44e120dd5f2c81"},"source":{"id":"2512.17102","kind":"arxiv","version":2},"verdict":{"id":"65bdc090-d099-4cff-b0d6-d1e7528b712a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T20:01:16.035981Z","strongest_claim":"Experimental results on AppWorld demonstrate that SAGE, when applied to supervised-finetuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.","one_line_summary":"SAGE combines sequential rollouts across task chains with skill-integrated rewards inside a GRPO RL loop so agents accumulate and reuse skills, yielding 8.9% higher goal completion, 26% fewer steps, and 59% fewer tokens on AppWorld.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That skills generated and stored during sequential rollouts remain accurate and relevant when reused on later tasks without introducing compounding errors or requiring expensive validation.","pith_extraction_headline":"A reinforcement learning method lets LLM agents accumulate skills across task chains to improve accuracy and efficiency without retraining."},"references":{"count":3,"sample":[{"doi":"","year":2025,"title":"Rossi, Handong Zhao, Ruiyi Zhang, Puneet Mathur, Nedim Lipka, Yu Wang, Trung Bui, Franck Dernoncourt, and Tianyi Zhou","work_id":"7798ac7c-cce5-40ab-9024-2635dc2ad381","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning","work_id":"b96383ee-f8dc-471f-aba4-bc5ce9b0b632","ref_index":2,"cited_arxiv_id":"2504.20073","is_internal_anchor":true},{"doi":"","year":null,"title":"as our retrieval model and keep the top 5 retrieved skills for usage. This model differs from the general text-embedding model used for Query Embedding because it is specifically trained for document ","work_id":"6ff374be-4287-4810-bb05-6f7cc80bedf9","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":3,"snapshot_sha256":"0e3ef61ad3c8fba2a94465bfff36b9ef3497b026a395225d9c107da4df8d07fd","internal_anchors":1},"formal_canon":{"evidence_count":2,"snapshot_sha256":"cc5380a8f62bd25a826d72eca507be8051a03f7a9db5ea22cc47da551e938262"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}