{"paper":{"title":"Skill Retrieval Augmentation for Agentic AI","license":"http://creativecommons.org/licenses/by-sa/4.0/","headline":"Dynamic retrieval of skills from large external corpora can substantially improve LLM agent performance on hard tasks, though agents load skills at similar rates whether the retrieved skill is relevant or needed at all.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Changyue Wang, Jianming Long, Qingyao Ai, Weihang Su, Yichen Tang, Yiqun Liu, Yiteng Tu","submitted_at":"2026-04-27T15:19:59Z","abstract_excerpt":"As large language models (LLMs) evolve into agentic problem solvers, they increasingly rely on external, reusable skills to handle tasks beyond their native parametric capabilities. In existing agent systems, the dominant strategy for incorporating skills is to explicitly enumerate available skills within the context window. However, this strategy fails to scale: as skill corpora expand, context budgets are consumed rapidly, and the agent becomes markedly less accurate in identifying the right skill. To this end, this paper formulates Skill Retrieval Augmentation (SRA), a new paradigm in which"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"retrieval-based skill augmentation can substantially improve agent performance, validating the promise of the paradigm. \nAt the same time, we uncover a fundamental gap in skill incorporation: current LLM agents tend to load skills at similar rates, regardless of whether a gold skill is retrieved or whether the task actually requires external capabilities.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the manually constructed gold skills and web-collected distractors in SRA-Bench form a realistic and representative test of real-world agent skill use, and that the observed loading rates generalize beyond the specific models and tasks tested.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Dynamic retrieval of skills from large external corpora can substantially improve LLM agent performance on hard tasks, though agents load skills at similar rates whether the retrieved skill is relevant or needed at all.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"bb93738ed5a4b85e4b9eda923600e89490ef114e78216366605f8738e4202c87"},"source":{"id":"2604.24594","kind":"arxiv","version":2},"verdict":{"id":"73f0a2e1-6781-47d3-ac0e-44473c45cb35","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T03:39:22.020790Z","strongest_claim":"retrieval-based skill augmentation can substantially improve agent performance, validating the promise of the paradigm. \nAt the same time, we uncover a fundamental gap in skill incorporation: current LLM agents tend to load skills at similar rates, regardless of whether a gold skill is retrieved or whether the task actually requires external capabilities.","one_line_summary":"Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the manually constructed gold skills and web-collected distractors in SRA-Bench form a realistic and representative test of real-world agent skill use, and that the observed loading rates generalize beyond the specific models and tasks tested.","pith_extraction_headline":"Dynamic retrieval of skills from large external corpora can substantially improve LLM agent performance on hard tasks, though agents load skills at similar rates whether the retrieved skill is relevant or needed at all."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.24594/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-21T06:37:53.839265Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T21:54:32.072214Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"0bf823a9ba1a36d9fd8222d9167b081a591d15dc209bfea930d9d8d61890947b"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}