{"total":18,"items":[{"citing_arxiv_id":"2605.23899","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills","primary_cat":"cs.AI","submitted_at":"2026-05-22T17:59:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A systematic study across five domains finds model-generated skills yield average gains but non-uniform negative transfer, with a meta-skill improving extraction quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18693","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents","primary_cat":"cs.AI","submitted_at":"2026-05-18T17:28:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SkillGenBench is a benchmark for evaluating LLM skill generation pipelines in task-conditioned and task-agnostic regimes from repository and document sources using execution-based checks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18401","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution","primary_cat":"cs.CL","submitted_at":"2026-05-18T13:44:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model 
changes.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"A profiled skill library is searched before execution to expose task-relevant skills; after execution, trajectories and outcome signals are decomposed into skill-linked subtasks so reusable successful explorations can edit existing skills or create new ones. 2 profiling, recommendation, evaluation, and evolution to be treated as coupled processes [22, 77]. Against this background,SkillsVoteconstructs and profiles a million-scale open-source Agent Skill corpus and governs how skillsvoteinto the agent context before execution and how attributed evidencevotesinto the skill library after execution. This paper introducesSkillsVote, a lifecycle framework for Agent Skills. Before execution,SkillsVote"},{"citing_arxiv_id":"2605.11169","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents","primary_cat":"cs.AI","submitted_at":"2026-05-11T19:28:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10500","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillEvolver: Skill Learning as a Meta-Skill","primary_cat":"cs.AI","submitted_at":"2026-05-11T12:58:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment 
failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10114","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution","primary_cat":"cs.CL","submitted_at":"2026-05-11T07:31:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"RAE extends the retrieval-augmentation paradigm from knowledge-grounded generation to agent execution, where external procedural artifacts, such as skills, tools, APIs, and so on, are retrieved to execute a specific task. Existing RAE works can be grouped into three categories: 1) Skill routing methods focus on selecting, structuring, or composing reusable skills from large skill ecosystems [10, 23, 9]. 2) Repository-aware retrieval methods ex- ploit code, dataflow, document, or graph structure to improve access to execution-relevant evidence beyond flat vector search [11-13]. 
3) Execution planning methods further connect retrieval with explicit execution planning or orchestration by assembling multi-step tool-use trajectories, model calls, or skill pipelines [19, 43, 20, 9]."},{"citing_arxiv_id":"2605.09359","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Skill-R1: Agent Skill Evolution via Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-05-10T06:19:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Skill-R1 applies bi-level group-relative policy optimization to evolve skills recurrently from verified outcomes, yielding gains over baselines on multi-step tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09038","ref_index":16,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks","primary_cat":"cs.AI","submitted_at":"2026-05-09T16:23:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SearchSkill improves exact match scores and retrieval efficiency on open-domain QA by conditioning LLM actions on skills from an evolving SkillBank updated from failure patterns via two-stage SFT.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"first-class interface for packaging instructions, code, and resources that an agent can load on demand. Follow-up analyses show that skills rapidly became a practical mechanism for extending model functionality, while also raising ecosystem-level questions about organization and safe reuse [ 18, 19]. 
AgentSkillOS studies selection and benchmarking over large skill ecosystems [ 16], while Reinforcement Learning for Self-Improving Agent with Skill Library, MemSkill, and SkillRL explore how agents can maintain and evolve skill libraries or skill banks over training [30, 36, 33]. However, applying this intuition to search tool use is still non-trivial.Challenge 2: current skill frameworks do not directly tell us how skills should be represented, invoked, and updated inside search"},{"citing_arxiv_id":"2605.08526","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck","primary_cat":"cs.LG","submitted_at":"2026-05-08T22:17:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"All trainable components are confined to the posterior encoder qθ, the prior rϕ, and the projection mapg ω, while the frozen task modelπ tsk is never updated. The final multimodal skill is a concrete realization of Equation (2): after the text stage selects c∗ from Equation (8), we instantiate the multimodal skill as S∗ = (c ∗,z ∗),c ∗ ∼π sc(· |Π c(X,L c)),z ∗ ∼q θ(· |M,c ∗), (15) which is the realized form of pψ(S|X , M) under the two-stage CMIB construction in Sec- tion 3.1. In this way, the text card c∗ supplies an interpretable and retrievable procedural interface, while the latent z∗ injects complementary multimodal evidence that cannot be faithfully compressed into text alone. 
The task model thus consumes the learned skill"},{"citing_arxiv_id":"2605.07358","ref_index":61,"ref_count":4,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications","primary_cat":"cs.IR","submitted_at":"2026-05-08T07:10:26+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"TextGrad [37], FINCON [38], M+ [39], Learned Memory Bank [40], Nemori [41], Intrinsic Memory [42], SkillForge [43] Code-Backed Voyager [12], SkillCraft [44], PolySkill [45], ASI [46], CUA-Skill [47], MetaGPT [6], Eureka [48], DS-Agent [49], LDB [50], CodeAct [51], SWE-agent [52], ToolCoder [53], PSN [54] Hybrid-Based JARVIS-1 [55], Synapse [56], SkillWeaver [57], AgentSkillOS [58], TPTU [59], talker-reasoner [60], DAMCS [61], GraphSkill [62], Alita [63] Skill Acquisition (§IV) Human-Derived SkillNet [64], AgentSkillOS [58], Agentic Skills [65], SkillOS [66], Agent Hospital [67] Experience-Derived Voyager [12], SkillCraft [44], Reflexion [19], ExpeL [23], BoT [24], Trace2Skill [27], EverMemOS [68], HyperMem [69], AWM [26], Synapse [56], PolySkill [45], GITM [31], Retroformer [33], MemGPT [34], Eureka [48], TiM [35], M+ [39],"},{"citing_arxiv_id":"2605.06978","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries","primary_cat":"cs.CL","submitted_at":"2026-05-07T21:51:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill 
budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06130","ref_index":64,"ref_count":3,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning","primary_cat":"cs.AI","submitted_at":"2026-05-07T12:33:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05726","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents","primary_cat":"cs.AI","submitted_at":"2026-05-07T06:18:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27660","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Context to Skills: Can Language Models Learn from Context Skillfully?","primary_cat":"cs.AI","submitted_at":"2026-04-30T09:53:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context 
learning tasks without human supervision.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"2024. SWE-bench: Can language models resolve real-world github issues? InThe Twelfth International Conference on Learning Representations. [18] Hao Li, Chunjiang Mu, Jianhao Chen, Siyue Ren, Zhiyao Cui, Yiqun Zhang, Lei Bai, and Shuyue Hu. 2026. Organizing, orchestrating, and benchmarking agent skills at ecosystem scale. Preprint, arXiv:2603.02176. [19] Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, Shuyi Wang, Binxu Li, Qunhong Zeng, Di Wang, Xuandong Zhao, Yuanli Wang, Roey Ben Chaim, Zonglin Di, Yipeng Gao, Junwei He, Yizhuo He, Liqiang Jing, Luyang Kong, Xin Lan, Jiachen Li, Songlin Li, Yijiang Li, Yueqian Lin, Xinyi"},{"citing_arxiv_id":"2604.25727","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Scalable Terminal Task Synthesis via Skill Graphs","primary_cat":"cs.AI","submitted_at":"2026-04-28T14:53:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SkillSynth uses a scenario-mediated skill graph to sample workflow paths and generate executable terminal tasks, enabling controlled diversity in training trajectories for agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15709","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bilevel Optimization of Agent Skills via Monte Carlo Tree Search","primary_cat":"cs.AI","submitted_at":"2026-04-17T05:31:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Bilevel optimization with outer-loop MCTS for skill structure and inner-loop LLM refinement improves agent accuracy on an operations-research 
question-answering dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15415","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?","primary_cat":"cs.CR","submitted_at":"2026-04-16T17:31:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"skills can be installed with a single command and deployed on users' own devices or private infrastructure, the barrier to harm is substantially lowered [50, 51, 18]. Existing research on agent skill security mainly focuses on whether the skill itself contains vulnerabilities, such as embedded prompt injections, data exfiltration payloads, or malware [35, 38, 40]. We refer to this kind of skills as malicious skills. As illustrated in Figure 4, prior research primarily considers a threat model where attackers develop malicious skills designed to compromise the user during skill execution, such as stealing confidential data. 
In contrast, our work addresses a complementary and overlooked problem: skills whose intended functionality itself violates usage"},{"citing_arxiv_id":"2604.08224","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering","primary_cat":"cs.SE","submitted_at":"2026-04-09T13:19:41+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}