
arxiv: 2605.07358 · v2 · pith:TLGMMJQ2new · submitted 2026-05-08 · 💻 cs.IR

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

Pith reviewed 2026-05-20 23:12 UTC · model grok-4.3

classification 💻 cs.IR
keywords LLM-based agents · agent skills · skill lifecycle · reusable procedures · tool coordination · agent taxonomy · scalable agent systems · LLM applications

The pith

Agent skills serve as reusable procedural artifacts that let LLM agents execute tasks reliably without repeated low-level reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey argues that LLM-based agents benefit from skills defined as reusable procedures coordinating tools, memory, and context. Agents manage high-level planning while skills provide the operational layer for composable and maintainable execution. The authors organize existing research into four stages of the skill lifecycle: representation, acquisition, retrieval, and evolution. By reviewing methods across these stages, the paper shows how skills address inefficiency and error in open-ended agent deployments. It also points to challenges in quality control and long-term management of these skills.

Core claim

The paper establishes that skills, as reusable procedural artifacts coordinating tools, memory, and runtime context, form the key operational layer complementing agents' high-level reasoning, and organizes the literature around the four stages of representation, acquisition, retrieval, and evolution to advance scalability in LLM agent systems.
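One way to picture a skill in the sense defined above is as a small record that bundles a procedure with the tools it needs and the constraints it must satisfy. The sketch below is illustrative only: the `Skill` class, its fields, and the stub tools are assumptions made for this example, not a representation taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Tool = Callable[..., Any]


@dataclass
class Skill:
    """Hypothetical skill record: a reusable procedure plus its constraints."""
    name: str
    required_tools: List[str]
    # The procedure receives the tools, a mutable memory dict, and runtime context.
    procedure: Callable[[Dict[str, Tool], Dict[str, Any], Dict[str, Any]], Any]
    # Task-specific constraints, checked against the runtime context before running.
    constraints: List[Callable[[Dict[str, Any]], bool]] = field(default_factory=list)

    def run(self, tools: Dict[str, Tool], memory: Dict[str, Any], context: Dict[str, Any]) -> Any:
        missing = [t for t in self.required_tools if t not in tools]
        if missing:
            raise KeyError(f"missing tools: {missing}")
        if not all(check(context) for check in self.constraints):
            raise ValueError(f"constraints not met for skill {self.name!r}")
        return self.procedure(tools, memory, context)


def summarize_file(tools, memory, context):
    # Coordinate two tools, then record the result in memory for later steps.
    text = tools["read_file"](context["path"])
    summary = tools["summarize"](text, context["max_words"])
    memory["last_summary"] = summary
    return summary


skill = Skill(
    name="summarize_file",
    required_tools=["read_file", "summarize"],
    procedure=summarize_file,
    constraints=[lambda ctx: ctx.get("max_words", 0) > 0],
)

# Stub tools stand in for real integrations (assumed for illustration).
tools = {
    "read_file": lambda path: "agent skills are reusable procedures for agents",
    "summarize": lambda text, n: " ".join(text.split()[:n]),
}
memory: Dict[str, Any] = {}
result = skill.run(tools, memory, {"path": "notes.txt", "max_words": 3})
print(result)
```

In this framing the agent decides which skill to call and with what context, while the skill owns the low-level tool sequencing.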

What carries the argument

The four-stage agent skill lifecycle consisting of representation, acquisition, retrieval, and evolution, which structures the review of techniques for creating and maintaining reusable skills.
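The four stages can be read as operations on a skill library. The following is a minimal sketch under stated assumptions: `SkillRecord`, `SkillLibrary`, and the bag-of-words `embed` function are hypothetical names, and token-overlap cosine similarity stands in for whatever retriever a real system would use.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SkillRecord:
    # Representation: a name, a natural-language description, and ordered steps.
    name: str
    description: str
    steps: List[str]
    version: int = 1


def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a learned embedding model (assumption).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SkillLibrary:
    def __init__(self) -> None:
        self.skills: Dict[str, SkillRecord] = {}

    def acquire(self, name: str, description: str, trajectory: List[Tuple[str, bool]]) -> SkillRecord:
        # Acquisition: keep only the steps that succeeded in a past trajectory.
        record = SkillRecord(name, description, [step for step, ok in trajectory if ok])
        self.skills[name] = record
        return record

    def retrieve(self, query: str, k: int = 1) -> List[SkillRecord]:
        # Retrieval: rank stored skills by similarity between query and description.
        q = embed(query)
        ranked = sorted(self.skills.values(), key=lambda s: cosine(q, embed(s.description)), reverse=True)
        return ranked[:k]

    def evolve(self, name: str, new_steps: List[str]) -> SkillRecord:
        # Evolution: replace the steps and bump the version.
        record = self.skills[name]
        record.steps = list(new_steps)
        record.version += 1
        return record


lib = SkillLibrary()
lib.acquire("book_flight", "search and book a flight ticket",
            [("open site", True), ("click ad", False), ("search flights", True), ("pay", True)])
lib.acquire("send_email", "compose and send an email message",
            [("open mail", True), ("write", True), ("send", True)])
best = lib.retrieve("book a cheap flight")[0]
print(best.name, best.steps)
```

The stages are deliberately kept as separate methods here; methods that interleave them (for example refining a skill while retrieving it) would not fit this simple shape.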

If this is right

  • Skills enable reliable execution across similar tasks by reusing proven procedures.
  • Systems become more scalable as new tasks leverage existing skill libraries rather than building from scratch.
  • Maintainability improves through structured updates and evolution of skills over time.
  • Interoperability between different agent frameworks increases with standardized skill representations.
  • Applications in complex workflows gain robustness from composable skill combinations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers could build shared skill repositories that accelerate agent development across organizations.
  • The lifecycle model might extend to non-LLM agents, such as those using traditional planning algorithms.
  • Future research could explore automated verification methods for skill quality within this framework.
  • Integration with memory systems could create self-improving skill collections.

Load-bearing premise

The diverse literature on LLM-based agents fits into the proposed four stages of the skill lifecycle without forcing unnatural categorizations or leaving out important work.

What would settle it

Identification of a substantial set of agent skill techniques or papers that cannot be classified into any of the four stages: representation, acquisition, retrieval, or evolution.
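The test described above could be made operational by labeling each paper with the stages it touches and counting those that match none. The sketch below assumes a hypothetical labeling (`paper_a` through `paper_d` are placeholders, not real papers) and a hypothetical `coverage_report` helper.

```python
from collections import Counter
from typing import Dict, Set

STAGES = {"representation", "acquisition", "retrieval", "evolution"}


def coverage_report(labels: Dict[str, Set[str]]) -> dict:
    """Summarize how a labeled corpus maps onto the four lifecycle stages."""
    # Papers whose labels match none of the four stages.
    unclassified = sorted(p for p, s in labels.items() if not (s & STAGES))
    # Papers that fall under more than one stage.
    cross_stage = sorted(p for p, s in labels.items() if len(s & STAGES) > 1)
    per_stage = Counter(stage for s in labels.values() for stage in s & STAGES)
    total = len(labels)
    return {
        "per_stage": dict(per_stage),
        "unclassified": unclassified,
        "cross_stage": cross_stage,
        "unclassified_share": len(unclassified) / total if total else 0.0,
    }


# Placeholder labels for illustration only.
labels = {
    "paper_a": {"acquisition"},
    "paper_b": {"acquisition", "retrieval"},
    "paper_c": set(),
    "paper_d": {"evolution"},
}
report = coverage_report(labels)
print(report)
```

A large `unclassified_share` would count against the taxonomy, while a long `cross_stage` list would suggest the stage boundaries are less clean than the lifecycle framing implies.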

Figures

Figures reproduced from arXiv: 2605.07358 by Wang Shu, Wenchuan Du, Xuemin Lin, Yaodong Su, Yingli Zhou, Yixiang Fang.

Figure 1. Historical evolution of skills, from embodied human survival and craftsmanship to engineering, industrial, digital, and …
Figure 2. Growth of research on agent skills from April 2023 to April 2026. The figure shows the cumulative number of …
Figure 3. The taxonomy for agent skills in this survey.
Figure 4. Illustrative examples of agent skills.
Figure 5. Overview of skill acquisition methods.
    … which a skill is obtained: ❶ human-derived acquisition, ❷ experience-derived acquisition, ❸ task-derived acquisition, and ❹ corpus-derived acquisition. Human-derived acquisition obtains skills directly from expert knowledge and manual curation. Experience-derived acquisition builds them from trajectories, exemplars, or past executions. Task-derived acquisition constr…
Figure 6. The trend of the cumulative number of human-derived skills over time.
Figure 7. Skill retrieval and selection.
    … applying dense retrieval to experiential lessons or structured reasoning memories rather than fully packaged executable skills. This makes dense retrieval the natural entry point when task formulations vary widely but the system still needs to reach reusable skills through a shared semantic layer. The same flexibility also explains why dense retrieval is rarely the whole stor…
Figure 8. From human skill refinement to agent skill evolution.
Figure 9. Skill evolution through staged refinement: updates revise skills, validation filters changes, and trusted skills are indexed, …
Figure 10. Application scenarios of agent skills.
    … latency, and execution cost, including dynamic model routing and workload-aware scheduling [6], [18]. Skill Library Evolution under Non-Stationarity. APIs deprecate, tool behavior shifts, and task distributions change over time [10], [22]. Skill libraries need lifecycle-level robustness: drift detection, compatibility checks, safe online updates, and versioned rollb…
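The staged refinement described in the Figure 9 caption (updates revise skills, validation filters changes, trusted skills are indexed) can be sketched as a gate in front of the skill index. Everything named below (`VersionedSkill`, `validate`, `propose_update`, `rollback`, and the integer test cases) is a hypothetical illustration, not the mechanism of any surveyed system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Procedure = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


@dataclass
class VersionedSkill:
    name: str
    versions: List[Procedure] = field(default_factory=list)

    @property
    def current(self) -> Procedure:
        return self.versions[-1]


def validate(proc: Procedure, tests: List[TestCase]) -> bool:
    # Validation: a candidate passes only if every regression case matches.
    for x, expected in tests:
        try:
            if proc(x) != expected:
                return False
        except Exception:
            return False
    return True


def propose_update(skill: VersionedSkill, candidate: Procedure,
                   tests: List[TestCase], index: Dict[str, int]) -> bool:
    # Update, then filter: rejected candidates never reach the index.
    if not validate(candidate, tests):
        return False
    skill.versions.append(candidate)
    index[skill.name] = len(skill.versions)  # index the trusted version number
    return True


def rollback(skill: VersionedSkill, index: Dict[str, int]) -> None:
    # Revert to the previous trusted version, keeping at least one.
    if len(skill.versions) > 1:
        skill.versions.pop()
    index[skill.name] = len(skill.versions)


skill = VersionedSkill("double", [lambda x: x * 2])
index = {"double": 1}
tests = [(2, 4), (5, 10)]

rejected = propose_update(skill, lambda x: x + 2, tests, index)  # fails on (5, 10)
accepted = propose_update(skill, lambda x: x + x, tests, index)
print(rejected, accepted, index)
```

Keeping earlier versions in the list is what makes the rollback step cheap; a production library would also need to persist versions and record why a change was rejected.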
Original abstract

Large language model (LLM)-based agents that reason, plan, and act through tools, memory, and structured interaction are emerging as a promising paradigm for automating complex workflows. Recent systems such as OpenClaw and Claude Code exemplify a broader shift from passive response generation to action-oriented task execution. Yet as agents move toward open-ended, real-world deployment, relying on from-scratch reasoning and low-level tool calls for every task becomes increasingly inefficient, error-prone, and hard to maintain. This survey examines this challenge through the lens of agent skills, which we define as reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. Under this view, agents and skills play complementary roles: agents handle high-level reasoning and planning, while skills form the operational layer that enables reliable, reusable, and composable execution. Skills are therefore central to the scalability, robustness, and maintainability of modern agent systems. We organize the literature around four stages of the agent skill lifecycle (representation, acquisition, retrieval, and evolution) and review representative methods, ecosystem resources, and application settings across each stage. We conclude by discussing open challenges in quality control, interoperability, safe updating, and long-term capability management. All related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/JayLZhou/Awesome-Agent-Skills.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper surveys LLM-based agent skills, defining them as reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. It argues that agents and skills play complementary roles, with skills forming the operational layer for reliable, reusable, and composable execution, which makes them central to scalability, robustness, and maintainability. The literature is organized around four stages of the agent skill lifecycle (representation, acquisition, retrieval, and evolution), with reviews of representative methods, ecosystem resources, and applications in each stage. Open challenges in quality control, interoperability, safe updating, and long-term capability management are discussed, and all resources are collected in a GitHub repository.

Significance. If the four-stage taxonomy provides a non-forced and reasonably complete partition of the literature, the survey would offer a useful organizing framework for researchers building scalable agent systems. The explicit collection of papers, data, and projects in the linked GitHub repository is a concrete strength that enhances reproducibility and community utility beyond the textual review.

major comments (1)
  1. [Abstract and lifecycle organization section] The central organizational claim—that the existing literature partitions cleanly into the four stages of representation, acquisition, retrieval, and evolution without major omissions or forced categorizations—is load-bearing for the survey's practical value (see Abstract and the opening of the lifecycle section). The manuscript does not supply explicit selection criteria, coverage statistics, or a dedicated discussion of cross-stage methods (e.g., online skill refinement that interleaves acquisition and retrieval), leaving open the risk that the taxonomy imposes artificial boundaries as noted in the stress-test concern.
minor comments (2)
  1. [Review sections for each lifecycle stage] A summary table or figure listing representative methods per stage with key references would improve readability and allow readers to quickly assess coverage.
  2. [Conclusion] The GitHub repository link is mentioned but could be accompanied by a brief description of its structure and update policy in the main text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential utility of the four-stage taxonomy and the GitHub repository. We address the major comment below and will revise the manuscript accordingly to strengthen the presentation of the taxonomy.

Point-by-point responses
  1. Referee: [Abstract and lifecycle organization section] The central organizational claim—that the existing literature partitions cleanly into the four stages of representation, acquisition, retrieval, and evolution without major omissions or forced categorizations—is load-bearing for the survey's practical value (see Abstract and the opening of the lifecycle section). The manuscript does not supply explicit selection criteria, coverage statistics, or a dedicated discussion of cross-stage methods (e.g., online skill refinement that interleaves acquisition and retrieval), leaving open the risk that the taxonomy imposes artificial boundaries as noted in the stress-test concern.

    Authors: We agree that the manuscript would benefit from greater transparency regarding how the taxonomy was constructed. The four stages reflect a natural lifecycle progression observed across the surveyed literature, rather than an imposed partition, but we acknowledge that explicit documentation of selection criteria and coverage would help readers evaluate completeness and potential boundary issues. In the revised version, we will add a dedicated subsection (likely in the introduction or at the start of the lifecycle organization section) that outlines the literature search methodology, inclusion criteria, time frame, and approximate coverage statistics (e.g., number of papers reviewed per stage). We will also include a new discussion paragraph or subsection addressing cross-stage methods, with concrete examples such as online skill refinement that interleaves acquisition and retrieval, and how such hybrid approaches are handled or noted within the taxonomy. This addition will explicitly discuss overlaps and mitigate concerns about artificial boundaries. revision: yes

Circularity Check

0 steps flagged

Survey organizes external literature without self-referential derivation

full rationale

This paper is a literature survey that defines agent skills and organizes existing external work into four lifecycle stages (representation, acquisition, retrieval, evolution) as an organizational framework. It reviews representative methods and resources from the broader literature rather than deriving new quantities, predictions, or results from fitted parameters, self-citations, or internal equations. The complementary roles of agents and skills are presented as a definitional viewpoint to motivate the survey structure, with no load-bearing steps that reduce claims to inputs by construction. No uniqueness theorems, ansatzes, or renamings of known results are invoked in a self-referential manner. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work introduces no new free parameters, mathematical axioms, or invented entities; it relies on standard background assumptions from the LLM-agent literature such as the utility of tool use and memory in agents.

pith-pipeline@v0.9.0 · 5802 in / 1234 out tokens · 57985 ms · 2026-05-20T23:12:31.722227+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

139 extracted references · 139 canonical work pages · 46 internal anchors

  1. [1]

    Language Models are Few-Shot Learners

    T. B. Brown et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 1877–1901. [Online]. Available: https://arxiv.org/abs/2005.14165

  2. [2]

    GPT-4 Technical Report

    J. Achiam et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023. [Online]. Available: https://arxiv.org/abs/2303.08774

  3. [3]

    Training language models to follow instructions with human feedback

    L. Ouyang et al., “Training language models to follow instructions with human feedback,” in Advances in Neural Information Processing Systems, vol. 35. Curran Associates, Inc., 2022, pp. 27730–27744. [Online]. Available: https://arxiv.org/abs/2203.02155

  4. [4]

    ReAct: Synergizing Reasoning and Acting in Language Models

    S. Yao et al., “ReAct: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations (ICLR), 2023. [Online]. Available: https://arxiv.org/abs/2210.03629

  5. [5]

    HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

    Y. Shen et al., “HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face,” in Advances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2303.17580

  6. [6]

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    S. Hong et al., “MetaGPT: Meta programming for a multi-agent collaborative framework,” in International Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://arxiv.org/abs/2308.00352

  7. [7]

    Openclaw — the open-source personal ai assistant and autonomous agent

    OpenClaw, “Openclaw — the open-source personal ai assistant and autonomous agent,” https://open-claw.org/, 2026, official website, accessed April 21, 2026

  8. [8]

    Welcome - manus documentation

    Manus, “Welcome - manus documentation,” https://manus.im/docs, 2026, official documentation, accessed April 21, 2026

  9. [9]

    Claude code overview

    Anthropic, “Claude code overview,” https://docs.anthropic.com/en/docs/claude-code/overview, 2026, official documentation, accessed April 21, 2026

  10. [10]

    Introducing the model context protocol

    Anthropic, “Introducing the model context protocol,” https://www.anthropic.com/news/model-context-protocol, 2024, Anthropic Blog, November 2024

  11. [11]

    Function calling and other API updates

    OpenAI, “Function calling and other API updates,” https://openai.com/blog/function-calling-and-other-api-updates, 2023, OpenAI Blog, June 2023

  12. [12]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    G. Wang et al., “Voyager: An open-ended embodied agent with large language models,” arXiv preprint arXiv:2305.16291, 2023. [Online]. Available: https://arxiv.org/abs/2305.16291

  13. [13]

    Large language models as tool makers

    T. Cai et al., “Large language models as tool makers,” in International Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://arxiv.org/abs/2305.17126

  14. [14]

    CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models

    C. Qian et al., “CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models,” in Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore: Association for Computational Linguistics, 2023, pp. 6922–6939. [Online]. Available: https://aclanthology.org/2023.findings-emnlp.462/

  15. [15]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 9459–9474. [Online]. Available: https://arxiv.org/abs/2005.11401

  16. [16]

    Dense passage retrieval for open-domain question answering

    V. Karpukhin et al., “Dense passage retrieval for open-domain question answering,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics, 2020, pp. 6769–6781. [Online]. Available: https://aclanthology.org/2020.emnlp-main.550/

  17. [17]

    AnyTool: Self-reflective, hierarchical agents for large-scale API calls

    Y. Du et al., “AnyTool: Self-reflective, hierarchical agents for large-scale API calls,” arXiv preprint arXiv:2402.04253, 2024. [Online]. Available: https://arxiv.org/abs/2402.04253

  18. [18]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Q. Wu et al., “AutoGen: Enabling next-gen LLM applications via multi-agent conversation,” arXiv preprint arXiv:2308.08155, 2023. [Online]. Available: https://arxiv.org/abs/2308.08155

  19. [19]

    Reflexion: Language Agents with Verbal Reinforcement Learning

    N. Shinn et al., “Reflexion: Language agents with verbal reinforcement learning,” in Advances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2303.11366

  20. [21]

    Toolformer: Language Models Can Teach Themselves to Use Tools

    T. Schick et al., “Toolformer: Language models can teach themselves to use tools,” in Advances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2302.04761

  21. [22]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    Y. Qin et al., “ToolLLM: Facilitating large language models to master 16000+ real-world APIs,” arXiv preprint arXiv:2307.16789, 2023. [Online]. Available: https://arxiv.org/abs/2307.16789

  22. [24]

    Buffer of thoughts: Thought-augmented reasoning with large language models

    X. Yang et al., “Buffer of thoughts: Thought-augmented reasoning with large language models,” Advances in Neural Information Processing Systems, 2024. [Online]. Available: https://arxiv.org/abs/2406.04271

  23. [28]

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    M. Ahn et al., “Do as i can, not as i say: Grounding language in robotic affordances,” in Conference on Robot Learning (CoRL), 2022. [Online]. Available: https://arxiv.org/abs/2204.01691

  24. [29]

    Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

    Z. Wang et al., “Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents,” 2023. [Online]. Available: https://arxiv.org/abs/2302.01560

  25. [30]

    Generative Agents: Interactive Simulacra of Human Behavior

    J. S. Park et al., “Generative agents: Interactive simulacra of human behavior,” 2023. [Online]. Available: https://arxiv.org/abs/2304.03442

  26. [31]

    Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

    X. Zhu et al., “Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory,” arXiv preprint arXiv:2305.17144, 2023. [Online]. Available: https://arxiv.org/abs/2305.17144

  27. [32]

    Reasoning with language model is planning with world model

    S. Hao et al., “Reasoning with language model is planning with world model,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 8154–8173

  28. [33]

    Retroformer: Retrospective large language agents with policy gradient optimization

    W. Yao et al., “Retroformer: Retrospective large language agents with policy gradient optimization,” arXiv preprint arXiv:2308.02151, 2023

  29. [34]

    MemGPT: Towards LLMs as Operating Systems

    C. Packer et al., “Memgpt: Towards LLMs as operating systems,” arXiv preprint arXiv:2310.08560, 2023. [Online]. Available: https://arxiv.org/abs/2310.08560

  30. [36]

    arXiv preprint arXiv:2311.08719

    [Online]. Available: https://arxiv.org/abs/2311.08719

  31. [37]

    Self-discover: Large language models self-compose reasoning structures

    P. Zhou et al., “Self-discover: Large language models self-compose reasoning structures,” Advances in Neural Information Processing Systems, vol. 37, pp. 126032–126058, 2024

  32. [38]

    Optimizing generative ai by backpropagating language model feedback

    M. Yuksekgonul et al., “Optimizing generative ai by backpropagating language model feedback,” Nature, vol. 639, no. 8055, pp. 609–616, 2025

  33. [39]

    Fincon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making

    Y. Yu et al., “Fincon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making,” arXiv preprint arXiv:2407.06567, 2024. [Online]. Available: https://arxiv.org/abs/2407.06567

  34. [40]

    M+: Extending memoryllm with scalable long-term memory

    Y. Wang et al., “M+: Extending memoryllm with scalable long-term memory,” 2025. [Online]. Available: https://arxiv.org/abs/2502.00592

  35. [41]

    Enhancing reasoning with collaboration and memory

    J. Michelman et al., “Enhancing reasoning with collaboration and memory,” arXiv preprint arXiv:2503.05944, 2025

  36. [42]

    Nemori: Self-organizing agent memory inspired by cognitive science

    a. others, “Nemori: Self-organizing agent memory inspired by cognitive science,” arXiv preprint arXiv:2502.14828, 2025. [Online]. Available: https://arxiv.org/abs/2502.14828

  37. [43]

    Intrinsic memory agents: Heterogeneous multi-agent llm systems through structured contextual memory

    ——, “Intrinsic memory agents: Heterogeneous multi-agent llm systems through structured contextual memory,” arXiv preprint arXiv:2506.19413, 2025. [Online]. Available: https://arxiv.org/abs/2506.19413

  38. [44]

    Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

    Q. Mi et al., “Procmem: Learning reusable procedural memory from experience via non-parametric ppo for llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2602.01869

  39. [45]

    Skillcraft: Can LLM agents learn to use tools skillfully?

    S. Chen et al., “Skillcraft: Can LLM agents learn to use tools skillfully?” arXiv preprint arXiv:2603.00718, 2026. [Online]. Available: https://arxiv.org/abs/2603.00718

  40. [46]

    Polyskill: Learning generalizable skills through polymorphic abstraction

    a. others, “Polyskill: Learning generalizable skills through polymorphic abstraction,” International Conference on Learning Representations,

  41. [47]

    Available: https://arxiv.org/abs/2510.15863

    [Online]. Available: https://arxiv.org/abs/2510.15863

  42. [49]

    Cua-skill: Develop skills for computer using agent

    T. Chen et al., “Cua-skill: Develop skills for computer using agent,” arXiv preprint arXiv:2601.21123, 2026

  43. [50]

    Eureka: Human-Level Reward Design via Coding Large Language Models

    Y. J. Ma et al., “Eureka: Human-level reward design via coding large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2310.12931

  44. [51]

    Ds-agent: Automated data science by empowering large language models with case-based reasoning

    X. Yue et al., “Ds-agent: Automated data science by empowering large language models with case-based reasoning,” 2024. [Online]. Available: https://arxiv.org/abs/2402.17453

  45. [52]

    Ldb: A large language model debugger via verifying runtime execution step-by-step

    X. Zhong et al., “Debug like a human: A large language model debugger via verifying runtime execution step-by-step,” 2024. [Online]. Available: https://arxiv.org/abs/2402.16906

  46. [53]

    Executable code actions elicit better LLM agents

    X. Wang et al., “Executable code actions elicit better LLM agents,” arXiv preprint arXiv:2402.01030, 2024. [Online]. Available: https://arxiv.org/abs/2402.01030

  47. [54]

    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

    J. Yang et al., “Swe-agent: Agent-computer interfaces enable automated software engineering,” 2024. [Online]. Available: https://arxiv.org/abs/2405.15793

  48. [55]

    Toolcoder: Teach code generation models to use api search tools

    K. Zhang et al., “Toolcoder: Teach code generation models to use api search tools,” 2023. [Online]. Available: https://arxiv.org/abs/2305.04032

  49. [56]

    Evolving programmatic skill networks

    H. Shi et al., “Evolving programmatic skill networks,” 2026. [Online]. Available: https://arxiv.org/abs/2601.03509

  50. [57]

    Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models

    Z. Wang et al., “Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models,” arXiv preprint arXiv:2311.05997, 2023. [Online]. Available: https://arxiv.org/abs/2311.05997

  51. [59]

    Zheng, R

    [Online]. Available: https://arxiv.org/abs/2306.07863

  52. [61]

    Organizing, orchestrating, and benchmarking agent skills at ecosystem scale

    H. Li et al., “Organizing, orchestrating, and benchmarking agent skills at ecosystem scale,” arXiv preprint arXiv:2603.02176, 2026. [Online]. Available: https://arxiv.org/abs/2603.02176

  53. [62]

    Tptu: large language model-based ai agents for task planning and tool usage

    J. Ruan et al., “Tptu: large language model-based ai agents for task planning and tool usage,” arXiv preprint arXiv:2308.03427, 2023

  54. [63]

    Agents thinking fast and slow: A talker-reasoner architecture

    K. Christakopoulou et al., “Agents thinking fast and slow: A talker-reasoner architecture,” arXiv preprint arXiv:2410.08328, 2024

  55. [64]

    Llm-powered decentralized generative agents with adaptive hierarchical knowledge graph for cooperative planning

    a. others, “Llm-powered decentralized generative agents with adaptive hierarchical knowledge graph for cooperative planning,” arXiv preprint arXiv:2502.05453, 2025. [Online]. Available: https://arxiv.org/abs/2502.05453

  56. [65]

    Graphskill: Documentation-guided hierarchical retrieval-augmented coding for complex graph reasoning

    F. Wang et al., “Graphskill: Documentation-guided hierarchical retrieval-augmented coding for complex graph reasoning,” 2026. [Online]. Available: https://arxiv.org/abs/2603.06620

  57. [66]

    Alita: Generalist agent enabling scalable agentic reasoning with minimal predefinition and maximal self-evolution

    J. Qiu et al., “Alita: Generalist agent enabling scalable agentic reasoning with minimal predefinition and maximal self-evolution,” arXiv preprint arXiv:2505.20286, 2025

  58. [67]

    Skillnet: Create, evaluate, and connect ai skills

    Y. Liang et al., “Skillnet: Create, evaluate, and connect ai skills,”

  59. [68]

    Available: https://arxiv.org/abs/2603.04448

    [Online]. Available: https://arxiv.org/abs/2603.04448

  60. [69]

    Sok: Agentic skills – beyond tool use in llm agents

    Y. Jiang et al., “Sok: Agentic skills – beyond tool use in llm agents,”

  61. [70]

    SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

    [Online]. Available: https://arxiv.org/abs/2602.20867

  62. [71]

    Skills are the new apps – now it’s time for skill os

    L. Chen et al., “Skills are the new apps – now it’s time for skill os,” 2026, preprints.org manuscript 202602.1096.v1. [Online]. Available: https://www.preprints.org/manuscript/202602.1096/v1

  63. [72]

    Agent hospital: A simulacrum of hospital with evolvable medical agents

    J. Li et al., “Agent hospital: A simulacrum of hospital with evolvable medical agents,” arXiv preprint arXiv:2405.02957, 2024. [Online]. Available: https://arxiv.org/abs/2405.02957

  64. [73]

    Evermemos: A self-organizing memory operating system for structured long-horizon reasoning

    C. Hu et al., “Evermemos: A self-organizing memory operating system for structured long-horizon reasoning,” 2026. [Online]. Available: https://arxiv.org/abs/2601.02163

  65. [74]

    HyperMem: Hypergraph Memory for Long-Term Conversations

    L. Yue et al., “Hypermem: Hypergraph memory for long-term conversations,” 2026, accepted to ACL 2026 Main. [Online]. Available: https://arxiv.org/abs/2604.08256

  66. [75]

    G-memory: Tracing hierarchical memory for multi-agent systems

    G. Zhang et al., “G-memory: Tracing hierarchical memory for multi-agent systems,” 2025. [Online]. Available: https://arxiv.org/abs/2506.07398

  67. [76]

    Agentevolver: Towards efficient self-evolving agent system

    a. others, “Agentevolver: Towards efficient self-evolving agent system,” arXiv preprint arXiv:2511.10395, 2025. [Online]. Available: https://arxiv.org/abs/2511.10395

  68. [77]

    Building self-evolving agents via experience-driven lifelong learning: A framework and benchmark

    Y. Cai et al., “Building self-evolving agents via experience-driven lifelong learning: A framework and benchmark,” 2025. [Online]. Available: https://arxiv.org/abs/2508.19005

  69. [78]

    Autorefine: From trajectories to reusable expertise for continual llm agent refinement

    L. Qiu et al., “Autorefine: From trajectories to reusable expertise for continual llm agent refinement,” 2026. [Online]. Available: https://arxiv.org/abs/2601.22758

  70. [79]

    Cradle: Empowering foundation agents towards general computer control

    W. Tan et al., “Cradle: Empowering foundation agents towards general computer control,” arXiv preprint arXiv:2403.03186, 2024. [Online]. Available: https://arxiv.org/abs/2403.03186

  71. [80]

    AppAgent: Multimodal Agents as Smartphone Users

    C. Zhang et al., “Appagent: Multimodal agents as smartphone users,” arXiv preprint arXiv:2312.13771, 2023. [Online]. Available: https://arxiv.org/abs/2312.13771

  72. [81]

    Autoguide: Automated generation and selection of context-aware guidelines for large language model agents

    Y. Fu et al., “Autoguide: Automated generation and selection of context-aware guidelines for large language model agents,” arXiv preprint arXiv:2403.08978, 2024. [Online]. Available: https://arxiv.org/abs/2403.08978

  73. [82]

    WebArena: A Realistic Web Environment for Building Autonomous Agents

    S. Zhou et al., “WebArena: A realistic web environment for building autonomous agents,” in International Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://arxiv.org/abs/2307.13854

  74. [83]

    Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

    Y. Sun et al., “Don’t retrieve, navigate: Distilling enterprise knowledge into navigable agent skills for qa and rag,” arXiv preprint arXiv:2604.14572, Apr. 2026. [Online]. Available: https://arxiv.org/abs/2604.14572

  75. [84]

    Agentdistill: Training-free agent distillation with generalizable mcp boxes

    J. Qiu et al., “Agentdistill: Training-free agent distillation with generalizable mcp boxes,” arXiv preprint arXiv:2506.14728, 2025

  76. [85]

    Reinforcement Learning for Self-Improving Agent with Skill Library

    J. Wang et al., “Reinforcement learning for self-improving agent with skill library,” arXiv preprint arXiv:2512.17102, 2025

  77. [86]

    Autoskill: Experience-driven lifelong learning via skill self-evolution

    Y. Yang et al., “Autoskill: Experience-driven lifelong learning via skill self-evolution,” 2026. [Online]. Available: https://arxiv.org/abs/2603.01145

  78. [87]

    MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

    H. Zhang et al., “Memskill: Learning and evolving memory skills for self-evolving agents,” 2026. [Online]. Available: https://arxiv.org/abs/2602.02474

  79. [88]

    ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

    S. Ouyang et al., “Reasoningbank: Scaling agent self-evolving with reasoning memory,” 2025. [Online]. Available: https://arxiv.org/abs/2509.25140

  80. [89]

    SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

    B. Zheng et al., “Skillweaver: Web agents can self-improve by discovering and honing skills,” 2025. [Online]. Available: https://arxiv.org/abs/2504.07079

Showing first 80 references.