A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

Wang Shu; Wenchuan Du; Xuemin Lin; Yaodong Su; Yingli Zhou; Yixiang Fang

arxiv: 2605.07358 · v2 · pith:TLGMMJQ2new · submitted 2026-05-08 · 💻 cs.IR


Yingli Zhou, Wang Shu, Yaodong Su, Wenchuan Du, Yixiang Fang, Xuemin Lin

Pith reviewed 2026-05-20 23:12 UTC · model grok-4.3

classification 💻 cs.IR

keywords: LLM-based agents · agent skills · skill lifecycle · reusable procedures · tool coordination · agent taxonomy · scalable agent systems · LLM applications


The pith

Agent skills serve as reusable procedural artifacts that let LLM agents execute tasks reliably without repeated low-level reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey argues that LLM-based agents benefit from skills defined as reusable procedures coordinating tools, memory, and context. Agents manage high-level planning while skills provide the operational layer for composable and maintainable execution. The authors organize existing research into four stages of the skill lifecycle: representation, acquisition, retrieval, and evolution. By reviewing methods across these stages, the paper shows how skills address inefficiency and error in open-ended agent deployments. It also points to challenges in quality control and long-term management of these skills.

Core claim

The paper establishes that skills, as reusable procedural artifacts coordinating tools, memory, and runtime context, form the key operational layer complementing agents' high-level reasoning, and organizes the literature around the four stages of representation, acquisition, retrieval, and evolution to advance scalability in LLM agent systems.

What carries the argument

The four-stage agent skill lifecycle consisting of representation, acquisition, retrieval, and evolution, which structures the review of techniques for creating and maintaining reusable skills.
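A minimal sketch of how one skill record might carry representation fields and pass through the four stages, assuming a skill can be modeled as a named list of steps plus the tools it calls. The `Skill` and `Stage` names, the fields, and the rule that each evolution step bumps the version are hypothetical illustrations, not a schema taken from the survey.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """The four lifecycle stages named in the survey, in definition order."""
    REPRESENTATION = 1
    ACQUISITION = 2
    RETRIEVAL = 3
    EVOLUTION = 4


@dataclass
class Skill:
    """Hypothetical skill record: a reusable procedure plus the tools it coordinates."""
    name: str
    description: str                    # summary an agent could match against a task
    steps: list[str]                    # procedural body, here just ordered instructions
    tools: list[str] = field(default_factory=list)
    version: int = 1
    history: list[Stage] = field(default_factory=list)

    def advance(self, stage: Stage) -> None:
        """Log a lifecycle stage; assumed rule: each evolution step yields a new version."""
        self.history.append(stage)
        if stage is Stage.EVOLUTION:
            self.version += 1


skill = Skill(
    name="extract_tables",
    description="Extract tables from a PDF and save them as CSV",
    steps=["open the PDF", "detect table regions", "write one CSV per table"],
    tools=["pdf_reader", "csv_writer"],
)
for stage in Stage:  # walk the lifecycle once, in definition order
    skill.advance(stage)
print(skill.version, [s.name for s in skill.history])
# prints: 2 ['REPRESENTATION', 'ACQUISITION', 'RETRIEVAL', 'EVOLUTION']
```

In a deployed library the stages would more plausibly repeat (retrieval and evolution interleaving many times) than run once in order; the single pass here only makes the four stages visible.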

If this is right

Skills enable reliable execution across similar tasks by reusing proven procedures.
Systems become more scalable as new tasks leverage existing skill libraries rather than building from scratch.
Maintainability improves through structured updates and evolution of skills over time.
Interoperability between different agent frameworks increases with standardized skill representations.
Applications in complex workflows gain robustness from composable skill combinations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Developers could build shared skill repositories that accelerate agent development across organizations.
The lifecycle model might extend to non-LLM agents, such as those using traditional planning algorithms.
Future research could explore automated verification methods for skill quality within this framework.
Integration with memory systems could create self-improving skill collections.

Load-bearing premise

The diverse literature on LLM-based agents fits into the proposed four stages of the skill lifecycle without forcing unnatural categorizations or leaving out important work.

What would settle it

Identification of a substantial set of agent skill techniques or papers that cannot be classified into any of the four stages: representation, acquisition, retrieval, or evolution.
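One way to operationalize this test, sketched under the assumption that each surveyed work has been hand-tagged with the stages it addresses, is to count per-stage coverage and flag works that fit no stage (evidence against completeness) or several stages (the cross-stage overlap raised in the referee report below). The function name, paper names, and tags are placeholders, not data from the survey.

```python
from __future__ import annotations

from collections import Counter

STAGES = {"representation", "acquisition", "retrieval", "evolution"}


def audit_taxonomy(tags: dict[str, set[str]]) -> tuple[Counter, list[str], list[str]]:
    """Count works per stage and flag works that fit no stage or several stages."""
    per_stage: Counter = Counter()
    unclassified: list[str] = []
    cross_stage: list[str] = []
    for work, work_tags in tags.items():
        stages = work_tags & STAGES         # ignore tags outside the four stages
        per_stage.update(stages)
        if not stages:
            unclassified.append(work)       # evidence against completeness
        elif len(stages) > 1:
            cross_stage.append(work)        # overlap the taxonomy must explain
    return per_stage, sorted(unclassified), sorted(cross_stage)


# Placeholder tags for illustration only; not the survey's actual assignments.
tags = {
    "paper_a": {"acquisition"},
    "paper_b": {"acquisition", "retrieval"},  # e.g. online refinement during execution
    "paper_c": {"benchmarking"},
    "paper_d": {"evolution"},
}
per_stage, unclassified, cross_stage = audit_taxonomy(tags)
print(dict(per_stage), unclassified, cross_stage)
```

A large `unclassified` list would be the kind of finding described above; a large `cross_stage` list would not falsify the taxonomy but would call for explicit assignment criteria.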

Figures

Figures reproduced from arXiv: 2605.07358 by Wang Shu, Wenchuan Du, Xuemin Lin, Yaodong Su, Yingli Zhou, Yixiang Fang.

**Figure 1.** Historical evolution of skills, from embodied human survival and craftsmanship to engineering, industrial, digital, and …

**Figure 2.** Growth of research on agent skills from April 2023 to April 2026. The figure shows the cumulative number of …

**Figure 3.** The taxonomy for agent skills in this survey.

**Figure 4.** Illustrative examples of agent skills.

**Figure 5.** Overview of skill acquisition methods. Accompanying text (excerpt): "…which a skill is obtained: ❶ human-derived acquisition, ❷ experience-derived acquisition, ❸ task-derived acquisition, and ❹ corpus-derived acquisition. Human-derived acquisition obtains skills directly from expert knowledge and manual curation. Experience-derived acquisition builds them from trajectories, exemplars, or past executions. Task-derived acquisition constr…"
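Experience-derived acquisition is the easiest of the four routes to illustrate. A minimal sketch, assuming trajectories are lists of action names with a success flag, keeps the successful runs and promotes the most frequent action sequence to a candidate skill once it recurs often enough. `distill_skill`, the `min_support` threshold, and the trajectory format are hypothetical; the surveyed methods abstract far more than exact-sequence counting does.

```python
from __future__ import annotations

from collections import Counter


def distill_skill(
    trajectories: list[tuple[list[str], bool]], min_support: int = 2
) -> list[str] | None:
    """Promote the most frequent successful action sequence to a candidate skill.

    Each trajectory is (actions, succeeded). Failed runs are discarded, and a
    sequence becomes a skill only if it recurs at least `min_support` times.
    """
    counts = Counter(tuple(actions) for actions, succeeded in trajectories if succeeded)
    if not counts:
        return None
    sequence, support = counts.most_common(1)[0]
    return list(sequence) if support >= min_support else None


runs = [
    (["search", "open", "summarize"], True),
    (["search", "open", "summarize"], True),
    (["search", "summarize"], False),
    (["open", "summarize"], True),
]
print(distill_skill(runs))                 # prints: ['search', 'open', 'summarize']
print(distill_skill(runs, min_support=3))  # prints: None
```

Requiring repeated success before promotion is one simple guard against packaging a lucky one-off run as a reusable skill.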

**Figure 6.** The trend in the cumulative number of human-derived skills over time.

**Figure 7.** Skill retrieval and selection. Accompanying text (excerpt): "…applying dense retrieval to experiential lessons or structured reasoning memories rather than fully packaged executable skills. This makes dense retrieval the natural entry point when task formulations vary widely but the system still needs to reach reusable skills through a shared semantic layer. The same flexibility also explains why dense retrieval is rarely the whole stor…"
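The dense-retrieval entry point described here can be illustrated with a minimal sketch: embed each skill description and the query in a shared vector space, then rank skills by cosine similarity. The three-dimensional vectors are hand-made stand-ins for a real embedding model, and `retrieve_skills` is a hypothetical helper; as the excerpt notes, a fuller pipeline would add a selection or re-ranking step on top.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_skills(
    query_vec: list[float], skill_index: dict[str, list[float]], k: int = 2
) -> list[str]:
    """Return the names of the k skills whose embeddings are closest to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in skill_index.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Toy embeddings standing in for encoded skill descriptions.
skill_index = {
    "fill_web_form": [0.9, 0.1, 0.0],
    "query_sql_db": [0.1, 0.9, 0.2],
    "summarize_pdf": [0.0, 0.2, 0.9],
}
print(retrieve_skills([0.8, 0.3, 0.1], skill_index))  # prints: ['fill_web_form', 'query_sql_db']
```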

**Figure 8.** From human skill refinement to agent skill evolution.

**Figure 9.** Skill evolution through staged refinement: updates revise skills, validation filters changes, and trusted skills are indexed, …
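The update, validate, and index stages in this caption can be sketched as a single loop, assuming a skill is a plain Python callable and validation means passing a fixed set of input/output test cases. `validate`, `evolve_skill`, and the test-case format are hypothetical; only a candidate that passes every check replaces the trusted version in the index.

```python
from __future__ import annotations

from typing import Callable

SkillFn = Callable[[int], int]


def validate(candidate: SkillFn, cases: list[tuple[int, int]]) -> bool:
    """Accept a candidate only if it reproduces every expected output."""
    try:
        return all(candidate(x) == expected for x, expected in cases)
    except Exception:
        return False  # a candidate that crashes is rejected rather than indexed


def evolve_skill(
    index: dict[str, SkillFn],
    name: str,
    candidates: list[SkillFn],
    cases: list[tuple[int, int]],
) -> bool:
    """Try revised versions in order; index the first one that passes validation."""
    for candidate in candidates:          # update: a revised version of the skill
        if validate(candidate, cases):    # validation: filter out bad changes
            index[name] = candidate       # indexing: only trusted versions are stored
            return True
    return False                          # nothing passed; the old version stays


index = {"double": lambda x: x + x + 1}  # buggy current version
cases = [(1, 2), (5, 10)]
accepted = evolve_skill(index, "double", [lambda x: x * 3, lambda x: 2 * x], cases)
print(accepted, index["double"](4))  # prints: True 8
```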

**Figure 10.** Application scenarios of agent skills. Accompanying text (excerpt): "…latency, and execution cost, including dynamic model routing and workload-aware scheduling [6], [18]. Skill Library Evolution under Non-Stationarity. APIs deprecate, tool behavior shifts, and task distributions change over time [10], [22]. Skill libraries need lifecycle-level robustness: drift detection, compatibility checks, safe online updates, and versioned rollb…"
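The call for versioned rollback under non-stationarity can be illustrated with a minimal versioned skill store: every accepted update appends a version, and a failing health check (a stand-in for drift detection or a compatibility check) rolls the skill back to an earlier version. `VersionedSkillStore`, its methods, and the endpoint strings are hypothetical names for illustration.

```python
from __future__ import annotations

from typing import Callable


class VersionedSkillStore:
    """Hypothetical skill library that keeps every published version of each skill."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}

    def publish(self, name: str, body: str) -> int:
        """Append a new version and return its 1-based version number."""
        self._versions.setdefault(name, []).append(body)
        return len(self._versions[name])

    def current(self, name: str) -> str:
        return self._versions[name][-1]

    def rollback(self, name: str) -> str:
        """Drop the newest version (never the last remaining one) and return the active body."""
        history = self._versions[name]
        if len(history) > 1:
            history.pop()
        return history[-1]

    def check_and_rollback(self, name: str, healthy: Callable[[str], bool]) -> str:
        """Roll back while the active version fails the health check, e.g. after API drift."""
        while not healthy(self.current(name)) and len(self._versions[name]) > 1:
            self.rollback(name)
        return self.current(name)


store = VersionedSkillStore()
store.publish("weather", "GET /v1/forecast")
store.publish("weather", "GET /v2/forecast")
# Pretend a compatibility check found the newly published v2 version failing.
active = store.check_and_rollback("weather", healthy=lambda body: "/v2/" not in body)
print(active)  # prints: GET /v1/forecast
```

Keeping the oldest version pinned means a bad health check can never leave a skill with no runnable body, at the cost of possibly serving a stale version until a fixed update is published.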

Original abstract

Large language model (LLM)-based agents that reason, plan, and act through tools, memory, and structured interaction are emerging as a promising paradigm for automating complex workflows. Recent systems such as OpenClaw and Claude Code exemplify a broader shift from passive response generation to action-oriented task execution. Yet as agents move toward open-ended, real-world deployment, relying on from-scratch reasoning and low-level tool calls for every task becomes increasingly inefficient, error-prone, and hard to maintain. This survey examines this challenge through the lens of agent skills, which we define as reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. Under this view, agents and skills play complementary roles: agents handle high-level reasoning and planning, while skills form the operational layer that enables reliable, reusable, and composable execution. Skills are therefore central to the scalability, robustness, and maintainability of modern agent systems. We organize the literature around four stages of the agent skill lifecycle (representation, acquisition, retrieval, and evolution) and review representative methods, ecosystem resources, and application settings across each stage. We conclude by discussing open challenges in quality control, interoperability, safe updating, and long-term capability management. All related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/JayLZhou/Awesome-Agent-Skills.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague


This survey frames skills as reusable procedural artifacts for LLM agents and sorts the literature into a four-stage lifecycle, which gives a workable map but risks artificial boundaries on overlapping methods. The main new angle is treating skills as the operational layer that handles tools, memory, and context so agents can focus on planning without reinventing execution each time. The authors review methods across representation, acquisition, retrieval, and evolution, and they back this up with a GitHub repo that gathers papers, datasets, and projects. That collection is the most immediately useful part for anyone trying to keep up with the area.

The motivation in the abstract is straightforward: from-scratch reasoning does not scale well for complex workflows, so reusable skills matter for robustness and maintenance. The complementary-roles view aligns with what systems like Claude Code are already doing in practice. The soft spot is the taxonomy itself. Methods that refine skills during execution straddle acquisition and retrieval, and some representations may emerge only through evolution rather than as a distinct upfront stage. If the paper does not explicitly discuss these interleavings or give clear assignment criteria, readers could see forced categories that reduce the framework's practical value. Coverage depth and selection criteria are hard to judge from the abstract alone, but the high-level structure holds together without obvious internal contradictions.

This is for researchers and engineers already working on LLM agents who want an organizing lens rather than a new algorithm or benchmark. Someone building workflow tools or surveying the subfield would get value from the lifecycle view and the linked resources. It deserves a serious referee because a coherent survey can help structure a fast-moving area, even if the stages need tightening against cross-cutting examples.
I would send it to peer review with notes to address potential overlaps and confirm the review is comprehensive.

Referee Report

1 major / 2 minor

Summary. The paper surveys LLM-based agent skills, defining them as reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. It argues that agents and skills play complementary roles, with skills forming the operational layer for reliable, reusable, and composable execution, thereby central to scalability, robustness, and maintainability. The literature is organized around four stages of the agent skill lifecycle—representation, acquisition, retrieval, and evolution—with reviews of representative methods, ecosystem resources, and applications in each stage. Open challenges in quality control, interoperability, safe updating, and long-term capability management are discussed, and all resources are collected in a GitHub repository.

Significance. If the four-stage taxonomy provides a non-forced and reasonably complete partition of the literature, the survey would offer a useful organizing framework for researchers building scalable agent systems. The explicit collection of papers, data, and projects in the linked GitHub repository is a concrete strength that enhances reproducibility and community utility beyond the textual review.

major comments (1)

[Abstract and lifecycle organization section] The central organizational claim—that the existing literature partitions cleanly into the four stages of representation, acquisition, retrieval, and evolution without major omissions or forced categorizations—is load-bearing for the survey's practical value (see Abstract and the opening of the lifecycle section). The manuscript does not supply explicit selection criteria, coverage statistics, or a dedicated discussion of cross-stage methods (e.g., online skill refinement that interleaves acquisition and retrieval), leaving open the risk that the taxonomy imposes artificial boundaries as noted in the stress-test concern.

minor comments (2)

[Review sections for each lifecycle stage] A summary table or figure listing representative methods per stage with key references would improve readability and allow readers to quickly assess coverage.
[Conclusion] The GitHub repository link is mentioned but could be accompanied by a brief description of its structure and update policy in the main text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential utility of the four-stage taxonomy and the GitHub repository. We address the major comment below and will revise the manuscript accordingly to strengthen the presentation of the taxonomy.


Referee: [Abstract and lifecycle organization section] The central organizational claim—that the existing literature partitions cleanly into the four stages of representation, acquisition, retrieval, and evolution without major omissions or forced categorizations—is load-bearing for the survey's practical value (see Abstract and the opening of the lifecycle section). The manuscript does not supply explicit selection criteria, coverage statistics, or a dedicated discussion of cross-stage methods (e.g., online skill refinement that interleaves acquisition and retrieval), leaving open the risk that the taxonomy imposes artificial boundaries as noted in the stress-test concern.

Authors: We agree that the manuscript would benefit from greater transparency regarding how the taxonomy was constructed. The four stages reflect a natural lifecycle progression observed across the surveyed literature, rather than an imposed partition, but we acknowledge that explicit documentation of selection criteria and coverage would help readers evaluate completeness and potential boundary issues. In the revised version, we will add a dedicated subsection (likely in the introduction or at the start of the lifecycle organization section) that outlines the literature search methodology, inclusion criteria, time frame, and approximate coverage statistics (e.g., number of papers reviewed per stage). We will also include a new discussion paragraph or subsection addressing cross-stage methods, with concrete examples such as online skill refinement that interleaves acquisition and retrieval, and how such hybrid approaches are handled or noted within the taxonomy. This addition will explicitly discuss overlaps and mitigate concerns about artificial boundaries. revision: yes

Circularity Check

0 steps flagged

Survey organizes external literature without self-referential derivation


This paper is a literature survey that defines agent skills and organizes existing external work into four lifecycle stages (representation, acquisition, retrieval, evolution) as an organizational framework. It reviews representative methods and resources from the broader literature rather than deriving new quantities, predictions, or results from fitted parameters, self-citations, or internal equations. The complementary roles of agents and skills are presented as a definitional viewpoint to motivate the survey structure, with no load-bearing steps that reduce claims to inputs by construction. No uniqueness theorems, ansatzes, or renamings of known results are invoked in a self-referential manner. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work introduces no new free parameters, mathematical axioms, or invented entities; it relies on standard background assumptions from the LLM-agent literature such as the utility of tool use and memory in agents.

pith-pipeline@v0.9.0 · 5802 in / 1234 out tokens · 57985 ms · 2026-05-20T23:12:31.722227+00:00 · methodology



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

We organize the literature around four stages of the agent skill lifecycle — representation, acquisition, retrieval, and evolution

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

139 extracted references · 139 canonical work pages · 46 internal anchors

[1]

Language Models are Few-Shot Learners

T. B. Brown et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 1877–1901. [Online]. Available: https://arxiv.org/abs/2005.14165
[2]

GPT-4 Technical Report

J. Achiam et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023. [Online]. Available: https://arxiv.org/abs/2303.08774
[3]

Training language models to follow instructions with human feedback

L. Ouyang et al., “Training language models to follow instructions with human feedback,” in Advances in Neural Information Processing Systems, vol. 35. Curran Associates, Inc., 2022, pp. 27730–27744. [Online]. Available: https://arxiv.org/abs/2203.02155
[4]

ReAct: Synergizing Reasoning and Acting in Language Models

S. Yao et al., “ReAct: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations (ICLR), 2023. [Online]. Available: https://arxiv.org/abs/2210.03629
[5]

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

Y. Shen et al., “HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face,” in Advances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2303.17580
[6]

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

S. Hong et al., “MetaGPT: Meta programming for a multi-agent collaborative framework,” in International Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://arxiv.org/abs/2308.00352
[7]

Openclaw — the open-source personal ai assistant and autonomous agent

OpenClaw, “Openclaw — the open-source personal ai assistant and autonomous agent,” https://open-claw.org/, 2026, official website, accessed April 21, 2026
[8]

Welcome - manus documentation

Manus, “Welcome - manus documentation,” https://manus.im/docs, 2026, official documentation, accessed April 21, 2026
[9]

Claude code overview

Anthropic, “Claude code overview,” https://docs.anthropic.com/en/docs/claude-code/overview, 2026, official documentation, accessed April 21, 2026
[10]

Introducing the model context protocol

——, “Introducing the model context protocol,” https://www.anthropic.com/news/model-context-protocol, 2024, Anthropic Blog, November 2024
[11]

Function calling and other API updates

OpenAI, “Function calling and other API updates,” https://openai.com/blog/function-calling-and-other-api-updates, 2023, OpenAI Blog, June 2023
[12]

Voyager: An Open-Ended Embodied Agent with Large Language Models

G. Wang et al., “Voyager: An open-ended embodied agent with large language models,” arXiv preprint arXiv:2305.16291, 2023. [Online]. Available: https://arxiv.org/abs/2305.16291
[13]

Large language models as tool makers

T. Cai et al., “Large language models as tool makers,” in International Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://arxiv.org/abs/2305.17126
[14]

CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models

C. Qian et al., “CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models,” in Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore: Association for Computational Linguistics, 2023, pp. 6922–6939. [Online]. Available: https://aclanthology.org/2023.findings-emnlp.462/
[15]

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 9459–9474. [Online]. Available: https://arxiv.org/abs/2005.11401
[16]

Dense passage retrieval for open-domain question answering

V. Karpukhin et al., “Dense passage retrieval for open-domain question answering,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics, 2020, pp. 6769–6781. [Online]. Available: https://aclanthology.org/2020.emnlp-main.550/
[17]

AnyTool: Self-reflective, hierarchical agents for large-scale API calls

Y. Du et al., “AnyTool: Self-reflective, hierarchical agents for large-scale API calls,” arXiv preprint arXiv:2402.04253, 2024. [Online]. Available: https://arxiv.org/abs/2402.04253
[18]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Q. Wu et al., “AutoGen: Enabling next-gen LLM applications via multi-agent conversation,” arXiv preprint arXiv:2308.08155, 2023. [Online]. Available: https://arxiv.org/abs/2308.08155
[19]

Reflexion: Language Agents with Verbal Reinforcement Learning

N. Shinn et al., “Reflexion: Language agents with verbal reinforcement learning,” in Advances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2303.11366
[21]

Toolformer: Language Models Can Teach Themselves to Use Tools

T. Schick et al., “Toolformer: Language models can teach themselves to use tools,” in Advances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2302.04761
[22]

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Y. Qin et al., “ToolLLM: Facilitating large language models to master 16000+ real-world APIs,” arXiv preprint arXiv:2307.16789, 2023. [Online]. Available: https://arxiv.org/abs/2307.16789
[24]

Buffer of thoughts: Thought-augmented reasoning with large language models

X. Yang et al., “Buffer of thoughts: Thought-augmented reasoning with large language models,” Advances in Neural Information Processing Systems, 2024. [Online]. Available: https://arxiv.org/abs/2406.04271
[28]

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

M. Ahn et al., “Do as I can, not as I say: Grounding language in robotic affordances,” in Conference on Robot Learning (CoRL), 2022. [Online]. Available: https://arxiv.org/abs/2204.01691
[29]

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

Z. Wang et al., “Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents,” 2023. [Online]. Available: https://arxiv.org/abs/2302.01560
[30]

Generative Agents: Interactive Simulacra of Human Behavior

J. S. Park et al., “Generative agents: Interactive simulacra of human behavior,” 2023. [Online]. Available: https://arxiv.org/abs/2304.03442
[31]

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

X. Zhu et al., “Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory,” arXiv preprint arXiv:2305.17144, 2023. [Online]. Available: https://arxiv.org/abs/2305.17144
[32]

Reasoning with language model is planning with world model

S. Hao et al., “Reasoning with language model is planning with world model,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 8154–8173
[33]

Retroformer: Retrospective large language agents with policy gradient optimization

W. Yao et al., “Retroformer: Retrospective large language agents with policy gradient optimization,” arXiv preprint arXiv:2308.02151, 2023
[34]

MemGPT: Towards LLMs as Operating Systems

C. Packer et al., “MemGPT: Towards LLMs as operating systems,” arXiv preprint arXiv:2310.08560, 2023. [Online]. Available: https://arxiv.org/abs/2310.08560
[36]

arXiv preprint arXiv:2311.08719

[Online]. Available: https://arxiv.org/abs/2311.08719
[37]

Self-discover: Large language models self-compose reasoning structures

P. Zhou et al., “Self-discover: Large language models self-compose reasoning structures,” Advances in Neural Information Processing Systems, vol. 37, pp. 126032–126058, 2024
[38]

Optimizing generative ai by backpropagating language model feedback

M. Yuksekgonul et al., “Optimizing generative ai by backpropagating language model feedback,” Nature, vol. 639, no. 8055, pp. 609–616, 2025
[39]

Fincon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making

Y. Yu et al., “Fincon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making,” arXiv preprint arXiv:2407.06567, 2024. [Online]. Available: https://arxiv.org/abs/2407.06567
[40]

M+: Extending memoryllm with scalable long-term memory

Y. Wang et al., “M+: Extending memoryllm with scalable long-term memory,” 2025. [Online]. Available: https://arxiv.org/abs/2502.00592
[41]

Enhancing reasoning with collaboration and memory

J. Michelman et al., “Enhancing reasoning with collaboration and memory,” arXiv preprint arXiv:2503.05944, 2025
[42]

Nemori: Self-organizing agent memory inspired by cognitive science

a. others, “Nemori: Self-organizing agent memory inspired by cognitive science,” arXiv preprint arXiv:2502.14828, 2025. [Online]. Available: https://arxiv.org/abs/2502.14828
[43]

Intrinsic memory agents: Heterogeneous multi-agent llm systems through structured contextual memory

——, “Intrinsic memory agents: Heterogeneous multi-agent llm systems through structured contextual memory,” arXiv preprint arXiv:2506.19413, 2025. [Online]. Available: https://arxiv.org/abs/2506.19413
[44]

Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

Q. Mi et al., “Procmem: Learning reusable procedural memory from experience via non-parametric ppo for llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2602.01869
[45]

Skillcraft: Can LLM agents learn to use tools skillfully?

S. Chen et al., “Skillcraft: Can LLM agents learn to use tools skillfully?” arXiv preprint arXiv:2603.00718, 2026. [Online]. Available: https://arxiv.org/abs/2603.00718
[46]

Polyskill: Learning generalizable skills through polymorphic abstraction

a. others, “Polyskill: Learning generalizable skills through polymorphic abstraction,” International Conference on Learning Representations,
[47]

[Online]. Available: https://arxiv.org/abs/2510.15863
[49]

Cua-skill: Develop skills for computer using agent

T. Chen et al., “Cua-skill: Develop skills for computer using agent,” arXiv preprint arXiv:2601.21123, 2026
[50]

Eureka: Human-Level Reward Design via Coding Large Language Models

Y. J. Ma et al., “Eureka: Human-level reward design via coding large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2310.12931
[51]

Ds-agent: Automated data science by empowering large language models with case-based reasoning

X. Yue et al., “Ds-agent: Automated data science by empowering large language models with case-based reasoning,” 2024. [Online]. Available: https://arxiv.org/abs/2402.17453
[52]

Ldb: A large language model debugger via verifying runtime execution step-by-step

X. Zhong et al., “Debug like a human: A large language model debugger via verifying runtime execution step-by-step,” 2024. [Online]. Available: https://arxiv.org/abs/2402.16906
[53]

Executable code actions elicit better LLM agents

X. Wang et al., “Executable code actions elicit better LLM agents,” arXiv preprint arXiv:2402.01030, 2024. [Online]. Available: https://arxiv.org/abs/2402.01030
[54]

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

J. Yang et al., “Swe-agent: Agent-computer interfaces enable automated software engineering,” 2024. [Online]. Available: https://arxiv.org/abs/2405.15793
[55]

Toolcoder: Teach code generation models to use api search tools

K. Zhang et al., “Toolcoder: Teach code generation models to use api search tools,” 2023. [Online]. Available: https://arxiv.org/abs/2305.04032
[56]

Evolving programmatic skill networks

H. Shi et al., “Evolving programmatic skill networks,” 2026. [Online]. Available: https://arxiv.org/abs/2601.03509
[57]

Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models

Z. Wang et al., “Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models,” arXiv preprint arXiv:2311.05997, 2023. [Online]. Available: https://arxiv.org/abs/2311.05997
[59]

Zheng, R

[Online]. Available: https://arxiv.org/abs/2306.07863
[61]

Organizing, orchestrating, and benchmarking agent skills at ecosystem scale

H. Li et al., “Organizing, orchestrating, and benchmarking agent skills at ecosystem scale,” arXiv preprint arXiv:2603.02176, 2026. [Online]. Available: https://arxiv.org/abs/2603.02176
[62] J. Ruan et al., “Tptu: Large language model-based ai agents for task planning and tool usage,” arXiv preprint arXiv:2308.03427, 2023
[63] K. Christakopoulou et al., “Agents thinking fast and slow: A talker-reasoner architecture,” arXiv preprint arXiv:2410.08328, 2024
[64] “Llm-powered decentralized generative agents with adaptive hierarchical knowledge graph for cooperative planning,” arXiv preprint arXiv:2502.05453, 2025. [Online]. Available: https://arxiv.org/abs/2502.05453
[65] F. Wang et al., “Graphskill: Documentation-guided hierarchical retrieval-augmented coding for complex graph reasoning,” 2026. [Online]. Available: https://arxiv.org/abs/2603.06620
[66] J. Qiu et al., “Alita: Generalist agent enabling scalable agentic reasoning with minimal predefinition and maximal self-evolution,” arXiv preprint arXiv:2505.20286, 2025
[67] Y. Liang et al., “Skillnet: Create, evaluate, and connect ai skills,”
[68] [Online]. Available: https://arxiv.org/abs/2603.04448
[69] Y. Jiang et al., “Sok: Agentic skills – beyond tool use in llm agents,”
[70] [Online]. Available: https://arxiv.org/abs/2602.20867
[71] L. Chen et al., “Skills are the new apps – now it’s time for skill os,” 2026, preprints.org manuscript 202602.1096.v1. [Online]. Available: https://www.preprints.org/manuscript/202602.1096/v1
[72] J. Li et al., “Agent hospital: A simulacrum of hospital with evolvable medical agents,” arXiv preprint arXiv:2405.02957, 2024. [Online]. Available: https://arxiv.org/abs/2405.02957
[73] C. Hu et al., “Evermemos: A self-organizing memory operating system for structured long-horizon reasoning,” 2026. [Online]. Available: https://arxiv.org/abs/2601.02163
[74] L. Yue et al., “Hypermem: Hypergraph memory for long-term conversations,” 2026, accepted to ACL 2026 Main. [Online]. Available: https://arxiv.org/abs/2604.08256
[75] G. Zhang et al., “G-memory: Tracing hierarchical memory for multi-agent systems,” 2025. [Online]. Available: https://arxiv.org/abs/2506.07398
[76] “Agentevolver: Towards efficient self-evolving agent system,” arXiv preprint arXiv:2511.10395, 2025. [Online]. Available: https://arxiv.org/abs/2511.10395
[77] Y. Cai et al., “Building self-evolving agents via experience-driven lifelong learning: A framework and benchmark,” 2025. [Online]. Available: https://arxiv.org/abs/2508.19005
[78] L. Qiu et al., “Autorefine: From trajectories to reusable expertise for continual llm agent refinement,” 2026. [Online]. Available: https://arxiv.org/abs/2601.22758
[79] W. Tan et al., “Cradle: Empowering foundation agents towards general computer control,” arXiv preprint arXiv:2403.03186, 2024. [Online]. Available: https://arxiv.org/abs/2403.03186
[80] C. Zhang et al., “Appagent: Multimodal agents as smartphone users,” arXiv preprint arXiv:2312.13771, 2023. [Online]. Available: https://arxiv.org/abs/2312.13771
[81] Y. Fu et al., “Autoguide: Automated generation and selection of context-aware guidelines for large language model agents,” arXiv preprint arXiv:2403.08978, 2024. [Online]. Available: https://arxiv.org/abs/2403.08978
[82] S. Zhou et al., “WebArena: A realistic web environment for building autonomous agents,” in International Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://arxiv.org/abs/2307.13854
[83] Y. Sun et al., “Don’t retrieve, navigate: Distilling enterprise knowledge into navigable agent skills for qa and rag,” arXiv preprint arXiv:2604.14572, Apr. 2026. [Online]. Available: https://arxiv.org/abs/2604.14572
[84] J. Qiu et al., “Agentdistill: Training-free agent distillation with generalizable mcp boxes,” arXiv preprint arXiv:2506.14728, 2025
[85] J. Wang et al., “Reinforcement learning for self-improving agent with skill library,” arXiv preprint arXiv:2512.17102, 2025
[86] Y. Yang et al., “Autoskill: Experience-driven lifelong learning via skill self-evolution,” 2026. [Online]. Available: https://arxiv.org/abs/2603.01145
[87] H. Zhang et al., “Memskill: Learning and evolving memory skills for self-evolving agents,” 2026. [Online]. Available: https://arxiv.org/abs/2602.02474
[88] S. Ouyang et al., “Reasoningbank: Scaling agent self-evolving with reasoning memory,” 2025. [Online]. Available: https://arxiv.org/abs/2509.25140
[89] B. Zheng et al., “Skillweaver: Web agents can self-improve by discovering and honing skills,” 2025. [Online]. Available: https://arxiv.org/abs/2504.07079
