pith. machine review for the scientific record.

arxiv: 2605.10052 · v1 · submitted 2026-05-11 · 💻 cs.CL · cs.AI

Recognition: no theorem link

Swarm Skills: A Portable, Self-Evolving Multi-Agent System Specification for Coordination Engineering


Pith reviewed 2026-05-12 02:24 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords Swarm Skills · multi-agent coordination · self-evolving specifications · coordination engineering · portable agent workflows · progressive disclosure · agent team portability · framework-independent skills

The pith

Swarm Skills turns multi-agent coordination protocols into portable, self-evolving assets that work across different frameworks without adapters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Swarm Skills to address the shift from single-agent prompt engineering to multi-agent coordination engineering. Single-agent skills can already be shared as portable files, but team coordination remains trapped inside individual frameworks or static setups, blocking distribution and ongoing improvement. Swarm Skills extends an existing skills standard with roles, workflows, execution rules, and a semantic layer that supports automatic refinement. A built-in algorithm then extracts new skills from successful runs and updates old ones using scores for effectiveness, utilization, and freshness. If this works, agent teams could treat coordination strategies as open, improvable resources rather than framework-specific code.

Core claim

Swarm Skills extends the Anthropic Skills standard with multi-agent semantics to create first-class, distributable assets consisting of roles, workflows, execution bounds, and a semantic structure for self-evolution. A companion algorithm automatically distills successful execution trajectories into new Swarm Skills and patches existing ones using multi-dimensional scoring of Effectiveness, Utilization, and Freshness, removing the need for human oversight. Architectural analysis and a case study with the JiuwenSwarm implementation show that this approach delivers zero-adapter cross-agent portability through progressive disclosure, allowing agent teams to self-evolve coordination strategies.

What carries the argument

The Swarm Skills specification, which encodes roles, workflows, bounds, and self-evolution semantics so that multi-agent coordination becomes a shareable, updatable asset.
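The page does not reproduce the specification's actual schema, so any concrete rendering is speculative. As a purely illustrative sketch of what a "roles + workflows + execution bounds + evolution metadata" asset might look like as a data structure (all field names are our assumptions, not the paper's spec):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Swarm Skill asset. Field names are illustrative
# stand-ins, not taken from the actual specification.
@dataclass
class Role:
    name: str
    responsibilities: list[str]

@dataclass
class WorkflowStep:
    role: str                                 # which Role performs this step
    action: str                               # what the role does
    depends_on: list[str] = field(default_factory=list)

@dataclass
class SwarmSkill:
    name: str
    description: str                          # lightweight metadata, exposed first
    roles: list[Role]
    workflow: list[WorkflowStep]
    execution_bounds: dict[str, int]          # e.g. turn and agent limits
    evolution_meta: dict[str, float]          # scores consumed by the evolution loop

skill = SwarmSkill(
    name="code-review-swarm",
    description="Coordinate a reviewer and a fixer agent over a pull request.",
    roles=[Role("reviewer", ["find defects"]), Role("fixer", ["patch defects"])],
    workflow=[WorkflowStep("reviewer", "review diff"),
              WorkflowStep("fixer", "apply fixes", depends_on=["review diff"])],
    execution_bounds={"max_turns": 10, "max_agents": 2},
    evolution_meta={"effectiveness": 0.0, "utilization": 0.0, "freshness": 1.0},
)
print(skill.name)
```

The point of the sketch is only that the four components named in the claim are plain data, hence serializable and shareable across frameworks.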

If this is right

  • Multi-agent workflows become distributable assets that any compatible system can load and run.
  • Coordination strategies improve continuously through automated distillation of execution data.
  • Teams can switch agent frameworks without rewriting or adapting their collaboration protocols.
  • Progressive disclosure lets agents access only the coordination details they need at each step.
  • The refinement loop operates without ongoing human intervention once the initial skills are defined.
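Progressive disclosure, as used in these claims, means an agent first sees only lightweight skill metadata and pulls the full coordination body on demand. A minimal sketch of that two-stage loading pattern, with an invented in-memory layout standing in for the real asset files:

```python
# Hypothetical two-stage loader: cheap metadata first, full body only when a
# skill is actually selected. The layout is an assumption for illustration.
SKILLS = {
    "code-review-swarm": {
        "meta": {"description": "Coordinate reviewer + fixer agents."},
        "body": {"roles": ["reviewer", "fixer"], "workflow": ["review", "fix"]},
    }
}

def list_skill_metadata():
    # Stage 1: only short descriptions enter the agent's context window.
    return {name: s["meta"]["description"] for name, s in SKILLS.items()}

def load_skill(name):
    # Stage 2: the full coordination body is disclosed on demand.
    return SKILLS[name]["body"]

print(list_skill_metadata())
print(load_skill("code-review-swarm")["roles"])
```

Because stage 1 is framework-agnostic text, this is plausibly how the spec avoids per-framework adapters, though the page offers no runtime evidence for it.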

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • A library of evolved Swarm Skills could emerge, similar to how prompt libraries exist today, allowing teams to start from community-refined coordination patterns.
  • The same scoring approach might be tested on mixed human-AI teams to see whether the algorithm still produces useful updates.
  • Portability claims could be checked by measuring the exact number of code changes required when moving a Swarm Skill between two unrelated agent platforms.
  • If successful, this pattern might influence how future agent standards define not just individual capabilities but also interaction rules.

Load-bearing premise

The self-evolution algorithm can reliably extract and update skills from execution data using Effectiveness, Utilization, and Freshness scores without causing performance drops or needing human corrections.
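The paper publishes no formulas for the three scores, so any reconstruction is speculative. A toy sketch of what threshold-driven evolution on Effectiveness, Utilization, and Freshness could look like (the formulas, equal weighting, and thresholds below are all invented stand-ins):

```python
# Hedged sketch of threshold-based self-evolution on three scores.
# None of these definitions are from the paper; they are illustrative.

def effectiveness(successes, attempts):
    return successes / attempts if attempts else 0.0

def utilization(invocations, opportunities):
    return invocations / opportunities if opportunities else 0.0

def freshness(last_used_days, half_life_days=30.0):
    # Exponential decay: a skill unused for one half-life scores 0.5.
    return 0.5 ** (last_used_days / half_life_days)

def evolve(skill_stats, patch_threshold=0.5, retire_threshold=0.2):
    e = effectiveness(skill_stats["successes"], skill_stats["attempts"])
    u = utilization(skill_stats["invocations"], skill_stats["opportunities"])
    f = freshness(skill_stats["last_used_days"])
    score = (e + u + f) / 3    # illustrative aggregate; real weighting unknown
    if score < retire_threshold:
        return "retire"
    if score < patch_threshold:
        return "patch"          # distill recent trajectories into an update
    return "keep"

stats = {"successes": 8, "attempts": 10, "invocations": 5,
         "opportunities": 20, "last_used_days": 60}
print(evolve(stats))  # → "patch"
```

Even this toy version shows where the premise can fail: a skill can clear every threshold while the "patch" step quietly rewrites roles in conflicting ways, which is exactly what the referee's second major comment targets.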

What would settle it

Run the self-evolution process on a set of trajectories, then deploy the resulting skills in a fresh agent team on a different framework and measure whether task success rates drop or manual adjustments become necessary.
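That test reduces to a simple comparison: task success on the source framework versus the same skill transplanted to another. A sketch of the measurement with placeholder outcome data (the numbers are invented, not results):

```python
# Sketch of the proposed falsification test: compare success rates before and
# after moving an evolved skill to a second framework. Data is placeholder.

def success_rate(outcomes):
    return sum(outcomes) / len(outcomes)

def portability_delta(baseline_outcomes, transferred_outcomes):
    # Positive delta = the skill lost performance after transfer.
    return success_rate(baseline_outcomes) - success_rate(transferred_outcomes)

baseline    = [1, 1, 1, 0, 1, 1, 1, 1]   # source framework, 8 tasks
transferred = [1, 1, 0, 0, 1, 1, 1, 1]   # same skill, different framework
delta = portability_delta(baseline, transferred)
print(f"success-rate drop after transfer: {delta:.3f}")
```

A near-zero delta over enough tasks, with no manual edits to the skill, would substantiate the zero-adapter claim; the page reports no such measurement.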

Figures

Figures reproduced from arXiv: 2605.10052 by Deyang Li, Enrui Hu, Fangchao Liu, Hongbo Wang, Jianjun Tao, Qi Ye, Ruifeng Shi, Shuo Cheng, Xinyu Zhang, Xuefeng Jin, Yangkai Ding, Zhangchun Zhao, Zhicheng Dou.

Figure 1. The Paradigm Shift from Monolithic Frameworks to Portable Swarm Skills.
Figure 2. The Anatomy of a Swarm Skill. The specification delineates the asset into three…
Figure 3. The Self-Evolution Lifecycle. The algorithm orchestrates a continuous loop start…
Figure 4. The overall collaboration workflow of the…
read the original abstract

As artificial intelligence engineering paradigms shift from single-agent Prompt and Context Engineering toward multi-agent Coordination Engineering, the ability to codify and systematically improve how multiple agents collaborate has emerged as a critical bottleneck. While single-agent skills can now be distributed as portable assets, multi-agent coordination protocols remain locked within framework-internal code or static configurations, preventing them from being shared across systems or autonomously improved over time. We propose Swarm Skills, a portable specification that extends the Anthropic Skills standard with multi-agent semantics. Swarm Skills turns multi-agent workflows into first-class, distributable assets that consist of roles, workflows, execution bounds, and a built-in semantic structure for self-evolution. To operationalize the specification's evolving nature, we present a companion self-evolution algorithm that automatically distills successful execution trajectories into new Swarm Skills and continuously patches existing ones based on multi-dimensional scoring (Effectiveness, Utilization, and Freshness), eliminating the need for human-in-the-loop oversight during the refinement process. Through an architectural compatibility analysis and a comprehensive qualitative case study using the open-source JiuwenSwarm reference implementation, we demonstrate how Swarm Skills achieves zero-adapter cross-agent portability via progressive disclosure, enabling agent teams to self-evolve their coordination strategies without framework lock-in.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Swarm Skills, a portable specification extending the Anthropic Skills standard with multi-agent semantics (roles, workflows, execution bounds, and self-evolution structure) to address the bottleneck in coordination engineering. It introduces a companion self-evolution algorithm that distills successful execution trajectories into new or patched Swarm Skills using multi-dimensional scoring (Effectiveness, Utilization, Freshness) without human oversight. Claims of zero-adapter cross-agent portability via progressive disclosure and reliable autonomous improvement are supported by an architectural compatibility analysis and a qualitative case study on the JiuwenSwarm reference implementation.

Significance. If the self-evolution algorithm can be shown to reliably extract and refine coordination skills without degradation, the work would offer a meaningful advance in multi-agent systems by turning coordination protocols into distributable, iteratively improvable assets independent of any framework. The focus on portability and progressive disclosure addresses a real practical gap, though the current qualitative-only evidence limits immediate applicability.

major comments (3)
  1. [Abstract] The central claim that the self-evolution algorithm 'eliminates the need for human-in-the-loop oversight during the refinement process' rests solely on a qualitative case study; no quantitative metrics (success-rate deltas, coordination latency, iteration stability) or ablations on the Effectiveness/Utilization/Freshness scoring functions are reported, leaving the no-degradation guarantee unverified.
  2. [Self-evolution algorithm] The paper provides no equations, pseudocode, or formal definition of how the three scalar scores are computed from trajectories, or of how they trigger distillation and patching steps, preventing assessment of whether the process can silently introduce role-conflict or information-loss patterns while still satisfying the thresholds.
  3. [Case study] The architectural compatibility analysis and JiuwenSwarm demonstration establish only that the specification is syntactically portable; they do not test cross-agent execution or multi-iteration self-evolution, so the 'zero-adapter' portability claim remains an unquantified assertion.
minor comments (2)
  1. [Abstract] The abstract would be clearer if it briefly enumerated the specific multi-agent extensions (e.g., role semantics, workflow bounds) added to the base Anthropic Skills standard.
  2. Notation for the scoring dimensions (Effectiveness, Utilization, Freshness) is introduced without explicit formulas or example calculations, which hinders reproducibility even at a conceptual level.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which identify key opportunities to strengthen the rigor and clarity of our presentation on Swarm Skills. We address each major comment point by point below, with commitments to revisions that enhance the manuscript without altering its core claims.

read point-by-point responses
  1. Referee: [Abstract] The central claim that the self-evolution algorithm 'eliminates the need for human-in-the-loop oversight during the refinement process' rests solely on a qualitative case study; no quantitative metrics (success-rate deltas, coordination latency, iteration stability) or ablations on the Effectiveness/Utilization/Freshness scoring functions are reported, leaving the no-degradation guarantee unverified.

    Authors: We agree that the no-degradation guarantee would be more robust with quantitative support. The abstract claim reflects the autonomous design of the algorithm, which relies on threshold-based multi-dimensional scoring to trigger distillation and patching without external intervention, as demonstrated qualitatively in the JiuwenSwarm case study. In the revised version, we will add quantitative metrics from extended experiments, including success-rate deltas and iteration stability, along with ablations on the scoring functions to verify their role in preventing degradation. revision: yes

  2. Referee: [Self-evolution algorithm] The paper provides no equations, pseudocode, or formal definition of how the three scalar scores are computed from trajectories, or of how they trigger distillation and patching steps, preventing assessment of whether the process can silently introduce role-conflict or information-loss patterns while still satisfying the thresholds.

    Authors: The referee correctly notes the absence of formal definitions. The manuscript describes the scoring dimensions at a high level but omits explicit computation details and triggering logic. We will revise this section to include equations for Effectiveness (task success weighted by outcome quality), Utilization (efficiency of agent and resource use), and Freshness (temporal decay and novelty), plus pseudocode for the full distillation and patching workflow. This addition will enable evaluation of potential issues such as role conflicts. revision: yes

  3. Referee: [Case study] The architectural compatibility analysis and JiuwenSwarm demonstration establish only that the specification is syntactically portable; they do not test cross-agent execution or multi-iteration self-evolution, so the 'zero-adapter' portability claim remains an unquantified assertion.

    Authors: The compatibility analysis shows that progressive disclosure enables interpretation of Swarm Skills across frameworks without custom adapters, supporting syntactic and semantic portability. The JiuwenSwarm study illustrates self-evolution in practice. We acknowledge that explicit multi-agent execution tests and multi-iteration tracking would provide stronger validation. In revision, we will expand the case study with cross-framework execution examples and multi-iteration results, while clarifying that the zero-adapter claim is grounded in the compatibility analysis rather than full runtime benchmarks. revision: partial

Circularity Check

0 steps flagged

No circularity; new specification and algorithm are independent constructs

full rationale

The paper introduces Swarm Skills as an extension of the external Anthropic Skills standard and defines a companion self-evolution algorithm driven by Effectiveness, Utilization, and Freshness scores. No equations, derivations, or fitted parameters are shown that reduce by construction to the paper's own inputs or prior self-citations. The central claims rest on an architectural compatibility analysis and qualitative case study rather than any self-referential prediction or uniqueness theorem imported from the authors' prior work. This is a standard non-circular proposal paper whose constructs build on but do not collapse into their own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claims rest on the new specification and algorithm as invented constructs; no free parameters and only high-level axioms are visible in the abstract, and the approach assumes both that coordination is codifiable and that scoring-based evolution is effective.

axioms (2)
  • domain assumption Multi-agent coordination protocols can be effectively represented as portable, first-class assets with roles, workflows, and execution bounds
    Core premise of extending the Anthropic Skills standard to multi-agent settings.
  • ad hoc to paper Successful execution trajectories can be automatically distilled into improved skills using multi-dimensional scoring without human oversight
    Foundational to the self-evolution algorithm described.
invented entities (2)
  • Swarm Skills no independent evidence
    purpose: Portable specification consisting of roles, workflows, execution bounds, and semantic structure for self-evolution
    New asset type introduced to codify multi-agent coordination.
  • Self-evolution algorithm no independent evidence
    purpose: Automatically distills trajectories into new skills and patches existing ones based on Effectiveness, Utilization, and Freshness scores
    Companion mechanism to operationalize the evolving nature of the specification.


discussion (0)


Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 3 internal anchors

  1. [1]

    arXiv:2308.10848

    Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Haiyao Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, et al. Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors. arXiv preprint arXiv:2308.10848, 2023

  2. [2]

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven K. K. Yau, Zijian Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. Metagpt: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2024

  3. [3]

    Camel: Communicative agents for "mind" exploration of large language model society

    Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

  4. [4]

    Self-evolving agents with reflective and memory-augmented abilities

    Xuechen Liang, Meiling Tao, Yinghui Xia, Tianyu Shi, Jun Wang, and JingSong Yang. Self-evolving agents with reflective and memory-augmented abilities. arXiv preprint arXiv:2409.00872, 2024

  5. [5]

    Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization

    Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization. In The Twelfth International Conference on Learning Representations, 2024

  6. [6]

    Coordination engineering: From single agent to elite teams

    openJiuwen Team. Coordination engineering: From single agent to elite teams. openJiuwen Community, 2026

  7. [7]

    Generative agents: Interactive simulacra of human behavior

    Joon Sung Park, Joseph C O'Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1–22, 2023

  8. [8]

    ChatDev: Communicative Agents for Software Development

    Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, and Maosong Sun. Communicative agents for software development. arXiv preprint arXiv:2307.07924, 2023

  9. [9]

    Toolllm: Facilitating large language models to master 16000+ real-world apis

    Yujia Qin, Shiuyang Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. Toolllm: Facilitating large language models to master 16000+ real-world apis. In The Twelfth International Conference on Learning Representations, 2024

  10. [10]

    Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face

    Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face. In Advances in Neural Information Processing Systems, volume 36, 2024

  11. [11]

    Aflow: Automating agentic workflow generation

    Yinsong Tian, Jian Wang, et al. Aflow: Automating agentic workflow generation. arXiv preprint arXiv:2408.08155, 2024

  12. [12]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023

  13. [13]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications. arXiv preprint arXiv:2308.08155, 2023

  14. [14]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2023

  15. [15]

    Evoagent: Towards automatic multi-agent generation via evolutionary algorithms

    Suyu Zhang et al. Evoagent: Towards automatic multi-agent generation via evolutionary algorithms. arXiv preprint arXiv:2406.08155, 2024

  16. [16]

    Expel: Llm agents are experiential learners

    Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Qi, and Hao Zhu. Expel: Llm agents are experiential learners. arXiv preprint arXiv:2308.10144, 2023

  17. [17]

    Language agents as optimizable graphs

    Mingchen Zhuge, Wenxuan Lin, Boris Guzhov, Hao Dong, Lu Hou, Yixuan Su, and Jürgen Schmidhuber. Language agents as optimizable graphs. In Forty-first International Conference on Machine Learning, 2024