pith. machine review for the scientific record.

arxiv: 2605.08887 · v1 · submitted 2026-05-09 · 💻 cs.AI · cs.CL

Recognition: 2 theorem links · Lean Theorem

Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:22 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords self-evolving agents · multimodal tool use · prioritized sampling · knowledge clustering · agent bootstrapping · knowledge transfer · mixture of experts

The pith

Ace-Skill jointly prioritizes informative rollouts and clusters knowledge to improve self-evolving multimodal agents and enable transfer to smaller models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that self-evolving multimodal agents suffer from two coupled problems: inefficient use of rollout data on uninformative tasks, and interference from disorganized knowledge in shared stores. Ace-Skill counters both with a prioritized sampler that tracks proficiency to focus rollouts on valuable samples and a clustered organizer that groups knowledge semantically for cleaner retrieval. If these changes hold, the process becomes self-reinforcing: higher-quality data yields better agents. The gains appear across benchmarks and transfer zero-shot to smaller models, reducing the resources needed for capable agents.

Core claim

Ace-Skill is a co-evolutionary framework for self-evolving multimodal agents that combines a prioritized sampler, which uses lazy-decay proficiency tracking to allocate rollouts to informative and unmastered tasks, with a clustered organizer that semantically groups knowledge artifacts. This joint optimization breaks the cycle in which poor rollouts create noisy knowledge that further degrades performance. On four multimodal tool-use benchmarks the approach produces large accuracy improvements (e.g., a +35.46% relative gain in average accuracy), allowing an open-source 35B-parameter mixture-of-experts model to equal or exceed closed-source models, and the resulting knowledge transfers zero-shot to smaller 9B and 4B models.

What carries the argument

The key machinery is the dual optimization of rollout allocation through prioritized sampling with proficiency tracking and knowledge organization through semantic clustering, which together aim to produce more informative training signals and cleaner knowledge retrieval.
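The sampler half of that machinery can be sketched in a few lines. Below is a minimal illustration, assuming an EMA-style proficiency update, decay applied lazily only when a task is read, and the roles of the decay ρ and difficulty bias γ shown in Figure 5; the class name, update rule, and weighting formula are this page's guesses, not the paper's code.

```python
import random

class PrioritizedSampler:
    """Illustrative sketch of a prioritized sampler with lazy-decay
    proficiency tracking (names and formulas are assumptions)."""

    def __init__(self, n_tasks, decay=0.95, difficulty_bias=0.4, floor=0.05):
        self.prof = [0.0] * n_tasks     # estimated per-task proficiency
        self.last_seen = [0] * n_tasks  # step of last update, for lazy decay
        self.step = 0
        self.decay = decay              # temporal decay rho
        self.gamma = difficulty_bias    # difficulty bias gamma
        self.floor = floor              # minimum exploration floor

    def _decayed(self, i):
        # "Lazy" decay: apply the accumulated decay only when a task is
        # read, one pow() call instead of touching every counter per step.
        return self.prof[i] * self.decay ** (self.step - self.last_seen[i])

    def weights(self):
        # Unmastered tasks (low proficiency) get higher weight, shaped by
        # the difficulty bias; the floor keeps every task reachable.
        return [max(self.floor, (1.0 - self._decayed(i)) ** self.gamma)
                for i in range(len(self.prof))]

    def sample(self, k):
        return random.choices(range(len(self.prof)),
                              weights=self.weights(), k=k)

    def update(self, i, success):
        # Fold the rollout outcome into the decayed estimate (EMA step).
        self.prof[i] = (self.decay * self._decayed(i)
                        + (1 - self.decay) * float(success))
        self.last_seen[i] = self.step
        self.step += 1
```

A task that keeps succeeding sees its sampling weight shrink toward the floor, while stale proficiency estimates decay back toward "unmastered" if a task goes unvisited.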

If this is right

  • Strong performance gains on multimodal tool-use tasks.
  • Open-source models can reach parity with proprietary ones.
  • Knowledge transfers to smaller models in zero-shot fashion.
  • Self-evolution becomes more efficient by focusing effort and reducing noise.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be applied to other types of agent tasks beyond tool use.
  • It suggests a path for efficient scaling where larger models generate knowledge that smaller ones inherit.
  • Future work might test whether the clustering reduces specific failure modes like hallucinated tool calls.

Load-bearing premise

The central assumption is that the benefits from better sampling and better organization will compound without the clustering step adding retrieval errors or the prioritization introducing sampling bias.
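That premise can be made concrete with a toy organizer: each artifact joins its most similar cluster (or opens a new one), and retrieval is confined to the query's nearest cluster. Everything here, the similarity threshold, the interfaces, the centroid rule, is an illustrative assumption rather than the paper's algorithm.

```python
import math
from collections import defaultdict

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class ClusteredOrganizer:
    """Toy clustered knowledge organizer (hypothetical interfaces)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.centroids = []               # one embedding per cluster
        self.members = defaultdict(list)  # cluster id -> (embedding, artifact)

    def add(self, embedding, artifact):
        # Join the most similar cluster, or open a new one if nothing is
        # close enough, so unrelated knowledge stays separated.
        best, best_sim = None, -1.0
        for cid, c in enumerate(self.centroids):
            sim = cosine(embedding, c)
            if sim > best_sim:
                best, best_sim = cid, sim
        if best is None or best_sim < self.threshold:
            best = len(self.centroids)
            self.centroids.append(list(embedding))
        self.members[best].append((embedding, artifact))

    def retrieve(self, query_embedding, k=3):
        # Route the query to its nearest cluster, then rank only inside
        # it; this is what keeps cross-type noise out of the results.
        if not self.centroids:
            return []
        cid = max(range(len(self.centroids)),
                  key=lambda c: cosine(query_embedding, self.centroids[c]))
        ranked = sorted(self.members[cid],
                        key=lambda m: cosine(query_embedding, m[0]),
                        reverse=True)
        return [artifact for _, artifact in ranked[:k]]
```

The failure mode the premise must exclude is visible here too: a query routed to the wrong cluster retrieves nothing useful, which is exactly the retrieval-error risk the assumption rules out.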

What would settle it

Run the full system against ablated versions that disable the prioritized sampler or the clustering component separately on the benchmark tasks, and measure whether the combined version shows additive or super-additive gains without increased variance in retrieval quality.
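That experiment is a standard four-way ablation. The sketch below assumes a hypothetical scoring callable `run_benchmark(use_sampler, use_clustering)` returning Avg@4 accuracy; it exists only for illustration.

```python
def ablate(run_benchmark):
    """Four-way ablation: base, each component alone, and both together.
    run_benchmark(use_sampler, use_clustering) -> Avg@4 accuracy
    (a hypothetical callable, not part of the released code)."""
    base = run_benchmark(False, False)
    sampler_only = run_benchmark(True, False)
    cluster_only = run_benchmark(False, True)
    full = run_benchmark(True, True)
    # Sum of the two solo gains on top of the base: the additive prediction.
    additive = base + (sampler_only - base) + (cluster_only - base)
    return {
        "base": base,
        "sampler_only": sampler_only,
        "cluster_only": cluster_only,
        "full": full,
        "super_additive": full > additive,
    }
```

If `full` exceeds the additive prediction, the components reinforce each other, which is what the virtuous-cycle claim requires.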

Figures

Figures reproduced from arXiv: 2605.08887 by Feng Xiong, Jinghan He, Liang Lin, Xiangxiang Chu, Xuecai Hu, Yong Wang, Yuan Liu, Zengbin Wang.

Figure 1
Figure 1. Overall performance of ACE-SKILL across four multimodal tool-use benchmarks (Li et al., 2025; Guo et al., 2025; Tao et al., 2025; Su et al., 2026). Both ACE-SKILL variants consistently match or surpass prominent closed-source models (e.g., GPT-5-mini (OpenAI, 2025b), Gemini-2.5-Pro (Comanici et al., 2025), Qwen3.5-Plus (Qwen Team, 2026)) and open-source models (GLM-4.6V (GLM Team, 2026)) under Avg@4 and … view at source ↗
Figure 2
Figure 2. Overview of the ACE-SKILL framework. ACE-SKILL breaks the vicious cycle between data inefficiency and knowledge interference through a prioritized sampler and a clustered organizer. view at source ↗
Figure 3
Figure 3. Per-sample selection frequency of the Prioritized Evolution Sampler over 4 epochs on four … view at source ↗
Figure 4
Figure 4. Ablation of proposed modules on TIR-Bench (Avg@4). view at source ↗
Figure 5
Figure 5. Hyperparameter sensitivity on TIR-Bench (Avg@4). Top: difficulty bias γ (ρ = 0.95). Bottom: temporal decay ρ (γ = 0.4). … view at source ↗
Figure 6
Figure 6. Case Study: Clustered vs. Non-Clustered Pipeline on a Maze Task. From top to bottom: task, activated skills, retrieved experiences, and inference traces. … view at source ↗
read the original abstract

Self-evolving agents present a promising path toward continual adaptation by distilling task interactions into reusable knowledge artifacts. In practice, this paradigm remains hindered by two coupled bottlenecks: data inefficiency, where costly rollout effort is disproportionately spent on low-value samples rather than informative ones, and knowledge interference, where heterogeneous knowledge stored in shared repositories leads to noisy retrieval and task-misaligned guidance. Together, these issues form a self-reinforcing failure loop in which uninformative rollouts yield noisy knowledge, which in turn degrades subsequent rollouts. In this work, we introduce Ace-Skill, a co-evolutionary framework that jointly optimizes rollout allocation and knowledge organization for self-evolving multimodal agents. Specifically, Ace-Skill combines aprioritized sampler with lazy-decay proficiency tracking to focus rollouts on informative and insufficiently mastered samples, and a clustered organizer that semantically clusters knowledge for cleaner retrieval and more reliable adaptation. By improving sampling and organization together, Ace-Skill turns self-evolution into a virtuous cycle in which more informative rollouts produce higher-quality knowledge that supports stronger subsequent rollouts. Across four multimodal tool-use benchmarks, Ace-Skill delivers strong gains (e.g., +35.46% relative improvement in Avg@4 accuracy), enabling an opensource 35B MoE model to match or surpass proprietary models. The acquired knowledge also transfers effectively in a zero-shot manner to smaller 9B and 4B models, allowing resource-constrained agents to inherit advanced capabilities without additional training. The code has been publicly available at https://github.com/AMAP-ML/Ace-Skill.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Ace-Skill, a co-evolutionary framework for self-evolving multimodal agents that jointly optimizes rollout allocation via a prioritized sampler (with lazy-decay proficiency tracking) and knowledge organization via semantic clustering. It claims this converts a self-reinforcing failure loop of data inefficiency and knowledge interference into a virtuous cycle, yielding strong empirical gains (e.g., +35.46% relative improvement in Avg@4 accuracy) across four multimodal tool-use benchmarks. These gains allow an open-source 35B MoE model to match or surpass proprietary models, with effective zero-shot transfer of acquired knowledge to smaller 9B and 4B models. The code is publicly released.

Significance. If the results hold under rigorous validation, this work offers a practical advance in efficient bootstrapping of agent capabilities, addressing key bottlenecks in continual adaptation without requiring massive additional data or compute. The public code release aids reproducibility, and the zero-shot transfer result has clear implications for resource-constrained deployment of advanced multimodal agents.

major comments (2)
  1. [§3.2] (Prioritized Sampler): The lazy-decay proficiency tracking is described as focusing effort on informative samples, but the manuscript provides no explicit control experiment or metric (e.g., comparison of task distribution entropy before/after prioritization) demonstrating that this mechanism avoids systematic over-allocation to easily clusterable yet unrepresentative samples; this directly bears on whether the virtuous cycle is achieved or merely masks sampling bias.
  2. [§4.2, Table 2] (Ablations): The ablation isolating the clustered organizer reports gains from reduced retrieval noise, yet lacks an edge-case or OOD query subset evaluation to quantify whether intra-cluster coherence trades off against increased noise on tail tasks; without this, the claim that joint optimization reliably improves over the failure loop remains incompletely supported.
minor comments (2)
  1. [Abstract] 'aprioritized sampler' is a typographical error and should read 'a prioritized sampler'.
  2. [§4.1] Main result tables would benefit from explicit reporting of run count, standard deviation, and statistical significance tests for the reported relative improvements to strengthen interpretability.
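On minor comment 2, a distribution-free check that works even with a handful of runs per variant is an exact paired sign-flip (permutation) test on per-run score differences. The inputs below are hypothetical per-run scores; the paper reports only aggregate Avg@4.

```python
import itertools
import statistics

def paired_sign_flip_test(a, b):
    """Exact two-sided paired sign-flip test: enumerate all 2^n sign
    assignments of the per-run differences and count how often the
    permuted mean is at least as extreme as the observed mean.
    Inputs are hypothetical per-run scores, not data from the paper."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = statistics.mean(diffs)
    n = len(diffs)
    count = 0
    for signs in itertools.product((1, -1), repeat=n):
        m = statistics.mean(s * d for s, d in zip(signs, diffs))
        if abs(m) >= abs(observed) - 1e-12:
            count += 1
    return count / 2 ** n  # two-sided p-value
```

With only 4 runs the smallest attainable p-value is 2/16 = 0.125, which itself argues for reporting more runs alongside standard deviations.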

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make to strengthen the empirical support for our claims.

read point-by-point responses
  1. Referee: [§3.2] (Prioritized Sampler): The lazy-decay proficiency tracking is described as focusing effort on informative samples, but the manuscript provides no explicit control experiment or metric (e.g., comparison of task distribution entropy before/after prioritization) demonstrating that this mechanism avoids systematic over-allocation to easily clusterable yet unrepresentative samples; this directly bears on whether the virtuous cycle is achieved or merely masks sampling bias.

    Authors: We appreciate the referee highlighting the need for direct evidence that the lazy-decay mechanism avoids sampling bias. The proficiency tracking prioritizes samples with persistently low performance scores, which are updated only after sufficient attempts, to focus rollouts on insufficiently mastered tasks. To address the concern explicitly, we will add in the revised manuscript a control analysis comparing task distribution entropy before and after prioritization, together with a uniform-sampling baseline. This will quantify whether the sampler increases focus on informative samples without over-allocating to easily clusterable but unrepresentative ones, thereby supporting the claimed virtuous cycle. revision: yes

  2. Referee: [§4.2, Table 2] (Ablations): The ablation isolating the clustered organizer reports gains from reduced retrieval noise, yet lacks an edge-case or OOD query subset evaluation to quantify whether intra-cluster coherence trades off against increased noise on tail tasks; without this, the claim that joint optimization reliably improves over the failure loop remains incompletely supported.

    Authors: We agree that the ablation would be more complete with an explicit check on tail and OOD queries. The reported gains in Table 2 arise from lower retrieval noise on the primary benchmarks. In the revision we will add an evaluation on an edge-case/OOD subset to measure any trade-off between intra-cluster coherence and performance on underrepresented tasks. This additional result will provide stronger evidence that the joint optimization reliably mitigates the failure loop. revision: yes
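The entropy control promised in response 1 is straightforward to implement. A minimal sketch, assuming sampled task ids are logged per epoch (the function name and logging format are illustrative, not the paper's):

```python
import math
from collections import Counter

def distribution_entropy(samples):
    """Shannon entropy (bits) of a task-selection distribution, estimated
    from a log of sampled task ids. Comparing this before and after
    prioritization quantifies how much the sampler concentrates rollouts;
    a collapse to near-zero entropy would signal the over-allocation the
    referee worries about."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Uniform sampling over 4 tasks gives 2 bits; a sampler that all but ignores three of them drives the value toward 0.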

Circularity Check

0 steps flagged

No circularity; empirical gains rest on benchmark outcomes rather than self-referential derivations

full rationale

The paper introduces Ace-Skill as an empirical co-evolutionary framework that combines a prioritized sampler (with lazy-decay proficiency tracking) and a clustered organizer to address data inefficiency and knowledge interference in self-evolving multimodal agents. The central claims consist of reported performance improvements across four tool-use benchmarks (e.g., +35.46% relative Avg@4 accuracy) and zero-shot transfer to smaller models. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described method. The virtuous-cycle narrative is presented as the intended outcome of the joint optimization, validated externally via benchmarks rather than reducing to its own inputs by construction. This is a standard empirical proposal with independent experimental support.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard machine-learning assumptions about the value of focused sampling and semantic organization; no explicit free parameters, new physical entities, or ad-hoc axioms are named in the abstract.

axioms (1)
  • domain assumption: Semantic clustering of stored knowledge produces cleaner retrieval and more reliable task guidance than unorganized storage.
    This is the core premise of the clustered organizer component.

pith-pipeline@v0.9.0 · 5606 in / 1213 out tokens · 42621 ms · 2026-05-12T01:22:00.813360+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 12 internal anchors

  1. [1]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631.

  2. [2]

    OpenAI GPT-5 System Card

    Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. OpenAI GPT-5 system card. arXiv preprint arXiv:2601.03267.

  3. [3]

    Kimi K2.5: Visual Agentic Intelligence

    Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, SH Cai, Yuan Cao, Y Charles, HS Che, Cheng Chen, Guanduo Chen, et al. Kimi K2.5: Visual agentic intelligence. arXiv preprint arXiv:2602.02276.

  4. [4]

    TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

    Ming Li, Jike Zhong, Shitian Zhao, Haoquan Zhang, Shaoheng Lin, Yuxiang Lai, Chen Wei, Konstantinos Psounis, and Kaipeng Zhang. TIR-Bench: A comprehensive benchmark for agentic thinking-with-images reasoning. arXiv preprint arXiv:2511.01833.

  5. [5]

    AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

    Zhaochen Su, Jincheng Gao, Hangyu Guo, Zhenhua Liu, Lueyang Zhang, Xinyu Geng, Shijue Huang, Peng Xia, Guanyu Jiang, Cheng Wang, et al. AgentVista: Evaluating multimodal agents in ultra-challenging realistic visual scenarios. arXiv preprint arXiv:2602.23166.

  6. [6]

    MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents

    Xijia Tao, Yihua Teng, Xinxing Su, Xinyu Fu, Jihao Wu, Chaofan Tao, Ziru Liu, Haoli Bai, Rui Liu, and Lingpeng Kong. MMSearch-Plus: Benchmarking provenance-aware search for multimodal browsing agents. arXiv preprint arXiv:2508.21475, 2025.

  7. [7]

    Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

    Xingang Guo, Utkarsh Tyagi, Advait Gosai, Paula Vergara, Jayeon Park, Ernesto Gabriel Hernández Montoya, Chen Bo Calvin Zhang, Bin Hu, Yunzhong He, Bing Liu, et al. Beyond seeing: Evaluating multimodal LLMs on tool-enabled image perception, transformation, and reasoning. arXiv preprint arXiv:2510.12712.

  8. [9]

    Adaptation of Agentic AI: A Survey of Post-Training, Memory, and Skills

    Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, Xueqiang Xu, Hanwen Xu, Pengrui Han, Dylan Zhang, Jiashuo Sun, Chaoqi Yang, Kun Qian, Tian Wang, Changran Hu, Manling Li, Quanzheng Li, Hao Peng, Sheng Wang, Jingbo Shang, Chao Zhang, Jiaxuan You, Liyuan Liu, Pan Lu, Yu Zhang, Hen...

  9. [10]

    Reflexion: Language Agents with Verbal Reinforcement Learning

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36:8634–8652.

  10. [11]

    Memento: Fine-Tuning LLM Agents Without Fine-Tuning LLMs

    Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, et al. Memento: Fine-tuning LLM agents without fine-tuning LLMs, 2025. URL https://arxiv.org/abs/2508.16153.

  11. [12]

    MemEvolve: Meta-Evolution of Agent Memory Systems

    Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchunshu Zhou, and Shuicheng Yan. MemEvolve: Meta-evolution of agent memory systems. arXiv preprint arXiv:2512.18746.

  12. [13]

    Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

    Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Erchao Zhao, Xiaoxi Jiang, and Guanjun Jiang. Trace2Skill: Distill trajectory-local lessons into transferable agent skills. arXiv preprint arXiv:2603.25158.

  13. [14]

    SkillX: Automatically Constructing Skill Knowledge Bases for Agents

    Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, and Shumin Deng. SkillX: Automatically constructing skill knowledge bases for agents. arXiv preprint arXiv:2604.04804, 2026a.

  14. [15]

    DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

    Hao Zheng, Guozhao Mo, Xinru Yan, Qianhao Yuan, Wenkai Zhang, Xuanang Chen, Yaojie Lu, Hongyu Lin, Xianpei Han, and Le Sun. DeepPresenter: Environment-grounded reflection for agentic presentation generation. arXiv preprint arXiv:2602.22839, 2026b.

  15. [16]

    Skill0: In-Context Agentic Reinforcement Learning for Skill Internalization

    Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, and Yongliang Shen. Skill0: In-context agentic reinforcement learning for skill internalization. arXiv preprint arXiv:2604.02268, 2026a.

  16. [17]

    MemCollab: Cross-Agent Memory Collaboration via Contrastive Trajectory Distillation

    Yurui Chang, Yiran Wu, Qingyun Wu, and Lu Lin. MemCollab: Cross-agent memory collaboration via contrastive trajectory distillation. arXiv preprint arXiv:2603.23234.

  17. [18]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.

  18. [19]

    Memento-Skills: Let Agents Design Agents

    Huichi Zhou, Siyuan Guo, Anjie Liu, Zhongwei Yu, Ziqin Gong, Bowen Zhao, Zhixun Chen, Menglong Zhang, Yihang Chen, Jinsong Li, et al. Memento-Skills: Let agents design agents. arXiv preprint arXiv:2603.18743.

  19. [20]

    Single-Stream Policy Optimization

    Zhongwen Xu and Zihan Ding. Single-stream policy optimization. arXiv preprint arXiv:2509.13232.

  20. [21]

    SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

    Ziyu Ma, Shidong Yang, Yuxiang Ji, Xucong Wang, Yong Wang, Yiming Hu, Tongwen Huang, and Xiangxiang Chu. SkillClaw: Let skills evolve collectively with agentic evolver. arXiv preprint arXiv:2604.08377.

  21. [22]

    GEMS: Agent-Native Multimodal Generation with Memory and Skills

    Zefeng He, Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Yu Cheng, and Yang Yang. GEMS: Agent-native multimodal generation with memory and skills. arXiv preprint arXiv:2603.28088.

  22. [23]

    MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

    Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, and Wenya Wang. MemSkill: Learning and evolving memory skills for self-evolving agents. arXiv preprint arXiv:2602.02474, 2026b.

  23. [24]

    SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

    Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, and Guangsheng Yu. SoK: Agentic skills – beyond tool use in LLM agents. arXiv preprint arXiv:2602.20867, 2026c.

  24. [25]

    PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction

    Simon Yu, Gang Li, Weiyan Shi, and Peng Qi. PolySkill: Learning generalizable skills through polymorphic abstraction. arXiv preprint arXiv:2510.15863.

  25. [26]

    CoEvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

    Hanrong Zhang, Shicheng Fan, Henry Peng Zou, Yankai Chen, Zhenting Wang, Jiayu Zhou, Chengze Li, Wei-Chieh Huang, Yifei Yao, Kening Zheng, et al. EvoSkills: Self-evolving agent skills via co-evolutionary verification. arXiv preprint arXiv:2604.01687, 2026c.

  26. [27]

    SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

    Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, et al. SkillRL: Evolving agents via recursive skill-augmented reinforcement learning. arXiv preprint arXiv:2602.08234, 2026a.

  27. [28]

    Towards effective experiential learning: Dual guidance for utilization and internalization

    Fei Bai, Zhipeng Chen, Chuan Hao, Ming Yang, Ran Tao, Bryan Dai, Wayne Xin Zhao, Jian Yang, and Hongteng Xu. Towards effective experiential learning: Dual guidance for utilization and internalization. arXiv preprint arXiv:2603.24093.

  28. [29]

    MetaClaw: Just Talk – An Agent That Meta-Learns and Evolves in the Wild

    Peng Xia, Jianwen Chen, Xinyu Yang, Haoqin Tu, Jiaqi Liu, Kaiwen Xiong, Siwei Han, Shi Qiu, Haonian Ji, Yuyin Zhou, et al. MetaClaw: Just talk – an agent that meta-learns and evolves in the wild. arXiv preprint arXiv:2603.17187, 2026b.