SkillOpt: Executive Strategy for Self-Evolving Agent Skills

2026 · cs.AI · arXiv 2605.23904

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells and beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside the Codex agentic loop, and by +19.1 inside Claude Code. Transfer experiments further show that optimized skill artifacts retain value when moved across model scales, between Codex and Claude Code execution environments, and to a nearby math benchmark without further optimization. Code: https://aka.ms/skillopt
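The accept-only-on-strict-improvement loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Edit`, `apply_edit`, `train_skill`, `toy_validate`, and `make_toy_proposer` are hypothetical, the "textual learning-rate budget" is assumed here to be a cap on the number of edits per step, the optimizer model and held-out validation set are replaced by toy stand-ins, and the epoch-wise slow/meta update is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Edit:
    op: str         # "add", "delete", or "replace"
    index: int      # line position the edit targets
    text: str = ""  # new line for add/replace; unused for delete


def apply_edit(skill: List[str], edit: Edit) -> List[str]:
    """Return a new skill document with one line-level edit applied."""
    lines = list(skill)
    if edit.op == "add":
        lines.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del lines[edit.index]
    elif edit.op == "replace":
        lines[edit.index] = edit.text
    else:
        raise ValueError(f"unknown op: {edit.op}")
    return lines


def train_skill(
    skill: List[str],
    propose: Callable[[List[str], List[List[Edit]]], List[Edit]],
    validate: Callable[[List[str]], float],
    steps: int,
    budget: int,
) -> Tuple[List[str], float, List[List[Edit]]]:
    """Keep a candidate skill only if its validation score strictly improves."""
    best_score = validate(skill)
    rejected: List[List[Edit]] = []  # buffer of edit batches that did not help
    for _ in range(steps):
        # Assumption: the budget bounds how many edits one step may apply.
        edits = propose(skill, rejected)[:budget]
        if not edits:
            break
        candidate = skill
        for edit in edits:
            candidate = apply_edit(candidate, edit)
        score = validate(candidate)
        if score > best_score:  # strict improvement required
            skill, best_score = candidate, score
        else:
            rejected.append(edits)
    return skill, best_score, rejected


def toy_validate(skill: List[str]) -> float:
    """Toy stand-in for a held-out score: count of useful hints present."""
    useful = {"check units", "show work"}
    return float(sum(1 for line in skill if line in useful))


def make_toy_proposer(candidates: List[Edit]):
    """Toy stand-in for the optimizer model: replays a fixed edit queue."""
    queue = list(candidates)

    def propose(skill: List[str], rejected: List[List[Edit]]) -> List[Edit]:
        return [queue.pop(0)] if queue else []

    return propose


candidates = [
    Edit("add", 0, "check units"),
    Edit("add", 0, "be verbose"),
    Edit("replace", 0, "show work"),
    Edit("add", 1, "show work"),
]
final_skill, final_score, rejected_edits = train_skill(
    [], make_toy_proposer(candidates), toy_validate, steps=10, budget=1
)
```

In this toy run the second and third edits leave the score unchanged, so they go to the rejected buffer and the skill document is not modified by them. A real proposer would presumably read that buffer to avoid repeating failed edits; the toy one ignores it.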

representative citing papers

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

cs.RO · 2026-06-03 · unverdicted · novelty 7.0

VASO is a verification-guided self-evolution framework for LLM robot skill contracts that reaches 97.2% formal-specification compliance on Jackal and quadcopter tasks using under 100 samples.

Auto-Configuring Scientific Simulators with Lightweight Coding-Agent Adapters

cs.AI · 2026-06-08 · unverdicted · novelty 6.0 · 2 refs

SIGA is a coding-agent adapter using retrieval, procedural memory, and validation gates that raises success rate on GEOS from 0.720 to 0.789 while cutting variance 16x and matching expert quality in minutes instead of hours.

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

cs.CL · 2026-05-31 · unverdicted · novelty 6.0

SkillAdaptor introduces step-level failure attribution and targeted skill updates for LLM agents, yielding performance gains on WebShop, PinchBench, and Claw-Eval benchmarks.

Governed Evolution of Agent Runtimes through Executable Operational Cognition

cs.SE · 2026-05-26 · unverdicted · novelty 4.0

Introduces HarnessMutation as a governed mechanism for lifecycle-aware runtime adaptation in agent systems, modeling evolution as a bounded observable process over persistent operational memory.

Odyssey: Constructing Verifiable Local Truth-Preserving Foundation Models

cs.AI · 2026-06-25 · unverdicted · novelty 3.0

ODYSSEY is a sheaf-theoretic framework for building verifiable foundation models as compositions of foundries via left and right Kan extensions.

