hub Canonical reference

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

· 2026 · cs.AI · arXiv 2603.25158

Canonical reference. 89% of citing Pith papers cite this work as background.

19 Pith papers citing it

Background 89% of classified citations

open full Pith review browse 19 citing papers arXiv PDF

abstract

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheet, VisionQA and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to OOD settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills -- requiring no parameter updates, no external retrieval modules, and utilizing open-source models as small as 35B parameters.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9

citation-polarity summary

background 8 unclear 1

representative citing papers

Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

Identifies library drift as a failure mode in self-evolving LLM skill libraries and shows a governance recipe improves pass@1 from 0.258 to 0.584 on MBPP+ hard-100.

Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.

Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

DataPRM is a new process reward model for data analysis agents that detects silent errors via environment interaction and ternary rewards, yielding 7-11% gains on benchmarks and further RL improvements.

SkillFlow:Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

cs.AI · 2026-04-19 · unverdicted · novelty 7.0

SkillFlow benchmark shows lifelong skill evolution yields modest gains for some models like Claude Opus 4.6 but limited or negative utility for others despite high skill usage.

From Multi-Agent to Single-Agent: When Is Skill Distillation Beneficial?

cs.AI · 2026-04-02 · unverdicted · novelty 7.0

Metric Freedom (F), quantified via Mantel test on output diversity and score variance, predicts when single-agent skill distillation from multi-agent systems will succeed, enabling up to 8x cost and 15x latency reductions across tested tasks.

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

A systematic study across five domains finds model-generated skills yield average gains but non-uniform negative transfer, with a meta-skill improving extraction quality.

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

cs.AI · 2026-05-21 · conditional · novelty 6.0

Ratchet provides a minimal hygiene recipe for self-managing skill libraries in frozen LLM agents, delivering +0.328 rolling-mean pass@1 gain on MBPP+ hard-100 and +0.22 peak lift on SWE-bench Verified.

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

cs.AI · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

Insights Generator is a multi-agent system that generates evidence-backed natural-language insights characterizing systematic patterns across corpora of LLM agent execution traces.

SkillEvolver: Skill Learning as a Meta-Skill

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.

SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.

SkillGen: Verified Inference-Time Agent Skill Synthesis

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

SkillGen synthesizes auditable skills from agent trajectories via contrastive induction on successes and failures, then verifies net performance impact by comparing outcomes with and without the skill on identical tasks.

ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

ClawTrace enables cost-aware LLM agent skill distillation by tracing per-step costs and generating preserve, prune, and repair patches, with ablations showing reduced regressions and prune rules transferring to cut costs by 32%.

Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents

cs.AI · 2026-04-17 · conditional · novelty 6.0

The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

cs.CL · 2026-05-18 · unverdicted · novelty 5.0

SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

cs.LG · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

SLIM dynamically optimizes the active external skill set in agentic RL via leave-one-skill-out marginal contribution estimates and lifecycle operations, delivering a 7.1% average gain over baselines on ALFWorld and SearchQA while showing some skills remain externally useful.

Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution

cs.AI · 2026-05-09 · unverdicted · novelty 5.0

Ace-Skill boosts multimodal agent self-evolution via prioritized rollouts with lazy-decay tracking and semantic knowledge clustering, yielding up to 35% relative gains on tool-use benchmarks and zero-shot transfer to smaller models.

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

cs.AI · 2026-05-22

Evidence Over Plans: Online Trajectory Verification for Skill Distillation

cs.AI · 2026-05-09

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

cs.IR · 2026-05-08 · 2 refs

citing papers explorer

Showing 19 of 19 citing papers.

Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries cs.AI · 2026-05-19 · unverdicted · none · ref 7 · internal anchor
Identifies library drift as a failure mode in self-evolving LLM skill libraries and shows a governance recipe improves pass@1 from 0.258 to 0.584 on MBPP+ hard-100.
Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents cs.CL · 2026-05-16 · unverdicted · none · ref 1 · internal anchor
SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis cs.CL · 2026-04-27 · unverdicted · none · ref 38 · internal anchor
DataPRM is a new process reward model for data analysis agents that detects silent errors via environment interaction and ternary rewards, yielding 7-11% gains on benchmarks and further RL improvements.
SkillFlow:Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents cs.AI · 2026-04-19 · unverdicted · none · ref 24 · internal anchor
SkillFlow benchmark shows lifelong skill evolution yields modest gains for some models like Claude Opus 4.6 but limited or negative utility for others despite high skill usage.
From Multi-Agent to Single-Agent: When Is Skill Distillation Beneficial? cs.AI · 2026-04-02 · unverdicted · none · ref 3 · internal anchor
Metric Freedom (F), quantified via Mantel test on output diversity and score variance, predicts when single-agent skill distillation from multi-agent systems will succeed, enabling up to 8x cost and 15x latency reductions across tested tasks.
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills cs.AI · 2026-05-22 · unverdicted · none · ref 9 · internal anchor
A systematic study across five domains finds model-generated skills yield average gains but non-uniform negative transfer, with a meta-skill improving extraction quality.
Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents cs.AI · 2026-05-21 · conditional · none · ref 19 · internal anchor
Ratchet provides a minimal hygiene recipe for self-managing skill libraries in frozen LLM agents, delivering +0.328 rolling-mean pass@1 gain on MBPP+ hard-100 and +0.22 peak lift on SWE-bench Verified.
Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents cs.AI · 2026-05-20 · unverdicted · none · ref 17 · 2 links · internal anchor
Insights Generator is a multi-agent system that generates evidence-backed natural-language insights characterizing systematic patterns across corpora of LLM agent execution traces.
SkillEvolver: Skill Learning as a Meta-Skill cs.AI · 2026-05-11 · unverdicted · none · ref 5 · internal anchor
A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.
SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution cs.CL · 2026-05-11 · unverdicted · none · ref 23 · internal anchor
SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.
SkillGen: Verified Inference-Time Agent Skill Synthesis cs.LG · 2026-05-09 · unverdicted · none · ref 10 · internal anchor
SkillGen synthesizes auditable skills from agent trajectories via contrastive induction on successes and failures, then verifies net performance impact by comparing outcomes with and without the skill on identical tasks.
ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation cs.AI · 2026-04-26 · unverdicted · none · ref 2 · internal anchor
ClawTrace enables cost-aware LLM agent skill distillation by tracing per-step costs and generating preserve, prune, and repair patches, with ablations showing reduced regressions and prune rules transferring to cut costs by 32%.
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents cs.AI · 2026-04-17 · conditional · none · ref 16 · internal anchor
The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution cs.CL · 2026-05-18 · unverdicted · none · ref 38 · internal anchor
SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning cs.LG · 2026-05-11 · unverdicted · none · ref 37 · 2 links · internal anchor
SLIM dynamically optimizes the active external skill set in agentic RL via leave-one-skill-out marginal contribution estimates and lifecycle operations, delivering a 7.1% average gain over baselines on ALFWorld and SearchQA while showing some skills remain externally useful.
Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution cs.AI · 2026-05-09 · unverdicted · none · ref 13 · internal anchor
Ace-Skill boosts multimodal agent self-evolution via prioritized rollouts with lazy-decay tracking and semantic knowledge clustering, yielding up to 35% relative gains on tool-use benchmarks and zero-shot transfer to smaller models.
SkillOpt: Executive Strategy for Self-Evolving Agent Skills cs.AI · 2026-05-22 · unreviewed · ref 9 · internal anchor
Evidence Over Plans: Online Trajectory Verification for Skill Distillation cs.AI · 2026-05-09 · unreviewed · ref 9 · internal anchor
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications cs.IR · 2026-05-08 · unreviewed · ref 114 · 2 links · internal anchor

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer