hub Canonical reference

Skillcraft: Can llm agents learn to use tools skillfully?

Shiqi Chen, Jingze Gai, Ruochen Zhou, Jinghan Zhang, Tongyao Zhu, Junlong Li, Kangrui Wang, Zihan Wang, Zhengyu Chen, Klara Kaleb, Ning Miao, Siyang Gao, Cong Lu, Manling Li, Junxian He, Yee Whye Teh · 2026 · arXiv 2603.00718

Canonical reference. 100% of citing Pith papers cite this work as background.

16 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 16 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9

citation-polarity summary

background 9

representative citing papers

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning

cs.CL · 2026-05-30 · unverdicted · novelty 7.0

SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

SkillEvolBench is a new diagnostic benchmark that evaluates the transition from episodic experience to procedural skills in LLM agents using role-conditioned task families and frozen deployment tests.

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.

Tools as Continuous Flow for Evolving Agentic Reasoning

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

FlowAgent models tool chaining as continuous latent trajectory generation with conditional flow matching to deliver global planning, formal utility bounds, and better robustness on long-horizon tasks, plus a new plan-level benchmark.

SkillFlow:Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

cs.AI · 2026-04-19 · unverdicted · novelty 7.0

SkillFlow benchmark shows lifelong skill evolution yields modest gains for some models like Claude Opus 4.6 but limited or negative utility for others despite high skill usage.

SkillAxe: Sharpening LLM-Authored Agent Skills Through Evaluation-Guided Self-Refinement

cs.MA · 2026-06-09 · unverdicted · novelty 6.0

SkillAxe is an unsupervised framework that decomposes LLM skill quality into four dimensions to generate improvement briefs, raising pass rates 28% relative on SkillsBench and from 16% to 52% on SpreadsheetBench.

SetupX: Can LLM Agents Learn from Past Failures in Functionality-Correct Code Repository Setup?

cs.SE · 2026-05-25 · unverdicted · novelty 6.0

SetupX presents an experiential learning framework for LLM agents that reaches 92% pass rate on functionality-correct repository setup by transferring verified fixes across repositories via XPU representations, LIFO Docker snapshots, and Prosecutor-Judge verification.

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

A systematic study across five domains finds model-generated skills yield average gains but non-uniform negative transfer, with a meta-skill improving extraction quality.

Skill-R1: Agent Skill Evolution via Reinforcement Learning

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Skill-R1 applies bi-level group-relative policy optimization to evolve skills recurrently from verified outcomes, yielding gains over baselines on multi-step tasks.

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution under GPT-5.1.

Evidence Over Plans: Online Trajectory Verification for Skill Distillation

cs.AI · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

SPARK generates environment-verified trajectories to compute PDI, enabling posterior skill distillation that outperforms no-skill baselines and human-written skills across 86 tasks with up to 1000x cheaper inference.

SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology

cs.AI · 2026-04-19 · unverdicted · novelty 6.0

SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

cs.CL · 2026-05-18 · unverdicted · novelty 5.0

SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.

PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.

Agent Skill Evaluation and Evolution: Frameworks and Benchmarks

cs.CL · 2026-06-09 · unverdicted · novelty 4.0

The paper surveys skill evolution frameworks in agentic systems, grouping them into execution feedback, trajectory distillation, compression, and reinforcement learning paradigms while analyzing gaps across six benchmark categories.

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

cs.IR · 2026-05-08 · unverdicted · novelty 3.0 · 3 refs

A survey that defines agent skills as reusable procedural artifacts and reviews methods, resources, and applications across their representation, acquisition, retrieval, and evolution stages.

citing papers explorer

Showing 16 of 16 citing papers after filters.

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning cs.CL · 2026-05-30 · unverdicted · none · ref 38
SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.
SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills cs.AI · 2026-05-22 · unverdicted · none · ref 28
SkillEvolBench is a new diagnostic benchmark that evaluates the transition from episodic experience to procedural skills in LLM agents using role-conditioned task families and frozen deployment tests.
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents cs.AI · 2026-05-11 · unverdicted · none · ref 40
OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.
Tools as Continuous Flow for Evolving Agentic Reasoning cs.AI · 2026-05-08 · unverdicted · none · ref 19
FlowAgent models tool chaining as continuous latent trajectory generation with conditional flow matching to deliver global planning, formal utility bounds, and better robustness on long-horizon tasks, plus a new plan-level benchmark.
SkillFlow:Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents cs.AI · 2026-04-19 · unverdicted · none · ref 6
SkillFlow benchmark shows lifelong skill evolution yields modest gains for some models like Claude Opus 4.6 but limited or negative utility for others despite high skill usage.
SkillAxe: Sharpening LLM-Authored Agent Skills Through Evaluation-Guided Self-Refinement cs.MA · 2026-06-09 · unverdicted · none · ref 9
SkillAxe is an unsupervised framework that decomposes LLM skill quality into four dimensions to generate improvement briefs, raising pass rates 28% relative on SkillsBench and from 16% to 52% on SpreadsheetBench.
SetupX: Can LLM Agents Learn from Past Failures in Functionality-Correct Code Repository Setup? cs.SE · 2026-05-25 · unverdicted · none · ref 13
SetupX presents an experiential learning framework for LLM agents that reaches 92% pass rate on functionality-correct repository setup by transferring verified fixes across repositories via XPU representations, LIFO Docker snapshots, and Prosecutor-Judge verification.
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills cs.AI · 2026-05-22 · unverdicted · none · ref 14
A systematic study across five domains finds model-generated skills yield average gains but non-uniform negative transfer, with a meta-skill improving extraction quality.
Skill-R1: Agent Skill Evolution via Reinforcement Learning cs.LG · 2026-05-10 · unverdicted · none · ref 1
Skill-R1 applies bi-level group-relative policy optimization to evolve skills recurrently from verified outcomes, yielding gains over baselines on multi-step tasks.
Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation cs.AI · 2026-05-10 · unverdicted · none · ref 14
Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution under GPT-5.1.
Evidence Over Plans: Online Trajectory Verification for Skill Distillation cs.AI · 2026-05-09 · unverdicted · none · ref 3 · 2 links
SPARK generates environment-verified trajectories to compute PDI, enabling posterior skill distillation that outperforms no-skill baselines and human-written skills across 86 tasks with up to 1000x cheaper inference.
SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology cs.AI · 2026-04-19 · unverdicted · none · ref 6
SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution cs.CL · 2026-05-18 · unverdicted · none · ref 9
SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.
PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents cs.CL · 2026-05-08 · unverdicted · none · ref 19
An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.
Agent Skill Evaluation and Evolution: Frameworks and Benchmarks cs.CL · 2026-06-09 · unverdicted · none · ref 1
The paper surveys skill evolution frameworks in agentic systems, grouping them into execution feedback, trajectory distillation, compression, and reinforcement learning paradigms while analyzing gaps across six benchmark categories.
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications cs.IR · 2026-05-08 · unverdicted · none · ref 44 · 3 links
A survey that defines agent skills as reusable procedural artifacts and reviews methods, resources, and applications across their representation, acquisition, retrieval, and evolution stages.

Skillcraft: Can llm agents learn to use tools skillfully?

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer