Of 322 candidate submissions from 105 contributors, 86 tasks passed all review stages and were included in the final benchmark (26.7% acceptance rate).
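The acceptance rate quoted above follows directly from the two counts. As a quick check, here is a minimal sketch in Python; the function name `acceptance_rate` is a hypothetical helper for illustration and is not taken from the benchmark's code.

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return accepted / submitted as a percentage (hypothetical helper)."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * accepted / submitted


# Counts taken from the statement above: 86 accepted tasks out of 322 submissions.
rate = acceptance_rate(86, 322)
print(f"{rate:.1f}%")  # prints "26.7%"
```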

Benchmark report: For each task, reviewers produce a structured report documenting oracle results, agent pass rates with and without Skills, failure analysis, and a final verdict (

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

cs.AI · 2026-02-13 · unverdicted · novelty 7.0

Curated Skills boost LLM agent pass rates by 16.2pp on average across 86 tasks but self-generated Skills provide no benefit, with large variation by domain and some negative effects.


