SkillReducer: Optimizing LLM Agent Skills for Token Efficiency

Pingchuan Ma; Shuai Wang; Yuanyuan Yuan; Yudong Gao; Zimo Ji; Zongjie Li

arxiv: 2603.29919 · v2 · pith:MWM6IA7M · submitted 2026-03-31 · 💻 cs.SE

keywords: content, skills, agent, SkillReducer, body, compression, context, descriptions

abstract

LLM-based coding agents rely on skills, pre-packaged instruction sets that extend agent capabilities, yet every token of skill content injected into the context window incurs both monetary cost and attention dilution. To understand the severity of this problem, we conduct a large-scale empirical study of 55,315 publicly available skills and find systemic inefficiencies: 26.4% lack routing descriptions entirely, over 60% of body content is non-actionable, and reference files can inject tens of thousands of tokens per invocation. Motivated by these findings, we present SkillReducer, a two-stage optimization framework. Stage 1 optimizes the routing layer by compressing verbose descriptions and generating missing ones via adversarial delta debugging. Stage 2 restructures skill bodies through taxonomy-driven classification and progressive disclosure, separating actionable core rules from supplementary content loaded on demand, validated by faithfulness checks and a self-correcting feedback loop. Evaluated on 600 skills and the SkillsBench benchmark, SkillReducer achieves 48% description compression and 39% body compression while improving functional quality by 2.8%, revealing a "less-is-more" effect where removing non-essential content reduces distraction in the context window. These benefits transfer across five models from four families with a mean retention of 0.965, and generalize to an independent agent framework.
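The abstract names delta debugging as the mechanism behind Stage 1's description compression but gives no further detail here. The sketch below is therefore only an illustration under stated assumptions: it applies the classic ddmin algorithm (Zeller and Hildebrandt) to shrink a skill description word by word while a routing check still passes. The oracle `routes_correctly` and its keyword set `REQUIRED_TRIGGERS` are hypothetical stand-ins (a real system would presumably ask a model whether the skill is still selected for its intended tasks), the "adversarial" part of the paper's method is not modeled, and compression is counted in whitespace-separated words rather than model tokens.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical routing keywords; NOT taken from the paper.
REQUIRED_TRIGGERS = {"extract", "pdf"}


def split(items: Sequence[T], n: int) -> List[List[T]]:
    """Partition items into n contiguous, near-equal chunks (order preserved)."""
    k, m = divmod(len(items), n)
    return [list(items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]) for i in range(n)]


def ddmin(items: Sequence[T], test: Callable[[List[T]], bool]) -> List[T]:
    """Classic ddmin: shrink items to a 1-minimal sublist that still passes test.

    Assumes test(items) is True and that test is deterministic.
    """
    current = list(items)
    assert test(current), "the unreduced input must pass the test"
    n = 2
    while len(current) >= 2:
        n = min(n, len(current))  # never ask for more chunks than items
        chunks = split(current, n)
        # 1) Try keeping a single chunk on its own.
        for chunk in chunks:
            if test(chunk):
                current, n = chunk, 2
                break
        else:
            # 2) Try dropping a single chunk (keep its complement).
            for i in range(n):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if test(complement):
                    current, n = complement, max(n - 1, 2)
                    break
            else:
                # 3) No progress: stop at single-item granularity, else refine.
                if n >= len(current):
                    break
                n = min(2 * n, len(current))
    return current


def routes_correctly(words: List[str]) -> bool:
    """Hypothetical oracle: the description still 'routes' if every trigger survives."""
    normalized = {w.lower().strip(".,") for w in words}
    return REQUIRED_TRIGGERS <= normalized


if __name__ == "__main__":
    description = ("This skill helps you extract text and tables "
                   "from PDF files quickly and reliably.")
    words = description.split()
    kept = ddmin(words, routes_correctly)
    print("reduced description:", " ".join(kept))
    print(f"word-level compression: {1 - len(kept) / len(words):.0%}")
```

With this toy oracle the description collapses to just the two trigger words. That over-aggressive result is why a real pipeline would need a much richer check than keyword presence: ddmin only guarantees that no single remaining word can be removed without failing the oracle, so the quality of the reduction is bounded by the quality of the test.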

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation
cs.AI 2026-06 unverdicted novelty 6.0

The AFTER benchmark shows that a single refinement improves LLM agent performance by 3.7-6.7 points and that multi-model procedural skills reach 73.1% cross-model accuracy on 382 tasks.
Attacking the Trusted Imagination: Oracle-Level Integrity Attacks on Imagine-then-Act World Models
cs.LG 2026-06 unverdicted novelty 6.0

Attacks can corrupt the latent future trajectory imagined by world-action models in VLA policies, causing failures in oracles like MPC while the reactive policy stays intact.
SkillAxe: Sharpening LLM-Authored Agent Skills Through Evaluation-Guided Self-Refinement
cs.MA 2026-06 unverdicted novelty 6.0

SkillAxe is an unsupervised framework that decomposes LLM skill quality into four dimensions to generate improvement briefs, raising pass rates by 28% (relative) on SkillsBench and from 16% to 52% on SpreadsheetBench.
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

SkillMaster enables LLM agents to autonomously develop skills via trajectory review, counterfactual evaluation, and DualAdv-GRPO training, boosting success rates by 8.8% on ALFWorld and 9.3% on WebShop.
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

SkillMaster is a training framework that lets LLM agents autonomously propose, update, and apply skills, yielding 8.8% and 9.3% higher success rates on ALFWorld and WebShop than prior methods.
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era
cs.AI 2026-04 unverdicted novelty 6.0

ObjectGraph is a Markdown superset file format that represents documents as traversable knowledge graphs, achieving up to 95.3% token reduction for agents with no significant accuracy loss.
ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture
cs.AI 2026-04 unverdicted novelty 5.0

ANX introduces a protocol-first design with 3EX architecture that cuts token consumption by 47-66% and execution time by 58% versus prior methods in form-filling tests.
Written by AI, Managed by AI: Semantic Space Control and Index Sickness Elimination Across 391 Consecutive Sessions
cs.SE 2026-06 unverdicted novelty 3.0

A single-project case study identifies "Index Sickness" arising from complex symbolic LLM management and reports that Baseline-Log Physical Separation reduced instructions by 75%, with no recurrence observed.