Skilldex: A Package Manager and Registry for Agent Skill Packages with Hierarchical Scope-Based Distribution

Pranav Hemanth; Sampriti Saha

arxiv: 2604.16911 · v1 · submitted 2026-04-18 · 💻 cs.AI

Pith reviewed 2026-05-10 07:37 UTC · model grok-4.3

classification 💻 cs.AI

keywords LLM agents · skill packages · package manager · format conformance scoring · skillset abstraction · agent tooling · registry · hierarchical scopes

The pith

Skilldex scores skill packages for format compliance and bundles related skills with shared assets to enforce coherence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Skilldex is a package manager and registry for skill packages that extend LLM agents at runtime. It fills two specific gaps in existing community tools: the lack of any public scorer that checks packages against Anthropic's published format specification and the absence of any bundling mechanism that keeps related skills mutually consistent. The system supplies compiler-style checks that return line-level diagnostics on description specificity, frontmatter validity, and overall structure. It also introduces the skillset abstraction, which groups related skills together with shared vocabulary files, templates, and reference documents. These features are meant to make skill development and distribution more reliable as agents rely on larger numbers of externally loaded packages.

Core claim

Skilldex addresses the gaps by delivering two new mechanisms: compiler-style conformance scoring that produces line-level diagnostics on description specificity, frontmatter validity, and structural adherence, plus the skillset abstraction, which bundles related skills with shared assets to enforce cross-skill behavioral coherence. The package manager and registry also include a three-tier hierarchical scope system, a human-in-the-loop suggestion loop, a metadata-only community registry, and an MCP server, all implemented in a TypeScript CLI with a Hono/Supabase backend.
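To make "compiler-style conformance scoring" concrete, here is a minimal sketch of a checker that returns line-level diagnostics for the three areas named above: structure, frontmatter validity, and description specificity. It is not Skilldex's implementation. The rule names, the word-count threshold, the penalty weights, and the choice of `name` and `description` as required frontmatter keys are all assumptions made for illustration.

```typescript
// Illustrative sketch only: rule names, thresholds, and penalties are assumptions.
type Severity = "error" | "warning";

interface Diagnostic {
  line: number; // 1-based line number in the skill file
  rule: string; // hypothetical rule identifier
  severity: Severity;
  message: string;
}

const MIN_DESCRIPTION_WORDS = 8; // assumed specificity threshold
const REQUIRED_KEYS = ["name", "description"]; // assumed required frontmatter keys

function checkSkill(source: string): Diagnostic[] {
  const lines = source.split("\n");
  const diags: Diagnostic[] = [];

  // Structural adherence: the file must open with a '---' frontmatter fence.
  if ((lines[0] ?? "").trim() !== "---") {
    diags.push({ line: 1, rule: "frontmatter-missing", severity: "error",
      message: "expected '---' on the first line" });
    return diags;
  }
  const close = lines.findIndex((text, i) => i > 0 && text.trim() === "---");
  if (close === -1) {
    diags.push({ line: 1, rule: "frontmatter-unterminated", severity: "error",
      message: "no closing '---' found" });
    return diags;
  }

  // Frontmatter validity: collect 'key: value' pairs with their line numbers.
  const fields = new Map<string, { line: number; value: string }>();
  lines.slice(1, close).forEach((text, offset) => {
    const colon = text.indexOf(":");
    if (colon > 0) {
      fields.set(text.slice(0, colon).trim(), {
        line: offset + 2, // offset 0 is the second line of the file
        value: text.slice(colon + 1).trim(),
      });
    }
  });
  for (const key of REQUIRED_KEYS) {
    if (!fields.has(key)) {
      diags.push({ line: 1, rule: `missing-${key}`, severity: "error",
        message: `frontmatter has no '${key}' field` });
    }
  }

  // Description specificity: flag very short descriptions.
  const desc = fields.get("description");
  if (desc) {
    const words = desc.value.split(/\s+/).filter((w) => w.length > 0).length;
    if (words < MIN_DESCRIPTION_WORDS) {
      diags.push({ line: desc.line, rule: "description-vague", severity: "warning",
        message: `description has ${words} word(s); expected at least ${MIN_DESCRIPTION_WORDS}` });
    }
  }
  return diags;
}

// Fold diagnostics into a 0-100 score (penalty weights are assumptions).
function conformanceScore(diags: Diagnostic[]): number {
  const penalty = diags.reduce((sum, d) => sum + (d.severity === "error" ? 25 : 5), 0);
  return Math.max(0, 100 - penalty);
}
```

Keeping the diagnostics separate from the score mirrors how compilers report problems: the list tells an author what to fix and where, while the number is only a summary.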

What carries the argument

The skillset abstraction, a bundled collection of related skills that share assets such as vocabulary files, templates, and reference documents to enforce cross-skill behavioral coherence, together with the compiler-style format conformance scorer that generates line-level diagnostics.
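One way to picture a skillset is as a manifest that lists member skills next to the shared assets they depend on. The sketch below is a guess at such a shape, not the paper's schema: the type names, the `uses` field, and the file paths are all hypothetical. It also shows one check that a bundle makes possible, namely finding skills that refer to a shared asset the skillset never declares.

```typescript
// Hypothetical manifest shapes; the paper's actual schema is not given in this summary.
type AssetKind = "vocabulary" | "template" | "reference";

interface SharedAsset {
  path: string;
  kind: AssetKind;
}

interface SkillEntry {
  name: string;
  uses: string[]; // paths of shared assets this skill relies on (assumed field)
}

interface Skillset {
  name: string;
  shared: SharedAsset[];
  skills: SkillEntry[];
}

interface DanglingRef {
  skill: string;
  path: string;
}

// Report every asset path a skill uses that the skillset does not declare.
function findDanglingRefs(set: Skillset): DanglingRef[] {
  const declared = new Set(set.shared.map((asset) => asset.path));
  const dangling: DanglingRef[] = [];
  for (const skill of set.skills) {
    for (const path of skill.uses) {
      if (!declared.has(path)) {
        dangling.push({ skill: skill.name, path });
      }
    }
  }
  return dangling;
}

// Example skillset with one undeclared reference (all names invented).
const reportingSkillset: Skillset = {
  name: "reporting",
  shared: [
    { path: "shared/glossary.md", kind: "vocabulary" },
    { path: "shared/report.tmpl", kind: "template" },
  ],
  skills: [
    { name: "draft-report", uses: ["shared/glossary.md", "shared/report.tmpl"] },
    { name: "review-report", uses: ["shared/glossary.md", "shared/style-guide.md"] },
  ],
};
```

In this example, `findDanglingRefs(reportingSkillset)` reports that `review-report` depends on `shared/style-guide.md`, which the bundle does not ship.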

If this is right

Developers receive concrete, line-level feedback that can be used to correct skill packages before publication.
Related skills grouped in a skillset share context and assets, reducing the chance of contradictory instructions.
The three-tier scope system organizes distribution so packages can be scoped to individual, team, or public levels.
The metadata-only registry allows community discovery without requiring full package hosting.
Integration with the MCP server enables broader tool compatibility for agent runtimes.
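The three-tier scope system can be read as a lookup order over installed packages. The sketch below assumes the tier labels used in this summary (individual, team, public) and assumes that the narrowest scope wins when the same skill name appears in more than one tier. Both the labels and the precedence rule are illustrative assumptions and may differ from the paper's design.

```typescript
// Assumed tier names and precedence (narrowest first); not confirmed by the paper.
type Scope = "individual" | "team" | "public";

const PRECEDENCE: readonly Scope[] = ["individual", "team", "public"];

interface InstalledSkill {
  name: string;
  scope: Scope;
  version: string;
}

// Return the copy of a skill from the narrowest scope that has it.
function resolveSkill(
  name: string,
  installed: readonly InstalledSkill[],
): InstalledSkill | undefined {
  for (const scope of PRECEDENCE) {
    const hit = installed.find((s) => s.name === name && s.scope === scope);
    if (hit !== undefined) {
      return hit;
    }
  }
  return undefined;
}

// Example inventory (all names and versions invented).
const inventory: InstalledSkill[] = [
  { name: "changelog-writer", scope: "public", version: "1.0.0" },
  { name: "changelog-writer", scope: "team", version: "1.2.0" },
  { name: "sql-review", scope: "public", version: "0.3.1" },
];
```

With this inventory, resolving `changelog-writer` yields the team copy at version 1.2.0, while `sql-review` falls through to the public tier.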

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread use could create de-facto standards for how skills are written and versioned across the LLM agent ecosystem.
The same scoring and bundling pattern could be adapted to skill systems in other agent frameworks that do not use Anthropic's specification.
Empirical studies could measure whether skillsets actually lower the rate of agent incoherence compared with independently installed skills.
The hierarchical scopes might later support automated dependency resolution or conflict detection between skill packages.

Load-bearing premise

That supplying format conformance scoring and skillset bundling will actually close the stated tooling gaps and that shared assets will in practice enforce behavioral coherence across the bundled skills.

What would settle it

A controlled test in which agents run the same tasks with matched skill packages that either pass or fail the conformance scorer, or with versus without skillset bundling, to check whether high scores or bundling measurably reduce agent errors or behavioral drift.

Figures

Figures reproduced from arXiv: 2604.16911 by Pranav Hemanth, Sampriti Saha.

**Figure 1.** Skilldex system architecture. The CLI and MCP server share all core modules. The registry is a separate service accessed via a typed HTTP client.

Original abstract

Large Language Model (LLM) agents are increasingly extended at runtime via skill packages, structured natural-language instruction bundles loaded from a well-known directory. Community install tooling and registries exist, but two gaps persist: no public tool scores skill packages against Anthropic's published format specification, and no mechanism bundles related skills with the shared context they need to remain mutually coherent. We present Skilldex, a package manager and registry for agent skill packages addressing both gaps. The two novel contributions are: (1) compiler-style format conformance scoring against Anthropic's skill specification, producing line-level diagnostics on description specificity, frontmatter validity, and structural adherence; and (2) the skillset abstraction, a bundled collection of related skills with shared assets (vocabulary files, templates, reference documents) that enforce cross-skill behavioral coherence. Skilldex also provides supporting infrastructure: a three-tier hierarchical scope system, a human-in-the-loop agent suggestion loop, a metadata-only community registry, and a Model Context Protocol (MCP) server. The system is implemented as a TypeScript CLI (skillpm / spm) with a Hono/Supabase registry backend, and is open-source.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Skilldex adds line-level conformance scoring and skillset bundling to LLM agent tooling in a concrete open-source implementation, but provides no usage data or tests to show real impact.

This paper gives us a package manager called Skilldex that scores skill packages for format compliance at the line level and bundles related skills into skillsets with shared assets. The two new pieces are the conformance checker, which produces diagnostics on specificity, validity, and structure, and the skillset abstraction that keeps skills coherent by sharing files and templates. The rest of the system (hierarchical scopes, suggestion loop, registry, and MCP server) is supporting infrastructure built in TypeScript and released open source.

What the paper does well is ship a complete, usable implementation instead of stopping at a proposal. It directly responds to the missing scorer and bundling mechanism in current community tools.

The soft spots are minor but real: there is no empirical evidence or usage data to show that these features solve the problems in practice. The assumption that skillsets will enforce behavioral coherence is reasonable but untested in the paper. Soundness is based on the code description rather than benchmarks or case studies.

This work is for people developing or maintaining LLM agent skills who need better management tools. A practitioner or tool builder would find the implementation details and open-source code valuable.

I would send it for peer review. The contributions are specific enough and the system is implemented, so referees can assess the details and suggest improvements like adding evaluations.

Referee Report

2 major / 3 minor

Summary. The manuscript presents Skilldex, an open-source TypeScript CLI (skillpm/spm) and Hono/Supabase-backed registry for LLM agent skill packages. It identifies two gaps in existing community tooling—no public conformance scoring against Anthropic's skill specification and no bundling mechanism for related skills—and claims to close them via (1) compiler-style line-level diagnostics for description specificity, frontmatter validity, and structural adherence, and (2) the skillset abstraction that bundles skills with shared assets (vocabulary files, templates, reference documents) to enforce cross-skill behavioral coherence. Supporting features include a three-tier hierarchical scope system, human-in-the-loop suggestion loop, metadata-only registry, and Model Context Protocol (MCP) server.

Significance. If the described implementation functions as outlined, Skilldex supplies immediately usable infrastructure that could standardize skill package quality and enable coherent multi-skill agents. The open-source release, registry backend, and concrete mechanisms for conformance scoring and asset sharing constitute practical strengths that lower barriers for developers extending LLM agents. These contributions are directly responsive to the stated tooling gaps and may see adoption in the Anthropic-compatible agent ecosystem.

major comments (2)

[Abstract] Abstract and novel-contributions paragraph: the claim that the skillset abstraction 'enforce[s] cross-skill behavioral coherence' is load-bearing for the second stated contribution, yet the manuscript provides no concrete mechanism (validation rules, conflict detection, or runtime checks) showing how shared assets achieve enforcement rather than merely co-location.
[Implementation and features] Throughout the system-description sections: no usage statistics, adoption metrics, or even minimal case-study results are reported, so the assertion that the conformance scorer and skillset mechanism address the identified community gaps rests solely on the existence of the implementation rather than demonstrated outcomes.

minor comments (3)

[Abstract] The acronym MCP is introduced without expansion; define 'Model Context Protocol' on first use.
[Scope system description] A diagram or table illustrating the three-tier hierarchical scope system and how scopes interact with skillsets would improve readability.
[Discussion or conclusion] Consider adding a short 'Limitations' subsection noting that the current registry is metadata-only and that behavioral-coherence enforcement is currently advisory rather than enforced at runtime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address the two major comments point by point below, offering clarifications and committing to targeted revisions that strengthen the manuscript without altering its core claims.

Referee: [Abstract] Abstract and novel-contributions paragraph: the claim that the skillset abstraction 'enforce[s] cross-skill behavioral coherence' is load-bearing for the second stated contribution, yet the manuscript provides no concrete mechanism (validation rules, conflict detection, or runtime checks) showing how shared assets achieve enforcement rather than merely co-location.

Authors: We agree that the abstract phrasing would benefit from greater precision. The skillset mechanism enforces coherence by requiring that all bundled skills and their shared assets (vocabulary files, templates, reference documents) are loaded together as a single unit into the agent's context window. This guarantees that every skill references the identical shared elements, preventing drift that would occur if skills were installed independently. We will revise the abstract and contributions paragraph to explicitly state this bundling-and-co-loading approach as the enforcement mechanism, and we will insert a short illustrative example in the implementation section showing how a skillset maintains consistent behavior across its members.
Revision: yes

Referee: [Implementation and features] Throughout the system-description sections: no usage statistics, adoption metrics, or even minimal case-study results are reported, so the assertion that the conformance scorer and skillset mechanism address the identified community gaps rests solely on the existence of the implementation rather than demonstrated outcomes.

Authors: We acknowledge that the current manuscript is primarily a systems description. As Skilldex is a newly released open-source tool, longitudinal adoption metrics are not yet available. To address the concern directly, we will add a concise case-study subsection that applies the conformance scorer to representative community skills and demonstrates a skillset in a minimal multi-skill agent configuration. This will provide concrete illustration of gap closure while remaining proportionate to the paper's scope as infrastructure rather than an empirical evaluation.
Revision: yes
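
The bundling-and-co-loading approach described in the first response can be sketched as a function that assembles one context block from a skillset, placing each shared asset once and ahead of every member skill. This is an illustration under assumed types and an assumed output layout, not the authors' loader. It also shows why the referee's concern stands: co-loading guarantees that skills see the same assets, but nothing here checks that they actually use them consistently.

```typescript
// Hypothetical in-memory shapes; file reading is left out to keep the sketch self-contained.
interface LoadedFile {
  path: string;
  content: string;
}

interface SkillsetBundle {
  name: string;
  shared: LoadedFile[]; // vocabulary files, templates, reference documents
  skills: LoadedFile[]; // one entry per member skill
}

// Render one labelled section of the context block.
function renderSection(label: string, file: LoadedFile): string {
  return `## ${label}: ${file.path}\n${file.content.trim()}`;
}

// Assemble the whole skillset as a single unit: shared assets first, then skills.
function buildContext(bundle: SkillsetBundle): string {
  const parts = [
    `# skillset: ${bundle.name}`,
    ...bundle.shared.map((file) => renderSection("shared", file)),
    ...bundle.skills.map((file) => renderSection("skill", file)),
  ];
  return parts.join("\n\n");
}
```

Because the shared assets are emitted once per bundle rather than copied into each skill, an update to a vocabulary file reaches every member skill the next time the bundle is loaded.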

Circularity Check

0 steps flagged

No significant circularity: direct system description with no derivations or self-referential reductions

The paper presents Skilldex as an open-source tooling implementation (TypeScript CLI, registry backend, conformance scorer, and skillset bundler) that directly addresses two explicitly stated gaps in existing community tooling. No equations, fitted parameters, predictions, or formal derivations appear in the abstract or described contributions. The two novel elements—line-level format conformance diagnostics and the skillset abstraction—are introduced as concrete engineering responses to the identified absences, without any reduction to prior fitted quantities, self-citations, or ansatzes. The work is self-contained as a system description and implementation report; no load-bearing step collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claims rest on the domain assumption that LLM agents rely on directory-loaded skill packages and that format adherence plus shared assets will produce coherence; no free parameters or invented entities with independent evidence are introduced beyond the new abstractions themselves.

axioms (1)

domain assumption: LLM agents are increasingly extended at runtime via skill packages loaded from a well-known directory.
Opening premise of the abstract that frames the problem being solved.

invented entities (2)

skillset (no independent evidence)
purpose: Bundled collection of related skills with shared assets to enforce cross-skill behavioral coherence
New abstraction introduced as one of the two core contributions.

three-tier hierarchical scope system (no independent evidence)
purpose: Mechanism for scope-based distribution of skill packages
Supporting infrastructure feature described as part of the system.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Skills Are Not Islands: Measuring Dependency and Risk in Agent Skill Supply Chains
cs.SE 2026-07 unverdicted novelty 7.0

The paper defines Agent Skill Supply Chains (ASSCs) and SkillDepAnalyzer to extract and analyze dependency graphs from over 1.43 million LLM agent skills, revealing structural patterns and security signals.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · cited by 1 Pith paper

[1] Anthropic, “Claude Code: Skills and Skill Packages,” Technical documentation, Anthropic PBC, 2024. [Online]. Available: https://docs.anthropic.com/en/docs/claude-code/skills-overview
[2] I. Z. Schlueter, “npm: Node Package Manager,” 2010. [Online]. Available: https://www.npmjs.com
[3] I. Bicking, “pip: The Python Package Installer,” 2008. [Online]. Available: https://pip.pypa.io
[4], [5] The Rust Programming Language, “Cargo: The Rust Package Manager,” [Online]. Available: https://doc.rust-lang.org/cargo
[6], [7] M. Howell, “Homebrew: The Missing Package Manager for macOS,” [Online]. Available: https://brew.sh
[8] Anthropic, “Model Context Protocol (MCP) Specification,” 2024. [Online]. Available: https://modelcontextprotocol.io
[9] LangChain, Inc., “LangChain Hub,” 2023. [Online]. Available: https://smith.langchain.com/hub
[10] I. Bicking, “virtualenv: Virtual Python Environment Builder,” 2007. [Online]. Available: https://virtualenv.pypa.io
[11] World Wide Web Consortium, “CSS Cascading and Inheritance Level 5,” W3C Candidate Recommendation Snapshot, Jan. 2022. [Online]. Available: https://www.w3.org/TR/2022/CR-css-cascade-5-20220113/
[12] S. A. Seshia, D. Sadigh, and S. S. Sastry, “Towards verified artificial intelligence,” arXiv preprint arXiv:1606.08514, 2016.
[13] C. Dellarocas and C. A. Wood, “The sound of silence in online feedback: Estimating trading risks in the presence of reporting bias,” Management Science, vol. 54, no. 3, pp. 460–476, 2008.
[14] J. Cappos, J. Samuel, S. Baker, and J. H. Hartman, “A look in the mirror: Attacks on package managers,” in Proc. 15th ACM Conf. Computer and Communications Security (CCS), 2008, pp. 565–574.
[15] Vercel, “vercel-labs/skills: The open agent skills tool,” 2025. [Online]. Available: https://github.com/vercel-labs/skills
