pith. sign in

arxiv: 2607.01136 · v1 · pith:FOYP7SXZnew · submitted 2026-07-01 · 💻 cs.SE · cs.AI

Skills Are Not Islands: Measuring Dependency and Risk in Agent Skill Supply Chains

Pith reviewed 2026-07-02 08:04 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords agent skillsdependency graphssupply chainsLLM agentsSBOMsecurity analysissoftware dependenciesskill metadata
0
0 comments X

The pith

Agent skills form supply chains with hidden dependencies and security risks missed by isolated inspection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that LLM agent skills are dependency-bearing artifacts whose relationships form Agent Skill Supply Chains spanning skills, packages, and services. It presents SkillDepAnalyzer, which mines natural-language evidence from skill packages to build these graphs, and validates the approach on the SKILL-DEP benchmark where it recovers metadata and dependencies more accurately than baselines or existing SBOM tools. When scaled to over 1.43 million skills, the resulting chains expose four structural patterns including concentrated reuse, recursive expansion, and workflow clusters, plus security signals that single-skill views overlook. The authors use these findings to identify persistent malicious skills and propose governance practices such as typed manifests and audit commands.

Core claim

Skills are not islands but participate in mixed dependency graphs called Agent Skill Supply Chains. SkillDepAnalyzer extracts skill metadata and these graphs from natural-language descriptions, recovering them accurately on the SKILL-DEP benchmark and outperforming LLM baselines and package-centric SBOM tools. Application to more than 1.43 million skills reveals four patterns—activation-ready but governance-poor metadata, spanning dependencies with concentrated reuse, recursive reuse that hides package inventory, and workflow-based clusters—along with security signals invisible when inspecting skills alone, enabling reports of known malicious skills to developers.

What carries the argument

Agent Skill Supply Chains (ASSCs) as mixed skill-package-service dependency graphs, built by SkillDepAnalyzer extracting evidence from natural-language descriptions in skill packages.

If this is right

  • Dependency graphs span skill, package, and service dependencies with concentrated reuse.
  • Recursive skill reuse expands dependency graphs and creates hidden package inventory.
  • Skill dependency clusters form around related workflows.
  • Inspecting a skill alone misses security-relevant signals hiding in its dependencies.
  • Typed dependency manifests, first-class dependency-cluster management, risk-warning audit commands, and lockfile-like records can address the identified governance gaps.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the same extraction approach to other LLM artifacts such as tools or prompt libraries could surface comparable supply-chain risks.
  • Concentrated reuse in the graphs may create single points of failure that propagate across many agent deployments.
  • Periodic re-analysis of the full set of skills could track how malicious entries persist or spread through the chains over time.
  • The structural patterns suggest that workflow-specific cluster detection could support targeted security scanning.

Load-bearing premise

Natural-language descriptions in skill packages contain sufficient, accurate, and unambiguous evidence of dependencies on other skills, packages, and services to allow reliable automated extraction and graph construction without substantial false positives or omissions.

What would settle it

Running SkillDepAnalyzer on a held-out set of skills with independently verified manual dependency annotations and measuring whether precision or recall falls substantially below the levels reported on the SKILL-DEP benchmark.

Figures

Figures reproduced from arXiv: 2607.01136 by Changguo Jia, Minghui Zhou, Runzhi He, Tianqi Zhao.

Figure 1
Figure 1. Figure 1: SDA generation workflow. candidates are then classified into three dependency chan￾nels—packages, skills, and external services—while ambiguous candidates are retained as annotations. Package dependencies are analyzed from package-specific evidence patterns, including package managers, installation commands, manifest entries, and Docker snippets. Strong evidence, such as installation commands and manifest … view at source ↗
read the original abstract

Agent skills package reusable operational knowledge for Large Language Model (LLM) agents, yet as they grow in scope, they become dependency-bearing artifacts whose identities, versions, and provenance remain implicit. This opacity already causes duplicated dependencies and inconsistent installations, exposing a gap that dependency management has yet to close. We introduce Agent Skill Supply Chains (ASSCs) to characterize mixed skill-package-service dependency graphs and help close this gap. Borrowing from Software Bill of Materials (SBOMs), we design SkillDepAnalyzer to capture natural-language dependency evidence and model skills as dependency-bearing artifacts. On the SKILL-DEP benchmark, SkillDepAnalyzer recovers skill metadata and dependency graphs accurately and comprehensively, substantially outperforming an LLM-based baseline and package-centric SBOM tools. Applying SkillDepAnalyzer to over 1.43 million skills, we obtain ASSCs and explore their structural diversity and security signals. We find four structural patterns: skill metadata is activation-ready but governance-poor; dependency graphs span skill, package, and service dependencies with concentrated reuse; recursive skill reuse expands dependency graphs and creates hidden package inventory; and skill dependency clusters form around related workflows. We also find that inspecting a skill alone misses security-relevant signals hiding in its dependencies. By analyzing ASSCs, we identify and report known malicious skills persisting in ASSCs to their developers. Based on these findings, we recommend typed dependency manifests, first-class dependency-cluster management, risk-warning audit commands for skill infrastructure maintainers, and lockfile-like records for skill developers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Agent Skill Supply Chains (ASSCs) as mixed dependency graphs spanning skills, packages, and services for LLM agents. It presents SkillDepAnalyzer, which extracts dependency metadata and graphs from natural-language descriptions in skill packages by borrowing SBOM concepts. On the SKILL-DEP benchmark the tool is claimed to recover metadata and graphs accurately and comprehensively while substantially outperforming an LLM baseline and package-centric SBOM tools. The analyzer is then applied to >1.43 million skills to surface four structural patterns (activation-ready but governance-poor metadata; concentrated reuse across skill/package/service edges; recursive reuse creating hidden inventory; workflow-based clusters) and security signals (known malicious skills and risks missed by isolated inspection). Recommendations include typed manifests, cluster management, audit commands, and lockfiles.

Significance. If the natural-language extraction step is shown to be reliable, the work supplies the first large-scale empirical map of dependency structure in the agent-skill ecosystem and demonstrates concrete security and governance gaps that isolated skill inspection cannot detect. The scale (1.43 M skills) and the explicit linkage of structural patterns to risk signals constitute a useful contribution to the emerging literature on LLM-agent supply-chain security, analogous to SBOM efforts in conventional software. The absence of quantitative benchmark metrics and error analysis in the abstract, however, leaves the central empirical claim under-supported at present.

major comments (3)
  1. [§4] §4 (SKILL-DEP benchmark) and the abstract: the claim that SkillDepAnalyzer 'recovers skill metadata and dependency graphs accurately and comprehensively' and 'substantially outperforming' baselines is load-bearing for both the benchmark result and the downstream 1.43 M-skill patterns, yet the manuscript provides no precision/recall/F1 numbers, confusion-matrix breakdown, or description of how ground-truth dependency graphs were constructed. Without these, it is impossible to assess whether the reported outperformance is robust to the acknowledged ambiguities in natural-language skill descriptions.
  2. [§5] §5 (large-scale ASSC analysis): the identification of the four structural patterns and the security-signal claims rest directly on the output of the same extraction pipeline. If false-positive or false-negative edges are common (e.g., implicit service references, version ambiguity, or skill-name collisions), the reported concentration of reuse, recursive expansion, and hidden malicious dependencies could be artifacts of the extraction method rather than properties of the skill corpus.
  3. [§3] §3 (SkillDepAnalyzer design): the weakest assumption—that natural-language descriptions contain sufficient, accurate, and unambiguous evidence of dependencies—is stated but not stress-tested with an inter-annotator agreement study or a held-out manual validation set beyond the benchmark. A concrete error analysis (e.g., rate of missed transitive package dependencies) is required before the security recommendations can be considered actionable.
minor comments (2)
  1. [Abstract] The abstract states 'over 1.43 million skills' without a precise count or sampling frame; the methods section should report the exact corpus size and any filtering criteria.
  2. [§2] Notation for ASSC components (skill, package, service nodes and edge types) is introduced informally; a compact tabular definition or diagram legend would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for your constructive and detailed review. We appreciate the focus on strengthening the empirical foundations of the SKILL-DEP benchmark and the large-scale analysis. We will revise the manuscript to incorporate quantitative metrics, ground-truth construction details, inter-annotator agreement, and error analysis. Our point-by-point responses follow.

read point-by-point responses
  1. Referee: [§4] §4 (SKILL-DEP benchmark) and the abstract: the claim that SkillDepAnalyzer 'recovers skill metadata and dependency graphs accurately and comprehensively' and 'substantially outperforming' baselines is load-bearing for both the benchmark result and the downstream 1.43 M-skill patterns, yet the manuscript provides no precision/recall/F1 numbers, confusion-matrix breakdown, or description of how ground-truth dependency graphs were constructed. Without these, it is impossible to assess whether the reported outperformance is robust to the acknowledged ambiguities in natural-language skill descriptions.

    Authors: We agree that explicit quantitative metrics are needed to substantiate the claims. In the revised manuscript we will add precision, recall, and F1 scores for both metadata extraction and dependency-graph recovery on SKILL-DEP, include a confusion-matrix breakdown, and describe the construction of the ground-truth graphs (human annotation protocol and adjudication process). We will also discuss how the evaluation accounts for ambiguities in natural-language descriptions. revision: yes

  2. Referee: [§5] §5 (large-scale ASSC analysis): the identification of the four structural patterns and the security-signal claims rest directly on the output of the same extraction pipeline. If false-positive or false-negative edges are common (e.g., implicit service references, version ambiguity, or skill-name collisions), the reported concentration of reuse, recursive expansion, and hidden malicious dependencies could be artifacts of the extraction method rather than properties of the skill corpus.

    Authors: We acknowledge that the large-scale patterns depend on extraction reliability. We will add a manual validation study on a stratified sample of extracted dependencies from the 1.43 M skills, reporting error rates by dependency type (skill, package, service) and discussing potential sources of false positives/negatives such as implicit references. For the security signals we will clarify that known malicious skills were identified via external reports and independently reported to developers; the sample validation will help readers assess whether the reported structural patterns are robust. revision: partial

  3. Referee: [§3] §3 (SkillDepAnalyzer design): the weakest assumption—that natural-language descriptions contain sufficient, accurate, and unambiguous evidence of dependencies—is stated but not stress-tested with an inter-annotator agreement study or a held-out manual validation set beyond the benchmark. A concrete error analysis (e.g., rate of missed transitive package dependencies) is required before the security recommendations can be considered actionable.

    Authors: We will add an inter-annotator agreement study performed on a held-out set of skill descriptions and report the resulting agreement statistics. We will also include a concrete error analysis that quantifies missed transitive package dependencies and other common extraction errors, thereby strengthening the basis for the security recommendations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation and corpus analysis are self-contained

full rationale

The paper defines ASSCs conceptually, presents SkillDepAnalyzer as a tool that extracts dependency evidence from natural-language skill descriptions, evaluates its accuracy on the SKILL-DEP benchmark against baselines, and applies the tool to 1.43 million skills to surface structural patterns. None of these steps reduce by construction to fitted parameters, self-definitions, or self-citation chains; the benchmark results and downstream observations are independent empirical outputs. The reliability of NL extraction is a substantive assumption rather than a definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the premise that natural-language skill text encodes extractable dependencies and that the SKILL-DEP benchmark plus the 1.43M-skill corpus are representative; no free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption Natural-language descriptions contain sufficient and reliable dependency evidence for automated extraction
    Invoked when SkillDepAnalyzer is described as capturing natural-language dependency evidence to model skills as dependency-bearing artifacts.
invented entities (1)
  • Agent Skill Supply Chains (ASSCs) no independent evidence
    purpose: To characterize mixed skill-package-service dependency graphs
    New conceptual model introduced to organize the dependency analysis; no independent falsifiable evidence supplied beyond the tool's output.

pith-pipeline@v0.9.1-grok · 5804 in / 1398 out tokens · 27196 ms · 2026-07-02T08:04:46.889685+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 10 canonical work pages · 6 internal anchors

  1. [1]

    Agent skills,

    Anthropic, “Agent skills,” https://platform.claude.com/docs/en/ agents-and-tools/agent-skills/overview, 2025

  2. [2]

    Agent skills,

    OpenAI, “Agent skills,” https://developers.openai.com/codex/skills, 2025

  3. [3]

    Using skills,

    OpenAI, “Using skills,” https://openai.com/academy/skills/, 2026

  4. [4]

    Agent skills marketplace,

    SkillsMP, “Agent skills marketplace,” https://skillsmp.com/, 2026

  5. [5]

    Feature request: support external repo references in market- place.json,

    rajivpant, “Feature request: support external repo references in market- place.json,” https://github.com/anthropics/skills/issues/796, 2026

  6. [6]

    bug: respect user package manager for skills dependency installs,

    kaitranntt, “bug: respect user package manager for skills dependency installs,” https://github.com/mrgoonie/claudekit-cli/issues/871, 2026

  7. [7]

    Spdx specification 3.0.1,

    Linux Foundation, “Spdx specification 3.0.1,” https://spdx.github.io/ spdx-spec/v3.0.1/, 2024

  8. [8]

    Cyclonedx specification overview,

    OWASP Foundation, “Cyclonedx specification overview,” https:// cyclonedx.org/specification/overview/, 2025

  9. [9]

    Sit: An accurate, compliant sbom generator with incremental construction,

    C. Jia, N. Li, K. Yang, and M. Zhou, “Sit: An accurate, compliant sbom generator with incremental construction,” in2025 IEEE/ACM 47th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 2025, pp. 13–16

  10. [10]

    Skillclone: Multi-modal clone detection and clone propagation analysis in the agent skill ecosystem,

    J. Zhu, L. Zhang, W. Guo, and Y . Liu, “Skillclone: Multi-modal clone detection and clone propagation analysis in the agent skill ecosystem,” arXiv preprint arXiv:2603.22447, 2026

  11. [11]

    Syft: A cli tool and library for generating a software bill of materials,

    Anchore, “Syft: A cli tool and library for generating a software bill of materials,” https://github.com/anchore/syft, 2026

  12. [12]

    Cyclonedx generator (cdxgen),

    OW ASP Foundation, “Cyclonedx generator (cdxgen),” https://github.com/ cdxgen/cdxgen, 2026

  13. [13]

    Scancode toolkit,

    AboutCode, “Scancode toolkit,” https://github.com/aboutcode-org/ scancode-toolkit, 2026

  14. [14]

    Oss review toolkit,

    OSS Review Toolkit, “Oss review toolkit,” https://github.com/ oss-review-toolkit/ort, 2026

  15. [15]

    Microsoft sbom tool,

    Microsoft, “Microsoft sbom tool,” https://github.com/microsoft/sbom-tool, 2026

  16. [16]

    "Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

    Y . Liu, Z. Chen, Y . Zhang, G. Deng, Y . Li, J. Ning, and L. Y . Zhang, ““do not mention this to the user”: Detecting and understanding malicious agent skills in the wild,”arXiv preprint arXiv:2602.06547, 2026

  17. [17]

    Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

    Y . Liu, W. Wang, R. Feng, Y . Zhang, G. Xu, G. Deng, Y . Li, and L. Zhang, “Agent skills in the wild: An empirical study of security vulnerabilities at scale,”arXiv preprint arXiv:2601.10338, 2026

  18. [18]

    Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

    D. Schmotz, L. Beurer-Kellner, S. Abdelnabi, and M. Andriushchenko, “Skill-inject: Measuring agent vulnerability to skill file attacks,”arXiv preprint arXiv:2602.20156, 2026

  19. [19]

    SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement

    Z. Duan, Y . Tian, Z. Yin, L. Pang, J. Deng, Z. Wei, S. Xu, Y . Ge, and X. Cheng, “Skillattack: Automated red teaming of agent skills through attack path refinement,”arXiv preprint arXiv:2604.04989, 2026

  20. [20]

    Skill-name collision,

    BaseInfinity, “Skill-name collision,” https://github.com/BaseInfinity/ opencode-sdlc-wizard/issues/26, 2026

  21. [21]

    Formal analysis and supply chain security for agentic ai skills,

    V . P. Bhardwaj, “Formal analysis and supply chain security for agentic ai skills,”arXiv preprint arXiv:2603.00195, 2026

  22. [22]

    An empirical comparison of dependency network evolution in seven software packaging ecosystems,

    A. Decan, T. Mens, and P. Grosjean, “An empirical comparison of dependency network evolution in seven software packaging ecosystems,” Empirical Software Engineering, vol. 24, no. 1, pp. 381–416, 2019

  23. [23]

    How deep does your dependency tree go? an empirical study of dependency amplification across 10 package ecosystems,

    J. Arafat, “How deep does your dependency tree go? an empirical study of dependency amplification across 10 package ecosystems,”arXiv preprint arXiv:2512.14739, 2025

  24. [24]

    Clawhavoc: 341 malicious clawed skills found by the bot they were targeting,

    Koi, “Clawhavoc: 341 malicious clawed skills found by the bot they were targeting,” https://www.koi.ai/blog/ clawhavoc-341-malicious-clawedbot-skills-found-by-the-bot-they-were-targeting, 2026

  25. [25]

    Snyk finds prompt injection in 36%, 1467 malicious payloads in a toxicskills study of agent skills supply chain compromise,

    Snyk, “Snyk finds prompt injection in 36%, 1467 malicious payloads in a toxicskills study of agent skills supply chain compromise,” https: //snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub/, 2026

  26. [26]

    Inside the ‘clawdhub’ malicious campaign: Ai agent skills drop reverse shells on openclaw marketplace,

    Snyk, “Inside the ‘clawdhub’ malicious campaign: Ai agent skills drop reverse shells on openclaw marketplace,” https://snyk.io/articles/ clawdhub-malicious-campaign-ai-agent-skills/, 2026

  27. [27]

    Clawhavoc: Analysis of large-scale poisoning campaign targeting the openclaw skill market for ai agents,

    Antiy Labs, “Clawhavoc: Analysis of large-scale poisoning campaign targeting the openclaw skill market for ai agents,” https://www.antiy.net/p/ clawhavoc-analysis-of-large-scale-poisoning-campaign-targeting-the-openclaw-skill-market-for-ai-agents/, 2026

  28. [28]

    How “clinejection

    Snyk, “How “clinejection” turned an ai bot into a supply chain attack,” https://snyk.io/blog/ cline-supply-chain-attack-prompt-injection-github-actions/, 2026

  29. [29]

    Weaponizing ai coding agents for malware in the nx malicious package security incident,

    Snyk, “Weaponizing ai coding agents for malware in the nx malicious package security incident,” https://snyk.io/blog/ weaponizing-ai-coding-agents-for-malware-in-the-nx-malicious-package/, 2025

  30. [30]

    Supply chain compromise of axios npm package,

    Huntress, “Supply chain compromise of axios npm package,” https://www. huntress.com/blog/supply-chain-compromise-axios-npm-package, 2025

  31. [31]

    mcp-remote exposed to os command injection via untrusted mcp server connections,

    GitHub Advisory Database, “mcp-remote exposed to os command injection via untrusted mcp server connections,” https://github.com/ advisories/GHSA-6xpm-ggf7-wc3p, 2025

  32. [32]

    Malicious mcp server on npm postmark- mcp harvests emails,

    Snyk, “Malicious mcp server on npm postmark- mcp harvests emails,” https://snyk.io/blog/ malicious-mcp-server-on-npm-postmark-mcp-harvests-emails/, 2026

  33. [33]

    Mcp inspector proxy server lacks authen- tication between the inspector client and proxy,

    GitHub Advisory Database, “Mcp inspector proxy server lacks authen- tication between the inspector client and proxy,” https://github.com/ advisories/GHSA-7f8r-222p-6f5g, 2025

  34. [34]

    figma-developer-mcp vulnerable to com- mand injection in get figma data tool,

    GitHub Advisory Database, “figma-developer-mcp vulnerable to com- mand injection in get figma data tool,” https://github.com/advisories/ GHSA-gxw4-4fc5-9gr5, 2025

  35. [35]

    From SKILL.md to shell access in three lines of markdown: Threat modeling agent skills,

    Snyk, “From SKILL.md to shell access in three lines of markdown: Threat modeling agent skills,” https://snyk.io/articles/skill-md-shell-access/, 2026

  36. [36]

    Develop with mcp: Security best prac- tices,

    Model Context Protocol, “Develop with mcp: Security best prac- tices,” https://modelcontextprotocol.io/docs/tutorials/security/security best practices, 2026

  37. [37]

    Mcp tool poisoning,

    OWASP Foundation, “Mcp tool poisoning,” https://owasp.org/ www-community/attacks/MCP Tool Poisoning, 2026

  38. [38]

    The cli for the open agent skills ecosystem

    Vercel Labs, “The cli for the open agent skills ecosystem.” https://github. com/vercel-labs/skills, 2026

  39. [39]

    The open agent skills ecosystem,

    Vercel Labs, “The open agent skills ecosystem,” https://skills.sh, 2026

  40. [40]

    Skilldex: A Package Manager and Registry for Agent Skill Packages with Hierarchical Scope-Based Distribution

    S. Saha and P. Hemanth, “Skilldex: A package manager and registry for agent skill packages with hierarchical scope-based distribution,”arXiv preprint arXiv:2604.16911, 2026

  41. [41]

    Install peer dependencies,

    npm, “Install peer dependencies,” https://raw.githubusercontent.com/npm/ rfcs/main/implemented/0025-install-peer-deps.md, 2026

  42. [42]

    npm v7 series - beta release! and: Semver-major changes in npm v7,

    npm, “npm v7 series - beta release! and: Semver-major changes in npm v7,” https://blog.npmjs.org/post/626173315965468672/ npm-v7-series-beta-release-and-semver-major.html, 2026

  43. [43]

    npm-audit,

    npm, “npm-audit,” https://docs.npmjs.com/cli/v11/commands/npm-audit, 2026

  44. [44]

    Remove duplicate frontend-design skill,

    TJHomstad, “Remove duplicate frontend-design skill,” https://github.com/ anthropics/skills/pull/665, 2026

  45. [45]

    [bug]: experimental install strips subpath from owner/re- po/¡subpath¿ source in skills-lock.json,

    daviddwlee84, “[bug]: experimental install strips subpath from owner/re- po/¡subpath¿ source in skills-lock.json,” https://github.com/vercel-labs/ skills/issues/1005, 2026

  46. [46]

    Not all dependencies are equal: An empirical study on production dependencies in npm,

    J. Latendresse, S. Mujahid, D. E. Costa, and E. Shihab, “Not all dependencies are equal: An empirical study on production dependencies in npm,” inProceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022, pp. 1–12

  47. [47]

    What You See Is Not What You Execute: Memory-Based Runtime SBOM Generation for Supply Chain Security

    H. Alia, A. Case, and I. Ahmed, “What you see is not what you execute: Memory-based runtime sbom generation for supply chain security,”arXiv preprint arXiv:2606.22827, 2026