Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

· 2026 · cs.CR · arXiv 2604.02837

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

open full Pith review browse 6 citing papers arXiv PDF

abstract

Agent Skills is an emerging open standard that defines a modular, filesystem-based packaging format enabling LLM-based agents to acquire domain-specific expertise on demand. Despite rapid adoption across multiple agentic platforms and the emergence of large community marketplaces, the security properties of Agent Skills have not been systematically studied. This paper presents the first comprehensive security analysis of the Agent Skills framework. We define the full lifecycle of an Agent Skill across four phases -- Creation, Distribution, Deployment, and Execution -- and identify the structural attack surface each phase introduces. Building on this lifecycle analysis, we construct a threat taxonomy comprising seven categories and seventeen scenarios organized across three attack layers, grounded in both architectural analysis and real-world evidence. We validate the taxonomy through analysis of five confirmed security incidents in the Agent Skills ecosystem. Based on these findings, we discuss defense directions for each threat category, identify open research challenges, and provide actionable recommendations for stakeholders. Our analysis reveals that the most severe threats arise from structural properties of the framework itself, including the absence of a data-instruction boundary, a single-approval persistent trust model, and the lack of mandatory marketplace security review, and cannot be addressed through incremental mitigations alone.

representative citing papers

Sealing the Audit-Runtime Gap for LLM Skills

cs.CR · 2026-05-06 · unverdicted · novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

cs.CR · 2026-05-13 · conditional · novelty 6.0

AgentTrap shows that current LLM agents typically complete user tasks while silently accepting unsafe side effects from malicious third-party skills rather than refusing them.

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

cs.CL · 2026-04-27 · unverdicted · novelty 6.0 · 2 refs

SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

cs.CR · 2026-05-01 · unverdicted · novelty 4.0

Proposes a trust schema including verification levels and a biconditional correctness criterion to verify skills in human-in-the-loop agent runtimes, reducing the need for constant oversight.

Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents

cs.SE · 2026-05-21

citing papers explorer

Showing 6 of 6 citing papers.

Sealing the Audit-Runtime Gap for LLM Skills cs.CR · 2026-05-06 · unverdicted · none · ref 26 · internal anchor
SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills cs.CR · 2026-05-13 · conditional · none · ref 4 · internal anchor
AgentTrap shows that current LLM agents typically complete user tasks while silently accepting unsafe side effects from malicious third-party skills rather than refusing them.
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces cs.CR · 2026-05-12 · unverdicted · none · ref 65 · internal anchor
SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills cs.CL · 2026-04-27 · unverdicted · none · ref 11 · 2 links · internal anchor
SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.
Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes cs.CR · 2026-05-01 · unverdicted · none · ref 15 · internal anchor
Proposes a trust schema including verification levels and a biconditional correctness criterion to verify skills in human-in-the-loop agent runtimes, reducing the need for constant oversight.
Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents cs.SE · 2026-05-21 · unreviewed · ref 3 · internal anchor

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

fields

years

verdicts

representative citing papers

citing papers explorer