The paper defines Agent Skill Supply Chains (ASSCs) and SkillDepAnalyzer to extract and analyze dependency graphs from over 1.43 million LLM agent skills, revealing structural patterns and security signals.
SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement
10 Pith papers cite this work. Polarity classification is still indexing.
abstract
LLM-based agent systems increasingly rely on agent skills sourced from open registries to extend their capabilities, yet the openness of such ecosystems makes skills difficult to thoroughly vet. Existing attacks rely on injecting malicious instructions into skills, making them easily detectable by static auditing. However, non-malicious skills may also harbor latent vulnerabilities that an attacker can exploit solely through adversarial prompting, without modifying the skill itself. We introduce SkillAttack, a red-teaming framework that dynamically verifies skill vulnerability exploitability through adversarial prompting. SkillAttack combines vulnerability analysis, surface-parallel attack generation, and feedback-driven exploit refinement into a closed-loop search that progressively converges toward successful exploitation. Experiments across 10 LLMs on 71 adversarial and 100 real-world skills show that SkillAttack outperforms all baselines by a wide margin (ASR 0.73--0.93 on adversarial skills, up to 0.26 on real-world skills), revealing that even well-intended skills pose serious security risks under realistic agent interactions.
citation-role summary
citation-polarity summary
years
2026 10verdicts
UNVERDICTED 10roles
background 3polarities
background 3representative citing papers
SkillHarm benchmark shows current AI agents are vulnerable to lifecycle-aware skill poisoning with success rates up to 86.3% for fixed-payload attacks and 69.3% for self-mutating attacks.
SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.
Runtime Skill Audit introduces targeted runtime probing to detect malicious LLM agent skills, reporting 90% accuracy and resilience to self-evolving attacks on 100 skills versus static baselines.
POISE is a stealthy skill-poisoning attack achieving 89.3% ASR on Skill-Inject by blending a compressed trigger into contextually appropriate positions in skill bodies, outperforming YAML and random-placement baselines while evading static scanners.
DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.
Contractual skills framework structures SKILL.md files as readable task contracts; A/B tests on synthetic tasks show mean quality rising from 4.692 to 4.914 and critical-error rate falling from 0.083 to 0.013 across models.
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.
citing papers explorer
-
Skills Are Not Islands: Measuring Dependency and Risk in Agent Skill Supply Chains
The paper defines Agent Skill Supply Chains (ASSCs) and SkillDepAnalyzer to extract and analyze dependency graphs from over 1.43 million LLM agent skills, revealing structural patterns and security signals.
-
SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction
SkillHarm benchmark shows current AI agents are vulnerable to lifecycle-aware skill poisoning with success rates up to 86.3% for fixed-payload attacks and 69.3% for self-mutating attacks.
-
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.
-
Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security
Runtime Skill Audit introduces targeted runtime probing to detect malicious LLM agent skills, reporting 90% accuracy and resilience to self-evolving attacks on 100 skills versus static baselines.
-
POISE: Position-Aware Undetectable Skill Injection on LLM Agents
POISE is a stealthy skill-poisoning attack achieving 89.3% ASR on Skill-Inject by blending a compressed trigger into contextually appropriate positions in skill bodies, outperforming YAML and random-placement baselines while evading static scanners.
-
Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw
DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.
-
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
-
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills
SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.
-
Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents
Contractual skills framework structures SKILL.md files as readable task contracts; A/B tests on synthetic tasks show mean quality rising from 4.692 to 4.914 and critical-error rate falling from 0.083 to 0.013 across models.
-
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.