SkillJect: Automating stealthy skill-based prompt injection for coding agents with trace-driven closed-loop refinement. arXiv preprint

Xiaojun Jia, Jie Liao, Simeng Qin, Jindong Gu, Wenqi Ren, Xiaochun Cao, Yang Liu, Philip Torr · 2026 · cs.CR · arXiv 2602.14211

18 Pith papers cite this work. Polarity classification is still indexing.

abstract

Agent skills extend LLM agents with task-specific instructions, executable scripts, and auxiliary resources, improving reusability but creating a new supply-chain attack surface. A malicious or compromised skill can be repeatedly loaded as trusted guidance and steer downstream tool use. Existing skill-based prompt-injection attacks are often manual and brittle, because explicit malicious instructions are rejected or ignored when they are not aligned with the original workflow. We propose SkillJect, the first automated framework for generating poisoned skills against skill-enabled agent systems. SkillJect uses two coordinated channels. In the artifact channel, it hides the payload inside an auxiliary helper script. In the instruction channel, it rewrites SKILL.md with a front-loaded inducement strategy, placing injected content at the beginning and framing the helper script as a mandatory prerequisite or initialization step. The rewritten instruction explicitly references the helper-script path and provides an executable example command, making the helper appear to be a legitimate setup step before normal skill operations. SkillJect further adopts a closed-loop multi-agent process to improve attack effectiveness. An Attack Agent generates poisoned skills, a Victim Agent executes downstream tasks with the poisoned skill, and an Evaluate Agent inspects execution traces to determine whether the hidden payload was executed. The Attack Agent then uses this feedback to diagnose failure causes and rewrite SKILL.md, while keeping the payload fixed. Experiments across skill-enabled platforms, backend LLMs, and attack categories show that SkillJect substantially outperforms naive direct injection and prior manual skill-injection attacks, highlighting poisoned skills as a persistent threat in reusable skill ecosystems.

citation-role summary

background: 2 · baseline: 1

representative citing papers

MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills

cs.CR · 2026-06-05 · unverdicted · novelty 8.0

MalSkillBench supplies the first sandbox-verified dataset of malicious agent skills and shows that existing detectors achieve high recall on code injection but collapse on prompt injection and agent-control attacks.

Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware

cs.CR · 2026-07-02 · unverdicted · novelty 7.0

SkillCloak evades existing static scanners for agent skill malware at high rates, while SkillDetonate detects 97% of attacks at 2% false-positive rate using sandboxed runtime behavior analysis.

Dynamic Malicious Skills in Agentic AI

cs.CR · 2026-06-15 · unverdicted · novelty 7.0

Attackers can dynamically inject malicious logic into benign AI agent skills by embedding instructions in documentation like SKILL.md, demonstrated on OpenHands and Claude Code, with a kernel read-only mount defense proposed.

Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

cs.CR · 2026-05-08 · unverdicted · novelty 7.0

Memory Sandbox at the memory layer reduces persistent memory attack success rate to 0% for eight of nine models with no utility cost, while input-level and retrieval-level defenses achieve near-baseline attack success rates of 88-89%.

The Decomposition Is the Fingerprint: Per-Component Identity for Agent Skills

cs.CR · 2026-06-30 · unverdicted · novelty 6.0

A per-component SimHash fingerprint supplies structural identity for AI agent skills, recovering family membership under paraphrase and refactoring with AUC 0.974 while localizing changes.

ActPlane: Programmable OS-Level Policy Enforcement for Agent Harnesses

cs.OS · 2026-06-23 · unverdicted · novelty 6.0

ActPlane introduces an OS-kernel policy engine using an information-flow control DSL and eBPF to enforce agent harness policies, achieving better compliance on indirect paths with 1.9-8.4% overhead.

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

cs.SE · 2026-05-30 · unverdicted · novelty 6.0

About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.

Exploiting LLM Agent Supply Chains via Payload-less Skills

cs.CR · 2026-05-14 · conditional · novelty 6.0

Semantic Compliance Hijacking lets attackers hijack LLM agents by disguising malicious instructions as compliance rules in skills, reaching up to 77.67% success on confidentiality breaches and 67.33% on RCE while evading all tested scanners.

Behavioral Integrity Verification for AI Agent Skills

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.

Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

cs.CR · 2026-05-11 · unverdicted · novelty 6.0

DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

cs.CR · 2026-04-06 · conditional · novelty 6.0

Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.

Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners

cs.CR · 2026-06-16 · unverdicted · novelty 5.0

SkillCamo conceals malicious instructions in images within agent skills to bypass text-based scanners, while ExecScan improves detection via joint multimodal and execution-grounded analysis.

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

cs.CR · 2026-05-30 · unverdicted · novelty 5.0

SkillVetBench is a two-stage benchmark combining natural-language semantic vetting and instrumented sandbox execution to detect and provide runtime evidence for malicious skills in open agent platforms, with experiments showing static methods miss up to 89% of threats.

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

cs.CR · 2026-04-28 · unverdicted · novelty 5.0

SkillGuard-Robust formulates pre-load auditing of untrusted Agent Skills as a three-way classification task and achieves 97.30% exact match and 98.33% malicious-risk recall on held-out benchmarks.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

cs.CR · 2026-04-08

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

cs.CR · 2026-03-28

citing papers explorer

MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills

cs.CR · 2026-06-05 · unverdicted · none · ref 23 · internal anchor

Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware

cs.CR · 2026-07-02 · unverdicted · none · ref 18 · internal anchor

Dynamic Malicious Skills in Agentic AI

cs.CR · 2026-06-15 · unverdicted · none · ref 3 · internal anchor

Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

cs.CR · 2026-05-08 · unverdicted · none · ref 24 · internal anchor

The Decomposition Is the Fingerprint: Per-Component Identity for Agent Skills

cs.CR · 2026-06-30 · unverdicted · none · ref 20 · internal anchor

Behavioral Integrity Verification for AI Agent Skills

cs.CR · 2026-05-12 · unverdicted · none · ref 46 · internal anchor

Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

cs.CR · 2026-05-11 · unverdicted · none · ref 8 · internal anchor

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

cs.CR · 2026-05-07 · unverdicted · none · ref 23 · internal anchor

Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners

cs.CR · 2026-06-16 · unverdicted · none · ref 3 · internal anchor

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

cs.CR · 2026-05-30 · unverdicted · none · ref 25 · internal anchor

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

cs.CR · 2026-04-28 · unverdicted · none · ref 8 · internal anchor

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · none · ref 76 · internal anchor
