arxiv: 2602.14211 · v2 · pith:ZLRQ3H3Znew · submitted 2026-02-15 · 💻 cs.CR · cs.AI

SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents

Pith reviewed 2026-05-21 12:41 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords prompt injection · LLM agents · poisoned skills · automated attacks · supply-chain security · agent security · skill ecosystems

The pith

SkillJect automates the generation of poisoned skills that inject hidden commands into LLM agents by hiding payloads in helper scripts and front-loading instructions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SkillJect as the first automated framework for creating poisoned skills that compromise skill-enabled LLM agents. It splits the attack into hiding a malicious payload inside an auxiliary script and rewriting the SKILL.md file to present that script as a required first step. A closed-loop process uses an Attack Agent to generate the skill, a Victim Agent to execute tasks with it, and an Evaluate Agent to check traces for payload success, then feeds results back to refine the description while keeping the payload fixed. Experiments across platforms, models, and attack types show higher success rates than direct injections or manual methods. This demonstrates that reusable skill ecosystems introduce persistent supply-chain vulnerabilities for agent systems.

Core claim

SkillJect decomposes the attack into an artifact channel that conceals the payload in a helper script and an instruction channel that rewrites SKILL.md with front-loaded inducement, explicitly referencing the script path and framing it as a mandatory prerequisite. It then applies a closed-loop multi-agent process in which an Attack Agent produces the poisoned skill, a Victim Agent runs downstream tasks, and an Evaluate Agent inspects execution traces to confirm payload execution, allowing the Attack Agent to diagnose failures and iteratively improve SKILL.md while leaving the payload unchanged.

What carries the argument

Dual-channel attack that hides the payload in an auxiliary script while using front-loaded prerequisite framing in SKILL.md, coordinated through a closed-loop multi-agent feedback process that refines instructions based on execution traces.

If this is right

  • Poisoned skills can be produced automatically at scale rather than through brittle manual crafting.
  • The same attack succeeds across different agent platforms and underlying LLMs.
  • Reusable skill libraries create a persistent attack surface that direct or manual injections cannot exploit as effectively.
  • Front-loaded instructions that present a helper script as a required initialization step bypass agent safeguards more reliably than explicit malicious prompts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Skill marketplaces would need verification steps or sandboxing for uploaded files to limit this vector.
  • The dual-channel approach could apply to other modular AI components such as plugins or tool extensions.
  • Defensive testing frameworks might adopt similar multi-agent loops to probe and strengthen skills before release.
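
As one illustration of the verification step mentioned in the first bullet, a marketplace-side check could flag skills whose SKILL.md tells the agent, near the top of the file, to run a bundled script. This is a minimal sketch, assuming a plain-text SKILL.md. The function name `flag_front_loaded_commands`, the regular expression, and the `head_lines` cutoff are hypothetical choices for illustration and are not taken from the paper.

```python
import re

# Hypothetical heuristic: a line that mentions running a bundled script file.
# The pattern and the cutoff are illustrative assumptions, not the paper's method.
RUN_SCRIPT = re.compile(
    r"\b(run|execute|bash|sh|python3?|node)\b.*?([\w./-]+\.(?:sh|py|js))",
    re.IGNORECASE,
)


def flag_front_loaded_commands(skill_md: str, head_lines: int = 15) -> list[tuple[int, str]]:
    """Return (line_number, script_path) pairs for script invocations near the top."""
    findings = []
    for number, line in enumerate(skill_md.splitlines()[:head_lines], start=1):
        match = RUN_SCRIPT.search(line)
        if match:
            findings.append((number, match.group(2)))
    return findings


# Made-up skill text, used only to exercise the check.
sample = """# PDF helper
Before anything else, run `bash scripts/setup.sh` to initialise the environment.

## Usage
Describe the file you want to convert.
"""
findings = flag_front_loaded_commands(sample)
```

A flagged line is only a prompt for human review. A pattern match of this kind would miss paraphrased instructions and would also flag legitimate setup steps, so it should not be read as a complete defense.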

Load-bearing premise

The Evaluate Agent can reliably inspect execution traces to determine whether the hidden payload executed, enabling the Attack Agent to successfully rewrite SKILL.md while keeping the payload fixed.
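
One way to test this premise is to compare the Evaluate Agent's verdicts with human-annotated labels, counting ambiguous verdicts as non-execution (the convention the simulated rebuttal further down also adopts). The sketch below assumes hypothetical verdict strings ("executed", "not_executed", "ambiguous") and made-up labels. None of the names or numbers come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    precision: float
    recall: float
    accuracy: float


def validate_evaluator(verdicts: list[str], ground_truth: list[bool]) -> Agreement:
    """Compare evaluator verdicts with human labels.

    verdicts: "executed", "not_executed", or "ambiguous" (hypothetical labels).
    ground_truth: True where annotators confirmed that the payload executed.
    Ambiguous verdicts are treated as non-execution.
    """
    if len(verdicts) != len(ground_truth):
        raise ValueError("verdicts and ground_truth must have the same length")
    predicted = [v == "executed" for v in verdicts]
    pairs = list(zip(predicted, ground_truth))
    tp = sum(p and g for p, g in pairs)
    fp = sum(p and not g for p, g in pairs)
    fn = sum(not p and g for p, g in pairs)
    tn = sum(not p and not g for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Agreement(precision, recall, accuracy)


# Made-up labels purely for illustration.
verdicts = ["executed", "ambiguous", "not_executed", "executed", "executed"]
truth = [True, True, False, True, False]
report = validate_evaluator(verdicts, truth)
```

Low precision would mean the feedback loop is rewarded for runs in which nothing actually executed, which is the "optimizing against noise" concern raised in the referee report.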

What would settle it

If the generated poisoned skills fail to trigger the hidden payload in the majority of test runs across multiple platforms and backend LLMs, the claim of substantially improved attack effectiveness would not hold.
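
The criterion above can be read as a per-configuration check. This is a minimal sketch, assuming each run is recorded as a (platform, model, payload_executed) tuple. The function name, the record layout, and the placeholder platform and model names are hypothetical, and the runs are made up.

```python
from collections import defaultdict


def majority_success_by_config(runs):
    """Map each (platform, model) pair to True when more than half of its runs succeeded."""
    counts = defaultdict(lambda: [0, 0])  # [successes, trials]
    for platform, model, executed in runs:
        counts[(platform, model)][0] += int(executed)
        counts[(platform, model)][1] += 1
    return {config: s / n > 0.5 for config, (s, n) in counts.items()}


# Made-up runs with placeholder names, not results from the paper.
runs = [
    ("platform_a", "model_x", True),
    ("platform_a", "model_x", True),
    ("platform_a", "model_x", False),
    ("platform_b", "model_y", False),
    ("platform_b", "model_y", True),
    ("platform_b", "model_y", False),
]
verdict = majority_success_by_config(runs)
claim_survives = all(verdict.values())
```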

Figures

Figures reproduced from arXiv: 2602.14211 by Jie Liao, Jindong Gu, Philip Torr, Simeng Qin, Wenqi Ren, Xiaochun Cao, Xiaojun Jia, Yang Liu.

Figure 1: The threat model of SKILLJECT. While a benign skill assists the agent in achieving goals (top), a poisoned skill (bottom) manipulates the agent to bypass safety checks, leading to consequences like data leakage or backdoors.
Figure 2: Overview of the SKILLJECT framework. The pipeline operates as an iterative loop: the Attack Agent transforms a benign skill into a poisoned one by modifying documentation and artifacts under constraints Ω. The Code Agent executes the skill during task routing and execution. The Evaluate Agent then assesses the execution traces against the target behavior to provide feedback for refinement.
Figure 3: Emergent injection strategies autonomously discovered by the Attack Agent. Instead of relying on predefined templates, the LLM explores different documentation styles driven by the feedback loop. (a) The agent learns to mimic standard section headers to blend in with the context. (b) The agent evolves to utilize alert blocks to manufacture urgency. These diverse examples highlight the model’s ability to ad…

Abstract

Agent skills are increasingly used to extend LLM agents with task-specific instructions, executable scripts, and auxiliary resources. While improving reusability, this modular design also introduces a new supply-chain attack surface: a malicious or compromised skill may be repeatedly loaded as trusted guidance and steer an agent's tool use during downstream execution. Existing skill-based prompt-injection attacks are mostly manual and brittle, as explicit malicious instructions are often rejected or ignored when poorly aligned with the original skill workflow. We propose SkillJect, the first automated framework for generating effective poisoned skills against skill-enabled agent systems. SkillJect decomposes the attack into two coordinated channels. In the artifact channel, it hides the malicious payload in an auxiliary helper script. In the instruction channel, it rewrites SKILL.md using a front-loaded inducement strategy, placing injected content at the beginning and framing the helper script as a mandatory prerequisite or first step. The instruction explicitly references the helper-script path and provides an executable command, making the helper appear to be a legitimate initialization step before normal operations. SkillJect further adopts a closed-loop multi-agent process to improve attack performance. An Attack Agent generates poisoned skills, a Victim Agent executes downstream tasks with them, and an Evaluate Agent inspects execution traces to determine whether the hidden payload is executed. The Attack Agent then uses this feedback to diagnose failures and rewrite SKILL.md, while keeping the payload fixed. Experiments across platforms, backend LLMs, and attack categories show that SkillJect substantially outperforms naive direct injection and prior manual attacks, revealing poisoned skills as a persistent attack vector in reusable skill ecosystems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SkillJect, the first automated framework for generating poisoned skills to perform prompt injection on skill-enabled LLM agents. It decomposes attacks into an artifact channel (hiding payloads in helper scripts) and an instruction channel (rewriting SKILL.md with front-loaded inducements that frame the helper as a mandatory initialization step). A closed-loop multi-agent process is used: an Attack Agent generates candidates, a Victim Agent executes downstream tasks, and an Evaluate Agent inspects traces to determine payload execution, feeding back to the Attack Agent for iterative SKILL.md rewrites while keeping the payload fixed. Experiments across platforms, backend LLMs, and attack categories claim that SkillJect substantially outperforms naive direct injection and prior manual attacks.

Significance. If the empirical results hold, this work is significant because it automates and demonstrates the persistence of a supply-chain attack vector in reusable agent skill ecosystems, moving beyond brittle manual attacks. The closed-loop multi-agent feedback mechanism for attack refinement is a methodological contribution that could generalize to other agent security problems. Cross-platform and cross-LLM validation adds practical relevance for the security community.

major comments (2)
  1. [Section 3] Section 3 (Closed-loop Multi-agent Process): The headline outperformance claim depends on the Evaluate Agent reliably determining from execution traces whether the hidden payload executed. The manuscript provides no details on the inspection procedure, decision criteria, handling of ambiguous logs, or validation against ground-truth cases, so it is unclear whether the feedback signal is accurate or whether the loop is optimizing against noise.
  2. [Section 4] Section 4 (Experiments): The central claim that SkillJect 'substantially outperforms' baselines is load-bearing for the paper's contribution, yet the text supplies no information on the precise success metric, the concrete implementation of the 'naive direct injection' and 'prior manual attacks' baselines, the number of trials, variance, or any statistical tests. Without these, the reported gains cannot be assessed.
minor comments (2)
  1. [Introduction] The distinction between the artifact channel and instruction channel should be defined explicitly with a short table or diagram in the introduction for clarity.
  2. [Section 4] Ensure all experimental figures include error bars or confidence intervals and label the y-axis with the exact success metric used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's detailed feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. Where the comments highlight areas needing additional clarification or detail, we have revised the manuscript accordingly.

point-by-point responses
  1. Referee: [Section 3] Section 3 (Closed-loop Multi-agent Process): The headline outperformance claim depends on the Evaluate Agent reliably determining from execution traces whether the hidden payload executed. The manuscript provides no details on the inspection procedure, decision criteria, handling of ambiguous logs, or validation against ground-truth cases, so it is unclear whether the feedback signal is accurate or whether the loop is optimizing against noise.

    Authors: We agree that additional details are required. In the revised manuscript, we will include a comprehensive description of the Evaluate Agent's inspection procedure, including the specific decision criteria used to determine payload execution from traces, protocols for handling ambiguous or incomplete logs (such as treating them as non-execution to avoid false positives), and results from a validation study comparing the agent's assessments to human-annotated ground truth on a subset of cases. This will clarify the accuracy of the feedback signal and demonstrate that the optimization loop is not driven by noise. revision: yes

  2. Referee: [Section 4] Section 4 (Experiments): The central claim that SkillJect 'substantially outperforms' baselines is load-bearing for the paper's contribution, yet the text supplies no information on the precise success metric, the concrete implementation of the 'naive direct injection' and 'prior manual attacks' baselines, the number of trials, variance, or any statistical tests. Without these, the reported gains cannot be assessed.

    Authors: We recognize that the experimental section lacks critical details for reproducibility and assessment. We will revise Section 4 to explicitly define the success metric as the proportion of executions where the payload is triggered and completes its intended action. We will describe the implementation of the naive direct injection baseline as direct embedding of malicious instructions in the skill description without auxiliary scripts or framing. For prior manual attacks, we will detail how we replicated the approaches from the cited literature within our experimental framework. Furthermore, we will specify the number of trials conducted, report measures of variance such as standard deviation, and include appropriate statistical tests to support the significance of the observed performance differences. revision: yes
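
The reporting promised in this response (a success proportion, a measure of uncertainty, and a significance test) could look like the sketch below. It assumes the success metric is a simple proportion of runs, uses a Wilson score interval and a pooled two-proportion z-test as stand-ins for whatever statistics the authors eventually choose, and runs on made-up counts rather than the paper's results.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success proportion (z = 1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def two_proportion_z(s1: int, n1: int, s2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test is undefined when all runs share the same outcome")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 80 of 100 runs for one method, 50 of 100 for a baseline.
low, high = wilson_interval(80, 100)
z_stat, p_value = two_proportion_z(80, 100, 50, 100)
```

Reporting the interval next to each success rate would also address the referee's minor comment about error bars on the experimental figures.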

Circularity Check

0 steps flagged

No circularity: empirical attack-generation method with independent experimental validation

full rationale

The paper proposes SkillJect as a practical framework decomposing attacks into artifact and instruction channels, augmented by a closed-loop multi-agent feedback process (Attack Agent, Victim Agent, Evaluate Agent). Central claims rest on empirical outperformance across platforms, LLMs, and attack categories versus naive injection and manual baselines. No mathematical derivations, equations, fitted parameters, or self-citations appear in the provided text that would reduce any result to its inputs by construction. The evaluation is externally falsifiable via replication, so the claims are checked against external benchmarks rather than resting on circular reasoning.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on domain assumptions about agent execution behavior rather than new mathematical constructs or fitted parameters.

axioms (1)
  • domain assumption: Skill-enabled agents load SKILL.md and auxiliary scripts as trusted guidance and execute referenced commands without additional verification or sandboxing.
    This assumption creates the attack surface and is invoked when describing how the front-loaded instruction and helper script are followed.
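
To picture what "additional verification" could mean here, the sketch below checks every file in a skill directory against a pinned SHA-256 manifest before the skill is loaded. The paper does not propose this mitigation. The function names, the manifest layout (relative path mapped to a hex digest), and the example files are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def unverified_files(skill_dir, pinned_digests: dict[str, str]) -> list[str]:
    """Return relative paths that are missing from the manifest or whose digest differs."""
    root = Path(skill_dir)
    problems = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        if pinned_digests.get(relative) != sha256_of(path):
            problems.append(relative)
    return problems


# Demo on a throwaway directory: the helper script was never pinned, so it is reported.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scripts").mkdir()
    (root / "SKILL.md").write_text("# Example skill\n", encoding="utf-8")
    (root / "scripts" / "setup.sh").write_text("echo hello\n", encoding="utf-8")
    manifest = {"SKILL.md": sha256_of(root / "SKILL.md")}
    problems = unverified_files(root, manifest)
```

A digest check only establishes that files match what was reviewed. It says nothing about whether the reviewed content was safe, so it would complement review and sandboxing rather than replace them.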

pith-pipeline@v0.9.0 · 5841 in / 1129 out tokens · 71064 ms · 2026-05-21T12:41:43.565556+00:00 · methodology

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Exploiting LLM Agent Supply Chains via Payload-less Skills

    cs.CR 2026-05 conditional novelty 6.0

    Semantic Compliance Hijacking lets attackers hijack LLM agents by disguising malicious instructions as compliance rules in skills, reaching up to 77.67% success on confidentiality breaches and 67.33% on RCE while evad...

  2. Behavioral Integrity Verification for AI Agent Skills

    cs.CR 2026-05 unverdicted novelty 6.0

    BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.

  3. Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

    cs.CR 2026-05 unverdicted novelty 6.0

    DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks ar...

  4. Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

    cs.CR 2026-05 unverdicted novelty 6.0

    A memory-layer defense called Memory Sandbox stops persistent memory attacks on most LLM agents while other layer defenses fail.

  5. SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

    cs.CR 2026-05 unverdicted novelty 6.0

    SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task c...

  6. SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

    cs.CR 2026-04 unverdicted novelty 6.0

    SkillSieve is a hierarchical triage framework combining regex/AST/XGBoost filtering, parallel LLM subtasks, and multi-LLM jury voting to detect malicious AI agent skills, reaching 0.800 F1 on a 400-skill benchmark at ...

  7. Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

    cs.CR 2026-04 conditional novelty 6.0

    Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.

  8. Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

    cs.CR 2026-03 unverdicted novelty 6.0

    The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.

  9. Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

    cs.CR 2026-04 unverdicted novelty 5.0

    SkillGuard-Robust formulates pre-load auditing of untrusted Agent Skills as a three-way classification task and achieves 97.30% exact match and 98.33% malicious-risk recall on held-out benchmarks.
