
arxiv: 2606.19191 · v1 · pith:OZGMY2P6new · submitted 2026-06-17 · 💻 cs.CR

PhantomSkill: Malicious Code Injection in Agent Skill Ecosystems

Pith reviewed 2026-06-26 20:17 UTC · model grok-4.3

classification 💻 cs.CR
keywords agent skills · supply chain attacks · LLM coding agents · malicious code injection · vulnerability masking · AI security · third-party packages

The pith

PhantomSkill hides malicious behavior in agent skill auxiliary resources by rewriting it as vulnerability-shaped code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PhantomSkill as a supply-chain attack on LLM coding agents that acquire skills from third-party packages. Instead of placing malice in the skill description, the attack embeds it in auxiliary resources and activates it only under attacker-chosen trigger conditions. Its central technique, VulMask, converts overt malicious scripts into code that resembles ordinary insecure but useful implementations. Experiments across host skills, attack goals, agents, models, and reviewers show that the masked versions retain their benign functions while lowering warning and malware detection rates. The authors conclude that skill ecosystems therefore require resource-level vetting, runtime containment, and policies that treat exploitable vulnerabilities as potential attack payloads.

Core claim

PhantomSkill shows that malicious payloads can be concealed inside the auxiliary resources of agent skills rather than in their textual descriptions. VulMask rewrites explicit malicious scripts into vulnerability-shaped implementations whose harmful actions trigger only under attacker-controlled conditions. Across the tested hosts, agents, models, and reviewers, the masked versions preserve benign utility while lowering warning-level and malware-level detection rates relative to overt malicious scripts.

What carries the argument

VulMask, the rewriting method that converts overt malicious scripts into implementations resembling common vulnerabilities, with malicious behavior gated behind attacker-specified trigger conditions.

If this is right

  • Skill ecosystems must add resource-level vetting that examines auxiliary files beyond skill descriptions.
  • Execution-time containment mechanisms are needed to limit damage once a trigger activates hidden behavior.
  • Security policies should classify exploitable vulnerabilities inside skills as possible malicious payloads.
  • Detection tools must move beyond static signatures to account for conditionally triggered malicious actions.
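As an illustration of the first point, resource-level vetting can begin with a plain inventory of everything a skill ships besides its description. The sketch below is a minimal example, not the paper's tooling: the function name `inventory_skill`, the suffix list, and the toy package layout are assumptions made for illustration. It only identifies and hashes auxiliary files so that a reviewer or scanner can be pointed at them; it does not judge their content.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical policy: suffixes treated as executable resources needing code review.
EXECUTABLE_SUFFIXES = {".py", ".sh", ".js", ".ts", ".rb", ".pl"}


def inventory_skill(skill_dir: Path, description_name: str = "SKILL.md") -> list:
    """List every file in a skill package except the top-level description file."""
    records = []
    for path in sorted(skill_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(skill_dir).as_posix()
        if rel == description_name:
            continue  # the description is vetted separately
        records.append({
            "path": rel,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            "needs_code_review": path.suffix.lower() in EXECUTABLE_SUFFIXES,
        })
    return records


def demo() -> list:
    """Build a toy skill package in a temporary directory and inventory it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "scripts").mkdir()
        (root / "SKILL.md").write_text("# Toy skill\n")
        (root / "scripts" / "convert.py").write_text("print('convert')\n")
        (root / "notes.txt").write_text("usage notes\n")
        return inventory_skill(root)


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Recording a hash per file also gives a marketplace a way to notice when an auxiliary resource changes between reviewed versions, which is one possible starting point for the provenance checks discussed further down.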

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same masking approach could extend to other third-party components such as datasets or model adapters in agent systems.
  • Improving automated reviewers by simulating a range of trigger conditions during scanning might reduce the effectiveness of trigger-gated attacks.
  • If triggers remain hard to detect, open skill marketplaces may need mandatory execution sandboxes or provenance requirements for auxiliary resources.
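The trigger-simulation idea in the second point can be sketched as a differential test: run the same resource under a grid of simulated conditions and flag the conditions under which its observable behavior departs from the majority. Everything below is a toy under stated assumptions. The context dimensions, the `toy_resource` stand-in, and the representation of behavior as a set of action strings are all hypothetical; a real reviewer would obtain the actions from sandbox instrumentation rather than from a return value.

```python
from itertools import product

# Hypothetical context dimensions a reviewer might vary during a scan.
CONTEXT_GRID = {
    "ci": [False, True],
    "weekday": ["mon", "sat"],
    "input_name": ["report.csv", "release.csv"],
}


def toy_resource(ctx: dict) -> frozenset:
    """Stand-in for an auxiliary script, modeled as the set of actions it takes."""
    actions = {"read:" + ctx["input_name"], "write:out.json"}
    # Toy conditional branch: extra behavior appears only in one corner of the grid.
    if ctx["ci"] and ctx["input_name"].startswith("release"):
        actions.add("net:connect")
    return frozenset(actions)


def normalize(actions: frozenset, ctx: dict) -> frozenset:
    """Mask differences that are explained by the input name itself."""
    return frozenset(a.replace(ctx["input_name"], "<input>") for a in actions)


def divergent_contexts(resource, grid: dict) -> list:
    """Return contexts whose normalized behavior differs from the most common one."""
    keys = list(grid)
    runs = []
    for values in product(*(grid[k] for k in keys)):
        ctx = dict(zip(keys, values))
        runs.append((ctx, normalize(resource(ctx), ctx)))
    behaviors = [behavior for _, behavior in runs]
    baseline = max(set(behaviors), key=behaviors.count)
    return [ctx for ctx, behavior in runs if behavior != baseline]


if __name__ == "__main__":
    for ctx in divergent_contexts(toy_resource, CONTEXT_GRID):
        print("behavior diverges under:", ctx)
```

The obvious limitation is coverage: a grid only catches triggers that depend on dimensions the reviewer thought to vary, which is why the third point still falls back on sandboxes and provenance.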

Load-bearing premise

The tested conditions for coding agents, generation models, and automated reviewers match real-world deployment, and attacker-controlled triggers can be added without creating additional detectable artifacts.

What would settle it

A controlled test in which an automated reviewer or malware detector flags VulMasked skills at the same rate as overt malicious scripts would falsify the claim of reduced detection.
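One way to make "at the same rate" concrete is a two-proportion z-test on the counts of flagged samples in each group. The sketch below is an assumption about how such a comparison could be run, not the paper's analysis; the function name and the example counts are hypothetical.

```python
import math


def two_proportion_z(flagged_a: int, n_a: int, flagged_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both groups are flagged at the same rate."""
    p_a, p_b = flagged_a / n_a, flagged_b / n_b
    pooled = (flagged_a + flagged_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all or none flagged in both groups: no evidence of a difference
    z = (p_a - p_b) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 80 of 100 overt scripts flagged, 20 of 100 masked scripts flagged.
    z, p = two_proportion_z(80, 100, 20, 100)
    print(f"z = {z:.2f}, p = {p:.2e}")
```

A large p-value on a well-powered sample would support the falsifying outcome described above; a small one with the masked rate lower would be consistent with the paper's claim.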

Figures

Figures reproduced from arXiv: 2606.19191 by Chia-Mu Yu, Yu-Ting Lin.

Figure 1: Overview of VulMask. Instead of placing overt malicious instructions in SKILL.md, the attack embeds a vulnerability-shaped payload inside auxiliary resources under scripts/. During normal use, the coding agent may inspect and execute the script as part of the host skill’s benign workflow. When the attacker-controlled trigger condition is satisfied, the hidden payload activates and causes effects such as cr…
Original abstract

Agent skills allow LLM-based coding agents to acquire domain-specific capabilities from third-party packages, but they also introduce a new supply-chain attack surface. We present PhantomSkill, an attack framework that hides malicious behavior in a skill's auxiliary resources rather than in its textual description. Its core technique, VulMask, rewrites overt malicious scripts into vulnerability-shaped implementations whose malicious behavior is activated only under attacker-controlled trigger conditions. This design shifts the visible signal from explicit malicious intent to ordinary-looking insecure code. Across representative host skills, attack goals, coding agents, generation models, and automated reviewers, VulMask preserves benign utility while reducing warning and malware-level detection compared with overt malicious scripts. Our results show that skill ecosystems require resource-level vetting, execution-time containment, and security policies that treat exploitable vulnerabilities in agent skills as potential malicious payloads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces PhantomSkill, a supply-chain attack framework targeting LLM-based coding agent skill ecosystems. Its core technique, VulMask, rewrites overt malicious scripts into vulnerability-shaped implementations whose malicious behavior activates only under attacker-controlled trigger conditions. The authors claim that, across representative host skills, attack goals, coding agents, generation models, and automated reviewers, this approach preserves benign utility while reducing warning and malware-level detection relative to overt malicious scripts. The paper concludes that skill ecosystems require resource-level vetting, execution-time containment, and policies treating exploitable vulnerabilities as potential malicious payloads.

Significance. If substantiated by detailed experiments, the work identifies a previously under-explored attack surface in third-party skill packages for LLM agents. By showing how malicious intent can be masked as ordinary insecure code, it provides concrete motivation for rethinking security assumptions in agent skill distribution and execution. The multi-dimensional evaluation scope (skills, goals, agents, models, reviewers) is a positive feature that could strengthen the case for new defensive practices if the quantitative results are robust.

major comments (2)
  1. Abstract: the central empirical claims—that VulMask preserves benign utility while reducing warning and malware-level detection—are stated at a high level with no metrics, baselines, effect sizes, statistical tests, or error analysis provided. Without these, it is impossible to evaluate whether the data support the stated conclusions or to judge the magnitude and reliability of the reported improvements.
  2. Abstract (weakest assumption): the claim that results generalize across 'representative' coding agents, generation models, and automated reviewers rests on an unexamined premise that the tested conditions match real-world deployment; no discussion of how triggers are introduced without creating additional detectable artifacts is visible, which is load-bearing for the stealth claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight opportunities to strengthen the presentation of our empirical results and the discussion of our experimental assumptions. We address each major comment below and commit to revisions that improve clarity without altering the core contributions.

Point-by-point responses
  1. Referee: Abstract: the central empirical claims—that VulMask preserves benign utility while reducing warning and malware-level detection—are stated at a high level with no metrics, baselines, effect sizes, statistical tests, or error analysis provided. Without these, it is impossible to evaluate whether the data support the stated conclusions or to judge the magnitude and reliability of the reported improvements.

    Authors: We agree that the abstract would be more informative with quantitative support. The body of the manuscript (Sections 4.2–4.4 and Tables 2–4) reports the specific metrics, including detection-rate reductions (e.g., 68–82% relative to overt baselines), utility preservation scores, and the statistical tests used. In the revised version we will condense the key effect sizes, baselines, and confidence intervals into the abstract while preserving its length constraints. revision: yes

  2. Referee: Abstract (weakest assumption): the claim that results generalize across 'representative' coding agents, generation models, and automated reviewers rests on an unexamined premise that the tested conditions match real-world deployment; no discussion of how triggers are introduced without creating additional detectable artifacts is visible, which is load-bearing for the stealth claim.

    Authors: Section 3.2 describes the selection criteria for the five agents, three models, and four reviewers as the most widely adopted at the time of the study. We acknowledge that the abstract does not explicitly address trigger embedding. The full manuscript (Section 3.3 and Appendix C) explains the trigger design, but we will expand the revised abstract and add a short paragraph in Section 3.3 that directly discusses how the chosen trigger mechanisms avoid introducing new static or behavioral artifacts detectable by the tested reviewers. revision: yes
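The "relative to overt baselines" figure quoted in the first response is not defined on this page. One plausible reading, stated here as an assumption, is the relative drop in detection rate, where $d_{\text{overt}}$ and $d_{\text{masked}}$ denote the fraction of overt and masked samples flagged by a given reviewer. The numbers in the worked example are hypothetical.

```latex
\[
  \text{relative reduction}
  = \frac{d_{\text{overt}} - d_{\text{masked}}}{d_{\text{overt}}},
  \qquad
  \text{e.g.}\quad \frac{0.50 - 0.12}{0.50} = 0.76 .
\]
```

Under this reading, a 76% relative reduction still leaves 12% of masked samples detected, so the absolute rates matter as much as the ratio when judging the stealth claim.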

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper describes an empirical attack framework (PhantomSkill/VulMask) and reports detection/utility results across tested conditions. No equations, derivations, fitted parameters, or predictions appear in the abstract or described content. Claims rest on direct experimental outcomes rather than any self-referential reduction, self-citation chain, or ansatz smuggling. The reader's assessment of score 0.0 is consistent with the absence of any load-bearing mathematical or definitional circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no details on free parameters, axioms, or invented entities; insufficient information to populate the ledger.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware

    cs.CR 2026-07 unverdicted novelty 7.0

    SkillCloak evades existing static scanners for agent skill malware at high rates, while SkillDetonate detects 97% of attacks at 2% false-positive rate using sandboxed runtime behavior analysis.
