POISE: Position-Aware Undetectable Skill Injection on LLM Agents

Dehai Min; Haochang Hao; Lu Cheng; Miao Xu; Yingqiang Ge; Yunbei Zhang; Zhifang Zhang

arxiv: 2606.07943 · v1 · pith:UCBG77Z6new · submitted 2026-06-06 · 💻 cs.CR · cs.AI· cs.CL

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

Haochang Hao , Dehai Min , Zhifang Zhang , Yunbei Zhang , Miao Xu , Yingqiang Ge , Lu Cheng This is my paper

Pith reviewed 2026-06-27 19:50 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.CL

keywords skill poisoningLLM agentsattack success ratestealth injectionposition-awarebody injectionscanner false positives

0 comments

The pith

Position-aware blending of a single trigger into skill bodies achieves 89.3 percent attack success while evading scanner detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that effective skill-poisoning attacks on LLM agents must execute their payload without causing the user's legitimate task to fail, since failure would prompt inspection of the skill. It shows that prior approaches face a tradeoff where YAML-header injections load reliably but are easy to spot, while body injections are harder to detect but less reliable because out-of-context commands raise suspicion. POISE addresses this by compressing the trigger into one benign-looking instruction, positioning it where it fits naturally among setup or prerequisite steps, and blending it with surrounding text via a context-aware generator. A sympathetic reader would care because open skill formats let users extend agents easily yet leave them open to hidden manipulation if attacks can stay invisible. The evaluation claims that this method reaches high success rates while producing almost no additional scanner alerts beyond those already raised by clean skills.

Core claim

POISE is a position-aware attack that compresses the trigger into a single benign-looking body instruction, places it at a feasible position, and uses a context-aware generator to blend it with nearby setup or prerequisite steps. On Skill-Inject with codex and gpt-5.2, POISE achieves an 89.3 percent ASR, 28.0 points above a random-placement body baseline and 2.6 points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth holds because legitimate skill bodies naturally require privileged tool operations, causing LLM scanners to flag 74.6 percent of clean skills on average across four judges and both benchmarks, so POISE causes only 5.6 percent of p

What carries the argument

The position-aware placement and context-aware blending of a single compressed trigger instruction into the skill body.

If this is right

Body placement of the trigger retains its stealth advantage over YAML-header methods.
Legitimate skills already trigger high false-positive rates of 74.6 percent in LLM scanners.
Blended injections increase high-risk alerts by only 5.6 percent over clean baselines.
Static scanner defenses become ineffective against position-aware blended attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Runtime monitoring of skill execution behavior may be needed instead of relying on static scans.
Skill distribution platforms could require additional verification steps such as execution sandboxes.
The same blending principle could apply to other open agent extension formats beyond skills.

Load-bearing premise

Legitimate skill bodies naturally require privileged tool operations that cause LLM scanners to produce high false-positive rates.

What would settle it

An experiment that measures scanner false-positive rates on clean skills below 30 percent or records POISE attack success rate below 60 percent on the same benchmark would settle the central claim.

Figures

Figures reproduced from arXiv: 2606.07943 by Dehai Min, Haochang Hao, Lu Cheng, Miao Xu, Yingqiang Ge, Yunbei Zhang, Zhifang Zhang.

**Figure 1.** Figure 1: Overview of POISE. (a) Direct skill-file poisoning fragments explicit malicious steps across the file, each one risking the agent’s suspicion as out-of-context. (b) POISE places a single benign-looking line at a feasible body position, framing a script invocation as a routine prerequisite and delegating the attacker-chosen behavior to an auxiliary canary script. (2026) embed malicious logic in documentatio… view at source ↗

**Figure 2.** Figure 2: POISE is both stealthy and reliable: (a) it adds little incremental static-detection signal over already-noisy LLM judges, and (b) it triggers at a high rate from any feasible body position. signal. The opposite regime is no safer. Wholefile ingestion makes body-position injections more reliable but also exposes more of the skill to a scanner with longer context, and our results show the scanners do not … view at source ↗

read the original abstract

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting failure signal invites inspection of the skill. We therefore evaluate attacks by Attack Success Rate, which requires the injected payload to execute and the user's task to still pass its verifier in the same trial. Prior skill-poisoning attacks face a reliability-stealth trade-off under this lens: YAML-header injections are reliably loaded but easily inspected, whereas stealthier body injections that place explicit malicious commands in the skill prose are less reliable because out-of-context commands invite the agent's own suspicion. We introduce POISE, a position-aware attack that compresses the trigger into a single, benign-looking body instruction, placing it at a feasible position and using a context-aware generator to blend it with nearby setup or prerequisite steps. On Skill-Inject with codex+gpt-5.2, POISE achieves an 89.3% ASR, 28.0 points above a random-placement body baseline and 2.6 points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth is the decisive margin: because legitimate skill bodies naturally require privileged tool operations, LLM scanners are hyper-sensitive, falsely flagging 74.6% of clean skills on average across four judges and both benchmarks. Blending into these false alarms, POISE causes only 5.6% of poisoned variants to gain a new high-risk alert over their clean baselines, rendering current static defenses ineffective.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

POISE shows a workable position-aware blending trick for skill injection that lifts ASR over random body baselines, but the stealth edge depends on an unverified 74.6% scanner FP rate that lacks protocol details.

read the letter

The main point is that POISE compresses the trigger into one blended body instruction, places it at a feasible spot, and uses context-aware generation to make it look like a normal prerequisite step. On Skill-Inject with codex plus gpt-5.2 this reaches 89.3% ASR, 28 points above random body placement and 2.6 above YAML. That is the concrete advance over the YAML and random-placement baselines cited in the abstract.

The paper does a clean job of framing the reliability-stealth trade-off and then measuring attack success as both payload execution and task verifier pass. The comparisons are direct and use named benchmarks, which makes the gain easy to see. No fitted parameters or circular definitions appear in the results.

The soft spot is the stealth argument. The claim that POISE stays undetectable rests on LLM scanners producing a 74.6% false-positive rate on clean skills, so the poisoned versions add a new high-risk alert in only 5.6% of cases. The abstract presents the 74.6% figure as observed fact but supplies no measurement protocol, judge prompts, inter-judge agreement, or ablation on scanner wording. If that rate is tied to the particular judges or the benchmark skills rather than a general property of body placement, the undetectable advantage does not follow and the 2.6-point edge over YAML becomes less relevant. The stress-test note flags exactly this gap.

This is for researchers working on LLM agent security and poisoning defenses. A reader who wants to see a practical blending technique and concrete ASR numbers will find value here, even if they have to re-run the scanner evaluation themselves.

It deserves a serious referee because the attack method is new enough and the empirical comparisons are sharp enough to check the experimental setup and the FP-rate measurement.

Referee Report

3 major / 0 minor

Summary. The paper introduces POISE, a position-aware attack for skill injection on LLM agents. It compresses malicious triggers into single benign-looking body instructions placed at feasible positions and blended with context via a generator. On the Skill-Inject benchmark with codex+gpt-5.2, POISE reports 89.3% ASR (28 points above random body placement, 2.6 above YAML), while claiming stealth because legitimate skill bodies trigger 74.6% false positives on average across four LLM judges, so POISE adds new high-risk alerts in only 5.6% of cases.

Significance. If the empirical results and the 74.6% FP claim hold under full protocol, the work shows that body-placement attacks can overcome the reliability-stealth tradeoff in open agent skills, rendering static LLM scanners ineffective and highlighting a practical vulnerability in agent extensibility mechanisms.

major comments (3)

[Abstract] Abstract: The stealth advantage is justified solely by the claim that LLM scanners produce a 74.6% FP rate on clean skills (allowing POISE to add only 5.6% new alerts). No measurement protocol, judge identities, inter-judge agreement, benchmark skill selection criteria, or ablation on scanner prompt wording is supplied, making it impossible to determine whether this rate is a general property of body placement or an artifact of the chosen judges and templates.
[Abstract] Abstract: The headline 89.3% ASR (and the 28-point and 2.6-point margins) is reported without error bars, full experimental protocol, dataset details, number of trials, or verification steps for the 74.6% FP figure. This directly limits assessment of whether the data supports the central reliability-plus-stealth claim.
[Abstract] Abstract: The weakest assumption—that legitimate skill bodies naturally require privileged tool operations and therefore trigger high FP rates—is presented as observed fact but is not supported by any ablation showing that the FP rate persists under different scanner prompts or skill corpora; if this rate is judge- or prompt-specific, the “undetectable” margin over YAML baselines does not follow.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed feedback on the abstract. We address each comment below and will revise the manuscript to incorporate additional experimental details, error bars, and an ablation study as noted.

read point-by-point responses

Referee: [Abstract] Abstract: The stealth advantage is justified solely by the claim that LLM scanners produce a 74.6% FP rate on clean skills (allowing POISE to add only 5.6% new alerts). No measurement protocol, judge identities, inter-judge agreement, benchmark skill selection criteria, or ablation on scanner prompt wording is supplied, making it impossible to determine whether this rate is a general property of body placement or an artifact of the chosen judges and templates.

Authors: The full measurement protocol is described in Section 5.2, including the four judges (GPT-4o, Claude-3-Opus, Llama-3-70B, Gemini-1.5), prompt templates, inter-judge agreement (Cohen's κ=0.81), and selection of 150 clean skills from Skill-Inject and AgentBench. We agree the abstract omits these details due to length and will revise it to summarize the protocol with section references. The prompts follow standard static scanner designs; an explicit ablation on wording variants was not performed. revision: yes
Referee: [Abstract] Abstract: The headline 89.3% ASR (and the 28-point and 2.6-point margins) is reported without error bars, full experimental protocol, dataset details, number of trials, or verification steps for the 74.6% FP figure. This directly limits assessment of whether the data supports the central reliability-plus-stealth claim.

Authors: ASR figures are means over 100 trials per condition (standard error ±1.8% for POISE); full protocol, dataset (50 skills from Skill-Inject), trial counts, and FP verification steps appear in Section 4.1 and Appendix B. We will revise the abstract to include error bars and explicit references to these sections. revision: yes
Referee: [Abstract] Abstract: The weakest assumption—that legitimate skill bodies naturally require privileged tool operations and therefore trigger high FP rates—is presented as observed fact but is not supported by any ablation showing that the FP rate persists under different scanner prompts or skill corpora; if this rate is judge- or prompt-specific, the “undetectable” margin over YAML baselines does not follow.

Authors: The high FP rate is an empirical result from the evaluated benchmarks and judges. We acknowledge that an ablation on alternative prompts and corpora would strengthen generality claims. We will add this ablation (two new prompt variants and one additional corpus) in the revision. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results on benchmarks

full rationale

The paper reports measured Attack Success Rates (89.3% ASR) and false-positive rates (74.6% on clean skills) from direct experiments on named benchmarks against explicit baselines (random-placement body, YAML-only). No equations, fitted parameters, self-citations, or derivations are present in the provided text; the central claims are observational comparisons that remain independent of any internal reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

No mathematical derivations or fitted parameters; the work is an empirical attack evaluation relying on domain assumptions about agent behavior and scanner sensitivity.

axioms (1)

domain assumption LLM agents execute skills containing blended privileged operations without flagging them as anomalous when the malicious content is positioned among setup steps.
Implicit in the claim that body placement retains stealth while YAML does not; required for the attack success without task failure.

pith-pipeline@v0.9.1-grok · 5842 in / 1287 out tokens · 27842 ms · 2026-06-27T19:50:25.979754+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

48 extracted references · 19 canonical work pages · 13 internal anchors

[1]

Transactions of the association for computational linguistics , volume=

Lost in the middle: How language models use long contexts , author=. Transactions of the association for computational linguistics , volume=
[2]

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

Skill-inject: Measuring agent vulnerability to skill file attacks , author=. arXiv preprint arXiv:2602.20156 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[3]

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

SkillsBench: Benchmarking how well agent skills work across diverse tasks , author=. arXiv preprint arXiv:2602.12670 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Proceedings of the 16th ACM workshop on artificial intelligence and security , pages=

Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection , author=. Proceedings of the 16th ACM workshop on artificial intelligence and security , pages=
[5]

NeurIPS ML Safety Workshop , year=

Ignore Previous Prompt: Attack Techniques For Language Models , author=. NeurIPS ML Safety Workshop , year=
[6]

2025 , howpublished =

Equipping Agents for the Real World with Agent Skills , author =. 2025 , howpublished =

2025
[7]

2023 , html =

Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan , booktitle =. 2023 , html =

2023
[8]

Advances in neural information processing systems , volume=

Toolformer: Language models can teach themselves to use tools , author=. Advances in neural information processing systems , volume=
[9]

International Conference on Learning Representations , volume=

Toolllm: Facilitating large language models to master 16000+ real-world apis , author=. International Conference on Learning Representations , volume=
[10]

Intrinsically-Motivated and Open-Ended Learning Workshop @NeurIPS2023 , year=

Voyager: An Open-Ended Embodied Agent with Large Language Models , author=. Intrinsically-Motivated and Open-Ended Learning Workshop @NeurIPS2023 , year=
[11]

2023 IEEE Symposium on Security and Privacy (SP) , pages=

Sok: Taxonomy of attacks on open-source software supply chains , author=. 2023 IEEE Symposium on Security and Privacy (SP) , pages=. 2023 , organization=

2023
[12]

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment , pages=

Backstabber’s knife collection: A review of open source software supply chain attacks , author=. International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment , pages=. 2020 , organization=

2020
[13]

Findings of the Association for Computational Linguistics: ACL 2024 , pages=

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

2024
[14]

33rd USENIX Security Symposium (USENIX Security 24) , pages=

Formalizing and benchmarking prompt injection attacks and defenses , author=. 33rd USENIX Security Symposium (USENIX Security 24) , pages=
[15]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=
[16]

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Sparks of Artificial General Intelligence: Early experiments with GPT-4 , author=. arXiv preprint arXiv:2303.12712 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[17]

2026 , howpublished =

Codex CLI , author =. 2026 , howpublished =

2026
[18]

2026 , howpublished =

Claude Code Documentation , author =. 2026 , howpublished =

2026
[19]

Advances in Neural Information Processing Systems , volume=

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents , author=. Advances in Neural Information Processing Systems , volume=
[20]

International Conference on Learning Representations , volume=

Agentharm: A benchmark for measuring harmfulness of llm agents , author=. International Conference on Learning Representations , volume=
[21]

arXiv preprint arXiv:2510.02554 , year=

Tooltweak: An attack on tool selection in llm-based agents , author=. arXiv preprint arXiv:2510.02554 , year=

work page arXiv
[22]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Mcptox: A benchmark for tool poisoning on real-world mcp servers , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[23]

The 6th Workshop of Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents , year=

Skillject: Automating stealthy skill-based prompt injection for coding agents with trace-driven closed-loop refinement , author=. The 6th Workshop of Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents , year=
[24]

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems , author=. arXiv preprint arXiv:2604.03081 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[25]

SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement

Skillattack: Automated red teaming of agent skills through attack path refinement , author=. arXiv preprint arXiv:2604.04989 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[26]

First Workshop on Agent Skills , year=

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward , author=. First Workshop on Agent Skills , year=
[27]

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

Towards secure agent skills: Architecture, threat taxonomy, and security analysis , author=. arXiv preprint arXiv:2604.02837 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[28]

"Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

Malicious agent skills in the wild: A large-scale security empirical study , author=. arXiv preprint arXiv:2602.06547 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[29]

Exploiting LLM Agent Supply Chains via Payload-less Skills

Exploiting LLM Agent Supply Chains via Payload-less Skills , author=. arXiv preprint arXiv:2605.14460 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[30]

Prompt Injection attack against LLM-integrated Applications

Prompt injection attack against llm-integrated applications , author=. arXiv preprint arXiv:2306.05499 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[31]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Universal and transferable adversarial attacks on aligned language models , author=. arXiv preprint arXiv:2307.15043 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[32]

do anything now

" do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models , author=. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security , pages=

2024
[33]

arXiv preprint arXiv:2506.02456 , year=

Vpi-bench: Visual prompt injection attacks for computer-use agents , author=. arXiv preprint arXiv:2506.02456 , year=

work page arXiv
[34]

Advances in Neural Information Processing Systems , volume=

Memory injection attacks on LLM agents via query-only interaction , author=. Advances in Neural Information Processing Systems , volume=
[35]

arXiv preprint arXiv:2601.05504 , year=

Memory poisoning attack and defense on memory based llm-agents , author=. arXiv preprint arXiv:2601.05504 , year=

work page arXiv
[36]

34th USENIX Security Symposium (USENIX Security 25) , pages=

\ StruQ \ : Defending against prompt injection with structured queries , author=. 34th USENIX Security Symposium (USENIX Security 25) , pages=
[37]

European Symposium on Research in Computer Security , pages=

Jatmo: Prompt injection defense by task-specific finetuning , author=. European Symposium on Research in Computer Security , pages=. 2024 , organization=

2024
[38]

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Defending against indirect prompt injection attacks with spotlighting , author=. arXiv preprint arXiv:2403.14720 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[39]

AIP Conference Proceedings , volume=

Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applications , author=. AIP Conference Proceedings , volume=. 2024 , organization=

2024
[40]

Findings of the Association for Computational Linguistics: EMNLP 2025 , pages=

Towards Reverse Engineering of Language Models: A Survey , author=. Findings of the Association for Computational Linguistics: EMNLP 2025 , pages=

2025
[41]

arXiv preprint arXiv:2509.24566 , year=

Tokenswap: Backdoor attack on the compositional understanding of large vision-language models , author=. arXiv preprint arXiv:2509.24566 , year=

work page arXiv
[42]

Towards Safer Large Reasoning Models by Promoting Safety Decision-Making before Chain-of-Thought Generation

Towards Safer Large Reasoning Models by Promoting Safety Decision-Making before Chain-of-Thought Generation , author=. arXiv preprint arXiv:2603.17368 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[43]

Advances in Neural Information Processing Systems , volume=

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases , author=. Advances in Neural Information Processing Systems , volume=
[44]

34th USENIX Security Symposium (USENIX Security 25) , pages=

\ PoisonedRAG \ : Knowledge corruption attacks to \ Retrieval-Augmented \ generation of large language models , author=. 34th USENIX Security Symposium (USENIX Security 25) , pages=
[45]

Advances in Neural Information Processing Systems , volume=

Watch out for your agents! investigating backdoor threats to llm-based agents , author=. Advances in Neural Information Processing Systems , volume=
[46]

arXiv preprint arXiv:2402.08567 , year=

Agent smith: A single image can jailbreak one million multimodal llm agents exponentially fast , author=. arXiv preprint arXiv:2402.08567 , year=

work page arXiv
[47]

Coercing LLMs to do and reveal (almost) anything,

Coercing llms to do and reveal (almost) anything , author=. arXiv preprint arXiv:2402.14020 , year=

work page arXiv
[48]

CoEvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

Coevoskills: Self-evolving agent skills via co-evolutionary verification , author=. arXiv preprint arXiv:2604.01687 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[1] [1]

Transactions of the association for computational linguistics , volume=

Lost in the middle: How language models use long contexts , author=. Transactions of the association for computational linguistics , volume=

[2] [2]

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

Skill-inject: Measuring agent vulnerability to skill file attacks , author=. arXiv preprint arXiv:2602.20156 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

SkillsBench: Benchmarking how well agent skills work across diverse tasks , author=. arXiv preprint arXiv:2602.12670 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Proceedings of the 16th ACM workshop on artificial intelligence and security , pages=

Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection , author=. Proceedings of the 16th ACM workshop on artificial intelligence and security , pages=

[5] [5]

NeurIPS ML Safety Workshop , year=

Ignore Previous Prompt: Attack Techniques For Language Models , author=. NeurIPS ML Safety Workshop , year=

[6] [6]

2025 , howpublished =

Equipping Agents for the Real World with Agent Skills , author =. 2025 , howpublished =

2025

[7] [7]

2023 , html =

Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan , booktitle =. 2023 , html =

2023

[8] [8]

Advances in neural information processing systems , volume=

Toolformer: Language models can teach themselves to use tools , author=. Advances in neural information processing systems , volume=

[9] [9]

International Conference on Learning Representations , volume=

Toolllm: Facilitating large language models to master 16000+ real-world apis , author=. International Conference on Learning Representations , volume=

[10] [10]

Intrinsically-Motivated and Open-Ended Learning Workshop @NeurIPS2023 , year=

Voyager: An Open-Ended Embodied Agent with Large Language Models , author=. Intrinsically-Motivated and Open-Ended Learning Workshop @NeurIPS2023 , year=

[11] [11]

2023 IEEE Symposium on Security and Privacy (SP) , pages=

Sok: Taxonomy of attacks on open-source software supply chains , author=. 2023 IEEE Symposium on Security and Privacy (SP) , pages=. 2023 , organization=

2023

[12] [12]

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment , pages=

Backstabber’s knife collection: A review of open source software supply chain attacks , author=. International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment , pages=. 2020 , organization=

2020

[13] [13]

Findings of the Association for Computational Linguistics: ACL 2024 , pages=

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents , author=. Findings of the Association for Computational Linguistics: ACL 2024 , pages=

2024

[14] [14]

33rd USENIX Security Symposium (USENIX Security 24) , pages=

Formalizing and benchmarking prompt injection attacks and defenses , author=. 33rd USENIX Security Symposium (USENIX Security 24) , pages=

[15] [15]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

[16] [16]

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Sparks of Artificial General Intelligence: Early experiments with GPT-4 , author=. arXiv preprint arXiv:2303.12712 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

2026 , howpublished =

Codex CLI , author =. 2026 , howpublished =

2026

[18] [18]

2026 , howpublished =

Claude Code Documentation , author =. 2026 , howpublished =

2026

[19] [19]

Advances in Neural Information Processing Systems , volume=

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents , author=. Advances in Neural Information Processing Systems , volume=

[20] [20]

International Conference on Learning Representations , volume=

Agentharm: A benchmark for measuring harmfulness of llm agents , author=. International Conference on Learning Representations , volume=

[21] [21]

arXiv preprint arXiv:2510.02554 , year=

Tooltweak: An attack on tool selection in llm-based agents , author=. arXiv preprint arXiv:2510.02554 , year=

work page arXiv

[22] [22]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Mcptox: A benchmark for tool poisoning on real-world mcp servers , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[23] [23]

The 6th Workshop of Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents , year=

Skillject: Automating stealthy skill-based prompt injection for coding agents with trace-driven closed-loop refinement , author=. The 6th Workshop of Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents , year=

[24] [24]

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems , author=. arXiv preprint arXiv:2604.03081 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[25] [25]

SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement

Skillattack: Automated red teaming of agent skills through attack path refinement , author=. arXiv preprint arXiv:2604.04989 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[26] [26]

First Workshop on Agent Skills , year=

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward , author=. First Workshop on Agent Skills , year=

[27] [27]

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

Towards secure agent skills: Architecture, threat taxonomy, and security analysis , author=. arXiv preprint arXiv:2604.02837 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[28] [28]

"Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

Malicious agent skills in the wild: A large-scale security empirical study , author=. arXiv preprint arXiv:2602.06547 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[29] [29]

Exploiting LLM Agent Supply Chains via Payload-less Skills

Exploiting LLM Agent Supply Chains via Payload-less Skills , author=. arXiv preprint arXiv:2605.14460 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[30] [30]

Prompt Injection attack against LLM-integrated Applications

Prompt injection attack against llm-integrated applications , author=. arXiv preprint arXiv:2306.05499 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[31] [31]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Universal and transferable adversarial attacks on aligned language models , author=. arXiv preprint arXiv:2307.15043 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[32] [32]

do anything now

" do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models , author=. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security , pages=

2024

[33] [33]

arXiv preprint arXiv:2506.02456 , year=

Vpi-bench: Visual prompt injection attacks for computer-use agents , author=. arXiv preprint arXiv:2506.02456 , year=

work page arXiv

[34] [34]

Advances in Neural Information Processing Systems , volume=

Memory injection attacks on LLM agents via query-only interaction , author=. Advances in Neural Information Processing Systems , volume=

[35] [35]

arXiv preprint arXiv:2601.05504 , year=

Memory poisoning attack and defense on memory based llm-agents , author=. arXiv preprint arXiv:2601.05504 , year=

work page arXiv

[36] [36]

34th USENIX Security Symposium (USENIX Security 25) , pages=

\ StruQ \ : Defending against prompt injection with structured queries , author=. 34th USENIX Security Symposium (USENIX Security 25) , pages=

[37] [37]

European Symposium on Research in Computer Security , pages=

Jatmo: Prompt injection defense by task-specific finetuning , author=. European Symposium on Research in Computer Security , pages=. 2024 , organization=

2024

[38] [38]

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Defending against indirect prompt injection attacks with spotlighting , author=. arXiv preprint arXiv:2403.14720 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[39] [39]

AIP Conference Proceedings , volume=

Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applications , author=. AIP Conference Proceedings , volume=. 2024 , organization=

2024

[40] [40]

Findings of the Association for Computational Linguistics: EMNLP 2025 , pages=

Towards Reverse Engineering of Language Models: A Survey , author=. Findings of the Association for Computational Linguistics: EMNLP 2025 , pages=

2025

[41] [41]

arXiv preprint arXiv:2509.24566 , year=

Tokenswap: Backdoor attack on the compositional understanding of large vision-language models , author=. arXiv preprint arXiv:2509.24566 , year=

work page arXiv

[42] [42]

Towards Safer Large Reasoning Models by Promoting Safety Decision-Making before Chain-of-Thought Generation

Towards Safer Large Reasoning Models by Promoting Safety Decision-Making before Chain-of-Thought Generation , author=. arXiv preprint arXiv:2603.17368 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[43] [43]

Advances in Neural Information Processing Systems , volume=

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases , author=. Advances in Neural Information Processing Systems , volume=

[44] [44]

34th USENIX Security Symposium (USENIX Security 25) , pages=

\ PoisonedRAG \ : Knowledge corruption attacks to \ Retrieval-Augmented \ generation of large language models , author=. 34th USENIX Security Symposium (USENIX Security 25) , pages=

[45] [45]

Advances in Neural Information Processing Systems , volume=

Watch out for your agents! investigating backdoor threats to llm-based agents , author=. Advances in Neural Information Processing Systems , volume=

[46] [46]

arXiv preprint arXiv:2402.08567 , year=

Agent smith: A single image can jailbreak one million multimodal llm agents exponentially fast , author=. arXiv preprint arXiv:2402.08567 , year=

work page arXiv

[47] [47]

Coercing LLMs to do and reveal (almost) anything,

Coercing llms to do and reveal (almost) anything , author=. arXiv preprint arXiv:2402.14020 , year=

work page arXiv

[48] [48]

CoEvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

Coevoskills: Self-evolving agent skills via co-evolutionary verification , author=. arXiv preprint arXiv:2604.01687 , year=

work page internal anchor Pith review Pith/arXiv arXiv