hub Canonical reference

Prompt Injection Attack to Tool Selection in LLM Agents

Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun · 2025 · cs.CR · arXiv 2504.19793

Canonical reference. 100% of citing Pith papers cite this work as background.

24 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 24 citing papers arXiv PDF

abstract

Tool selection is a key component of LLM agents. A popular approach follows a two-step process - \emph{retrieval} and \emph{selection} - to pick the most appropriate tool from a tool library for a given task. In this work, we introduce \textit{ToolHijacker}, a novel prompt injection attack targeting tool selection in no-box scenarios. ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. Specifically, we formulate the crafting of such tool documents as an optimization problem and propose a two-phase optimization strategy to solve it. Our extensive experimental evaluation shows that ToolHijacker is highly effective, significantly outperforming existing manual-based and automated prompt injection attacks when applied to tool selection. Moreover, we explore various defenses, including prevention-based defenses (StruQ and SecAlign) and detection-based defenses (known-answer detection, DataSentinel, perplexity detection, and perplexity windowed detection). Our experimental results indicate that these defenses are insufficient, highlighting the urgent need for developing new defense strategies.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7

citation-polarity summary

background 7

representative citing papers

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

cs.CR · 2026-05-27 · unverdicted · novelty 8.0

Roughly 1% of real resumes contain hidden prompt injections against LLM screeners, prevalence has risen over 1-2 years, and over 90% avoid explicit instructions.

Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions

cs.CR · 2026-05-08 · conditional · novelty 8.0

Agentic Workflow Injection is a new injection vulnerability class in LLM-augmented GitHub Actions, with two patterns (P2A and P2S) detected via the TaintAWI tool yielding 496 confirmed exploitable instances across 13,392 workflows.

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

Agent-native LLMs are substantially more vulnerable to adversarial instructions arriving in tool descriptions than user messages (with the pattern reversing for general-purpose models and inverting again for tool outputs), as quantified by the new Safety Asymmetry Score across six models and three a

AIRGuard: Guarding Agent Actions with Runtime Authority Control

cs.CR · 2026-05-27 · unverdicted · novelty 7.0

AIRGuard is a runtime authority-control layer for tool-using agents that reduces attack success on AgentTrap from 36.3% to 5.5% while retaining higher benign utility than ARGUS or MELON on DTAP-150.

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

Model-adaptive tool necessity shows 26-54% mismatch with actual tool calls across LLMs, driven by nearly orthogonal hidden-state signals for cognition versus action.

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

cs.CR · 2026-05-13 · unverdicted · novelty 7.0

Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.

FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.

The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.

ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems

cs.AI · 2026-04-06 · unverdicted · novelty 7.0

ShieldNet detects supply-chain poisoned tools in LLM agents by monitoring network interactions with a MITM proxy and lightweight classifier, reaching 0.995 F1 and 0.8% false positives on a new benchmark of 25+ attack types.

The Surface You Test Is Not the Surface That Breaks

cs.CR · 2026-05-28 · unverdicted · novelty 6.0

Prompt injection vulnerability in tool-augmented LLMs is a model-surface interaction rather than a fixed channel property; the same payload inverts success rates across models, and adaptive attack rate exceeds single-surface baselines by 9.1 pp on average.

Behavioral Integrity Verification for AI Agent Skills

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

cs.CR · 2026-05-05 · unverdicted · novelty 6.0

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.

BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

cs.CR · 2026-04-10 · unverdicted · novelty 6.0

BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.

Security Considerations for Multi-agent Systems

cs.CR · 2026-03-09 · unverdicted · novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

cs.CR · 2026-05-30 · unverdicted · novelty 5.0

SkillVetBench is a two-stage benchmark combining natural-language semantic vetting and instrumented sandbox execution to detect and provide runtime evidence for malicious skills in open agent platforms, with experiments showing static methods miss up to 89% of threats.

VIPER-MCP: Detecting and Exploiting Taint-Style Vulnerabilities in Model Context Protocol Servers

cs.CR · 2026-05-20 · unverdicted · novelty 5.0

VIPER-MCP detects and exploits taint-style vulnerabilities in Model Context Protocol servers via anchor-query static analysis and feedback-driven prompt evolution, uncovering 106 zero-day vulnerabilities across 39,884 repositories with 67 CVEs assigned.

CapSeal: Capability-Sealed Secret Mediation for Secure Agent Execution

cs.CR · 2026-04-18 · unverdicted · novelty 5.0

CapSeal introduces a capability-sealed broker architecture that lets AI agents perform constrained secret-using actions without ever receiving the secrets themselves.

Your LLM Agents are Temporally Blind: The Misalignment Between Tool Use Decisions and Human Time Perception

cs.CL · 2025-10-27 · unverdicted · novelty 5.0

LLM agents exhibit temporal blindness, achieving no better than 65% normalized alignment with human preferences on tool-use decisions across time-sensitive scenarios in the new TicToc dataset.

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

cs.CR · 2026-04-30 · conditional · novelty 4.0

The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

cs.AI · 2026-04-11 · unverdicted · novelty 4.0

STARS fuses static priors and contextual risk scoring for agent skill invocations, achieving modest AUPRC gains on prompt injection attacks in a new SIA-Bench but concluding it supplements rather than replaces static auditing.

How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study

cs.CR · 2026-04-03

citing papers explorer

Showing 7 of 7 citing papers after filters.

Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions cs.CR · 2026-05-08 · conditional · none · ref 16 · internal anchor
Agentic Workflow Injection is a new injection vulnerability class in LLM-augmented GitHub Actions, with two patterns (P2A and P2S) detected via the TaintAWI tool yielding 496 confirmed exploitable instances across 13,392 workflows.
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems cs.CR · 2026-05-12 · unverdicted · none · ref 49 · internal anchor
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck cs.CR · 2026-05-11 · unverdicted · none · ref 20 · internal anchor
PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.
Behavioral Integrity Verification for AI Agent Skills cs.CR · 2026-05-12 · unverdicted · none · ref 29 · internal anchor
BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis cs.CR · 2026-05-01 · unverdicted · none · ref 39 · internal anchor
Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.
Security Considerations for Multi-agent Systems cs.CR · 2026-03-09 · unverdicted · none · ref 39 · internal anchor
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study cs.CR · 2026-04-03 · unreviewed · ref 52 · internal anchor

Prompt Injection Attack to Tool Selection in LLM Agents

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer