pith. sign in

hub Canonical reference

Prompt Injection Attack to Tool Selection in LLM Agents

Canonical reference. 100% of citing Pith papers cite this work as background.

21 Pith papers citing it
Background 100% of classified citations
abstract

Tool selection is a key component of LLM agents. A popular approach follows a two-step process - \emph{retrieval} and \emph{selection} - to pick the most appropriate tool from a tool library for a given task. In this work, we introduce \textit{ToolHijacker}, a novel prompt injection attack targeting tool selection in no-box scenarios. ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. Specifically, we formulate the crafting of such tool documents as an optimization problem and propose a two-phase optimization strategy to solve it. Our extensive experimental evaluation shows that ToolHijacker is highly effective, significantly outperforming existing manual-based and automated prompt injection attacks when applied to tool selection. Moreover, we explore various defenses, including prevention-based defenses (StruQ and SecAlign) and detection-based defenses (known-answer detection, DataSentinel, perplexity detection, and perplexity windowed detection). Our experimental results indicate that these defenses are insufficient, highlighting the urgent need for developing new defense strategies.

hub tools

citation-role summary

background 7

citation-polarity summary

years

2026 20 2025 1

roles

background 7

polarities

background 7

clear filters

representative citing papers

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

Agent-native LLMs are substantially more vulnerable to adversarial instructions arriving in tool descriptions than user messages (with the pattern reversing for general-purpose models and inverting again for tool outputs), as quantified by the new Safety Asymmetry Score across six models and three a

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

cs.CR · 2026-05-05 · unverdicted · novelty 6.0

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

Security Considerations for Multi-agent Systems

cs.CR · 2026-03-09 · unverdicted · novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

citing papers explorer

Showing 1 of 1 citing paper after filters.