MalTool: Malicious Tool Attacks on LLM Agents

2026 · cs.CR · arXiv 2602.12194

8 Pith papers cite this work. Polarity classification is still indexing.


abstract

In a malicious tool attack, an attacker uploads a malicious tool to a distribution platform; once a user inadvertently installs the tool and the LLM agent selects it during task execution, the tool can compromise the user's security and privacy. Prior work focuses on manipulating tool names and descriptions to increase the likelihood of installation by users and selection by LLM agents. However, a successful attack also requires embedding malicious behaviors in the tool's code implementation, which remains largely unexplored. In this work, we bridge this gap by presenting the first systematic study of malicious tool code implementations. We first propose a taxonomy of malicious tool behaviors based on the confidentiality-integrity-availability triad, tailored to LLM-agent settings. To investigate the severity of the risks posed by attackers exploiting coding LLMs to automatically generate malicious tools, we develop MalTool, a coding-LLM-based framework that synthesizes tools exhibiting specified malicious behaviors, either as standalone tools or embedded within otherwise benign implementations. To ensure functional correctness and structural diversity, MalTool leverages an automated verifier that validates whether generated tools exhibit the intended malicious behaviors and differ sufficiently from previously generated instances, iteratively refining generations until success. Our evaluation demonstrates that MalTool is highly effective even when coding LLMs are safety-aligned. Using MalTool, we construct two datasets of malicious tools: 1,300 standalone malicious tools and 5,727 real-world tools with embedded malicious behaviors. We further show that existing detection methods, including conventional malware detection approaches and methods tailored to the LLM-agent setting, exhibit limited effectiveness at detecting the malicious tools, highlighting an urgent need for new defenses.
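The abstract describes a generate, verify, refine loop: a candidate is accepted only if it passes a behavior check and differs sufficiently from earlier accepted instances. A minimal, generic sketch of that control flow follows. It is not the authors' implementation. The names `generate`, `exhibits_behavior`, and `is_diverse`, the character-level similarity measure, and the 0.8 threshold are all illustrative assumptions, and candidates are treated as opaque strings.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional


def is_diverse(candidate: str, accepted: List[str], max_similarity: float = 0.8) -> bool:
    """Return True if the candidate is not too similar to any accepted sample.

    Assumption: similarity is a character-level difflib ratio; the paper's
    actual diversity criterion is not given in the abstract.
    """
    return all(
        SequenceMatcher(None, candidate, prev).ratio() <= max_similarity
        for prev in accepted
    )


def generate_until_verified(
    generate: Callable[[Optional[str]], str],
    exhibits_behavior: Callable[[str], bool],
    accepted: List[str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Retry generation, feeding back the reason for the last rejection.

    `generate` and `exhibits_behavior` are hypothetical caller-supplied hooks
    standing in for the generator and the automated verifier.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(feedback)
        if not exhibits_behavior(candidate):
            feedback = "behavior check failed"
            continue
        if not is_diverse(candidate, accepted):
            feedback = "too similar to an earlier sample"
            continue
        accepted.append(candidate)  # record so later samples must differ from it
        return candidate
    return None  # gave up after max_rounds attempts
```

Passing the rejection reason back into `generate` is one simple way to model "iteratively refining generations until success"; a detector-evaluation harness could reuse the same structure with its own checks.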

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Dynamic Malicious Skills in Agentic AI

cs.CR · 2026-06-15 · unverdicted · novelty 7.0

Attackers can dynamically inject malicious logic into benign AI agent skills by embedding instructions in documentation like SKILL.md, demonstrated on OpenHands and Claude Code, with a kernel read-only mount defense proposed.

Evidence-Bound Gateway-Path Provenance for Third-Party LLM Inference

cs.CR · 2026-06-21 · unverdicted · novelty 6.0

Proposes an evidence-bound LLM gateway that uses an attested runtime for verifiable path provenance and policy enforcement.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

cs.CR · 2026-06-01 · unverdicted · novelty 6.0

SeClaw provides spec-driven synthesis of security tasks and an execution-based Docker testbed for evaluating unsafe behaviors in autonomous LLM agents.

Lingering Authority: Revocable Resource-and-Effect Capabilities for Coding Agents

cs.CR · 2026-06-21 · unverdicted · novelty 5.0

PORTICO is a revocable capability reference monitor for coding agents that enforces task contracts via grant-invoke-closure lifecycles and rejects post-closure reuses while preserving task success.

ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense

cs.CR · 2026-06-19 · unverdicted · novelty 5.0

ARENA creates anonymized SOC telemetry artifacts that reveal a measurable privacy-utility boundary when used both as training material for MITRE-mapped challenges and as a substrate to detect non-compliant LLM defender actions.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures

cs.AI · 2026-05-25 · unverdicted · novelty 2.0

A survey that categorizes threats to OpenClaw agents including skill poisoning and cognitive manipulation and reviews defense mechanisms.

How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study

cs.CR · 2026-04-03

citing papers explorer

Showing 7 of 7 citing papers after filters.

Dynamic Malicious Skills in Agentic AI · cs.CR · 2026-06-15 · unverdicted · none · ref 4 · internal anchor
Evidence-Bound Gateway-Path Provenance for Third-Party LLM Inference · cs.CR · 2026-06-21 · unverdicted · none · ref 14 · internal anchor
SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents · cs.CR · 2026-06-01 · unverdicted · none · ref 6 · internal anchor
Lingering Authority: Revocable Resource-and-Effect Capabilities for Coding Agents · cs.CR · 2026-06-21 · unverdicted · none · ref 19 · internal anchor
ARENA: An Architecture for Measuring the Transferability of Autonomous Cyber Defense · cs.CR · 2026-06-19 · unverdicted · none · ref 41 · internal anchor
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation · cs.CR · 2026-06-09 · unverdicted · none · ref 66 · internal anchor
Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures · cs.AI · 2026-05-25 · unverdicted · none · ref 69 · internal anchor
