Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

Dongdong She; Kaikai Zhang; Mingyu Luo; Ping Chen; Shuai Wang; Yuchong Xie; Yu Liu; Zesen Liu; Zhixiang Zhang; Zongjie Li

arxiv: 2509.05755 · v6 · pith:O3I442MWnew · submitted 2025-09-06 · 💻 cs.CR · cs.AI

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

Yuchong Xie , Mingyu Luo , Zesen Liu , Zhixiang Zhang , Kaikai Zhang , Yu Liu , Zongjie Li , Ping Chen

show 2 more authors

Shuai Wang Dongdong She

This is my paper

classification 💻 cs.CR cs.AI

keywords agentscodingsecuritytoolclaudecodeinvocationphase

0 comments

read the original abstract

Coding agents powered by large language models are becoming central modules of modern IDEs, helping users perform complex tasks by invoking tools. While powerful, tool invocation opens a substantial attack surface. Prior work has demonstrated attacks against general-purpose and domain-specific agents, but none have focused on the security risks of tool invocation in coding agents. To fill this gap, we conduct the first systematic red-teaming of six popular real-world coding agents: Cursor, Claude Code, Copilot, Windsurf, Cline, and Trae. Our red-teaming proceeds in two phases. In Phase 1, we perform prompt leakage reconnaissance to recover system prompts. We discover a general vulnerability, ToolLeak, which allows malicious prompt exfiltration through benign argument retrieval during tool invocation. In Phase 2, we hijack the agent's tool-invocation behavior using a novel two-channel prompt injection in the tool description and return values, achieving remote code execution (RCE). We adaptively construct payloads using security information leaked in Phase 1. In emulation across five backends, our method outperforms baselines on Claude-Sonnet-4, Claude-Sonnet-4.5, Grok-4, and GPT-5. On real agents, our approach succeeds on 19 of 25 agent-LLM pairs, achieving leakage on every agent using Claude and Grok backends. For tool-invocation hijacking, we obtain RCE on every tested agent-LLM pair, with our two-channel method delivering the highest success rate. We provide case studies on Cursor and Claude Code, analyze security guardrails of external and built-in tools, and conclude with practical defense recommendations.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Parasites in the Toolchain: A Large-Scale Analysis of Attacks on the MCP Ecosystem
cs.CR 2025-09 unverdicted novelty 8.0

This paper defines a new Parasitic Toolchain Attack pattern (MCP-UPD) that assembles legitimate tools into privacy-exfiltrating workflows and reports the first large-scale scan of 12230 MCP tools across 1360 servers r...
When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents
cs.CR 2026-05 unverdicted novelty 7.0

Introduces MCP-TDP benchmark showing near-100% attack success on models like GPT-4o for tool description poisoning and proposes reactive self-correction defense.
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents
cs.CR 2026-05 unverdicted novelty 7.0

A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.
Security Considerations for Multi-agent Systems
cs.CR 2026-03 unverdicted novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation
cs.CR 2026-06 unverdicted novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.