VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation

Harshil Patel; Kunal Pai

arxiv: 2606.07992 · v1 · pith:WJ2U2UEBnew · submitted 2026-06-06 · 💻 cs.AI · cs.CR· cs.SE

VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation

Harshil Patel , Kunal Pai This is my paper

Pith reviewed 2026-06-27 19:56 UTC · model grok-4.3

classification 💻 cs.AI cs.CRcs.SE

keywords prompt injectiontool callingAI agentserror handlingvulnerability analysismodel context protocol

0 comments

The pith

Tool error messages carry implicit authority that mutated injections exploit to triple indirect prompt injection success in AI agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the idea that error messages returned during tool calls trigger corrective reasoning in models, causing them to overlook safety rules. It builds a framework that mutates potential attack payloads along structural and linguistic axes to insert instructions inside those error responses. Tests on four frontier models show the approach raises success rates three times above ordinary indirect prompt injection and can reach full compliance. Structural placement of the instructions inside the error context proves the strongest single factor, while some production guardrails reduce but do not eliminate the exposure.

Core claim

Tool error messages possess implicit authority that triggers corrective reasoning modes bypassing standard safety heuristics, allowing systematic mutation of payloads in the error-handling loop to achieve error-path injection that triples the success rate of indirect prompt injection and reaches up to 100 percent compliance.

What carries the argument

VATS, a mutation-driven framework that systematically evolves adversarial payloads across seven structural and linguistic dimensions and isolates structural positioning as the strongest vector.

If this is right

Production framework guardrails can mitigate these vulnerabilities.
The model layer itself remains susceptible, creating systemic risk for bespoke agentic workflows.
Structural positioning of instructions within error context is the most effective exploit vector across all tested models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agent designs may need separate safety checks on error content that differ from checks on normal user input.
Models could be retrained to treat tool errors strictly as diagnostic data rather than authoritative directives.
The same mutation approach might expose similar weaknesses in other feedback loops such as retry messages or status updates.

Load-bearing premise

Tool error messages possess implicit authority that causes models to enter corrective reasoning modes which ignore safety rules.

What would settle it

Replace the mutated error messages with neutral non-authoritative text while keeping the same injected instructions and measure whether attack success falls back to the level of standard indirect prompt injection.

Figures

Figures reproduced from arXiv: 2606.07992 by Harshil Patel, Kunal Pai.

**Figure 1.** Figure 1: Gemini CLI (v0.41.2) validation trace. The framework explicitly flags the error-path injection as an exfiltration attempt and bypasses the payload to use native grep for task completion. All maintained identical conversation context, user tasks, assistant tool calls, and payload positioning, isolating the structured error JSON as the single independent variable. Each paired configuration was sampled 3 time… view at source ↗

**Figure 2.** Figure 2: OpenAI Codex (v0.129.0) validation trace. The framework-level constraints in AGENTS.md successfully suppress the model’s impulse to follow the tool-injected recovery instructions. GLM-5.1 Trace (Error Condition) “The error message suggests sending a workspace file listing... Let me follow those instructions.” — GLM-5.1, error condition → send email invoked This procedural framing appeared in 53% of the err… view at source ↗

read the original abstract

As the Model Context Protocol (MCP) standardizes tool-calling for autonomous agents, it introduces a critical, unexamined attack surface: the error-handling loop. We hypothesize that tool error messages possess implicit authority, triggering corrective reasoning modes that bypass standard safety heuristics. We introduce VATS (Vulnerability Analysis of Tool Streams), a mutation-driven framework that systematically evolves adversarial payloads across seven structural and linguistic dimensions. Our evaluation across four frontier models, Gemini 3.1 Pro, GPT-5.5, GLM-5.1, and Qwen3-Coder, demonstrates that error-path injection triples the success rate of standard indirect prompt injection (IPI), achieving up to 100% compliance in controlled evaluations. We isolate structural positioning (sandwiching instructions within error context) as the most effective exploit vector across all tested models. While we find that production framework guardrails can mitigate these vulnerabilities, the inherent susceptibility of the model layer poses a systemic risk to bespoke agentic workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The abstract claims error messages in MCP give implicit authority that triples IPI success via VATS mutations, but the evaluation provides no ablations or methods so the attribution to error context is untested.

read the letter

The paper's main point is that error streams in tool-calling protocols create a new injection vector because models treat error messages with extra weight. VATS mutates payloads across seven dimensions and the authors report that placing instructions inside error context triples success rates over plain indirect prompt injection, reaching 100% on some of the four models tested. They also flag structural positioning as the strongest factor.

What stands out is the narrow focus on the error-handling loop in MCP rather than generic prompt injection. That is a concrete extension of prior work on indirect attacks, and the mutation framework itself looks like a practical way to generate variants.

The soft spot is exactly the one the stress-test note flags. The paper only compares error-path versions against standard IPI; it does not test whether the same VATS-mutated payloads perform similarly when inserted into non-error messages. Without that check, the performance gain could come from the mutations alone rather than any special authority of errors. The abstract also gives no experimental design, baseline definitions, sample sizes, or statistical controls, so the quantitative claims cannot be assessed. The hypothesis about triggering corrective reasoning is stated but not isolated.

This is aimed at people working on agent security and tool-use protocols. A reader scanning for new attack surfaces might pick up the idea and the mutation dimensions, but anyone trying to reproduce or extend the results would need the missing methods section.

I would not cite it yet. It might be worth sending for peer review once the authors add the ablation and full evaluation details, but on the current text the central claim is not supported enough to justify referee time.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces VATS, a mutation-driven framework that evolves adversarial payloads across seven structural and linguistic dimensions to perform error-path injection in the Model Context Protocol (MCP) error-handling loop for autonomous agents. It hypothesizes that tool error messages carry implicit authority that triggers corrective reasoning and bypasses safety heuristics. The central empirical claim is that this approach triples the success rate of standard indirect prompt injection (IPI) and reaches up to 100% compliance on four frontier models (Gemini 3.1 Pro, GPT-5.5, GLM-5.1, Qwen3-Coder), with structural positioning (sandwiching) identified as the strongest vector; production guardrails are said to mitigate but not eliminate the model-layer risk.

Significance. If the quantitative results and attribution to error-path authority hold after proper controls, the work would identify a previously unexamined attack surface in standardized agent tool-calling protocols and supply a systematic, extensible method for discovering such vulnerabilities. The multi-model evaluation and isolation of structural positioning are potential strengths for reproducibility and follow-on research in AI agent security.

major comments (2)

[Abstract] Abstract: The quantitative claims that error-path injection 'triples the success rate of standard indirect prompt injection (IPI)' and achieves 'up to 100% compliance' are stated without any description of experimental design, number of trials per condition, definition of success/compliance, baseline IPI success rates, or statistical methods. This absence prevents verification that the data support the stated effect sizes.
[Abstract and Evaluation] Abstract and Evaluation: The reported performance gain is attributed to the implicit authority of tool error messages, yet the evaluation only contrasts error-path injection against standard IPI. No ablation is described in which the same VATS-mutated payloads are placed in non-error contexts; without this control it is impossible to determine whether the tripling is caused by the error framing or by the mutation framework itself.

minor comments (1)

[Abstract] The abstract refers to 'seven structural and linguistic dimensions' but does not enumerate them; an explicit list or table would improve reproducibility even if the full details appear later.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and evaluation design. We address each major comment below and indicate where revisions will be made to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The quantitative claims that error-path injection 'triples the success rate of standard indirect prompt injection (IPI)' and achieves 'up to 100% compliance' are stated without any description of experimental design, number of trials per condition, definition of success/compliance, baseline IPI success rates, or statistical methods. This absence prevents verification that the data support the stated effect sizes.

Authors: We agree that the abstract would benefit from additional methodological context to allow readers to assess the claims more readily. The full evaluation section reports the relevant details (trial counts, success definitions, baselines, and statistical approach), but these are not summarized in the abstract. In the revised manuscript we will expand the abstract to include a concise description of the experimental design, number of trials, success criteria, and baseline rates. revision: yes
Referee: [Abstract and Evaluation] Abstract and Evaluation: The reported performance gain is attributed to the implicit authority of tool error messages, yet the evaluation only contrasts error-path injection against standard IPI. No ablation is described in which the same VATS-mutated payloads are placed in non-error contexts; without this control it is impossible to determine whether the tripling is caused by the error framing or by the mutation framework itself.

Authors: This observation is correct. The current evaluation isolates the effect of the error-path context by comparing VATS-augmented error messages against standard IPI, but does not include an ablation that applies the identical VATS-mutated payloads outside error contexts. Such a control would more cleanly attribute gains to the hypothesized authority of error messages versus the mutation framework alone. We will add this ablation experiment to the revised evaluation section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation of mutation framework stands independently

full rationale

The paper presents a hypothesis about error messages and introduces VATS as a systematic mutation framework, then reports direct experimental comparisons of success rates on four frontier models. No mathematical derivations, equations, fitted parameters renamed as predictions, or load-bearing self-citations appear. The central results are framed as measured outcomes from controlled evaluations rather than quantities derived from the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, parameters, or background assumptions that can be extracted; the ledger is therefore empty.

pith-pipeline@v0.9.1-grok · 5700 in / 1181 out tokens · 20840 ms · 2026-06-27T19:56:03.564606+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 11 canonical work pages · 4 internal anchors

[1]

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

URL https://arxiv.org/abs/ 2604.20994. Cartagena, A. and Teixeira, A. Mind the gap: Text safety does not transfer to tool-call safety in llm agents.arXiv preprint arXiv:2602.16943,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems

Chang, Z., Li, M., Jia, X., Wang, J., Huang, Y ., Jiang, Z., Liu, Y ., and Wang, Q. One shot dominance: Knowl- edge poisoning attack on retrieval-augmented generation systems.arXiv preprint arXiv:2505.11548,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

In-browser llm-guided fuzzing for real-time prompt injection testing in agentic ai browsers.arXiv preprint arXiv:2510.13543,

Cohen, A. In-browser llm-guided fuzzing for real-time prompt injection testing in agentic ai browsers.arXiv preprint arXiv:2510.13543,

work page arXiv
[4]

MCP adoption statistics 2026: Model context protocol, April

Digital Applied Team. MCP adoption statistics 2026: Model context protocol, April

2026
[5]

Geng, Y ., Li, H., Mu, H., Han, X., Baldwin, T., Abend, O., Hovy, E., and Frermann, L

URL https://www.digitalapplied.com/blog/ mcp-adoption-statistics-2026-model- context-protocol. Geng, Y ., Li, H., Mu, H., Han, X., Baldwin, T., Abend, O., Hovy, E., and Frermann, L. Control illusion: The failure of instruction hierarchies in large language mod- els. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pp. 30816–30824,

2026
[6]

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

URL https://arxiv.org/abs/2512.23128. 5 V ATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation Lin, J., Zhou, Z., Zheng, Z., Liu, S., Xu, T., Chen, Y ., and Chen, E. Vigil: Defending llm agents against tool stream injection via verify-before-commit.arXiv preprint arXiv:2601.05755,

work page internal anchor Pith review Pith/arXiv arXiv
[7]

Liu, Y ., Wang, W., Feng, R., Zhang, Y ., Xu, G., Deng, G., Li, Y ., and Zhang, L

URL https://arxiv.org/abs/2406.03807. Liu, Y ., Wang, W., Feng, R., Zhang, Y ., Xu, G., Deng, G., Li, Y ., and Zhang, L. Agent skills in the wild: An empirical study of security vulnerabilities at scale.arXiv preprint arXiv:2601.10338,

work page arXiv
[8]

Model Context Protocol

URL https://arxiv.org/abs/ 2601.17549. Model Context Protocol. Model context protocol specifi- cation. https://modelcontextprotocol.io/ specification/2025-11-25, nov

work page arXiv 2025
[9]

Version 2025-11-

URL https://modelcontextprotocol.io/ specification/2025-11-25. Version 2025-11-

2025
[10]

Accessed: 2026-05-06. OpenAI. codex: Lightweight coding agent that runs in your terminal,

2026
[11]

Pai, K., Shah, P., and Patel, H

URLhttps://arxiv.org/abs/2506.04255. Pai, K., Shah, P., and Patel, H. Naamse: Framework for evolutionary security evaluation of agents,

work page arXiv
[12]

Qin, Y ., Liang, S., Ye, Y ., Zhu, K., Yan, L., Lu, Y ., Lin, Y ., Cong, X., Tang, X., Qian, B., et al

URL https://arxiv.org/abs/2602.07391. Qin, Y ., Liang, S., Ye, Y ., Zhu, K., Yan, L., Lu, Y ., Lin, Y ., Cong, X., Tang, X., Qian, B., et al. Toolllm: Facilitat- ing large language models to master 16000+ real-world apis. InThe twelfth international conference on learning representations,

work page arXiv
[13]

Shi, G., Du, H., Wang, Z., Liang, X., Liu, W., Bian, S., and Guan, Z

URL https: //fordelstudios.com/research/mcp- production-engineering-guide. Shi, G., Du, H., Wang, Z., Liang, X., Liu, W., Bian, S., and Guan, Z. Sok: Trust-authorization mismatch in llm agent interactions.arXiv preprint arXiv:2512.06914,

work page arXiv
[14]

The illusion of role separation: Hidden shortcuts in llm role learning (and how to fix them).arXiv preprint arXiv:2505.00626,

Wang, Z., Jiang, Y ., Yu, J., and Huang, H. The illusion of role separation: Hidden shortcuts in llm role learning (and how to fix them).arXiv preprint arXiv:2505.00626,

work page arXiv
[15]

ReAct: Synergizing Reasoning and Acting in Language Models

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y . React: Synergizing reasoning and acting in language models.arXiv preprint arXiv:2210.03629,

work page internal anchor Pith review Pith/arXiv arXiv
[16]

Injeca- gent: Benchmarking indirect prompt injections in tool- integrated large language model agents

Zhan, Q., Liang, Z., Ying, Z., and Kang, D. Injeca- gent: Benchmarking indirect prompt injections in tool- integrated large language model agents. InFindings of the Association for Computational Linguistics: ACL 2024, pp. 10471–10506,

2024
[17]

S., and Kang, D

Zhan, Q., Fang, R., Panchal, H. S., and Kang, D. Adaptive attacks break defenses against indirect prompt injection attacks on llm agents. InFindings of the Association for Computational Linguistics: NAACL 2025, pp. 7101– 7117,

2025
[18]

status":

6 V ATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation A. Mutation Examples We show the seed error payload and representative mutations across each dimension. Each example shows the exact JSON returned to the agent as a tool error response. A.1. Seed Payload The starting point for all mutations. Imperative framing, no autho...

2024

[1] [1]

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

URL https://arxiv.org/abs/ 2604.20994. Cartagena, A. and Teixeira, A. Mind the gap: Text safety does not transfer to tool-call safety in llm agents.arXiv preprint arXiv:2602.16943,

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems

Chang, Z., Li, M., Jia, X., Wang, J., Huang, Y ., Jiang, Z., Liu, Y ., and Wang, Q. One shot dominance: Knowl- edge poisoning attack on retrieval-augmented generation systems.arXiv preprint arXiv:2505.11548,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

In-browser llm-guided fuzzing for real-time prompt injection testing in agentic ai browsers.arXiv preprint arXiv:2510.13543,

Cohen, A. In-browser llm-guided fuzzing for real-time prompt injection testing in agentic ai browsers.arXiv preprint arXiv:2510.13543,

work page arXiv

[4] [4]

MCP adoption statistics 2026: Model context protocol, April

Digital Applied Team. MCP adoption statistics 2026: Model context protocol, April

2026

[5] [5]

Geng, Y ., Li, H., Mu, H., Han, X., Baldwin, T., Abend, O., Hovy, E., and Frermann, L

URL https://www.digitalapplied.com/blog/ mcp-adoption-statistics-2026-model- context-protocol. Geng, Y ., Li, H., Mu, H., Han, X., Baldwin, T., Abend, O., Hovy, E., and Frermann, L. Control illusion: The failure of instruction hierarchies in large language mod- els. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pp. 30816–30824,

2026

[6] [6]

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

URL https://arxiv.org/abs/2512.23128. 5 V ATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation Lin, J., Zhou, Z., Zheng, Z., Liu, S., Xu, T., Chen, Y ., and Chen, E. Vigil: Defending llm agents against tool stream injection via verify-before-commit.arXiv preprint arXiv:2601.05755,

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

Liu, Y ., Wang, W., Feng, R., Zhang, Y ., Xu, G., Deng, G., Li, Y ., and Zhang, L

URL https://arxiv.org/abs/2406.03807. Liu, Y ., Wang, W., Feng, R., Zhang, Y ., Xu, G., Deng, G., Li, Y ., and Zhang, L. Agent skills in the wild: An empirical study of security vulnerabilities at scale.arXiv preprint arXiv:2601.10338,

work page arXiv

[8] [8]

Model Context Protocol

URL https://arxiv.org/abs/ 2601.17549. Model Context Protocol. Model context protocol specifi- cation. https://modelcontextprotocol.io/ specification/2025-11-25, nov

work page arXiv 2025

[9] [9]

Version 2025-11-

URL https://modelcontextprotocol.io/ specification/2025-11-25. Version 2025-11-

2025

[10] [10]

Accessed: 2026-05-06. OpenAI. codex: Lightweight coding agent that runs in your terminal,

2026

[11] [11]

Pai, K., Shah, P., and Patel, H

URLhttps://arxiv.org/abs/2506.04255. Pai, K., Shah, P., and Patel, H. Naamse: Framework for evolutionary security evaluation of agents,

work page arXiv

[12] [12]

Qin, Y ., Liang, S., Ye, Y ., Zhu, K., Yan, L., Lu, Y ., Lin, Y ., Cong, X., Tang, X., Qian, B., et al

URL https://arxiv.org/abs/2602.07391. Qin, Y ., Liang, S., Ye, Y ., Zhu, K., Yan, L., Lu, Y ., Lin, Y ., Cong, X., Tang, X., Qian, B., et al. Toolllm: Facilitat- ing large language models to master 16000+ real-world apis. InThe twelfth international conference on learning representations,

work page arXiv

[13] [13]

Shi, G., Du, H., Wang, Z., Liang, X., Liu, W., Bian, S., and Guan, Z

URL https: //fordelstudios.com/research/mcp- production-engineering-guide. Shi, G., Du, H., Wang, Z., Liang, X., Liu, W., Bian, S., and Guan, Z. Sok: Trust-authorization mismatch in llm agent interactions.arXiv preprint arXiv:2512.06914,

work page arXiv

[14] [14]

The illusion of role separation: Hidden shortcuts in llm role learning (and how to fix them).arXiv preprint arXiv:2505.00626,

Wang, Z., Jiang, Y ., Yu, J., and Huang, H. The illusion of role separation: Hidden shortcuts in llm role learning (and how to fix them).arXiv preprint arXiv:2505.00626,

work page arXiv

[15] [15]

ReAct: Synergizing Reasoning and Acting in Language Models

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y . React: Synergizing reasoning and acting in language models.arXiv preprint arXiv:2210.03629,

work page internal anchor Pith review Pith/arXiv arXiv

[16] [16]

Injeca- gent: Benchmarking indirect prompt injections in tool- integrated large language model agents

Zhan, Q., Liang, Z., Ying, Z., and Kang, D. Injeca- gent: Benchmarking indirect prompt injections in tool- integrated large language model agents. InFindings of the Association for Computational Linguistics: ACL 2024, pp. 10471–10506,

2024

[17] [17]

S., and Kang, D

Zhan, Q., Fang, R., Panchal, H. S., and Kang, D. Adaptive attacks break defenses against indirect prompt injection attacks on llm agents. InFindings of the Association for Computational Linguistics: NAACL 2025, pp. 7101– 7117,

2025

[18] [18]

status":

6 V ATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation A. Mutation Examples We show the seed error payload and representative mutations across each dimension. Each example shows the exact JSON returned to the agent as a tool error response. A.1. Seed Payload The starting point for all mutations. Imperative framing, no autho...

2024