pith. sign in

arxiv: 2605.18414 · v1 · pith:LNFZZRR7new · submitted 2026-05-18 · 💻 cs.CR · cs.AI

Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control

Pith reviewed 2026-05-20 09:40 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords LLM agentstool access controlaccess controladversarial robustnessproxy enforcementprompt engineering limitationsABACagent security
0
0 comments X

The pith

LLM agents select unauthorized tools when visible despite instructions, but a proxy enforcing access control at discovery and invocation eliminates them entirely

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that large language models acting as agents will select unauthorized tools from their visible registry even when told not to, in adversarial conditions. It demonstrates a proxy layer that removes unauthorized tools from the context before the model sees them and verifies every call before execution. This method drove the unauthorized invocation rate to zero across three models and 150 tasks in four attack categories, with less than 50 milliseconds of added latency. Prompt restrictions alone cut the rate by only 11 to 18 percentage points. Reliable security for agentic systems therefore depends on removing the opportunity at the architecture level instead of hoping the model follows rules.

Core claim

The central claim is that when unauthorized tools remain visible in an agent's context window, models select and invoke them despite explicit instructions to the contrary. The proposed governed MCP proxy enforces attribute-based access control by filtering tools out of the discovery phase and adding an invocation-time check, resulting in complete prevention of unauthorized calls without significant performance cost.

What carries the argument

The MCP proxy, which applies attribute-based access control at tool discovery by removing unauthorized tools from the context and at tool invocation by blocking disallowed calls.

If this is right

  • Agentic systems can be secured against tool misuse without depending on perfect model obedience.
  • The added latency from the proxy remains low enough for practical use at under 50 milliseconds median.
  • Prompt engineering provides only partial protection and leaves residual risk in adversarial settings.
  • Architectural controls apply consistently across different model architectures and sizes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Deploying such proxies might allow organizations to manage tool permissions centrally without changing individual agent prompts.
  • This approach could be combined with other runtime monitors to address a wider range of agent behaviors.
  • Future work might test whether similar hiding techniques help with other context-based risks like prompt injection.

Load-bearing premise

That unauthorized tool selection is driven primarily by their presence in the visible context window rather than other model behaviors, and that the proxy can be added without interfering with legitimate uses or creating new vulnerabilities.

What would settle it

A single instance where an adversarial prompt causes an unauthorized tool to be invoked successfully despite the proxy's checks would show the method does not fully eliminate the risk.

Figures

Figures reproduced from arXiv: 2605.18414 by Rohith Uppala.

Figure 1
Figure 1. Figure 1: Governed MCP Proxy architecture. (1) Discovery: the proxy verifies the agent’s JWT, enforces ABAC policy, and queries only attribute-matching tools from the registry — unauthorized tools are never returned to the model context. (2) Invocation: a second ABAC check at call time ensures hallucinated or injected tool names cannot bypass the discovery filter. 4.2 Conditions We evaluate three conditions per mode… view at source ↗
Figure 2
Figure 2. Figure 2: UIR across models and conditions. Gov￾erned enforces 0% by construction regardless of model. Prompted UIR varies widely across models, from 4.0% to 37.0%. Role escalation (C) is the most dangerous attack type, reaching 96.0% for Claude Haiku 3.5 un￾filtered — the highest of any category across all models. Multi-step deception (D) is the most tractable under prompting: Llama 3.1 8B and Claude Haiku 3.5 both… view at source ↗
read the original abstract

Large language models increasingly operate as autonomous agents that select and invoke tools from large registries. We identify a critical gap: when unauthorized tools are visible in an agent's context, models select them in adversarial scenarios -- even when explicitly instructed otherwise. We propose a governed MCP proxy that enforces attribute-based access control (ABAC) at two points: tool discovery, where unauthorized tools are removed from the model's context window, and tool invocation, where a second check blocks any unauthorized call. Across three models (Qwen 2.5 7B, Llama 3.1 8B, Claude Haiku 3.5) and 150 adversarial tasks spanning four attack categories, our proxy reduces unauthorized invocation rate (UIR) to 0% while adding under 50ms median latency. Prompt-based restrictions reduce UIR by only 11--18 percentage points, leaving substantial residual risk. Our results show that architectural enforcement -- not prompting -- is necessary for reliable tool access control in deployed agentic systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that prompt-based restrictions fail to prevent LLMs from selecting unauthorized tools in adversarial agentic scenarios even when explicitly instructed otherwise. It proposes a governed MCP proxy enforcing ABAC at tool discovery (by removing unauthorized tools from context) and at invocation (via a second check), reporting that this reduces unauthorized invocation rate (UIR) to 0% across three models (Qwen 2.5 7B, Llama 3.1 8B, Claude Haiku 3.5) and 150 adversarial tasks in four attack categories, while adding under 50 ms median latency; prompt restrictions achieve only 11-18 percentage point reductions.

Significance. If the results are robust, the work would be significant for LLM security and agentic systems by demonstrating that architectural enforcement is required for reliable tool access control rather than relying on prompting. The introduction of the MCP Proxy as a concrete mechanism and the direct experimental comparison to prompt baselines provide a practical contribution, though the absence of data on legitimate-use impact limits immediate deployability claims.

major comments (3)
  1. [Abstract] Abstract: The central claim of 0% UIR (and superiority over prompts) rests on experiments described only at summary level with no detailed methodology, dataset construction, statistical analysis, or error bars; this prevents verification of whether the result is robust and is load-bearing for the conclusion that architectural enforcement is necessary.
  2. [Abstract] Abstract / Evaluation: No data are reported on authorized-task success rates or false-positive blocks under the ABAC policy, which directly undermines the assumption that the proxy can be deployed without breaking legitimate tool use.
  3. [Abstract] Abstract: The manuscript provides no security review or analysis of new attack surfaces introduced by the proxy (e.g., direct calls bypassing the proxy, misconfiguration, or context leakage), which is load-bearing for recommending architectural enforcement in deployed systems.
minor comments (1)
  1. [Abstract] The acronym 'MCP' and the precise architecture of the proxy would benefit from an explicit definition and diagram on first use to improve readability for readers unfamiliar with the mechanism.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: The central claim of 0% UIR (and superiority over prompts) rests on experiments described only at summary level with no detailed methodology, dataset construction, statistical analysis, or error bars; this prevents verification of whether the result is robust and is load-bearing for the conclusion that architectural enforcement is necessary.

    Authors: The abstract summarizes the key results, but the full manuscript provides the detailed methodology, dataset construction for the 150 adversarial tasks across four attack categories, model specifications, and evaluation protocol in the Evaluation section. We will revise the abstract to include a concise reference to these details and add error bars or confidence intervals to the reported UIR figures in the revision. revision: yes

  2. Referee: No data are reported on authorized-task success rates or false-positive blocks under the ABAC policy, which directly undermines the assumption that the proxy can be deployed without breaking legitimate tool use.

    Authors: We agree this is an important omission for assessing deployability. The current evaluation focuses on security effectiveness against adversarial tasks. In the revised manuscript we will add experiments measuring authorized-task success rates and any false-positive blocking under the ABAC policy to quantify impact on legitimate use. revision: yes

  3. Referee: The manuscript provides no security review or analysis of new attack surfaces introduced by the proxy (e.g., direct calls bypassing the proxy, misconfiguration, or context leakage), which is load-bearing for recommending architectural enforcement in deployed systems.

    Authors: We acknowledge the value of analyzing new attack surfaces. The manuscript centers on the proxy's enforcement mechanism and empirical results. We will add a Security Considerations subsection discussing potential bypass vectors, misconfiguration risks, and context leakage, along with recommended mitigations such as authenticated proxy deployment. A comprehensive formal analysis or red-team evaluation is noted as future work. revision: partial

Circularity Check

0 steps flagged

No circularity in experimental evaluation

full rationale

The paper reports direct empirical measurements of unauthorized invocation rates (UIR) across three models and 150 tasks, comparing the MCP proxy against prompt-based baselines. No equations, fitted parameters, derivations, or self-citations are invoked to support the central claim; results are presented as observed outcomes from controlled experiments rather than quantities that reduce to inputs by construction. The evaluation is self-contained against the reported benchmarks with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Abstract-only review provides no explicit free parameters, mathematical axioms, or invented entities beyond the high-level proposal of the MCP proxy itself.

invented entities (1)
  • MCP Proxy no independent evidence
    purpose: Enforce ABAC at tool discovery and invocation for LLM agents
    Proposed architectural component introduced to address the identified gap

pith-pipeline@v0.9.0 · 5699 in / 1116 out tokens · 32474 ms · 2026-05-20T09:40:32.226957+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages · 1 internal anchor

  1. [1]

    Advances in Neural Information Processing Systems , year =

    Toolformer: Language Models Can Teach Themselves to Use Tools , author =. Advances in Neural Information Processing Systems , year =

  2. [2]

    Qin, Yujia and Liang, Shengding and Ye, Yining and Zhu, Kunlun and Yan, Lan and Lu, Yaxi and Lin, Yankai and Cong, Xin and Tang, Xiangru and Qian, Bill and Zhao, Sihan and Hong, Lauren and Tian, Runchu and Xie, Ruobing and Zhou, Jie and Gerstein, Mark and Li, Dahai and Liu, Zhiyuan and Sun, Maosong , booktitle =

  3. [3]

    2024 , url =

    Model Context Protocol , author =. 2024 , url =

  4. [4]

    Taming Various Privilege Escalation in

    Ji, Zimo and Wu, Daoyuan and Jiang, Wenyuan and Ma, Pingchuan and Li, Zongjie and Gao, Yudong and Wang, Shuai and Li, Yingjiu , journal =. Taming Various Privilege Escalation in

  5. [5]

    Formal Policy Enforcement for Real-World Agentic Systems

    Formal Policy Enforcement for Real-World Agentic Systems , author =. arXiv preprint arXiv:2602.16708 , year =

  6. [6]

    Solver-Aided Verification of Policy Compliance in Tool-Augmented

    Winston, Cailin and Winston, Claris and Just, Ren. Solver-Aided Verification of Policy Compliance in Tool-Augmented. arXiv preprint arXiv:2603.20449 , year =

  7. [7]

    Prompt Injection Attack to Tool Selection in

    Shi, Jiawen and Yuan, Zenghui and Tie, Guiyao and Zhou, Pan and Gong, Neil Zhenqiang and Sun, Lichao , journal =. Prompt Injection Attack to Tool Selection in

  8. [8]

    Not What You've Signed Up For: Compromising Real-World

    Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario , journal =. Not What You've Signed Up For: Compromising Real-World

  9. [9]

    Guide to Attribute Based Access Control (

    Hu, Vincent C and Ferraiolo, David and Kuhn, Rick and Schnitzer, Adam and Sandlin, Kenneth and Miller, Robert and Scarfone, Karen , institution =. Guide to Attribute Based Access Control (