Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control
Pith reviewed 2026-05-20 09:40 UTC · model grok-4.3
The pith
LLM agents select unauthorized tools when visible despite instructions, but a proxy enforcing access control at discovery and invocation eliminates them entirely
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that when unauthorized tools remain visible in an agent's context window, models select and invoke them despite explicit instructions to the contrary. The proposed governed MCP proxy enforces attribute-based access control by filtering tools out of the discovery phase and adding an invocation-time check, resulting in complete prevention of unauthorized calls without significant performance cost.
What carries the argument
The MCP proxy, which applies attribute-based access control at tool discovery by removing unauthorized tools from the context and at tool invocation by blocking disallowed calls.
If this is right
- Agentic systems can be secured against tool misuse without depending on perfect model obedience.
- The added latency from the proxy remains low enough for practical use at under 50 milliseconds median.
- Prompt engineering provides only partial protection and leaves residual risk in adversarial settings.
- Architectural controls apply consistently across different model architectures and sizes.
Where Pith is reading between the lines
- Deploying such proxies might allow organizations to manage tool permissions centrally without changing individual agent prompts.
- This approach could be combined with other runtime monitors to address a wider range of agent behaviors.
- Future work might test whether similar hiding techniques help with other context-based risks like prompt injection.
Load-bearing premise
That unauthorized tool selection is driven primarily by their presence in the visible context window rather than other model behaviors, and that the proxy can be added without interfering with legitimate uses or creating new vulnerabilities.
What would settle it
A single instance where an adversarial prompt causes an unauthorized tool to be invoked successfully despite the proxy's checks would show the method does not fully eliminate the risk.
Figures
read the original abstract
Large language models increasingly operate as autonomous agents that select and invoke tools from large registries. We identify a critical gap: when unauthorized tools are visible in an agent's context, models select them in adversarial scenarios -- even when explicitly instructed otherwise. We propose a governed MCP proxy that enforces attribute-based access control (ABAC) at two points: tool discovery, where unauthorized tools are removed from the model's context window, and tool invocation, where a second check blocks any unauthorized call. Across three models (Qwen 2.5 7B, Llama 3.1 8B, Claude Haiku 3.5) and 150 adversarial tasks spanning four attack categories, our proxy reduces unauthorized invocation rate (UIR) to 0% while adding under 50ms median latency. Prompt-based restrictions reduce UIR by only 11--18 percentage points, leaving substantial residual risk. Our results show that architectural enforcement -- not prompting -- is necessary for reliable tool access control in deployed agentic systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that prompt-based restrictions fail to prevent LLMs from selecting unauthorized tools in adversarial agentic scenarios even when explicitly instructed otherwise. It proposes a governed MCP proxy enforcing ABAC at tool discovery (by removing unauthorized tools from context) and at invocation (via a second check), reporting that this reduces unauthorized invocation rate (UIR) to 0% across three models (Qwen 2.5 7B, Llama 3.1 8B, Claude Haiku 3.5) and 150 adversarial tasks in four attack categories, while adding under 50 ms median latency; prompt restrictions achieve only 11-18 percentage point reductions.
Significance. If the results are robust, the work would be significant for LLM security and agentic systems by demonstrating that architectural enforcement is required for reliable tool access control rather than relying on prompting. The introduction of the MCP Proxy as a concrete mechanism and the direct experimental comparison to prompt baselines provide a practical contribution, though the absence of data on legitimate-use impact limits immediate deployability claims.
major comments (3)
- [Abstract] Abstract: The central claim of 0% UIR (and superiority over prompts) rests on experiments described only at summary level with no detailed methodology, dataset construction, statistical analysis, or error bars; this prevents verification of whether the result is robust and is load-bearing for the conclusion that architectural enforcement is necessary.
- [Abstract] Abstract / Evaluation: No data are reported on authorized-task success rates or false-positive blocks under the ABAC policy, which directly undermines the assumption that the proxy can be deployed without breaking legitimate tool use.
- [Abstract] Abstract: The manuscript provides no security review or analysis of new attack surfaces introduced by the proxy (e.g., direct calls bypassing the proxy, misconfiguration, or context leakage), which is load-bearing for recommending architectural enforcement in deployed systems.
minor comments (1)
- [Abstract] The acronym 'MCP' and the precise architecture of the proxy would benefit from an explicit definition and diagram on first use to improve readability for readers unfamiliar with the mechanism.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
read point-by-point responses
-
Referee: The central claim of 0% UIR (and superiority over prompts) rests on experiments described only at summary level with no detailed methodology, dataset construction, statistical analysis, or error bars; this prevents verification of whether the result is robust and is load-bearing for the conclusion that architectural enforcement is necessary.
Authors: The abstract summarizes the key results, but the full manuscript provides the detailed methodology, dataset construction for the 150 adversarial tasks across four attack categories, model specifications, and evaluation protocol in the Evaluation section. We will revise the abstract to include a concise reference to these details and add error bars or confidence intervals to the reported UIR figures in the revision. revision: yes
-
Referee: No data are reported on authorized-task success rates or false-positive blocks under the ABAC policy, which directly undermines the assumption that the proxy can be deployed without breaking legitimate tool use.
Authors: We agree this is an important omission for assessing deployability. The current evaluation focuses on security effectiveness against adversarial tasks. In the revised manuscript we will add experiments measuring authorized-task success rates and any false-positive blocking under the ABAC policy to quantify impact on legitimate use. revision: yes
-
Referee: The manuscript provides no security review or analysis of new attack surfaces introduced by the proxy (e.g., direct calls bypassing the proxy, misconfiguration, or context leakage), which is load-bearing for recommending architectural enforcement in deployed systems.
Authors: We acknowledge the value of analyzing new attack surfaces. The manuscript centers on the proxy's enforcement mechanism and empirical results. We will add a Security Considerations subsection discussing potential bypass vectors, misconfiguration risks, and context leakage, along with recommended mitigations such as authenticated proxy deployment. A comprehensive formal analysis or red-team evaluation is noted as future work. revision: partial
Circularity Check
No circularity in experimental evaluation
full rationale
The paper reports direct empirical measurements of unauthorized invocation rates (UIR) across three models and 150 tasks, comparing the MCP proxy against prompt-based baselines. No equations, fitted parameters, derivations, or self-citations are invoked to support the central claim; results are presented as observed outcomes from controlled experiments rather than quantities that reduce to inputs by construction. The evaluation is self-contained against the reported benchmarks with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
invented entities (1)
-
MCP Proxy
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Advances in Neural Information Processing Systems , year =
Toolformer: Language Models Can Teach Themselves to Use Tools , author =. Advances in Neural Information Processing Systems , year =
-
[2]
Qin, Yujia and Liang, Shengding and Ye, Yining and Zhu, Kunlun and Yan, Lan and Lu, Yaxi and Lin, Yankai and Cong, Xin and Tang, Xiangru and Qian, Bill and Zhao, Sihan and Hong, Lauren and Tian, Runchu and Xie, Ruobing and Zhou, Jie and Gerstein, Mark and Li, Dahai and Liu, Zhiyuan and Sun, Maosong , booktitle =
- [3]
-
[4]
Taming Various Privilege Escalation in
Ji, Zimo and Wu, Daoyuan and Jiang, Wenyuan and Ma, Pingchuan and Li, Zongjie and Gao, Yudong and Wang, Shuai and Li, Yingjiu , journal =. Taming Various Privilege Escalation in
-
[5]
Formal Policy Enforcement for Real-World Agentic Systems
Formal Policy Enforcement for Real-World Agentic Systems , author =. arXiv preprint arXiv:2602.16708 , year =
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
Solver-Aided Verification of Policy Compliance in Tool-Augmented
Winston, Cailin and Winston, Claris and Just, Ren. Solver-Aided Verification of Policy Compliance in Tool-Augmented. arXiv preprint arXiv:2603.20449 , year =
-
[7]
Prompt Injection Attack to Tool Selection in
Shi, Jiawen and Yuan, Zenghui and Tie, Guiyao and Zhou, Pan and Gong, Neil Zhenqiang and Sun, Lichao , journal =. Prompt Injection Attack to Tool Selection in
-
[8]
Not What You've Signed Up For: Compromising Real-World
Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario , journal =. Not What You've Signed Up For: Compromising Real-World
-
[9]
Guide to Attribute Based Access Control (
Hu, Vincent C and Ferraiolo, David and Kuhn, Rick and Schnitzer, Adam and Sandlin, Kenneth and Miller, Robert and Scarfone, Karen , institution =. Guide to Attribute Based Access Control (
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.