Capability Gates Are Not Authorization: Confused-Deputy Failures in LLM Agent Frameworks

David Mellafe Zuvic

arxiv: 2606.28679 · v1 · pith:7AR64QWUnew · submitted 2026-06-27 · 💻 cs.CR · cs.AI

Pith reviewed 2026-06-30 10:10 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords LLM agents · authorization · confused deputy · tool calling · capability gating · agent frameworks · ScopeGate

The pith

LLM agent frameworks gate tool access by default but do not re-authorize each call's concrete argument values before execution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines tool-using LLM agents that read untrusted content while holding side-effecting tools such as payments and APIs. It audits LangChain/LangGraph, LlamaIndex, and the Stripe Agent Toolkit to check whether each model-emitted call is re-authorized, with its concrete argument values, before execution. All three frameworks supply capability gating by default, yet none supplies a deterministic fail-closed per-call value authorization gate by default. The paper presents ScopeGate, a five-stage PDP/PEP (scope, authorization, money ceiling, idempotency, default deny), which blocks an unauthorized payout that executes under LangChain's default dispatch and in a LlamaIndex proof-of-concept.
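To make the distinction between capability gating and per-call value authorization concrete, here is a minimal sketch in plain Python. It is not code from LangChain, LlamaIndex, or the Stripe Agent Toolkit; the names `send_payout`, `TOOLS`, `VERIFIED_PAYEES`, and both dispatch functions are hypothetical, and the payout tool only records the call instead of moving money.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical side-effecting tool: it records the payout instead of sending it.
executed: List[Tuple[str, int]] = []

def send_payout(account: str, amount_cents: int) -> str:
    executed.append((account, amount_cents))
    return f"paid {amount_cents} to {account}"

# Capability gate: only registered tools can be called at all.
TOOLS: Dict[str, Callable[..., Any]] = {"send_payout": send_payout}

def dispatch_capability_only(name: str, args: Dict[str, Any]) -> Any:
    if name not in TOOLS:
        raise PermissionError(f"tool {name!r} is not exposed")
    return TOOLS[name](**args)  # the argument values are never inspected

# Per-call value authorization layered on top (assumed policy, for illustration).
VERIFIED_PAYEES = {"acct_vendor_001"}

def dispatch_with_value_check(name: str, args: Dict[str, Any]) -> Any:
    if name not in TOOLS:
        raise PermissionError(f"tool {name!r} is not exposed")
    if name == "send_payout" and args.get("account") not in VERIFIED_PAYEES:
        raise PermissionError("payee is not authorized for this call")
    return TOOLS[name](**args)

# A call the model might emit after reading attacker-controlled content.
malicious = {"account": "acct_attacker", "amount_cents": 990_000}
dispatch_capability_only("send_payout", malicious)  # passes the capability gate
try:
    dispatch_with_value_check("send_payout", malicious)
except PermissionError as exc:
    print("denied:", exc)
```

The first dispatcher treats "the tool is exposed" as "the call is authorized", which is the conflation the paper describes; the second refuses the same call because the payee value is outside policy.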

Core claim

Across pinned public-source commits, all three frameworks provide capability gating by default, but none provides a deterministic fail-closed per-call value authorization gate by default. An identical unauthorized payout call executes under LangChain's default dispatch and a LlamaIndex proof-of-concept, yet is denied by ScopeGate, which reports zero static bypasses, zero unauthorized attempts over an adaptive run, zero benign false-denies, and full containment on the tested payment-agent scenario.

What carries the argument

ScopeGate, a five-stage PDP/PEP for agent tool calls that performs scope, authorization, money ceiling, idempotency, and default deny checks.
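The five stages can be sketched as an ordered chain of checks. This is an illustrative reading of the stage names only, not the ScopeGate reference implementation: the `ToolPolicy` and `Context` types, the argument names `account`, `amount_cents`, and `idempotency_key`, and the string return values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set

@dataclass
class ToolPolicy:
    allowed_payees: Set[str]   # values the authorization stage accepts
    max_amount_cents: int      # money ceiling per call

@dataclass
class Context:
    policy: Dict[str, ToolPolicy]
    seen_keys: Set[str] = field(default_factory=set)

def decide(tool: str, args: Dict[str, Any], ctx: Context) -> str:
    try:
        # 1. Scope: tools without a policy entry are denied.
        rule = ctx.policy.get(tool)
        if rule is None:
            return "deny:scope"
        # 2. Authorization: the concrete payee must be in the allowed set.
        if args.get("account") not in rule.allowed_payees:
            return "deny:authorization"
        # 3. Money ceiling: amount must be a positive int within the limit.
        amount = args.get("amount_cents")
        if type(amount) is not int or not 0 < amount <= rule.max_amount_cents:
            return "deny:money"
        # 4. Idempotency: a key is required and may be used only once.
        key = args.get("idempotency_key")
        if not isinstance(key, str) or not key or key in ctx.seen_keys:
            return "deny:idempotency"
        ctx.seen_keys.add(key)
        return "allow"
    except Exception:
        # 5. Default deny: any unexpected failure in a check fails closed.
        return "deny:default"

ctx = Context(policy={"send_payout": ToolPolicy({"acct_vendor_001"}, 50_000)})
call = {"account": "acct_vendor_001", "amount_cents": 1_200, "idempotency_key": "inv-1"}
print(decide("send_payout", call, ctx))  # first use of the key
print(decide("send_payout", call, ctx))  # replay of the same key
```

In this sketch the only path to "allow" runs through all four positive checks, so a malformed argument (for example, an unhashable payee value) ends in a denial rather than an exception that a caller might mishandle.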

If this is right

The unauthorized payout call with concrete values succeeds under LangChain default dispatch and a LlamaIndex PoC.
ScopeGate denies the same call while recording 0/48 static bypasses and 0/29 unauthorized attempts.
The tested control produces 0/10 benign false-denies and 10/10 containment on the Latam-GPT payment-agent scenario.
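One way to read zero-event counts such as 0/48 or 0/29 is to ask what underlying rate they can still rule out. The sketch below computes the exact one-sided upper bound for zero observed events in n trials, obtained by solving (1 - p)^n = 1 - confidence for p. It assumes the trials are independent and identically distributed, which is an assumption of this example and not something the paper states; iterations of an adaptive run in particular may not be independent.

```python
def upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-trial event rate after 0 events in n independent trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Zero events are observed with probability (1 - p)**n; the bound is the
    # largest p for which that probability is still at least 1 - confidence.
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

for label, n in [("static bypasses", 48), ("unauthorized attempts", 29), ("benign false-denies", 10)]:
    print(f"0/{n} {label}: rate below {upper_bound_zero_events(n):.1%} at 95% confidence")
```

Under those assumptions the bounds come out at roughly 6%, 10%, and 26% respectively, so the counts are consistent with a low failure rate but do not by themselves demonstrate a rate near zero.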

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Production agent deployments may require explicit per-call value authorization layers in addition to capability exposure.
The confused-deputy pattern could appear in any agent system that lets models emit tool calls after ingesting untrusted data.
ScopeGate's staged checks could be adapted to other side-effecting domains such as email or infrastructure APIs.

Load-bearing premise

The chosen pinned commits, static bypass tests, and adaptive attack runs on LangChain and LlamaIndex are representative of production deployments, and ScopeGate's containment results generalize beyond the tested model classes.

What would settle it

Execute the same unauthorized payout call against a production instance of LangChain or LlamaIndex without ScopeGate and observe whether the call succeeds or is blocked.

Figures

Figures reproduced from arXiv: 2606.28679 by David Mellafe Zuvic.

**Figure 2.** SCOPEGATE re-authorizes each model-emitted tool call before side effects execute.

… of the conversation. This excludes upstream verified-set poisoning, session-seeded time-of-check/time-of-use policy mutation, and dynamic policy fetch over model-reachable egress. D. Compromise-source independence. The boundary does not depend on why the model emits a malicious call. Runtime prompt injection and supply-chain …

**Figure 3.** SCOPEGATE fails closed unless scope, authorization, money, and idempotency checks all pass.

A. Invariant. For each proposed call decide(tool, args, ctx), SCOPEGATE evaluates the stages in Figure 3 in order. Scope: Is this tool governed by policy? Unlisted tools deny by default. This prevents model-discovered tools or misspelled variants from reaching side effects. Authorization: Are value-constrained argument…

Original abstract

Tool-using LLM agents increasingly read untrusted content while holding side-effecting tools such as payments, email, CRM, and infrastructure APIs, yet common framework defaults still conflate tool exposure with authorization. We audit whether LangChain/LangGraph, LlamaIndex, and the Stripe Agent Toolkit re-authorize each model-emitted call, with concrete argument values, before execution. Across pinned public-source commits, all three provide capability gating by default, but none provides a deterministic fail-closed per-call value authorization gate by default. We introduce ScopeGate, a five-stage PDP/PEP for agent tool calls: scope, authorization, money ceiling, idempotency, and default deny. Evaluation shows the identical unauthorized payout call executes under LangChain's default dispatch (with a companion LlamaIndex PoC) but is denied by ScopeGate; the tested control reports 0/48 static bypasses, 0/29 unauthorized attempts (40-iteration adaptive run), 0/10 benign false-denies, and Latam-GPT payment-agent containment at 10/10. ASR denotes attempted unauthorized action, containment is not a cure, deployment-tier claims are inference over measured model classes, and no CVE is asserted.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper audits three LLM agent frameworks and finds that, despite capability gates, they lack per-call value authorization on tool arguments. It then describes ScopeGate as a five-stage fix, but the results are tied to a narrow set of pinned commits and limited model tests.

The key takeaway is that LangChain, LlamaIndex, and the Stripe Agent Toolkit all expose tools via capability checks but do not re-authorize the concrete argument values on each model-generated call before execution. This matches the confused deputy pattern when agents ingest untrusted content.

The work applies that classic attack surface to current agent dispatch paths and gives concrete audit outcomes on pinned commits. ScopeGate's stages (scope, authorization, money ceiling, idempotency, default deny) are laid out as a PDP/PEP structure that blocks the tested unauthorized payout while the framework defaults do not. The reported counts (0/48 static bypasses, 0/29 unauthorized attempts, 0/10 false denies, 10/10 containment on the payment-agent case) are presented cleanly.

The soft spot is the narrow test surface. The central claim about default behavior rests on three specific commits plus attack runs limited to LangChain/LlamaIndex and one model class for the adaptive and containment experiments. Production setups often use later commits, forks, or wrappers that could insert argument checks, so the "by default" finding may not transfer. The methodology for constructing the 40-iteration adaptive runs and the static bypass tests is not visible enough in the supplied text to judge coverage.

This is useful for engineers who build or secure LLM agents that call side-effecting APIs. A reader working on authorization layers for agents would get practical examples and a structured proposal to consider.

It deserves peer review. The observation is practical and the proposed mechanism is concrete; referees can push on the evaluation scope and generalization without the core point being incoherent.

Referee Report

3 major / 2 minor

Summary. The paper audits whether LangChain/LangGraph, LlamaIndex, and the Stripe Agent Toolkit re-authorize each model-emitted tool call with concrete argument values before execution. Across pinned public-source commits, all three provide capability gating by default but none supplies a deterministic fail-closed per-call value authorization gate. The authors introduce ScopeGate, a five-stage PDP/PEP (scope, authorization, money ceiling, idempotency, default deny) for agent tool calls, and report that an unauthorized payout call succeeds under LangChain's default dispatch (with a LlamaIndex PoC) but is denied by ScopeGate; the control reports 0/48 static bypasses, 0/29 unauthorized attempts (40-iteration adaptive run), 0/10 benign false-denies, and 10/10 containment on a Latam-GPT payment-agent scenario.

Significance. If the empirical results hold, the work identifies a concrete security gap in widely used LLM agent frameworks where capability exposure is conflated with per-call value authorization, enabling confused-deputy attacks on side-effecting tools. ScopeGate supplies a practical, staged enforcement design whose measured containment (0/29, 10/10) on the tested surface demonstrates a viable mitigation path. The audit of pinned commits and the explicit metrics provide actionable evidence for framework developers and deployers.

major comments (3)

[Evaluation] Evaluation section: the central claim that none of the three frameworks supplies a deterministic fail-closed per-call value authorization gate by default rests on static inspection of three pinned commits plus attack runs limited to LangChain/LlamaIndex and one payment-agent model class, yet no methodology, test-case enumeration, or raw data tables are provided to allow verification of the 0/48 and 0/29 counts or to assess coverage of dispatch paths.
[Framework Audit and ScopeGate Evaluation] § on framework defaults and ScopeGate results: the representativeness assumption—that the chosen pinned commits, static bypass tests, and adaptive attack runs are representative of production deployments—is load-bearing for the 'by default' finding, but the manuscript does not discuss whether later commits, forks, wrappers, or other model families alter the dispatch path to insert argument-level checks.
[ScopeGate Design] ScopeGate description: the five-stage PDP/PEP is presented at a high level without pseudocode, formal policy language, or interface specification, making it impossible to determine whether the reported containment results (0/29, 10/10) are reproducible or depend on unstated implementation choices.

minor comments (2)

[Abstract] Abstract: the sentence 'ASR denotes attempted unauthorized action' introduces an acronym that is not subsequently expanded or used in the provided text.
[Evaluation] The manuscript should clarify whether the 40-iteration adaptive run and the 10/10 Latam-GPT result used the same model class or whether cross-class generalization is claimed.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the insightful comments on our manuscript arXiv:2606.28679. We provide point-by-point responses to the major comments below and will revise the manuscript accordingly to address the concerns regarding evaluation details, representativeness, and design specification.

Referee: Evaluation section: the central claim that none of the three frameworks supplies a deterministic fail-closed per-call value authorization gate by default rests on static inspection of three pinned commits plus attack runs limited to LangChain/LlamaIndex and one payment-agent model class, yet no methodology, test-case enumeration, or raw data tables are provided to allow verification of the 0/48 and 0/29 counts or to assess coverage of dispatch paths.

Authors: The evaluation was based on systematic static analysis of the dispatch paths in the pinned commits, followed by dynamic testing. We will expand the Evaluation section in the revision to include a full methodology description, an enumeration of all test cases used for the 48 static bypass attempts and the 29 unauthorized attempts (including the 40-iteration adaptive run), and supplementary tables with raw outcomes. This will enable verification and assessment of dispatch path coverage. The tests were limited to the specified frameworks and model class as described, with the LlamaIndex PoC serving as a cross-framework validation.
revision: yes

Referee: § on framework defaults and ScopeGate results: the representativeness assumption—that the chosen pinned commits, static bypass tests, and adaptive attack runs are representative of production deployments—is load-bearing for the 'by default' finding, but the manuscript does not discuss whether later commits, forks, wrappers, or other model families alter the dispatch path to insert argument-level checks.

Authors: Our audit is explicitly scoped to the pinned public commits to ensure exact reproducibility. We will add a new subsection in the revised manuscript discussing the representativeness of these commits, noting that while later commits or custom wrappers could potentially insert additional checks, the fundamental design pattern of capability gating without per-call value authorization was observed consistently across the three frameworks. We will also address how this finding applies to production deployments by referencing common usage patterns.
revision: yes

Referee: ScopeGate description: the five-stage PDP/PEP is presented at a high level without pseudocode, formal policy language, or interface specification, making it impossible to determine whether the reported containment results (0/29, 10/10) are reproducible or depend on unstated implementation choices.

Authors: To improve reproducibility, the revised manuscript will include detailed pseudocode for the five-stage PDP/PEP pipeline, a specification of the policy language for defining scopes, authorizations, money ceilings, and idempotency rules, and the interface definitions for the policy decision and enforcement points. These additions will make the implementation choices explicit and allow independent reproduction of the containment results.
revision: yes
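For readers unfamiliar with the PDP/PEP split, the following sketch shows one possible shape for the interface between a policy decision point and a policy enforcement point. It is a guess at the general pattern, not the interface the revised manuscript will specify: `PolicyDecisionPoint`, `enforce`, `Denied`, `example_pdp`, and the payee and ceiling values are all hypothetical.

```python
from functools import wraps
from typing import Any, Callable, Dict, List, Tuple

# Assumed interface: a PDP is any callable that returns True to permit a call.
PolicyDecisionPoint = Callable[[str, Dict[str, Any]], bool]

class Denied(Exception):
    """Raised by the enforcement point when a call is not explicitly allowed."""

def enforce(pdp: PolicyDecisionPoint, tool_name: str):
    """PEP: wrap a tool so it runs only after the PDP approves the exact arguments."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)
        def guarded(**kwargs: Any) -> Any:
            try:
                allowed = pdp(tool_name, kwargs) is True
            except Exception:
                allowed = False  # a crashing policy check fails closed
            if not allowed:
                raise Denied(f"{tool_name} denied")
            return fn(**kwargs)
        return guarded
    return wrap

def example_pdp(tool: str, args: Dict[str, Any]) -> bool:
    # Hypothetical policy: one verified payee and a 50,000-cent ceiling.
    amount = args.get("amount_cents")
    return (tool == "send_payout"
            and args.get("account") == "acct_vendor_001"
            and type(amount) is int
            and 0 < amount <= 50_000)

ledger: List[Tuple[str, int]] = []

@enforce(example_pdp, "send_payout")
def send_payout(account: str, amount_cents: int) -> str:
    ledger.append((account, amount_cents))  # stands in for the real side effect
    return "ok"
```

Keeping the decision function free of side effects and placing the only call to the real tool inside the wrapper means the tool body cannot run unless a decision was made for those exact arguments.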

Circularity Check

0 steps flagged

No circularity: empirical audit without derivation or self-referential fitting

The paper is an empirical audit of three frameworks on pinned commits plus description of ScopeGate; it contains no equations, no parameter fitting, no predictions derived from fitted inputs, and no load-bearing self-citations. Central claims rest on direct static inspection and attack runs that are externally falsifiable. No step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only; no explicit free parameters, axioms, or invented entities beyond the high-level description of ScopeGate are visible.

invented entities (1)

ScopeGate · no independent evidence
purpose: Five-stage PDP/PEP enforcing scope, authorization, money ceiling, idempotency, and default deny on agent tool calls
New mechanism introduced to address the identified gap; no independent evidence provided in abstract.

pith-pipeline@v0.9.1-grok · 5750 in / 1157 out tokens · 24839 ms · 2026-06-30T10:10:48.805116+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 7 canonical work pages · 6 internal anchors

[1] J. H. Saltzer and M. D. Schroeder. The protection of information in computer systems. Proceedings of the IEEE, 63(9), 1975.
[2] N. Hardy. The confused deputy: or why capabilities might have been invented. ACM SIGOPS Operating Systems Review, 22(4), 1988.
[3] OWASP Foundation. OWASP Top 10 for LLM Applications 2025. https://genai.owasp.org/llm-top-10/
[4] MITRE ATLAS. AML.T0051: Prompt Injection. https://atlas.mitre.org/techniques/AML.T0051/
[5] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz. Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. arXiv:2302.12173, 2023.
[6] E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tramer. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. arXiv:2406.13352, 2024.
[7] Q. Zhan, Z. Liang, Z. Ying, and D. Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. arXiv:2403.02691, 2024.
[8] Y. Ruan, H. Dong, A. Wang, S. Pitis, Y. Zhou, J. Ba, Y. Dubois, C. J. Maddison, and T. Hashimoto. Identifying the risks of LM agents with an LM-emulated sandbox. arXiv:2309.15817, 2023.
[9] I. Evtimov, A. Zharmagambetov, A. Grattafiori, C. Guo, and K. Chaudhuri. WASP: Benchmarking web agent security against prompt injection attacks. arXiv:2504.18575, 2025.
[10] J. Zhu, K. Tseng, G. Vernik, X. Huang, S. G. Patil, V. Fang, and R. A. Popa. MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents. arXiv:2512.11147, 2025.
[11] T. Debi and W. Zhu. Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection. arXiv:2601.22569, 2026.
[12] LangChain. LangChain public source repository. Pinned audit commit 00ad96c. https://github.com/langchain-ai/langchain
[13] LangGraph. LangGraph public source repository. Pinned audit commit bdb323e. https://github.com/langchain-ai/langgraph
[14] LlamaIndex. LlamaIndex public source repository. Pinned audit version v0.14.23, commit 520aa4e. https://github.com/run-llama/llama_index
[15] Stripe. Stripe Agent Toolkit public source repository. Pinned audit commits 0b4961f and f54c9e6. https://github.com/stripe/agent-toolkit
[16] D. Mellafe Zuvic. Task-aligned prompt injection: A deterministic cross-model susceptibility benchmark across frontier LLMs. Companion manuscript and artifacts, 2026.
[17] D. Mellafe Zuvic. ScopeGate Runtime. Reference PDP/PEP implementation for agent tool calls. https://github.com/raceksd-source/scopegate-runtime
