Solver-Aided Verification of Policy Compliance in Tool-Augmented

Winston, Cailin; Winston, Claris; Just, Ren · arXiv 2603.20449

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

An MCP proxy enforces attribute-based access control (ABAC) for LLM tool access by filtering both tool discovery and tool invocation. It achieves a 0% unauthorized invocation rate across the tested models and attacks, whereas prompt-based defenses reduce risk by only 11-18 points.

Owner-Harm: A Missing Threat Model for AI Agent Safety

cs.CR · 2026-04-20 · unverdicted · novelty 6.0

Owner-Harm is a new threat model covering eight categories of agent behavior that harm the deployer. Existing defenses achieve only a 14.8% true positive rate on injection-based owner-harm tasks, versus 100% on generic criminal harm.

citing papers explorer

Showing 2 of 2 citing papers.

Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control

cs.CR · 2026-05-18 · unverdicted · none · ref 6

Owner-Harm: A Missing Threat Model for AI Agent Safety

cs.CR · 2026-04-20 · unverdicted · none · ref 11
