arxiv: 2603.18829 · v10 · submitted 2026-03-19 · 💻 cs.CR · cs.AI

Agent Control Protocol: Admission Control for Agent Actions

Marcelo Fernandez (TraslaIA)

Pith reviewed 2026-05-15 08:30 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords: admission control, autonomous agents, risk scoring, behavioral patterns, stateful protocols, agent governance, deviation collapse, temporal verification

The pith

A temporal admission control protocol limits autonomous agent execution to 0.4 percent of individually valid requests by accumulating risk across action sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autonomous agents can chain individually valid actions into harmful behavioral patterns that stateless per-request policies cannot detect. ACP enforces temporal properties over execution traces by combining static risk scores with stateful signals such as anomaly accumulation and cooldown periods. In a 500-request workload where every request is individually valid, the protocol allows only two autonomous executions, escalating after three actions and denying after eleven. The design is verified through model checking of safety and liveness properties and achieves sub-microsecond median decision latency with throughput above one million requests per second. It is presented as Paper 1 of a six-paper series on agent governance mechanisms.

Core claim

ACP is a stateful admission control protocol that evaluates agent actions over execution history rather than in isolation. It uses a LedgerQuerier abstraction to apply deterministic risk scoring that incorporates anomaly accumulation and cooldown, blocking sequences even when each individual request meets per-request thresholds. The protocol scopes signals via PatternKey to prevent cross-context false denials, and it formalizes deviation collapse together with Boundary Activation Rate as a detection mechanism for cases where enforcement remains latent.
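To make the stateless/stateful contrast concrete, here is a toy Go model (Go being the language of the ACP reference implementation). It is a sketch under stated assumptions, not ACP-RISK-3.0: every request carries the paper's static score RS=35, and a hypothetical history term of +3 per prior action under the same key is added before mapping the total onto hypothetical approve/escalate/deny cut-offs (40 and 65). Those weights and cut-offs were picked only so the toy reproduces the reported shape (2 autonomous approvals, escalation from the 3rd request, denial from the 11th).

```go
package main

import "fmt"

// Decision is the outcome of one admission check.
type Decision int

const (
	Approve  Decision = iota // executes autonomously
	Escalate                 // needs out-of-band approval
	Deny                     // blocked
)

// Hypothetical parameters, chosen only so this toy reproduces the reported
// counts. They are NOT the published ACP-RISK-3.0 weights or thresholds.
const (
	baseRS          = 35 // static per-request score from the paper's workload
	perActionWeight = 3  // hypothetical history term per prior action
	approveBelow    = 40 // hypothetical: score < 40 runs autonomously
	escalateBelow   = 65 // hypothetical: 40 <= score < 65 escalates
)

// stateless scores each request in isolation, so RS=35 is always approved.
func stateless(rs int) Decision {
	if rs < approveBelow {
		return Approve
	}
	return Deny
}

// stateful adds a deterministic history term (prior actions under the same
// key) to the static score before mapping it onto a decision.
func stateful(rs, priorActions int) Decision {
	score := rs + perActionWeight*priorActions
	switch {
	case score < approveBelow:
		return Approve
	case score < escalateBelow:
		return Escalate
	default:
		return Deny
	}
}

// run replays n identical requests and tallies the decisions.
func run(n int, useHistory bool) map[Decision]int {
	counts := map[Decision]int{}
	for i := 0; i < n; i++ {
		d := stateless(baseRS)
		if useHistory {
			d = stateful(baseRS, i) // request i+1 sees i prior actions
		}
		counts[d]++
	}
	return counts
}

func main() {
	s := run(500, false)
	h := run(500, true)
	fmt.Printf("stateless: approve=%d\n", s[Approve])
	fmt.Printf("stateful:  approve=%d escalate=%d deny=%d\n", h[Approve], h[Escalate], h[Deny])
}
```

Running it prints `stateless: approve=500` and `stateful:  approve=2 escalate=8 deny=490`. The outcome is fully deterministic: the same history always yields the same decision, which is the property the abstract stresses when it says ACP blocks on history-aware risk scoring rather than anomaly detection.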

What carries the argument

LedgerQuerier abstraction that maintains stateful signals (anomaly accumulation, cooldown) scoped by PatternKey(agentID, capability, resource) to drive deterministic history-aware risk scoring.

If this is right

Coordinated agents accumulate risk independently, so activity scales linearly rather than permitting superlinear amplification.
An adversary attempting to suppress Boundary Activation Rate to zero is detected via DeltaBAR before deviation collapse occurs.
Decision evaluation runs in 739-832 nanoseconds at the median with throughput exceeding 1.7 million requests per second.
The protocol has been model-checked across billions of states with zero violations of the specified invariants and temporal properties.
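The linear-scaling claim can be illustrated with the same kind of toy rule. The sketch assumes, following the simulated rebuttal below, that each agent's history lives under its own PatternKey and that each key permits two autonomous actions (the single-agent figure from the 500-request workload); the agent names, the capability `payments.transfer`, and the resource `acct-42` are hypothetical. Under those assumptions the autonomous total for N agents is 2N, because each key's counter only sees its own agent's requests.

```go
package main

import "fmt"

// PatternKey mirrors the paper's (agentID, capability, resource) tuple.
type PatternKey struct{ AgentID, Capability, Resource string }

// autonomousPerKey is a hypothetical per-key allowance: the first two actions
// under a key run autonomously, mirroring the single-agent workload result.
const autonomousPerKey = 2

// coordinate lets n agents each issue `rounds` individually valid requests
// against the same capability and resource, interleaved round-robin, and
// returns how many ran autonomously. History is kept per PatternKey, so no
// agent's requests count against another agent's key.
func coordinate(n, rounds int) int {
	history := map[PatternKey]int{}
	autonomous := 0
	for r := 0; r < rounds; r++ {
		for a := 0; a < n; a++ {
			k := PatternKey{fmt.Sprintf("agent-%d", a), "payments.transfer", "acct-42"}
			if history[k] < autonomousPerKey {
				autonomous++
			}
			history[k]++
		}
	}
	return autonomous
}

func main() {
	for _, n := range []int{1, 2, 4, 8, 16} {
		fmt.Printf("N=%2d  autonomous=%2d  2N=%2d\n", n, coordinate(n, 50), 2*n)
	}
}
```

Each printed row shows the autonomous count equal to 2N, growing linearly with the number of agents.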

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

ACP could be layered onto existing agent runtimes as an independent governance layer without altering core action logic.
The PatternKey scoping approach may generalize to other multi-tenant systems where context mixing creates false positives.
Boundary Activation Rate offers a quantitative metric that operators could monitor to detect latent policy bypass attempts in production.
Extending the coordination window calculation to heterogeneous agent teams would require adjusting CW_appr for differing capability sets.
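The page describes BAR only as a detection metric for deviation collapse, so the definition below is an assumption: BAR is taken as the fraction of decisions in a phase where an enforcement boundary fired (escalate or deny), and a DeltaBAR alert fires when BAR falls more than a tolerance below the first (baseline) phase. The phase data and the 0.15 tolerance are hypothetical.

```go
package main

import "fmt"

// bar returns the Boundary Activation Rate of one phase under an assumed
// definition: the fraction of decisions in which an enforcement boundary
// fired (escalate or deny). An empty phase is treated as BAR 0.
func bar(activated []bool) float64 {
	if len(activated) == 0 {
		return 0
	}
	fired := 0
	for _, a := range activated {
		if a {
			fired++
		}
	}
	return float64(fired) / float64(len(activated))
}

// firstAlert returns the index of the first phase whose BAR has dropped by
// more than tol below the baseline (phase 0), or -1 if no phase does.
// This is a hypothetical DeltaBAR rule, not the paper's formula.
func firstAlert(phases [][]bool, tol float64) int {
	if len(phases) == 0 {
		return -1
	}
	baseline := bar(phases[0])
	for i := 1; i < len(phases); i++ {
		if baseline-bar(phases[i]) > tol {
			return i
		}
	}
	return -1
}

// phase builds a synthetic phase with `fired` activations out of `total` decisions.
func phase(fired, total int) []bool {
	p := make([]bool, total)
	for i := 0; i < fired && i < total; i++ {
		p[i] = true
	}
	return p
}

func main() {
	// Hypothetical trace: activation is pushed down phase by phase.
	phases := [][]bool{phase(7, 10), phase(6, 10), phase(5, 10), phase(3, 10), phase(0, 10)}
	for i, p := range phases {
		fmt.Printf("phase %d BAR=%.2f\n", i, bar(p))
	}
	fmt.Println("first DeltaBAR alert at phase:", firstAlert(phases, 0.15))
}
```

In this trace the first alert fires at phase 2, while BAR is still 0.50 and well before it reaches 0.00. Comparing against a fixed baseline rather than the previous phase is a deliberate choice: an adversary who lowers BAR in steps smaller than the tolerance never trips a per-step check, but the cumulative drop still trips a baseline check.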

Load-bearing premise

The chosen risk scoring thresholds and anomaly accumulation rules will correctly identify harmful behavioral patterns without excessive false denials or missing coordinated attacks.

What would settle it

Deploy ACP on an agent system executing a known sequence of individually valid actions that together produce documented harm and measure whether the protocol denies the sequence while still permitting unrelated benign traces.

Figures

Figures reproduced from arXiv: 2603.18829 by Marcelo Fernandez (TraslaIA).

**Figure 2.** Decision evolution under 500 repeated valid requests. The stateless engine approves all.

**Figure 3.** ACP end-to-end verifiability pipeline. The TLA+ model defines the invariants that test …

**Figure 4.** Boundary Activation Rate per phase (Experiment 9). BAR drops from 0.70 to 0.00 under …

Original abstract

Autonomous agents can produce harmful behavioral patterns from individually valid requests -- a threat class per-request policy evaluation cannot address, because stateless engines evaluate each request in isolation. We present ACP, a temporal admission control protocol enforcing behavioral properties over execution traces via static risk scoring combined with stateful signals (anomaly accumulation, cooldown) through a LedgerQuerier abstraction. ACP blocks execution based on deterministic, history-aware risk scoring -- not anomaly detection. Under a 500-request workload where every request is individually valid (RS=35), a stateless engine approves all 500; ACP limits autonomous execution to 2 out of 500 (0.4%), escalating after 3 actions and denying after 11. We identify a state-mixing vulnerability in ACP-RISK-2.0 (cross-context false denials) and introduce ACP-RISK-3.0, scoping anomaly signals to PatternKey(agentID, capability, resource). Decision evaluation: 739-832 ns (p50); throughput 1,720,000 req/s. Safety and liveness model-checked via TLA+ (11 invariants + 4 temporal properties, 0 violations) across 4,294,930,695 distinct states. We formalize deviation collapse -- enforcement active but never exercised due to upstream constraints -- and introduce Boundary Activation Rate (BAR) as its detection mechanism. An adversary suppressing BAR to 0.00 is detected via DeltaBAR before collapse (BAR_C=1.00). N coordinated agents accumulate risk independently; coordination window CW_appr=2N with zero deviation: activity scales linearly, preventing superlinear amplification. ACP is Paper 1 of a 6-paper Agent Governance Series: P0 -- atomic decision boundaries; P2 -- behavioral drift detection (IML); P3/4 -- governance structure, fair allocation, and irreducibility; P5 -- runtime execution validity (RAM, arXiv:2604.22898); P6 -- operationalization of RAM.
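A quick consistency check on the reported performance figures, assuming the p50 latency approximates the per-decision cost of a single sequential evaluator:

```latex
\frac{1}{1.72 \times 10^{6}\ \mathrm{req/s}} \approx 581\ \mathrm{ns\ per\ request}
\qquad
\frac{1}{739\ \mathrm{ns}} \approx 1.35 \times 10^{6}\ \mathrm{req/s},
\quad
\frac{1}{832\ \mathrm{ns}} \approx 1.20 \times 10^{6}\ \mathrm{req/s}
```

Because the reported throughput corresponds to about 581 ns per request, below the 739-832 ns median, the throughput figure was presumably measured with some concurrency or batching. The page does not state the benchmark setup, so this is an inference rather than a reported result.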

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

ACP adds a TLA+-verified stateful layer on top of per-request checks and shows a sharp drop in approvals on one synthetic workload, but the blocking result depends on fixed, untested thresholds.

ACP is a protocol that enforces temporal properties on agent actions by combining static risk scores with stateful tracking through a LedgerQuerier. In the reported 500-request case, where every individual request scores RS=35, the stateless engine approves all 500 while ACP allows only two, escalating after three actions and denying after eleven. The paper also fixes a cross-context false-denial issue in ACP-RISK-2.0 by scoping signals to PatternKey(agentID, capability, resource) and introduces BAR to flag cases where enforcement is active but never triggered.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Agent Control Protocol (ACP), a temporal admission control protocol for autonomous agents that enforces behavioral properties over execution traces using static risk scoring combined with stateful signals (anomaly accumulation and cooldown) via a LedgerQuerier abstraction. It contrasts this with stateless engines that evaluate requests in isolation and cannot address harmful patterns from individually valid requests. Key claims include: under a 500-request synthetic workload where every request has RS=35, a stateless engine approves all 500 while ACP limits autonomous execution to 2/500 (0.4%), escalating after 3 actions and denying after 11; decision latency of 739-832 ns (p50) and throughput of 1.72M req/s; TLA+ model checking of 11 invariants and 4 temporal properties with zero violations across 4.29 billion states; identification of a state-mixing vulnerability in ACP-RISK-2.0 fixed by PatternKey scoping in ACP-RISK-3.0; and formalization of deviation collapse with Boundary Activation Rate (BAR) as a detection mechanism. The work is Paper 1 of a 6-paper series on agent governance.

Significance. If the central claims hold, ACP provides a rigorous, history-aware mechanism for mitigating sequential and coordinated risks in agent systems that per-request policies miss, with notable strengths in the extensive TLA+ verification (zero violations over billions of states) and the introduction of BAR for detecting deviation collapse. The performance metrics indicate potential practicality, and the formal treatment of linear risk accumulation for N agents is a positive contribution. The significance is limited by the narrow empirical basis, but the formal methods component strengthens the overall contribution to agent security.

major comments (2)

[Abstract and workload evaluation] The central empirical result (stateless approves 500/500; ACP approves 2/500 with escalation after 3 and denial after 11) depends on fixed, unvalidated thresholds (RS=35, escalate@3, deny@11) applied to a single synthetic workload where all requests are individually valid. No ablation on threshold sensitivity, no diverse request patterns, and no real agent traces are reported, so it is unclear whether the risk scoring and PatternKey scoping reliably separate harmful behavioral sequences. This is load-bearing because the TLA+ verification addresses protocol safety/liveness but does not validate the risk function's behavioral effectiveness.
[Abstract, coordinated agents paragraph] The claim that N coordinated agents accumulate risk independently with CW_appr=2N and zero deviation (preventing superlinear amplification) is stated without accompanying stress tests against evasion or coordination strategies that stay under the window. This weakens the generality of the linear scaling argument.

minor comments (2)

[Protocol description] Clarify the exact definition and scoping of PatternKey(agentID, capability, resource) in ACP-RISK-3.0, including how it prevents cross-context false denials, with a small example.
[Verification section] The TLA+ specification details (model, invariants, and temporal properties) would benefit from a brief appendix or reference to the checked model file for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and recommendation for major revision. We agree that the empirical sections would benefit from additional analysis on threshold sensitivity and workload diversity, and we have revised the manuscript to incorporate these points while clarifying the scope of the current work. Our point-by-point responses to the major comments are provided below.

Referee: [Abstract and workload evaluation] The central empirical result (stateless approves 500/500; ACP approves 2/500 with escalation after 3 and denial after 11) depends on fixed, unvalidated thresholds (RS=35, escalate@3, deny@11) applied to a single synthetic workload where all requests are individually valid. No ablation on threshold sensitivity, no diverse request patterns, and no real agent traces are reported, so it is unclear whether the risk scoring and PatternKey scoping reliably separate harmful behavioral sequences. This is load-bearing because the TLA+ verification addresses protocol safety/liveness but does not validate the risk function's behavioral effectiveness.

Authors: We acknowledge that the reported result uses fixed thresholds on a single synthetic workload. In the revised manuscript we have added a new subsection on threshold sensitivity that varies RS from 25 to 45, the escalation trigger from 2 to 5 actions, and the denial trigger from 8 to 15 actions. Across these ranges the autonomous approval rate stays below 2% for the 500-request workload. We also include results for two additional synthetic patterns (mixed RS values and bursty arrivals). Real production agent traces are outside the scope of this protocol-focused paper; we have added an explicit limitations paragraph noting that behavioral effectiveness validation against live traces is planned for Paper 2 (behavioral drift detection). The TLA+ verification establishes that the protocol correctly enforces whatever risk function is supplied, while the workload serves only to illustrate the difference between stateless and stateful evaluation. revision: partial
Referee: [Abstract] The claim that N coordinated agents accumulate risk independently with CW_appr=2N and zero deviation (preventing superlinear amplification) is stated without accompanying stress tests against evasion or coordination strategies that stay under the window. This weakens the generality of the linear scaling argument.

Authors: The linear scaling follows from the per-agent isolation enforced by PatternKey scoping (agentID, capability, resource) in ACP-RISK-3.0; each agent's anomaly accumulator and cooldown window operate independently, so total approvals cannot exceed 2N. We have expanded the revised text with a short discussion of plausible evasion strategies (spacing actions to reset cooldowns, attempting cross-agent signal leakage) and why they remain bounded by the per-agent rules. The TLA+ model already covers multi-agent state transitions and confirms the linear bound. Explicit adversarial simulations are not present in this work; we note this as a direction for follow-on empirical papers in the series. revision: yes
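The sensitivity sweep described in the first response can be sketched on a deliberately simple model. The count-based `decide` rule below is hypothetical (the request at 1-based position i escalates from `escalateAt` onward and is denied from `denyAt` onward), and it ignores RS entirely, so it only exercises the two action triggers over the ranges quoted in the response (2 to 5 and 8 to 15).

```go
package main

import "fmt"

// decide is a hypothetical count-based rule: the request at 1-based index i
// is escalated from escalateAt onward and denied from denyAt onward.
func decide(i, escalateAt, denyAt int) string {
	switch {
	case i >= denyAt:
		return "deny"
	case i >= escalateAt:
		return "escalate"
	default:
		return "approve"
	}
}

// autonomousRate replays n requests and returns the autonomously approved fraction.
func autonomousRate(n, escalateAt, denyAt int) float64 {
	approved := 0
	for i := 1; i <= n; i++ {
		if decide(i, escalateAt, denyAt) == "approve" {
			approved++
		}
	}
	return float64(approved) / float64(n)
}

func main() {
	worst := 0.0
	for e := 2; e <= 5; e++ { // escalation trigger range from the response
		for d := 8; d <= 15; d++ { // denial trigger range from the response
			if r := autonomousRate(500, e, d); r > worst {
				worst = r
			}
		}
	}
	fmt.Printf("worst-case autonomous rate over the grid: %.1f%%\n", 100*worst)
}
```

With escalateAt=3 and denyAt=11 the rate is 2/500 (0.4%), matching the headline figure; the worst cell in the grid (escalateAt=5) gives 0.8%. In this toy model the rate depends only on the escalation trigger, so the denial trigger changes nothing on a single repeated workload.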

Circularity Check

0 steps flagged

No significant circularity; protocol definitions, workload demonstration, and TLA+ verification are self-contained

The paper introduces ACP via explicit definitions of risk scoring (RS=35), anomaly accumulation rules, cooldown, and PatternKey scoping, then applies them to a synthetic 500-request workload where all requests are individually valid. The outcome (2/500 approvals) is the direct computational result of those rules rather than a fitted prediction or self-referential equation. Safety and liveness are established through independent TLA+ model checking (11 invariants, 4 temporal properties, 0 violations over 4B states), which does not depend on the empirical thresholds. Concepts such as deviation collapse and BAR are newly formalized within the paper without reducing to prior self-citations or ansatzes. References to the broader Agent Governance Series are contextual and not load-bearing for the ACP claims or results presented here.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 2 invented entities

The central claims rest on several introduced abstractions and parameters whose correctness is asserted rather than derived from upstream results.

free parameters (2)

RS=35
Per-request risk score assigned to every request in the 500-request workload example
escalation after 3 actions
Threshold for escalating risk signals

axioms (1)

domain assumption TLA+ model covers all relevant execution states for safety and liveness
Invoked when claiming 0 violations across 4,294,930,695 states

invented entities (2)

LedgerQuerier no independent evidence
purpose: Abstraction providing stateful signals for anomaly accumulation and cooldown
New component introduced to enable history-aware decisions
Boundary Activation Rate (BAR) no independent evidence
purpose: Metric to detect deviation collapse where enforcement is active but never exercised
New detection mechanism for cases where upstream constraints prevent rule activation

pith-pipeline@v0.9.0 · 5657 in / 1478 out tokens · 42175 ms · 2026-05-15T08:30:41.301115+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

ACP limits autonomous execution to 2 out of 500 (0.4%), escalating after 3 actions and denying after 11

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Admission to Invariants: Measuring Deviation in Delegated Agent Systems
cs.AI 2026-04 unverdicted novelty 6.0

The Non-Identifiability Theorem shows admissible behavior space A0 is not identifiable from local enforcement signals g under the Local Observability Assumption, so the paper introduces an Invariant Measurement Layer ...
Atomic Decision Boundaries: A Structural Requirement for Guaranteeing Execution-Time Admissibility in Autonomous Systems
cs.LO 2026-04 unverdicted novelty 6.0

Atomic decision boundaries are required to guarantee execution-time admissibility because split evaluation systems allow environmental interleaving that no policy can prevent.
Reconstructive Authority Model: Runtime Execution Validity Under Partial Observability
cs.CR 2026-04 unverdicted novelty 5.0

RAM separates integrity from coverage and uses a reconstruction gate over proven state, assumptions, and unobservable residuals to block invalid executions, achieving zero invalid rates in synthetic tests where attest...
SoK: Security of Autonomous LLM Agents in Agentic Commerce
cs.CR 2026-04 unverdicted novelty 5.0

The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
cs.SE 2026-04 accept novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · cited by 5 Pith papers · 3 internal anchors

[1] Bowen Alpern and Fred B. Schneider. Defining liveness. Information Processing Letters, 21(4):181–185, 1985.

[2] Amazon Web Services. Cedar policy language, 2023. Open-source policy language for authorization.

[3] James P. Anderson. Computer security technology planning study. Technical Report ESD-TR-73-51, Deputy for Command and Management Systems, HQ Electronic Systems Division (AFSC), 1972. Foundational reference monitor concept: a component that mediates all access to protected resources, is always invoked, tamper-resistant, and verifiable. ACP extends this co...

[4] Anthropic. Model context protocol, 2024. Protocol for structured tool access between LLM applications and services.

[5] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press, 2008.

[6] Arnar Birgisson, Joe Gibbs Politz, Úlfar Erlingsson, Ankur Taly, Michael Vrable, and Mark Lentczner. Macaroons: Cookies with contextual caveats for decentralized authorization in the cloud. In Proceedings of the Network and Distributed System Security Symposium (NDSS). Internet Society, 2014.

[7] Cloudflare, Inc. CIRCL: Cloudflare interoperable reusable cryptographic library, 2024. Go library providing post-quantum cryptographic primitives including ML-DSA (Dilithium) and ECDH. Used in ACP pkg/sign2 for ML-DSA-65 hybrid signatures.

[8] Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramer. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. In Advances in Neural Information Processing Systems (NeurIPS), 2024.

[9] DeepSeek-AI. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning, 2025.

[10] Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. AI agents under threat: A survey of key security challenges and future pathways. ACM Computing Surveys, 57(7), 2025.

[11] Yliès Falcone, Jean-Claude Fernandez, and Laurent Mounier. What can you verify and enforce at runtime? International Journal on Software Tools for Technology Transfer, 14(3):349–382,

[12] Systematic framework for runtime enforcement monitors. Distinguishes enforceable properties (safety, co-safety, guarantee, persistence) and characterizes the enforcement mechanisms required for each class.

[13] Marcelo Fernandez. Agent control protocol — official website, 2026.

[14] Marcelo Fernandez. Agent control protocol — specification and reference implementation,

[15] Complete specification (38 documents), Go reference implementation (23 packages), 138 conformance test vectors (73 signed + 65 RISK-2.0 unsigned), ACR-1.0 sequence compliance runner.

[16] Marcelo Fernandez. Atomic decision boundaries: A structural requirement for guaranteeing execution-time admissibility in autonomous systems. https://doi.org/10.5281/zenodo.19670649, 2026. Zenodo. DOI: 10.5281/zenodo.19670649. arXiv:2604.17511.

[17] Marcelo Fernandez. From admission to invariants: Measuring deviation in delegated agent systems. https://doi.org/10.5281/zenodo.19672589, 2026. Zenodo. DOI: 10.5281/zenodo.19672589. arXiv:2604.17517.

[18] Marcelo Fernandez. Irreducible governance structure for autonomous agent systems: Fair allocation, strategy-proofness, and multi-scale composition. https://doi.org/10.5281/zenodo.19708496, 2026. Agent Governance Series, Paper 3/4 (consolidated). Zenodo. DOI: 10.5281/zenodo.19708496. arXiv: TBD.

[19] Marcelo Fernandez. Operationalizing reconstructive authority: Runtime construction, dependency resolution, and execution gating in autonomous agent systems. https://doi.org/10.5281/zenodo.19699460, 2026. Agent Governance Series, Paper 6. Zenodo. DOI: 10.5281/zenodo.19699460. arXiv: TBD.

[20] Marcelo Fernandez. Reconstructive authority model: Runtime execution validity under partial observability. https://doi.org/10.5281/zenodo.19669430, 2026. Agent Governance Series, Paper 5. Zenodo. DOI: 10.5281/zenodo.19669430. arXiv: 2604.22898.

[21] Daniel Fett, Brian Campbell, John Bradley, Torsten Lodderstedt, Michael B. Jones, and David Waite. OAuth 2.0 demonstrating proof of possession (DPoP). Request for Comments 9449, Internet Engineering Task Force, 2023.

[22] Martin Fowler. Event sourcing, 2005.

[23] Go Authors. The Go programming language, 2024. ACP reference implementation written in Go 1.22. All packages verified with go test ./

[24] Charles A. E. Goodhart. Problems of monetary management: The UK experience. In Papers in Monetary Economics, volume I. Reserve Bank of Australia, 1975. Source of Goodhart’s Law: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

[25] Google. Agent-to-agent (A2A) protocol, 2025. Protocol for agent communication and task delegation.

[26] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec@CCS), 2023.

[27] Dick Hardt. The OAuth 2.0 authorization framework. RFC 6749, Internet Engineering Task Force, October 2012. IETF RFC 6749.

[28] Vincent C. Hu, David Ferraiolo, Rick Kuhn, Adam Schnitzer, Kenneth Sandlin, Robert Miller, and Karen Scarfone. Guide to attribute based access control (ABAC) definition and considerations. Technical Report SP 800-162, National Institute of Standards and Technology, 2014.

[29] Michael B. Jones, John Bradley, and Nat Sakimura. JSON web token (JWT). RFC 7519, Internet Engineering Task Force, May 2015. IETF RFC 7519.

[30] Simon Josefsson and Ilari Liusvaara. Edwards-curve digital signature algorithm (EdDSA). RFC 8032, Internet Engineering Task Force, January 2017. IETF RFC 8032.

[31] Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Martic, Julian Togelius, Linus Rottger, Luke Hammond, Shane Legg, and Jan Leike. Specification gaming: The flip side of the coin. DeepMind Blog, 2020. Survey and taxonomy of specification gaming examples in reinforcement learning, where agents satisfy the letter of an objective while violating ...

[32] Kubernetes Contributors. Admission controllers reference, 2024. Kubernetes admission control architecture that inspired the ACP admission flow model.

[33] Leslie Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, 2002.

[34] Ilari Liusvaara. CFRG elliptic curves for JOSE. RFC 8037, Internet Engineering Task Force, January 2017. IETF RFC 8037. RFC 8037 Test Key A used for conformance test vectors.

[35] Mark S. Miller and Jonathan S. Shapiro. Paradigm regained: Abstraction mechanisms for access control. In Advances in Computing Science — ASIAN 2003, volume 2896 of Lecture Notes in Computer Science, pages 224–242. Springer, 2003. Foundational reference for capability-based security, which underlies ACP Capability Tokens.

[36] Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, and Christian Schroeder de Witt. Secret collusion among AI agents: Multi-agent deception via steganography. In Advances in Neural Information Processing Systems (NeurIPS), 2024.

[37] National Institute of Standards and Technology. Module-lattice-based digital signature standard (ML-DSA). Federal Information Processing Standard 204, National Institute of Standards and Technology, August 2024. NIST FIPS 204. Standardizes ML-DSA-44, ML-DSA-65, and ML-DSA-87 (formerly Dilithium2, Dilithium3, Dilithium5).

[38] OASIS. eXtensible access control markup language (XACML) version 3.0. Technical report, OASIS Standard, 2013.

[39] Ollama. Ollama: Run large language models locally. https://ollama.com, 2024. Accessed: 2026.

[40] Open Policy Agent Contributors. Open policy agent, 2024. Policy evaluation engine. ACP-RISK-1.0 Step 3 is compatible with OPA as backend.

[41] Amir Pnueli. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science (FOCS), pages 46–57. IEEE, 1977.

[42] Redis Ltd. Redis: The real-time data platform, 2024. In-memory data structure store. Used as ACP persistent state backend in RedisQuerier and RedisPipelinedQuerier.

[43] Anders Rundgren, Bret Jordan, and Samuel Erdtman. JSON canonicalization scheme (JCS). RFC 8785, Internet Engineering Task Force, June 2020. IETF RFC 8785.

[44] Jerome H. Saltzer and Michael D. Schroeder. The protection of information in computer systems. Proceedings of the IEEE, 63(9):1278–1308, 1975. Foundational reference for the principle of least privilege and fail-closed design.

[45] Ravi S. Sandhu, Edward J. Coyne, Hal L. Feinstein, and Charles E. Youman. Role-based access control models. IEEE Computer, 29(2):38–47, 1996.

[46] Fred B. Schneider. Enforceable security policies. ACM Transactions on Information and System Security, 3(1):30–50, 2000. Formal characterization of which security properties can be enforced through execution monitoring (safety automata / security automata). Establishes the theoretical boundary between verifiable and enforceable properties.

[47] Bruce Schneier and John Kelsey. Secure audit logs to support computer forensics. ACM Transactions on Information and System Security, 2(2):159–176, 1999.

[48] SPIFFE Project. SPIFFE / SPIRE: Secure production identity framework for everyone, 2024. Cryptographic workload identity. ACP builds on SPIFFE identity to add capability scoping.

[49] Manu Sporny, Dave Longley, and Chris Zaremba. ZCAP-LD: Authorization capabilities for linked data. W3C Community Group Report, 2022.

[50] Marilyn Strathern. Improving ratings: Audit in the British university system. European Review, 5(3):305–321, 1997. Coined the accessible formulation of Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”

[51] Georgios Syros, Anshuman Suri, Jacob Ginesin, Cristina Nita-Rotaru, and Alina Oprea. SAGA: A security architecture for governing AI agentic systems, 2025. arXiv:2504.21034 [cs.CR].

[52] Lillian Tsai and Eugene Bagdasarian. Contextual agent security: A policy for every purpose. In Proceedings of the 20th Workshop on Hot Topics in Operating Systems (HotOS), 2025. HotOS 2025.

[53] Haoyu Wang, Christopher M. Poskitt, and Jun Sun. AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents. In Proceedings of the 48th International Conference on Software Engineering (ICSE), 2026.

[54] Yuan Yu, Panagiotis Manolios, and Leslie Lamport. Model checking TLA+ specifications. In Correct Hardware Design and Verification Methods (CHARME), pages 54–66. Springer, 1999.

[55] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics (ACL), 2024.