Recognition: 1 theorem link
· Lean TheoremSentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems
Pith reviewed 2026-05-13 20:10 UTC · model grok-4.3
The pith
A formal calculus and protocol for AI delegation chains enforces seven properties to block all attacks while preserving traceability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the Delegation Chain Calculus formalizes six deterministic properties (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) plus one probabilistic property (intent preservation), and that the Intent-Preserving Delegation Protocol enforces all seven at runtime through a non-LLM Delegation Authority Service. This produces 100 percent combined true-positive rate at zero false-positive rate on DelegationBench v4, blocks every one of 30 black-box attacks, and keeps deterministic properties unbroken under stress while the remaining rules constrain any evaded intent case to only the
What carries the argument
The Delegation Chain Calculus that defines the seven properties, enforced at runtime by the Intent-Preserving Delegation Protocol through a non-LLM Delegation Authority Service.
If this is right
- Deterministic properties remain unbroken under adversarial stress testing and are confirmed by TLA+ model checking across 2.7 million states.
- Even when intent preservation is evaded, the other six properties restrict the adversary to permitted API calls, conformant outputs, and bounded cascades.
- Fine-tuning the natural-language-inference model on government delegation examples raises intent-detection true-positive rate from 1.7 percent to 88.3 percent.
- The three-point verification lifecycle covers all 10 attack categories and 13 federal domains with zero false positives.
- Properties P1 and P3 through P7 are mechanically verified with no violations found.
Where Pith is reading between the lines
- The separation of deterministic mechanical checks from probabilistic intent checks could be reused in other AI security settings where full determinism is impossible.
- The framework might reduce the need for LLM-based policy guards that themselves create new attack surfaces.
- Deployment in commercial multi-agent systems outside federal contexts could follow the same structure with only domain-specific policy definitions changed.
- Live monitoring of real delegation traffic could test whether the properties actually prevent policy violations beyond the benchmark scenarios.
Load-bearing premise
A non-LLM Delegation Authority Service can enforce all seven properties at runtime in real deployments without introducing new attack surfaces or implementation errors.
What would settle it
An observed attack chain that violates one of the deterministic properties such as cascade containment or scope-action conformance yet still passes the three-point verification lifecycle.
Figures
read the original abstract
When Agent A delegates to Agent B, which invokes Tool C on behalf of User X, no existing framework can answer: whose authorization chain led to this action, and where did it violate policy? This paper introduces SentinelAgent, a formal framework for verifiable delegation chains in federal multi-agent AI systems. The Delegation Chain Calculus (DCC) defines seven properties - six deterministic (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) and one probabilistic (intent preservation) - with four meta-theorems and one proposition establishing the practical infeasibility of deterministic intent verification. The Intent-Preserving Delegation Protocol (IPDP) enforces all seven properties at runtime through a non-LLM Delegation Authority Service. A three-point verification lifecycle achieves 100% combined TPR at 0% FPR on DelegationBench v4 (516 scenarios, 10 attack categories, 13 federal domains). Under black-box adversarial conditions, the DAS blocks 30/30 attacks with 0 false positives. Deterministic properties are unbreakable under adversarial stress testing; intent verification degrades to 13% against sophisticated paraphrasing. Fine-tuning the NLI model on 190 government delegation examples improves P2 from 1.7% to 88.3% TPR (5-fold cross-validated, F1=82.1%). Properties P1, P3-P7 are mechanically verified via TLA+ model checking across 2.7 million states with zero violations. Even when intent verification is evaded, the remaining six properties constrain the adversary to permitted API calls, conformant outputs, traceable actions, bounded cascades, and compliant behavior.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SentinelAgent, a formal framework for verifiable delegation chains in federal multi-agent AI systems. It defines the Delegation Chain Calculus (DCC) with seven properties—six deterministic (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) and one probabilistic (intent preservation)—along with four meta-theorems and one proposition on the infeasibility of deterministic intent verification. The Intent-Preserving Delegation Protocol (IPDP) is enforced at runtime by a non-LLM Delegation Authority Service (DAS). The paper reports a three-point verification lifecycle achieving 100% combined TPR at 0% FPR on DelegationBench v4 (516 scenarios, 10 attack categories, 13 federal domains), with the DAS blocking 30/30 black-box attacks at 0 false positives. Deterministic properties are verified via TLA+ model checking across 2.7 million states with zero violations; fine-tuning an NLI model on 190 government examples improves the probabilistic property from 1.7% to 88.3% TPR.
Significance. If the runtime DAS faithfully implements the TLA+ specification, this would constitute a meaningful contribution to securing multi-agent AI systems in regulated environments by providing mechanically verified deterministic constraints that remain effective even when probabilistic intent checks are evaded. The separation of deterministic and probabilistic components, combined with explicit TLA+ verification and a dedicated benchmark, offers a structured approach that could influence standards for delegation in federal AI deployments. The reported perfect scores under adversarial conditions and the fallback guarantees are notable strengths if the implementation correspondence holds.
major comments (3)
- [IPDP and DAS runtime enforcement] The abstract and the section describing the IPDP and DAS claim that the non-LLM service enforces all seven properties at runtime, with TLA+ establishing zero violations for P1, P3-P7 across 2.7M states. However, no correspondence proof, pseudocode, or explicit mapping between the verified model and the deployed runtime checks is provided. This assumption is load-bearing for the 100% TPR/0% FPR result on DelegationBench v4 and the 30/30 attack blocking claim, as any implementation discrepancy could permit undetected violations.
- [Benchmark evaluation] In the evaluation section on DelegationBench v4, the construction of the 516 scenarios—including the precise mapping from the 10 attack categories and 13 federal domains to the three-point verification lifecycle—is insufficiently detailed. Without this, the generalizability of the perfect detection rates cannot be assessed, and it remains unclear whether the benchmark inadvertently aligns with the proposed properties.
- [Intent preservation evaluation] The probabilistic component (P2) is shown to degrade to 13% TPR under sophisticated paraphrasing, with fine-tuning on 190 examples raising it to 88.3% TPR (F1=82.1%). The paper should clarify how this component integrates with the deterministic fallback without introducing new attack surfaces or performance issues that could undermine the overall lifecycle in real deployments.
minor comments (2)
- [DCC definitions] The definitions of the seven DCC properties would benefit from additional formal notation or pseudocode to make the distinctions between deterministic and probabilistic elements more precise for readers.
- [Related work] The manuscript would be strengthened by including more references to prior work on formal verification of multi-agent protocols and delegation mechanisms in security literature.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which highlights important areas for improving clarity and rigor in the manuscript. We address each major comment point-by-point below, agreeing to revisions that strengthen the presentation of the IPDP-DAS correspondence, benchmark details, and probabilistic-deterministic integration. These changes will be incorporated in the revised version without altering the core technical claims or results.
read point-by-point responses
-
Referee: The abstract and the section describing the IPDP and DAS claim that the non-LLM service enforces all seven properties at runtime, with TLA+ establishing zero violations for P1, P3-P7 across 2.7M states. However, no correspondence proof, pseudocode, or explicit mapping between the verified model and the deployed runtime checks is provided. This assumption is load-bearing for the 100% TPR/0% FPR result on DelegationBench v4 and the 30/30 attack blocking claim, as any implementation discrepancy could permit undetected violations.
Authors: We agree that an explicit mapping would strengthen the manuscript. The TLA+ specification defines the exact state invariants and transition rules that the DAS runtime must uphold; the deployed checks are direct, non-LLM implementations of these rules (e.g., authority narrowing is enforced by a simple set-inclusion check on delegation tokens). In the revision we will add (1) a new subsection with pseudocode for the four core DAS enforcement functions and (2) a correspondence table that maps each TLA+ action and invariant to the corresponding runtime predicate and its implementation location. This addition will make the load-bearing assumption fully traceable while preserving the reported verification results. revision: yes
-
Referee: In the evaluation section on DelegationBench v4, the construction of the 516 scenarios—including the precise mapping from the 10 attack categories and 13 federal domains to the three-point verification lifecycle—is insufficiently detailed. Without this, the generalizability of the perfect detection rates cannot be assessed, and it remains unclear whether the benchmark inadvertently aligns with the proposed properties.
Authors: We acknowledge the need for greater transparency. The 516 scenarios were constructed by enumerating all combinations of the 10 attack categories with the 13 federal domains, then assigning each to one of the three verification points (pre-delegation, runtime, post-action) according to the property it targets. In the revised manuscript we will expand the benchmark subsection with: an explicit table showing the scenario counts per category-domain pair, the assignment rule to the three-point lifecycle, and two representative scenario examples per attack category. This will allow readers to evaluate coverage and generalizability directly. revision: yes
-
Referee: The probabilistic component (P2) is shown to degrade to 13% TPR under sophisticated paraphrasing, with fine-tuning on 190 examples raising it to 88.3% TPR (F1=82.1%). The paper should clarify how this component integrates with the deterministic fallback without introducing new attack surfaces or performance issues that could undermine the overall lifecycle in real deployments.
Authors: The integration is intentionally asymmetric: the deterministic properties (P1, P3–P7) are always enforced first and provide a hard safety net; P2 is consulted only when the deterministic checks pass. If P2 flags an intent violation the action is blocked; if P2 is evaded the remaining properties still restrict the action to permitted scopes, schemas, and cascades. In the revision we will add a dedicated paragraph describing the exact control flow, measured latency overhead (<5 ms per check on the evaluated hardware), and why the isolated NLI component does not create new attack surfaces—the deterministic layer remains independent and is not bypassed by any P2 outcome. We will also report the measured false-positive rate of the combined system under the same paraphrasing attacks. revision: yes
Circularity Check
No significant circularity; derivation relies on external TLA+ verification and separate empirical benchmark
full rationale
The paper defines the seven properties (six deterministic, one probabilistic) in the Delegation Chain Calculus, states four meta-theorems plus one proposition on intent verification infeasibility, mechanically verifies P1/P3-P7 via TLA+ model checking (2.7M states, zero violations), and reports empirical results (100% TPR/0% FPR on DelegationBench v4 plus NLI fine-tuning) on held-out data. The TLA+ step uses an external model checker independent of the benchmark; the benchmark evaluation is a separate runtime test of the implemented DAS. No equations reduce to their inputs by construction, no parameters are fitted then relabeled as predictions, and no self-citation chains or ansatzes are invoked as load-bearing. The central claims therefore remain self-contained against external benchmarks rather than circular.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math TLA+ model checking can exhaustively verify the six deterministic properties across 2.7 million states
- domain assumption A non-LLM Delegation Authority Service can enforce all seven properties at runtime without new vulnerabilities
invented entities (2)
-
Delegation Chain Calculus (DCC)
no independent evidence
-
Intent-Preserving Delegation Protocol (IPDP)
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The Delegation Chain Calculus (DCC) defines seven properties—six deterministic (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) and one probabilistic (intent preservation)—with four meta-theorems...
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification
DEMM defines four executable evidence-sufficiency categories plus a conflicting category for agentic AI decisions and rolls per-property verdicts into a five-level maturity rubric.
Reference graph
Works this paper leans on
-
[1]
CivicShield: A cross-domain defense-in-depth framework for securing government-facing AI chatbots,
K. S. R. Patil, “CivicShield: Seven-Layer Defense-in-Depth for Govern- ment AI Chatbot Security,” arXiv:2603.29062, 2026
-
[2]
RAGShield: Detecting Numerical Claim Manipulation in Government RAG Systems
K. S. R. Patil, “RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning in Government RAG Systems,” arXiv:2604.00387, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[3]
Detecting and Disrupting AI-Orchestrated Cyber Espi- onage,
Anthropic, “Detecting and Disrupting AI-Orchestrated Cyber Espi- onage,” Anthropic Blog, Nov. 2025
work page 2025
- [4]
- [5]
-
[6]
AI Agent Standards Initiative,
NIST, “AI Agent Standards Initiative,” NIST CAISI, Feb. 2026
work page 2026
-
[7]
Agentic Trust Framework Survey,
CSA and Strata Identity, “Agentic Trust Framework Survey,” Feb. 2026
work page 2026
-
[8]
Top 10 for Agentic Applications,
OW ASP Foundation, “Top 10 for Agentic Applications,” Dec. 2025
work page 2025
-
[9]
J. Woodruff, M. Savage, and J. Kindervag, “Agentic Trust Framework,” Cloud Security Alliance, Feb. 2026
work page 2026
-
[10]
B. Nassi, O. Brodt, E. Feldman, and B. Schneier, “The Promptware Kill Chain,” arXiv:2601.09625, Jan. 2026
-
[11]
N. Shapira et al., “Agents of Chaos,” arXiv:2602.20021, Feb. 2026
work page internal anchor Pith review arXiv 2026
-
[12]
Stac: When innocent tools form dangerous chains to jailbreak llm agents,
J.-J. Li et al., “STAC: Sequential Tool Attack Chaining,” arXiv:2509.25624, 2025
-
[13]
Agent-in-the-Middle: Red-Teaming LLM Multi-Agent Systems via Communication Attacks,
P. He et al., “Agent-in-the-Middle: Red-Teaming LLM Multi-Agent Systems via Communication Attacks,” arXiv:2502.14847, 2025
-
[14]
David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via RL,
S. Nellessen et al., “David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via RL,” arXiv:2602.02395, 2026
-
[15]
Preprint at https://arxiv.org/abs/ 2601.11893
Z. Ji et al., “Taming Various Privilege Escalation in LLM-Based Agent Systems,” arXiv:2601.11893, 2026
-
[16]
Shieldagent: Shielding agents via verifiable safety policy reasoning
Z. Chen et al., “ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning,” arXiv:2503.22738, 2025
-
[17]
Agent Behavioral Contracts: Formal Specification and Runtime Enforcement,
V . P. Bhardwaj et al., “Agent Behavioral Contracts: Formal Specification and Runtime Enforcement,” arXiv:2602.22302, 2026
-
[18]
Z. Ying et al., “Uncovering Security Threats and Architecting Defenses in Autonomous Agents,” arXiv:2603.12644, 2026
-
[19]
Authenticated delegation and authorized ai agents,
T. South et al., “Authenticated Delegation and Authorized AI Agents,” arXiv:2501.09674, 2025
-
[20]
SP 800-53 Rev 5.2.0: Security and Privacy Controls,
NIST, “SP 800-53 Rev 5.2.0: Security and Privacy Controls,” Aug. 2025
work page 2025
-
[21]
Agent Identity Protocol: Invocation-Bound Capability To- kens for Delegation Chains,
S. Prakash, “Agent Identity Protocol: Invocation-Bound Capability To- kens for Delegation Chains,” arXiv:2603.24775, 2026
-
[22]
Authenticated Workflows: A Trust Layer for Enterprise Agentic AI,
S. Rajagopalan and S. Rao, “Authenticated Workflows: A Trust Layer for Enterprise Agentic AI,” arXiv:2602.10465, 2026
-
[23]
FormalJudge: Neuro-Symbolic Agentic Oversight via Dafny and Z3,
Y . Zhou et al., “FormalJudge: Neuro-Symbolic Agentic Oversight via Dafny and Z3,” arXiv:2602.11136, 2026
-
[24]
ILION: Deterministic Execution Gate for Agentic AI,
A. Chitan, “ILION: Deterministic Execution Gate for Agentic AI,” arXiv:2603.13247, 2026
-
[25]
Agentic JWT: Dual-Faceted Intent Token Binding,
A. Goswami, “Agentic JWT: Dual-Faceted Intent Token Binding,” arXiv:2509.13597, 2025
-
[26]
Delegated Authorization with ASTRA: Task-Based Access Control,
N. El Helou et al., “Delegated Authorization with ASTRA: Task-Based Access Control,” arXiv:2510.26702, 2025
-
[27]
Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
A. Allegrini et al., “Formalizing Safety and Security Properties for Agentic AI,” arXiv:2510.14133, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[28]
SoK: Attack Surface of Agentic AI,
Z. Wang et al., “SoK: Attack Surface of Agentic AI,” arXiv:2603.22928, 2026
-
[29]
MCP-38: Threat Taxonomy for Model Context Protocol,
Y . Shen et al., “MCP-38: Threat Taxonomy for Model Context Protocol,” arXiv:2603.18063, 2026
-
[30]
GroupGuard: Collusive Attack Defense for Multi-Agent Systems,
Z. Tao et al., “GroupGuard: Collusive Attack Defense for Multi-Agent Systems,” arXiv:2603.13940, 2026
-
[31]
Privacy-R1: Privacy-Aware Multi-LLM Agent Collaboration via Reinforcement Learning
X. Hui et al., “PrivacyPAD: RL-Based Privacy-Aware Delegation,” arXiv:2510.16054, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.