pith. machine review for the scientific record.

arxiv: 2604.02767 · v1 · submitted 2026-04-03 · 💻 cs.CR · cs.AI · cs.MA

Recognition: 1 theorem link

· Lean Theorem

SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems

Authors on Pith no claims yet

Pith reviewed 2026-05-13 20:10 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.MA
keywords multi-agent AI · delegation chains · intent verification · adversarial security · formal verification · policy enforcement · federal AI systems · Delegation Chain Calculus

The pith

A formal calculus and protocol for AI delegation chains enforces seven properties that blocked every benchmark attack while preserving traceability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SentinelAgent to solve the problem of verifying authorization when one AI agent delegates tasks to another that then uses tools on a user's behalf. It defines seven required properties in a Delegation Chain Calculus, six of which are deterministic and one probabilistic, and shows how a dedicated non-LLM service can enforce them at runtime. Testing on a benchmark of 516 scenarios across federal domains and attack types yields complete detection with no false alarms, and even when intent checks are bypassed the other rules still limit what an adversary can do. If the approach holds, federal multi-agent systems could safely handle chained actions without losing auditability or policy control.

Core claim

The central claim is that the Delegation Chain Calculus formalizes six deterministic properties (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) plus one probabilistic property (intent preservation), and that the Intent-Preserving Delegation Protocol enforces all seven at runtime through a non-LLM Delegation Authority Service. This produces a 100 percent combined true-positive rate at a zero false-positive rate on DelegationBench v4, blocks every one of 30 black-box attacks, and keeps the deterministic properties unbroken under stress, while the remaining rules constrain any evaded intent case to permitted API calls, conformant outputs, traceable actions, and bounded cascades.

What carries the argument

The Delegation Chain Calculus that defines the seven properties, enforced at runtime by the Intent-Preserving Delegation Protocol through a non-LLM Delegation Authority Service.
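The simplest of the seven properties to picture is authority narrowing (P1), which the rebuttal describes as "a simple set-inclusion check on delegation tokens." The sketch below is a hedged illustration of that idea, not the paper's implementation; the scope names are invented.

```python
# Hypothetical sketch of the P1 (authority narrowing) check: each link in a
# delegation chain may hold no authority its delegator lacks, which reduces
# to set inclusion on permitted-action scopes.

def narrows(parent_scope: frozenset[str], child_scope: frozenset[str]) -> bool:
    """P1 for a single link: delegatee scope is a subset of delegator scope."""
    return child_scope <= parent_scope

def chain_is_narrowing(chain: list[frozenset[str]]) -> bool:
    """Check P1 along an entire delegation chain, link by link."""
    return all(narrows(a, b) for a, b in zip(chain, chain[1:]))

# A -> B -> Tool C: each step may only shrink the permitted action set.
chain = [
    frozenset({"read_case", "update_case", "notify"}),  # User X grants Agent A
    frozenset({"read_case", "notify"}),                 # A delegates to B
    frozenset({"notify"}),                              # B invokes Tool C
]
assert chain_is_narrowing(chain)
# Appending a step that widens authority violates P1:
assert not chain_is_narrowing(chain + [frozenset({"delete_case"})])
```

Because the check is pure set arithmetic, it needs no LLM in the loop, which is the point of routing it through the Delegation Authority Service.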

If this is right

  • Deterministic properties remain unbroken under adversarial stress testing and are confirmed by TLA+ model checking across 2.7 million states.
  • Even when intent preservation is evaded, the other six properties restrict the adversary to permitted API calls, conformant outputs, and bounded cascades.
  • Fine-tuning the natural-language-inference model on government delegation examples raises intent-detection true-positive rate from 1.7 percent to 88.3 percent.
  • The three-point verification lifecycle covers all 10 attack categories and 13 federal domains with zero false positives.
  • Properties P1 and P3 through P7 are mechanically verified with no violations found.
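The mechanical-verification claim can be pictured with a toy exhaustive state search in the spirit of TLA+ model checking. The paper's actual specification, state space, and invariants are not reproduced in this review, so the action names and the invariant below are invented for the sketch.

```python
# Illustrative only: enumerate every state of a tiny delegation model and
# check a scope-action-conformance-style invariant, comparing a "gated"
# runtime (actions filtered by scope) against an ungated one.

from itertools import product

ACTIONS = ("notify", "read")  # placeholder action names

def conforms(scope: set[str], executed: set[str]) -> bool:
    """Toy P5 invariant: every executed action lies inside the active scope."""
    return executed <= scope

def count_violations(gated: bool) -> int:
    """Exhaustively explore all (scope, requested) pairs over ACTIONS."""
    violations = 0
    for scope_bits, req_bits in product(range(4), range(4)):
        scope = {a for i, a in enumerate(ACTIONS) if scope_bits >> i & 1}
        requested = {a for i, a in enumerate(ACTIONS) if req_bits >> i & 1}
        executed = requested & scope if gated else requested  # the runtime gate
        if not conforms(scope, executed):
            violations += 1
    return violations

assert count_violations(gated=True) == 0   # the gated model admits no bad state
assert count_violations(gated=False) > 0   # without the gate, violations exist
```

Exhaustive enumeration of a finite state space is what makes the "zero violations across 2.7 million states" claim checkable in principle; the open question the referee raises is whether the deployed runtime matches the checked model.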

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation of deterministic mechanical checks from probabilistic intent checks could be reused in other AI security settings where full determinism is impossible.
  • The framework might reduce the need for LLM-based policy guards that themselves create new attack surfaces.
  • Deployment in commercial multi-agent systems outside federal contexts could follow the same structure with only domain-specific policy definitions changed.
  • Live monitoring of real delegation traffic could test whether the properties actually prevent policy violations beyond the benchmark scenarios.

Load-bearing premise

A non-LLM Delegation Authority Service can enforce all seven properties at runtime in real deployments without introducing new attack surfaces or implementation errors.
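The paper does not specify how forensic reconstructibility (P3) is implemented; a hash-chained delegation log is one standard way such a property can be made tamper-evident, sketched here purely as an assumption.

```python
# Hedged sketch: a hash-chained log in which each entry commits to its
# predecessor, so any edit to a past delegation record breaks verification.

import hashlib
import json

def append_entry(log: list[dict], event: dict) -> None:
    prev = log[-1]["digest"] if log else "genesis"
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    log.append({"prev": prev, "event": event,
                "digest": hashlib.sha256(body.encode()).hexdigest()})

def verify_log(log: list[dict]) -> bool:
    """Recompute every link; any edited entry breaks the chain."""
    prev = "genesis"
    for entry in log:
        body = json.dumps({"prev": prev, "event": entry["event"]}, sort_keys=True)
        if entry["prev"] != prev or \
           entry["digest"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["digest"]
    return True

log: list[dict] = []
append_entry(log, {"delegator": "A", "delegatee": "B", "scope": ["notify"]})
append_entry(log, {"delegator": "B", "delegatee": "tool_C", "scope": ["notify"]})
assert verify_log(log)
log[0]["event"]["scope"] = ["delete"]   # tampering is detectable
assert not verify_log(log)
```

Whatever mechanism the authors actually use, the premise is the same: the audit trail must be reconstructible and trustworthy without consulting any LLM.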

What would settle it

An observed attack chain that violates one of the deterministic properties such as cascade containment or scope-action conformance yet still passes the three-point verification lifecycle.

Figures

Figures reproduced from arXiv: 2604.02767 by Krishna Sai Reddy Patil.

Figure 1. DAS architecture and three-point verification lifecycle. P2 checks
Figure 2. Delegation chain walkthrough. Scope narrows at each step (P1).
read the original abstract

When Agent A delegates to Agent B, which invokes Tool C on behalf of User X, no existing framework can answer: whose authorization chain led to this action, and where did it violate policy? This paper introduces SentinelAgent, a formal framework for verifiable delegation chains in federal multi-agent AI systems. The Delegation Chain Calculus (DCC) defines seven properties - six deterministic (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) and one probabilistic (intent preservation) - with four meta-theorems and one proposition establishing the practical infeasibility of deterministic intent verification. The Intent-Preserving Delegation Protocol (IPDP) enforces all seven properties at runtime through a non-LLM Delegation Authority Service. A three-point verification lifecycle achieves 100% combined TPR at 0% FPR on DelegationBench v4 (516 scenarios, 10 attack categories, 13 federal domains). Under black-box adversarial conditions, the DAS blocks 30/30 attacks with 0 false positives. Deterministic properties are unbreakable under adversarial stress testing; intent verification degrades to 13% against sophisticated paraphrasing. Fine-tuning the NLI model on 190 government delegation examples improves P2 from 1.7% to 88.3% TPR (5-fold cross-validated, F1=82.1%). Properties P1, P3-P7 are mechanically verified via TLA+ model checking across 2.7 million states with zero violations. Even when intent verification is evaded, the remaining six properties constrain the adversary to permitted API calls, conformant outputs, traceable actions, bounded cascades, and compliant behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces SentinelAgent, a formal framework for verifiable delegation chains in federal multi-agent AI systems. It defines the Delegation Chain Calculus (DCC) with seven properties—six deterministic (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) and one probabilistic (intent preservation)—along with four meta-theorems and one proposition on the infeasibility of deterministic intent verification. The Intent-Preserving Delegation Protocol (IPDP) is enforced at runtime by a non-LLM Delegation Authority Service (DAS). The paper reports a three-point verification lifecycle achieving 100% combined TPR at 0% FPR on DelegationBench v4 (516 scenarios, 10 attack categories, 13 federal domains), with the DAS blocking 30/30 black-box attacks at 0 false positives. Deterministic properties are verified via TLA+ model checking across 2.7 million states with zero violations; fine-tuning an NLI model on 190 government examples improves the probabilistic property from 1.7% to 88.3% TPR.

Significance. If the runtime DAS faithfully implements the TLA+ specification, this would constitute a meaningful contribution to securing multi-agent AI systems in regulated environments by providing mechanically verified deterministic constraints that remain effective even when probabilistic intent checks are evaded. The separation of deterministic and probabilistic components, combined with explicit TLA+ verification and a dedicated benchmark, offers a structured approach that could influence standards for delegation in federal AI deployments. The reported perfect scores under adversarial conditions and the fallback guarantees are notable strengths if the implementation correspondence holds.

major comments (3)
  1. [IPDP and DAS runtime enforcement] The abstract and the section describing the IPDP and DAS claim that the non-LLM service enforces all seven properties at runtime, with TLA+ establishing zero violations for P1, P3-P7 across 2.7M states. However, no correspondence proof, pseudocode, or explicit mapping between the verified model and the deployed runtime checks is provided. This assumption is load-bearing for the 100% TPR/0% FPR result on DelegationBench v4 and the 30/30 attack blocking claim, as any implementation discrepancy could permit undetected violations.
  2. [Benchmark evaluation] In the evaluation section on DelegationBench v4, the construction of the 516 scenarios—including the precise mapping from the 10 attack categories and 13 federal domains to the three-point verification lifecycle—is insufficiently detailed. Without this, the generalizability of the perfect detection rates cannot be assessed, and it remains unclear whether the benchmark inadvertently aligns with the proposed properties.
  3. [Intent preservation evaluation] The probabilistic component (P2) is shown to degrade to 13% TPR under sophisticated paraphrasing, with fine-tuning on 190 examples raising it to 88.3% TPR (F1=82.1%). The paper should clarify how this component integrates with the deterministic fallback without introducing new attack surfaces or performance issues that could undermine the overall lifecycle in real deployments.
minor comments (2)
  1. [DCC definitions] The definitions of the seven DCC properties would benefit from additional formal notation or pseudocode to make the distinctions between deterministic and probabilistic elements more precise for readers.
  2. [Related work] The manuscript would be strengthened by including more references to prior work on formal verification of multi-agent protocols and delegation mechanisms in security literature.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important areas for improving clarity and rigor in the manuscript. We address each major comment point-by-point below, agreeing to revisions that strengthen the presentation of the IPDP-DAS correspondence, benchmark details, and probabilistic-deterministic integration. These changes will be incorporated in the revised version without altering the core technical claims or results.

read point-by-point responses
  1. Referee: The abstract and the section describing the IPDP and DAS claim that the non-LLM service enforces all seven properties at runtime, with TLA+ establishing zero violations for P1, P3-P7 across 2.7M states. However, no correspondence proof, pseudocode, or explicit mapping between the verified model and the deployed runtime checks is provided. This assumption is load-bearing for the 100% TPR/0% FPR result on DelegationBench v4 and the 30/30 attack blocking claim, as any implementation discrepancy could permit undetected violations.

    Authors: We agree that an explicit mapping would strengthen the manuscript. The TLA+ specification defines the exact state invariants and transition rules that the DAS runtime must uphold; the deployed checks are direct, non-LLM implementations of these rules (e.g., authority narrowing is enforced by a simple set-inclusion check on delegation tokens). In the revision we will add (1) a new subsection with pseudocode for the four core DAS enforcement functions and (2) a correspondence table that maps each TLA+ action and invariant to the corresponding runtime predicate and its implementation location. This addition will make the load-bearing assumption fully traceable while preserving the reported verification results. revision: yes

  2. Referee: In the evaluation section on DelegationBench v4, the construction of the 516 scenarios—including the precise mapping from the 10 attack categories and 13 federal domains to the three-point verification lifecycle—is insufficiently detailed. Without this, the generalizability of the perfect detection rates cannot be assessed, and it remains unclear whether the benchmark inadvertently aligns with the proposed properties.

    Authors: We acknowledge the need for greater transparency. The 516 scenarios were constructed by enumerating all combinations of the 10 attack categories with the 13 federal domains, then assigning each to one of the three verification points (pre-delegation, runtime, post-action) according to the property it targets. In the revised manuscript we will expand the benchmark subsection with: an explicit table showing the scenario counts per category-domain pair, the assignment rule to the three-point lifecycle, and two representative scenario examples per attack category. This will allow readers to evaluate coverage and generalizability directly. revision: yes
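The construction the authors describe can be laid out as a simple grid. Note that 10 categories times 13 domains gives 130 pairs, so the 516 scenarios imply multiple scenarios per pair; the per-pair counts, category names, and assignment rule below are placeholders, not the actual benchmark.

```python
# Hedged sketch of a DelegationBench-style layout: enumerate every
# attack-category x federal-domain pair and assign each to one of the
# three verification points. All names and the assignment rule are invented.

from itertools import product

N_CATEGORIES, N_DOMAINS = 10, 13
categories = [f"attack_{i}" for i in range(N_CATEGORIES)]  # placeholder names
domains = [f"domain_{j}" for j in range(N_DOMAINS)]        # placeholder names
LIFECYCLE = ("pre_delegation", "runtime", "post_action")

def assign_point(category_index: int) -> str:
    """Placeholder rule: the paper maps each scenario to the verification
    point targeted by the property its attack category stresses."""
    return LIFECYCLE[category_index % len(LIFECYCLE)]

grid = [(c, d, assign_point(i))
        for (i, c), d in product(enumerate(categories), domains)]

assert len(grid) == N_CATEGORIES * N_DOMAINS  # 130 category-domain pairs
assert all(point in LIFECYCLE for _, _, point in grid)
```

A table of this shape, with real names and per-pair scenario counts summing to 516, is what the promised revision would let readers audit.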

  3. Referee: The probabilistic component (P2) is shown to degrade to 13% TPR under sophisticated paraphrasing, with fine-tuning on 190 examples raising it to 88.3% TPR (F1=82.1%). The paper should clarify how this component integrates with the deterministic fallback without introducing new attack surfaces or performance issues that could undermine the overall lifecycle in real deployments.

    Authors: The integration is intentionally asymmetric: the deterministic properties (P1, P3–P7) are always enforced first and provide a hard safety net; P2 is consulted only when the deterministic checks pass. If P2 flags an intent violation the action is blocked; if P2 is evaded the remaining properties still restrict the action to permitted scopes, schemas, and cascades. In the revision we will add a dedicated paragraph describing the exact control flow, measured latency overhead (<5 ms per check on the evaluated hardware), and why the isolated NLI component does not create new attack surfaces—the deterministic layer remains independent and is not bypassed by any P2 outcome. We will also report the measured false-positive rate of the combined system under the same paraphrasing attacks. revision: yes
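The asymmetric control flow the authors describe can be sketched as follows. The function names and the intent scorer are hypothetical stand-ins; the point is only the ordering: deterministic checks form a hard gate, and P2 can block but never widen what the gate allows.

```python
# Minimal sketch of the described control flow: deterministic properties
# are enforced first; the probabilistic intent check (P2) runs only if
# they all pass, so evading P2 never bypasses the deterministic layer.

from typing import Callable

def verify_delegation(
    request: dict,
    deterministic_checks: list[Callable[[dict], bool]],
    intent_score: Callable[[dict], float],  # hypothetical NLI-based P2 scorer
    threshold: float = 0.5,                 # hypothetical decision threshold
) -> tuple[bool, str]:
    # Hard gate: any deterministic failure blocks the action outright.
    for check in deterministic_checks:
        if not check(request):
            return False, f"blocked: deterministic {check.__name__} failed"
    # Soft layer: P2 may only veto actions the gate already allowed.
    if intent_score(request) < threshold:
        return False, "blocked: intent preservation (P2) flagged"
    return True, "allowed"

def p1_authority_narrowing(req: dict) -> bool:
    return set(req["child_scope"]) <= set(req["parent_scope"])

ok_req = {"parent_scope": ["read", "notify"], "child_scope": ["notify"]}
ok, _ = verify_delegation(ok_req, [p1_authority_narrowing], lambda r: 0.9)
assert ok
# Even a fully fooled intent scorer cannot rescue a scope violation:
bad_req = {"parent_scope": ["read"], "child_scope": ["delete"]}
ok2, _ = verify_delegation(bad_req, [p1_authority_narrowing], lambda r: 0.99)
assert not ok2
```

This ordering is what makes the degradation of P2 to 13 percent TPR survivable: the blast radius of an evaded intent check is bounded by the deterministic properties, exactly as the fallback claim states.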

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external TLA+ verification and separate empirical benchmark

full rationale

The paper defines the seven properties (six deterministic, one probabilistic) in the Delegation Chain Calculus, states four meta-theorems plus one proposition on intent verification infeasibility, mechanically verifies P1/P3-P7 via TLA+ model checking (2.7M states, zero violations), and reports empirical results (100% TPR/0% FPR on DelegationBench v4 plus NLI fine-tuning) on held-out data. The TLA+ step uses an external model checker independent of the benchmark; the benchmark evaluation is a separate runtime test of the implemented DAS. No equations reduce to their inputs by construction, no parameters are fitted then relabeled as predictions, and no self-citation chains or ansatzes are invoked as load-bearing. The central claims therefore remain self-contained against external benchmarks rather than circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The framework rests on standard formal-methods assumptions plus the domain claim that a dedicated non-LLM service can be trusted to enforce the properties; no numerical free parameters are fitted to data in the reported results.

axioms (2)
  • standard math TLA+ model checking can exhaustively verify the six deterministic properties across 2.7 million states
    Invoked for mechanical verification of P1, P3-P7
  • domain assumption A non-LLM Delegation Authority Service can enforce all seven properties at runtime without new vulnerabilities
    Central premise of the IPDP protocol
invented entities (2)
  • Delegation Chain Calculus (DCC) no independent evidence
    purpose: Formal definition of the seven delegation properties
    New mathematical framework introduced by the paper
  • Intent-Preserving Delegation Protocol (IPDP) no independent evidence
    purpose: Runtime enforcement mechanism using the non-LLM service
    New protocol presented as the practical implementation

pith-pipeline@v0.9.0 · 5603 in / 1555 out tokens · 28225 ms · 2026-05-13T20:10:06.922382+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction — unclear

    Relation between the paper passage and the cited Recognition theorem.

    The Delegation Chain Calculus (DCC) defines seven properties—six deterministic (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) and one probabilistic (intent preservation)—with four meta-theorems...

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification

    cs.CY 2026-04 unverdicted novelty 4.0

    DEMM defines four executable evidence-sufficiency categories plus a conflicting category for agentic AI decisions and rolls per-property verdicts into a five-level maturity rubric.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 1 Pith paper · 4 internal anchors

  1. [1]

    CivicShield: A cross-domain defense-in-depth framework for securing government-facing AI chatbots,

    K. S. R. Patil, “CivicShield: Seven-Layer Defense-in-Depth for Government AI Chatbot Security,” arXiv:2603.29062, 2026

  2. [2]

    RAGShield: Detecting Numerical Claim Manipulation in Government RAG Systems

    K. S. R. Patil, “RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning in Government RAG Systems,” arXiv:2604.00387, 2026

  3. [3]

    Detecting and Disrupting AI-Orchestrated Cyber Espionage,

    Anthropic, “Detecting and Disrupting AI-Orchestrated Cyber Espionage,” Anthropic Blog, Nov. 2025

  4. [4]

    2026 Global Threat Report,

    CrowdStrike, “2026 Global Threat Report,” Feb. 2026

  5. [5]

    State of AI Agent Security 2026,

    Gravitee, “State of AI Agent Security 2026,” 2026

  6. [6]

    AI Agent Standards Initiative,

    NIST, “AI Agent Standards Initiative,” NIST CAISI, Feb. 2026

  7. [7]

    Agentic Trust Framework Survey,

    CSA and Strata Identity, “Agentic Trust Framework Survey,” Feb. 2026

  8. [8]

    Top 10 for Agentic Applications,

    OWASP Foundation, “Top 10 for Agentic Applications,” Dec. 2025

  9. [9]

    Agentic Trust Framework,

    J. Woodruff, M. Savage, and J. Kindervag, “Agentic Trust Framework,” Cloud Security Alliance, Feb. 2026

  10. [10]

    The Promptware Kill Chain,

    B. Nassi, O. Brodt, E. Feldman, and B. Schneier, “The Promptware Kill Chain,” arXiv:2601.09625, Jan. 2026

  11. [11]

    Agents of Chaos

    N. Shapira et al., “Agents of Chaos,” arXiv:2602.20021, Feb. 2026

  12. [12]

    Stac: When innocent tools form dangerous chains to jailbreak llm agents,

    J.-J. Li et al., “STAC: Sequential Tool Attack Chaining,” arXiv:2509.25624, 2025

  13. [13]

    Agent-in-the-Middle: Red-Teaming LLM Multi-Agent Systems via Communication Attacks,

    P. He et al., “Agent-in-the-Middle: Red-Teaming LLM Multi-Agent Systems via Communication Attacks,” arXiv:2502.14847, 2025

  14. [14]

    David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via RL,

    S. Nellessen et al., “David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via RL,” arXiv:2602.02395, 2026

  15. [15]

    Taming Various Privilege Escalation in LLM-Based Agent Systems,

    Z. Ji et al., “Taming Various Privilege Escalation in LLM-Based Agent Systems,” arXiv:2601.11893, 2026

  16. [16]

    Shieldagent: Shielding agents via verifiable safety policy reasoning

    Z. Chen et al., “ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning,” arXiv:2503.22738, 2025

  17. [17]

    Agent Behavioral Contracts: Formal Specification and Runtime Enforcement,

    V. P. Bhardwaj et al., “Agent Behavioral Contracts: Formal Specification and Runtime Enforcement,” arXiv:2602.22302, 2026

  18. [18]

    Uncovering security threats and architecting defenses in autonomous agents: A case study of openclaw,

    Z. Ying et al., “Uncovering Security Threats and Architecting Defenses in Autonomous Agents,” arXiv:2603.12644, 2026

  19. [19]

    Authenticated delegation and authorized ai agents,

    T. South et al., “Authenticated Delegation and Authorized AI Agents,” arXiv:2501.09674, 2025

  20. [20]

    SP 800-53 Rev 5.2.0: Security and Privacy Controls,

    NIST, “SP 800-53 Rev 5.2.0: Security and Privacy Controls,” Aug. 2025

  21. [21]

    Agent Identity Protocol: Invocation-Bound Capability Tokens for Delegation Chains,

    S. Prakash, “Agent Identity Protocol: Invocation-Bound Capability Tokens for Delegation Chains,” arXiv:2603.24775, 2026

  22. [22]

    Authenticated Workflows: A Trust Layer for Enterprise Agentic AI,

    S. Rajagopalan and S. Rao, “Authenticated Workflows: A Trust Layer for Enterprise Agentic AI,” arXiv:2602.10465, 2026

  23. [23]

    FormalJudge: Neuro-Symbolic Agentic Oversight via Dafny and Z3,

    Y. Zhou et al., “FormalJudge: Neuro-Symbolic Agentic Oversight via Dafny and Z3,” arXiv:2602.11136, 2026

  24. [24]

    ILION: Deterministic Execution Gate for Agentic AI,

    A. Chitan, “ILION: Deterministic Execution Gate for Agentic AI,” arXiv:2603.13247, 2026

  25. [25]

    Agentic JWT: Dual-Faceted Intent Token Binding,

    A. Goswami, “Agentic JWT: Dual-Faceted Intent Token Binding,” arXiv:2509.13597, 2025

  26. [26]

    Delegated Authorization with ASTRA: Task-Based Access Control,

    N. El Helou et al., “Delegated Authorization with ASTRA: Task-Based Access Control,” arXiv:2510.26702, 2025

  27. [27]

    Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems

    A. Allegrini et al., “Formalizing Safety and Security Properties for Agentic AI,” arXiv:2510.14133, 2025

  28. [28]

    SoK: Attack Surface of Agentic AI,

    Z. Wang et al., “SoK: Attack Surface of Agentic AI,” arXiv:2603.22928, 2026

  29. [29]

    MCP-38: Threat Taxonomy for Model Context Protocol,

    Y. Shen et al., “MCP-38: Threat Taxonomy for Model Context Protocol,” arXiv:2603.18063, 2026

  30. [30]

    GroupGuard: Collusive Attack Defense for Multi-Agent Systems,

    Z. Tao et al., “GroupGuard: Collusive Attack Defense for Multi-Agent Systems,” arXiv:2603.13940, 2026

  31. [31]

    Privacy-R1: Privacy-Aware Multi-LLM Agent Collaboration via Reinforcement Learning

    X. Hui et al., “PrivacyPAD: RL-Based Privacy-Aware Delegation,” arXiv:2510.16054, 2025