arxiv: 2604.17517 · v2 · submitted 2026-04-19 · 💻 cs.AI · cs.CR

Recognition: unknown

From Admission to Invariants: Measuring Deviation in Delegated Agent Systems

Marcelo Fernandez (TraslaIA)


Pith reviewed 2026-05-10 05:14 UTC · model grok-4.3

classification 💻 cs.AI cs.CR

keywords autonomous agents · enforcement mechanisms · behavioral drift · non-identifiability · invariant measurement · admission-time compliance · runtime monitoring · agent governance


The pith

Enforcement mechanisms in agent systems cannot detect behavioral drift away from the admissible behavior set fixed at admission time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that runtime enforcement, which flags local constraint violations, cannot determine whether an agent's overall behavior stays within the global admissible space defined when the agent was admitted. This structural blind spot arises because enforcement checks individual actions against point-wise rules, while admissibility is a property of entire trajectories. As a result, an agent can gradually shift its behavioral distribution without ever triggering an enforcement flag. The author introduces the Invariant Measurement Layer (IML), which monitors directly against the original generative model of admissible behavior, and proves that it detects drift with finite delay. The validation covers three simulated drift scenarios, a live n8n webhook pipeline, and a LangGraph agent. In every setting, enforcement reports zero violations while IML detects each drift type within 9 to 258 steps of drift onset.

Core claim

A correctly functioning enforcement engine can fail to observe behavioral drift because the enforcement signal g generates information only at the local action level, whereas the admissible behavior space A0 is defined by global trajectory properties. The Non-Identifiability Theorem establishes that A0 lies outside the sigma-algebra generated by g under the Local Observability Assumption satisfied by practical systems. Consequently, systematic deviation from admission expectations can occur while every single action remains permitted. The Invariant Measurement Layer restores observability by maintaining direct access to the model underlying A0.

What carries the argument

The Non-Identifiability Theorem, which proves the admissible set A0 is not in the sigma-algebra generated by the local enforcement signal g due to the mismatch between point-wise rule evaluation and trajectory-level properties.
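The mechanism behind the theorem can be checked by brute force on a toy finite trace space. On such a space, the sigma-algebra generated by the map τ ↦ (g(a_1), …, g(a_n)) consists exactly of unions of that map's fibers. A set is therefore measurable if and only if membership never differs between two traces that share the same g-trace. The sketch below is illustrative only. The tool names, the forbidden set, and the email-frequency rule standing in for A0 are hypothetical and are not the paper's definitions.

```python
from itertools import product

# Hypothetical toy setting (not from the paper): four tools, one forbidden, length-3 traces.
TOOLS = ("read", "search", "email", "delete")
FORBIDDEN = {"delete"}


def g(action):
    """Local, point-wise enforcement signal: 1 flags a violation, 0 permits."""
    return int(action in FORBIDDEN)


def g_trace(trace):
    """Everything enforcement reveals about a trace: its sequence of per-action flags."""
    return tuple(g(a) for a in trace)


def in_A0(trace):
    """Hypothetical trajectory-level admissibility: compliant AND 'email' used in at
    most one third of the steps (a frequency property, not a per-action rule)."""
    return all(g(a) == 0 for a in trace) and trace.count("email") <= len(trace) / 3


def find_witness(predicate, n=3):
    """On a finite trace space, a set lies in the sigma-algebra generated by g_trace
    iff membership is constant on every fiber of g_trace. Return two traces with the
    same g-trace but different membership, or None if the set is measurable."""
    seen = {}  # g-trace -> (first trace with that g-trace, its membership)
    for trace in product(TOOLS, repeat=n):
        key, member = g_trace(trace), predicate(trace)
        if key in seen and seen[key][1] != member:
            return seen[key][0], trace
        seen.setdefault(key, (trace, member))
    return None


print(find_witness(in_A0))  # a witness pair exists: this A0 is not measurable w.r.t. g
print(find_witness(lambda t: all(g(a) == 0 for a in t)))  # Compliance(C) is measurable
```

The second call returns None because Compliance(C) is itself a function of the g-trace. Non-identifiability only bites for properties, such as frequencies, that are not.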

If this is right

Enforcement-based governance is structurally unable to verify ongoing compliance with admission-time admissible behavior spaces.
Agents can exhibit systematic behavioral drift while every individual action remains within permitted local rules.
The Invariant Measurement Layer detects admission-time drift with provably finite detection delay by retaining access to the generative model of A0.
In drift scenarios and deployed agent pipelines, enforcement reports zero violations while the new layer identifies each drift type within a bounded number of steps.
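The following is a minimal sketch of how a frozen admission-time reference yields a finite detection delay. It assumes D_t is the Jensen-Shannon divergence (in bits) between two distributions:

- the reference tool distribution PE0, estimated on a 50-step burn-in;
- a sliding window of recent actions.

The threshold θ = 0.20 and the onset at t = 50 follow the paper's figures. The tool set, window size, and drift schedule are hypothetical. Because no tool is forbidden, a point-wise enforcement signal would report g = 0 at every step of this trace.

```python
import math
import random
from collections import Counter

TOOLS = ["read", "search", "email"]  # hypothetical tool set; every tool is permitted (g = 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, so 0 <= JS <= 1. p and q share keys."""
    m = {k: 0.5 * (p[k] + q[k]) for k in p}

    def kl(a):
        # Terms with a[k] == 0 contribute 0; m[k] > 0 whenever a[k] > 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def empirical(actions):
    counts = Counter(actions)
    return {t: counts[t] / len(actions) for t in TOOLS}


def simulate(steps=300, onset=50, max_shift=0.8, seed=42):
    """Hypothetical gradual drift: after `onset`, probability mass moves linearly
    toward 'email'. No action is ever forbidden, so enforcement stays silent."""
    rng = random.Random(seed)
    trace = []
    for t in range(steps):
        shift = 0.0 if t < onset else max_shift * (t - onset) / (steps - onset)
        weights = [0.45 - shift / 2, 0.45 - shift / 2, 0.10 + shift]
        trace.append(rng.choices(TOOLS, weights=weights)[0])
    return trace


def detection_delay(trace, onset, theta=0.20, window=50):
    """Freeze the reference PE0 on the pre-onset burn-in, compare a sliding window
    of recent actions against it, and return the delay T* (steps after onset) at
    which D_t first reaches theta, or None if it never does."""
    pe0 = empirical(trace[:onset])
    for t in range(onset, len(trace)):
        recent = empirical(trace[max(0, t - window + 1): t + 1])
        if js_divergence(pe0, recent) >= theta:
            return t - onset
    return None


print(detection_delay(simulate(), onset=50))               # finite delay under drift
print(detection_delay(simulate(max_shift=0.0), onset=50))  # no drift: threshold not reached
```

Freezing PE0 is the design choice that matters here. Drifted behavior never leaks into the reference, so the divergence keeps growing with the drift rather than adapting to it.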

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Long-running delegated systems may need supplementary monitoring that preserves access to admission-time models rather than relying solely on runtime enforcement.
The local-versus-global information mismatch could appear in other rule-based monitoring settings such as network security or compliance auditing.
Adding invariant layers to agent frameworks would enable earlier correction of drifts before they produce observable harm.

Load-bearing premise

That enforcement signals only provide local point-wise evaluations of actions and cannot capture global trajectory properties of the admissible behavior space.

What would settle it

An agent trajectory in which the enforcement signal g registers no violations yet the observed behavioral distribution differs from the model of A0 established at admission.

Figures

Figures reproduced from arXiv: 2604.17517 by Marcelo Fernandez (TraslaIA).

**Figure 1.** Layered structure of the trace space Σ*. ACP's enforcement signal g partitions Σ* into two regions:

- g⁻¹(1): constraint violations, detected by ACP.
- Compliance(C) = g⁻¹(0): compliant traces, invisible to enforcement.

Within Compliance(C), IML monitors the boundary between A0 (admission-time behavior) and Compliance(C) \ A0 (compliant but drifted behavior, i.e. hidden drift). The two mechanisms cover complementary bou…
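The three-way partition in Figure 1 can be written as a small classifier. This sketch makes two assumptions:

- the trace-level signal g(τ) is 1 whenever any single action is flagged;
- A0 ⊆ Compliance(C).

The rule set and the A0 predicate in the usage lines are hypothetical.

```python
from enum import Enum


class Region(Enum):
    VIOLATION = "g^-1(1): flagged by enforcement"
    ADMISSIBLE = "A0: admission-time behavior"
    HIDDEN_DRIFT = "Compliance(C) minus A0: compliant but drifted"


def classify(trace, g, in_A0):
    """Place a trace in the three-way partition sketched in Figure 1.

    g: per-action enforcement flag (1 = violation); the trace-level signal is
       taken to be 1 if any action is flagged (an assumption).
    in_A0: admission-time predicate over whole traces, assumed to hold only
       for compliant traces (A0 is a subset of Compliance(C)).
    """
    if any(g(action) for action in trace):
        return Region.VIOLATION
    return Region.ADMISSIBLE if in_A0(trace) else Region.HIDDEN_DRIFT


def flag_delete(action):
    return int(action == "delete")  # hypothetical rule set: only 'delete' is forbidden


def rarely_emails(trace):
    return trace.count("email") <= len(trace) / 3  # hypothetical stand-in for A0


for t in (["read", "delete"], ["read", "search", "email"], ["email", "email", "read"]):
    print(t, classify(t, flag_delete, rarely_emails).name)
```

Only the first branch is available to enforcement. Distinguishing the last two outcomes requires evaluating in_A0 on the whole trace.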

**Figure 2.** Per-component deviation trajectories:

- D_t: temporal drift via JS divergence.
- D_c: constraint proximity via average tool risk.
- D_l: lineage deviation via delegation depth.

Panels:

- Delegation drift (middle): D_l dominates and saturates at 1.0 by step 170, accounting for the early T*.
- Tool drift (left): D_t and D_c grow jointly; D_l remains near zero since depth is unaffected.
- Context drift (right): similar to tool drif…
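The figure names three components but, in the excerpt available here, gives neither their exact formulas nor how they combine into D_b. The sketch below is one hypothetical reading:

- D_c is the mean risk of tools in the current window.
- D_l is the excess delegation depth, clipped to [0, 1], which is one way it could saturate at 1.0.
- D_b is an equally weighted sum of the three components.

The risk table, the max_extra scale, and the weights are all assumptions. D_t would be the JS-divergence term against the frozen reference, and it is passed in here as a number.

```python
from statistics import mean

# Hypothetical per-tool risk scores in [0, 1]; the paper's actual scores are not given here.
TOOL_RISK = {"read": 0.1, "search": 0.2, "email": 0.6, "deploy": 0.9}


def constraint_proximity(window):
    """D_c: average risk of the tools invoked in the current window."""
    return mean(TOOL_RISK[tool] for tool in window)


def lineage_deviation(depths, admitted_depth, max_extra=3.0):
    """D_l: mean delegation depth in excess of the admitted depth, scaled by a
    hypothetical max_extra and clipped to [0, 1], so it can saturate at 1.0."""
    excess = mean(depths) - admitted_depth
    return min(1.0, max(0.0, excess / max_extra))


def composite_deviation(d_t, d_c, d_l, weights=(1 / 3, 1 / 3, 1 / 3)):
    """D_b: weighted sum of the three components (equal weights are an assumption)."""
    w_t, w_c, w_l = weights
    return w_t * d_t + w_c * d_c + w_l * d_l


d_c = constraint_proximity(["read", "email", "email", "deploy"])
d_l = lineage_deviation([1, 2, 3, 4], admitted_depth=1)
print(d_c, d_l, composite_deviation(0.143, d_c, d_l))
```

The clipping in lineage_deviation reproduces the qualitative behavior described in the caption. Once delegation depth runs far enough past the admitted level, D_l pins at 1.0 and dominates the composite.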

**Figure 3.** D_b(τ_t, A0) for all three drift scenarios (300 steps). Enforcement signal g(τ_t) = 0 throughout (not shown). Drift onset at t = 50 (dashed line). … breakdown: D_t = 0.143, D_c = 0.500, D_l = 0.333. These results are consistent with the simulation benchmarks (D_b final ∈ [0.21, 0.39] in …

**Figure 4.** IML vs. anomaly detector (B2) across three scenarios.

- Tool/context drift: B2 peaks then declines due to reference contamination; IML's reference PE_0 is frozen.
- Delegation drift: B2 is lineage-blind (D_l = 0 always); IML separation +0.188.
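The reference-contamination effect can be illustrated without sampling noise. This hypothetical sketch compares two divergences:

- one computed against a frozen admission-time reference, the role PE_0 plays for IML;
- one computed against a reference that trails the current step by a fixed lag.

The trailing reference is a simplified stand-in for B2, whose actual design is not described here. Under slow linear drift it only ever sees the most recent increment, so its divergence stays small while the frozen one keeps growing. The drift schedule and lag are assumptions.

```python
import math


def js_bits(p, q):
    """Jensen-Shannon divergence in bits between two distributions with the same keys."""
    m = {k: 0.5 * (p[k] + q[k]) for k in p}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def tool_distribution(t, onset=50, steps=300):
    """Hypothetical noise-free drift: P(email) rises linearly from 0.1 to 0.7 after onset."""
    frac = 0.0 if t < onset else (t - onset) / (steps - onset)
    p_email = 0.1 + 0.6 * frac
    return {"read": 1.0 - p_email, "email": p_email}


def compare_references(steps=300, onset=50, lag=50):
    """Return (frozen, rolling) divergence per step.

    frozen:  reference fixed at admission time (the role PE_0 plays for IML).
    rolling: reference taken from behavior `lag` steps ago, standing in for a
             generic anomaly detector whose baseline absorbs the drift.
    """
    frozen_ref = tool_distribution(0, onset, steps)
    rows = []
    for t in range(steps):
        current = tool_distribution(t, onset, steps)
        rolling_ref = tool_distribution(max(0, t - lag), onset, steps)
        rows.append((js_bits(frozen_ref, current), js_bits(rolling_ref, current)))
    return rows


rows = compare_references()
print("final frozen: %.3f, final rolling: %.3f" % rows[-1])
```

This deterministic toy does not reproduce B2's peak-then-decline curve. It only isolates the mechanism: a baseline that tracks recent behavior inherits the drift it is supposed to detect.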

**Figure 5.** Detection delay T*(θ) as a function of threshold θ. Delegation drift (blue) is detected earliest at all thresholds because D_l saturates rapidly; tool and context drift (red, teal) converge to similar curves. 5.5 LangGraph Agent Experiment. To demonstrate IML in a real agent-framework execution context, we implement a LangGraph [11] StateGraph with two nodes (decide_tool → execute_tool) and a deterministic…
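Detection delay as plotted in Figure 5 reduces to a first-passage computation over the deviation series. Below is a minimal sketch with a hypothetical linear-ramp series. For any fixed series, T*(θ) is nondecreasing in θ wherever it is defined, because every step that clears a higher threshold also clears a lower one.

```python
def detection_delay(series, onset, theta):
    """T*(theta): steps from drift onset until the deviation series first reaches
    theta, or None if it never does within the horizon."""
    for t in range(onset, len(series)):
        if series[t] >= theta:
            return t - onset
    return None


def delay_curve(series, onset, thetas):
    """Evaluate T*(theta) over a grid of thresholds."""
    return {theta: detection_delay(series, onset, theta) for theta in thetas}


# Hypothetical deviation series: zero during a 50-step burn-in, then a linear ramp.
series = [0.0] * 50 + [k / 250 for k in range(250)]
print(delay_curve(series, onset=50, thetas=[0.1, 0.2, 0.3, 1.5]))
# → {0.1: 25, 0.2: 50, 0.3: 75, 1.5: None}
```

A component that saturates quickly, as D_l does under delegation drift, pulls the whole curve down. That is consistent with the caption's explanation of why delegation drift is detected earliest.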

**Figure 6.** Long-horizon drift (1000 steps, seed 42). D_b(τ_t, A0) grows monotonically in all three scenarios while the enforcement signal g(τ_t) = 0 throughout (dotted orange). Drift onset at t = 50 (dashed vertical); detection threshold θ = 0.20 (dashed horizontal). The gap between D_b and the enforcement signal persists and widens over the full 1000-step horizon, directly instantiating Theorem 3.5.

**Figure 7.** LangGraph agent under gradual compliant drift (50 burn-in + 200 drift steps, seed 42). IML composite D_b (dark) reaches 0.358 while enforcement g(τ_t) = 0 throughout (dotted orange). Component breakdown:

- D_t: tool distribution shift (red dashed).
- D_c: constraint proximity (blue dash-dot).
- D_l: lineage depth (teal dotted).

Detection at T*(θ = 0.20) = 168 steps.

Original abstract

Autonomous agent systems are governed by enforcement mechanisms that flag hard constraint violations at runtime. The Agent Control Protocol identifies a structural limit of such systems: a correctly functioning enforcement engine can enter a regime in which behavioral drift is invisible to it, because the enforcement signal operates below the layer where deviation is measurable. We show that enforcement-based governance is structurally unable to determine whether an agent's behavior remains within the admissible behavior space A0 established at admission time.

Our central result, the Non-Identifiability Theorem, proves that A0 is not in the sigma-algebra generated by the enforcement signal g under the Local Observability Assumption, which every practical enforcement system satisfies. The impossibility arises from a fundamental mismatch: g evaluates actions locally against a point-wise rule set, while A0 encodes global, trajectory-level behavioral properties set at admission time. An agent can therefore drift (systematically shifting its behavioral distribution away from admission-time expectations) while every individual action remains within the permitted action space.

We define the Invariant Measurement Layer (IML), which bypasses this limitation by retaining direct access to the generative model of A0, restoring observability precisely in the region where enforcement is structurally blind. We prove an information-theoretic impossibility for enforcement-based monitoring and show IML detects admission-time drift with provably finite detection delay.

We validate across four settings: three drift scenarios (300 and 1000 steps), a live n8n webhook pipeline, and a LangGraph StateGraph agent. Enforcement triggers zero violations, while IML detects each drift type within 9-258 steps of drift onset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The Non-Identifiability Theorem flags a real limit in local enforcement for catching global drift, but the claim that it covers every practical system rests on an under-defended assumption.


The paper's main contribution is showing that enforcement signals limited to local, point-wise checks on individual actions cannot identify whether the agent's full trajectory stays inside the admission-time admissible set A0. The Non-Identifiability Theorem formalizes this as A0 lying outside the sigma-algebra generated by the enforcement signal g. The distinction between local rules and global trajectory properties is straightforward once stated. It explains why an agent can keep every single action inside the allowed set while its overall behavior shifts.

The Invariant Measurement Layer is presented as a direct fix: it keeps access to the generative model of A0 and achieves finite detection delay in the reported experiments. The validation across three synthetic drift cases, an n8n webhook, and a LangGraph agent is concrete enough to show the practical difference. Enforcement reports zero violations, while IML flags each drift within 9 to 258 steps of onset. That part is useful for anyone building runtime monitors.

The soft spot is the Local Observability Assumption. The paper treats it as something every practical enforcement system satisfies, yet it supplies no explicit definition of what counts as practical. Nor does it give an argument ruling out stateful or history-aware monitors that could enlarge the generated sigma-algebra. Counterexamples such as cumulative constraint trackers or limited-memory filters are easy to imagine and would weaken the universality of the impossibility result. Without that justification, the theorem holds only conditionally. The abstract also gives no proof sketch, so the derivation steps remain opaque.

This work is aimed at people working on agent safety, compliance monitoring, and runtime governance. It raises a legitimate structural question and offers a workable alternative, so it is worth sending to peer review. The assumption still needs more defense, and the empirical section would benefit from released code or fuller metrics.

Referee Report

3 major / 2 minor

Summary. The paper claims that runtime enforcement mechanisms in autonomous agent systems cannot detect certain forms of behavioral drift from an admission-time admissible set A0. The stated reason is that the local enforcement signal g generates a sigma-algebra that does not contain A0 under the Local Observability Assumption, which the author asserts holds for all practical enforcement systems. The paper:

- states a Non-Identifiability Theorem establishing this impossibility;
- proves an information-theoretic impossibility for enforcement-based monitoring;
- introduces an Invariant Measurement Layer (IML) that retains access to the generative model of A0 to restore observability;
- shows that IML detects drift with finite delay;
- reports empirical validation across three synthetic drift scenarios, an n8n webhook pipeline, and a LangGraph agent, where enforcement detects zero violations but IML detects each drift within 9-258 steps of onset.

Significance. If the Non-Identifiability Theorem holds with a properly formalized Local Observability Assumption, the result would identify a structural limitation of enforcement-based governance for delegated agents, with direct relevance to AI safety and runtime monitoring. The proposed IML offers a concrete alternative with claimed finite-delay guarantees, and the multi-setting validation (including live pipelines) provides initial evidence of practical utility. The work also supplies a clear distinction between local point-wise checks and global trajectory invariants.

major comments (3)

[Abstract and theorem statement] The Non-Identifiability Theorem is stated in the abstract and presumably elaborated in the main text, but the manuscript supplies no formal definitions of the sigma-algebra generated by g, the precise statement of the Local Observability Assumption (e.g., whether g_t depends only on (s_t, a_t) with no history or derived invariants), or a proof of the theorem. Without these, it is impossible to verify whether the non-identifiability result is independent of the assumption or partly definitional.
[Abstract and § on Local Observability Assumption] The claim that 'every practical enforcement system satisfies' the Local Observability Assumption is load-bearing for the universality of the impossibility result, yet no explicit characterization, proof, or counterexample analysis is provided. Stateful monitors, cumulative constraint checkers, or history-retaining filters could enlarge the generated sigma-algebra and potentially make A0 measurable, limiting the theorem's scope to a narrower class of memoryless local enforcers.
[Validation section] The empirical validation reports specific detection delays (9-258 steps) and zero enforcement violations across four settings, but provides no quantitative details on how drift is induced, what metrics are used for IML detection, baseline comparisons, or statistical significance. This makes it difficult to assess whether the results support the claim that IML 'restores observability precisely in the region where enforcement is structurally blind.'
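The referee's second objection can be made concrete. In this hypothetical sketch, two permitted traces produce identical outputs from a memoryless g. A monitor that keeps running counts separates them, so on these traces it generates a strictly finer sigma-algebra. The forbidden set, the monitored tool, and the 0.4 frequency cap are chosen by hand. Whether such a monitor recovers A0 in general, without the generative model, is exactly what the rebuttal disputes, and this sketch does not settle it.

```python
from collections import Counter

FORBIDDEN = {"delete"}  # hypothetical point-wise rule set


def memoryless_g(action):
    """Point-wise enforcement: depends only on the current action."""
    return int(action in FORBIDDEN)


class CumulativeFrequencyMonitor:
    """Hypothetical stateful monitor: keeps running tool counts and also flags
    when the share of `tool` seen so far exceeds `cap`."""

    def __init__(self, tool, cap):
        self.tool, self.cap = tool, cap
        self.counts = Counter()
        self.n = 0

    def step(self, action):
        self.counts[action] += 1
        self.n += 1
        over_cap = self.counts[self.tool] / self.n > self.cap
        return int(memoryless_g(action) == 1 or over_cap)


def run_monitor(trace, tool="email", cap=0.4):
    monitor = CumulativeFrequencyMonitor(tool, cap)
    return [monitor.step(a) for a in trace]


admitted = ["read", "search", "email", "read", "search", "read"]
drifted = ["email", "email", "read", "email", "email", "search"]
print([memoryless_g(a) for a in admitted], [memoryless_g(a) for a in drifted])  # identical
print(run_monitor(admitted), run_monitor(drifted))  # differ
```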

minor comments (2)

[Introduction] Notation for A0, g, and the Invariant Measurement Layer should be introduced with explicit mathematical definitions early in the paper rather than relying on prose descriptions.
[Abstract] The abstract mentions 'three drift scenarios (300 and 1000 steps)' but does not clarify whether these are separate experiments or parameter variations; a table summarizing the four settings with exact step counts and drift types would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments identify key areas where additional formalization and empirical detail will strengthen the manuscript. We respond point-by-point below and have incorporated revisions to address each concern directly.


Referee: [Abstract and theorem statement] The Non-Identifiability Theorem is stated in the abstract and presumably elaborated in the main text, but the manuscript supplies no formal definitions of the sigma-algebra generated by g, the precise statement of the Local Observability Assumption (e.g., whether g_t depends only on (s_t, a_t) with no history or derived invariants), or a proof of the theorem. Without these, it is impossible to verify whether the non-identifiability result is independent of the assumption or partly definitional.

Authors: We agree that the presentation in the abstract is high-level and that the main text should make the formal elements fully self-contained for verification. In the revised manuscript we have added an explicit subsection (now Section 3.1) that defines the sigma-algebra generated by the local enforcement signal g, states the Local Observability Assumption precisely (g_t is a measurable function of the instantaneous state-action pair (s_t, a_t) with no access to trajectory history or derived invariants), and supplies the complete proof of the Non-Identifiability Theorem. The proof shows that A0 lies outside the generated sigma-algebra independently of the assumption's specific form, provided the assumption holds. revision: yes
Referee: [Abstract and § on Local Observability Assumption] The claim that 'every practical enforcement system satisfies' the Local Observability Assumption is load-bearing for the universality of the impossibility result, yet no explicit characterization, proof, or counterexample analysis is provided. Stateful monitors, cumulative constraint checkers, or history-retaining filters could enlarge the generated sigma-algebra and potentially make A0 measurable, limiting the theorem's scope to a narrower class of memoryless local enforcers.

Authors: We acknowledge that the universality claim requires explicit support. The revised manuscript now contains a dedicated characterization of the Local Observability Assumption together with an argument that any enforcement mechanism whose output signal is ultimately derived from local point-wise evaluations (even if internally stateful or cumulative) still generates a sigma-algebra that excludes global trajectory invariants of A0. We include a short counterexample analysis showing that mechanisms retaining sufficient history to recover A0 would require non-local information equivalent to the generative model itself, which standard practical enforcers do not retain. We have adjusted the phrasing from 'every practical enforcement system' to 'standard local enforcement mechanisms' and added the supporting reasoning. revision: partial
Referee: [Validation section] The empirical validation reports specific detection delays (9-258 steps) and zero enforcement violations across four settings, but provides no quantitative details on how drift is induced, what metrics are used for IML detection, baseline comparisons, or statistical significance. This makes it difficult to assess whether the results support the claim that IML 'restores observability precisely in the region where enforcement is structurally blind.'

Authors: We agree the validation section was too concise. The revised version expands the section with: (i) explicit descriptions of drift induction (parameterized distributional shifts in the generative models for each of the three synthetic scenarios plus the n8n and LangGraph cases); (ii) the precise IML detection metric (thresholded invariant divergence computed from retained samples of the A0 model); (iii) baseline comparisons against cumulative statistical monitors and simple action-space checks; and (iv) statistical reporting (mean detection delays and 95% confidence intervals over 50 independent runs per setting, confirming zero enforcement violations while IML detects drift in the reported range). These additions directly substantiate the structural-blindness claim. revision: yes

Circularity Check

1 step flagged

Non-Identifiability Theorem reduces to definitional local-vs-global mismatch under unformalized assumption

specific steps

self-definitional · [Abstract / Non-Identifiability Theorem statement]
"Our central result, the Non-Identifiability Theorem, proves that A0 is not in the sigma-algebra generated by the enforcement signal g under the Local Observability Assumption, which every practical enforcement system satisfies. The impossibility arises from a fundamental mismatch: g evaluates actions locally against a point-wise rule set, while A0 encodes global, trajectory-level behavioral properties set at admission time."

The theorem asserts non-membership of A0 in sigma(g) once g is defined (via the assumption) as local and point-wise. This non-inclusion is true by construction of the sigma-algebra generated by local signals; the 'proof' adds no further mathematical content beyond restating the local/global distinction already built into the assumption.

full rationale

The paper's central result is presented as a theorem proving A0 lies outside sigma(g) under the Local Observability Assumption. However, the assumption is introduced precisely to encode that g consists only of local point-wise checks while A0 is global and trajectory-level. The claimed non-identifiability therefore follows immediately from the definitions of the generated sigma-algebra and the local/global distinction, with no independent derivation step exhibited. The additional claim that the assumption holds for every practical enforcement system is asserted without formalization or proof in the provided text, leaving the result conditional on its own definitional framing.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The paper relies on the Local Observability Assumption as a domain assumption and introduces two new entities without independent evidence outside the paper.

axioms (1)

Local Observability Assumption (domain assumption): every practical enforcement system evaluates actions locally against a point-wise rule set
Invoked to establish that g operates below the layer where global deviation is measurable.

invented entities (2)

Non-Identifiability Theorem · no independent evidence
purpose: To prove that A0 is not in the sigma-algebra generated by g
Central theoretical result of the paper.
Invariant Measurement Layer (IML) · no independent evidence
purpose: To restore observability by retaining direct access to the generative model of A0
Proposed solution to bypass the structural limitation of enforcement.

pith-pipeline@v0.9.0 · 5581 in / 1375 out tokens · 34968 ms · 2026-05-10T05:14:51.487752+00:00 · methodology


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Atomic Decision Boundaries: A Structural Requirement for Guaranteeing Execution-Time Admissibility in Autonomous Systems
cs.LO 2026-04 unverdicted novelty 6.0

Atomic decision boundaries are required to guarantee execution-time admissibility because split evaluation systems allow environmental interleaving that no policy can prevent.
Reconstructive Authority Model: Runtime Execution Validity Under Partial Observability
cs.CR 2026-04 unverdicted novelty 5.0

RAM separates integrity from coverage and uses a reconstruction gate over proven state, assumptions, and unobservable residuals to block invalid executions, achieving zero invalid rates in synthetic tests where attest...
Agent Control Protocol: Admission Control for Agent Actions
cs.CR 2026-03 unverdicted novelty 5.0 partial

ACP is a temporal admission control protocol that combines static risk scoring with anomaly accumulation and cooldowns to limit harmful agent behavior over time, reducing approvals from 100% to 0.4% in tested workloads.

Reference graph

Works this paper leans on

14 extracted references · 7 canonical work pages · cited by 3 Pith papers · 1 internal anchor

[1] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):1–58, 2009.

[2] Y. Falcone, L. Mounier, J.-C. Fernandez, and J.-L. Richier. Runtime verification of component-based systems. In Proceedings of the 4th International Symposium on Leveraging Applications, 2012.

[3] M. Fernandez. Fair atomic governance: Allocating decision boundaries under shared resource constraints in multi-agent systems. https://doi.org/10.5281/zenodo.19672597, 2026. Zenodo. DOI: 10.5281/zenodo.19672597

[4] M. Fernandez. Atomic decision boundaries: A structural requirement for guaranteeing execution-time admissibility in autonomous systems. https://doi.org/10.5281/zenodo.19670649, 2026. Zenodo. DOI: 10.5281/zenodo.19670649

[5] M. Fernandez. Agent Control Protocol: ACP v1.30, admission control for agent actions, 2026. arXiv:2603.18829 [cs.CR]. DOI: 10.5281/zenodo.19672575

[6] M. Fernandez. Irreducible multi-scale governance: Composition and limits of atomic admission systems. https://doi.org/10.5281/zenodo.19672608, 2026. Zenodo. DOI: 10.5281/zenodo.19672608

[7] M. Fernandez. Reconstructive authority model: Runtime execution validity under partial observability. https://doi.org/10.5281/zenodo.19669430, 2026. Agent Governance Series, Paper 5. Zenodo. DOI: 10.5281/zenodo.19669430

[8] J. A. Goguen and J. Meseguer. Security policies and security models. In 1982 IEEE Symposium on Security and Privacy, pages 11–20. IEEE, 1982.

[9] R. Greenblatt et al. AI control: Improving safety despite intentional subversion. arXiv preprint arXiv:2312.06942, 2024.

[10] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.

[11] LangChain AI. LangGraph: Building stateful multi-agent applications. https://github.com/langchain-ai/langgraph, 2024.

[12] M. Leucker and C. Schallhart. A brief account of runtime verification. Journal of Logic and Algebraic Programming, 78(5):293–303, 2009.

[13] E. Perez et al. Red teaming language models with language models. arXiv preprint arXiv:2202.03286, 2022.

[14] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis. Diagnosability of discrete-event systems. IEEE Transactions on Automatic Control, 40(9):1555–1575, 1995.