Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents
Pith reviewed 2026-05-08 03:38 UTC · model grok-4.3
The pith
Governing autonomous AI agents reduces to estimating a bound on unobserved risk and enforcing a safety margin.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Informational Viability Principle states that governing an agent reduces to estimating a bound on unobserved risk B̂(x) = U(x) + SB(x) + RG(x) and allowing an action only when its capacity S(x) exceeds B̂(x) by a safety margin. Grounded in Aubin's viability theory, the Agent Viability Framework establishes monitoring (P1), anticipation (P2), and monotonic restriction (P3) as individually necessary and collectively sufficient for documented failure modes. RiskGate implements this with statistical estimators including KL divergence and z-tests, a fail-secure pipeline, and a closed-loop Autopilot as a regulation map, using a scalar Viability Index VI(t) for first-order t* prediction.
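The gating rule has a direct computational reading. A minimal sketch under stated assumptions: the function and parameter names are hypothetical, the component values U(x), SB(x), RG(x) are taken as given numbers rather than produced by the paper's estimators, and the margin default is illustrative only.

```python
def risk_bound(u: float, sb: float, rg: float) -> float:
    """Estimated bound on unobserved risk: B_hat(x) = U(x) + SB(x) + RG(x)."""
    return u + sb + rg

def allow_action(capacity: float, u: float, sb: float, rg: float,
                 margin: float = 0.1) -> bool:
    """Fail-secure gate: permit the action only if S(x) >= B_hat(x) + margin.

    `margin` is the framework's one free parameter; 0.1 is an arbitrary
    illustrative default, not a value from the paper.
    """
    return capacity >= risk_bound(u, sb, rg) + margin
```

Because the comparison fails closed whenever the estimated bound rises toward capacity, tightening any component estimate can only restrict behaviour, which is the monotonic-restriction property (P3) in miniature.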
What carries the argument
The Agent Viability Framework: its three properties of monitoring, anticipation, and monotonic restriction together ensure coverage of documented failure modes, supported by the Informational Viability Principle for risk bounding.
Load-bearing premise
That the three properties are collectively sufficient to cover all documented failure modes, and that the risk-bound components can be estimated reliably enough to support safety-margin decisions.
What would settle it
A documented case where an agent exhibits unsafe behavior despite satisfying monitoring, anticipation, and monotonic restriction properties, or where the risk bound estimate is exceeded but no failure occurs, would challenge the claims.
Original abstract
Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the Informational Viability Principle: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) + RG(x)$ and allowing an action only when its capacity $S(x)$ exceeds $\hat{B}(x)$ by a safety margin. The Agent Viability Framework, grounded in Aubin's viability theory, establishes three properties -- monitoring (P1), anticipation (P2), and monotonic restriction (P3) -- as individually necessary and collectively sufficient for documented failure modes. RiskGate instantiates the framework with dedicated statistical estimators (KL divergence, segment-vs-rest $z$-tests, sequential pattern matching), a fail-secure monotonic pipeline, and a closed-loop Autopilot formalised as an instance of Aubin's regulation map with kill-switch-as-last-resort; a scalar Viability Index $VI(t) \in [-1,+1]$ with first-order $t^*$ prediction transforms governance from reactive to predictive. Contributions are the theoretical framework, the reference implementation, and analytical coverage against published agent-failure taxonomies; quantitative empirical evaluation is scoped as follow-up work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Informational Viability Principle, stating that governing autonomous AI agents amounts to estimating a bound on unobserved risk B̂(x) = U(x) + SB(x) + RG(x) and permitting an action only if the agent's capacity S(x) exceeds this bound by a safety margin. Grounded in Aubin's viability theory, the Agent Viability Framework defines three properties—monitoring (P1), anticipation (P2), and monotonic restriction (P3)—as individually necessary and collectively sufficient for addressing documented failure modes in AI agents. It presents RiskGate as an instantiation with statistical estimators including KL divergence, segment-vs-rest z-tests, and sequential pattern matching, along with a fail-secure pipeline, an Autopilot as a regulation map, and a Viability Index VI(t) for predictive governance. The contributions include the theoretical framework, a reference implementation, and analytical coverage against agent-failure taxonomies, with quantitative empirical evaluation deferred to future work.
Significance. If the necessity and sufficiency of P1–P3 can be rigorously established and the risk estimators shown to be reliable, the framework could provide a principled, predictive approach to runtime governance of autonomous agents by leveraging viability theory and shifting from reactive to anticipatory control via the Viability Index. The reference implementation and analytical coverage against published taxonomies are concrete strengths that facilitate follow-up work. However, the absence of a formal proof for sufficiency and the deferred empirical validation limit the immediate applicability to safety-critical deployments.
major comments (2)
- [Abstract] Abstract: The claim that monitoring (P1), anticipation (P2), and monotonic restriction (P3) are individually necessary and collectively sufficient for documented failure modes is asserted without a formal theorem, proposition, or exhaustive case analysis. Support is limited to 'analytical coverage against published agent-failure taxonomies,' but the manuscript does not demonstrate that these three properties necessarily capture every drift, adversary adaptation, or emergent failure mode, nor that they are minimal. This sufficiency assertion is load-bearing for the central contribution.
- [RiskGate instantiation] The section on RiskGate estimators: The unobserved risk bound components U(x), SB(x), and RG(x) are defined using estimators (KL divergence, segment-vs-rest z-tests, sequential pattern matching) introduced within the framework itself. Without independent grounding, shipped validation data, or a proof that these estimators reliably bound the true unobserved risk, the governance rule B̂(x) < S(x) − margin reduces to an internal consistency check rather than an externally validated safety margin. This directly affects the practical utility of the Informational Viability Principle.
minor comments (2)
- [Notation and definitions] The notation for capacity S(x) and the safety margin is introduced without an explicit early definition or example computation; adding a dedicated notation table or worked example in §2 would improve readability.
- [Autopilot and Viability Index] The first-order t* prediction for the Viability Index VI(t) is described at a high level; a short derivation or pseudocode for how t* is obtained from the regulation map would clarify the transition from reactive to predictive governance.
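One plausible reading of the first-order t* prediction the minor comment asks about is a linear extrapolation of VI(t) to the point where viability is lost. This is an assumption about the mechanism, not the paper's code; the function name and the convention that VI = 0 marks viability loss are hypothetical.

```python
from typing import Optional

def predict_t_star(vi: float, dvi_dt: float, t: float) -> Optional[float]:
    """First-order estimate of the time t* at which VI(t) reaches 0.

    vi:      current Viability Index, assumed in (0, 1]
    dvi_dt:  estimated rate of change of VI (e.g. a finite difference)
    Returns None when VI is not declining, i.e. no crossing is predicted.
    """
    if vi <= 0 or dvi_dt >= 0:
        return None
    # Solve vi + dvi_dt * (t_star - t) = 0 for t_star.
    return t - vi / dvi_dt
```

Under this reading, predictive governance means the Autopilot acts when t* falls within its reaction horizon, rather than waiting for VI itself to cross the threshold.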
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments highlight important areas for strengthening the presentation of the Informational Viability Principle and the Agent Viability Framework. We address each major comment below and outline targeted revisions.
Point-by-point responses
- Referee: [Abstract] The claim that monitoring (P1), anticipation (P2), and monotonic restriction (P3) are individually necessary and collectively sufficient for documented failure modes is asserted without a formal theorem, proposition, or exhaustive case analysis. Support is limited to 'analytical coverage against published agent-failure taxonomies,' but the manuscript does not demonstrate that these three properties necessarily capture every drift, adversary adaptation, or emergent failure mode, nor that they are minimal.
Authors: We acknowledge that the manuscript relies on analytical coverage rather than a formal theorem for the necessity and sufficiency claim. The properties were derived by mapping each documented failure mode from published taxonomies to the minimal set of capabilities required to address it, showing that P1–P3 together cover all listed modes while each is required for at least one. In the revised manuscript we will add an expanded appendix with a structured table that explicitly links every taxonomy entry to the relevant property (or properties), including a brief argument for minimality based on the taxonomy structure. We agree a complete formal proof across all conceivable emergent behaviors lies beyond the current scope; the revision will therefore qualify the claim as holding for the documented failure modes in the literature while noting the analytical nature of the support. revision: partial
- Referee: [RiskGate instantiation] The unobserved risk bound components U(x), SB(x), and RG(x) are defined using estimators (KL divergence, segment-vs-rest z-tests, sequential pattern matching) introduced within the framework itself. Without independent grounding, shipped validation data, or a proof that these estimators reliably bound the true unobserved risk, the governance rule B̂(x) < S(x) − margin reduces to an internal consistency check rather than an externally validated safety margin.
Authors: The individual estimators are drawn from established statistical literature (KL divergence for distributional shift, z-tests for segment anomalies, and sequential pattern matching for drift detection) and are not invented ad hoc. The framework’s contribution is their integration into a viability-theoretic bound on unobserved risk. In revision we will add explicit citations to the foundational statistical papers for each estimator and a short subsection clarifying that the bound is constructed from these externally validated techniques. We maintain that the resulting governance rule is not merely internal consistency because the estimators have independent statistical grounding; however, we agree that end-to-end empirical validation of the composite bound in deployed agents is required for safety-critical use and remains scoped as future work as stated in the manuscript. revision: partial
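To make the grounding point concrete, here is what a KL-divergence drift score over discrete action histograms could look like. This is a generic textbook estimator offered only to illustrate the kind of externally grounded statistic the authors cite; it is not RiskGate's actual implementation, and the smoothing constant is an assumption.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two discrete distributions given as equal-length
    lists of probabilities; eps smooths empty bins. A larger value signals
    that the recent behaviour distribution p has drifted from baseline q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

Identical distributions score roughly zero, and the score grows as recent behaviour departs from the baseline, which is the signal a monitoring component (P1) would threshold.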
deferred items (2)
- Formal theorem establishing necessity and sufficiency of P1–P3 across all possible failure modes
- Quantitative empirical validation of the composite unobserved-risk bound produced by the RiskGate estimators
Circularity Check
No significant circularity; framework proposes definitions grounded in external theory with new estimators and coverage argument
full rationale
The paper proposes the Informational Viability Principle as a reduction of governance to estimating the defined bound B̂(x) = U(x) + SB(x) + RG(x) and applies a safety margin test; it grounds the necessity and sufficiency of P1-P2-P3 in Aubin's external viability theory while supporting the claim via analytical coverage of published taxonomies rather than a closed derivation. RiskGate supplies new statistical estimators (KL, z-tests, pattern matching) for the components instead of fitting parameters to force a match with prior inputs. The Viability Index and t* prediction are presented as transformations within the new framework, not as outputs forced by construction from the same data or self-citations. No equation or claim reduces a derived result to an input by definition, renaming, or self-referential fitting; the work is self-contained as a proposal plus implementation with deferred empirical validation.
Axiom & Free-Parameter Ledger
free parameters (1)
- safety margin
axioms (1)
- Domain assumption: Aubin's viability theory supplies the three properties as individually necessary and collectively sufficient for the documented failure modes.
invented entities (2)
- Informational Viability Principle: no independent evidence
- Viability Index VI(t): no independent evidence
Reference graph
Works this paper leans on
- [1] A. Agresti. Categorical Data Analysis. Wiley, 3rd edition, 2012.
- [2] J.-P. Aubin. Viability Theory. Birkhäuser, 1991.
- [3] J.-P. Aubin, A. Bayen, and P. Saint-Pierre. Viability Theory: New Directions. Springer, 2nd edition, 2011.
- [4] S. Gaurav, J. Heikkonen, and J. Chaudhary. Governance-as-a-service: A multi-agent framework for AI system compliance and policy enforcement. arXiv preprint arXiv:2508.18765, 2025.
- [5] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235–256, 2002.
- [6] Amazon Web Services. Amazon Bedrock AgentCore — Policy. AWS Documentation, 2025.
- [7] Y. Bai et al. Constitutional AI: Harmlessness from AI feedback. arXiv:2212.08073, 2022.
- [8]
- [9] A. Bifet and R. Gavaldà. Learning from time-changing data with adaptive windowing. In SDM, pages 443–448, 2007.
- [10] J. Cutler, C. Friesen, E. Ito, E. Mulder, A. Paljak, and M. Singhal. Cedar: A new language for expressive, fast, safe, and analyzable authorization. In OOPSLA, 2024.
- [11] European Parliament and Council. Regulation (EU) 2024/1689 (AI Act). Official Journal of the European Union, 2024.
- [12] A. Garivier and E. Moulines. On upper-confidence bound policies for switching bandit problems. In ALT, pages 174–188, 2011.
- [13] W. Hoeffding. Probability inequalities for sums of bounded random variables. JASA, 58(301):13–30, 1963.
- [14] W. James and C. Stein. Estimation with quadratic loss. In Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 361–379, 1961.
- [15] M. Kaptein, V.-J. Khan, and A. Podstavnychy. Runtime governance for AI agents: policies on paths. arXiv:2603.16586, 2026.
- [16] T. B. L. Kirkwood. Understanding the odd science of aging. Cell, 120(4):437–447, 2005.
- [17] S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.
- [18] T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
- [19]
- [20] L. Ouyang et al. Training language models to follow instructions with human feedback. In NeurIPS, 2022.
- [21]
- [22] T. Rebedea et al. NeMo Guardrails: A toolkit for controllable and safe LLM applications. In EMNLP (Industry Track), 2023.
- [23] N. Shapira et al. Agents of Chaos. arXiv:2602.20021, February 2026.
- [24] L. Van Valen. A new evolutionary law. Evolutionary Theory, 1:1–30, 1973.
- [25] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia. A survey on concept drift adaptation. ACM Computing Surveys, 46(4):44:1–44:37, 2014.
- [26] Verizon. 2024 Data Breach Investigations Report. Verizon Business, 2024.
- [27] W. K. Newey and K. D. West. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3):703–708, 1987.
- [28] W. K. Newey and K. D. West. Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61(4):631–653, 1994.
- [29] H. R. Künsch. The jackknife and the bootstrap for general stationary observations. Annals of Statistics, 17(3):1217–1241, 1989.
discussion (0)