Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents
Pith reviewed 2026-05-08 03:38 UTC · model grok-4.3
The pith
Governing autonomous AI agents reduces to estimating a bound on unobserved risk and enforcing a safety margin.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Informational Viability Principle states that governing an agent reduces to estimating a bound on unobserved risk B̂(x) = U(x) + SB(x) + RG(x) and allowing an action only when its capacity S(x) exceeds B̂(x) by a safety margin. Grounded in Aubin's viability theory, the Agent Viability Framework establishes monitoring (P1), anticipation (P2), and monotonic restriction (P3) as individually necessary and collectively sufficient for documented failure modes. RiskGate implements this with statistical estimators including KL divergence and z-tests, a fail-secure pipeline, and a closed-loop Autopilot as a regulation map, using a scalar Viability Index VI(t) for first-order t* prediction.
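The gating rule has a direct computational reading. A minimal sketch under stated assumptions: the function and parameter names are hypothetical, the component values U(x), SB(x), RG(x) are taken as given numbers rather than produced by the paper's estimators, and the margin default is illustrative only.

```python
def risk_bound(u: float, sb: float, rg: float) -> float:
    """Estimated bound on unobserved risk: B_hat(x) = U(x) + SB(x) + RG(x)."""
    return u + sb + rg

def allow_action(capacity: float, u: float, sb: float, rg: float,
                 margin: float = 0.1) -> bool:
    """Fail-secure gate: permit the action only if S(x) >= B_hat(x) + margin.

    `margin` is the framework's one free parameter; 0.1 is an arbitrary
    illustrative default, not a value from the paper.
    """
    return capacity >= risk_bound(u, sb, rg) + margin
```

Because the comparison fails closed whenever the estimated bound rises toward capacity, tightening any component estimate can only restrict behaviour, which is the monotonic-restriction property (P3) in miniature.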
What carries the argument
The Agent Viability Framework: its three properties of monitoring, anticipation, and monotonic restriction together ensure coverage of documented failure modes, supported by the Informational Viability Principle for risk bounding.
Load-bearing premise
That the three properties are collectively sufficient to cover all documented failure modes, and that the risk-bound components can be estimated reliably enough to support safety-margin decisions.
What would settle it
A documented case where an agent exhibits unsafe behavior despite satisfying monitoring, anticipation, and monotonic restriction properties, or where the risk bound estimate is exceeded but no failure occurs, would challenge the claims.
Original abstract
Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the Informational Viability Principle: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) + RG(x)$ and allowing an action only when its capacity $S(x)$ exceeds $\hat{B}(x)$ by a safety margin. The Agent Viability Framework, grounded in Aubin's viability theory, establishes three properties -- monitoring (P1), anticipation (P2), and monotonic restriction (P3) -- as individually necessary and collectively sufficient for documented failure modes. RiskGate instantiates the framework with dedicated statistical estimators (KL divergence, segment-vs-rest $z$-tests, sequential pattern matching), a fail-secure monotonic pipeline, and a closed-loop Autopilot formalised as an instance of Aubin's regulation map with kill-switch-as-last-resort; a scalar Viability Index $VI(t) \in [-1,+1]$ with first-order $t^*$ prediction transforms governance from reactive to predictive. Contributions are the theoretical framework, the reference implementation, and analytical coverage against published agent-failure taxonomies; quantitative empirical evaluation is scoped as follow-up work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Informational Viability Principle, stating that governing autonomous AI agents amounts to estimating a bound on unobserved risk B̂(x) = U(x) + SB(x) + RG(x) and permitting an action only if the agent's capacity S(x) exceeds this bound by a safety margin. Grounded in Aubin's viability theory, the Agent Viability Framework defines three properties—monitoring (P1), anticipation (P2), and monotonic restriction (P3)—as individually necessary and collectively sufficient for addressing documented failure modes in AI agents. It presents RiskGate as an instantiation with statistical estimators including KL divergence, segment-vs-rest z-tests, and sequential pattern matching, along with a fail-secure pipeline, an Autopilot as a regulation map, and a Viability Index VI(t) for predictive governance. The contributions include the theoretical framework, a reference implementation, and analytical coverage against agent-failure taxonomies, with quantitative empirical evaluation deferred to future work.
Significance. If the necessity and sufficiency of P1–P3 can be rigorously established and the risk estimators shown to be reliable, the framework could provide a principled, predictive approach to runtime governance of autonomous agents by leveraging viability theory and shifting from reactive to anticipatory control via the Viability Index. The reference implementation and analytical coverage against published taxonomies are concrete strengths that facilitate follow-up work. However, the absence of a formal proof for sufficiency and the deferred empirical validation limit the immediate applicability to safety-critical deployments.
major comments (2)
- [Abstract] Abstract: The claim that monitoring (P1), anticipation (P2), and monotonic restriction (P3) are individually necessary and collectively sufficient for documented failure modes is asserted without a formal theorem, proposition, or exhaustive case analysis. Support is limited to 'analytical coverage against published agent-failure taxonomies,' but the manuscript does not demonstrate that these three properties necessarily capture every drift, adversary adaptation, or emergent failure mode, nor that they are minimal. This sufficiency assertion is load-bearing for the central contribution.
- [RiskGate instantiation] The section on RiskGate estimators: The unobserved risk bound components U(x), SB(x), and RG(x) are defined using estimators (KL divergence, segment-vs-rest z-tests, sequential pattern matching) introduced within the framework itself. Without independent grounding, shipped validation data, or a proof that these estimators reliably bound the true unobserved risk, the governance rule B̂(x) < S(x) − margin reduces to an internal consistency check rather than an externally validated safety margin. This directly affects the practical utility of the Informational Viability Principle.
minor comments (2)
- [Notation and definitions] The notation for capacity S(x) and the safety margin is introduced without an explicit early definition or example computation; adding a dedicated notation table or worked example in §2 would improve readability.
- [Autopilot and Viability Index] The first-order t* prediction for the Viability Index VI(t) is described at a high level; a short derivation or pseudocode for how t* is obtained from the regulation map would clarify the transition from reactive to predictive governance.
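One plausible reading of the first-order t* prediction the minor comment asks about is a linear extrapolation of VI(t) to the point where viability is lost. This is an assumption about the mechanism, not the paper's code; the function name and the convention that VI = 0 marks viability loss are hypothetical.

```python
from typing import Optional

def predict_t_star(vi: float, dvi_dt: float, t: float) -> Optional[float]:
    """First-order estimate of the time t* at which VI(t) reaches 0.

    vi:      current Viability Index, assumed in (0, 1]
    dvi_dt:  estimated rate of change of VI (e.g. a finite difference)
    Returns None when VI is not declining, i.e. no crossing is predicted.
    """
    if vi <= 0 or dvi_dt >= 0:
        return None
    # Solve vi + dvi_dt * (t_star - t) = 0 for t_star.
    return t - vi / dvi_dt
```

Under this reading, predictive governance means the Autopilot acts when t* falls within its reaction horizon, rather than waiting for VI itself to cross the threshold.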
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments highlight important areas for strengthening the presentation of the Informational Viability Principle and the Agent Viability Framework. We address each major comment below and outline targeted revisions.
Point-by-point responses
- Referee: [Abstract] The claim that monitoring (P1), anticipation (P2), and monotonic restriction (P3) are individually necessary and collectively sufficient for documented failure modes is asserted without a formal theorem, proposition, or exhaustive case analysis. Support is limited to 'analytical coverage against published agent-failure taxonomies,' but the manuscript does not demonstrate that these three properties necessarily capture every drift, adversary adaptation, or emergent failure mode, nor that they are minimal.
Authors: We acknowledge that the manuscript relies on analytical coverage rather than a formal theorem for the necessity and sufficiency claim. The properties were derived by mapping each documented failure mode from published taxonomies to the minimal set of capabilities required to address it, showing that P1–P3 together cover all listed modes while each is required for at least one. In the revised manuscript we will add an expanded appendix with a structured table that explicitly links every taxonomy entry to the relevant property (or properties), including a brief argument for minimality based on the taxonomy structure. We agree a complete formal proof across all conceivable emergent behaviors lies beyond the current scope; the revision will therefore qualify the claim as holding for the documented failure modes in the literature while noting the analytical nature of the support. revision: partial
- Referee: [RiskGate instantiation] The unobserved risk bound components U(x), SB(x), and RG(x) are defined using estimators (KL divergence, segment-vs-rest z-tests, sequential pattern matching) introduced within the framework itself. Without independent grounding, shipped validation data, or a proof that these estimators reliably bound the true unobserved risk, the governance rule B̂(x) < S(x) − margin reduces to an internal consistency check rather than an externally validated safety margin.
Authors: The individual estimators are drawn from established statistical literature (KL divergence for distributional shift, z-tests for segment anomalies, and sequential pattern matching for drift detection) and are not invented ad hoc. The framework’s contribution is their integration into a viability-theoretic bound on unobserved risk. In revision we will add explicit citations to the foundational statistical papers for each estimator and a short subsection clarifying that the bound is constructed from these externally validated techniques. We maintain that the resulting governance rule is not merely internal consistency because the estimators have independent statistical grounding; however, we agree that end-to-end empirical validation of the composite bound in deployed agents is required for safety-critical use and remains scoped as future work as stated in the manuscript. revision: partial
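To make the grounding point concrete, here is what a KL-divergence drift score over discrete action histograms could look like. This is a generic textbook estimator offered only to illustrate the kind of externally grounded statistic the authors cite; it is not RiskGate's actual implementation, and the smoothing constant is an assumption.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two discrete distributions given as equal-length
    lists of probabilities; eps smooths empty bins. A larger value signals
    that the recent behaviour distribution p has drifted from baseline q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

Identical distributions score roughly zero, and the score grows as recent behaviour departs from the baseline, which is the signal a monitoring component (P1) would threshold.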
deferred items (2)
- Formal theorem establishing necessity and sufficiency of P1–P3 across all possible failure modes
- Quantitative empirical validation of the composite unobserved-risk bound produced by the RiskGate estimators
Circularity Check
No significant circularity; framework proposes definitions grounded in external theory with new estimators and coverage argument
full rationale
The paper proposes the Informational Viability Principle as a reduction of governance to estimating the defined bound B̂(x) = U(x) + SB(x) + RG(x) and applies a safety margin test; it grounds the necessity and sufficiency of P1-P2-P3 in Aubin's external viability theory while supporting the claim via analytical coverage of published taxonomies rather than a closed derivation. RiskGate supplies new statistical estimators (KL, z-tests, pattern matching) for the components instead of fitting parameters to force a match with prior inputs. The Viability Index and t* prediction are presented as transformations within the new framework, not as outputs forced by construction from the same data or self-citations. No equation or claim reduces a derived result to an input by definition, renaming, or self-referential fitting; the work is self-contained as a proposal plus implementation with deferred empirical validation.
Axiom & Free-Parameter Ledger
free parameters (1)
- safety margin
axioms (1)
- Domain assumption: Aubin's viability theory supplies the three properties as individually necessary and collectively sufficient for the documented failure modes.
invented entities (2)
- Informational Viability Principle: no independent evidence
- Viability Index VI(t): no independent evidence
Reference graph
Works this paper leans on
- [1] A. Agresti. Categorical Data Analysis. Wiley, 3rd edition, 2012.
- [2] J.-P. Aubin. Viability Theory. Birkhäuser, 1991.
- [3] J.-P. Aubin, A. Bayen, and P. Saint-Pierre. Viability Theory: New Directions. Springer, 2nd edition, 2011.
- [4] S. Gaurav, J. Heikkonen, and J. Chaudhary. Governance-as-a-service: A multi-agent framework for AI system compliance and policy enforcement. arXiv preprint arXiv:2508.18765, 2025.
- [5] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235–256, 2002.
- [6] Amazon Web Services. Amazon Bedrock AgentCore — Policy. AWS Documentation, 2025.
- [7] Y. Bai et al. Constitutional AI: Harmlessness from AI feedback. arXiv:2212.08073, 2022.
- [8]
- [9] A. Bifet and R. Gavaldà. Learning from time-changing data with adaptive windowing. In SDM, pages 443–448, 2007.
- [10] J. Cutler, C. Friesen, E. Ito, E. Mulder, A. Paljak, and M. Singhal. Cedar: A new language for expressive, fast, safe, and analyzable authorization. In OOPSLA, 2024.
- [11] European Parliament and Council. Regulation (EU) 2024/1689 (AI Act). Official Journal of the European Union, 2024.
- [12] A. Garivier and E. Moulines. On upper-confidence bound policies for switching bandit problems. In ALT, pages 174–188, 2011.
- [13] W. Hoeffding. Probability inequalities for sums of bounded random variables. JASA, 58(301):13–30, 1963.
- [14] W. James and C. Stein. Estimation with quadratic loss. In Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 361–379, 1961.
- [15] M. Kaptein, V.-J. Khan, and A. Podstavnychy. Runtime governance for AI agents: policies on paths. arXiv:2603.16586, 2026.
- [16] T. B. L. Kirkwood. Understanding the odd science of aging. Cell, 120(4):437–447, 2005.
- [17] S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.
- [18] T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
- [19]
- [20] L. Ouyang et al. Training language models to follow instructions with human feedback. In NeurIPS, 2022.
- [21]
- [22] T. Rebedea et al. NeMo Guardrails: A toolkit for controllable and safe LLM applications. In EMNLP (Industry Track), 2023.
- [23] N. Shapira et al. Agents of Chaos. arXiv:2602.20021, February 2026.
- [24] L. Van Valen. A new evolutionary law. Evolutionary Theory, 1:1–30, 1973.
- [25] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia. A survey on concept drift adaptation. ACM Computing Surveys, 46(4):44:1–44:37, 2014.
- [26] Verizon. 2024 Data Breach Investigations Report. Verizon Business, 2024.
- [27] W. K. Newey and K. D. West. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3):703–708, 1987.
- [28] W. K. Newey and K. D. West. Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61(4):631–653, 1994.
- [29] H. R. Künsch. The jackknife and the bootstrap for general stationary observations. Annals of Statistics, 17(3):1217–1241, 1989.
discussion (0)