pith. machine review for the scientific record.

arxiv: 2604.27292 · v2 · submitted 2026-04-30 · 💻 cs.AI


The Two Boundaries: Why Behavioral AI Governance Fails Structurally


Pith reviewed 2026-05-07 09:36 UTC · model grok-4.3

classification 💻 cs.AI
keywords AI governance · behavioral governance · Rice's theorem · coterminous governance · effects governance · Turing completeness · structural failure · undecidability

The pith

AI systems governing effects must make their capability boundary identical to the governance boundary or else risk and theater are inevitable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that every effects-governing AI has two independent boundaries: what the system can express and what its policies can cover. When these boundaries are set separately, three regions appear: the useful overlap of governed capabilities, the risky region of ungoverned capabilities, and the empty region of policies that address nothing real. Rice's theorem demonstrates that no algorithm can decide, for arbitrary programs, whether effects will comply with policy. The only escape is to force the boundaries to coincide through an upfront architectural choice that separates computation from effect, turning governance into a structural property of the execution pipeline rather than a later check. If this reasoning holds, then post-hoc behavioral governance layers on Turing-complete systems cannot succeed.

Core claim

The central claim is that behavioral governance of effects in Turing-complete AI systems is undecidable in general by Rice's theorem, because no algorithm can determine whether an arbitrary program satisfies a non-trivial semantic property such as policy compliance. Coterminous governance is therefore required: the expressiveness boundary must equal the governance boundary. This equality is achieved only by an architectural separation of computation from effect, after which governance checks become part of the execution pipeline and subsume any separate governance infrastructure. The testable criterion follows directly: if the two boundaries are not provably identical, then ungoverned risk and governance theater are structurally inevitable.
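The undecidability step is a standard Rice's-theorem reduction, which can be made concrete with a short sketch (ours, not the paper's formalism; names like `build_gadget` are illustrative). If a total decider for "this program's effects comply with the policy" existed, it would decide the halting problem:

```python
# Sketch of the Rice's-theorem argument (illustrative names, not the
# paper's Coq development). Suppose `complies(prog)` were a total
# algorithm deciding "prog's effects satisfy the policy".

def build_gadget(program, inp):
    """Return a program whose only effect is a forbidden one,
    performed if and only if `program` halts on `inp`."""
    def gadget():
        program(inp)               # diverges exactly when `program` diverges
        return "FORBIDDEN_EFFECT"  # reached only if `program` halted
    return gadget

def decide_halting(program, inp, complies):
    # `program` halts on `inp`  <=>  the gadget performs the forbidden
    # effect  <=>  the gadget is non-compliant. A total `complies`
    # would therefore decide halting, which is impossible.
    return not complies(build_gadget(program, inp))
```

Note that a `complies` that simply runs the gadget is not a counterexample: it never terminates on diverging inputs, so it is not a decider.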

What carries the argument

Coterminous governance, the requirement that an AI system's expressiveness boundary (what effects it can produce) exactly equals its governance boundary, enforced by separating computation from effects so that policy checks are structural rather than behavioral.
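A minimal sketch of that separation (our illustration; `Effect`, `POLICY`, `plan`, and `execute` are invented names, not the paper's API): computation emits inert effect descriptions as data, and only a policy-checking executor can act on them, so nothing expressible bypasses governance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Effect:
    """An inert description of an action; data, not execution."""
    name: str
    args: tuple

# The governance boundary: the only effect names the executor will perform.
POLICY = {"read_file", "send_email"}

def plan(goal):
    # Pure computation: may be arbitrarily clever (even Turing-complete),
    # but it can only *describe* effects, never perform them.
    return [Effect("read_file", ("report.txt",)),
            Effect("launch_missiles", ())]

def execute(effects):
    # Structural gate: in this toy model "performing" an effect just
    # means admitting it; a real executor would dispatch it here.
    performed, refused = [], []
    for e in effects:
        (performed if e.name in POLICY else refused).append(e)
    return performed, refused
```

The key design point is that the check sits inside the only path by which an effect can happen, making governance a property of the pipeline rather than a behavioral prediction about the planner.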

If this is right

  • Any behavioral governance layer added after the fact on unrestricted programs will leave either ungoverned capabilities or policies that cover nothing.
  • Governance checks must be moved inside the execution pipeline rather than run as a parallel system.
  • Structural governance under separated computation and effect renders separate governance infrastructure redundant.
  • The undecidability result applies to any attempt to decide non-trivial properties of effects in Turing-complete systems.
  • Coterminous boundaries become the single measurable test for whether a governance approach avoids structural failure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Restricting the effect-generating component to a non-Turing-complete language would remove the undecidability barrier and allow effective behavioral governance.
  • System designers could verify coterminous boundaries by enumerating every possible effect and confirming that each is explicitly covered and that no policy addresses an impossible action.
  • The same boundary-coincidence requirement may apply to other domains where programs produce external effects, such as operating-system access control or robotic action planning.
  • In practice this would favor agent architectures whose action sets are declared and finite rather than generated on the fly by general computation.
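The enumeration check suggested above can be sketched as a set comparison (a toy model that assumes effects are finitely enumerable, which is precisely what declared, non-Turing-complete action sets provide; `boundary_audit` is our name, not the paper's):

```python
def boundary_audit(expressible, governed):
    """Compare the set of effects a system can produce with the set
    its policy covers, yielding the paper's three regions."""
    return {
        "governed": expressible & governed,  # the only useful region
        "risk":     expressible - governed,  # ungoverned capabilities
        "theater":  governed - expressible,  # policies covering nothing real
    }

def is_coterminous(expressible, governed):
    # Coterminous iff both failure regions are empty.
    regions = boundary_audit(expressible, governed)
    return not regions["risk"] and not regions["theater"]
```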

Load-bearing premise

The claim depends on modeling deployed AI effect-governance systems as arbitrary Turing-complete programs whose semantic compliance properties cannot be decided algorithmically after the fact.

What would settle it

A working deployed system that governs effects behaviorally on a Turing-complete architecture yet produces neither ungoverned risky effects nor policies that address impossible actions would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.27292 by Alan L. McCann.

Figure 1: Non-coterminous governance: expressiveness and governance boundaries are misaligned.
Figure 2: Coterminous governance: expressiveness and governance share the same boundary.
Original abstract

Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly all deployed AI systems, these boundaries are defined independently, creating three regions: governed capabilities (the only useful region), ungoverned capabilities (risk), and governance policies that address non-existent capabilities (theater). Two of the three regions are failure modes. We focus on the governance of effects: actions that AI systems perform in the world (API calls, database writes, tool invocations). This is distinct from the governance of model outputs (content quality, bias, fairness), which operates at a different level and requires different mechanisms. We present a formal framework for analyzing this structural gap. Rice's theorem (1953) proves the gap is undecidable in the general case for any Turing-complete architecture that attempts to govern effects behaviorally: no algorithm can decide non-trivial semantic properties of arbitrary programs, including the property "this program's effects comply with the governance policy." We define coterminous governance: a system property where the expressiveness boundary equals the governance boundary. We show that coterminous governance requires an architectural decision (separating computation from effect) rather than a governance layer added after the fact. We show that structural governance under this separation subsumes separate governance infrastructure: governance checks become part of the execution pipeline rather than a second system running alongside it. We propose coterminous governance as the testable criterion for any AI governance system: either the two boundaries are provably identical, or risk and theater are structurally inevitable. Proofs are mechanized in Coq (454 theorems, 36 modules, 0 admitted).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that every AI system performing effects has two independently defined boundaries (expressiveness and governance), creating three regions of which two are structural failure modes (ungoverned risk and governance theater). It invokes Rice's theorem to prove that deciding non-trivial semantic properties such as policy-compliant effects is undecidable for any Turing-complete architecture attempting behavioral governance, defines coterminous governance as the property that the two boundaries coincide, shows this requires an architectural separation of computation from effect rather than a post-hoc layer, and mechanizes the framework in Coq (454 theorems, 36 modules, 0 admits).

Significance. If the reduction from deployed AI effect mechanisms to arbitrary Turing-complete programs holds, the result supplies a formal, testable criterion that subsumes many existing post-hoc governance proposals and explains why behavioral approaches are prone to either residual risk or ineffective theater. The Coq mechanization of 454 theorems with zero admits is a clear strength, providing machine-checked support for the undecidability argument and the derived architectural requirements.

major comments (2)
  1. [§3.2] §3.2 (Mapping to AI architectures): the claim that current tool-calling and API-effect mechanisms in deployed systems are sufficiently expressive to inherit the full undecidability of Rice's theorem is asserted via informal reduction; a concrete lemma or example showing how an arbitrary program is simulated by an LLM-plus-tool loop would make the application load-bearing rather than illustrative.
  2. [Definition 4.1] Definition 4.1 (coterminous governance): the requirement that governance checks become part of the execution pipeline is derived from the undecidability result, yet the paper does not exhibit a formal statement showing that any post-hoc governance layer is necessarily non-coterminous; adding such a lemma would tighten the subsumption claim.
minor comments (2)
  1. [Abstract] Abstract: the three-region diagram is described in text but not referenced by figure number; adding '(see Figure 1)' would improve readability.
  2. [§5.3] §5.3: the statement that 'structural governance subsumes separate infrastructure' uses the term 'subsumes' without a precise set-theoretic or simulation relation; a short clarifying sentence would remove ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments and the recommendation for minor revision. The suggestions to formalize the reduction and the non-coterminous property of post-hoc layers will improve the clarity and rigor of the paper. We outline our responses below and confirm that revisions will be made accordingly.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (Mapping to AI architectures): the claim that current tool-calling and API-effect mechanisms in deployed systems are sufficiently expressive to inherit the full undecidability of Rice's theorem is asserted via informal reduction; a concrete lemma or example showing how an arbitrary program is simulated by an LLM-plus-tool loop would make the application load-bearing rather than illustrative.

    Authors: We concur that the mapping in §3.2 relies on an informal argument. In the revised version, we will provide a concrete example illustrating the simulation of an arbitrary Turing-complete program using an LLM with tool-calling capabilities, assuming tools that support persistent state and control flow. Furthermore, we will add a lemma in the Coq formalization that captures this simulation, building on the existing 454 theorems to make the inheritance of undecidability explicit and machine-checked. revision: yes

  2. Referee: [Definition 4.1] Definition 4.1 (coterminous governance): the requirement that governance checks become part of the execution pipeline is derived from the undecidability result, yet the paper does not exhibit a formal statement showing that any post-hoc governance layer is necessarily non-coterminous; adding such a lemma would tighten the subsumption claim.

    Authors: The referee correctly identifies that the derivation of coterminous governance from undecidability would benefit from an explicit lemma. We will add a new lemma stating that for any Turing-complete system, a post-hoc governance layer (operating externally on effects) cannot be coterminous with the expressiveness boundary, because it would necessitate an algorithm to decide non-trivial semantic properties of programs, contradicting Rice's theorem. This lemma will be mechanized in Coq and integrated into the definition of coterminous governance in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external Rice's theorem and Coq mechanization

Full rationale

The paper grounds its core claim in Rice's theorem (1953), an independent external result on undecidability of non-trivial semantic properties for arbitrary programs, and mechanizes the mapping to behavioral AI governance effects in Coq (454 theorems, 36 modules, 0 admitted). Coterminous governance is defined directly from the two-boundary distinction and shown to require separation of computation from effect as a logical consequence of the undecidability result rather than by redefinition or fitting. No load-bearing step reduces to self-citation, ansatz smuggling, renaming of known results, or any input-output equivalence by construction within the paper itself. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The paper relies on standard computability theory with no free parameters fitted to data. New concepts are introduced definitionally to organize the argument.

axioms (1)
  • standard math Rice's theorem: non-trivial semantic properties of programs are undecidable for Turing-complete systems
    Directly invoked to establish that behavioral governance of effects is undecidable in the general case.
invented entities (2)
  • coterminous governance no independent evidence
    purpose: System property in which expressiveness boundary equals governance boundary
    Newly defined as the testable criterion that avoids risk and theater regions.
  • three regions (governed capabilities, ungoverned capabilities, theater) no independent evidence
    purpose: Categorization of outcomes when boundaries are independent
    Conceptual partition introduced to identify failure modes.

pith-pipeline@v0.9.0 · 5596 in / 1509 out tokens · 72813 ms · 2026-05-07T09:36:34.786156+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 19 canonical work pages · 3 internal anchors

  1. [1]

    Constitutional AI: Harmlessness from AI Feedback

    Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073, 2022

  2. [2]

    Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems

    David Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, et al. Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems. arXiv preprint arXiv:2405.06624, 2024

  3. [3]

    Programming semantics for multiprogrammed computations

    Jack B. Dennis and Earl C. Van Horn. Programming semantics for multiprogrammed computations. Communications of the ACM, 9(3): 143--155, 1966. doi:10.1145/365230.365252

  4. [4]

    The method of levels of abstraction

    Luciano Floridi. The method of levels of abstraction. Minds and Machines, 18(3): 303--329, 2008

  5. [5]

    Integrating functional and imperative programming

    David K. Gifford and John M. Lucassen. Integrating functional and imperative programming. In ACM Conference on LISP and Functional Programming, pages 28--38, 1986. doi:10.1145/319838.319848

  6. [6]

    Guardrails: Adding guardrails to large language models

    Guardrails AI. Guardrails: Adding guardrails to large language models. https://github.com/guardrails-ai/guardrails, 2024

  7. [7]

    Bringing the web up to speed with WebAssembly

    Andreas Haas, Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and J. F. Bastien. Bringing the web up to speed with WebAssembly. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 185--200, 2017. doi:10.1145/3062341.3062363

  8. [8]

    A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability

    Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thesing, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review, 37: 100270, 2020. doi:10.1016/j.cosrev.2020.100270

  9. [9]

    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

    Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Mober, et al. DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint arXiv:2310.03714, 2023

  10. [10]

    Type directed compilation of row-typed algebraic effects

    Daan Leijen. Type directed compilation of row-typed algebraic effects. Proceedings of the ACM on Programming Languages, 1(POPL): 1--28, 2017. doi:10.1145/3009837.3009872

  11. [11]

    Do be do be do

    Sam Lindley, Conor McBride, and Craig McLaughlin. Do be do be do. In Proceedings of the ACM on Programming Languages (POPL), pages 1--26, 2017. doi:10.1145/3009837.3009897

  12. [12]

    Scientific workflow management and the Kepler system

    Bertram Ludäscher, Ilkay Altintas, Chad Berkley, Dan Higgins, Efrat Jaeger, Matthew Jones, Edward A. Lee, Jing Tao, and Yang Zhao. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience, 18(10): 1039--1065, 2006. doi:10.1002/cpe.994

  13. [13]

    Alan L. McCann. Algebraic semantics of governed execution: Monoidal categories, effect algebras, and coterminous boundaries, 2026a. arXiv preprint (to appear)

  14. [14]

    Alan L. McCann. Effect-transparent governance for AI workflow architectures: Semantic preservation, expressive minimality, and decidability boundaries, 2026b. arXiv preprint (to appear)

  15. [15]

    Alan L. McCann. Mechanized foundations of structural governance: Machine-checked proofs for governed intelligence, 2026c. arXiv preprint (to appear)

  16. [16]

    Alan L. McCann. Cryptographic registry provenance: Structural defense against dependency confusion in AI package ecosystems, 2026d. arXiv preprint (to appear)

  17. [17]

    Alan L. McCann. Certified purity for cognitive workflow executors: From static analysis to cryptographic attestation, 2026e. arXiv preprint (to appear)

  18. [18]

    Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, Johns Hopkins University, 2006

  19. [19]

    Notions of computation and monads

    Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1): 55--92, 1991. doi:10.1016/0890-5401(91)90052-4

  20. [20]

    JFlow: Practical mostly-static information flow control

    Andrew C. Myers. JFlow: Practical mostly-static information flow control. In ACM Symposium on Principles of Programming Languages (POPL), pages 228--241, 1999. doi:10.1145/292540.292561

  21. [21]

    A decentralized model for information flow control

    Andrew C. Myers and Barbara Liskov. A decentralized model for information flow control. In ACM Symposium on Operating Systems Principles (SOSP), pages 129--142, 1997. doi:10.1145/268998.266669

  22. [22]

    OPA: Open Policy Agent

    Open Policy Agent. OPA: Open Policy Agent. https://www.openpolicyagent.org/, 2024

  23. [23]

    Training language models to follow instructions with human feedback

    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35: 27730--27744, 2022

  24. [24]

    Tackling the awkward squad: Monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell

    Simon Peyton Jones. Tackling the awkward squad: Monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell. In Engineering Theories of Software Construction, pages 47--96. IOS Press, 2001

  25. [25]

    Handlers of algebraic effects

    Gordon Plotkin and Matija Pretnar. Handlers of algebraic effects. In European Symposium on Programming (ESOP), pages 80--94, 2009. doi:10.1007/978-3-642-00590-9_7

  26. [26]

    NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails

    Traian Rebedea, Razvan Dinu, Makesh Narsimhan Sreedhar, Christopher Parisien, and Jonathan Cohen. NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails. In Conference on Empirical Methods in Natural Language Processing (EMNLP), System Demonstrations, 2023

  27. [27]

    Classes of recursively enumerable sets and their decision problems

    Henry Gordon Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2): 358--366, 1953

  28. [28]

    Toward verified artificial intelligence

    Sanjit A. Seshia, Dorsa Sadigh, and S. Shankar Sastry. Toward verified artificial intelligence. Communications of the ACM, 65(7): 46--55, 2022. doi:10.1145/3503914

  29. [29]

    Flexible dynamic information flow control in Haskell

    Deian Stefan, Alejandro Russo, John C. Mitchell, and David Mazières. Flexible dynamic information flow control in Haskell. In ACM SIGPLAN Haskell Symposium, pages 95--106, 2011. doi:10.1145/2034675.2034688

  30. [30]

    Monads for functional programming

    Philip Wadler. Monads for functional programming. In Advanced Functional Programming, volume 925 of LNCS, pages 24--52. Springer, 1995. doi:10.1007/3-540-59451-5_2

  31. [31]

    Efficient software-based fault isolation

    Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Efficient software-based fault isolation. In ACM Symposium on Operating Systems Principles (SOSP), pages 203--216, 1993. doi:10.1145/168619.168635

  32. [32]

    Jailbroken: How does LLM safety training fail?

    Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does LLM safety training fail? In Advances in Neural Information Processing Systems, volume 36, 2023

  33. [33]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023