The Two Boundaries: Why Behavioral AI Governance Fails Structurally

Alan L. McCann

arxiv: 2604.27292 · v3 · pith:SSWLMT43new · submitted 2026-04-30 · 💻 cs.AI

Pith reviewed 2026-05-07 09:36 UTC · model grok-4.3

keywords AI governance · behavioral governance · Rice's theorem · coterminous governance · effects governance · Turing completeness · structural failure · undecidability

The pith

AI systems governing effects must make their capability boundary identical to the governance boundary or else risk and theater are inevitable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that every effects-governing AI has two independent boundaries: what the system can express and what its policies can cover. When these boundaries are set separately, three regions appear: the useful overlap of governed capabilities, the risky region of ungoverned capabilities, and the empty region of policies that address nothing real. Rice's theorem demonstrates that no algorithm can decide, for arbitrary programs, whether effects will comply with policy. The only escape is to force the boundaries to coincide through an upfront architectural choice that separates computation from effect, turning governance into a structural property of the execution pipeline rather than a later check. If this reasoning holds, then post-hoc behavioral governance layers on Turing-complete systems cannot succeed.
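
The three regions can be pictured as plain set operations on the two boundaries. The following is a minimal sketch, not taken from the paper: the effect names and the helpers `regions` and `is_coterminous` are hypothetical, chosen only to make the partition concrete.

```python
# Hypothetical effect names, for illustration only; none come from the paper.
EXPRESSIBLE = frozenset({"api_call", "db_write", "tool_invoke", "file_delete"})
COVERED = frozenset({"api_call", "db_write", "tool_invoke", "send_fax"})


def regions(expressible, covered):
    """Split the two boundaries into the three regions described above."""
    return {
        "governed": expressible & covered,  # useful overlap
        "risk": expressible - covered,      # the system can do it, no policy covers it
        "theater": covered - expressible,   # a policy exists, the capability does not
    }


def is_coterminous(expressible, covered):
    """The two boundaries coincide exactly when both failure regions are empty."""
    return expressible == covered


if __name__ == "__main__":
    for name, members in regions(EXPRESSIBLE, COVERED).items():
        print(name, sorted(members))
    print("coterminous:", is_coterminous(EXPRESSIBLE, COVERED))
```

The sketch assumes both boundaries are finite sets that can be written down. The paper's argument is that for unrestricted programs the expressiveness side cannot be computed this way, which is why the comparison has to be secured by architecture rather than by inspection.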

Core claim

The central claim is that behavioral governance of effects in Turing-complete AI systems is undecidable in general by Rice's theorem, because no algorithm can determine whether an arbitrary program satisfies a non-trivial semantic property such as policy compliance. Coterminous governance is therefore required: the expressiveness boundary must equal the governance boundary. This equality is achieved only by an architectural separation of computation from effect, after which governance checks become part of the execution pipeline and subsume any separate governance infrastructure. The testable criterion follows directly: if the two boundaries are not provably identical, then ungoverned risk and governance theater are structurally inevitable.

What carries the argument

Coterminous governance, the requirement that an AI system's expressiveness boundary (what effects it can produce) exactly equals its governance boundary, enforced by separating computation from effects so that policy checks are structural rather than behavioral.
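
One way to picture the separation of computation from effect is a pipeline in which computation only returns descriptions of effects, and a single interpreter is the only place effects run. This is a minimal sketch under stated assumptions, not the paper's architecture: `Effect`, `EffectPipeline`, `plan`, and the `db_write` policy are all hypothetical names, and Python cannot enforce that `plan` is free of side effects, so purity is assumed by convention here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Effect:
    """A description of an action; constructing one performs nothing."""
    kind: str
    payload: dict


Policy = Callable[[Effect], bool]
Handler = Callable[[Effect], str]


class EffectPipeline:
    """The only place effects execute. An effect kind cannot be registered
    without a policy, so every executable kind is a governed kind."""

    def __init__(self) -> None:
        self._registry: Dict[str, Tuple[Policy, Handler]] = {}

    def register(self, kind: str, policy: Policy, handler: Handler) -> None:
        self._registry[kind] = (policy, handler)

    def run(self, effects: List[Effect]) -> List[str]:
        results = []
        for effect in effects:
            if effect.kind not in self._registry:
                raise PermissionError(f"inexpressible effect: {effect.kind}")
            policy, handler = self._registry[effect.kind]
            if not policy(effect):  # the check is a pipeline step, not a side system
                raise PermissionError(f"policy rejected: {effect.kind}")
            results.append(handler(effect))
        return results


def plan(amount: int) -> List[Effect]:
    """Computation: decides what should happen but performs no effect itself."""
    return [Effect("db_write", {"table": "orders", "amount": amount})]


pipeline = EffectPipeline()
pipeline.register(
    "db_write",
    policy=lambda e: e.payload["amount"] <= 100,  # hypothetical policy
    handler=lambda e: f"wrote {e.payload['amount']} to {e.payload['table']}",
)

if __name__ == "__main__":
    print(pipeline.run(plan(50)))
```

In this sketch the set of effects that can execute is exactly the set of registry keys, and each key carries a policy, so the two boundaries coincide by construction. Whether that matches the paper's formal notion of coterminous governance is an assumption of the example.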

If this is right

Any behavioral governance layer added after the fact on unrestricted programs will leave either ungoverned capabilities or policies that cover nothing.
Governance checks must be moved inside the execution pipeline rather than run as a parallel system.
Structural governance under separated computation and effect renders separate governance infrastructure redundant.
The undecidability result applies to any attempt to decide non-trivial properties of effects in Turing-complete systems.
Coterminous boundaries become the single measurable test for whether a governance approach avoids structural failure.
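
The undecidability consequence rests on the standard reduction from the halting problem. The sketch below illustrates the shape of that reduction and is not the paper's Coq development: programs are modeled as Python generators that yield once per step, and `wrap`, `observe`, `halts_after`, and `loops_forever` are hypothetical helpers. A wrapped program emits a forbidden effect exactly when the inner program halts, so an exact compliance decider for wrapped programs would also decide halting.

```python
from itertools import islice
from typing import Callable, Iterator, Optional

Step = Optional[str]  # None = internal computation step, str = an emitted effect


def wrap(program: Callable[[], Iterator[None]]) -> Callable[[], Iterator[Step]]:
    """Build a program that emits a forbidden effect iff `program` halts."""
    def wrapped() -> Iterator[Step]:
        for _ in program():
            yield None               # inner computation, no effect yet
        yield "forbidden_effect"     # reached only after the inner program halts
    return wrapped


def observe(run: Callable[[], Iterator[Step]], fuel: int) -> str:
    """A behavioral monitor with a step budget: it can report a violation it
    has seen, but an empty observation window proves nothing."""
    for step in islice(run(), fuel):
        if step == "forbidden_effect":
            return "violation"
    return "unknown"


def halts_after(n: int) -> Callable[[], Iterator[None]]:
    def program() -> Iterator[None]:
        for _ in range(n):
            yield None
    return program


def loops_forever() -> Iterator[None]:
    while True:
        yield None


if __name__ == "__main__":
    print(observe(wrap(halts_after(3)), fuel=10))
    print(observe(wrap(halts_after(3)), fuel=3))
    print(observe(wrap(loops_forever), fuel=1000))
```

The monitor returns "unknown" both for the non-halting program and for the halting one observed with too small a budget. No finite budget separates the two cases in general, which is the gap a post-hoc behavioral check cannot close.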

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Restricting the effect-generating component to a non-Turing-complete language would take it outside the scope of Rice's theorem and could make policy compliance decidable, which would allow effective behavioral governance.
System designers could verify coterminous boundaries by enumerating every possible effect and confirming that each is explicitly covered and that no policy addresses an impossible action.
The same boundary-coincidence requirement may apply to other domains where programs produce external effects, such as operating-system access control or robotic action planning.
In practice this would favor agent architectures whose action sets are declared and finite rather than generated on the fly by general computation.
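
As a small illustration of why a restricted, declared action language changes the picture, consider a loop-free effect language where the possible effects of a program can be computed by structural recursion. This is a sketch under assumptions: the constructors `Do`, `Seq`, and `Branch` and the checker `complies` are hypothetical and do not appear in the paper, and the check is conservative in that it treats both arms of every branch as reachable.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Do:
    effect: str              # perform one declared effect


@dataclass(frozen=True)
class Seq:
    first: "Program"         # run one program, then another
    second: "Program"


@dataclass(frozen=True)
class Branch:
    then: "Program"          # condition left opaque: either arm may run
    otherwise: "Program"


Program = Union[Do, Seq, Branch]


def possible_effects(p: Program) -> FrozenSet[str]:
    """Terminates on every input because the recursion only visits
    strictly smaller subterms and the language has no loops."""
    if isinstance(p, Do):
        return frozenset({p.effect})
    if isinstance(p, Seq):
        return possible_effects(p.first) | possible_effects(p.second)
    if isinstance(p, Branch):
        return possible_effects(p.then) | possible_effects(p.otherwise)
    raise TypeError(f"not a program: {p!r}")


def complies(p: Program, allowed: FrozenSet[str]) -> bool:
    """Decidable compliance check: every possible effect must be allowed."""
    return possible_effects(p) <= allowed


if __name__ == "__main__":
    allowed = frozenset({"api_call", "db_write"})
    prog = Seq(Do("api_call"), Branch(Do("db_write"), Do("file_delete")))
    print(sorted(possible_effects(prog)), complies(prog, allowed))
```

The price of decidability here is expressiveness: the language cannot loop or generate new action names at run time, which is the trade the editorial extension above points at.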

Load-bearing premise

The claim depends on modeling deployed AI effect-governance systems as arbitrary Turing-complete programs whose semantic compliance properties cannot be decided algorithmically after the fact.

What would settle it

A working deployed system that governs effects behaviorally on a Turing-complete architecture yet produces neither ungoverned risky effects nor policies that address impossible actions would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.27292 by Alan L. McCann.

**Figure 1.** Non-coterminous governance: expressiveness and governance boundaries are misaligned.

**Figure 2.** Coterminous governance: expressiveness and governance share the same boundary.

Original abstract

Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly all deployed AI systems, these boundaries are defined independently, creating three regions: governed capabilities (the only useful region), ungoverned capabilities (risk), and governance policies that address non-existent capabilities (theater). Two of the three regions are failure modes. We focus on the governance of effects: actions that AI systems perform in the world (API calls, database writes, tool invocations). This is distinct from the governance of model outputs (content quality, bias, fairness), which operates at a different level and requires different mechanisms. We present a formal framework for analyzing this structural gap. Rice's theorem (1953) proves the gap is undecidable in the general case for any Turing-complete architecture that attempts to govern effects behaviorally: no algorithm can decide non-trivial semantic properties of arbitrary programs, including the property "this program's effects comply with the governance policy." We define coterminous governance: a system property where the expressiveness boundary equals the governance boundary. We show that coterminous governance requires an architectural decision (separating computation from effect) rather than a governance layer added after the fact. We show that structural governance under this separation subsumes separate governance infrastructure: governance checks become part of the execution pipeline rather than a second system running alongside it. We propose coterminous governance as the testable criterion for any AI governance system: either the two boundaries are provably identical, or risk and theater are structurally inevitable. Proofs are mechanized in Coq (454 theorems, 36 modules, 0 admitted).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

Rice's theorem plus Coq proofs show why post-hoc effect governance in AI is structurally broken and architectural separation is required instead.

The main takeaway is that behavioral governance of AI effects cannot work in the general case. Rice's theorem establishes that no algorithm can decide non-trivial semantic properties like policy compliance for arbitrary programs, so any attempt at post-hoc checks produces either ungoverned risks or policies that address nothing real. The paper frames this as two independent boundaries creating three regions, with only one useful and the others as structural failure modes. It defines coterminous governance as the fix, where the boundaries must match through design rather than added layers. The Coq development of 454 theorems across 36 modules with zero admits gives the undecidability and subsumption claims real backing that is reproducible. That is the part that holds up cleanly. The application to AI is new in its specific framing and the testable criterion it proposes. The formal steps are direct and avoid circularity by relying on the 1953 result as an external anchor.

The soft spot is the modeling assumption. Real AI systems with fixed models, bounded tool APIs, and non-arbitrary execution paths are not equivalent to fully general Turing machines, so partial decidability may exist in practice that the general theorem does not rule out. The paper also stays abstract on how to implement the required separation of computation from effect in deployed systems, leaving the constructive alternative without concrete cases.

This is for AI safety and governance researchers who already suspect monitoring layers are insufficient and want a computability-based reason to focus on architecture instead. Readers who work with formal methods will find the mechanization useful. It deserves a serious referee because the core argument is grounded and the questions it raises about redesign are worth external scrutiny. I would send it to peer review, with the expectation that revisions address the gap between the general theorem and current AI constraints.

Referee Report

2 major / 2 minor

Summary. The paper claims that every AI system performing effects has two independently defined boundaries (expressiveness and governance), creating three regions of which two are structural failure modes (ungoverned risk and governance theater). It invokes Rice's theorem to prove that deciding non-trivial semantic properties such as policy-compliant effects is undecidable for any Turing-complete architecture attempting behavioral governance, defines coterminous governance as the property that the two boundaries coincide, shows this requires an architectural separation of computation from effect rather than a post-hoc layer, and mechanizes the framework in Coq (454 theorems, 36 modules, 0 admits).

Significance. If the reduction from deployed AI effect mechanisms to arbitrary Turing-complete programs holds, the result supplies a formal, testable criterion that subsumes many existing post-hoc governance proposals and explains why behavioral approaches are prone to either residual risk or ineffective theater. The Coq mechanization of 454 theorems with zero admits is a clear strength, providing machine-checked support for the undecidability argument and the derived architectural requirements.

major comments (2)

§3.2 (Mapping to AI architectures): the claim that current tool-calling and API-effect mechanisms in deployed systems are sufficiently expressive to inherit the full undecidability of Rice's theorem is asserted via informal reduction; a concrete lemma or example showing how an arbitrary program is simulated by an LLM-plus-tool loop would make the application load-bearing rather than illustrative.
Definition 4.1 (coterminous governance): the requirement that governance checks become part of the execution pipeline is derived from the undecidability result, yet the paper does not exhibit a formal statement showing that any post-hoc governance layer is necessarily non-coterminous; adding such a lemma would tighten the subsumption claim.

minor comments (2)

Abstract: the three-region diagram is described in text but not referenced by figure number; adding '(see Figure 1)' would improve readability.
§5.3: the statement that 'structural governance subsumes separate infrastructure' uses the term 'subsumes' without a precise set-theoretic or simulation relation; a short clarifying sentence would remove ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments and the recommendation for minor revision. The suggestions to formalize the reduction and the non-coterminous property of post-hoc layers will improve the clarity and rigor of the paper. We outline our responses below and confirm that revisions will be made accordingly.

Referee: §3.2 (Mapping to AI architectures): the claim that current tool-calling and API-effect mechanisms in deployed systems are sufficiently expressive to inherit the full undecidability of Rice's theorem is asserted via informal reduction; a concrete lemma or example showing how an arbitrary program is simulated by an LLM-plus-tool loop would make the application load-bearing rather than illustrative.

Authors: We concur that the mapping in §3.2 relies on an informal argument. In the revised version, we will provide a concrete example illustrating the simulation of an arbitrary Turing-complete program using an LLM with tool-calling capabilities, assuming tools that support persistent state and control flow. Furthermore, we will add a lemma in the Coq formalization that captures this simulation, building on the existing 454 theorems to make the inheritance of undecidability explicit and machine-checked.
Revision: yes

Referee: Definition 4.1 (coterminous governance): the requirement that governance checks become part of the execution pipeline is derived from the undecidability result, yet the paper does not exhibit a formal statement showing that any post-hoc governance layer is necessarily non-coterminous; adding such a lemma would tighten the subsumption claim.

Authors: The referee correctly identifies that the derivation of coterminous governance from undecidability would benefit from an explicit lemma. We will add a new lemma stating that for any Turing-complete system, a post-hoc governance layer (operating externally on effects) cannot be coterminous with the expressiveness boundary, because it would necessitate an algorithm to decide non-trivial semantic properties of programs, contradicting Rice's theorem. This lemma will be mechanized in Coq and integrated into the definition of coterminous governance in the revised manuscript.
Revision: yes
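
The kind of simulation the first response refers to can be sketched with a two-counter (Minsky) machine, a model known to be Turing-complete, driven entirely through tool calls. This is an illustration only and not the authors' promised construction: `CounterTools`, `run_controller`, and the instruction encoding are hypothetical, and a deterministic table lookup stands in for the LLM that would choose the next tool call.

```python
from typing import Dict, List, Tuple

Instruction = Tuple  # ("inc", reg, next) | ("jzdec", reg, if_zero, otherwise) | ("halt",)


class CounterTools:
    """Tools with persistent state: the counters survive between calls."""

    def __init__(self, a: int = 0, b: int = 0) -> None:
        self.counters: Dict[str, int] = {"a": a, "b": b}

    def inc(self, reg: str) -> None:
        self.counters[reg] += 1

    def dec(self, reg: str) -> None:
        self.counters[reg] -= 1

    def is_zero(self, reg: str) -> bool:
        return self.counters[reg] == 0


def run_controller(program: List[Instruction], tools: CounterTools, max_turns: int) -> bool:
    """Stand-in for the model loop: pick an instruction, call tools, repeat.
    Returns True if the program halted within `max_turns` turns."""
    pc = 0
    for _ in range(max_turns):
        instr = program[pc]
        if instr[0] == "halt":
            return True
        if instr[0] == "inc":
            tools.inc(instr[1])
            pc = instr[2]
        elif instr[0] == "jzdec":
            if tools.is_zero(instr[1]):
                pc = instr[2]            # control flow depends on tool state
            else:
                tools.dec(instr[1])
                pc = instr[3]
    return False                          # budget exhausted, halting status unknown


# Adds counter a into counter b, then halts.
ADD_A_INTO_B: List[Instruction] = [
    ("jzdec", "a", 2, 1),
    ("inc", "b", 0),
    ("halt",),
]

if __name__ == "__main__":
    tools = CounterTools(a=3, b=2)
    print(run_controller(ADD_A_INTO_B, tools, max_turns=100), tools.counters)
```

The `max_turns` cap is what makes any real deployment bounded, which is the modeling gap the desk editor's note raises: the simulation is fully general only if the turn budget and counter sizes are unbounded.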

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external Rice's theorem and Coq mechanization

The paper grounds its core claim in Rice's theorem (1953), an independent external result on undecidability of non-trivial semantic properties for arbitrary programs, and mechanizes the mapping to behavioral AI governance effects in Coq (454 theorems, 36 modules, 0 admitted). Coterminous governance is defined directly from the two-boundary distinction and shown to require separation of computation from effect as a logical consequence of the undecidability result rather than by redefinition or fitting. No load-bearing step reduces to self-citation, ansatz smuggling, renaming of known results, or any input-output equivalence by construction within the paper itself. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The paper relies on standard computability theory with no free parameters fitted to data. New concepts are introduced definitionally to organize the argument.

axioms (1)

[standard math] Rice's theorem: non-trivial semantic properties of programs are undecidable for Turing-complete systems
Directly invoked to establish that behavioral governance of effects is undecidable in the general case.

invented entities (2)

coterminous governance [no independent evidence]
purpose: System property in which expressiveness boundary equals governance boundary
Newly defined as the testable criterion that avoids risk and theater regions.
three regions (governed capabilities, ungoverned capabilities, theater) [no independent evidence]
purpose: Categorization of outcomes when boundaries are independent
Conceptual partition introduced to identify failure modes.

pith-pipeline@v0.9.0 · 5596 in / 1509 out tokens · 72813 ms · 2026-05-07T09:36:34.786156+00:00 · methodology
