pith. machine review for the scientific record.

arxiv: 2604.16338 · v1 · submitted 2026-03-13 · 💻 cs.AI · cs.MA

Recognition: 2 theorem links

· Lean Theorem

Governing the Agentic Enterprise: A Governance Maturity Model for Managing AI Agent Sprawl in Business Operations

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 12:05 UTC · model grok-4.3

classification 💻 cs.AI cs.MA
keywords AI governance · agentic AI · maturity model · AI agent sprawl · enterprise operations · risk management · simulation validation · autonomous agents

The pith

A five-level maturity model for AI agent governance produces 94.3% lower sprawl indices and 96.4% fewer risk incidents in enterprise simulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Agentic AI Governance Maturity Model as a structured way to control the spread of autonomous AI agents that plan and execute business workflows. It identifies specific sprawl patterns such as functional duplication and shadow agents, each tied to measurable costs, and grounds the model in existing standards like NIST AI RMF. Validation comes from 750 simulation runs across scenarios that compare five maturity levels on outcomes including cost, risk incidents, and task completion. The results indicate clear differences between levels, suggesting that progressing to higher maturity directly improves operational control and efficiency.

Core claim

The Agentic AI Governance Maturity Model is a five-level framework across 12 governance domains that connects governance capability to reduced agent sprawl, lower risk incidents, and higher task completion rates. Validation through 750 simulation runs shows statistically significant differences between levels, with organizations at Levels 4-5 achieving 94.3% lower sprawl indices, 96.4% fewer risk incidents, and 32.6% higher effective task completion rates than Level 1 organizations.

What carries the argument

The Agentic AI Governance Maturity Model (AAGMM), a five-level progression across 12 domains that measures governance capability and links it to quantified business outcomes through simulation.

If this is right

  • Enterprises reaching Levels 4-5 can expect substantially lower costs from redundant or conflicting agents.
  • The taxonomy of sprawl patterns supplies concrete metrics for tracking governance progress.
  • Adoption of the model offers a roadmap that aligns with established standards for AI risk management.
  • Simulation-based validation establishes measurable targets for reducing project failure rates projected at 40% by 2027.
  • Higher maturity directly improves decision quality and operational efficiency in multi-step agent workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The model could be tested in live deployments by tracking agent counts and incidents before and after staged governance improvements.
  • Similar maturity ladders might apply to other autonomous systems such as robotic process automation or multi-agent platforms.
  • Quantifying sprawl costs opens the possibility of insurance or audit frameworks that price governance maturity.
  • Integration with existing IT governance tools could accelerate rollout without requiring entirely new infrastructure.

Load-bearing premise

That the simulation model accurately reflects real enterprise dynamics, and that outcome differences arise from the governance maturity levels themselves rather than from how those levels were parameterized.

What would settle it

A field study tracking actual enterprises at different governance maturity levels that finds no significant differences in sprawl indices or risk incident rates would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.16338 by Vivek Acharya.

Figure 1
Figure 1. Visualizes the four key business outcome metrics across all governance maturity levels.
Figure 2
Figure 2. Net Business Value heatmap across five experimental scenarios and five governance maturity levels (n = 30 per cell). The color gradient from red (low NBV) to green (high NBV) shows consistent governance benefits across all scenarios. Notable: in the Adversarial scenario (S4), L2 provides virtually no improvement over L1 (0.666 vs. 0.664), confirming that reactive governance is inadequate for security-sens…
read the original abstract

The rapid adoption of agentic AI in enterprise business operations--autonomous systems capable of planning, reasoning, and executing multi-step workflows--has created an urgent governance crisis. Organizations face uncontrolled agent sprawl: the proliferation of redundant, ungoverned, and conflicting AI agents across business functions. Industry surveys report that only 21% of enterprises have mature governance models for autonomous agents, while 40% of agentic AI projects are projected to fail by 2027 due to inadequate governance and risk controls. Despite growing acknowledgment of this challenge, academic literature lacks a formal, empirically validated governance maturity model connecting governance capability to measurable business outcomes. This paper introduces the Agentic AI Governance Maturity Model (AAGMM), a five-level framework spanning 12 governance domains, grounded in NIST AI RMF and ISO/IEC 42001 standards. We additionally propose a novel taxonomy of agent sprawl patterns--functional duplication, shadow agents, orphaned agents, permission creep, and unmonitored delegation chains--each linked to quantifiable business cost models. The framework is validated through 750 simulation runs across five enterprise scenarios and five governance maturity levels, measuring business outcomes including cost containment, risk incident rates, operational efficiency, and decision quality. Results demonstrate statistically significant differences (p < 0.001, large effect sizes d > 2.0) between all governance maturity levels, with Level 4-5 organizations achieving 94.3% lower sprawl indices, 96.4% fewer risk incidents, and 32.6% higher effective task completion rates compared to Level 1. The AAGMM provides practitioners with an actionable roadmap for governing autonomous AI agents while maximizing business returns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Agentic AI Governance Maturity Model (AAGMM), a five-level framework spanning 12 governance domains grounded in NIST AI RMF and ISO/IEC 42001. It proposes a taxonomy of five agent sprawl patterns (functional duplication, shadow agents, orphaned agents, permission creep, unmonitored delegation chains) each linked to quantifiable business cost models. The framework is validated via 750 simulation runs across five enterprise scenarios and five maturity levels, with results claiming statistically significant differences (p < 0.001, d > 2.0) including 94.3% lower sprawl indices, 96.4% fewer risk incidents, and 32.6% higher task completion rates for Levels 4-5 versus Level 1.

Significance. If the simulation dynamics were shown to be independent of the maturity-level definitions, the AAGMM could provide a useful practitioner roadmap linking governance capabilities to measurable outcomes in cost, risk, and efficiency for agentic AI deployments. The explicit taxonomy of sprawl patterns and grounding in existing standards are constructive contributions to AI governance literature. However, the current validation approach limits the strength of these claims.

major comments (2)
  1. [Simulation methodology and results] The large reported effect sizes (d > 2.0) and p < 0.001 values for sprawl index, risk incidents, and task completion are presented without any equations, parameter tables, or agent behavior rules. This leaves open the possibility that outcome differences are directly encoded into the simulation parameters (e.g., risk probabilities, efficiency metrics, delegation limits) by the maturity-level definitions rather than emerging from independent enterprise dynamics.
  2. [Validation approach] The 750 runs compare outcomes across author-defined levels but provide no external calibration, real-world data benchmarks, or falsification tests. Without showing how governance interventions alter costs and risks independently of the level assignment, the statistical tests cannot distinguish framework efficacy from modeling assumptions.
minor comments (1)
  1. [Abstract] The five enterprise scenarios are referenced but not described, making it difficult to assess the scope and generalizability of the simulation results.
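The first major comment can be made concrete with a toy sketch. In the hypothetical model below (none of these parameters come from the paper; `INCIDENT_PROB` and the run sizes are invented for illustration), each maturity level directly sets a per-agent incident probability. A comparison of Level 1 against Level 5 then necessarily yields an enormous Cohen's d, because the outcome gap was typed into the parameter table rather than emerging from enterprise dynamics.

```python
import random
import statistics

# Hypothetical per-level incident probabilities, hand-chosen --
# exactly the kind of direct encoding the referee warns about.
INCIDENT_PROB = {1: 0.30, 5: 0.01}

def run_simulation(level, n_agents=100, rng=None):
    """One 'enterprise run': count risk incidents among n_agents."""
    rng = rng or random.Random()
    return sum(rng.random() < INCIDENT_PROB[level] for _ in range(n_agents))

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    sa, sb = statistics.stdev(a), statistics.stdev(b)
    pooled = ((len(a) - 1) * sa**2 + (len(b) - 1) * sb**2) / (len(a) + len(b) - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled**0.5

rng = random.Random(42)
level1 = [run_simulation(1, rng=rng) for _ in range(30)]
level5 = [run_simulation(5, rng=rng) for _ in range(30)]

d = cohens_d(level1, level5)
reduction = 1 - statistics.mean(level5) / statistics.mean(level1)
print(f"Cohen's d = {d:.1f}, incident reduction = {reduction:.1%}")
# Both numbers are large by construction: they restate INCIDENT_PROB,
# not an independent finding about governance.
```

Large d and p-values on such a model are not evidence for the framework; they are a restatement of its inputs, which is why the referee asks for the governing equations and parameter tables.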

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment below, indicating where revisions will be made to improve transparency and rigor while honestly noting limitations inherent to the simulation-based approach.

read point-by-point responses
  1. Referee: Simulation methodology and results sections: The large reported effect sizes (d > 2.0) and p < 0.001 values for sprawl index, risk incidents, and task completion are presented without any equations, parameter tables, or agent behavior rules. This leaves open the possibility that outcome differences are directly encoded into the simulation parameters (e.g., risk probabilities, efficiency metrics, delegation limits) by the maturity-level definitions rather than emerging from independent enterprise dynamics.

    Authors: We agree that the simulation methodology requires greater transparency to rule out the possibility of hardcoded outcomes. In the revised manuscript, we will add a new subsection titled 'Agent Behavior Rules and Mathematical Formulations' that includes all governing equations for the sprawl index, risk incident probabilities, task completion rates, and delegation chain dynamics. A full parameter table will be provided, listing base values and maturity-level modifiers for each variable (e.g., monitoring frequency, permission revocation thresholds, efficiency multipliers). Agent behavior rules will be described explicitly, demonstrating that differences emerge from the application of governance controls (such as automated auditing and delegation limits) rather than direct encoding of final outcomes. These additions will allow independent verification that the reported effect sizes arise from the modeled interactions. revision: yes

  2. Referee: Validation approach: The 750 runs compare outcomes across author-defined levels but provide no external calibration, real-world data benchmarks, or falsification tests. Without showing how governance interventions alter costs and risks independently of the level assignment, the statistical tests cannot distinguish framework efficacy from modeling assumptions.

    Authors: We acknowledge that the validation is limited by its reliance on internally defined levels without external calibration. In the revision, we will introduce a 'Robustness Checks' subsection that includes falsification tests: simulations with randomized parameter assignments and governance interventions decoupled from the five-level structure to confirm that outcome differences are driven by the specific interventions rather than level labels. We will also add an explicit limitations paragraph discussing the absence of real-world benchmarks. However, as this is a simulation study introducing a novel model, we cannot incorporate proprietary enterprise datasets for calibration at this stage. revision: partial
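The promised robustness checks could take the shape of a permutation test. This is a hypothetical sketch, not the authors' procedure; the outcome samples are invented. Shuffling the level labels and recomputing the group gap gives an empirical p-value for the observed difference, and the decoupling check the authors describe would go one step further: rerun the simulation with interventions randomized across level labels and confirm the gap tracks the interventions, not the labels.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_perm=2000, rng=None):
    """Two-sample permutation test on the difference of means.
    Returns the fraction of label shuffles whose absolute mean gap
    is at least as large as the observed one (an empirical p-value)."""
    rng = rng or random.Random(0)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(statistics.mean(a) - statistics.mean(b)) >= observed:
            hits += 1
    return hits / n_perm

# Invented sprawl-index samples for two maturity levels (30 runs each).
rng = random.Random(1)
level1_runs = [rng.gauss(100, 10) for _ in range(30)]
level5_runs = [rng.gauss(6, 2) for _ in range(30)]

p = permutation_test(level1_runs, level5_runs)
print(f"empirical p = {p:.4f}")
```

A near-zero p here only certifies that the gap is real within the model; it cannot, on its own, rule out the referee's objection that the gap was parameterized in, which is why the randomized-assignment runs matter.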

standing simulated objections not resolved
  • Absence of real-world empirical data or external benchmarks for calibration, as the current validation relies exclusively on controlled simulations.

Circularity Check

1 step flagged

Simulation validation encodes AAGMM level definitions directly into outcome parameters by construction

specific steps
  1. self-definitional [Abstract]
    "This paper introduces the Agentic AI Governance Maturity Model (AAGMM), a five-level framework spanning 12 governance domains... The framework is validated through 750 simulation runs across five enterprise scenarios and five governance maturity levels, measuring business outcomes including cost containment, risk incident rates, operational efficiency, and decision quality. Results demonstrate statistically significant differences (p < 0.001, large effect sizes d > 2.0) between all governance maturity levels, with Level 4-5 organizations achieving 94.3% lower sprawl indices, 96.4% fewer risk incidents..."

    The five levels are author-defined constructs within the AAGMM. The simulation then uses those levels as direct inputs to generate the measured outcome differences. Because the model operationalizes higher levels as lower error rates, fewer delegations, and stricter monitoring by definition, the p-values and effect sizes reduce to a restatement of the framework's own parameterization rather than an independent test of governance dynamics.

full rationale

The paper defines the five AAGMM maturity levels as part of its framework and then validates them via 750 simulation runs that instantiate those exact levels across scenarios. The reported large effect sizes (d>2.0) and percentage improvements in sprawl, risk, and task completion are produced by setting simulation parameters to match the level definitions (e.g., stricter controls at higher levels), rendering the statistical tests non-falsifiable and equivalent to the input assumptions rather than emergent from independent dynamics. No external calibration data, real enterprise benchmarks, or parameter tables are provided to break the loop.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claim depends on simulation parameters that represent business costs and risks, plus the assumption that the five maturity levels can be faithfully encoded in those simulations. The sprawl taxonomy is an invented classification without external falsifiable evidence.

free parameters (1)
  • simulation parameters for cost, risk probability, and efficiency metrics
    The 750 runs require numerical values for business outcomes; these are not stated as coming from external data and must be chosen or fitted to produce the reported effect sizes.
axioms (2)
  • domain assumption The five maturity levels and 12 domains can be directly mapped to measurable differences in agent behavior and business outcomes
    Invoked when the simulation is constructed to test the framework.
  • domain assumption Grounding in NIST AI RMF and ISO/IEC 42001 provides a valid foundation for the new model
    Stated in the abstract as the basis for the AAGMM.
invented entities (1)
  • Five agent sprawl patterns (functional duplication, shadow agents, orphaned agents, permission creep, unmonitored delegation chains) no independent evidence
    purpose: To classify and quantify uncontrolled AI agent proliferation
    New taxonomy introduced by the paper; no independent evidence outside the simulations is provided.

pith-pipeline@v0.9.0 · 5607 in / 1633 out tokens · 58127 ms · 2026-05-15T12:05:15.417621+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

  1. [1] Wang, L.; Ma, C.; Feng, X.; et al. A Survey on LLM-based Autonomous Agents. Front. Comput. Sci. 2024, 18, 186345.
  2. [2] Acharya, D.B.; Kuppan, K.; Divya, B. Agentic AI: Autonomous Intelligence for Complex Goals. IEEE Access 2025, 13, 1–25.
  3. [3] Deloitte. State of AI in the Enterprise 2026. Deloitte Insights, 2026.
  4. [4] World Economic Forum. How Agentic AI is Rewriting Enterprise Innovation. WEF, Jan. 2026.
  5. [5] Google Cloud. The ROI of AI: Agents Delivering for Business. Google Cloud Blog, Sep. 2025.
  6. [6] Deloitte. Agentic AI Strategy: Emerging Technology Trends. Deloitte Insights, 2025.
  7. [7] KPMG. AI at Scale: Agent-Driven Reinvention in 2026. KPMG Q4 Pulse, Jan. 2026.
  8. [8] Google Cloud. A Blueprint for Agentic AI Transformation. HBR (Sponsored), Feb. 2026.
  9. [9] McKinsey. Seizing the Agentic AI Advantage. McKinsey Digital, Jun. 2025.
  10. [10] Arcade.dev. Agentic AI Adoption Trends & ROI Statistics. Arcade Blog, Dec. 2025.
  11. [11] AIGL. The ROI of AI 2025 (Review). AIGL Blog, Sep. 2025.
  12. [12] Moveworks. Unlocking Agentic AI ROI. Moveworks Blog, Sep. 2025.
  13. [13] Gupta, S.; et al. The Enterprise Agentic Mesh. IJSRP 2025, 15, 1–18.
  14. [14] Fernandez, M.; et al. Governance-as-a-Service. arXiv 2025, 2508.18765.
  15. [15]
  16. [16] Crick, T.; et al. AI Agents: A Multi-Expert Analysis. J. Comput. Inf. Syst. 2025, 65.
  17. [17] NIST. AI Risk Management Framework 1.0. NIST AI 100-1, Jan. 2023.
  18. [18] ISO/IEC 42001:2023. AI Management System. ISO, 2023.
  19. [19] Ransbotham, S.; et al. The Emerging Agentic Enterprise. MIT Sloan Manag. Rev., Nov. 2025.
  20. [20] NIST. Generative AI Profile (AI 600-1). Jul. 2024.
  21. [21] European Parliament. Regulation (EU) 2024/1689 (AI Act). OJ EU, 2024.
  22. [22] CMMI Institute. CMMI V2.0. ISACA, 2018.
  23. [23] ISACA. COBIT 2019. ISACA, 2019.
  24. [24] Gartner. AI Maturity Model. Gartner Research, 2024.
  25. [25] Microsoft. AI Maturity Framework. MS AI Business School, 2024.
  26. [26] Chen, Y.; et al. The Manager Agent. In Proc. DAI 2025; ACM, 2025.
  27. [27] Ranjan, R.; et al. LOKA Protocol. arXiv 2025, 2504.10915.
  28. [28] OpenAI. Practices for Governing Agentic AI. OpenAI, 2025.