Active Inference: A Method for Phenotyping Agency in AI Systems?
Pith reviewed 2026-05-08 08:01 UTC · model grok-4.3
The pith
Empowerment, measured as the channel capacity between actions and anticipated observations, distinguishes zero-, intermediate-, and high-agency AI phenotypes when the structure of their generative models is altered.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agency in AI systems can be phenotyped by modeling them as POMDPs under active inference, where the three criteria of intentionality, rationality, and explainability are realized through beliefs, preferences, and free energy minimization, and where empowerment, as the channel capacity between actions and anticipated observations, distinguishes zero-agency, intermediate-agency, and high-agency phenotypes via structural manipulations of the generative model.
What carries the argument
Empowerment as the channel capacity between actions and anticipated observations, which functions as the operational metric that separates agency phenotypes when the generative model is structurally altered in a T-maze paradigm.
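Concretely, the metric is standard information theory: empowerment is the capacity E = max over p(a) of I(A;O) of the channel from actions to anticipated observations. A minimal sketch of how one might compute it (hypothetical code, not the paper's implementation) uses the classic Blahut-Arimoto iteration:

```python
import numpy as np

def empowerment(p_o_given_a, n_iters=500, tol=1e-10):
    """Channel capacity max_{p(a)} I(A;O) in bits, via Blahut-Arimoto.

    p_o_given_a[a, o] is the agent's predicted distribution over
    observations o given action a (rows sum to 1).
    """
    def kl_rows(P, q):
        # per-row KL( P[a,:] || q ) in bits, with 0 log 0 := 0
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log2(P / q), 0.0)
        return (P * log_ratio).sum(axis=1)

    n_a = p_o_given_a.shape[0]
    p_a = np.full(n_a, 1.0 / n_a)      # start from a uniform action prior
    for _ in range(n_iters):
        d = kl_rows(p_o_given_a, p_a @ p_o_given_a)
        new_p_a = p_a * np.exp2(d)     # Blahut-Arimoto reweighting
        new_p_a /= new_p_a.sum()
        if np.abs(new_p_a - p_a).max() < tol:
            p_a = new_p_a
            break
        p_a = new_p_a
    return float(p_a @ kl_rows(p_o_given_a, p_a @ p_o_given_a))

# A deterministic action-to-observation channel carries 1 bit; a channel
# whose output ignores the action carries none.
ident = np.eye(2)                      # each action yields a unique observation
const = np.full((2, 2), 0.5)           # actions make no observable difference
```

Under this reading, the zero-, intermediate-, and high-agency phenotypes would correspond to generative models whose action channels sit closer to `const` or to `ident`.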
If this is right
- Structural manipulations of the generative model can produce agents with zero, intermediate, or high agency as quantified by empowerment.
- As agents engage in epistemic foraging, effective governance must shift from external constraints to internal modulation of prior preferences.
- The variational active inference setup supplies a direct bridge from computational phenotyping of agency to concrete AI governance strategies.
Where Pith is reading between the lines
- The same empowerment metric could be applied to compare agency profiles across different AI architectures beyond active inference.
- This phenotyping method suggests testable experiments in more complex environments to see whether agency levels remain distinguishable.
- Regulators might use internal preference modulation as a targeted control mechanism once agents reach higher agency phenotypes.
Load-bearing premise
The three criteria of intentionality, rationality, and explainability can be fully and minimally realized as a POMDP under a variational framework without further unstated assumptions about the generative model or the definition of empowerment.
What would settle it
A replication of the T-maze experiments in which structural changes to the generative model produce no distinguishable differences in empowerment values across the three claimed agency phenotypes, or in which the modeled action chain fails to satisfy the three criteria.
read the original abstract
The proliferation of agentic artificial intelligence has outpaced the conceptual tools needed to characterize agency in computational systems. Prevailing definitions mainly rely on autonomy and goal-directedness. Here, we argue for a minimal notion open to principled inspection given three criteria: intentionality as action grounded in beliefs and desires, rationality as normatively coherent action entailed by a world model, and explainability as action causally traceable to internal states; we subsequently instantiate these as a partially observable Markov decision process under a variational framework wherein posterior beliefs, prior preferences, and the minimization of expected free energy jointly constitute an agentic action chain. Using a canonical T-maze paradigm, we evidence how empowerment, formulated as the channel capacity between actions and anticipated observations, serves as an operational metric that distinguishes zero-, intermediate-, and high-agency phenotypes through structural manipulations of the generative model. We conclude by arguing that as agents engage in epistemic foraging to resolve ambiguity, the governance controls that remain effective must shift systematically from external constraints to the internal modulation of prior preferences, offering a principled, variational bridge from computational phenotyping to AI governance strategy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a minimal definition of agency in AI systems based on three criteria—intentionality (actions grounded in beliefs and desires), rationality (normatively coherent actions from a world model), and explainability (actions causally traceable to internal states). These are instantiated as a POMDP under the active inference variational framework, where posterior beliefs, prior preferences, and expected free energy minimization form an 'agentic action chain.' Using structural manipulations of the generative model in a canonical T-maze paradigm, the authors claim that empowerment—defined as the channel capacity (mutual information) between actions and anticipated observations—serves as an operational metric to distinguish zero-, intermediate-, and high-agency phenotypes. The work concludes with implications for AI governance, suggesting a shift from external constraints to internal modulation of prior preferences as agents engage in epistemic foraging.
Significance. If the central mapping holds, this provides a computationally grounded, variational approach to phenotyping agency that could bridge theoretical definitions with practical metrics and governance strategies in AI. Strengths include the use of an established framework (active inference) with potential for reproducible simulations and falsifiable distinctions in controlled environments like the T-maze; it offers a principled way to operationalize abstract criteria without relying solely on autonomy or goal-directedness.
major comments (2)
- [Abstract and §3] Abstract and §3 (T-maze paradigm): The central claim that structural manipulations of the generative model (e.g., to transition or observation matrices) instantiate the three criteria and are then distinguished by empowerment I(A;O) is asserted without explicit construction. No equations are provided showing how, for instance, changes to prior preferences directly encode intentionality or how the variational posterior ensures explainability, making it unclear whether the manipulations are derived from the criteria or chosen post-hoc to yield different channel capacities. This is load-bearing for the operational metric claim.
- [§2] §2 (agentic action chain definition): The instantiation of the three criteria as a POMDP under expected free energy minimization appears to inherit the circularity of the active inference framework itself, as posterior beliefs and prior preferences are defined in terms of the same variational quantities used to compute empowerment. Without an independent derivation or ablation showing that the channel capacity metric minimally captures intentionality/rationality/explainability (rather than being entailed by construction), the distinction between phenotypes risks being non-falsifiable.
minor comments (2)
- [Abstract] Abstract: The phrase 'we evidence how' is imprecise for a conceptual/simulation-based claim; replace with 'we demonstrate' or 'we illustrate' if no statistical tests or error bars are reported.
- [§2] Notation: Empowerment is described as 'channel capacity between actions and anticipated observations' but the precise formulation (e.g., whether it uses the expected free energy or a separate mutual information calculation) is not clarified in the provided abstract; add an explicit equation in §2 or §3.
Simulated Author's Rebuttal
We are grateful to the referee for their constructive feedback, which highlights important areas for strengthening the formal links between our theoretical criteria and the computational model. We respond to each major comment in turn, indicating the revisions we will undertake.
read point-by-point responses
- Referee: [Abstract and §3] Abstract and §3 (T-maze paradigm): The central claim that structural manipulations of the generative model (e.g., to transition or observation matrices) instantiate the three criteria and are then distinguished by empowerment I(A;O) is asserted without explicit construction. No equations are provided showing how, for instance, changes to prior preferences directly encode intentionality or how the variational posterior ensures explainability, making it unclear whether the manipulations are derived from the criteria or chosen post-hoc to yield different channel capacities. This is load-bearing for the operational metric claim.
Authors: We accept that the manuscript does not currently provide explicit equations demonstrating how the structural manipulations derive from the three criteria. This is a valid point, and we will revise §3 to include a detailed mapping. Specifically, we will add equations showing that intentionality is operationalized by setting the prior preferences (C matrix) to encode desired states based on the agent's beliefs, rationality by the use of the transition model (B matrix) in expected free energy calculation, and explainability by the traceability through the variational posterior (Q(s)) to the action selection. We will also include a table or diagram clarifying that the manipulations (e.g., altering the observation matrix A to simulate different levels of perceptual accuracy) are chosen to instantiate varying degrees of these criteria, rather than arbitrarily to produce different empowerment values. This will make the operational metric claim more transparent and falsifiable. revision: yes
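The mapping promised in this response can be made concrete in code. The following is a hypothetical one-step illustration (not the authors' implementation; the matrix names follow the standard discrete active inference convention, using the risk-plus-ambiguity decomposition of expected free energy):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expected_free_energy(A, B, C, q_s, eps=1e-16):
    """One-step expected free energy per action, risk + ambiguity form.

    A:   (n_obs, n_states) likelihood P(o|s)              -- 'A matrix'
    B:   (n_actions, n_states, n_states) transitions P(s'|s,a)
    C:   (n_obs,) log prior preferences over observations -- 'C vector'
    q_s: (n_states,) current posterior beliefs Q(s)
    """
    G = np.zeros(B.shape[0])
    H = -np.sum(A * np.log(A + eps), axis=0)   # entropy of P(o|s) per state
    for a in range(B.shape[0]):
        q_next = B[a] @ q_s                    # predicted next state Q(s'|a)
        q_o = A @ q_next                       # predicted observation Q(o|a)
        risk = q_o @ (np.log(q_o + eps) - C)   # divergence from preferences
        G[a] = risk + H @ q_next               # plus expected ambiguity
    return G

# Toy two-state model: action 0 reliably leads to the preferred outcome.
A = np.eye(2)                                  # unambiguous observations
B = np.stack([[[1., 1.], [0., 0.]],            # action 0 -> state 0
              [[0., 0.], [1., 1.]]])           # action 1 -> state 1
C = np.log([0.9, 0.1])                         # intentionality: prefer obs 0
G = expected_free_energy(A, B, C, np.array([0.5, 0.5]))
p_action = softmax(-G)                         # rationality: softmax of -EFE
```

In this sketch, altering C modulates preferences (intentionality), B carries the world model into the EFE calculation (rationality), and every action probability is traceable through q_next and q_o to the model's internal states (explainability).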
- Referee: [§2] §2 (agentic action chain definition): The instantiation of the three criteria as a POMDP under expected free energy minimization appears to inherit the circularity of the active inference framework itself, as posterior beliefs and prior preferences are defined in terms of the same variational quantities used to compute empowerment. Without an independent derivation or ablation showing that the channel capacity metric minimally captures intentionality/rationality/explainability (rather than being entailed by construction), the distinction between phenotypes risks being non-falsifiable.
Authors: Regarding the potential circularity, we note that while the agentic action chain is defined within the active inference framework, the empowerment metric itself—I(A;O), the mutual information between actions and observations—is a general information-theoretic measure that does not depend on the variational inference procedure per se. It can be computed from the generative model and policy. To strengthen this, we will add in the revision an analysis or simulation showing the metric's behavior under different inference schemes or by comparing to a non-variational baseline. This addresses the concern about non-falsifiability by providing an independent check on whether the phenotype distinctions hold beyond the specific variational quantities. We disagree that it is necessarily circular, as the criteria are conceptually prior and the framework is used as a tool to instantiate them. revision: partial
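The rebuttal's central point, that I(A;O) is computable without the variational inference machinery, is easy to verify in a sketch (hypothetical code, not from the paper): given any fixed action distribution and the generative model's predictive channel, the mutual information follows from the standard definition alone.

```python
import numpy as np

def mutual_information(p_a, p_o_given_a):
    """I(A;O) in bits for a fixed action distribution p_a and a
    predictive channel p_o_given_a[a, o] = P(o | a)."""
    p_o = p_a @ p_o_given_a                 # marginal P(o) under the policy
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(p_o_given_a > 0,
                             np.log2(p_o_given_a / p_o), 0.0)
    return float((p_a[:, None] * p_o_given_a * log_ratio).sum())

# A deterministic action->observation channel under a biased policy:
# I(A;O) = H(A) ~ 0.469 bits, strictly below the 1-bit channel capacity.
# The value depends on the channel and the policy, not on how posterior
# beliefs were inferred, which is the independence the authors invoke.
channel = np.eye(2)
policy = np.array([0.9, 0.1])
mi = mutual_information(policy, channel)
```

A non-variational baseline, as the authors propose, would amount to plugging a differently derived policy into the same function and checking whether the phenotype ordering survives.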
Circularity Check
No significant circularity in the paper's derivation chain
full rationale
The paper proposes instantiating the three agency criteria as a POMDP under the variational framework, with the agentic action chain constituted by posterior beliefs, prior preferences, and expected free energy minimization; this is a modeling choice for phenotyping rather than a derivation reducing results to inputs by construction. Empowerment is formulated using the standard information-theoretic definition of channel capacity between actions and anticipated observations, then applied to distinguish phenotypes via explicit structural manipulations of the generative model in T-maze simulations. No load-bearing self-citations, fitted inputs renamed as predictions, or self-definitional reductions appear in the abstract or described chain; the central claim rests on the proposed mapping and simulation evidence, which remains independent of the target metric.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Agency can be minimally defined by intentionality, rationality, and explainability, instantiated in a POMDP with expected free energy minimization.
Reference graph
Works this paper leans on
- [1] Parr, T., Pezzulo, G., Friston, K.J.: Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. The MIT Press, Cambridge, Massachusetts (2022)
- [2] Montague, P.R., Dolan, R.J., Friston, K.J., Dayan, P.: Computational psychiatry. Trends in Cognitive Sciences 16(1), 72–80 (2012). https://doi.org/10.1016/j.tics.2011.11.018
- [3] Schwartenbeck, P., Friston, K.: Computational Phenotyping in Psychiatry: A Worked Example. eNeuro 3(4), ENEURO.0049-16.2016 (2016). https://doi.org/10.1523/ENEURO.0049-16.2016
- [4] Wooldridge, M., Jennings, N.R.: Intelligent Agents: Theory and Practice. The Knowledge Engineering Review 10(2), 115–152 (1995). https://doi.org/10.1017/S0269888900008122
- [5] Jennings, N.R., Sycara, K., Wooldridge, M.: A Roadmap of Agent Research and Development. Autonomous Agents and Multi-Agent Systems 1(1), 7–38 (1998). https://doi.org/10.1023/A:1010090405266
- [6] Luck, M., d'Inverno, M.: A Conceptual Framework for Agent Definition and Development. The Computer Journal 44(1), 1–20 (2001). https://doi.org/10.1093/comjnl/44.1.1
- [7] Shavit, Y., Agarwal, S., Brundage, M., et al.: Practices for Governing Agentic AI Systems. In: Proc. Res. Paper, pp. 1–25. OpenAI (2023)
- [8] Vyas, V., Xu, Z.: Key Safety Design Overview in AI-driven Autonomous Vehicles. arXiv preprint arXiv:2412.08862 (2024). https://arxiv.org/abs/2412.08862
- [9] Huang, H.-M., Messina, E., Wade, R., English, R., Novak, B., Albus, J.: Autonomy Measures for Robots. In: ASME 2004 International Mechanical Engineering Congress and Exposition, pp. 1241–1247. American Society of Mechanical Engineers Digital Collection (2008). https://doi.org/10.1115/IMECE2004-61812
- [10] Stayton, E., Stilgoe, J.: It's Time to Rethink Levels of Automation for Self-Driving Vehicles [Opinion]. IEEE Technology and Society Magazine 39(3), 13–19 (2020). https://doi.org/10.1109/MTS.2020.3012315
- [11] Huang, H.-M., Pavek, K., Novak, B., Albus, J.S., Messina, E.R.: A Framework For Autonomy Levels For Unmanned Systems (ALFUS). NIST (2005)
- [12] Oshana, M.: Relational Autonomy. In: International Encyclopedia of Ethics, pp. 1–13. John Wiley & Sons, Ltd (2020). https://doi.org/10.1002/9781444367072.wbiee921
- [13] Mackenzie, C.: Feminist Conceptions of Autonomy. In: The Routledge Companion to Feminist Philosophy. Routledge (2017)
- [14] Ding, J., Zhang, Y., Shang, Y., et al.: Understanding World or Predicting Future? A Comprehensive Survey of World Models. arXiv preprint arXiv:2411.14499 (2024). https://doi.org/10.48550/ARXIV.2411.14499
- [15] Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., Friston, K.: Active Inference on Discrete State-Spaces: A Synthesis. Journal of Mathematical Psychology 99, 102447 (2020). https://doi.org/10.1016/j.jmp.2020.102447
- [16] Tschantz, A., Millidge, B., Seth, A.K., Buckley, C.L.: Reinforcement Learning through Active Inference. CoRR abs/2002.12636 (2020). https://arxiv.org/abs/2002.12636
- [17] Friston, K.J., Daunizeau, J., Kiebel, S.J.: Reinforcement Learning or Active Inference? PLOS ONE 4(7), e6421 (2009). https://doi.org/10.1371/journal.pone.0006421
- [18] Da Costa, L., Sajid, N., Parr, T., Friston, K., Smith, R.: Reward Maximization Through Discrete Active Inference. Neural Computation 35(5), 807–852 (2023). https://doi.org/10.1162/neco_a_01574
- [19] Du, Y., Yang, M., Dai, B., et al.: Learning Universal Policies via Text-Guided Video Generation. arXiv preprint arXiv:2302.00111 (2023). https://doi.org/10.48550/arXiv.2302.00111
- [20] Hansen, N., Su, H., Wang, X.: TD-MPC2: Scalable, Robust World Models for Continuous Control. arXiv preprint arXiv:2310.16828 (2024). https://doi.org/10.48550/arXiv.2310.16828
- [21] Junker, F.T., Bruineberg, J., Grünbaum, T.: Predictive Minds Can Be Humean Minds. The British Journal for the Philosophy of Science (2024). https://doi.org/10.1086/733413
- [22] Albarracin, M., Constant, A., Friston, K.J., Ramstead, M.J.D.: A Variational Approach to Scripts. Frontiers in Psychology 12, 585493 (2021). https://doi.org/10.3389/fpsyg.2021.585493
- [23] Klyubin, A.S., Polani, D., Nehaniv, C.L.: Empowerment: A Universal Agent-Centric Measure of Control. In: 2005 IEEE Congress on Evolutionary Computation, Edinburgh, UK, pp. 128–135, Vol. 1 (2005). https://doi.org/10.1109/CEC.2005.1554676
- [24] Schwartenbeck, P., Passecker, J., Hauser, T.U., FitzGerald, T.H.B., Kronbichler, M., Friston, K.J.: Computational Mechanisms of Curiosity and Goal-Directed Exploration. eLife 8, e41703 (2019). https://doi.org/10.7554/eLife.41703
- [25] Friston, K.J., Frith, C.D.: Active Inference, Communication and Hermeneutics. Cortex 68, 129–143 (2015). https://doi.org/10.1016/j.cortex.2015.03.025
- [26] Limanowski, J., Blankenburg, F.: Minimal Self-Models and the Free Energy Principle. Frontiers in Human Neuroscience 7, 547 (2013). https://doi.org/10.3389/fnhum.2013.00547
- [27] Vasil, J., Badcock, P.B., Constant, A., Friston, K., Ramstead, M.J.D.: A World Unto Itself: Human Communication as Active Inference. Frontiers in Psychology 11, 417 (2020). https://doi.org/10.3389/fpsyg.2020.00417
- [28] Veissière, S.P.L., Constant, A., Ramstead, M.J.D., Friston, K.J., Kirmayer, L.J.: Thinking through Other Minds: A Variational Approach to Cognition and Culture. The Behavioral and Brain Sciences 43, e90 (2019). https://doi.org/10.1017/S0140525X19001213
- [29] Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., Pezzulo, G.: Active Inference and Epistemic Value. Cognitive Neuroscience (2015)
- [30] Davidson, D.: Essays on Actions and Events. Clarendon Press, Oxford (1980)
- [31] Smith, M.: The Humean Theory of Motivation. Mind 96(381), 36–61 (1987). https://doi.org/10.1093/mind/XCVI.381.36
- [32] Williams, D.: Predictive Processing and the Representation Wars. Minds and Machines 28(1), 141–172 (2018). https://doi.org/10.1007/s11023-017-9441-6
- [33] Bricken, T., Templeton, A., Batson, J., et al.: Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Transformer Circuits Thread, Anthropic (2023)
- [34] Lipton, Z.C.: The Mythos of Model Interpretability. Queue 16(3), 31–57 (2018). https://doi.org/10.1145/3236386.3241340
- [35] LeCun, Y.: A Path Towards Autonomous Machine Intelligence. OpenReview preprint (2022). https://openreview.net/pdf?id=BZ5a1r-kVsf
- [36] Ng, A.: Agentic Design Patterns (Parts 1–5). The Batch, DeepLearning.AI (2024). https://www.deeplearning.ai/the-batch/
- [37] Simon, H.A.: Models of Man: Social and Rational. Wiley, New York (1957)
- [38] Ortega, P.A., Braun, D.A.: Thermodynamics as a Theory of Decision-Making with Information-Processing Costs. Proceedings of the Royal Society A 469(2153), 20120683 (2013). https://doi.org/10.1098/rspa.2012.0683
- [39] Mohamed, S., Rezende, D.J.: Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. In: Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015)
discussion (0)