Extended predictive coding framework as variational free-energy minimisation under exponential-family assumption
Pith reviewed 2026-06-28 20:22 UTC · model grok-4.3
The pith
Assuming the exponential family for variational posteriors and priors extends predictive coding to exhibit nonlinearity, heterogeneity, and non-negative firing rates while preserving the free-energy principle correspondence up to the second cumulant of the posterior.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When a broader class of probability distributions, namely the exponential family of distributions, is assumed for the variational posterior and prior, the predictive coding network exhibits nonlinearity and heterogeneity of input-output properties, as well as non-negative firing rates, while maintaining the correspondence to free-energy minimization up to the second cumulant of the posterior. The model can be trained by biologically plausible local plasticity rules.
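For orientation, the standard exponential-family form and its first two cumulants can be written as below. The symbols $\eta$ (natural parameter), $T$ (sufficient statistic), $A$ (log-partition function), and $h$ (base measure) are generic textbook notation assumed here, not necessarily the paper's own.

```latex
\begin{align}
p(x \mid \eta) &= h(x)\,\exp\!\big(\eta^{\top} T(x) - A(\eta)\big), &
A(\eta) &= \ln \int h(x)\,\exp\!\big(\eta^{\top} T(x)\big)\,dx, \\
\nabla_{\eta} A(\eta) &= \mathbb{E}\big[T(x)\big], &
\nabla^{2}_{\eta} A(\eta) &= \operatorname{Cov}\big[T(x)\big].
\end{align}
```

Higher derivatives of $A$ give the higher cumulants of $T(x)$, so a correspondence that holds "up to the second cumulant" tracks the mean and covariance of the sufficient statistics and no further.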
What carries the argument
The exponential-family assumption on the variational posterior and prior in a recurrent network of neurons, which enforces the free-energy principle correspondence through local dynamics.
If this is right
- Predictive coding networks can now incorporate nonlinear and heterogeneous neuron behaviors without violating the free-energy principle.
- Training relies only on local plasticity rules, avoiding the need for global error signals.
- The correspondence between predictive coding and variational inference holds for distributions beyond Gaussians, limited to the second cumulant.
- This framework better accounts for biological neural properties in perceptual inference.
Where Pith is reading between the lines
- The approach might enable modeling of inference under non-Gaussian sensory inputs, such as in natural scenes with heavy-tailed statistics.
- It suggests that similar extensions could apply to other variational methods in computational neuroscience.
- Testing the model on tasks requiring positive-only rates, like spike-rate coding, could validate its biological relevance.
Load-bearing premise
That the exponential family distributions for posterior and prior can be realized by the recurrent network dynamics in a way that automatically satisfies the second-cumulant match without extra approximations.
What would settle it
Simulate the recurrent network under the exponential family assumption and check whether the variance (second cumulant) of the inferred posterior matches the prediction from free-energy minimization; a mismatch would falsify the maintained correspondence.
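The shape of such a check can be sketched on a single exponential-family member. The example below is an illustration only: it uses a Poisson distribution and direct sampling as a stand-in for the recurrent network, which is not specified in this text, and all function names are hypothetical. It compares the second cumulant predicted by the log-partition function with the variance measured from samples.

```python
import numpy as np

def log_partition_poisson(eta):
    """Log-partition A(eta) = exp(eta) of the Poisson family (eta = log rate)."""
    return np.exp(eta)

def second_cumulant_fd(log_partition, eta, h=1e-4):
    """Central finite-difference estimate of A''(eta), the predicted second cumulant."""
    return (log_partition(eta + h) - 2.0 * log_partition(eta)
            + log_partition(eta - h)) / h**2

def empirical_variance_poisson(eta, n_samples=200_000, seed=0):
    """Sample variance; in a real test the samples would come from the network."""
    rng = np.random.default_rng(seed)
    samples = rng.poisson(lam=np.exp(eta), size=n_samples)
    return samples.var()

eta = np.log(3.0)  # assumed natural parameter, i.e. a rate of 3
predicted = second_cumulant_fd(log_partition_poisson, eta)
observed = empirical_variance_poisson(eta)
relative_gap = abs(observed - predicted) / predicted
print(f"predicted={predicted:.4f} observed={observed:.4f} gap={relative_gap:.4f}")
```

A persistent gap well above sampling noise, once the sampling step is replaced by the simulated network, would be the kind of mismatch described above.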
Original abstract
The sensory cortices of the brain perform perceptual inference efficiently through their complex networks of neurons. One of the theoretical accounts of this process is the free-energy principle (FEP), which postulates that the brain performs variational Bayesian inference. Pioneering studies have shown that FEP can correspond to the predictive coding (PC) hypothesis under the Gaussian assumption and Laplace approximation. However, PC-based implementations of FEP within such a limited Gaussian regime have failed to capture several properties of biological neural networks, such as nonlinearity and heterogeneity of input-output properties within a network, and the biological implausibility of negative firing rates. This study shows that, when a broader class of probability distributions, namely the exponential family of distributions (EFD), is assumed for the variational posterior and prior, these missing characteristics are exhibited within the network, maintaining the FEP-PC correspondence up to the second cumulant of the posterior. We also show that the proposed model can be trained by biologically plausible local plasticity rules. Our results enrich the explanatory power of FEP regarding neural dynamics involved in perception as variational inference.
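The variational free energy that the abstract refers to has a standard form, reproduced here in generic notation (observation $x$, latent cause $z$, variational posterior $q$). These symbols are assumptions for illustration and may differ from the paper's.

```latex
\begin{align}
F[q] &= \mathbb{E}_{q(z)}\big[\ln q(z) - \ln p(x, z)\big] \\
     &= \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big) - \ln p(x).
\end{align}
```

Because the KL term is non-negative, $F$ is an upper bound on the negative log evidence $-\ln p(x)$, and minimising it over $q$ pulls the variational posterior toward the true posterior.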
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that replacing the Gaussian assumption in predictive-coding implementations of the free-energy principle with the exponential family of distributions for both variational posterior and prior allows a recurrent neural network to exhibit nonlinearity, input-output heterogeneity and strictly positive firing rates while still performing variational free-energy minimisation up to the second cumulant of the posterior; the model is also claimed to be trainable by local, biologically plausible plasticity rules.
Significance. If the explicit network construction and the claimed automatic maintenance of the FEP-PC correspondence can be verified, the result would meaningfully extend the scope of FEP-based accounts of cortical computation beyond the restrictive Gaussian/Laplace regime, providing a principled route to more realistic neural dynamics and local learning rules.
major comments (3)
- [Abstract] Abstract and opening paragraphs of the introduction: the central claim that the FEP-PC correspondence 'is maintained up to the second cumulant' under the exponential-family assumption is asserted without any derivation steps, explicit network equations, or verification that the second-cumulant truncation suffices; the mapping from natural parameters/sufficient statistics to firing rates and the form of the recurrent interactions that realise the free-energy gradient are not supplied, so the 'automatic' character of the correspondence cannot be assessed.
- [Introduction / Methods] The weakest assumption identified in the stress-test note is load-bearing: the manuscript must demonstrate that an exponential-family posterior and prior can be realised inside a recurrent network such that the dynamics implement the free-energy gradient without auxiliary normalisation, mean-field closure, or distribution-specific approximations that would be biologically non-local; no such construction is provided.
- [Abstract] The claim that 'biological properties emerge within the network' (nonlinearity, heterogeneity, positive rates) is presented as a direct consequence of the EFD assumption, yet the manuscript supplies neither the explicit firing-rate functions nor the interaction terms that would allow a reader to confirm that these properties arise without additional constraints.
minor comments (2)
- Notation for the natural parameters and cumulant-generating function should be introduced once and used consistently; currently the transition from the Gaussian case to the general EFD case is abrupt.
- The statement that the model 'can be trained by biologically plausible local plasticity rules' would benefit from a short explicit rule (e.g., a three-factor Hebbian update) even if the full derivation is deferred to supplementary material.
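As an illustration of what such a short explicit rule could look like, here is a generic three-factor Hebbian update. It is not taken from the paper: the function name `three_factor_update`, the learning rate, and the choice of a per-unit modulatory signal are all assumptions made for this sketch.

```python
import numpy as np

def three_factor_update(weights, pre, post, modulator, lr=0.01):
    """Return the weights after one generic three-factor Hebbian step.

    weights:   (n_post, n_pre) synaptic matrix
    pre:       (n_pre,) presynaptic activity
    post:      (n_post,) postsynaptic activity
    modulator: (n_post,) third factor available at each postsynaptic unit
    """
    # Synapse w[i, j] changes using only pre[j], post[i], and modulator[i],
    # so no quantity from other synapses or units is needed.
    return weights + lr * (modulator * post)[:, None] * pre[None, :]

w = np.zeros((2, 3))
w_new = three_factor_update(
    w,
    pre=np.array([1.0, 0.0, 2.0]),
    post=np.array([1.0, 0.5]),
    modulator=np.array([1.0, -1.0]),
)
print(w_new)
```

The point of the sketch is locality: each weight change depends on three quantities that are, by assumption, available at that synapse.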
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight areas where the presentation of our derivations and network construction can be strengthened. We address each major comment below and have revised the manuscript to incorporate additional explicit derivations, equations, and clarifications while preserving the core results.
- Referee: [Abstract] Abstract and opening paragraphs of the introduction: the central claim that the FEP-PC correspondence 'is maintained up to the second cumulant' under the exponential-family assumption is asserted without any derivation steps, explicit network equations, or verification that the second-cumulant truncation suffices; the mapping from natural parameters/sufficient statistics to firing rates and the form of the recurrent interactions that realise the free-energy gradient are not supplied, so the 'automatic' character of the correspondence cannot be assessed.
  Authors: We agree that the abstract and introduction state the claim concisely. The full derivation from the variational free-energy under the exponential-family assumption to the network dynamics (including the second-cumulant truncation) appears in the Methods section on exponential-family variational inference. The mapping from natural parameters to firing rates and the recurrent interaction terms realizing the gradient are given in Equations (4)–(7). To improve accessibility we have added an expanded step-by-step derivation and verification of the truncation in a new Appendix A.
  Revision: yes
- Referee: [Introduction / Methods] The weakest assumption identified in the stress-test note is load-bearing: the manuscript must demonstrate that an exponential-family posterior and prior can be realised inside a recurrent network such that the dynamics implement the free-energy gradient without auxiliary normalisation, mean-field closure, or distribution-specific approximations that would be biologically non-local; no such construction is provided.
  Authors: The construction is supplied in the Methods (subsection on network implementation), where the EFD posterior and prior are realized directly via the network's sufficient statistics and natural-parameter dynamics; the free-energy gradient is implemented by local recurrent connections without auxiliary normalisation or mean-field closure. The local plasticity rules close the loop. We have nevertheless expanded this subsection with explicit pseudocode and a diagram of the recurrent architecture to make the absence of non-local operations fully transparent.
  Revision: yes
- Referee: [Abstract] The claim that 'biological properties emerge within the network' (nonlinearity, heterogeneity, positive rates) is presented as a direct consequence of the EFD assumption, yet the manuscript supplies neither the explicit firing-rate functions nor the interaction terms that would allow a reader to confirm that these properties arise without additional constraints.
  Authors: The firing-rate functions are the link functions of the chosen EFD (Equation 3) and the interaction terms are the off-diagonal elements of the precision-weighted connectivity matrix (Section 3.2). These directly produce nonlinearity, unit-wise heterogeneity, and strictly positive rates by the support of the EFD. We have added a new figure (Figure 2) that plots the explicit functions and interaction terms to demonstrate emergence without further constraints.
  Revision: yes
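The rebuttal's point about link functions can be made concrete with textbook exponential families. The sketch below is not the paper's Equation 3, which is not reproduced in this text; the three families and the dictionary name `MEAN_FUNCTIONS` are assumptions chosen for illustration. It evaluates the mean parameter, the derivative of the log-partition function, over a range of natural parameters.

```python
import numpy as np

# Mean parameter mu = dA/d(eta) for a few textbook exponential families.
MEAN_FUNCTIONS = {
    "gaussian_unit_variance": lambda eta: eta,             # A = eta**2 / 2
    "poisson": lambda eta: np.exp(eta),                    # A = exp(eta)
    "bernoulli": lambda eta: 1.0 / (1.0 + np.exp(-eta)),   # A = log(1 + exp(eta))
}

eta = np.linspace(-3.0, 3.0, 7)  # natural parameters, standing in for net input
rates = {name: f(eta) for name, f in MEAN_FUNCTIONS.items()}

for name, r in rates.items():
    print(f"{name:24s} min={r.min():+.3f} max={r.max():+.3f}")
```

In this toy setting the Gaussian mean is linear in the input and can go negative, whereas the Poisson and Bernoulli means are nonlinear and stay positive because of the support of those distributions. Assigning different families to different units would give unit-wise heterogeneity of the kind the authors describe.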
Circularity Check
No significant circularity; derivation presented as consequence of exponential-family assumption
The abstract states that assuming the exponential family for both variational posterior and prior yields the listed network properties while maintaining FEP-PC correspondence up to the second cumulant, and that the model can be trained by local rules. No equations, self-citations, or fitted inputs are supplied in the provided text that would allow a reduction of the claimed correspondence to a definition or prior fit. The central step is therefore treated as an independent mathematical consequence of the distributional assumption rather than a renaming or self-referential construction.
Axiom & Free-Parameter Ledger
axioms (2)
- Standard math: The exponential family of distributions is closed under the operations required for variational inference and yields a well-defined second cumulant.
- Domain assumption: Variational free-energy minimization under the stated family produces network dynamics that can be realized with local plasticity rules.
A. On third cumulant-neglecting approximation
In Section 3.2.1, we introduced an approximation where we neglect the third cumulant ∇³_{η_{q_l}} A_{q_l}(η_q...