Predictive Coding with Bayesian Priors via Proximal Gradients

Francesco Bullo

arXiv: 2606.08374 · v1 · pith:Y2I7I7FFnew · submitted 2026-06-06 · 📡 eess.SY · cs.LG · cs.SY

Pith reviewed 2026-06-27 19:05 UTC · model grok-4.3

classification 📡 eess.SY · cs.LG · cs.SY

keywords: predictive coding, proximal gradient descent, firing-rate networks, MAP estimation, Bayesian priors, hierarchical models, variable splitting, leaky integrate-and-fire

The pith

Predictive coding arises exactly as continuous-time proximal gradient descent on regularized MAP objectives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that single-level predictive coding networks are identical to leaky firing-rate circuits obtained by applying proximal gradient flow to a Bayesian MAP estimation problem. The membrane leak, recurrent connections, local drive, and activation function all derive directly from the optimization without extra assumptions. The prior distribution determines the network nonlinearity through its proximal operator, while the likelihood precision controls the observation gain. For multi-level problems, a variable-splitting relaxation converts the deep MAP task into an undirected Markov random field whose node potentials are the level-wise priors; the resulting level-wise subproblems are solved by interconnected local and distributed proximal solvers, recovering hierarchical predictive coding.

Core claim

Proximal gradient descent applied to the regularized MAP objective is precisely a leaky firing-rate network: the membrane leak, effective recurrent matrix, local synaptic drive, and static nonlinearity all follow from one optimization principle, reproducing the circuit proposed by Rao and Ballard. The prior selects the nonlinearity via its proximal operator and the likelihood precision sets the gain. In the hierarchical case, classical variable-splitting relaxation of the deep MAP problem yields predictive coding as the interconnection of local and distributed solvers; this replaces the directed generative chain by an undirected Markov random field whose node potentials are the level-wise priors.
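
A worked version of the single-level identity helps show where each circuit term comes from. This is a sketch under assumed notation that may differ from the paper's: a linear-Gaussian likelihood with generative matrix $A$, observation $y$, and precision $\Lambda$ (so $f$ is the negative log-likelihood), a prior with negative log-density $g$, a step size $\rho$, and a time constant $\tau$.

```latex
\begin{aligned}
f(x) &= \tfrac12 (y - Ax)^\top \Lambda (y - Ax),
\qquad \nabla f(x) = -A^\top \Lambda (y - Ax), \\
\tau \dot{x} &= -x + \mathrm{prox}_{\rho g}\big(x - \rho \nabla f(x)\big) \\
&= -x + \mathrm{prox}_{\rho g}\big(
   \underbrace{(I - \rho A^\top \Lambda A)}_{W}\, x
   + \underbrace{\rho A^\top \Lambda\, y}_{b} \big).
\end{aligned}
```

Read this way, $-x$ is the membrane leak, $W$ is the effective recurrent matrix, $b$ is the local drive (scaled by the precision $\Lambda$, which is why the precision acts as the observation gain), and $\mathrm{prox}_{\rho g}$ is the static nonlinearity. The prediction error $y - Ax$ enters through $\nabla f$, which is the link to Rao-Ballard predictive coding.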

What carries the argument

Continuous-time proximal gradient flow on the regularized maximum-a-posteriori objective, with the proximal operator of each prior supplying the static nonlinearity of its level.
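
To make "the prior selects the nonlinearity" concrete, here is a minimal sketch of a few textbook proximal operators and the activation shapes they produce. The closed forms are standard (see Parikh and Boyd [28]); the function names and the pairing of each prior with a specific scalar formula are our illustrative choices, not necessarily the priors the paper itself uses.

```python
def prox_laplace(v, rho, lam):
    """prox of rho*lam*|x| (Laplace prior): soft threshold, a dead-zone nonlinearity."""
    return max(v - rho * lam, 0.0) + min(v + rho * lam, 0.0)


def prox_gaussian(v, rho, alpha):
    """prox of rho*(alpha/2)*x**2 (zero-mean Gaussian prior): linear shrinkage."""
    return v / (1.0 + rho * alpha)


def prox_nonneg_laplace(v, rho, lam):
    """prox of rho*(lam*x + indicator{x >= 0}) (exponential prior on x >= 0): shifted ReLU."""
    return max(v - rho * lam, 0.0)


def prox_box(v, lo, hi):
    """prox of the indicator of [lo, hi] (uniform prior on a box): clipping.

    The indicator is unchanged by scaling, so rho drops out."""
    return min(max(v, lo), hi)


# Tabulate the four activation functions on a few inputs.
for v in (-2.0, -0.3, 0.3, 2.0):
    print(v, prox_laplace(v, 0.5, 1.0), prox_gaussian(v, 0.5, 2.0),
          prox_nonneg_laplace(v, 0.5, 1.0), prox_box(v, 0.0, 1.0))
```

The design point is that none of these nonlinearities is chosen separately: each is a consequence of the prior term in the objective, and the step size $\rho$ scales the threshold or shrinkage.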

If this is right

The membrane leak term and recurrent weight matrix emerge directly from the gradient step on the objective.
Different choices of prior induce different activation functions in the resulting network without separate design.
Hierarchical predictive coding corresponds to a standard relaxation of deep MAP estimation rather than an ad-hoc construction.
Each level in the hierarchy solves its own proximal subproblem using its local prior.
The overall architecture is that of an undirected graphical model rather than a directed generative chain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Observed nonlinearities in cortical neurons could be interpreted as signatures of specific Bayesian priors used by the brain.
The same proximal-gradient derivation might be applied to other continuous-time optimization methods to obtain new candidate neural dynamics.
Predictive coding performance could be improved by choosing priors whose proximal operators better match measured neural response functions.
The relaxation step suggests that message-passing algorithms on undirected graphs may underlie multi-area cortical computation.

Load-bearing premise

The continuous-time proximal gradient flow on the regularized MAP objective exactly reproduces the firing-rate dynamics without additional approximations or time-scale separations.

What would settle it

Numerically integrate the proximal gradient ODE for a Laplace prior and a Gaussian likelihood and check whether the resulting membrane-potential trajectories coincide exactly with those of the corresponding leaky firing-rate equations under identical parameters.
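
The check proposed above can be sketched in a few lines of pure Python. This is a hypothetical 2-D instance with illustrative numbers (A, the diagonal precision, y, and the Laplace weight are ours, not the paper's): it integrates the proximal gradient flow directly and, separately, the rate network $\tau\dot{x} = -x + \varphi(Wx + b)$ with $W = I - \rho A^\top \Lambda A$, $b = \rho A^\top \Lambda y$, and $\varphi$ the soft threshold. It tests the algebraic identity for a rate model only, not spiking dynamics.

```python
# Hypothetical 2-D instance: observation y ~ A x, diagonal likelihood precision,
# Laplace prior LAM * ||x||_1. All numbers are illustrative.
A = [[1.0, 0.5], [0.2, 1.5]]
PREC = [2.0, 1.0]  # diagonal of the likelihood precision Lambda
Y = [1.0, -0.5]
LAM, RHO, TAU, DT, STEPS = 0.3, 0.2, 1.0, 0.01, 2000
N = 2


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def soft(v, thr):
    # Elementwise soft threshold = prox of thr * ||.||_1
    return [max(x - thr, 0.0) + min(x + thr, 0.0) for x in v]


AT = [list(col) for col in zip(*A)]  # A transpose


def prox_grad_field(x):
    # tau * xdot = -x + prox_{rho g}(x - rho * grad f(x)),
    # with grad f(x) = -A^T Lambda (y - A x)
    err = [p * (yi - axi) for p, yi, axi in zip(PREC, Y, matvec(A, x))]
    grad = [-g for g in matvec(AT, err)]
    z = soft([xi - RHO * gi for xi, gi in zip(x, grad)], RHO * LAM)
    return [(zi - xi) / TAU for xi, zi in zip(x, z)]


# Firing-rate parameters read off the same objective:
# W = I - rho A^T Lambda A (recurrent matrix), b = rho A^T Lambda y (drive)
W = [[(1.0 if i == j else 0.0)
      - RHO * sum(AT[i][k] * PREC[k] * A[k][j] for k in range(N))
      for j in range(N)] for i in range(N)]
B = [RHO * sum(AT[i][k] * PREC[k] * Y[k] for k in range(N)) for i in range(N)]


def rate_field(x):
    # tau * xdot = -x + phi(W x + b), phi = soft threshold
    z = soft([wx + bi for wx, bi in zip(matvec(W, x), B)], RHO * LAM)
    return [(zi - xi) / TAU for xi, zi in zip(x, z)]


def euler(field, x0):
    # Forward-Euler trajectory with the same step for both models
    x, traj = list(x0), [list(x0)]
    for _ in range(STEPS):
        x = [xi + DT * fi for xi, fi in zip(x, field(x))]
        traj.append(x)
    return traj


traj_pg = euler(prox_grad_field, [0.0, 0.0])
traj_rate = euler(rate_field, [0.0, 0.0])
max_gap = max(abs(p - q) for xp, xq in zip(traj_pg, traj_rate) for p, q in zip(xp, xq))
print(f"max |difference| over trajectory: {max_gap:.2e}")
```

Because the two right-hand sides are the same function written two ways, any difference should be at the level of floating-point rounding; a gap much larger than that would point to an error in the identification of $W$ or $b$.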

Figures

Figures reproduced from arXiv: 2606.08374 by Francesco Bullo.

Original abstract

We recast predictive coding as continuous-time proximal gradient descent applied to a regularized maximum-a-posteriori (MAP) objective. We study first a single-level problem and then a multi-level hierarchy. For the single-level problem, we show that proximal gradient descent is precisely a leaky firing-rate network: the membrane leak, the effective recurrent matrix, the local synaptic drive, and the static nonlinearity all follow from one optimization principle, and the resulting circuit is the one proposed by Rao and Ballard. The prior selects the nonlinearity through its proximal operator, and the likelihood precision sets the gain on the observation. For the hierarchy, we show that a classical variable-splitting relaxation of the deep MAP problem yields hierarchical predictive coding as the interconnection of local and distributed solvers. In probabilistic modeling terms, this relaxation replaces the directed generative chain by an undirected Markov random field whose node potentials are the level-wise priors. Each level then applies its own activation function, namely the proximal operator of its prior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives predictive coding from proximal gradient steps on MAP, with priors setting the nonlinearities, but the continuous-time exactness claim needs verification.

The main takeaway is that this work recasts single-level Rao-Ballard predictive coding as the continuous-time proximal gradient flow on a regularized MAP objective, where the proximal operator of the prior directly supplies the static nonlinearity and the likelihood precision sets the observation gain. For the hierarchy it uses variable splitting to turn the deep MAP into an undirected MRF whose local solvers interconnect into the standard predictive coding circuit.

What is new is the explicit identification of the proximal operator with the network nonlinearity and the derivation of the recurrent matrix and local drive from one optimization principle rather than from ad-hoc circuit assumptions. The multi-level extension via splitting is also a clean move that replaces the directed generative model with an undirected one. This is useful for anyone who wants a principled route from probabilistic objectives to circuit equations.

The soft spot is the continuous-time claim. The abstract asserts that the proximal flow yields the leaky-integrator dynamics without further approximations, but standard proximal flows often rely on differential inclusions or Moreau envelopes, and the exact match to membrane time constants can require a specific step-size limit or instantaneous proximal evaluation. If the full derivations contain hidden time-scale separations or implicit discretizations, the "precisely" equivalence weakens. No empirical checks or error bounds are mentioned in the abstract, so the strength of the central mapping is hard to judge from the summary alone.
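
The step-size worry can at least be probed at the level of equilibria. A minimal sketch, assuming a hypothetical scalar problem $f(x) = \tfrac{p}{2}(y - ax)^2$ with a Laplace prior $\mu|x|$ (numbers are ours): it integrates $\dot{x} = -x + \mathrm{prox}_{\rho\mu|\cdot|}(x - \rho f'(x))$ for several step sizes below $2/(p a^2)$ and compares the resting point with the closed-form MAP estimate. This checks fixed points only; it says nothing about whether transients match a particular membrane time constant.

```python
def soft(v, thr):
    """Scalar soft threshold: prox of thr * |.|."""
    return max(v - thr, 0.0) + min(v + thr, 0.0)


def map_closed_form(y, a, prec, mu):
    """argmin_x (prec/2)*(y - a*x)**2 + mu*|x| for scalar a != 0."""
    return soft(prec * a * y, mu) / (prec * a * a)


def flow_equilibrium(y, a, prec, mu, rho, dt=0.05, steps=20_000):
    """Forward-Euler integration (tau = 1) of
    xdot = -x + prox_{rho*mu|.|}(x - rho * f'(x)),  f(x) = (prec/2)*(y - a*x)**2."""
    x = 0.0
    for _ in range(steps):
        grad = -prec * a * (y - a * x)
        x += dt * (-x + soft(x - rho * grad, rho * mu))
    return x


x_map = map_closed_form(y=2.0, a=1.5, prec=0.8, mu=0.6)
for rho in (0.1, 0.5, 1.0):
    x_eq = flow_equilibrium(y=2.0, a=1.5, prec=0.8, mu=0.6, rho=rho)
    print(f"rho={rho}: equilibrium {x_eq:.10f}, MAP {x_map:.10f}")
```

In this instance the equilibrium coincides with the MAP estimate for every $\rho$ tried, because a fixed point of the proximal map satisfies the MAP optimality condition for any $\rho > 0$; $\rho$ changes only the speed and shape of the approach.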

This is for readers working at the intersection of optimization and neural circuit modeling. It deserves a serious referee because the optimization framing is coherent and the cited prior literature is engaged directly; a referee can check whether the continuous-time derivations hold without extra assumptions.

Referee Report

2 major / 2 minor

Summary. The manuscript recasts predictive coding as continuous-time proximal gradient descent on a regularized MAP objective. For the single-level case it claims this flow is precisely a leaky firing-rate network whose membrane leak, recurrent matrix, local drive and static nonlinearity are all determined by the objective, reproducing the Rao-Ballard circuit; the prior selects the nonlinearity via its proximal operator. For hierarchies a variable-splitting relaxation of the deep MAP problem is shown to yield hierarchical predictive coding as the interconnection of local and distributed proximal solvers, equivalently replacing the directed generative chain by an undirected MRF whose node potentials are the level-wise priors.

Significance. If the claimed exact equivalences hold, the work supplies a parameter-free optimization derivation of predictive-coding circuits directly from Bayesian MAP estimation, allowing priors to dictate activations without auxiliary assumptions and extending systematically to hierarchies. The absence of fitted parameters and the explicit link between proximal operators and network nonlinearities constitute clear strengths.

major comments (2)

[Abstract / single-level section] Abstract and single-level derivation: the assertion that continuous-time proximal gradient flow on the regularized MAP objective yields exactly the leaky-integrator dynamics τẋ = −x + ∇(likelihood) + proximal nonlinearity must be verified without implicit discretization, time-scale separation, or replacement by the gradient of the Moreau envelope; any such step would contradict the 'precisely' claim.
[Hierarchy / multi-level section] Hierarchy section: the variable-splitting relaxation is stated to produce hierarchical predictive coding as interconnected solvers, yet it is unclear whether the resulting dynamics converge to the original MAP objective or only to a relaxed surrogate; an explicit statement of the relaxation gap or convergence guarantee is needed to support the multi-level claim.

minor comments (2)

Notation for the proximal operator and the precision parameter should be introduced once and used uniformly.
A brief remark on the relation to existing continuous-time analyses of proximal gradient methods would help situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the positive assessment of the work's significance. We address the two major comments point by point below.

Referee: [Abstract / single-level section] Abstract and single-level derivation: the assertion that continuous-time proximal gradient flow on the regularized MAP objective yields exactly the leaky-integrator dynamics τẋ = −x + ∇(likelihood) + proximal nonlinearity must be verified without implicit discretization, time-scale separation, or replacement by the gradient of the Moreau envelope; any such step would contradict the 'precisely' claim.

Authors: The derivation begins from the exact continuous-time proximal gradient flow $\tau \dot{x} = \mathrm{prox}_{\rho g}\big(x - \rho \nabla f(x)\big) - x$, which is the standard ODE limit of the proximal-gradient iteration and contains no discretization. Algebraic rearrangement then yields the leaky-integrator form, with the proximal operator appearing directly as the static nonlinearity; neither time-scale separation nor the gradient of the Moreau envelope is invoked. To remove any residual ambiguity around the word 'precisely,' we will insert an expanded, line-by-line verification of this equivalence in the revised single-level section. revision: yes
Referee: [Hierarchy / multi-level section] Hierarchy section: the variable-splitting relaxation is stated to produce hierarchical predictive coding as interconnected solvers, yet it is unclear whether the resulting dynamics converge to the original MAP objective or only to a relaxed surrogate; an explicit statement of the relaxation gap or convergence guarantee is needed to support the multi-level claim.

Authors: The manuscript already identifies the construction as a classical variable-splitting relaxation. The resulting network dynamics converge to stationary points of the relaxed objective; they do not in general recover the original constrained MAP problem. We will add a concise paragraph stating the convergence guarantee for the relaxed problem, noting that the relaxation gap vanishes as the splitting penalty tends to infinity, and citing the relevant convergence theory. This clarification will be placed immediately after the derivation of the hierarchical circuit. revision: yes
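
The relaxation gap can be made concrete with a hypothetical scalar two-level instance (not taken from the paper): observation $y$, level-1 state $x_1$ with no prior, level-2 state $x_2$ with a Laplace prior, and the splitting penalty $\kappa$ on the interlevel constraint $x_1 = x_2$. The relaxed objective is $F(x_1, x_2) = \tfrac12 (y - x_1)^2 + \tfrac{\kappa}{2}(x_1 - x_2)^2 + \lambda |x_2|$. For $y > \lambda$ and $\kappa > \lambda/(y - \lambda)$, its minimizer is $x_1 = y - \lambda$ and $x_2 = y - \lambda - \lambda/\kappa$, while the original constrained MAP gives $x_1 = x_2 = y - \lambda$; the gap $\lambda/\kappa$ vanishes as $\kappa \to \infty$. The sketch below integrates the level-wise proximal gradient flow and checks this.

```python
def soft_threshold(v, thr):
    """Scalar soft threshold: prox of thr * |.| (Laplace-prior nonlinearity)."""
    return max(v - thr, 0.0) + min(v + thr, 0.0)


def relaxed_two_level(y, lam, kappa, h=0.5, steps=50_000):
    """Euler-discretized proximal gradient flow on the hypothetical relaxed objective
    F(x1, x2) = 0.5*(y - x1)**2 + 0.5*kappa*(x1 - x2)**2 + lam*|x2|.
    Level 1 has no prior (its prox is the identity); level 2 has a Laplace prior."""
    rho = 1.0 / (1.0 + 2.0 * kappa)  # rho <= 1/L for the smooth part of F
    x1 = x2 = 0.0
    for _ in range(steps):
        e0 = y - x1   # bottom-level prediction error
        e1 = x1 - x2  # error between level 1 and the top-down prediction x2
        target1 = x1 - rho * (-e0 + kappa * e1)  # prox of g1 = 0 is the identity
        target2 = soft_threshold(x2 - rho * (-kappa * e1), rho * lam)
        # tau * dx/dt = -x + prox(...), one Euler step with dt / tau = h
        x1 += h * (target1 - x1)
        x2 += h * (target2 - x2)
    return x1, x2


for kappa in (1.0, 10.0, 100.0):
    x1, x2 = relaxed_two_level(y=2.0, lam=0.5, kappa=kappa)
    print(f"kappa={kappa:6.1f}  x1={x1:.6f}  x2={x2:.6f}  gap={x1 - x2:.6f}")
```

Each level's update uses only its own prediction errors (`e0`, `e1`), which is the local-solver structure described above; the penalty $\kappa$ trades off how closely the relaxed stationary point tracks the original constrained MAP problem.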

Circularity Check

0 steps flagged

No significant circularity: the derivation maps an external MAP objective and proximal gradient flow onto network dynamics without reduction to fitted inputs or self-citation chains.

Full rationale

The paper starts from a standard regularized MAP objective (external to the network) and applies the known continuous-time proximal gradient flow. It then algebraically identifies the resulting ODE terms (leak, recurrent matrix, synaptic drive, proximal nonlinearity) with the components of a leaky firing-rate model. This identification is a direct consequence of the flow equations rather than a redefinition or fit. The subsequent observation that the resulting circuit matches the Rao-Ballard architecture is a post-derivation comparison, not a load-bearing premise. No equations are shown to reduce to their own inputs by construction, no parameters are fitted to data and then relabeled as predictions, and no uniqueness theorems or ansatzes are imported via self-citation. The derivation therefore remains self-contained against the external optimization principle.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claim rests on standard properties of proximal operators and variable splitting; no free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (2)

standard math: Proximal gradient descent dynamics are well-defined for the chosen regularized MAP objective in continuous time.
Invoked to equate the flow to network equations.
domain assumption: Variable splitting yields an equivalent undirected MRF for the hierarchical MAP problem.
Used to obtain the hierarchical interconnection.

pith-pipeline@v0.9.1-grok · 5694 in / 1272 out tokens · 18585 ms · 2026-06-27T19:05:28.063451+00:00 · methodology

Reference graph

Works this paper leans on

35 extracted references · 32 canonical work pages · 1 internal anchor

[1] A. Attinger, B. Wang, and G. B. Keller. Visuomotor coupling shapes the functional development of mouse visual cortex. Cell, 169(7): 1291–1302, 2017. doi:10.1016/j.cell.2017.05.023

[2] L. F. Barrett and E. K. Miller. Categorization is ‘baked’ into the brain. Nature Reviews Neuroscience, 27(6): 435–456, 2026. doi:10.1038/s41583-026-01036-2

[3] A. M. Bastos, W. M. Usrey, R. A. Adams, G. R. Mangun, P. Fries, and K. J. Friston. Canonical microcircuits for predictive coding. Neuron, 76(4): 695–711, 2012. doi:10.1016/j.neuron.2012.10.038

[4] S. Betteti, G. Baggio, F. Bullo, and S. Zampieri. Firing rate models as associative memory: Synaptic design for robust retrieval. Neural Computation, 37(10): 1807–1838, 2025. doi:10.1162/neco.a.28

[5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1): 1–124, 2010. doi:10.1561/2200000016

[6] C. L. Buckley, C. S. Kim, S. McGregor, and A. K. Seth. The free energy principle for action and perception: A mathematical review. Journal of Mathematical Psychology, 81: 55–79, 2017. doi:10.1016/j.jmp.2017.09.004

[7] M. Carandini and D. J. Heeger. Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13(1): 51–62, 2012. doi:10.1038/nrn3136

[8] M. Á. Carreira-Perpiñán and W. Wang. Distributed optimization of deeply nested systems. In Int. Conf. Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, pages 10–19, Reykjavik, Iceland, 2014. PMLR. URL https://proceedings.mlr.press/v33/carreira-perpinan14.html

[9] V. Centorrino, A. Gokhale, A. Davydov, G. Russo, and F. Bullo. Euclidean contractivity of neural networks with symmetric weights. IEEE Control Systems Letters, 7: 1724–1729, 2023. doi:10.1109/LCSYS.2023.3278250

[10] V. Centorrino, A. Davydov, A. Gokhale, G. Russo, and F. Bullo. On weakly contracting dynamics for convex optimization. IEEE Control Systems Letters, 8: 1745–1750, 2024a. doi:10.1109/LCSYS.2024.3414348

[11] V. Centorrino, A. Gokhale, A. Davydov, G. Russo, and F. Bullo. Positive competitive networks for sparse reconstruction. Neural Computation, 36(6): 1163–1197, 2024b. doi:10.1162/neco_a_01657

[12] P. L. Combettes and J.-C. Pesquet. Proximal Splitting Methods in Signal Processing, pages 185–212. Springer New York, 2011. ISBN 9781441995698. doi:10.1007/978-1-4419-9569-8_10

[13] P. L. Combettes and J.-C. Pesquet. Deep neural network structures solving variational inequalities. Set-Valued and Variational Analysis, 28(3): 491–518, 2020. doi:10.1007/s11228-019-00526-z

[14] A. Davydov, V. Centorrino, A. Gokhale, G. Russo, and F. Bullo. Time-varying convex optimization: A contraction and equilibrium tracking approach. IEEE Transactions on Automatic Control, 70(11): 7446–7460, 2025. doi:10.1109/TAC.2025.3576043

[15] K. J. Friston. A theory of cortical responses. Philosophical Transactions of the Royal Society B, 360(1456): 815–836, 2005. doi:10.1098/rstb.2005.1622

[16] K. J. Friston. Hierarchical models in the brain. PLoS Computational Biology, 4(11): e1000211, 2008. doi:10.1371/journal.pcbi.1000211

[17] K. J. Friston. The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2): 127–138, 2010. doi:10.1038/nrn2787

[18] K. J. Friston, T. FitzGerald, F. Rigoli, P. Schwartenbeck, and G. Pezzulo. Active inference: A process theory. Neural Computation, 29(1): 1–49, 2017a. doi:10.1162/NECO_a_00912

[19] K. J. Friston, T. Parr, and B. de Vries. The graphical brain: Belief propagation and active inference. Network Neuroscience, 1(4): 381–414, 2017b. doi:10.1162/NETN_a_00018

[20] A. Gokhale, A. Davydov, and F. Bullo. Proximal gradient dynamics: Monotonicity, exponential convergence, and applications. IEEE Control Systems Letters, 8: 2853–2858, 2024. doi:10.1109/LCSYS.2024.3516632

[21] S. Hassan-Moghaddam and M. R. Jovanović. Proximal gradient flow and Douglas-Rachford splitting dynamics: Global exponential stability via integral quadratic constraints. Automatica, 123: 109311, 2021. doi:10.1016/j.automatica.2020.109311

[22] G. B. Keller and T. D. Mrsic-Flogel. Predictive processing: A canonical cortical computation. Neuron, 100(2): 424–435, 2018. doi:10.1016/j.neuron.2018.10.003

[23] T. S. Lee and D. Mumford. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20(7): 1434–1448, 2003. doi:10.1364/josaa.20.001434

[24] J. Marino. Predictive coding, variational autoencoders, and biological connections. Neural Computation, 34(1): 1–44, 2022. doi:10.1162/neco_a_01458

[25] B. Millidge, A. Seth, and C. L. Buckley. Predictive coding: A theoretical and experimental review. arXiv preprint, 2021. doi:10.48550/arXiv.2107.12979

[26] B. Millidge, A. Tschantz, and C. L. Buckley. Predictive coding approximates backprop along arbitrary computation graphs. Neural Computation, 34(6): 1329–1368, 2022. doi:10.1162/neco_a_01497

[27] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583): 607–609, 1996. doi:10.1038/381607a0

[28] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3): 127–239, 2014. doi:10.1561/2400000003

[29] R. P. N. Rao and D. H. Ballard. Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1): 79–87, 1999. doi:10.1038/4580

[30] F. Rossi, V. Centorrino, F. Bullo, and G. Russo. Neural policy composition from free energy minimization. Technical Report, 2025. doi:10.48550/arXiv.2512.04745. arXiv:2512.04745

[31] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen. Sparse coding via thresholding and local competition in neural circuits. Neural Computation, 20(10): 2526–2563, 2008. doi:10.1162/neco.2008.03-07-486

[32] T. Salvatori, A. Mali, C. L. Buckley, T. Lukasiewicz, R. P. Rao, K. Friston, and A. Ororbia. A survey on neuro-mimetic deep learning via predictive coding. Neural Networks, 195: 108161, 2026. doi:10.1016/j.neunet.2025.108161

[33] G. Taylor, R. Burmeister, Z. Xu, B. Singh, A. Patel, and T. Goldstein. Training neural networks without gradients: A scalable ADMM approach. In International Conference on Machine Learning, pages 2722–2731. PMLR, 2016. URL https://proceedings.mlr.press/v48/taylor16.html

[34] H. von Helmholtz. Handbuch der Physiologischen Optik. Voss, Leipzig, 1867.

[35] J. C. R. Whittington and R. Bogacz. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Computation, 29(5): 1229–1262, 2017. doi:10.1162/neco_a_00949
