Error Highways: Scaling Predictive Coding to Very Deep Networks

Alexander G. Ororbia; Amirhossein Mohammadi

arxiv: 2606.22744 · v1 · pith:CD3YBCRFnew · submitted 2026-06-22 · 💻 cs.LG · cs.NE

Pith reviewed 2026-06-26 09:40 UTC · model grok-4.3

keywords: predictive coding, deep networks, error propagation, highway networks, MNIST, Fashion-MNIST, local learning, backpropagation alternative

The pith

Highway error propagation lets predictive coding networks train 128-layer MLPs with depth-robust accuracy on MNIST.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes highway error propagation (HEP) to counter the rapid decay of learning signals that keeps standard predictive coding from scaling beyond shallow networks. It augments the free energy function with linear feedback matrices that couple hidden states directly to the output error, delivering a correction whose size does not shrink with depth. The change leaves the local synaptic update rule untouched while bypassing the exponential attenuation that occurs in vanilla predictive coding. A sympathetic reader would care because, if the result holds, a biologically plausible local-learning method can be trained at depths to which predictive coding has largely not scaled.

Core claim

Augmenting the free energy function with feedback matrices V_{L→i} that couple selected hidden states directly to the clamped output error supplies a depth-independent correction at every inference step; because the coupling is linear, the highway pathway reaches interior layers without the exponential decay that occurs when error propagates through the usual Jacobian chain, while the local PC synaptic update rule remains intact.

What carries the argument

Highway error propagation (HEP) via linear feedback matrices V_{L→i} that couple hidden states directly to output error.

If this is right

MLPs up to 128 layers deep can be trained effectively under predictive coding on MNIST and Fashion-MNIST.
Classification accuracy remains stable as network depth increases when HEP is used.
The error correction magnitude delivered to any hidden layer stays independent of total depth.
The local synaptic update rules of predictive coding continue to apply without modification.
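
The depth-independence point above can be made concrete with a back-of-the-envelope calculation. This is a sketch under stated assumptions, not a result from the paper: it assumes the per-hop factor λ mentioned in the Figure 2 caption is the same at every layer and that the highway correction scales with a single strength α. The function names and the values 0.9 and 0.1 are illustrative only.

```python
def vanilla_scale(lam: float, hops: int) -> float:
    """Relative size of the output error after `hops` layer-to-layer hops."""
    return lam ** hops


def highway_scale(alpha: float) -> float:
    """Relative size of the highway correction: a single hop, whatever the depth."""
    return alpha


depth = 128
lam, alpha = 0.9, 0.1  # illustrative values only, not taken from the paper

for layer in (127, 96, 64, 1):
    hops = depth - layer  # number of hops from the output back to this layer
    print(layer, vanilla_scale(lam, hops), highway_scale(alpha))
```

Under these assumptions the vanilla factor shrinks geometrically with the distance from the output, while the highway factor is the same for every layer. Whether a real network behaves this way depends on how λ actually varies across layers.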

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar linear bypass paths might be added to other local-learning schemes to improve depth scaling.
The approach could be tested on datasets larger than MNIST to check whether the depth robustness generalizes.
Because the correction is linear, hardware implementations that support only local updates might adopt HEP without extra global communication.

Load-bearing premise

Inserting the linear feedback matrices supplies a depth-independent correction at every inference step while leaving the local PC synaptic update rule intact.
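
The premise can be illustrated with a toy relaxation. The following is a minimal sketch, not the authors' implementation. It assumes a scalar chain (one unit per layer, one shared weight `w`), a quadratic energy, feedforward initialization, highways into every hidden layer, and a highway drive of the form `lr * alpha * v[l] * e[depth]`. The function name `relax`, the gains `v`, and all numeric values are hypothetical.

```python
def relax(depth, steps, alpha=0.0, lr=0.1, w=0.5, target=1.0, x_in=1.0):
    """Run inference steps on a scalar-chain PCN; return the hidden layers that moved."""
    # Feedforward initialization: every prediction error starts at exactly zero.
    x = [x_in]
    for _ in range(depth):
        x.append(w * x[-1])
    init = list(x)
    x[depth] = target            # clamp the output layer to the label
    v = [1.0] * (depth + 1)      # hypothetical fixed highway gains (never learned)

    for _ in range(steps):
        # e[l] = state minus top-down prediction from the layer below.
        e = [0.0] + [x[l] - w * x[l - 1] for l in range(1, depth + 1)]
        new_x = list(x)
        for l in range(1, depth):            # hidden layers only
            grad = e[l] - w * e[l + 1]       # gradient of the quadratic energy
            highway = alpha * v[l] * e[depth]  # assumed direct drive from output error
            new_x[l] = x[l] - lr * grad + lr * highway
        x = new_x

    return [l for l in range(1, depth) if abs(x[l] - init[l]) > 1e-12]


print(relax(8, steps=1))             # vanilla: which layers moved after one step
print(relax(8, steps=1, alpha=1.0))  # with highways
```

With `alpha = 0`, the clamped output error moves only the layer next to the output on the first step and reaches one more layer per step after that. With `alpha > 0`, every hidden layer moves on the first step. Weights are never changed here, so the sketch says nothing about learning or accuracy.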

What would settle it

Train identical 128-layer MLPs on MNIST once with HEP and once with standard predictive coding, then compare final test accuracy to check whether the non-HEP version shows clear depth-dependent degradation.

Figures

Figures reproduced from arXiv: 2606.22744 by Alexander G. Ororbia, Amirhossein Mohammadi.

**Figure 1.** Per-layer RMS prediction error ∥e_ℓ∥ (color) versus layer (0 = input) and inference iteration for a depth-64 MLP. In (a) vanilla PC and (b) µPC, the output error only crawls back along the network, stalling deep and leaving the input-side layers at the numerical floor; (c) HEP's error highways inject it directly into the hidden layers, guiding the learning procedure early in inference.

**Figure 2.** Our proposed error highways in a deep PCN. Each (error) highway V_{L→i} transmits the output error ϵ_L straight to a selected hidden layer i ∈ S within one hop (or E-step), such that the correction applied to that layer does not decay with depth. This stands in contrast to a vanilla PCN, where errors lose a factor of λ per hop.

**Figure 3.** Highway strength α vs. test accuracy at depth 64 (MNIST).

**Figure 4.** Per-layer RMS prediction error versus layer and inference step for the same depth-64 MLP as Figure 1, …

**Figure 5.** Energy of selected layers over inference steps under plain gradient descent. Only the layers next to the …

**Figure 6.** Energy of the same layers under Adam. Every tracked layer activates early and its energy levels off at a …

Original abstract

Predictive coding networks (PCNs) offer a biologically-plausible, local-learning alternative to back-propagation of errors (backprop). Nevertheless, they have remained largely confined to shallow architectures and evaluated on simple machine intelligence benchmarks. A central obstacle to scaling PCNs is that the learning signal decays rapidly as it propagates away from the clamped boundaries, leaving interior layers effectively unchanged. To directly counter this problem, we propose highway error propagation (HEP), a scheme that augments the free energy function underlying predictive coding (PC) by altering its neural structure with feedback matrices $V_{L\to i}$ that couple selected hidden states directly to the clamped output error. Since this coupling is linear in the hidden state, the highway pathway delivers a correction at every inference step whose magnitude is independent of depth, in contrast to vanilla PC where the output error reaches the $i$-th hidden layer with attenuation that decays exponentially in depth. This bypasses the Jacobian chain while preserving the local PC synaptic update rule. On MNIST and Fashion-MNIST, we show that HEP effectively trains MLPs of up to 128 layers with accuracy that is robust with respect to depth.
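
The abstract's argument can be written out as a short sketch. The paper's exact energy is not reproduced on this page, so the form of the coupling term, its sign convention, and the symbols $h_\ell$, $f_\ell$, $W_\ell$, $J_\ell$, and $\mathcal{S}$ below are assumptions made for illustration; only $V_{L\to i}$, $\epsilon_L$, $\alpha$, and $\lambda$ appear in the source text.

```latex
% Assumed form of the augmented free energy (not the paper's own equation).
\mathcal{F}_{\mathrm{HEP}}
  = \sum_{\ell=1}^{L} \tfrac{1}{2}\,\lVert e_\ell \rVert^2
  \;-\; \alpha \sum_{i \in \mathcal{S}} h_i^{\top} V_{L\to i}\, \epsilon_L ,
\qquad e_\ell = h_\ell - f_\ell(W_\ell h_{\ell-1}).

% Highway drive on h_i, holding \epsilon_L fixed: no Jacobian product appears.
-\frac{\partial}{\partial h_i}\Bigl(-\alpha\, h_i^{\top} V_{L\to i}\, \epsilon_L\Bigr)
  = \alpha\, V_{L\to i}\, \epsilon_L .

% Vanilla path from layer L back to layer i: a product of L - i Jacobians.
\Bigl\lVert \prod_{\ell=i+1}^{L} J_\ell^{\top} \Bigr\rVert \le \lambda^{\,L-i}
\quad \text{if } \lVert J_\ell \rVert \le \lambda < 1 .
```

Because the assumed coupling is linear in $h_i$, its gradient does not depend on $h_i$ or on the layers between $i$ and $L$, which is one reading of why the correction would be independent of depth.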

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HEP adds linear feedback matrices to the PC free energy so that the output error reaches deep layers without decay, and the authors claim that 128-layer MLPs train stably on MNIST and Fashion-MNIST. The abstract gives no experimental detail, however, and it is unclear whether the V matrices are updated locally.

The main point is that they insert linear feedback matrices V_{L→i} into the predictive coding free energy to give each hidden layer a direct additive drive from the output error. This makes the correction magnitude independent of depth instead of decaying through the usual Jacobian chain, while they say the local synaptic update rule for the main weights stays the same. They report this lets standard MLPs reach 128 layers on MNIST and Fashion-MNIST with accuracy that holds up as depth grows.

The construction itself is straightforward and targets the exact scaling bottleneck in PC. Keeping the update local for the primary parameters is the part that matters for the bio-plausibility angle.

The weakness is the missing experimental substance. The abstract states success on two datasets but supplies no baselines, no ablations on the V matrices, no error bars, and no training protocol, so it is impossible to tell how much the highways contribute relative to other factors. The stress-test note is right to flag the V matrices: if they are learned parameters, their updates would need information spanning layers L to i, which is not local. The abstract does not say whether the V matrices are fixed, random, or trained under a different rule, so the full locality claim is not yet supported.

This is for people working on local-learning alternatives to backprop. A reader in that niche would find the mechanism worth seeing even if the results need tightening. It deserves a serious referee because the scaling claim is concrete and the field needs more attempts at depth, provided the experiments and the V-rule details are filled in.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes highway error propagation (HEP) to scale predictive coding networks (PCNs) to deep architectures. It augments the free energy function with linear feedback matrices V_{L→i} that directly couple selected hidden states h_i to the clamped output error. This is claimed to deliver a depth-independent additive drive on the inference dynamics, bypassing the exponential attenuation of error signals through the Jacobian chain in vanilla PC, while preserving the local PC synaptic update rule. The central empirical claim is that HEP trains MLPs of up to 128 layers on MNIST and Fashion-MNIST with accuracy robust to depth.

Significance. If the locality of all updates (including for the V matrices) is rigorously preserved and the depth-robust accuracy is demonstrated with proper controls, the result would be significant for the field of biologically plausible alternatives to backprop, as it directly targets the vanishing-signal obstacle that has confined PCNs to shallow networks.

major comments (2)

[Abstract] (paragraph describing HEP): The claim that the linear coupling via V_{L→i} 'preserves the local PC synaptic update rule' is load-bearing for the contribution but lacks a derivation showing the update for V itself. If V_{L→i} are trainable parameters, their learning rule would require non-local information spanning from layer L to layer i, which cannot be realized by the standard local PC rule (product of adjacent pre- and post-synaptic activities). This distinction must be resolved to substantiate the 'local-learning alternative' framing.

[Abstract] (empirical claim): The statement that HEP 'effectively trains MLPs of up to 128 layers with accuracy that is robust with respect to depth' is central but unsupported by any protocol, baseline (standard PC or backprop), ablation on V, number of runs, or error bars in the supplied text; without these the robustness claim cannot be evaluated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments. We address each of the major comments below and indicate where revisions will be made to the manuscript.

Referee: [Abstract] (paragraph describing HEP): The claim that the linear coupling via V_{L→i} 'preserves the local PC synaptic update rule' is load-bearing for the contribution but lacks a derivation showing the update for V itself. If V_{L→i} are trainable parameters, their learning rule would require non-local information spanning from layer L to layer i, which cannot be realized by the standard local PC rule (product of adjacent pre- and post-synaptic activities). This distinction must be resolved to substantiate the 'local-learning alternative' framing.

Authors: The V matrices are fixed feedback connections that do not undergo learning; they are architectural augmentations to the free energy function. Consequently, there is no update rule for V, and the standard local PC synaptic updates for the weight matrices are preserved without modification. The full manuscript elaborates on this structure. We will revise the abstract and relevant sections to explicitly note that the V matrices are fixed and not subject to the PC learning rule.

Revision: yes

Referee: [Abstract] (empirical claim): The statement that HEP 'effectively trains MLPs of up to 128 layers with accuracy that is robust with respect to depth' is central but unsupported by any protocol, baseline (standard PC or backprop), ablation on V, number of runs, or error bars in the supplied text; without these the robustness claim cannot be evaluated.

Authors: The abstract provides a high-level summary of the results. The full manuscript contains the detailed experimental setup, including protocols, baselines such as standard PC and backpropagation, ablations on the role of V, statistics over multiple runs, and error bars, all demonstrating the depth-independent performance on MNIST and Fashion-MNIST. We believe the claim is substantiated in the body of the paper.

Revision: no
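
The locality point in the first response can be sketched in code. This is an illustration under assumptions, not the paper's update rule: it assumes a linear prediction, so that the weight step is the outer product of the postsynaptic error and the presynaptic activity, and it treats V as a constant that the learning step never touches, as the simulated authors state. The helper names and all numbers are hypothetical.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def local_weight_step(W, pre, err, lr):
    """Return W + lr * err pre^T, using only the two layers that W connects."""
    delta = outer(err, pre)
    return [[w + lr * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Hypothetical toy sizes: 2 presynaptic units, 3 postsynaptic units.
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
V = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]  # fixed highway matrix (made-up values)
V_before = [row[:] for row in V]

# The step reads only adjacent-layer quantities and never takes V as an input.
W = local_weight_step(W, pre=[1.0, 2.0], err=[0.5, 0.0, -1.0], lr=0.1)
```

In this reading the highway influences learning only indirectly, by changing the hidden states and errors that the local rule sees after inference. Whether that matches the full manuscript cannot be checked from the abstract alone.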

Circularity Check

0 steps flagged

No significant circularity; structural augmentation with empirical results

The paper proposes HEP by augmenting the free energy function with linear feedback matrices V_{L→i} that couple hidden states to output error. This is presented as a direct structural change that yields depth-independent drive by construction of the added linear terms, while the claim of preserved local PC updates is asserted as a consequence of the formulation. No parameters are fitted to data and then relabeled as predictions, no self-citations form the load-bearing justification, and the MNIST/Fashion-MNIST accuracy results are reported as experimental outcomes rather than derived tautologies. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Only the abstract is available, so the ledger records the new feedback matrices introduced by the paper; no free parameters or background axioms are explicitly stated in the provided text.

invented entities (1)

feedback matrices V_{L→i} (no independent evidence)
purpose: to couple selected hidden states directly to the clamped output error in a linear fashion
Introduced in the abstract to create the highway pathway that bypasses exponential attenuation.
