pith. sign in

arxiv: 2606.22744 · v1 · pith:CD3YBCRFnew · submitted 2026-06-22 · 💻 cs.LG · cs.NE

Error Highways: Scaling Predictive Coding to Very Deep Networks

Pith reviewed 2026-06-26 09:40 UTC · model grok-4.3

classification 💻 cs.LG cs.NE
keywords predictive codingdeep networkserror propagationhighway networksMNISTFashion-MNISTlocal learningbackpropagation alternative
0
0 comments X

The pith

Highway error propagation lets predictive coding networks train 128-layer MLPs with depth-robust accuracy on MNIST.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes highway error propagation to solve the rapid decay of learning signals that prevents standard predictive coding from scaling beyond shallow networks. It augments the free energy function with linear feedback matrices that directly couple hidden states to output error, delivering a correction whose size stays constant across layers. This change leaves the local synaptic update rules unchanged yet bypasses the exponential attenuation that occurs in vanilla predictive coding. A sympathetic reader would care because the result shows a biologically plausible local-learning method can reach depths where backpropagation alternatives previously failed.

Core claim

Augmenting the free energy function with feedback matrices V_{L→i} that couple selected hidden states directly to the clamped output error supplies a depth-independent correction at every inference step; because the coupling is linear, the highway pathway reaches interior layers without the exponential decay that occurs when error propagates through the usual Jacobian chain, while the local PC synaptic update rule remains intact.

What carries the argument

Highway error propagation (HEP) via linear feedback matrices V_{L→i} that couple hidden states directly to output error.

If this is right

  • MLPs up to 128 layers deep can be trained effectively under predictive coding on MNIST and Fashion-MNIST.
  • Classification accuracy remains stable as network depth increases when HEP is used.
  • The error correction magnitude delivered to any hidden layer stays independent of total depth.
  • The local synaptic update rules of predictive coding continue to apply without modification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar linear bypass paths might be added to other local-learning schemes to improve depth scaling.
  • The approach could be tested on datasets larger than MNIST to check whether the depth robustness generalizes.
  • Because the correction is linear, hardware implementations that support only local updates might adopt HEP without extra global communication.

Load-bearing premise

Inserting the linear feedback matrices supplies a depth-independent correction at every inference step while leaving the local PC synaptic update rule intact.

What would settle it

Train identical 128-layer MLPs on MNIST once with HEP and once with standard predictive coding, then compare final test accuracy to check whether the non-HEP version shows clear depth-dependent degradation.

Figures

Figures reproduced from arXiv: 2606.22744 by Alexander G. Ororbia, Amirhossein Mohammadi.

Figure 1
Figure 1. Figure 1: Per-layer RMS prediction error ∥eℓ∥ (color) versus layer (0 = input) and inference iteration for a depth-64 MLP. In (a) vanilla PC and (b) µPC the output error only crawls back along the network, stalling deep and leaving the input-side layers at the numerical floor; (c) HEP’s error highways inject it directly into the hidden layers, guiding the learning procedure early in inference. problem better conditi… view at source ↗
Figure 2
Figure 2. Figure 2: Our proposed error highways in a deep PCN. Each (error) highway VL→i transmits the output error ϵL straight to a selected hidden layer i ∈ S within one hop (or E-step), such that the correction applied to that layer does not decay with depth. This stands in contrast to a vanilla PCN, where errors lose a factor of λ per hop. (specifically, the output’s dedicated pair of error units). The projection, in its … view at source ↗
Figure 3
Figure 3. Figure 3: Highway strength α vs. test accuracy at depth 64 (MNIST). that the proposed error highways deliver the output error to the interior of the network from the very first inference steps. Sweeping the highway strength. If the highway term is what enables learning at depth, turning it off should trace a path back to vanilla PC. At depth 64 on MNIST we sweep α in Eq. 7 with every other setting fixed ( [PITH_FUL… view at source ↗
Figure 4
Figure 4. Figure 4: Per-layer RMS prediction error versus layer and inference step for the same depth-64 MLP as Figure 1, [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Energy of selected layers over inference steps under plain gradient descent. Only the layers next to the [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Energy of the same layers under Adam. Every tracked layer activates early and its energy levels off at a [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗
read the original abstract

Predictive coding networks (PCNs) offer a biologically-plausible, local-learning alternative to back-propagation of errors (backprop). Nevertheless, they have remained largely confined to shallow architectures and evaluated on simple machine intelligence benchmarks. A central obstacle to scaling PCNs is that the learning signal decays rapidly as it propagates away from the clamped boundaries, leaving interior layers effectively unchanged. To directly counter this problem, we propose highway error propagation (HEP), a scheme that augments the free energy function underlying predictive coding (PC) by altering its neural structure with feedback matrices $V_{L\to i}$ that couple selected hidden states directly to the clamped output error. Since this coupling is linear in the hidden state, the highway pathway delivers a correction at every inference step whose magnitude is independent of depth, in contrast to vanilla PC where the output error reaches the $i$-th hidden layer with attenuation that decays exponentially in depth. This bypasses the Jacobian chain while preserving the local PC synaptic update rule. On MNIST and Fashion-MNIST, we show that HEP effectively trains MLPs of up to 128 layers with accuracy that is robust with respect to depth.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes highway error propagation (HEP) to scale predictive coding networks (PCNs) to deep architectures. It augments the free energy function with linear feedback matrices V_{L→i} that directly couple selected hidden states h_i to the clamped output error. This is claimed to deliver a depth-independent additive drive on the inference dynamics, bypassing the exponential attenuation of error signals through the Jacobian chain in vanilla PC, while preserving the local PC synaptic update rule. The central empirical claim is that HEP trains MLPs of up to 128 layers on MNIST and Fashion-MNIST with accuracy robust to depth.

Significance. If the locality of all updates (including for the V matrices) is rigorously preserved and the depth-robust accuracy is demonstrated with proper controls, the result would be significant for the field of biologically plausible alternatives to backprop, as it directly targets the vanishing-signal obstacle that has confined PCNs to shallow networks.

major comments (2)
  1. [Abstract] Abstract (paragraph describing HEP): the claim that the linear coupling via V_{L→i} 'preserves the local PC synaptic update rule' is load-bearing for the contribution but lacks a derivation showing the update for V itself. If V_{L→i} are trainable parameters, their learning rule would require non-local information spanning from layer L to layer i, which cannot be realized by the standard local PC rule (product of adjacent pre- and post-synaptic activities). This distinction must be resolved to substantiate the 'local-learning alternative' framing.
  2. [Abstract] Abstract (empirical claim): the statement that HEP 'effectively trains MLPs of up to 128 layers with accuracy that is robust with respect to depth' is central but unsupported by any protocol, baseline (standard PC or backprop), ablation on V, number of runs, or error bars in the supplied text; without these the robustness claim cannot be evaluated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments. We address each of the major comments below and indicate where revisions will be made to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph describing HEP): the claim that the linear coupling via V_{L→i} 'preserves the local PC synaptic update rule' is load-bearing for the contribution but lacks a derivation showing the update for V itself. If V_{L→i} are trainable parameters, their learning rule would require non-local information spanning from layer L to layer i, which cannot be realized by the standard local PC rule (product of adjacent pre- and post-synaptic activities). This distinction must be resolved to substantiate the 'local-learning alternative' framing.

    Authors: The V matrices are fixed feedback connections that do not undergo learning; they are architectural augmentations to the free energy function. Consequently, there is no update rule for V, and the standard local PC synaptic updates for the weight matrices are preserved without modification. The full manuscript elaborates on this structure. We will revise the abstract and relevant sections to explicitly note that the V matrices are fixed and not subject to the PC learning rule. revision: yes

  2. Referee: [Abstract] Abstract (empirical claim): the statement that HEP 'effectively trains MLPs of up to 128 layers with accuracy that is robust with respect to depth' is central but unsupported by any protocol, baseline (standard PC or backprop), ablation on V, number of runs, or error bars in the supplied text; without these the robustness claim cannot be evaluated.

    Authors: The abstract provides a high-level summary of the results. The full manuscript contains the detailed experimental setup, including protocols, baselines such as standard PC and backpropagation, ablations on the role of V, statistics over multiple runs, and error bars, all demonstrating the depth-independent performance on MNIST and Fashion-MNIST. We believe the claim is substantiated in the body of the paper. revision: no

Circularity Check

0 steps flagged

No significant circularity; structural augmentation with empirical results

full rationale

The paper proposes HEP by augmenting the free energy function with linear feedback matrices V_{L→i} that couple hidden states to output error. This is presented as a direct structural change that yields depth-independent drive by construction of the added linear terms, while the claim of preserved local PC updates is asserted as a consequence of the formulation. No parameters are fitted to data and then relabeled as predictions, no self-citations form the load-bearing justification, and the MNIST/Fashion-MNIST accuracy results are reported as experimental outcomes rather than derived tautologies. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Only the abstract is available, so the ledger records the new feedback matrices introduced by the paper; no free parameters or background axioms are explicitly stated in the provided text.

invented entities (1)
  • feedback matrices V_{L o i} no independent evidence
    purpose: to couple selected hidden states directly to the clamped output error in a linear fashion
    Introduced in the abstract to create the highway pathway that bypasses exponential attenuation.

pith-pipeline@v0.9.1-grok · 5732 in / 1248 out tokens · 28493 ms · 2026-06-26T09:40:15.137138+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references · 5 canonical work pages · 3 internal anchors

  1. [1]

    How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation

    BENGIO, Y. How auto-encoders could provide credit assignment in deep networks via target propagation. arXiv preprint arXiv:1407.7906(2014)

  2. [2]

    Accelerated predictive coding networks via direct kolen–pollack feedback alignment, 2026

    CASNICI, D., LEFEBVRE, M., DAUWELS, J.,ANDFRENKEL, C. Accelerated predictive coding networks via direct kolen–pollack feedback alignment, 2026

  3. [3]

    P., LAIRD, N

    DEMPSTER, A. P., LAIRD, N. M.,ANDRUBIN, D. B. Maximum likelihood from incomplete data via the em algorithm.Journal of the royal statistical society: series B (methodological) 39, 1 (1977), 1–22

  4. [4]

    A theory of cortical responses.Philosophical Transactions of the Royal Society B: Biological Sciences 360, 1456 (04 2005), 815–836

    FRISTON, K. A theory of cortical responses.Philosophical Transactions of the Royal Society B: Biological Sciences 360, 1456 (04 2005), 815–836

  5. [5]

    epc: Fast and deep predictive coding for digital hardware, 2026

    GOEMAERE, C., OLIVIERS, G., BOGACZ, R.,ANDDEMEESTER, T. epc: Fast and deep predictive coding for digital hardware, 2026

  6. [6]

    H., KIM, H., SUNG, Y., JO, Y., KANG, M

    HA, M. H., KIM, H., SUNG, Y., JO, Y., KANG, M. S.,ANDLEE, S. W. Stable and scalable deep predictive coding networks with meta-prediction errors. InThe Fourteenth International Conference on Learning Representations(2026)

  7. [7]

    Hierarchy or heterarchy? a theory of long-range connections for the sensorimotor brain.arXiv preprint arXiv:2507.05888(2025)

    HAWKINS, J., LEADHOLM, N.,ANDCLAY, V. Hierarchy or heterarchy? a theory of long-range connections for the sensorimotor brain.arXiv preprint arXiv:2507.05888(2025)

  8. [8]

    Deep residual learning for image recognition

    HE, K., ZHANG, X., REN, S.,ANDSUN, J. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition(2016), pp. 770–778

  9. [9]

    M.,ANDBUCKLEY, C

    INNOCENTI, F., ACHOUR, E. M.,ANDBUCKLEY, C. L. µPC: Scaling predictive coding to 100+ layer networks. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems(2025)

  10. [10]

    P.,ANDBA, J

    KINGMA, D. P.,ANDBA, J. Adam: A method for stochastic optimization. InICLR (Poster)(2015)

  11. [11]

    MILLIDGE, B., TSCHANTZ, A.,ANDBUCKLEY, C. L. Predictive coding approximates backprop along arbitrary computation graphs.Neural Computation 34, 6 (05 2022), 1329–1368

  12. [12]

    Direct feedback alignment provides learning in deep neural networks

    NØKLAND, A. Direct feedback alignment provides learning in deep neural networks. InAdvances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain(2016), D. D. Lee, M. Sugiyama, U. von Luxburg, I. Guyon, and R. Garnett, Eds., pp. 1037–1045

  13. [13]

    The neural coding framework for learning generative models.Nature Communications 13, 1 (Apr

    ORORBIA, A.,ANDKIFER, D. The neural coding framework for learning generative models.Nature Communications 13, 1 (Apr. 2022)

  14. [14]

    L.,ANDKIFER, D

    ORORBIA, A., MALI, A., GILES, C. L.,ANDKIFER, D. Continual learning of recurrent neural networks by locally aligning distributed representations.IEEE transactions on neural networks and learning systems 31, 10 (2020), 4267–4278

  15. [15]

    L.,ANDKIFER, D

    ORORBIA, A., MALI, A., GILES, C. L.,ANDKIFER, D. Lifelong neural predictive coding: Learning cumulatively online without forgetting.Advances in Neural Information Processing Systems 35(2022), 5867–5881

  16. [16]

    ORORBIA, A. G. Brain-inspired machine intelligence: A survey of neurobiologically-plausible credit assignment.arXiv preprint arXiv:2312.09257(2023)

  17. [17]

    G.,ANDMALI, A

    ORORBIA, A. G.,ANDMALI, A. Biologically motivated algorithms for propagating local target representa- tions. InProceedings of the aaai conference on artificial intelligence(2019), vol. 33, pp. 4651–4658

  18. [18]

    G., MALI, A., KIFER, D.,ANDGILES, C

    ORORBIA, A. G., MALI, A., KIFER, D.,ANDGILES, C. L. Backpropagation-free deep learning with recursive local representation alignment.Proceedings of the AAAI Conference on Artificial Intelligence 37, 8 (2023), 9327–9335

  19. [19]

    On the difficulty of training recurrent neural networks

    PASCANU, R., MIKOLOV, T.,ANDBENGIO, Y. On the difficulty of training recurrent neural networks. In International conference on machine learning(2013), Pmlr, pp. 1310–1318

  20. [20]

    The entangled brain.Journal of cognitive neuroscience 35, 3 (2023), 349–360

    PESSOA, L. The entangled brain.Journal of cognitive neuroscience 35, 3 (2023), 349–360

  21. [21]

    Faster predictive coding networks via better initialization, 2026

    PINCHETTI, L., FRIEDER, S., LUKASIEWICZ, T.,ANDSALVATORI, T. Faster predictive coding networks via better initialization, 2026. 9 Preprint

  22. [22]

    Towards the training of deeper predictive coding neural networks, 2025

    QI, C., FORASASSI, M., LUKASIEWICZ, T.,ANDSALVATORI, T. Towards the training of deeper predictive coding neural networks, 2025

  23. [23]

    RAO, R. P. N.,ANDBALLARD, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects.Nature Neuroscience 2, 1 (1999), 79–87

  24. [24]

    E., HINTON, G

    RUMELHART, D. E., HINTON, G. E.,ANDWILLIAMS, R. J. Learning representations by back-propagating errors.nature 323, 6088 (1986), 533–536

  25. [25]

    L., LUKASIEWICZ, T., RAO, R

    SALVATORI, T., MALI, A., BUCKLEY, C. L., LUKASIEWICZ, T., RAO, R. P., FRISTON, K.,ANDORORBIA, A. A survey on neuro-mimetic deep learning via predictive coding.Neural Networks(2025), 108161

  26. [26]

    A stable, fast, and fully automatic learning algorithm for predictive coding networks

    SALVATORI, T., SONG, Y., YORDANOV, Y., MILLIDGE, B., SHA, L., EMDE, C., XU, Z., BOGACZ, R., ANDLUKASIEWICZ, T. A stable, fast, and fully automatic learning algorithm for predictive coding networks. InInternational Conference on Learning Representations(2024), vol. 2024, pp. 19607–19631

  27. [27]

    Recurrent dynamics in the cerebral cortex: Integration of sensory evidence with stored knowledge.Proceedings of the National Academy of Sciences 118, 33 (2021), e2101043118

    SINGER, W. Recurrent dynamics in the cerebral cortex: Integration of sensory evidence with stored knowledge.Proceedings of the National Academy of Sciences 118, 33 (2021), e2101043118

  28. [28]

    Highway Networks

    SRIVASTAVA, R. K., GREFF, K.,ANDSCHMIDHUBER, J. Highway networks.arXiv preprint arXiv:1505.00387(2015)

  29. [29]

    WHITTINGTON, J. C. R.,ANDBOGACZ, R. An approximation of the error backpropagation algorithm in a predictive coding network with local hebbian synaptic plasticity.Neural Computation 29, 5 (05 2017), 1229–1262

  30. [30]

    NeuroAI and Beyond: Bridging Between Advances in Neuroscience and ArtificialIntelligence

    ZADOR, A., FELLOUS, J.-M., SEJNOWSKI, T., ADAM, G., AIMONE, J. B., AKWABOAH, A., ALOIMONOS, Y., ALONSO, C. A., BARTOLOZZI, C., BENNINGTON, M. J.,ET AL. Neuroai and beyond: Bridging between advances in neuroscience and artificialintelligence.arXiv preprint arXiv:2604.18637(2026)

  31. [31]

    G., SRIVASTAVA, R

    ZILLY, J. G., SRIVASTAVA, R. K., KOUTNIK, J.,ANDSCHMIDHUBER, J. Recurrent highway networks. In International conference on machine learning(2017), PMLR, pp. 4189–4198. 10 Preprint A The Vanishing Learning Signal: Transient and Steady-State Derivations The analyses in this appendix consolidate two complementary results established by recent studies of dept...