A Synthesizable RTL Implementation of Predictive Coding Networks

Timothy Oh

arXiv: 2603.18066 · v2 · submitted 2026-03-18 · cs.NE · cs.AI · cs.AR · cs.LG

Pith reviewed 2026-05-15 09:23 UTC · model grok-4.3

classification: cs.NE · cs.AI · cs.AR · cs.LG

keywords: predictive coding · RTL implementation · hardware neural networks · local learning · synthesizable architecture · prediction error dynamics · digital substrate · boundary clamping

The pith

A synthesizable RTL design executes predictive coding learning dynamics directly in hardware using only local layer updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a complete digital hardware substrate for predictive coding networks rather than proposing a new learning algorithm. Each neuron core holds its own activity, prediction error, and weights while exchanging signals only with adjacent layers through fixed connections. A uniform clamping operation on boundary neurons enables supervised learning and inference without altering the internal update schedule. The architecture relies on a sequential multiply-accumulate datapath driven by a fixed finite-state machine so that the system evolves deterministically under local rules. Task behavior is imposed through connectivity, parameters, and clamps instead of any instruction stream inside the learning engine.

Core claim

The central claim is a deterministic, synthesizable RTL architecture that directly implements discrete-time predictive coding updates. Each neural core maintains activity, prediction error, and synaptic weights and communicates solely with neighboring layers. Supervised learning and inference are realized by a uniform per-neuron clamping primitive that sets boundary conditions while leaving the fixed local update schedule unchanged. The design uses a sequential MAC datapath and finite-state controller so the hardware evolves under local prediction-error dynamics with task structure supplied externally through wiring and parameters.
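The tick-plus-clamp behaviour described above can be sketched in software. What follows is a minimal Python sketch, assuming one common predictive-coding formulation (tanh activations, forward predictions, squared-error energy). The names `pc_tick`, `energy`, and `demo`, the use of `None` to mark a free neuron, and the roles given to `alpha` (weight step) and `gamma` (state step) are illustrative assumptions, not the paper's RTL or its exact equations.

```python
import math


def pc_tick(x, W, clamp, alpha, gamma):
    """Advance the network by one tick and return (new_x, err).

    x[l]     : activities of layer l (list of floats)
    W[l]     : weights predicting layer l+1 from layer l, indexed W[l][i][j]
    clamp[l] : per-neuron clamp value, or None for a free neuron
    """
    n = len(x)
    # Prediction error of each layer, computed from the layer below only.
    err = [[0.0] * len(x[0])]
    for l in range(1, n):
        pre = [math.tanh(v) for v in x[l - 1]]
        err.append([
            x[l][i] - sum(W[l - 1][i][j] * pre[j] for j in range(len(pre)))
            for i in range(len(x[l]))
        ])
    # State update: clamped neurons take their boundary value,
    # free neurons take one Euler step on the local error signals.
    new_x = []
    for l in range(n):
        row = []
        for i, v in enumerate(x[l]):
            if clamp[l][i] is not None:
                row.append(clamp[l][i])
                continue
            back = 0.0
            if l + 1 < n:  # feedback from the layer directly above
                back = (1.0 - math.tanh(v) ** 2) * sum(
                    W[l][k][i] * err[l + 1][k] for k in range(len(x[l + 1]))
                )
            row.append(v + gamma * (back - err[l][i]))
        new_x.append(row)
    # Local weight update: postsynaptic error times presynaptic activity.
    for l in range(1, n):
        pre = [math.tanh(v) for v in x[l - 1]]
        for i in range(len(x[l])):
            for j in range(len(pre)):
                W[l - 1][i][j] += alpha * err[l][i] * pre[j]
    return new_x, err


def energy(err):
    """Half the sum of squared prediction errors."""
    return 0.5 * sum(e * e for layer in err for e in layer)


def demo(ticks=200):
    """Run a hypothetical 2-2-1 network with input and target clamped."""
    x = [[0.5, -0.3], [0.0, 0.0], [0.8]]
    W = [[[0.1, -0.2], [0.3, 0.2]], [[0.4, -0.1]]]
    clamp = [[0.5, -0.3], [None, None], [0.8]]
    energies = []
    for _ in range(ticks):
        x, err = pc_tick(x, W, clamp, alpha=0.05, gamma=0.1)
        energies.append(energy(err))
    return x, energies
```

Calling `demo()` keeps the boundary neurons at their clamp values while the hidden activities and weights move to reduce the prediction-error energy. Dropping the target clamp (setting the last layer's entries to `None`) leaves the same function running as inference.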

What carries the argument

A per-neuron neural core performs the local prediction-error dynamics through a sequential MAC datapath and a fixed finite-state schedule; a uniform clamping primitive sets the boundary conditions.
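To make "sequential MAC datapath" concrete, here is a small cycle-level Python model, offered as a sketch under assumptions: one multiply-accumulate per clock, a single accumulator, and Python floats standing in for whatever number format the RTL uses. The class name `SequentialMac` and the helper `run_mac` are hypothetical and do not come from the paper.

```python
class SequentialMac:
    """Toy model of a datapath that does one multiply-accumulate per clock."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.acc = 0.0
        self.idx = 0
        self.busy = False

    def start(self):
        """Clear the accumulator and begin a new dot product."""
        self.acc = 0.0
        self.idx = 0
        self.busy = len(self.weights) > 0

    def clock(self, inputs):
        """Advance one cycle: consume one weight/input pair."""
        if not self.busy:
            return
        self.acc += self.weights[self.idx] * inputs[self.idx]
        self.idx += 1
        if self.idx == len(self.weights):
            self.busy = False


def run_mac(weights, inputs):
    """Clock the MAC until it finishes; return (result, cycles used)."""
    mac = SequentialMac(weights)
    mac.start()
    cycles = 0
    while mac.busy:
        mac.clock(inputs)
        cycles += 1
    return mac.acc, cycles
```

For example, `run_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` returns `(32.0, 3)`: the dot product takes as many cycles as the neuron has inputs, which is the area-for-latency trade a sequential datapath makes.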

If this is right

Local per-core updates remove the requirement for centralized memory traffic and global error propagation that backpropagation demands.
The same hardware substrate performs both inference and learning phases without any change to its internal schedule.
Replicating identical cores with hardwired inter-layer links produces larger networks while preserving the fixed local rule set.
Different tasks can be realized on identical hardware simply by changing connectivity patterns and boundary clamping values.
No program counter or instruction fetch is required inside the learning substrate because behavior is encoded in parameters and wiring.
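The point about one substrate serving both learning and inference can be illustrated by how boundary conditions might be specified. The helper below is a hedged Python sketch: `make_clamps` and the `None`-means-free convention are assumptions made for illustration, and the 2 → 4 → 3 shape is borrowed from the figures only as an example.

```python
def make_clamps(layer_sizes, inputs, targets=None):
    """Build per-neuron clamp values; None marks a free (unclamped) neuron."""
    if len(inputs) != layer_sizes[0]:
        raise ValueError("inputs must match the first layer size")
    clamps = [[None] * n for n in layer_sizes]
    clamps[0] = list(inputs)
    if targets is not None:
        if len(targets) != layer_sizes[-1]:
            raise ValueError("targets must match the last layer size")
        clamps[-1] = list(targets)
    return clamps


# Same network shape, two boundary conditions.
inference = make_clamps([2, 4, 3], inputs=[0.1, 0.9])
learning = make_clamps([2, 4, 3], inputs=[0.1, 0.9], targets=[1.0, 0.0, 0.0])
```

Only the clamp table differs between the two modes; nothing about the update schedule is parameterised by the task.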

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such an architecture could lower energy use in edge devices by eliminating the data movement costs typical of backpropagation implementations.
The fixed schedule opens the possibility of very low-power ASIC realizations where timing is fully deterministic.
Extending the design to asynchronous handshakes between layers could relax the strict synchronous assumption while retaining local dynamics.
Direct hardware measurement of convergence speed versus software floating-point versions would test whether the discrete mapping introduces any systematic bias.

Load-bearing premise

The discrete-time predictive coding update equations can be mapped onto a fixed finite-state schedule and sequential MAC datapath without losing the essential local dynamics or needing extra global coordination.
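For orientation, one common discrete-time form of the predictive coding updates is written out below. This is an assumed textbook-style formulation, not a transcription of the paper's equations: the activation $f$, the layer indexing, and the interpretation of $\gamma$ as the state step size and $\alpha$ as the weight step size are all assumptions.

```latex
\begin{aligned}
\mu_l^{(t)}         &= W_l^{(t)}\, f\!\left(x_{l-1}^{(t)}\right)
                     && \text{(prediction of layer } l \text{ from layer } l-1\text{)} \\
\varepsilon_l^{(t)} &= x_l^{(t)} - \mu_l^{(t)}
                     && \text{(prediction error)} \\
x_l^{(t+1)}         &= x_l^{(t)} + \gamma\left(-\varepsilon_l^{(t)}
                       + f'\!\left(x_l^{(t)}\right) \odot
                       \left(W_{l+1}^{(t)}\right)^{\!\top} \varepsilon_{l+1}^{(t)}\right)
                     && \text{(Euler state update, free neurons only)} \\
W_l^{(t+1)}         &= W_l^{(t)} + \alpha\, \varepsilon_l^{(t)}\,
                       f\!\left(x_{l-1}^{(t)}\right)^{\!\top}
                     && \text{(local weight update)}
\end{aligned}
```

Every right-hand side uses only quantities from layer $l$ and its immediate neighbours, which is the locality the premise depends on. A clamped neuron simply has $x_l^{(t+1)}$ overwritten by its boundary value.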

What would settle it

Synthesize the RTL to an FPGA, configure a small two-layer network with known weights, apply identical input clamping, and verify whether the observed hardware weight changes and error signals match a software reference simulation of the same update equations.
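The comparison step of such an experiment could look like the following Python sketch. It assumes both the hardware run and the software reference have been dumped as per-tick lists of signal values in the same order; the function name `first_divergence` and the default tolerance are hypothetical choices, not part of the paper.

```python
def first_divergence(hw_trace, ref_trace, tol=1e-6):
    """Return (tick, signal_index) of the first mismatch, or None if traces agree.

    Each trace is a list of ticks; each tick is a list of signal values
    (for example weights followed by error signals) in a fixed order.
    """
    if len(hw_trace) != len(ref_trace):
        raise ValueError("traces must cover the same number of ticks")
    for tick, (hw, ref) in enumerate(zip(hw_trace, ref_trace)):
        if len(hw) != len(ref):
            raise ValueError(f"tick {tick}: signal count differs")
        for idx, (a, b) in enumerate(zip(hw, ref)):
            if abs(a - b) > tol:
                return tick, idx
    return None
```

Reporting the first divergent tick and signal, rather than a single pass/fail flag, points directly at the FSM stage or datapath step worth inspecting.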

Figures

Figures reproduced from arXiv: 2603.18066 by Timothy Oh.

**Figure 1.** Training curve for the 2 → 4 → 3 ReLU network. After a brief transient at epoch 1, MSE descends rapidly then settles into a slow-improvement plateau.

**Figure 2.** Training curve for the 2 → 2 → 1 tanh network. MSE drops more than two orders of magnitude by epoch 3 and subsequently improves slowly to a small residual. Section 8.3, Architectural scaling: To test whether the local update dynamics generalise across network sizes, the same tick schedule, clamping interface, and training protocol are applied to three architectures using a single parameterised testbench (tb_scale_fu…

**Figure 3.** Training curves for three architectures of increasing dimension, trained on 256 …

**Figure 4.** Stability phase diagram for the 2 → 4 → 3 architecture over an 8 × 8 grid of (α, γ) pairs. All 64 points are either converged (green) or stagnated (orange). Tick-budget sweep: a separate sweep fixes (α, γ) = (0.01, 0.04) and varies the per-sample inference tick budget T ∈ {1, 2, 5, 10, 20, 50, 100, 200, 500} over 20 epochs ( …

**Figure 5.** Final MSE after 20 epochs as a function of inference tick budget

**Figure 6.** Online learning curves (no replay, one pass through data) for the PC network and …

**Figure 7.** Per-tick agreement between float64 Python reference and float32-emulated variant
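A per-tick precision comparison of the kind named in the Figure 7 caption can be sketched with the Python standard library alone. The sketch below is an assumption about how such an emulation might be set up, not the paper's script: `to_f32` rounds through a 4-byte float with `struct`, and `per_tick_gap` applies it after every arithmetic operation on a toy scalar recurrence rather than on a full network.

```python
import struct


def to_f32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def per_tick_gap(ticks=50, gamma=0.1, target=0.8):
    """Run x <- x + gamma * (target - x) in both precisions.

    Returns the absolute float64/float32 gap after each tick.
    """
    x64 = 0.0
    x32 = 0.0
    g32, t32 = to_f32(gamma), to_f32(target)
    gaps = []
    for _ in range(ticks):
        x64 = x64 + gamma * (target - x64)
        # Round after every operation to mimic a float32 datapath.
        diff = to_f32(t32 - x32)
        step = to_f32(g32 * diff)
        x32 = to_f32(x32 + step)
        gaps.append(abs(x64 - x32))
    return gaps
```

Plotting the returned list against the tick index gives the same kind of agreement curve: a small gap that stays bounded indicates the reduced precision is not accumulating into a systematic drift for this recurrence.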

Original abstract

Backpropagation has enabled modern deep learning but is difficult to realize as an online, fully distributed hardware learning system due to global error propagation, phase separation, and heavy reliance on centralized memory. Predictive coding offers an alternative in which inference and learning arise from local prediction-error dynamics between adjacent layers. This paper presents a digital architecture that implements a discrete-time predictive coding update directly in hardware. Each neural core maintains its own activity, prediction error, and synaptic weights, and communicates only with adjacent layers through hardwired connections. Supervised learning and inference are supported via a uniform per-neuron clamping primitive that enforces boundary conditions while leaving the internal update schedule unchanged. The design is a deterministic, synthesizable RTL substrate built around a sequential MAC datapath and a fixed finite-state schedule. Rather than executing a task-specific instruction sequence inside the learning substrate, the system evolves under fixed local update rules, with task structure imposed through connectivity, parameters, and boundary conditions. The contribution of this work is not a new learning rule, but a complete synthesizable digital substrate that executes predictive-coding learning dynamics directly in hardware.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper sketches an RTL architecture for predictive coding but leaves the actual correctness unverified.

The key point is that this is an implementation paper, not a theoretical one. It describes a synthesizable RTL design for predictive coding networks using per-core storage for activity, error, and weights, with communication only between adjacent layers and a clamping mechanism for boundaries. It does a decent job outlining how to map the discrete-time PC updates to a digital datapath with a fixed finite-state machine. The clamping primitive is presented as a way to support supervised learning without altering the internal update rules, which is a practical touch.

The main weakness is the complete absence of implementation evidence: no synthesis results, no resource usage, no simulation waveforms, no comparison to other hardware. The claim that this executes the PC dynamics directly rests entirely on the description, which makes it hard to assess whether the design is correct or efficient.

There's also the question of whether the sequential MAC and fixed schedule truly keep things local. Predictive coding relies on independent local updates, but a global clock and FSM could introduce synchronization that isn't in the math model. The paper asserts it stays local, but without details or tests, it's unclear whether that's accurate.

This work would interest hardware designers focused on alternative learning rules for neuromorphic or edge devices. A reader looking for concrete RTL ideas might find the architecture description useful as a starting point. I think it deserves peer review if the authors provide the missing verification and clarify the schedule's impact on locality. Otherwise, it's too preliminary.

Referee Report

2 major / 1 minor

Summary. The paper claims to deliver a complete synthesizable RTL digital substrate that directly executes discrete-time predictive-coding dynamics in hardware. Each neural core maintains local activity, prediction error, and weights; communication is restricted to hardwired adjacent-layer links; supervised learning and inference are realized by a uniform clamping primitive that leaves the internal fixed finite-state schedule unchanged. The architecture is built around a sequential MAC datapath and a deterministic FSM, with task structure imposed only through connectivity, parameters, and boundary conditions rather than instruction sequences.

Significance. If the mapping from the mathematical PC update rules to the fixed hardware schedule is shown to preserve strictly local dynamics, the work would supply a concrete, synthesizable hardware substrate for distributed online learning that avoids backpropagation’s global error propagation and centralized memory requirements. This would be a useful engineering contribution in the neural-engineering and hardware-ML communities.

major comments (2)

[Architecture description (throughout)] The central claim that the design constitutes a working synthesizable RTL substrate rests on an unverified implementation. No simulation waveforms, timing diagrams, post-synthesis resource numbers, or functional verification results are presented to confirm that the sequential MAC operations and fixed FSM schedule correctly realize the discrete-time PC equations without introducing ordering or synchronization artifacts.
[Update schedule and datapath mapping] The weakest assumption—that the discrete-time PC update equations map to a fixed finite-state schedule and sequential MAC datapath while preserving strictly local, per-neuron dynamics—is not demonstrated. The shared clock and global FSM schedule may implicitly enforce layer-wide synchronous updates that are not part of the original local prediction-error rules; no formal equivalence argument or cycle-accurate trace is supplied.
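What an "ordering artifact" can look like is easy to show on a toy system. The Python sketch below uses two mutually coupled variables, which is an illustrative assumption and not the paper's update rule: the same equations give different results depending on whether both updates read a snapshot of the old state or the second update sees the first one's new value.

```python
def synchronous_step(a, b, gamma):
    """Both variables read the old values, then commit together."""
    return a + gamma * (b - a), b + gamma * (a - b)


def in_place_step(a, b, gamma):
    """The second update already sees the first variable's new value."""
    a = a + gamma * (b - a)
    b = b + gamma * (a - b)
    return a, b
```

Starting from `a = 1.0`, `b = 0.0` with `gamma = 0.5`, the synchronous version returns `(0.5, 0.5)` while the in-place version returns `(0.5, 0.25)`. A cycle-accurate trace would show which of these two behaviours a given FSM schedule actually produces.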

minor comments (1)

[Notation and primitives] Notation for the clamping primitive and the per-neuron state variables could be made more explicit (e.g., by adding a small table relating mathematical symbols to RTL signals).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential engineering contribution of a synthesizable RTL substrate for predictive coding. We address each major comment below and will revise the manuscript accordingly.

Referee: [Architecture description (throughout)] The central claim that the design constitutes a working synthesizable RTL substrate rests on an unverified implementation. No simulation waveforms, timing diagrams, post-synthesis resource numbers, or functional verification results are presented to confirm that the sequential MAC operations and fixed FSM schedule correctly realize the discrete-time PC equations without introducing ordering or synchronization artifacts.

Authors: We agree that the absence of explicit verification artifacts weakens the central claim. The current manuscript provides a detailed architectural description and RTL-level mapping but does not include simulation or synthesis results. In the revised version we will add post-synthesis resource numbers for a standard FPGA target, timing diagrams of the FSM schedule, and cycle-accurate simulation waveforms that confirm the sequential MAC operations realize the discrete-time PC equations without ordering or synchronization artifacts. revision: yes
Referee: [Update schedule and datapath mapping] The weakest assumption—that the discrete-time PC update equations map to a fixed finite-state schedule and sequential MAC datapath while preserving strictly local, per-neuron dynamics—is not demonstrated. The shared clock and global FSM schedule may implicitly enforce layer-wide synchronous updates that are not part of the original local prediction-error rules; no formal equivalence argument or cycle-accurate trace is supplied.

Authors: Each neural core updates its activity, error, and weights using only locally stored values and hardwired signals from adjacent layers; the global FSM merely sequences these local operations in an order that matches the discrete-time PC equations (error computation precedes weight update within the same time step). The shared clock does not propagate non-local information beyond the intended adjacent-layer connectivity. In the revision we will supply a formal mapping from each PC update rule to specific FSM states together with cycle-accurate execution traces demonstrating preservation of per-neuron locality. revision: yes
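The ordering argument in this response can be made checkable with a tiny model of the schedule. The Python sketch below takes the stage names from the paper passage quoted on this page (PRED→ERR→BACKSUM→BACKVEC→WUP→STATE); what each stage computes is not spelled out here, so the sketch models only the fixed order. The names `Stage` and `schedule` are hypothetical.

```python
from enum import Enum
from itertools import cycle, islice


class Stage(Enum):
    """FSM stages in the order quoted from the paper."""
    PRED = 0
    ERR = 1
    BACKSUM = 2
    BACKVEC = 3
    WUP = 4
    STATE = 5


def schedule(n_ticks):
    """Return the stage sequence for n_ticks ticks of the fixed schedule."""
    stages = list(Stage)
    return list(islice(cycle(stages), n_ticks * len(stages)))
```

With this in hand, a property such as "error computation precedes weight update within the same tick" becomes a one-line check: `schedule(1).index(Stage.ERR) < schedule(1).index(Stage.WUP)`.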

Circularity Check

0 steps flagged

No circularity: implementation paper maps existing equations to RTL without self-referential derivation

The paper describes a synthesizable RTL hardware substrate for executing pre-existing discrete-time predictive coding update rules. No mathematical result is derived from fitted parameters, no self-citation chain is load-bearing for a uniqueness claim, and no ansatz or renaming is presented as a new prediction. The central contribution is the mapping to a fixed FSM and sequential MAC datapath, whose correctness is assessed against standard digital design rules rather than reducing to the paper's own outputs by construction. The provided abstract and reader's summary contain no equations or citations that exhibit any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard digital logic assumptions and the existence of a discrete-time predictive coding formulation; no new free parameters, invented entities, or ad-hoc axioms are introduced beyond conventional RTL synthesis rules.

axioms (1)

[standard math] Standard RTL synthesis and timing-closure rules apply to the sequential MAC datapath and finite-state schedule. The design assumes that conventional digital hardware constraints and synthesis tools will produce a working circuit from the described schedule.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The design is a deterministic, synthesizable RTL substrate built around a sequential MAC datapath and a fixed finite-state schedule... Each tick performs one explicit Euler-style state update together with one local synaptic update."

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction and 8-tick period forcing
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The architecture implements a tick-based discrete-time variant... FSM stages: PRED→ERR→BACKSUM→BACKVEC→WUP→STATE"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
