
arxiv: 1907.07255 · v1 · pith:FGTRQOYI · submitted 2019-07-15 · 💻 cs.NE · cs.LG

Iterative temporal differencing with random synaptic feedback weights support error backpropagation for deep learning

Pith reviewed 2026-05-24 21:26 UTC · model grok-4.3

classification 💻 cs.NE cs.LG
keywords: error backpropagation, temporal differencing, random feedback alignment, biologically plausible learning, deep neural networks, STDP, activation functions, weight updates

The pith

Error backpropagation works without differentiable activation functions by using iterative temporal differencing with fixed random feedback weights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the derivative of an activation function, normally required for backpropagation, can be replaced by iterative temporal differencing that uses fixed random synaptic feedback alignment. This substitution keeps the weight updates functionally sufficient for training deep networks while removing the need for smooth, differentiable activations. The change makes the algorithm closer to biological rules such as spike-timing-dependent plasticity. Readers should care because it removes a major obstacle to building models that respect biological constraints on how neurons compute and communicate errors.

Core claim

Replacing the activation-function derivative in the backpropagation rule with iterative temporal differencing driven by fixed random feedback weights produces weight updates that support error-driven learning in deep networks without requiring differentiable activations.

What carries the argument

Iterative temporal differencing with fixed random synaptic feedback weights, which supplies the missing derivative signal in the weight-update equation.
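The substitution is easiest to see against the standard hidden-layer error signal. The sketch below assumes a one-hidden-layer network with sigmoid hidden units, a linear output and squared-error loss (the review specifies none of these). It computes the exact backprop delta and a feedback-alignment delta in the style of Lillicrap et al., where the transpose of the forward weights is swapped for a fixed random matrix `B`. The derivative factor `fprime` is left untouched here; it is the term the ITD proposal aims to replace. The name `hidden_deltas` is hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hidden_deltas(x, target, W1, W2, B):
    """Return (delta_bp, delta_fa) for the hidden layer of a 1-hidden-layer net.

    Assumed setup: sigmoid hidden units, linear output, loss 0.5 * ||y - target||^2.
    """
    h = sigmoid(W1 @ x)             # hidden activation f(a1)
    y = W2 @ h                      # linear output
    e = y - target                  # dL/dy
    fprime = h * (1.0 - h)          # f'(a1) for the sigmoid
    delta_bp = (W2.T @ e) * fprime  # exact backprop: transpose of forward weights
    delta_fa = (B @ e) * fprime     # feedback alignment: fixed random B instead
    return delta_bp, delta_fa

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(8, 4))   # 4 inputs -> 8 hidden units
W2 = rng.normal(scale=0.5, size=(3, 8))   # 8 hidden units -> 3 outputs
B = rng.normal(scale=0.5, size=(8, 3))    # drawn once and never trained
x, t = rng.normal(size=4), rng.normal(size=3)
d_bp, d_fa = hidden_deltas(x, t, W1, W2, B)
```

Passing `B = W2.T` makes the two deltas identical, which is a quick sanity check on any implementation of the feedback path.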

If this is right

  • Deep networks can be trained with non-differentiable or threshold activations.
  • The learning rule becomes compatible with spike-timing-dependent plasticity mechanisms.
  • Fixed random feedback weights suffice to propagate error signals across layers.
  • The approach supports direct integration of STDP-style error backpropagation into deep learning architectures.
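The review does not give the ITD formula, so the following is only one hypothetical reading, loosely suggested by the ITD-y variant named in the paper's Figure 2. In this reading, the activation derivative is replaced by the change in a hidden unit's output between two successive iterations, `h - h_prev`, and the error reaches the hidden layer through fixed random feedback `B`. Step (threshold) units are used to show that the update never evaluates a derivative. The function `itd_fa_update`, the learning rate and the linear readout are illustrative assumptions, not the paper's method.

```python
import numpy as np

def step(a):
    """Heaviside step activation: derivative is zero or undefined everywhere."""
    return (a > 0).astype(float)

def itd_fa_update(x, target, W1, W2, B, h_prev, lr=0.1):
    """One HYPOTHETICAL ITD + feedback-alignment update (not the paper's verified rule).

    f'(a) is replaced by the temporal difference h - h_prev of the hidden output
    between the previous iteration and this one. Returns new W1, W2 and the
    current hidden output, which becomes h_prev on the next call.
    """
    h = step(W1 @ x)                   # binary hidden output
    e = W2 @ h - target                # output error: linear readout, squared loss
    itd = h - h_prev                   # temporal difference standing in for f'(a)
    delta = (B @ e) * itd              # error sent back through fixed random B
    W2_new = W2 - lr * np.outer(e, h)  # output layer: ordinary delta rule
    W1_new = W1 - lr * np.outer(delta, x)
    return W1_new, W2_new, h

rng = np.random.default_rng(1)
W1 = rng.normal(size=(6, 3))
W2 = rng.normal(size=(2, 6))
B = rng.normal(size=(6, 2))
x, t = rng.normal(size=3), rng.normal(size=2)
h_prev = np.zeros(6)                   # no history before the first iteration
for _ in range(5):
    W1, W2, h_prev = itd_fa_update(x, t, W1, W2, B, h_prev)
```

Under this reading, a hidden unit whose binary output did not change since the previous iteration receives no error signal, so whether learning progresses depends on how often units flip.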

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could enable training of networks whose activations are strictly binary or discrete, which standard backpropagation cannot handle directly.
  • It may reduce sensitivity to the precise symmetry of forward and backward weights, addressing part of the weight-transport problem.
  • Similar temporal-differencing replacements could be tested in recurrent or spiking networks where exact derivatives are unavailable.

Load-bearing premise

Iterative temporal differencing combined with fixed random feedback alignment produces weight updates that are functionally equivalent to or sufficient for the gradients required by error backpropagation.

What would settle it

Train a deep network with non-differentiable activations (step functions) using the proposed method and compare final test accuracy against the same architecture trained with standard backpropagation; large consistent failure of the new method on tasks where backpropagation succeeds would falsify the claim.
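A minimal sketch of that test follows, under several assumptions. A toy XOR-like 2-D task stands in for a real benchmark. The network has one hidden layer rather than being deep. The baseline is sigmoid backprop. The candidate uses step units, fixed random feedback `B`, and a hypothetical `H_t - H_{t-1}` surrogate for the derivative, taken across full-batch epochs, because the review does not state the paper's exact rule. The function `run` and all hyperparameters are illustrative, and no accuracies are claimed here.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def step(a):
    return (a > 0).astype(float)

def run(method, seed=0, n_hid=32, epochs=200, lr=0.5):
    """Full-batch training of a 1-hidden-layer classifier; returns test accuracy.

    "bp":     sigmoid hidden units, exact backprop (baseline).
    "itd_fa": step hidden units, fixed random feedback B, and the HYPOTHETICAL
              surrogate f'(a) ~ H_t - H_{t-1} (change in hidden output since
              the previous epoch).
    """
    if method not in ("bp", "itd_fa"):
        raise ValueError(f"unknown method: {method}")
    rng = np.random.default_rng(seed)                 # same seed -> same data and init
    X = rng.normal(size=(500, 2))
    labels = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like quadrant task
    X = np.hstack([X, np.ones((500, 1))])             # constant input acts as a bias
    Xtr, ytr, Xte, yte = X[:400], labels[:400], X[400:], labels[400:]
    W1 = rng.normal(size=(3, n_hid))
    W2 = rng.normal(scale=0.1, size=(n_hid, 1))
    B = rng.normal(scale=0.1, size=(1, n_hid))        # fixed, never updated
    act = sigmoid if method == "bp" else step
    H_prev = np.zeros((len(Xtr), n_hid))
    for _ in range(epochs):
        H = act(Xtr @ W1)                              # hidden outputs, (400, n_hid)
        p = sigmoid(H @ W2)[:, 0]                      # predicted P(label = 1)
        E = (p - ytr)[:, None] / len(Xtr)              # cross-entropy grad w.r.t. logit
        if method == "bp":
            dA = (E @ W2.T) * H * (1.0 - H)            # exact chain rule
        else:
            dA = (E @ B) * (H - H_prev)                # hypothetical ITD + FA
        H_prev = H
        W2 -= lr * (H.T @ E)
        W1 -= lr * (Xtr.T @ dA)
    pred = sigmoid(act(Xte @ W1) @ W2)[:, 0] > 0.5
    return float(np.mean(pred == (yte > 0.5)))

for method in ("bp", "itd_fa"):
    accs = [run(method, seed=s) for s in range(5)]
    print(method, round(float(np.mean(accs)), 3))
```

Averaging over several seeds matters because the claim is about "large consistent failure"; a single run cannot distinguish that from an unlucky initialization.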

Figures

Figures reproduced from arXiv: 1907.07255 by Aras R. Dargazany.

Figure (page 1, untitled).
Figure 1. Vanilla backprop vs feedback alignment vs iterative temporal differencing.
Figure 2. The experimental results on the MNIST dataset, from top to bottom: FBA + ITD-y, FBA + ITD-dy, FBA, and VBP. Acronyms: iterative temporal differencing (ITD), feedback alignment (FBA), vanilla backprop (VBP).
Original abstract

This work shows that a differentiable activation function is no longer necessary for error backpropagation. The derivative of the activation function can be replaced by iterative temporal differencing using fixed random feedback alignment. Combining fixed random synaptic feedback alignment with iterative temporal differencing transforms traditional error backpropagation into a more biologically plausible approach for learning deep neural network architectures. This can be a big step toward the integration of STDP-based error backpropagation in deep learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that a differentiable activation function is no longer necessary for error backpropagation. The derivative of the activation function can be replaced by an iterative temporal differencing mechanism that employs fixed random feedback alignment. This replacement is said to transform traditional error backpropagation into a more biologically plausible learning rule for deep networks and to constitute a step toward integrating STDP-based error backpropagation.

Significance. If substantiated with a derivation showing equivalence or sufficiency of the resulting updates and with supporting experiments, the result would be significant for neuromorphic and biologically inspired deep learning. It would remove the differentiability requirement and open a route to STDP-compatible backpropagation in deep architectures. The current manuscript, however, contains no such derivation, conditions of applicability, or empirical validation, so the claimed advance cannot be evaluated.

major comments (2)
  1. [Abstract] The central claim that iterative temporal differencing replaces the activation-function derivative and thereby supports error backpropagation is asserted without any derivation, approximation analysis, or set of equations showing how the temporal-difference term approximates the chain-rule factor. No conditions (iteration count, time constants, depth) under which the replacement would hold are stated.
  2. [Abstract] The assertion that the method produces weight updates that are functionally equivalent or directionally sufficient for backpropagation gradients is made without any proof, numerical verification, or comparison to standard backpropagation on even a shallow network.
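For reference, the derivation the first comment asks for would have to start from the standard layer-wise backprop recursion. The notation below (pre-activation $a^{(l)}$, output $h^{(l)} = f(a^{(l)})$, forward weights $W^{(l)}$, fixed random feedback $B^{(l)}$, learning rate $\eta$, iteration index $t$) is assumed rather than taken from the paper, and the secant line is one candidate form of the temporal-difference term, not the paper's stated rule. Division in the secant is elementwise.

```latex
\begin{aligned}
\delta^{(l)} &= \left(W^{(l+1)\top}\,\delta^{(l+1)}\right) \odot f'\!\left(a^{(l)}\right)
  && \text{exact backpropagation} \\
\delta^{(l)}_{\mathrm{FA}} &= \left(B^{(l+1)}\,\delta^{(l+1)}\right) \odot f'\!\left(a^{(l)}\right)
  && \text{feedback alignment: } W^{(l+1)\top} \to B^{(l+1)} \\
f'\!\left(a^{(l)}_t\right) &\approx \frac{h^{(l)}_t - h^{(l)}_{t-1}}{a^{(l)}_t - a^{(l)}_{t-1}}
  && \text{secant slope across iterations } t-1,\ t \\
\Delta W^{(l)} &= -\eta\, \delta^{(l)}\, h^{(l-1)\top}
\end{aligned}
```

The secant converges to $f'(a_t)$ only where $f$ is differentiable and $a_t - a_{t-1} \to 0$; for a step function it is zero unless the unit crosses its threshold between iterations. Stating when the substitution stays directionally useful is the set of conditions (iteration count, time constants, depth) the referee requests.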

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review. We address the two major comments below and indicate planned revisions where the manuscript can be strengthened.

Point-by-point responses
  1. Referee: [Abstract] The central claim that iterative temporal differencing replaces the activation-function derivative and thereby supports error backpropagation is asserted without any derivation, approximation analysis, or set of equations showing how the temporal-difference term approximates the chain-rule factor. No conditions (iteration count, time constants, depth) under which the replacement would hold are stated.

    Authors: The abstract is a concise summary. The main text introduces the iterative temporal differencing procedure with fixed random feedback and describes how the differencing term substitutes for the activation derivative in the backpropagation update. We acknowledge that an explicit derivation, approximation bounds, and the precise conditions (e.g., iteration count, time constants, network depth) are not stated formally. These elements will be added in a revised version. revision: yes

  2. Referee: [Abstract] The assertion that the method produces weight updates that are functionally equivalent or directionally sufficient for backpropagation gradients is made without any proof, numerical verification, or comparison to standard backpropagation on even a shallow network.

    Authors: The manuscript argues that the combination of iterative temporal differencing and random feedback alignment yields updates that are directionally aligned with backpropagation gradients. We agree that neither a formal proof of equivalence nor numerical comparisons on shallow networks are provided. Both a short theoretical argument and empirical verification on shallow networks will be included in the revision. revision: yes
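The shallow-network verification promised in the second response could start from a direction check like the one below. It measures the angle between the exact backprop gradient for the first-layer weights and a candidate update that routes the error through fixed random feedback `B` and multiplies by an arbitrary `surrogate` in place of the activation derivative. Angles well below 90° mean the candidate still points downhill. The sigmoid/linear/squared-error setup and the function name `update_angle_deg` are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def update_angle_deg(W1, W2, x, target, B, surrogate):
    """Angle (degrees) between the exact backprop gradient w.r.t. W1 and a
    candidate update that sends the error through B and multiplies by
    `surrogate` in place of f'(a). Returns nan if either update is all zeros."""
    h = sigmoid(W1 @ x)                                 # hidden output
    e = W2 @ h - target                                 # linear output, squared loss
    g_bp = np.outer((W2.T @ e) * h * (1.0 - h), x)      # exact gradient
    g_alt = np.outer((B @ e) * surrogate, x)            # candidate direction
    n_bp, n_alt = np.linalg.norm(g_bp), np.linalg.norm(g_alt)
    if n_bp == 0.0 or n_alt == 0.0:
        return float("nan")
    cos = np.sum(g_bp * g_alt) / (n_bp * n_alt)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
B = rng.normal(size=(8, 3))
x, t = rng.normal(size=4), rng.normal(size=3)
h = sigmoid(W1 @ x)
fa_angle = update_angle_deg(W1, W2, x, t, B, h * (1.0 - h))  # plain feedback alignment
```

A temporal-difference surrogate (for example, the change in `h` across iterations) can be passed in the same slot. Before training, a random `B` gives no particular alignment; Lillicrap et al. (reference [18]) report the angle falling well below 90° as the forward weights adapt, so tracking it over training and over depth is the natural comparison.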

Circularity Check

0 steps flagged

No derivation chain or equations are present to inspect for circularity.

Full rationale

The provided abstract and context contain only a high-level conceptual claim that iterative temporal differencing with fixed random feedback alignment replaces the activation derivative for backpropagation. No equations, derivations, self-citations, or load-bearing steps are exhibited in the text, so no reduction to inputs by construction can be identified. The paper is self-contained at the level of a proposal without a mathematical chain that could exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, no free parameters, axioms, or invented entities are explicitly stated or can be extracted.

pith-pipeline@v0.9.0 · 5596 in / 883 out tokens · 22271 ms · 2026-05-24T21:26:22.439546+00:00 · methodology


Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 4 internal anchors

  1. [1]

    Learning internal representations by error-propagation,

    D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error-propagation,” in Parallel Distributed Processing…

  2. [2]

    Learning internal representations by error propagation,

    ——, “Learning internal representations by error propagation,” CALIFORNIA UNIV SAN DIEGO LA JOLLA INST FOR, Tech. Rep., 1985

  3. [3]

    Learning representations by back-propagating errors,

    ——, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, 1986

  4. [4]

    Backpropagation applied to handwritten zip code recognition,

    Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551, 1989

  5. [5]

    Deep learning,

    Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015

  6. [6]

    Imagenet classification with deep convolutional neural networks,

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105

  7. [7]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

  8. [8]

    Highway Networks

    R. K. Srivastava, K. Greff, and J. Schmidhuber, “Highway networks,” arXiv preprint arXiv:1505.00387 , 2015

  9. [9]

    Mastering the game of go with deep neural networks and tree search,

    D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016

  10. [10]

    Human-level control through deep reinforcement learning,

    V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015

  11. [11]

    Playing Atari with Deep Reinforcement Learning

    V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013

  12. [12]

    Receptive fields of single neurones in the cat’s striate cortex,

    D. H. Hubel and T. N. Wiesel, “Receptive fields of single neurones in the cat’s striate cortex,” The Journal of Physiology, vol. 148, no...

  13. [13]

    Neocognitron: A hierarchical neural network capable of visual pattern recognition,

    K. Fukushima, “Neocognitron: A hierarchical neural network capable of visual pattern recognition,” Neural networks, vol. 1, no. 2, pp. 119–130, 1988

  14. [14]

    Cognitron: A self-organizing multilayered neural network,

    ——, “Cognitron: A self-organizing multilayered neural network,” Biological Cybernetics, vol. 20, no. 3-4, pp. 121–136, 1975

  15. [15]

    Using goal-driven deep learning models to understand sensory cortex,

    D. L. Yamins and J. J. DiCarlo, “Using goal-driven deep learning models to understand sensory cortex,” Nature neuroscience, vol. 19, no. 3, pp. 356–365, 2016

  16. [16]

    Learning representations by recirculation,

    G. E. Hinton and J. L. McClelland, “Learning representations by recirculation,” in Neural Information Processing Systems, 1988, pp. 358–366

  17. [17]

    How to do backpropagation in a brain,

    G. Hinton, “How to do backpropagation in a brain,” in Invited talk at the NIPS2007 Deep Learning Workshop, vol. 656, 2007

  18. [18]

    Random synaptic feedback weights support error backpropagation for deep learning,

    T. P. Lillicrap, D. Cownden, D. B. Tweed, and C. J. Akerman, “Random synaptic feedback weights support error backpropagation for deep learning,” Nature Communications, vol. 7, 2016

  19. [19]

    Towards deep learning with segregated dendrites

    J. Guergiuev, T. P. Lillicrap, and B. A. Richards, “Biologically feasible deep learning with segregated dendrites,” arXiv preprint arXiv:1610.00161, 2016

  20. [20]

    Dropout: a simple way to prevent neural networks from overfitting

    N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014