
arxiv: 2605.30638 · v1 · pith:JC7PQEM7new · submitted 2026-05-28 · 💻 cs.LG · cs.AI

Score Broadcast and Decorrelation: A General Framework for Broadcast-Based Credit Assignment

Pith reviewed 2026-06-29 08:25 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords broadcast credit assignment · error broadcast · orthogonality principle · three-factor learning · score vector expansion · differentiable losses · cross-entropy

The pith

An orthogonality principle between output scores and hidden activations unifies broadcast-based credit assignment for general differentiable losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a single orthogonality condition between the loss score at the output and activations in hidden layers holds whenever the optimal score has conditional mean zero given the input. This condition extends the earlier EBD framework from mean-squared error to unify broadcast credit assignment across families including cross-entropy, Bregman divergences, proper scoring rules, and exponential-family negative log-likelihoods. A sympathetic reader would care because it supplies a unified theoretical basis for three-factor learning rules that use the loss score as the broadcast neuromodulatory signal, while also enabling score vector expansion to enrich the decorrelation objective.

Core claim

The central claim is that the orthogonality between the output score (the gradient of the loss with respect to the final-layer output) and hidden-layer activations, which follows from the optimal score having conditional mean zero, supplies the theoretical grounding for broadcast-based credit assignment across standard differentiable-loss families. The framework derives the cross-entropy case explicitly, characterizes the admissible loss class, and introduces score vector expansion to enrich the broadcast signal while preserving the orthogonality framework.
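The claim can be checked numerically on a toy problem. The sketch below is an illustration only, not the paper's experiment: the three-class setup, `true_logits`, and `features` are hypothetical names chosen here. It uses the standard fact that the cross-entropy score with respect to the logits is `softmax(z) - onehot(y)`, and estimates the cross-correlation between that score and a few functions of the input, once for a model that outputs the true posterior and once for a deliberately mismatched model.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def true_logits(x):
    # Hypothetical ground-truth logits for a 3-class toy problem.
    return [x, -x, 0.0]


def features(x):
    # Stand-ins for hidden activations: any functions of the input alone.
    return [x, math.tanh(2.0 * x), math.cos(x)]


def max_abs_cross_corr(model_logits, n=20000, seed=0):
    """Largest |entry| of the empirical cross-correlation E[g(x) score^T]."""
    rng = random.Random(seed)
    acc = [[0.0] * 3 for _ in range(3)]
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        p_true = softmax(true_logits(x))
        label = rng.choices(range(3), weights=p_true)[0]
        p_model = softmax(model_logits(x))
        # Cross-entropy score w.r.t. the logits: softmax(z) - onehot(y).
        score = [p_model[k] - (1.0 if k == label else 0.0) for k in range(3)]
        g = features(x)
        for i in range(3):
            for k in range(3):
                acc[i][k] += g[i] * score[k] / n
    return max(abs(v) for row in acc for v in row)


# Model equal to the true posterior: E[score | x] = 0, so correlations vanish
# up to sampling noise.
at_optimum = max_abs_cross_corr(true_logits)
# Mismatched model (uniform prediction): the score stays correlated with x.
mismatched = max_abs_cross_corr(lambda x: [0.0, 0.0, 0.0])

print(f"optimal model:    {at_optimum:.4f}")
print(f"mismatched model: {mismatched:.4f}")
```

Under these assumptions, the first number should sit at the level of sampling noise while the second stays clearly away from zero, which is the gap a decorrelation objective would exploit.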

What carries the argument

The orthogonality principle between the output score and hidden-layer activations.

If this is right

  • The same orthogonality unifies broadcast credit assignment for cross-entropy, Bregman divergences, proper scoring rules, and exponential-family negative log-likelihoods.
  • It grounds the three-factor learning rule with the neuromodulatory factor derived as the broadcast loss score.
  • Score vector expansion enriches the broadcast signal while preserving the orthogonality framework.
  • The resulting method substantially improves performance over existing broadcast approaches on CIFAR-10 and Tiny ImageNet.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could apply directly to other differentiable losses that satisfy the conditional mean zero property even if they are not listed.
  • The same orthogonality check might serve as a diagnostic for whether a given loss family admits broadcast credit assignment.
  • Score vector expansion could be combined with other decorrelation objectives outside the broadcast setting.
  • Empirical verification of the conditional mean zero property on trained networks would test a key step in the derivation.
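One way such an empirical check could look is a binned estimate of the conditional mean of the score. The sketch below is a guess at the idea, not the metric used in the paper's Poisson experiment: the function `binned_cmz`, the bin count, and the toy rate function are assumptions made here. It uses the Poisson negative log-likelihood with a log-rate output, for which the score is `exp(z) - y`.

```python
import math
import random


def sample_poisson(lam, rng):
    # Knuth's multiplication method; adequate for the small rates used here.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def true_log_rate(x):
    # Hypothetical ground-truth log-rate for the toy problem.
    return 0.5 + x


def binned_cmz(model_log_rate, n=20000, bins=10, seed=0):
    """Max over input bins of |mean score in the bin| (hypothetical metric)."""
    rng = random.Random(seed)
    sums = [0.0] * bins
    counts = [0] * bins
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = sample_poisson(math.exp(true_log_rate(x)), rng)
        # Poisson NLL with log-rate z is exp(z) - y*z, so the score is exp(z) - y.
        score = math.exp(model_log_rate(x)) - y
        b = min(int((x + 1.0) / 2.0 * bins), bins - 1)
        sums[b] += score
        counts[b] += 1
    return max(abs(s / c) for s, c in zip(sums, counts) if c > 0)


cmz_optimal = binned_cmz(true_log_rate)        # model matches the true rate
cmz_mismatched = binned_cmz(lambda x: 0.5)     # model ignores the input

print(f"optimal model:    {cmz_optimal:.3f}")
print(f"mismatched model: {cmz_mismatched:.3f}")
```

A value near zero in every bin is consistent with the conditional-mean-zero premise; a large value in some bin indicates that the model's score still carries input-dependent structure.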

Load-bearing premise

The optimal score has conditional mean zero given the input.

What would settle it

An observation that the optimal score lacks conditional mean zero for a loss in the claimed class, or that the broadcast rule derived from the orthogonality fails to produce the expected decorrelation on a cross-entropy task.

Figures

Figures reproduced from arXiv: 2605.30638 by Alper T. Erdogan, Cengiz Pehlevan, Mete Erdogan, Mustafa Uzun.

Figure 1: Illustrations of the SBD framework and the empirical score–activation orthogonality.
Figure 2: Poisson proof-of-concept: test excess negative log-likelihood relative to the Bayes oracle, …
Figure 3: Poisson proof-of-concept: binned conditional-mean-zero metric versus epoch on a logarith…
Figure 4: Training and test accuracy curves on CIFAR-10 for BP, DFA, SBD, and SBD with …
Figure 5: Training and test accuracy curves on CIFAR-10 for BP and SBD with score expansion
Figure 6: Cosine similarity between the local SBD gradient and the diagnostic BP gradient for each …
Figure 7: Training accuracy on Tiny ImageNet for BackProp, SBD, and SBD with score expansion
Figure 8: Test accuracy on Tiny ImageNet for BackProp, SBD, and SBD with score expansion
Figure 9: Training and test accuracy on Tiny ImageNet for DFA across 500 epochs. Each curve …

Original abstract

We introduce Score Broadcast and Decorrelation (SBD), a principled framework for broadcast-based credit assignment for general families of differentiable losses. Error broadcast is a biologically plausible alternative to backpropagation that sends output information to hidden layers without weight transport. The Error Broadcast and Decorrelation (EBD) framework, recently introduced for the mean-squared-error (MSE) setting, grounded this mechanism in the stochastic orthogonality of optimal estimators, under which the optimal residual is orthogonal to functions of the input. We generalize that foundation by introducing an orthogonality principle between the output score (the gradient of loss with respect to the final-layer output) and hidden-layer activations, which holds whenever the optimal score has conditional mean zero. This single principle unifies broadcast-based credit assignment across the standard differentiable-loss families, including cross-entropy, Bregman divergences, proper scoring rules, and exponential-family negative log-likelihoods. The framework supplies a theoretical grounding for the three-factor learning rule under general losses, with the neuromodulatory factor derived as the broadcast loss score. We derive the cross-entropy case explicitly, characterize the admissible loss class, and introduce a score vector expansion technique that enriches the broadcast signal while preserving the orthogonality framework. Experiments on CIFAR-10 and Tiny ImageNet show that SBD substantially improves over existing broadcast approaches, with score vector expansion delivering further gains. Overall, this work identifies the loss score as the signal to broadcast, supplies the orthogonality theory and theoretical grounding for the three-factor learning rule from neuroscience, and shows how score vector expansion enriches the decorrelation directions of the resulting objective.
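To make the link between a decorrelation objective and a three-factor update concrete, here is a minimal sketch under stated assumptions. The abstract does not give the exact objective or update rule, so the choices below are illustrative: a single `tanh` layer, the objective `J = 0.5 * ||R||_F^2` with `R` the empirical cross-correlation between hidden activations and scores, and scores treated as a fixed broadcast signal (no gradient flows through them). All function names are hypothetical. Under these assumptions the gradient of `J` with respect to each weight factors into a presynaptic input, a postsynaptic derivative, and a modulatory term built from the broadcast score.

```python
import math
import random


def forward(W, X):
    """Hidden activations h = tanh(W x) for each input row x."""
    return [[math.tanh(sum(w * v for w, v in zip(row, x))) for row in W] for x in X]


def cross_corr(H, S):
    """Empirical cross-correlation R[i][k] = mean over samples of H[n][i] * S[n][k]."""
    n = len(H)
    return [[sum(H[t][i] * S[t][k] for t in range(n)) / n
             for k in range(len(S[0]))] for i in range(len(H[0]))]


def objective(W, X, S):
    """J = 0.5 * ||R||_F^2, with the scores S held fixed (assumption)."""
    R = cross_corr(forward(W, X), S)
    return 0.5 * sum(r * r for row in R for r in row)


def three_factor_grad(W, X, S):
    """dJ/dW written as modulator * postsynaptic * presynaptic, averaged over samples."""
    H = forward(W, X)
    R = cross_corr(H, S)
    n = len(X)
    G = [[0.0] * len(W[0]) for _ in W]
    for t in range(n):
        for i in range(len(W)):
            # Factor 3: broadcast score projected through the correlation matrix.
            modulator = sum(R[i][k] * S[t][k] for k in range(len(S[0])))
            # Factor 2: postsynaptic term, the tanh derivative at this unit.
            post = 1.0 - H[t][i] ** 2
            for j in range(len(W[0])):
                # Factor 1: presynaptic input X[t][j].
                G[i][j] += modulator * post * X[t][j] / n
    return G


rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]   # toy inputs
S = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(50)]   # toy scores
W = [[rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(4)]

# Compare the factored gradient with central finite differences of J.
G = three_factor_grad(W, X, S)
eps = 1e-6
max_err = 0.0
for i in range(len(W)):
    for j in range(len(W[0])):
        W[i][j] += eps
        up = objective(W, X, S)
        W[i][j] -= 2 * eps
        down = objective(W, X, S)
        W[i][j] += eps
        max_err = max(max_err, abs((up - down) / (2 * eps) - G[i][j]))

print(f"max |finite difference - three-factor gradient| = {max_err:.2e}")
```

The finite-difference comparison only confirms that the factored expression is the gradient of this particular toy objective; it says nothing about how the paper normalizes correlations, handles multiple layers, or applies score vector expansion.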

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces Score Broadcast and Decorrelation (SBD) as a generalization of the Error Broadcast and Decorrelation (EBD) framework from the MSE setting to arbitrary differentiable losses. It posits a single orthogonality principle between the output score (gradient of the loss w.r.t. the final output) and hidden-layer activations, which is claimed to hold whenever the optimal score has conditional mean zero given the input. This principle is asserted to unify broadcast-based credit assignment across cross-entropy, Bregman divergences, proper scoring rules, and exponential-family negative log-likelihoods; the paper derives the cross-entropy case explicitly, characterizes the admissible loss class, introduces a score-vector expansion technique that preserves orthogonality, supplies a theoretical grounding for the three-factor learning rule, and reports that SBD with score-vector expansion yields substantial gains on CIFAR-10 and Tiny ImageNet.

Significance. If the orthogonality derivation and unification hold without additional assumptions, the work supplies a principled, loss-agnostic foundation for biologically plausible credit assignment that extends beyond MSE-specific results. The explicit identification of the loss score as the broadcast signal and the score-vector expansion method would constitute a concrete advance for both theoretical understanding of three-factor rules and practical algorithm design in settings that forbid weight transport.

minor comments (2)
  1. The abstract asserts that experiments demonstrate substantial improvement and further gains from score-vector expansion, yet supplies neither quantitative deltas, baseline comparisons, nor statistical details; the results section should include these to allow readers to assess the magnitude and reliability of the reported gains.
  2. The admissible loss class is characterized in the manuscript; a brief explicit statement of the necessary and sufficient conditions (e.g., in terms of the conditional-mean-zero property) would improve clarity for readers outside the immediate subfield.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the significance of the orthogonality principle, and recommendation of minor revision. No specific major comments were provided in the report, so we have no individual points to rebut. We will address any minor editorial or clarification suggestions in the revised version.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The central orthogonality principle follows from the law of total expectation once E[score|x]=0 is granted at optimality; this is a standard property of optimal estimators for cross-entropy, Bregman divergences, proper scoring rules, and exponential-family NLL, and is not obtained by fitting inside the paper or by self-citation. The unification across loss families and the three-factor rule grounding are direct consequences of this external statistical fact rather than any reduction to the paper's own inputs. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the provided derivation outline.
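The step from conditional mean zero to orthogonality can be written out in two lines. The notation below is chosen here for illustration and may differ from the paper's: $f^\star$ is the optimal predictor, $s$ the score evaluated at its output, and $h(x)$ any hidden activation, that is, any function of the input with finite second moment.

```latex
% Optimal score and the premise
s(x, y) = \nabla_{z}\, \ell(z, y)\big|_{z = f^\star(x)},
\qquad
\mathbb{E}\big[\, s(x, y) \mid x \,\big] = 0 .

% Law of total expectation: h(x) is fixed once x is given
\mathbb{E}\big[\, h(x)\, s(x, y)^{\top} \big]
  = \mathbb{E}\Big[\, \mathbb{E}\big[\, h(x)\, s(x, y)^{\top} \mid x \,\big] \Big]
  = \mathbb{E}\Big[\, h(x)\, \mathbb{E}\big[\, s(x, y) \mid x \,\big]^{\top} \Big]
  = 0 .

% Two familiar instances of the premise
% Cross-entropy with logits z:  s = \mathrm{softmax}(z) - y,  and at the optimum
%   \mathrm{softmax}(f^\star(x)) = \mathbb{E}[\, y \mid x \,].
% Squared error \tfrac12 \lVert z - y \rVert^2:  s = z - y,  and at the optimum
%   f^\star(x) = \mathbb{E}[\, y \mid x \,].
```

The squared-error instance is the residual orthogonality that the abstract attributes to EBD; the cross-entropy instance is the case the paper is said to derive explicitly.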

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the stochastic orthogonality property of optimal estimators and the conditional-mean-zero property of the optimal score; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: The optimal score has conditional mean zero given the input.
    Invoked to establish orthogonality between the output score and hidden activations (abstract paragraph on the generalization of EBD).


