Local learning for stable backpropagation-free neural network training towards physical learning

Bastiaan Ketelaar; Fabian Braun; Richard Norte; Siddhant Kumar; Stephanie Tan; Yaqi Guo

arxiv: 2603.24790 · v2 · submitted 2026-03-25 · 💻 cs.LG · cs.CE

Yaqi Guo, Fabian Braun, Bastiaan Ketelaar, Stephanie Tan, Richard Norte, Siddhant Kumar

Pith reviewed 2026-05-15 00:02 UTC · model grok-4.3

keywords forward-only learning · local learning · backpropagation-free · physical neural networks · photonic neural networks · in-situ learning · neural network training

The pith

FFzero trains neural networks stably using only forward evaluations, without backpropagation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FFzero, a forward-only framework that trains neural networks by performing updates locally at each layer. It relies on prototype-based representations for each layer and optimizes parameters through directional derivatives obtained solely from forward passes. The setup is aimed at settings where backpropagation is difficult to realize, such as physical hardware. The approach is reported to work for multilayer perceptrons and convolutional networks on classification and regression problems. It is demonstrated on a simulated photonic neural network, suggesting a route to training physical systems directly in place.

Core claim

FFzero enables stable neural network training without backpropagation or automatic differentiation by combining layer-wise local learning, prototype-based representations, and directional-derivative optimization performed exclusively through forward evaluations, and this method generalizes to multilayer perceptrons and convolutional networks for classification and regression while providing a viable path toward backpropagation-free in-situ physical learning.

What carries the argument

FFzero framework that performs layer-wise local learning with prototype-based representations and directional-derivative optimization using forward evaluations only.
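As a rough illustration of directional-derivative optimization through forward evaluations only, the sketch below estimates the slope of a scalar objective along a probe direction from two forward evaluations and then steps along that direction. It is not the paper's implementation: the central-difference estimator, the Gaussian probe directions, the toy quadratic objective, and the names `directional_derivative`, `forward_only_step`, `train`, and `toy_loss` are all assumptions made for illustration.

```python
import random

def directional_derivative(f, w, v, eps=1e-4):
    """Estimate d/dt f(w + t*v) at t = 0 from two forward evaluations."""
    plus = [wi + eps * vi for wi, vi in zip(w, v)]
    minus = [wi - eps * vi for wi, vi in zip(w, v)]
    return (f(plus) - f(minus)) / (2.0 * eps)

def forward_only_step(f, w, v, lr=0.05, eps=1e-4):
    """Step along -d*v, where d is the estimated slope of f along v."""
    d = directional_derivative(f, w, v, eps)
    return [wi - lr * d * vi for wi, vi in zip(w, v)]

def train(f, w, steps=200, lr=0.05, seed=0):
    """Repeat forward-only steps with random Gaussian probe directions."""
    rng = random.Random(seed)
    for _ in range(steps):
        v = [rng.gauss(0.0, 1.0) for _ in w]
        w = forward_only_step(f, w, v, lr)
    return w

def toy_loss(w):
    """Assumed stand-in for a layer's local objective: squared distance to a target."""
    target = [1.0, -2.0, 0.5]
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

w0 = [0.0, 0.0, 0.0]
w_final = train(toy_loss, w0)
print(toy_loss(w0), toy_loss(w_final))
```

To first order, a step changes the objective by about `-lr * d**2`, so it does not increase the objective when `lr` is small enough. No gradient is ever propagated backwards; each update costs two forward evaluations of the objective.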

If this is right

Local learning remains effective even when restricted to forward-only optimization where backpropagation fails.
The method applies to both multilayer perceptron and convolutional neural network architectures.
It supports classification as well as regression tasks.
It enables training of simulated photonic neural networks without requiring backpropagation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Physical hardware could potentially adapt its parameters in real time using only its own forward signals.
Energy use for training might drop if forward-only local updates replace full digital backpropagation simulations.
Similar local-update rules could be tested on other physical substrates such as optical or memristive systems.

Load-bearing premise

Layer-wise local learning with prototypes and forward-only directional derivatives will produce stable, generalizing training across standard network architectures without needing backpropagation.

What would settle it

A direct comparison on a standard benchmark such as MNIST or CIFAR-10 where FFzero either fails to converge or achieves markedly lower accuracy than backpropagation-based training on the same multilayer perceptron or convolutional architecture.

Original abstract

While backpropagation and automatic differentiation have driven deep learning's success, the physical limits of chip manufacturing and rising environmental costs of deep learning motivate alternative learning paradigms such as physical neural networks. However, most existing physical neural networks still rely on digital computing for training, largely because backpropagation and automatic differentiation are difficult to realize in physical systems. We introduce FFzero, a forward-only learning framework enabling stable neural network training without backpropagation or automatic differentiation. FFzero combines layer-wise local learning, prototype-based representations, and directional-derivative-based optimization through forward evaluations only. We show that local learning is effective under forward-only optimization, where backpropagation fails. FFzero generalizes to multilayer perceptron and convolutional neural networks across classification and regression. Using a simulated photonic neural network as an example, we demonstrate that FFzero provides a viable path toward backpropagation-free in-situ physical learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

FFzero sketches a forward-only route to physical NN training but the claims rest on unshown experiments and untested noise assumptions.

Colleague, the main point is that FFzero combines layer-wise local learning with directional derivatives computed from forward passes only, aiming to train networks without backpropagation for physical hardware like photonic systems. It claims this works where backprop fails and generalizes to MLPs and CNNs on classification and regression tasks, with a simulated photonic example as proof of concept. That framing around chip limits and energy costs of digital training is on target, and the prototype-based representations plus local updates give a coherent way to sidestep global gradients. The paper earns credit for laying out a distinct forward-only framework that does not collapse to the cited prior methods.

The soft spots are in the missing evidence. The abstract states effectiveness and stability but supplies no quantitative results, error bars, ablations, or convergence plots, so the central claims cannot be checked. The stress-test note on noise robustness is on point: directional derivatives from physical forward passes will face measurement noise and device mismatch, yet the work offers no bounds or tests on how those affect training. Without that, the in-situ learning path stays speculative.

This is for people working on neuromorphic or physical ML who want alternatives to backprop. A reader hunting for new hardware-aware ideas could pull useful concepts from the framework, but it is not ready for direct use.

It deserves peer review because the problem is real and the approach is novel enough to warrant referee input on the experiments. Recommendation: send it to review and ask specifically for noise analysis and full result tables.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces FFzero, a forward-only learning framework for training neural networks without backpropagation or automatic differentiation. It combines layer-wise local learning, prototype-based representations, and directional-derivative optimization computed solely from forward evaluations. The central claims are that this approach enables stable training where backpropagation fails, generalizes to multilayer perceptrons and convolutional networks on classification and regression tasks, and offers a viable route to backpropagation-free in-situ physical learning, illustrated via a simulated photonic neural network example.

Significance. If the core claims hold under rigorous validation, the work could meaningfully advance hardware-efficient and physical neural network training by sidestepping backpropagation's implementation barriers in analog or photonic systems. The forward-only directional-derivative mechanism and local learning combination address a recognized bottleneck in physical computing; however, the absence of quantitative benchmarks, error bars, or noise analysis in the current presentation limits the assessed impact to potential rather than demonstrated.

major comments (2)

[Abstract] Abstract and Experiments section: The claim that local learning is effective under forward-only optimization (where backpropagation fails) and that FFzero generalizes across MLPs and CNNs rests on assertions without reported quantitative results, error bars, ablation studies, or performance metrics, rendering the effectiveness and generalization statements unverifiable from the provided text.
[Photonic NN example] Photonic neural network simulation subsection: No analysis, bounds, or experiments quantify how additive or multiplicative noise, device mismatch, or non-idealities in forward passes affect convergence or generalization of the directional-derivative updates. This omission directly undermines the in-situ physical learning claim, as the stability assertion implicitly assumes noise-free evaluations that do not hold for real hardware.

minor comments (1)

[Methods] Notation for directional derivatives and prototype representations should be defined explicitly with equations in the methods section to improve clarity for readers unfamiliar with the local learning setup.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and have revised the manuscript to strengthen the presentation of results and the physical learning claims.

Referee: [Abstract] Abstract and Experiments section: The claim that local learning is effective under forward-only optimization (where backpropagation fails) and that FFzero generalizes across MLPs and CNNs rests on assertions without reported quantitative results, error bars, ablation studies, or performance metrics, rendering the effectiveness and generalization statements unverifiable from the provided text.

Authors: We thank the referee for this observation. The Experiments section of the manuscript reports performance metrics for classification and regression tasks on both MLPs and CNNs, including direct comparisons where backpropagation diverges while FFzero converges. To address the lack of error bars and ablations, the revised manuscript will include standard deviations from multiple independent runs and ablation studies isolating the contributions of layer-wise local learning and directional-derivative updates. These additions will make the quantitative support for the claims fully explicit and verifiable.
revision: yes

Referee: [Photonic NN example] Photonic neural network simulation subsection: No analysis, bounds, or experiments quantify how additive or multiplicative noise, device mismatch, or non-idealities in forward passes affect convergence or generalization of the directional-derivative updates. This omission directly undermines the in-situ physical learning claim, as the stability assertion implicitly assumes noise-free evaluations that do not hold for real hardware.

Authors: We agree that this is a substantive gap. The current photonic simulation assumes ideal forward passes. In the revision we will add a dedicated noise analysis subsection that injects additive Gaussian noise, multiplicative noise, and device mismatch into the forward evaluations. We will report convergence curves and final generalization performance under these perturbations, together with a brief discussion of robustness bounds. This will directly support the viability claim for in-situ physical learning.
revision: yes
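To make the noise concern concrete, here is a minimal sketch of how additive and multiplicative readout noise could enter a directional-derivative estimate built from two forward evaluations. Nothing in it comes from the manuscript: the noise model, the central-difference estimator, the quadratic test objective, and the names `noisy_forward`, `noisy_directional_derivative`, and `estimate_spread` are assumptions chosen for illustration.

```python
import random
import statistics

def noisy_forward(f, w, rng, sigma_add=0.0, sigma_mult=0.0):
    """Simulated readout: ideal output times (1 + multiplicative noise), plus additive noise."""
    return f(w) * (1.0 + rng.gauss(0.0, sigma_mult)) + rng.gauss(0.0, sigma_add)

def noisy_directional_derivative(f, w, v, rng, eps, sigma_add=0.0, sigma_mult=0.0):
    """Central-difference slope along v, computed from two noisy forward readouts."""
    plus = [wi + eps * vi for wi, vi in zip(w, v)]
    minus = [wi - eps * vi for wi, vi in zip(w, v)]
    f_plus = noisy_forward(f, plus, rng, sigma_add, sigma_mult)
    f_minus = noisy_forward(f, minus, rng, sigma_add, sigma_mult)
    return (f_plus - f_minus) / (2.0 * eps)

def estimate_spread(f, w, v, eps, sigma_add, sigma_mult, trials=2000, seed=0):
    """Mean and standard deviation of repeated noisy slope estimates."""
    rng = random.Random(seed)
    samples = [
        noisy_directional_derivative(f, w, v, rng, eps, sigma_add, sigma_mult)
        for _ in range(trials)
    ]
    return statistics.mean(samples), statistics.pstdev(samples)

def quadratic(w):
    """Assumed test objective with a known slope of 2*w[0] along the first axis."""
    return sum(wi * wi for wi in w)

for eps in (1e-3, 1e-1):
    print(eps, estimate_spread(quadratic, [1.0, -2.0], [1.0, 0.0], eps, 0.01, 0.0))
```

Under this noise model the additive term is divided by `2 * eps`, so a smaller perturbation size inflates the spread of the estimate. That trade-off between finite-difference bias and noise amplification is one quantity a noise analysis would need to characterize.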

Circularity Check

0 steps flagged

Derivation self-contained; no circular reductions to inputs or self-citations

The manuscript introduces FFzero as a forward-only framework using layer-wise local learning, prototype representations, and directional derivatives from forward passes alone. No equations, uniqueness theorems, or central claims reduce by construction to fitted parameters, self-referential definitions, or load-bearing self-citations. The abstract and described method present the approach as a novel proposal validated empirically on MLPs and CNNs for classification and regression, without renaming known results or smuggling ansatzes via prior author work. The derivation chain is therefore independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the assumption that local learning suffices for global optimization in forward-only settings; no free parameters or invented entities are detailed in the abstract.

axioms (1)

Domain assumption: local learning rules remain effective when optimization uses only forward evaluations and directional derivatives.
Invoked to claim stability where backpropagation fails.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "FFzero combines layer-wise local learning, prototype-based representations, and directional-derivative-based optimization through forward evaluations only... goodness $G_1(\mathbf{W}_1 \mid \hat{y}, x) = \xi_1[\hat{y}] \cdot z_1 / \lVert z_1 \rVert$"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat embedding and J-positivity · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "The training objective is to locally and independently optimize each layer’s weights such that the layer output is closest to the prototype of the true label"
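The goodness expression in the first quoted passage reads as the projection of a layer's normalized output onto the prototype of a candidate label. A minimal sketch of that reading follows. The prototype values, the zero-output convention, and the `predict` rule that picks the label with the highest goodness are assumptions for illustration; the quoted passages do not specify them.

```python
import math

def goodness(z, prototype):
    """Dot product between a label prototype and the normalized layer output z / ||z||."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return 0.0  # assumption: an all-zero output scores zero for every label
    return sum(p * v for p, v in zip(prototype, z)) / norm

def predict(z, prototypes):
    """Return the label whose prototype gives the highest goodness (assumed decision rule)."""
    return max(prototypes, key=lambda label: goodness(z, prototypes[label]))

# Hypothetical two-class prototypes and a hypothetical layer output.
prototypes = {0: [1.0, 0.0], 1: [0.0, 1.0]}
z = [3.0, 4.0]
print(goodness(z, prototypes[0]), goodness(z, prototypes[1]), predict(z, prototypes))
```

Because the score depends only on one layer's output and a fixed prototype, it can be evaluated for each layer independently, which is what a layer-wise local objective requires.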

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.