Unsupervised Local Plasticity in a Multi-Frequency VisNet Hierarchy

C. Alejandro Parraga; Mehdi Fatan Serj; Xavier Otazu

arxiv: 2604.09734 · v4 · submitted 2026-04-09 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-10 18:07 UTC · model grok-4.3

keywords: unsupervised learning · local plasticity · visual representations · VisNet · CIFAR-10 · Hebbian learning · anti-Hebbian decorrelation · hierarchical architecture

The pith

Local plasticity rules in a VisNet-inspired hierarchy learn strong visual representations from unlabeled images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a system relying entirely on local plasticity, without labels or backpropagation, can acquire visual features sufficient for strong classification performance. It processes raw image streams through a hierarchical architecture that combines opponent-color inputs, multi-frequency Gabor and wavelet streams, competitive normalization, and several plasticity mechanisms including Hebbian association and anti-Hebbian decorrelation. After 300 epochs of unsupervised training, a linear probe reaches 80.1 percent accuracy on CIFAR-10 and 47.6 percent on CIFAR-100. Ablations identify anti-Hebbian decorrelation, free-energy plasticity, and associative memory as the main drivers, with clear synergy among them. The fixed architecture alone already yields 61.4 percent on CIFAR-10, yet plasticity supplies the remaining lift that narrows the gap to backpropagation-trained networks.

Core claim

The central claim is that continuous local plasticity applied across layers of a multi-frequency VisNet hierarchy produces representations whose intrinsic structure supports high linear-probe accuracy on CIFAR-10 and CIFAR-100, with the gains arising from the specific combination of anti-Hebbian decorrelation, free-energy-inspired updates, and associative memory rather than architecture alone.

What carries the argument

The structured local plasticity rules (anti-Hebbian decorrelation, free-energy plasticity, and associative memory) operating continuously on multi-frequency feature streams inside the VisNet-style hierarchy.
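
This page does not reproduce the paper's update equations, so the following is only a generic sketch of what an anti-Hebbian decorrelation rule can look like, not the authors' rule. It assumes a single-pass linear lateral interaction y = x + Lx; the learning rate, the function names, and the synthetic two-unit input are all illustrative assumptions. Each lateral weight changes using only the activities of the two units it connects, which is what makes the rule local.

```python
import numpy as np

def anti_hebbian_step(L, x, lr=0.005):
    """One local anti-Hebbian update of the lateral weights L (zero diagonal).

    Assumed model (not from the paper): y = x + L @ x.
    """
    y = x + L @ x
    dL = -lr * np.outer(y, y)   # co-active units push their mutual weight down
    np.fill_diagonal(dL, 0.0)   # no self-connections
    return L + dL

def offdiag_corr(Y):
    """Absolute correlation between the two output units."""
    return abs(np.corrcoef(Y, rowvar=False)[0, 1])

# Synthetic, strongly correlated two-unit input (correlation about 0.8).
rng = np.random.default_rng(0)
z = rng.standard_normal((5000, 2))
X = np.column_stack([z[:, 0], 0.8 * z[:, 0] + 0.6 * z[:, 1]])

L = np.zeros((2, 2))
for x in X:                     # one pass over the unlabeled stream
    L = anti_hebbian_step(L, x)

Y = X + X @ L.T                 # outputs under the learned lateral weights
corr_before = offdiag_corr(X)
corr_after = offdiag_corr(Y)
print(f"input corr {corr_before:.2f}, output corr {corr_after:.2f}")
```

In this toy setting the lateral weight between the two units becomes negative (inhibitory) and the output correlation falls well below the input correlation, with no labels or global error signal involved.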

If this is right

The learned features contain enough intrinsic class structure for a nearest-class-mean classifier to reach 78.3 percent accuracy without any gradient-based training.
Independently trained linear probes match co-trained probes within 0.3 percentage points, confirming that the representations are stable and task-agnostic.
Ablation experiments demonstrate strong synergistic effects among anti-Hebbian decorrelation, free-energy plasticity, and associative memory.
The gap to backpropagation-trained CNNs is reduced to 5.7 points on CIFAR-10 and 7.5 points on CIFAR-100.
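
The nearest-class-mean control mentioned above is simple enough to sketch. The block below is a minimal illustration on synthetic features, not the paper's features or code; `fit_class_means`, `predict_ncm`, and the cluster layout are hypothetical. The only "training" is averaging the feature vectors of each class, so no gradients are involved.

```python
import numpy as np

def fit_class_means(features, labels):
    """Return the sorted class ids and one mean feature vector per class."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, means

def predict_ncm(features, classes, means):
    """Assign each sample to the class whose mean is closest (Euclidean)."""
    d2 = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

# Synthetic stand-in for frozen features: 4 well-separated clusters in 6-D.
rng = np.random.default_rng(0)
n_classes, dim = 4, 6
centers = 8.0 * np.eye(n_classes, dim)
train_y = np.repeat(np.arange(n_classes), 100)
test_y = np.repeat(np.arange(n_classes), 50)
train_x = centers[train_y] + rng.standard_normal((len(train_y), dim))
test_x = centers[test_y] + rng.standard_normal((len(test_y), dim))

classes, means = fit_class_means(train_x, train_y)
ncm_accuracy = float((predict_ncm(test_x, classes, means) == test_y).mean())
print(f"NCM accuracy on synthetic features: {ncm_accuracy:.3f}")
```

Because the classifier has no trainable parameters beyond the class means, a high score is evidence that class structure is already present in the features themselves.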

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same local rules might be applied to temporal sequences or video by extending the feedback loop across time.
The large contribution of the fixed multi-frequency streams suggests that frequency-selective preprocessing could be a general prerequisite for successful local learning on natural images.
If the approach scales to larger datasets, it could enable training on devices where global error propagation is energetically costly.

Load-bearing premise

The performance improvement is caused by the specific local plasticity rules rather than the inductive biases already present in the fixed multi-frequency architecture and feature streams.

What would settle it

Train the identical fixed architecture for 300 epochs with all plasticity disabled and measure whether linear-probe accuracy on CIFAR-10 stays near 61.4 percent or rises to 80.1 percent.
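
The comparison proposed here amounts to fitting the same linear readout on two frozen feature sets. A minimal sketch follows, assuming a closed-form ridge-regression probe on one-hot targets as a stand-in for however the paper trains its probe; `fit_linear_probe`, `probe_accuracy`, the `ridge` value, and the synthetic features are all hypothetical.

```python
import numpy as np

def fit_linear_probe(features, labels, n_classes, ridge=1e-3):
    """Closed-form ridge regression from features (plus bias) to one-hot labels."""
    X = np.hstack([features, np.ones((len(features), 1))])  # append bias column
    T = np.eye(n_classes)[labels]                           # one-hot targets
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ T)

def probe_accuracy(W, features, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return float((np.argmax(X @ W, axis=1) == labels).mean())

# Synthetic stand-in for frozen features (NOT data from the paper).
rng = np.random.default_rng(0)
n_classes, dim = 3, 8
centers = 10.0 * np.eye(n_classes, dim)
train_y = np.repeat(np.arange(n_classes), 100)
test_y = np.repeat(np.arange(n_classes), 50)
train_x = centers[train_y] + rng.standard_normal((len(train_y), dim))
test_x = centers[test_y] + rng.standard_normal((len(test_y), dim))

W = fit_linear_probe(train_x, train_y, n_classes)
acc = probe_accuracy(W, test_x, test_y)
print(f"probe accuracy on synthetic features: {acc:.3f}")
```

To run the control, the same two functions would be called once on features from the plasticity-disabled network and once on features from the fully trained network, so that any accuracy difference is attributable to the features rather than to the readout.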

Original abstract

We introduce an unsupervised visual representation learning system based entirely on local plasticity rules, without labels, backpropagation, or global error signals. The model is a VisNet-inspired hierarchical architecture combining opponent color inputs, multi-frequency Gabor and wavelet feature streams, competitive normalization with lateral inhibition, saliency modulation, associative memory, and a feedback loop. All representation learning occurs through continuous local plasticity applied to unlabeled image streams over 300 epochs. Performance is evaluated using a fixed linear probe trained only at readout time. The system achieves 80.1 percent accuracy on CIFAR-10 and 47.6 percent on CIFAR-100, improving over a Hebbian-only baseline. Ablation studies show that anti-Hebbian decorrelation, free-energy inspired plasticity, and associative memory are the main contributors, with strong synergistic effects. Even without learning, the fixed architecture alone reaches 61.4 percent on CIFAR-10, indicating that plasticity, not only inductive bias, drives most of the performance. Control analyses show that independently trained probes match co-trained ones within 0.3 percentage points, and a nearest-class-mean classifier achieves 78.3 percent without gradient-based training, confirming the intrinsic structure of the learned features. Overall, the system narrows but does not eliminate the performance gap to backpropagation-trained CNNs (5.7 percentage points on CIFAR-10, 7.5 percentage points on CIFAR-100), demonstrating that structured local plasticity alone can learn strong visual representations from raw unlabeled data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The local plasticity rules add a useful 18.7-point boost on CIFAR-10, but the fixed multi-frequency architecture already delivers 61.4%, so the claim that plasticity drives most of the performance does not hold up.

The paper builds a VisNet hierarchy that processes raw images through opponent-color channels, multi-frequency Gabor and wavelet streams, normalization, inhibition, saliency, associative memory, and feedback, then applies only local plasticity rules over 300 epochs of unlabeled data. A linear probe at the end reaches 80.1% on CIFAR-10 and 47.6% on CIFAR-100, beating a Hebbian-only baseline, with ablations crediting anti-Hebbian decorrelation, free-energy plasticity, and memory for the gains. The nearest-class-mean classifier at 78.3% and the near-identical results from independent probes are clean controls that show the features carry usable structure without extra training. Those elements are the parts that actually move the work forward from earlier local-rule and VisNet papers.

Referee Report

3 major / 2 minor

Summary. The paper introduces an unsupervised hierarchical visual representation learning system modeled on VisNet, relying exclusively on local plasticity rules applied to unlabeled image streams over 300 epochs. The architecture combines fixed components including opponent-color inputs, multi-frequency Gabor/wavelet feature streams, competitive normalization, lateral inhibition, saliency modulation, associative memory, and feedback loops. All learning occurs via continuous local updates without labels, backpropagation, or global signals. Using a fixed linear probe at readout, the system reports 80.1% accuracy on CIFAR-10 and 47.6% on CIFAR-100, outperforming a Hebbian-only baseline. Ablations attribute gains primarily to anti-Hebbian decorrelation, free-energy-inspired plasticity, and associative memory rules, with noted synergies. The fixed architecture without any plasticity achieves 61.4% on CIFAR-10. Control experiments include matching performance between independently trained and co-trained probes (within 0.3 points) and a nearest-class-mean classifier reaching 78.3% on the learned features. The work positions itself as narrowing the gap to backpropagation-trained CNNs while demonstrating that structured local plasticity can produce strong representations from raw data.

Significance. If the central empirical claims hold after addressing isolation of effects, the work would be significant for showing that biologically motivated local plasticity rules, when structured within a multi-frequency hierarchy, can yield competitive unsupervised visual features without global error signals. The reported ablations, nearest-class-mean controls, and direct comparison to a Hebbian baseline provide concrete evidence for the contribution of specific rules. The high baseline performance of the untrained architecture, however, indicates that the result primarily illustrates the power of combining engineered inductive biases with local learning rather than plasticity in isolation, which tempers the broader claim but still offers value for local-rule research in vision.

major comments (3)

[Abstract] Abstract: The statement that 'plasticity, not only inductive bias, drives most of the performance' is not supported by the reported numbers. The fixed architecture alone reaches 61.4% on CIFAR-10 while plasticity adds 18.7 points to reach 80.1%; thus inductive biases account for the majority of the accuracy, contradicting the 'most' attribution.
[Results] Results (ablation studies): No architecture-level ablations are presented (e.g., removing or freezing the multi-frequency Gabor/wavelet streams while retaining plasticity rules). Without such controls, it is impossible to quantify whether the performance gains are driven primarily by the local plasticity mechanisms rather than the fixed multi-frequency inductive biases already present in the 61.4% baseline.
[Methods] Methods: The manuscript provides no error bars, standard deviations across multiple random seeds, or statistical significance tests for the reported accuracies (80.1%, 47.6%, ablation deltas). Given the empirical nature of the central claim, this omission prevents assessment of result reliability and reproducibility.
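
The arithmetic behind the first major comment can be made explicit. The sketch below uses only the CIFAR-10 accuracies quoted in this report; the 10 percent chance level is an assumption based on CIFAR-10 having ten balanced classes, and the variable names are illustrative.

```python
# Accuracies quoted in the report (percent, CIFAR-10).
fixed_acc = 61.4   # fixed architecture, no plasticity
full_acc = 80.1    # full system after 300 epochs of local plasticity
CHANCE = 10.0      # assumption: ten balanced classes

plasticity_gain = full_acc - fixed_acc                    # points added by plasticity
share_raw = fixed_acc / full_acc                          # architecture share of final accuracy
share_above_chance = (fixed_acc - CHANCE) / (full_acc - CHANCE)  # same, measured above chance

print(f"plasticity gain: {plasticity_gain:.1f} points")
print(f"architecture share of final accuracy: {share_raw:.1%}")
print(f"architecture share of above-chance accuracy: {share_above_chance:.1%}")
```

Measured either way, the fixed architecture accounts for roughly three quarters of the final accuracy, which is why the "most" attribution to plasticity is questioned.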

minor comments (2)

[Methods] The exact hyperparameter values and initialization details for the plasticity rules (anti-Hebbian, free-energy, memory) are only partially described, which would aid replication even if code is not released.
[Figures] Figure captions and axis labels for the ablation bar plots could more explicitly state the exact conditions (e.g., which rules are disabled) to avoid ambiguity when comparing to the main results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments that help refine the presentation of our contributions. We respond point-by-point to the major comments below.

Referee: [Abstract] Abstract: The statement that 'plasticity, not only inductive bias, drives most of the performance' is not supported by the reported numbers. The fixed architecture alone reaches 61.4% on CIFAR-10 while plasticity adds 18.7 points to reach 80.1%; thus inductive biases account for the majority of the accuracy, contradicting the 'most' attribution.

Authors: We agree that the abstract wording is imprecise and risks overstating the relative contribution of plasticity. The fixed architecture provides 61.4% while the full system reaches 80.1%, so inductive biases form the larger absolute share. We will revise the abstract to state: 'Even without learning, the fixed architecture alone reaches 61.4 percent on CIFAR-10, while the addition of local plasticity rules yields an 18.7-point improvement, indicating that both inductive biases and plasticity contribute to performance.' This accurately reflects the data.
revision: yes

Referee: [Results] Results (ablation studies): No architecture-level ablations are presented (e.g., removing or freezing the multi-frequency Gabor/wavelet streams while retaining plasticity rules). Without such controls, it is impossible to quantify whether the performance gains are driven primarily by the local plasticity mechanisms rather than the fixed multi-frequency inductive biases already present in the 61.4% baseline.

Authors: We acknowledge that architecture-level ablations would provide additional isolation. However, the multi-frequency Gabor/wavelet streams constitute a core fixed element of the VisNet hierarchy on which the local plasticity rules operate; removing them would change the model class rather than test plasticity in isolation. Our existing ablations isolate the contributions of specific plasticity mechanisms (anti-Hebbian decorrelation, free-energy-inspired plasticity, and associative memory) within this architecture and demonstrate their individual and synergistic effects, with the Hebbian-only baseline confirming the value of the chosen rules. We will add an explicit discussion paragraph clarifying the interplay between fixed inductive biases and local plasticity and noting architecture ablations as future work.
revision: partial

Referee: [Methods] Methods: The manuscript provides no error bars, standard deviations across multiple random seeds, or statistical significance tests for the reported accuracies (80.1%, 47.6%, ablation deltas). Given the empirical nature of the central claim, this omission prevents assessment of result reliability and reproducibility.

Authors: We agree that variability statistics improve assessment of reliability. The main results derive from single training runs owing to the substantial compute required for 300 epochs of unsupervised training on image streams. We will insert a limitations statement in the Methods and Results sections noting the single-run nature of the experiments. We will also attempt to obtain a small number of additional random seeds for the primary configurations and report standard deviations where feasible.
revision: partial
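
The seed-level reporting discussed in this exchange could take the following form. This is a minimal sketch using Python's standard `statistics` module; `summarize_runs` is a hypothetical helper, and the accuracies in the usage example are placeholders chosen for easy arithmetic, not results from the paper.

```python
from statistics import mean, stdev

def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) over per-seed accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return mean(accuracies), stdev(accuracies)

# Placeholder values for illustration only (NOT results from the paper).
example_runs = [79.0, 80.0, 81.0]
avg, sd = summarize_runs(example_runs)
print(f"{avg:.1f} ± {sd:.1f} (n={len(example_runs)})")
```

Reporting the number of seeds alongside the mean and standard deviation lets readers judge whether differences between ablation conditions exceed run-to-run variation.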

Circularity Check

0 steps flagged

No circularity: empirical results on public benchmarks with explicit plasticity-rule ablations

full rationale

This is an empirical machine learning paper reporting measured test-set accuracies on CIFAR-10/100 after training a fixed hierarchical architecture with local plasticity rules. The 61.4% baseline from the untrained model and the 80.1% after plasticity are direct empirical measurements, not quantities derived from or equivalent to fitted parameters in the paper's own equations. Ablations isolate contributions of specific rules (anti-Hebbian, free-energy, memory) without self-referential definitions or predictions that reduce to inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text; the derivation chain consists of implementation, training, and evaluation on external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim depends on numerous unspecified hyperparameters for the plasticity rules, normalization, and architecture, plus the assumption that local rules suffice for representation learning; full details are unavailable from the abstract alone.

free parameters (2)

Training epochs: Duration of 300 epochs over which continuous local plasticity is applied to unlabeled image streams.
Plasticity rule hyperparameters: Parameters controlling Hebbian, anti-Hebbian, free-energy inspired, and associative memory updates are chosen to enable learning.

axioms (1)

domain assumption: Local plasticity rules without global error signals can extract useful hierarchical visual representations from raw image streams.
This is the core premise that the entire system and its performance claims rest upon.

pith-pipeline@v0.9.0 · 5583 in / 1304 out tokens · 56201 ms · 2026-05-10T18:07:20.830495+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: All representation learning occurs through continuous local plasticity applied to unlabeled image streams over 300 epochs... anti-Hebbian decorrelation, free-energy inspired plasticity, and associative memory are the main contributors.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: The architecture is a VisNet-based hierarchy with opponent-colour inputs, multi-frequency Gabor and wavelet streams, competitive normalisation with lateral inhibition...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1] Adelson, E. H., & Bergen, J. R. (1985). Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A, 2(2), 284-299. Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Rev. Neurosci., 13(1), 51-62. Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive lear...

[2] Roy, K., Jaiswal, A., & Panda, P. (2019). Towards spike-based machine intelligence with neuromorphic computing. Nature, 575(7784), 607-617. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. Scellier, B., & Bengio, Y. (2017). Equilibrium propagation: bridging the gap ...

[3] Wandell, B. A. (1995). Foundations of Vision. Sinauer Associates.
