Cross-Species RSA Reveals Conserved Early Visual Alignment but Divergent Higher-Area Rankings Across Human fMRI and Macaque Electrophysiology

Nils Leutenegger

arxiv: 2605.22401 · v1 · pith:SO4KFRZMnew · submitted 2026-05-21 · 💻 cs.LG · cs.NE· q-bio.NC

Cross-Species RSA Reveals Conserved Early Visual Alignment but Divergent Higher-Area Rankings Across Human fMRI and Macaque Electrophysiology

Nils Leutenegger This is my paper

Pith reviewed 2026-05-22 07:36 UTC · model grok-4.3

classification 💻 cs.LG cs.NEq-bio.NC

keywords cross-species comparisonrepresentational similarity analysisvisual cortex alignmentCNN learning rulesmacaque electrophysiologyhuman fMRIbackpropagation alternativesIT cortex

0 comments

The pith

Early visual alignment holds across human fMRI and macaque electrophysiology, but higher-area rankings diverge with model capacity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether links between convolutional neural network learning rules and brain activity patterns extend from prior human findings to macaque data. It applies the same five rules—backpropagation, feedback alignment, predictive coding, spike-timing-dependent plasticity, and random weights—to macaque V1/V2 and IT recordings using representational similarity analysis. Early areas show consistent high alignment across species with the same top rules performing best, while IT shows no ranking correlation and benefits more from a large pretrained network. A sympathetic reader would care because conserved early alignment would support shared basic visual mechanisms between primates, whereas divergence at higher areas would point to limits from training data and network size rather than the choice of learning rule.

Core claim

Using identical model weights, all five learning rules achieve higher alignment with macaque early visual cortex than with human fMRI data, with spike-timing-dependent plasticity and predictive coding leading at V1/V2; at IT, learning-rule rankings show zero correlation across species and a pretrained ResNet-50 substantially outperforms the custom models.

What carries the argument

Cross-species representational similarity analysis (RSA) that compares model activation patterns to brain recordings while holding model weights fixed from the human study.

If this is right

Alignment in early visual areas remains robust even when switching from human fMRI to macaque single-unit recordings.
The same learning rules that lead in human V1 also lead in macaque V1/V2.
Higher-area alignment at IT is more sensitive to overall model capacity and training dataset than to the specific learning rule used.
Null results on rule rankings at IT are expected given only five rules and are further limited by stimulus differences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Shared early visual alignment suggests that basic feature extraction mechanisms can be modeled once and applied across primate species.
Higher visual areas may require species-specific stimulus statistics or larger capacity to produce matching rankings.
Matched-stimulus experiments across species would isolate whether the observed IT divergence reflects true species differences or experimental confounds.

Load-bearing premise

Differences in the stimulus sets used for the human fMRI and macaque electrophysiology experiments do not block meaningful comparison of learning-rule rankings at IT.

What would settle it

Repeating the IT ranking comparison after aligning the exact stimulus sets across species and still obtaining Kendall's tau near zero would confirm that the divergence is not explained by stimuli alone.

Figures

Figures reproduced from arXiv: 2605.22401 by Nils Leutenegger.

**Figure 2.** Figure 2: Cross-species ranking comparison. Each panel shows human [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: V1 alignment per learning rule, grouped by [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Species × learning rule interaction effects. Values show (∆ρhuman−∆ρmacaque) where ∆ρ = ρrule−ρrandom. Negative values (blue): macaque benefits more from the learning rule than human. STDP and PC show strong negative interactions at V1/V2. BP FA PC STDP Random RN-50 0.00 0.05 0.10 0.15 0.20 0.25 0.30 RSA (Spearman ) V1/V1 BP FA PC STDP Random RN-50 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 V2/V2 BP FA PC STD… view at source ↗

**Figure 5.** Figure 5: Architecture comparison: custom 3-conv CNN (5 learning rules) vs. pretrained ResNet-50. At V1/V2, [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Stimulus control: are learning-rule rankings stable across stimulus sets? Each panel shows model [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

read the original abstract

Does the relationship between learning rules and brain alignment generalize across species? We extend our prior finding that untrained CNNs match backpropagation at human V1 by testing the same five learning rules against macaque electrophysiology. The rules are backpropagation (BP), feedback alignment (FA), predictive coding (PC), spike-timing-dependent plasticity (STDP), and an untrained random-weights baseline. The macaque data come from two datasets: MajajHong2015 (V4/IT, 3,200 stimulus presentations, 88/168 neurons) and FreemanZiemba2013 (V1/V2, 135 stimuli, 102/103 neurons). Using RSA with identical model weights from our human study, we find: (1) all models achieve higher alignment with macaque early visual cortex (rho = 0.15-0.30 at V1/V2) than with human fMRI (rho = 0.01-0.08), consistent with the higher signal-to-noise ratio of electrophysiology; (2) STDP and PC produce the highest macaque V1/V2 alignment (rho ~ 0.30 and 0.28), consistent with their leading position among trained rules in human V1; (3) at IT, learning rule rankings show no detectable correlation across species (Kendall's tau = 0.00, p = 1.00), though this null result is expected given that n = 5 provides power only at tau = +/-1.0, and is further confounded by stimulus set differences; (4) a pretrained ResNet-50 (ImageNet) achieves rho = 0.25 at macaque IT, substantially above all custom CNN conditions (rho = 0.07-0.14), suggesting IT alignment is limited by model capacity and training data rather than by the learning rule. Noise ceilings, multi-seed variability (5 seeds), and a stimulus-control analysis are reported. These results demonstrate that early visual alignment is robust across species, while higher-area alignment is modulated by model capacity and stimulus domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Early visual alignments hold up across species and modalities but IT rule rankings show no correlation, with the null limited by low power and stimulus confounds.

read the letter

The main thing to know is that this paper finds consistent model rankings for early visual areas between human fMRI and macaque recordings, while IT rankings do not correlate across species and a pretrained ResNet outperforms the custom models there. They reuse the exact weights from their earlier human V1 study on new macaque data from MajajHong2015 and FreemanZiemba2013, which is a straightforward extension. The results show higher rho values in macaque V1/V2 (0.15-0.30) than in human data, with STDP and predictive coding on top, matching the human pattern. They also report noise ceilings, multi-seed runs, and a stimulus-control check, which gives the early-area findings some grounding. The pretrained ResNet-50 reaching rho 0.25 at IT while the custom CNNs stay at 0.07-0.14 is a concrete observation that points to capacity and training data mattering more than the specific learning rule at higher stages. What is new is the direct cross-species ranking comparison for these five rules, including the zero Kendall tau at IT. The soft spots are concentrated at IT. With only five rules the null result has power only for a perfect correlation, and the authors themselves note the stimulus-set differences as a confound that directly affects the central claim about higher-area alignment. That makes the interpretation about capacity versus learning rule suggestive rather than definitive, even with the controls they ran. This is useful for people working on brain-AI alignment who want constraints on which model properties transfer across species and recording types. A reader focused on RSA applications or early visual cortex modeling would get the most out of it. The early results are solid enough and the limitations are clearly flagged, so it deserves a serious referee rather than a desk reject.

Referee Report

2 major / 3 minor

Summary. The manuscript extends prior RSA findings on learning-rule alignment with human V1 by evaluating the same five rules (backpropagation, feedback alignment, predictive coding, STDP, and random weights) against macaque electrophysiological recordings from MajajHong2015 (V4/IT) and FreemanZiemba2013 (V1/V2) using identical model weights. It reports higher early-area alignment in macaque (rho 0.15-0.30 at V1/V2) than human fMRI, consistent top rankings for STDP and PC, a null cross-species correlation at IT (Kendall tau = 0.00, p = 1.00) attributed to low power (n=5) and stimulus-set differences, and superior IT performance by a pretrained ResNet-50 (rho = 0.25) over custom models (rho 0.07-0.14). Noise ceilings, 5-seed variability, and a stimulus-control analysis are included as controls.

Significance. If the documented controls hold, the work indicates that early visual alignment is robust across species and measurement modalities despite SNR differences, while IT alignment is more strongly modulated by model capacity and stimulus domain than by learning rule. Credit is due for reusing identical weights, reporting noise ceilings, multi-seed checks, and performing a stimulus-control analysis, all of which support direct cross-species comparison and reproducibility.

major comments (2)

[Abstract and Results (IT section)] Abstract and Results (IT section): The null result for learning-rule rankings at IT (Kendall's tau = 0.00, p = 1.00) is presented as expected given n = 5 (power only for |tau| = 1.0) and stimulus confounds; because this null underpins the claim of divergent higher-area rankings, a quantitative power analysis or bootstrap simulation of detectable effect sizes should be added to confirm the null is not merely an artifact of insufficient power.
[Methods (stimulus-control analysis)] Methods (stimulus-control analysis): The stimulus-control analysis is invoked to address stimulus-set differences between the human fMRI and macaque experiments, yet without explicit details on the matching procedure or subset used, it is hard to evaluate whether this control adequately supports interpreting the IT null as reflecting species or domain differences rather than stimulus mismatch.

minor comments (3)

[Abstract] Abstract: Report the precise neuron counts and stimulus presentation numbers separately for V1/V2 and V4/IT rather than aggregated ranges to improve precision.
[Results] Results: Clarify whether the reported rho ranges for early visual cortex combine the two macaque datasets or are shown per dataset.
[Figure legends] Figure legends: Indicate the number of random seeds (5) and how variability is visualized (e.g., error bars) for all alignment plots.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for the constructive suggestions. We address each major comment below and agree to incorporate the requested additions and clarifications in a revised version.

read point-by-point responses

Referee: [Abstract and Results (IT section)] Abstract and Results (IT section): The null result for learning-rule rankings at IT (Kendall's tau = 0.00, p = 1.00) is presented as expected given n = 5 (power only for |tau| = 1.0) and stimulus confounds; because this null underpins the claim of divergent higher-area rankings, a quantitative power analysis or bootstrap simulation of detectable effect sizes should be added to confirm the null is not merely an artifact of insufficient power.

Authors: We agree that a quantitative power analysis would strengthen the interpretation of the null result at IT. Although the manuscript already notes that n=5 provides power only for |tau|=1.0, we will add a bootstrap simulation in the revised Results section. This simulation will resample from the observed model-brain RSA correlations (across the 5 seeds) to estimate the sampling distribution of Kendall's tau and report the minimum detectable effect size at 80% power. This addition will provide explicit quantitative support for our claim that the observed tau=0.00 is consistent with low power rather than an artifact. revision: yes
Referee: [Methods (stimulus-control analysis)] Methods (stimulus-control analysis): The stimulus-control analysis is invoked to address stimulus-set differences between the human fMRI and macaque experiments, yet without explicit details on the matching procedure or subset used, it is hard to evaluate whether this control adequately supports interpreting the IT null as reflecting species or domain differences rather than stimulus mismatch.

Authors: We thank the referee for highlighting the need for greater detail. The stimulus-control analysis selected a subset of images from the MajajHong2015 set that matched the FreemanZiemba2013 stimuli on semantic category and low-level image statistics (mean luminance, RMS contrast, and spatial frequency content). The matched subset contained 112 images. We will expand the Methods section to describe the exact matching criteria, the size of the retained subset, and the RSA correlations obtained on this controlled stimulus set, allowing readers to directly evaluate the adequacy of the control. revision: yes

Circularity Check

0 steps flagged

No significant circularity; analysis uses independent macaque recordings on prior model weights

full rationale

The paper applies RSA to evaluate the same five learning-rule models (with identical weights from the authors' prior human fMRI study) against independent macaque electrophysiological datasets from MajajHong2015 and FreemanZiemba2013. Alignment scores, species comparisons, and IT null results are computed directly from the new neural recordings and stimulus presentations rather than being derived from or forced by the prior human equations. Reported controls including noise ceilings, multi-seed runs, and stimulus-control analysis further ground the claims empirically without self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions about RSA as a valid alignment metric and the comparability of cross-species neural recordings; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption Representational similarity analysis (RSA) provides a reliable measure of alignment between model layer activations and neural population responses.
Invoked throughout to quantify model-brain matches in both human and macaque data.
domain assumption The same CNN weights trained or initialized under each learning rule can be directly compared across species without species-specific retraining.
Stated when applying identical weights from the human study to macaque recordings.

pith-pipeline@v0.9.0 · 5935 in / 1404 out tokens · 65998 ms · 2026-05-22T07:36:12.818586+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages · 1 internal anchor

[1]

and Poo, M.-m

Bi, G.-q. and Poo, M.-m. (1998). Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci., 18:10464–10472

work page 1998
[2]

M., Heeger, D

Freeman, J., Ziemba, C. M., Heeger, D. J., Simoncelli, E. P., and Movshon, J. A. (2013). A functional and perceptual signature of the second visual area in pri- mates.Nature Neuroscience, 16:974–981

work page 2013
[3]

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition.CVPR, pp. 770–778

work page 2016
[4]

Leutenegger, N. (2026). Untrained CNNs match back- propagation at V1: A systematic RSA compar- ison of four learning rules against human fMRI. arXiv:2604.16875v2

work page internal anchor Pith review Pith/arXiv arXiv 2026
[5]

P., Cownden, D., Tweed, D

Lillicrap, T. P., Cownden, D., Tweed, D. B., and Aker- man, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning.Na- ture Communications, 7:13276

work page 2016
[6]

J., Hong, H., Solomon, E

Majaj, N. J., Hong, H., Solomon, E. A., and DiCarlo, J. J. (2015). Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict hu- man core object recognition performance.J. Neurosci., 35:13402–13418

work page 2015
[7]

Schrimpf, M., Kubilius, J., Hong, H., et al. (2020). Brain- Score: Which artificial neural network for object recog- nition is most brain-like?bioRxiv

work page 2020
[8]

Whittington, J. C. R. and Bogacz, R. (2017). An approx- imation of the error backpropagation algorithm in a predictive coding network with local Hebbian synap- tic plasticity.Neural Computation, 29:1229–1262

work page 2017
[9]

Yamins, D. L. K. and DiCarlo, J. J. (2016). Using goal- driven deep learning models to understand sensory cor- tex.Nature Neuroscience, 19:356–365. 5 /uni00000039/uni00000014/uni00000039/uni00000015/uni00000039/uni00000017/uni0000002f/uni00000032/uni00000026/uni0000002c/uni00000037 /uni00000025/uni00000055/uni00000044/uni0000004c/uni00000051/uni00000003/u...

work page 2016

[1] [1]

and Poo, M.-m

Bi, G.-q. and Poo, M.-m. (1998). Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci., 18:10464–10472

work page 1998

[2] [2]

M., Heeger, D

Freeman, J., Ziemba, C. M., Heeger, D. J., Simoncelli, E. P., and Movshon, J. A. (2013). A functional and perceptual signature of the second visual area in pri- mates.Nature Neuroscience, 16:974–981

work page 2013

[3] [3]

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition.CVPR, pp. 770–778

work page 2016

[4] [4]

Leutenegger, N. (2026). Untrained CNNs match back- propagation at V1: A systematic RSA compar- ison of four learning rules against human fMRI. arXiv:2604.16875v2

work page internal anchor Pith review Pith/arXiv arXiv 2026

[5] [5]

P., Cownden, D., Tweed, D

Lillicrap, T. P., Cownden, D., Tweed, D. B., and Aker- man, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning.Na- ture Communications, 7:13276

work page 2016

[6] [6]

J., Hong, H., Solomon, E

Majaj, N. J., Hong, H., Solomon, E. A., and DiCarlo, J. J. (2015). Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict hu- man core object recognition performance.J. Neurosci., 35:13402–13418

work page 2015

[7] [7]

Schrimpf, M., Kubilius, J., Hong, H., et al. (2020). Brain- Score: Which artificial neural network for object recog- nition is most brain-like?bioRxiv

work page 2020

[8] [8]

Whittington, J. C. R. and Bogacz, R. (2017). An approx- imation of the error backpropagation algorithm in a predictive coding network with local Hebbian synap- tic plasticity.Neural Computation, 29:1229–1262

work page 2017

[9] [9]

Yamins, D. L. K. and DiCarlo, J. J. (2016). Using goal- driven deep learning models to understand sensory cor- tex.Nature Neuroscience, 19:356–365. 5 /uni00000039/uni00000014/uni00000039/uni00000015/uni00000039/uni00000017/uni0000002f/uni00000032/uni00000026/uni0000002c/uni00000037 /uni00000025/uni00000055/uni00000044/uni0000004c/uni00000051/uni00000003/u...

work page 2016