A shallow residual neural network to predict the visual cortex response

Anne-Ruth José Meijer; Arnoud Visser

arxiv: 1906.11578 · v1 · pith:M3NKRSDCnew · submitted 2019-06-27 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-25 14:49 UTC · model grok-4.3

keywords: residual network, visual cortex, brain activity prediction, convolutional neural networks, image recognition, neural response modeling

The pith

A shallow residual neural network predicts visual cortex response better by training early layers accurately.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper demonstrates the use of a shallow residual neural network to model how the visual cortex responds to images. The key benefit is that residual connections allow accurate training of earlier network stages, which in turn permits adding extra layers there. This results in improved prediction of brain activity, rising from 10.4 percent at the first block to 15.53 percent at the final fully connected layer. Longer training beyond ten epochs can enhance this gain even more. The approach also ties into broader efforts to link brain function with artificial vision systems.

Core claim

The shallow residual neural network enables accurate training of early stages by using skip connections, allowing addition of more layers at the beginning and thereby improving the prediction of visual brain activity from 10.4% to 15.53%.
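
As a quick check on the size of the reported gain, the two percentages can be compared directly. The symbols $s_{\text{block1}}$ and $s_{\text{fc}}$ are labels introduced here for illustration only; they are not notation from the paper.

```latex
\begin{align*}
\Delta_{\text{abs}} &= s_{\text{fc}} - s_{\text{block1}} = 15.53 - 10.4 = 5.13 \text{ percentage points} \\
\Delta_{\text{rel}} &= \frac{\Delta_{\text{abs}}}{s_{\text{block1}}} = \frac{5.13}{10.4} \approx 0.493
\end{align*}
```

In relative terms the score at the last fully connected layer is therefore roughly 49% higher than the score at block 1, although the absolute difference is only about five percentage points.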

What carries the argument

A shallow residual neural network whose residual blocks make the initial layers easier to train for brain response prediction.
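
To make the mechanism concrete, here is a minimal sketch of a residual block, assuming a 1-D signal and two small convolutions. The only detail taken from the paper is that a block contains two convolution layers that can be skipped; the function names, the 1-D simplification, the `use_skip` flag, and the absence of normalization layers are assumptions made for illustration, not the authors' implementation.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding (odd kernel length assumed),
    so the output has the same length as the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def residual_block(x, kernel_a, kernel_b, use_skip=True):
    """Apply two convolutions F(x). With use_skip=True the block returns
    relu(F(x) + x); with use_skip=False it returns relu(F(x)), i.e. a plain
    block of the same depth (hypothetical control)."""
    hidden = relu(conv1d_same(x, kernel_a))
    f_x = conv1d_same(hidden, kernel_b)
    if use_skip:
        # Identity shortcut: the input bypasses both convolutions.
        f_x = [f + xi for f, xi in zip(f_x, x)]
    return relu(f_x)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    zero_kernel = [0.0, 0.0, 0.0]
    # With all-zero weights the shortcut still passes the input through,
    # while the plain block outputs only zeros.
    print(residual_block(x, zero_kernel, zero_kernel, use_skip=True))
    print(residual_block(x, zero_kernel, zero_kernel, use_skip=False))
```

The zero-weight case shows why the shortcut matters: the input still reaches the output unchanged, so a signal can travel past the two convolutions even when they contribute nothing. Setting `use_skip=False` gives a plain block of identical depth, which is the kind of control the section "What would settle it" calls for.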

If this is right

Prediction accuracy of visual cortex activity increases with the addition of layers enabled by residual connections.
Extended training over more than 10 epochs leads to further improvements in prediction.
Insights from this model could aid in developing better object-recognition algorithms based on convolutional neural networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar residual techniques might apply to modeling responses in other sensory cortices.
Comparing this network's internal representations to actual neural data could reveal new correspondences between artificial and biological vision.
The method might generalize to predict responses in non-human primates if similar datasets exist.

Load-bearing premise

The reported improvements in prediction accuracy are due to the shallow residual architecture rather than dataset characteristics, evaluation choices, or missing comparisons to other models.

What would settle it

Training a non-residual network with the same number of layers and comparing the prediction percentages; if it matches or exceeds 15.53%, the advantage of the residual structure would be questioned.

Figures

Figures reproduced from arXiv: 1906.11578 by Anne-Ruth José Meijer, Arnoud Visser.

**Figure 2.** The evaluation procedure of the 2019 Challenge (Courtesy [6]).

**Figure 3.** ResNet20 architecture.

III. RELATED WORK. The challenge is inspired by the initiative to find a BrainScore [5], which found a correlation between the ImageNet performance and the Brain-Score. Yet, for the CNNs with the highest performance this correlation becomes less strong. The conclusion of the study was that DenseNet169, CORnetS and ResNet-101 were the most brain-like CNNs. Yet, a number of smaller (i…

**Figure 4.** The noise normalized squared Spearman correlation percentage of …

**Figure 5.** The average noise normalized squared Spearman correlation …
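
The captions for Figures 4 and 5 name the evaluation metric as a noise normalized squared Spearman correlation, reported as a percentage. A minimal sketch of how such a score could be computed follows. It assumes the comparison is made between two flattened dissimilarity vectors (in the spirit of [8]) and that a noise ceiling is supplied by the challenge [6]. The function names, the example vectors, and the noise ceiling value of 0.8 are all hypothetical, and the challenge's exact procedure may differ.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def noise_normalized_score(model_vec, brain_vec, noise_ceiling):
    """Squared Spearman correlation as a percentage of the noise ceiling.
    noise_ceiling is assumed to be given on the same squared-correlation scale."""
    rho = spearman(model_vec, brain_vec)
    return 100.0 * rho ** 2 / noise_ceiling


if __name__ == "__main__":
    # Hypothetical dissimilarity vectors for a model layer and a brain region.
    model_vec = [0.1, 0.4, 0.2, 0.9, 0.7]
    brain_vec = [0.2, 0.3, 0.1, 0.8, 0.9]
    print(noise_normalized_score(model_vec, brain_vec, noise_ceiling=0.8))
```

With these toy inputs the Spearman correlation is 0.8, so the squared value of 0.64 divided by the assumed ceiling of 0.8 gives a score of about 80%. Scores such as 10.4% and 15.53% would, under this reading, mean that the model explains a small share of the explainable variance in the brain data.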

Original abstract

Understanding how the visual cortex of the human brain really works is still an open problem for science today. A better understanding of natural intelligence could also benefit object-recognition algorithms based on convolutional neural networks. In this paper we demonstrate the asset of using a shallow residual neural network for this task. The benefit of this approach is that earlier stages of the network can be accurately trained, which allows us to add more layers at the earlier stage. With this additional layer the prediction of the visual brain activity improves from 10.4% (block 1) to 15.53% (last fully connected layer). By training the network for more than 10 epochs this improvement can become even larger.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies residual networks to visual cortex prediction and claims a gain from 10.4% to 15.53% by adding a layer, but supplies no ablations or details to tie the gain to the residuals.

The one thing to know is that this paper takes the residual network idea and uses it for predicting visual cortex activity from images, reporting an improvement from 10.4% to 15.53% when they add a layer thanks to the residuals allowing better early training. The second point is that the abstract gives almost no supporting details on how those percentages were obtained.

What the paper does is demonstrate an application of a known architecture to this brain modeling task. The authors note that residuals help train earlier stages accurately, which lets them stack more layers there, and that longer training beyond 10 epochs boosts the numbers further. That part is consistent with general deep learning experience.

The paper does well in keeping the network shallow and focusing on the early blocks, which is a reasonable choice for matching brain data where early visual areas are the target. If the full text includes the actual network diagram or training procedure, that could be helpful for replication attempts.

The soft spots are the lack of any controls or statistics. There is no mention of dataset size, number of trials, how the prediction percentage is calculated, whether it is on independent test data, or any baseline like a non-residual network with the same number of layers. Without an ablation that removes the skip connections, the gain cannot be confidently attributed to the residual design rather than just having an extra layer or training longer. The stress-test concern is on target here.

This paper is for specialists in computational neuroscience who build CNNs to predict neural responses. A reader in that area might scan it for the specific numbers as one more data point, but it does not provide enough to change how people build these models. The work shows clear thinking in the sense that it follows standard practice, but the evidence for the central claim is too limited to be useful. I would not bring this to a reading group. I would not cite it. It does not deserve peer review because the main result cannot be evaluated without the missing details and comparisons.

Referee Report

3 major / 0 minor

Summary. The paper proposes using a shallow residual neural network to predict responses in the human visual cortex. It claims that residual connections enable accurate training of early network stages, which in turn permits adding layers at those stages; this yields an improvement in prediction accuracy from 10.4% (block 1) to 15.53% (last fully connected layer), with further gains possible after training beyond 10 epochs.

Significance. If the reported gains can be shown to arise specifically from the residual architecture rather than from added depth, longer training, or dataset-specific fitting, the work could contribute to both computational neuroscience and the design of CNNs that better model biological vision. The manuscript supplies no such isolation, however, so the significance cannot yet be assessed.

major comments (3)

[Abstract] The central claim that the residual design produces the observed lift from 10.4% to 15.53% is unsupported because the abstract (and, on the information given, the manuscript) supplies no dataset size, no definition or formula for the percentage metric, no statistical tests, no error bars, and no baseline models or cross-validation procedure.
[Abstract] No ablation is described that removes the residual skip connections while holding layer count and training schedule fixed, nor is a plain CNN of identical depth reported; without these controls the attribution of the delta to the residual architecture remains untested.
[Abstract] The statement that training beyond 10 epochs can produce still larger gains is presented without any accompanying learning curves, validation-set monitoring, or indication that the metric is computed on held-out data, leaving open the possibility that the numbers reflect training-set fit rather than generalization.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract is too brief and lacks critical details needed to support the claims. We will revise the abstract and, where appropriate, the main text to address the points raised. Our point-by-point responses follow.

Referee: [Abstract] The central claim that the residual design produces the observed lift from 10.4% to 15.53% is unsupported because the abstract (and, on the information given, the manuscript) supplies no dataset size, no definition or formula for the percentage metric, no statistical tests, no error bars, and no baseline models or cross-validation procedure.

Authors: We agree that the abstract must be expanded to include these elements. In the revised version we will add the dataset size, the precise definition and formula for the reported percentage metric, mention of statistical tests and error bars, and a brief description of the baseline models together with the cross-validation procedure.
revision: yes

Referee: [Abstract] No ablation is described that removes the residual skip connections while holding layer count and training schedule fixed, nor is a plain CNN of identical depth reported; without these controls the attribution of the delta to the residual architecture remains untested.

Authors: The manuscript attributes the improvement to the ability of residual connections to train earlier stages, thereby permitting added depth at those stages. We acknowledge that an explicit ablation removing the skip connections (while keeping depth and training schedule fixed) and a direct comparison to a plain CNN of identical depth are absent. We will add these controls in the revision.
revision: yes

Referee: [Abstract] The statement that training beyond 10 epochs can produce still larger gains is presented without any accompanying learning curves, validation-set monitoring, or indication that the metric is computed on held-out data, leaving open the possibility that the numbers reflect training-set fit rather than generalization.

Authors: We will remove or qualify the claim regarding gains beyond 10 epochs from the abstract. The revised manuscript will include learning curves computed on held-out validation data to demonstrate that the reported metrics reflect generalization rather than training-set fit.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical performance metrics are externally verifiable

The paper reports observed performance gains (10.4% to 15.53%) from training a neural network on brain-activity data. These percentages are post-training evaluation metrics on the task of predicting visual cortex responses, not quantities defined in terms of themselves or obtained by fitting a parameter and relabeling it as a prediction. No equations, self-citations, or uniqueness theorems appear in the abstract that would reduce the central claim to an input by construction. The work is a standard empirical demonstration whose results can be checked against the dataset and any held-out test protocol.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the improvement percentages are treated as empirical outcomes whose dependence on training choices cannot be audited.

pith-pipeline@v0.9.0 · 5642 in / 1094 out tokens · 29139 ms · 2026-05-25T14:49:38.966114+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Paper passage: "With this additional layer the prediction of the visual brain activity improves from 10.4% (block 1) to 15.53% (last fully connected layer)."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "ResBlocks contain two convolution layers... can be skipped to bring the feedback signal fast back to the earlier layers."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

[1] D. L. Yamins and J. J. DiCarlo, “Using goal-driven deep learning models to understand sensory cortex,” Nature Neuroscience, vol. 19, no. 3, p. 356, 2016.

[2] D. H. Hubel and T. N. Wiesel, “Receptive fields of single neurones in the cat’s striate cortex,” The Journal of Physiology, vol. 148, no. 3, pp. 574–591, 1959.

[3] D. L. Yamins, H. Hong, C. F. Cadieu, E. A. Solomon, D. Seibert, and J. J. DiCarlo, “Performance-optimized hierarchical models predict neural responses in higher visual cortex,” Proceedings of the National Academy of Sciences, vol. 111, no. 23, pp. 8619–8624, 2014.

[4] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0893608014002135

[5] M. Schrimpf, J. Kubilius, H. Hong, N. J. Majaj, R. Rajalingham, E. B. Issa, K. Kar, P. Bashivan, J. Prescott-Roy, K. Schmidt, D. L. K. Yamins, and J. J. DiCarlo, “Brain-score: Which artificial neural network for object recognition is most brain-like?” 2018. [Online]. Available: https://www.biorxiv.org/content/early/2018/09/05/407007

[6] R. M. Cichy, G. Roig, A. Andonian, K. Dwivedi, B. Lahner, A. Lascelles, Y. Mohsenzadeh, K. Ramakrishnan, and A. Oliva, “The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence,” arXiv e-prints, p. arXiv:1905.05675, May 2019.

[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.

[8] N. Kriegeskorte and R. A. Kievit, “Representational geometry: integrating cognition, computation, and the brain,” Trends in Cognitive Sciences, vol. 17, no. 8, pp. 401–412, 2013.

[9] J. Hauke and T. Kossowski, “Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data,” Quaestiones Geographicae, vol. 30, no. 2, pp. 87–93, 2011.

[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classifica...

[11] J. Kubilius, M. Schrimpf, A. Nayebi, D. Bear, D. L. K. Yamins, and J. J. DiCarlo, “Cornet: Modeling the neural mechanisms of core object recognition,” bioRxiv, 2018. [Online]. Available: https://www.biorxiv.org/content/early/2018/09/04/408385

[12] A.-R. Meijer, “Explaining the human visual brain using a deep neural network,” Bachelor thesis, Universiteit van Amsterdam, June 2019.

[13] K. Kar, J. Kubilius, K. Schmidt, E. B. Issa, and J. J. DiCarlo, “Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior,” Nature Neuroscience, vol. 22, no. 6, p. 974, 2019.

[14] Stanford-University, “Cs231n: Convolutional neural networks for visual recognition,” 2019, [Online; accessed 11-june-2019]. [Online]. Available: http://cs231n.github.io
