What does it mean to understand a neural network?
Pith reviewed 2026-05-24 21:36 UTC · model grok-4.3
The pith
Simple rules for development and learning in brains may be far easier to understand than the complex properties they produce.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Artificial neural networks can be written in fewer than 100 lines of code and yet, once trained, contain millions of weights that encode knowledge about many object types; the same logic suggests that the rules for development and learning in brains may be far simpler to characterize than the resulting mature neural properties such as tuning curves or connection patterns.
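To make the scale gap concrete, here is a minimal sketch that counts the trainable parameters of a small fully connected network. The layer widths are hypothetical, chosen only for illustration, and are not taken from the paper; the point is that a few lines of specification already imply millions of weights.

```python
# Hypothetical layer widths (an assumption, not from the paper): a 32x32 RGB
# image flattened to 3072 inputs, two hidden layers, and 10 output classes.
LAYER_SIZES = [3072, 1024, 512, 10]


def count_parameters(layer_sizes):
    """Return the number of weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix connecting the two layers
        total += n_out         # one bias per output unit
    return total


n_params = count_parameters(LAYER_SIZES)
print(f"{len(LAYER_SIZES) - 1} layers, {n_params:,} trainable parameters")
# prints "3 layers, 3,676,682 trainable parameters"
```

The specification fits in one list of four integers, while the trained network it describes would need more than 3.6 million numbers to write down.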
What carries the argument
The analogy between the simple code versus complex trained weights in artificial networks and the developmental rules versus mature properties in biological brains.
If this is right
- Neuroscience would benefit from directing more effort toward identifying and testing rules for learning and development rather than cataloging adult properties.
- Understanding of brain function could advance by characterizing the generative processes that produce complex structure instead of analyzing the structure in isolation.
- Models of brain development that stay close to compact rule sets would be preferred over models that require specifying every mature connection or tuning value.
- Experimental work focused on early life stages and plasticity mechanisms would become central to explaining adult brain organization.
Where Pith is reading between the lines
- The same logic could be applied to other complex biological systems where growth rules might be simpler than final forms.
- Efforts to build complete static maps of brain wiring would need to be paired with explicit models of the rules that generate those maps to yield understanding.
- In artificial systems, interpretability research might similarly gain from studying training dynamics rather than only inspecting final weights.
Load-bearing premise
The relationship between simple code and complex trained weights in artificial networks provides a valid parallel for the relationship between developmental rules and mature properties in biological brains.
What would settle it
A demonstration that the minimal set of developmental rules needed to produce observed adult brain properties is itself as large and intricate as a full description of those properties would falsify the central analogy.
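One rough way to picture this test is to compare the size of a generating rule with the size of what it generates. The sketch below is a toy under loud assumptions: the "rule" is a hypothetical seeded random generator rather than a real developmental or learning rule, and zlib-compressed size is only a crude stand-in for description length. It is not the paper's method.

```python
import struct
import zlib

# Toy "developmental rule" (hypothetical), kept as source text so that its
# own size can be measured alongside the structure it produces.
RULE_SOURCE = """
import random

def grow(n_weights, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_weights)]
"""


def compressed_size(data: bytes) -> int:
    """Crude proxy for description length: zlib-compressed size in bytes."""
    return len(zlib.compress(data, 9))


namespace = {}
exec(RULE_SOURCE, namespace)          # run the rule's source to define grow()
weights = namespace["grow"](100_000)  # the "mature" structure it produces

rule_bytes = compressed_size(RULE_SOURCE.encode("utf-8"))
weight_bytes = compressed_size(struct.pack(f"{len(weights)}d", *weights))

print(f"rule: {rule_bytes} bytes, weights: {weight_bytes} bytes")
print(f"ratio weights/rule: {weight_bytes / rule_bytes:.0f}")
```

In this toy the rule is far smaller than its output. The falsifying scenario described above corresponds to the opposite finding for brains: a ratio near one, where the shortest adequate rule set is about as large as the description of the adult properties. A serious comparison would need a principled measure such as minimum description length rather than a general-purpose compressor.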
Original abstract
We can define a neural network that can learn to recognize objects in less than 100 lines of code. However, after training, it is characterized by millions of weights that contain the knowledge about many object types across visual scenes. Such networks are thus dramatically easier to understand in terms of the code that makes them than the resulting properties, such as tuning or connections. In analogy, we conjecture that rules for development and learning in brains may be far easier to understand than their resulting properties. The analogy suggests that neuroscience would benefit from a focus on learning and development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript draws an analogy between artificial neural networks—which can be specified in fewer than 100 lines of code yet produce millions of trained weights encoding object recognition—and biological brains, conjecturing that developmental and learning rules may be substantially easier to understand than the resulting mature properties such as tuning curves or connectivity patterns. It concludes that neuroscience would therefore benefit from greater emphasis on learning and development.
Significance. If the suggested parallel between code-versus-weights and developmental-rules-versus-mature-properties proves heuristically useful, the perspective could redirect research attention in neuroscience toward mechanistic accounts of learning. The manuscript supplies no empirical tests, derivations, or quantitative comparisons, so its value lies in the directional suggestion rather than in any demonstrated equivalence or falsifiable prediction.
Simulated Author's Rebuttal
We thank the referee for their accurate summary of the manuscript and for recommending acceptance. The referee correctly notes that the work is a directional conjecture based on the code-versus-weights analogy rather than a set of empirical tests or quantitative predictions.
Circularity Check
No significant circularity
The paper is a short perspective piece that advances an explicit conjecture via analogy between the code/weights distinction in ANNs and the developmental-rules/mature-properties distinction in brains. No equations, derivations, fitted parameters, or quantitative predictions appear in the provided text. The central suggestion to neuroscience is presented as a directional implication of the analogy rather than a result derived from or equivalent to its own assumptions. No self-citation chains or load-bearing reductions are present.