arXiv: 1907.06374 · v1 · pith:GFTAXKXYnew · submitted 2019-07-15 · 💻 cs.LG · q-bio.NC · stat.ML

What does it mean to understand a neural network?

Pith reviewed 2026-05-24 21:36 UTC · model grok-4.3

classification 💻 cs.LG · q-bio.NC · stat.ML
keywords neural networks · neuroscience · learning rules · development · analogy · understanding · plasticity

The pith

Simple rules for development and learning in brains may be far easier to understand than the complex properties they produce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A short program can specify a neural network that learns to recognize objects, yet after training the network holds its knowledge in millions of weights that are hard to interpret directly. The paper draws an analogy to biological brains, where the rules governing development and learning could likewise be compact and understandable even if the resulting adult brain shows intricate tuning and connections. If this parallel holds, neuroscience would gain more by studying those generative rules than by attempting to reverse-engineer mature properties alone. The conjecture therefore points toward a shift in research priorities away from static descriptions of the adult brain.

Core claim

Artificial neural networks can be written in fewer than 100 lines of code and yet, once trained, contain millions of weights that encode knowledge about many object types; the same logic suggests that the rules for development and learning in brains may be far simpler to characterize than the resulting mature neural properties such as tuning curves or connection patterns.
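
As a rough illustration of that gap (a sketch, not code from the paper), the snippet below counts the parameters implied by a one-line network specification. The layer widths in `LAYER_SIZES` and the helper names `count_parameters` and `description_ratio` are assumptions chosen for illustration; the paper does not specify an architecture.

```python
# Hypothetical layer widths, chosen only for illustration (not from the paper):
# a 32x32 RGB image flattened to 3072 inputs, two hidden layers, 10 classes.
LAYER_SIZES = [3072, 2048, 2048, 10]


def count_parameters(layer_sizes):
    """Return the number of weights plus biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


def description_ratio(layer_sizes):
    """Parameters per character of the specification (a crude size comparison)."""
    spec_chars = len(repr(layer_sizes))
    return count_parameters(layer_sizes) / spec_chars


if __name__ == "__main__":
    print("parameters:", count_parameters(LAYER_SIZES))
    print("parameters per spec character:", round(description_ratio(LAYER_SIZES)))
```

With these assumed widths the specification fits on one line while the network it describes has about 10.5 million parameters. The count says nothing about what the trained weights encode; it only shows how quickly the thing to be interpreted outgrows the thing that defines it.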

What carries the argument

An analogy: simple code is to complex trained weights in artificial networks as developmental rules are to mature properties in biological brains.

If this is right

  • Neuroscience would benefit from directing more effort toward identifying and testing rules for learning and development rather than cataloging adult properties.
  • Understanding of brain function could advance by characterizing the generative processes that produce complex structure instead of analyzing the structure in isolation.
  • Models of brain development that stay close to compact rule sets would be preferred over models that require specifying every mature connection or tuning value.
  • Experimental work focused on early life stages and plasticity mechanisms would become central to explaining adult brain organization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same logic could be applied to other complex biological systems where growth rules might be simpler than final forms.
  • Efforts to build complete static maps of brain wiring would need to be paired with explicit models of the rules that generate those maps to yield understanding.
  • In artificial systems, interpretability research might similarly gain from studying training dynamics rather than only inspecting final weights.

Load-bearing premise

The relationship between simple code and complex trained weights in artificial networks provides a valid parallel for the relationship between developmental rules and mature properties in biological brains.

What would settle it

A demonstration that the minimal set of developmental rules needed to produce observed adult brain properties is itself as large and intricate as a full description of those properties would falsify the central analogy.
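
One way to picture the comparison this test calls for is a toy sketch, under heavy assumptions, that sets the size of a generating rule against the size of its product as seen by a generic compressor. Everything here is hypothetical: `RULE_SOURCE` is a stand-in "rule", seeded pseudo-random bytes stand in for "mature properties", and zlib gives only an upper bound on description length. None of it comes from the paper.

```python
import random
import zlib

# Hypothetical "rule": a short program that deterministically generates the product.
RULE_SOURCE = (
    "rng = random.Random(0)\n"
    "product = bytes(rng.randrange(256) for _ in range(100_000))\n"
)


def run_rule(source):
    """Execute the rule text and return the bytes it produces."""
    namespace = {"random": random}
    exec(source, namespace)
    return namespace["product"]


product = run_rule(RULE_SOURCE)

# Size of the rule versus size of the product after generic compression.
rule_size = len(RULE_SOURCE.encode("utf-8"))
compressed_product_size = len(zlib.compress(product, 9))

if __name__ == "__main__":
    print("rule size in bytes:", rule_size)
    print("compressed product size in bytes:", compressed_product_size)
```

In this toy case the rule is under a hundred bytes, yet a compressor that sees only the product cannot shrink it meaningfully. The analogy would fail in the opposite situation, where the shortest rule that reproduces the product is no smaller than the product itself.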

Figures

Figures reproduced from arXiv: 1907.06374 by Konrad P. Kording, Timothy P. Lillicrap.

Figure 1: The notion of compressibility. (A) Unbeatable performance at tic-tac-toe can be obtained with just three rules. These rules can be ordered by their importance. (B) Unbeatable performance at the game of Go needs many rules. So many, in fact, that we cannot know that number. Importantly, the rules are heavy tailed. If we have any small number of rules, the remaining rules will generally still carry a lot inf…
Figure 2: Dividing theories of brain functions into principles and data. (A) If the brain is not compressible, we can at best divide our description of it into a set of compact principles and a set of non-compressible data. We may then hope that the way the brain converts the data into computation may be understandable. (B) A very natural division is to ask for an understanding of anatomy and plasticity rules, which…

Abstract

We can define a neural network that can learn to recognize objects in less than 100 lines of code. However, after training, it is characterized by millions of weights that contain the knowledge about many object types across visual scenes. Such networks are thus dramatically easier to understand in terms of the code that makes them than the resulting properties, such as tuning or connections. In analogy, we conjecture that rules for development and learning in brains may be far easier to understand than their resulting properties. The analogy suggests that neuroscience would benefit from a focus on learning and development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The manuscript draws an analogy between artificial neural networks—which can be specified in fewer than 100 lines of code yet produce millions of trained weights encoding object recognition—and biological brains, conjecturing that developmental and learning rules may be substantially easier to understand than the resulting mature properties such as tuning curves or connectivity patterns. It concludes that neuroscience would therefore benefit from greater emphasis on learning and development.

Significance. If the suggested parallel between code-versus-weights and developmental-rules-versus-mature-properties proves heuristically useful, the perspective could redirect research attention in neuroscience toward mechanistic accounts of learning. The manuscript supplies no empirical tests, derivations, or quantitative comparisons, so its value is limited to the directional suggestion rather than any demonstrated equivalence or falsifiable prediction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their accurate summary of the manuscript and for recommending acceptance. The referee correctly notes that the work is a directional conjecture based on the code-versus-weights analogy rather than a set of empirical tests or quantitative predictions.

Circularity Check

0 steps flagged

No significant circularity

The paper is a short perspective piece that advances an explicit conjecture via analogy between the code/weights distinction in ANNs and the developmental-rules/mature-properties distinction in brains. No equations, derivations, fitted parameters, or quantitative predictions appear in the provided text. The central suggestion to neuroscience is presented as a directional implication of the analogy rather than a result derived from or equivalent to its own assumptions. No self-citation chains or load-bearing reductions are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper advances an informal analogy without introducing formal axioms, free parameters, or new postulated entities.

pith-pipeline@v0.9.0 · 5619 in / 945 out tokens · 18330 ms · 2026-05-24T21:36:43.296809+00:00 · methodology
