What does it mean to understand a neural network?

Konrad P. Kording; Timothy P. Lillicrap

Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.

T0 review · grok-4.3

Simple rules for development and learning in brains may be far easier to understand than the complex properties they produce.

2026-05-24 21:36 UTC pith:GFTAXKXY

load-bearing objection Short perspective using the ANN code-vs-weights split to argue that neuroscience should prioritize studying learning rules over mature brain properties.

arXiv 1907.06374 v1 · pith:GFTAXKXY · submitted 2019-07-15 · cs.LG q-bio.NC stat.ML

keywords: neural networks · neuroscience · learning rules · development · analogy · understanding · plasticity

verification ladder: T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A short program can specify a neural network that learns to recognize objects, yet after training the network holds its knowledge in millions of weights that are hard to interpret directly. The paper draws an analogy to biological brains, where the rules governing development and learning could likewise be compact and understandable even if the resulting adult brain shows intricate tuning and connections. If this parallel holds, neuroscience would gain more by studying those generative rules than by attempting to reverse-engineer mature properties alone. The conjecture therefore points toward a shift in research priorities away from static descriptions of the adult brain.

Core claim

Artificial neural networks can be written in fewer than 100 lines of code and yet, once trained, contain millions of weights that encode knowledge about many object types; the same logic suggests that the rules for development and learning in brains may be far simpler to characterize than the resulting mature neural properties such as tuning curves or connection patterns.
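
The size gap in this claim can be made concrete with a short sketch. The architecture below is hypothetical and chosen only for illustration: the layer sizes and the names `LAYERS`, `count_params`, and `total_params` are assumptions, not taken from the paper. It shows how a specification of a few lines already implies millions of trainable weights.

```python
# Hypothetical network specification (illustrative sizes, not from the paper).
# Each entry is (layer kind, shape parameters).
LAYERS = [
    ("conv", {"in_ch": 3, "out_ch": 64, "kernel": 3}),
    ("conv", {"in_ch": 64, "out_ch": 128, "kernel": 3}),
    ("conv", {"in_ch": 128, "out_ch": 256, "kernel": 3}),
    ("dense", {"n_in": 256 * 4 * 4, "n_out": 1024}),
    ("dense", {"n_in": 1024, "n_out": 1000}),
]


def count_params(kind, spec):
    """Return the number of trainable weights and biases in one layer."""
    if kind == "conv":
        weights = spec["in_ch"] * spec["out_ch"] * spec["kernel"] ** 2
        biases = spec["out_ch"]
    elif kind == "dense":
        weights = spec["n_in"] * spec["n_out"]
        biases = spec["n_out"]
    else:
        raise ValueError(f"unknown layer kind: {kind}")
    return weights + biases


def total_params(layers):
    """Sum the parameter counts over every layer in the specification."""
    return sum(count_params(kind, spec) for kind, spec in layers)


print(f"{total_params(LAYERS):,} trainable parameters from {len(LAYERS)} layer specs")
# prints: 5,591,144 trainable parameters from 5 layer specs
```

Under these assumed layer sizes the specification is five short entries, while the network it describes has about 5.6 million parameters. Training would set each of them to a value that the specification itself says nothing about.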

What carries the argument

The analogy between the simple code versus complex trained weights in artificial networks and the developmental rules versus mature properties in biological brains.

Load-bearing premise

The relationship between simple code and complex trained weights in artificial networks provides a valid parallel for the relationship between developmental rules and mature properties in biological brains.

What would settle it

A demonstration that the minimal set of developmental rules needed to produce observed adult brain properties is itself as large and intricate as a full description of those properties would falsify the central analogy.
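
One crude way to picture the comparison this falsifier calls for is to set the size of a rule's specification against the size of a compressed description of what the rule produces. The sketch below is a toy under stated assumptions: an elementary cellular automaton (rule 30) stands in for a "developmental rule", and `zlib` stands in for "a full description of the resulting properties". Neither choice comes from the paper, and compressed size is only an upper bound on description length, so this shows the shape of the comparison rather than any evidence about brains.

```python
import zlib


def step(cells, rule):
    """Apply one update of an elementary cellular automaton (wrap-around edges)."""
    n = len(cells)
    out = []
    for i in range(n):
        left, mid, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (mid << 1) | right  # neighbourhood as a 3-bit index
        out.append((rule >> idx) & 1)           # look up that bit of the rule number
    return out


def run(rule, width, steps):
    """Grow a pattern from a single active cell; return every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


def pack_bits(history):
    """Flatten the grid and pack eight cells per byte."""
    flat = [bit for row in history for bit in row]
    out = bytearray()
    for i in range(0, len(flat), 8):
        byte = 0
        for bit in flat[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


# Assumed toy settings: the "rule" is fully specified by these three numbers.
RULE, WIDTH, STEPS = 30, 256, 256
spec_bytes = bytes([RULE]) + WIDTH.to_bytes(2, "big") + STEPS.to_bytes(2, "big")

pattern = pack_bits(run(RULE, WIDTH, STEPS))
compressed = zlib.compress(pattern, 9)

print("rule specification:", len(spec_bytes), "bytes")
print("raw pattern:       ", len(pattern), "bytes")
print("compressed pattern:", len(compressed), "bytes")
```

The analogy would be in trouble if, for real developmental rules, the first number were no smaller than the last.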

If this is right

Neuroscience would benefit from directing more effort toward identifying and testing rules for learning and development rather than cataloging adult properties.
Understanding of brain function could advance by characterizing the generative processes that produce complex structure instead of analyzing the structure in isolation.
Models of brain development that stay close to compact rule sets would be preferred over models that require specifying every mature connection or tuning value.
Experimental work focused on early life stages and plasticity mechanisms would become central to explaining adult brain organization.
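
To make "compact rule set" concrete, here is a minimal sketch of a one-line plasticity rule shaping a weight vector from experience. Oja's rule is used purely as a familiar textbook example; it is not discussed in the paper, and the input distribution, learning rate, and function names (`oja_update`, `sample`, `train`) are assumptions for illustration.

```python
import math
import random


def oja_update(w, x, lr):
    """One step of Oja's rule: w <- w + lr * y * (x - y * w), with y = w . x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


def sample(rng):
    """Draw a 2-D input whose variance is largest along the diagonal (1, 1)/sqrt(2)."""
    a = rng.gauss(0.0, 3.0)  # strong component along the diagonal
    b = rng.gauss(0.0, 0.5)  # weak component along the anti-diagonal
    s = math.sqrt(0.5)
    return [s * (a + b), s * (a - b)]


def train(steps=5000, lr=0.005, seed=0):
    """Apply the rule to a stream of inputs, starting from an arbitrary weight vector."""
    rng = random.Random(seed)
    w = [1.0, 0.0]
    for _ in range(steps):
        w = oja_update(w, sample(rng), lr)
    return w


w = train()
norm = math.hypot(w[0], w[1])
# Absolute cosine between the learned weights and the diagonal direction.
alignment = abs(w[0] + w[1]) * math.sqrt(0.5) / norm
print(f"norm={norm:.3f}, alignment with diagonal={alignment:.3f}")
```

The rule fits on one line, yet the final weights depend on the statistics of the inputs: they end up pointing along the direction of greatest input variance, which could not be read off from the rule alone.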

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same logic could be applied to other complex biological systems where growth rules might be simpler than final forms.
Efforts to build complete static maps of brain wiring would need to be paired with explicit models of the rules that generate those maps to yield understanding.
In artificial systems, interpretability research might similarly gain from studying training dynamics rather than only inspecting final weights.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

Short perspective using the ANN code-vs-weights split to argue that neuroscience should prioritize studying learning rules over mature brain properties.

The core takeaway is that this is a short conjecture piece: a neural net fits in under 100 lines of code but ends up with millions of weights, so maybe the developmental rules in brains are simpler to understand than the final wiring and tuning. The authors suggest this analogy should shift neuroscience effort toward learning and development mechanisms.

The paper does a clean job of using a current deep-learning example to make the point concrete, and the suggestion that understanding the process might be more tractable than the end state is worth saying plainly. Similar ideas have appeared before in computational neuroscience, but the direct link to modern ANN implementations is the freshest angle here.

The obvious limitation is that the whole argument rests on an untested analogy with no data, derivation, or even a proposed test. The code-weights distinction in ANNs exists by design; nothing in the paper shows why the same split should map onto biological development, and the text does not engage counter-examples or alternative framings. No equations or measurements are offered, so the claim stays at the level of a directional hint rather than a substantiated position.

This is the kind of piece that belongs in a journal that publishes perspectives or commentaries rather than a methods or results venue. Readers already working on learning rules or developmental models in neuroscience or AI-adjacent fields might find it useful as a prompt for discussion. It does not contain new results or formal arguments that would require heavy referee scrutiny, but the question it raises is coherent enough that a serious editor could reasonably send it out for review as an opinion piece.

Referee Report

0 major / 0 minor

Summary. The manuscript draws an analogy between artificial neural networks—which can be specified in fewer than 100 lines of code yet produce millions of trained weights encoding object recognition—and biological brains, conjecturing that developmental and learning rules may be substantially easier to understand than the resulting mature properties such as tuning curves or connectivity patterns. It concludes that neuroscience would therefore benefit from greater emphasis on learning and development.

Significance. If the suggested parallel between code-versus-weights and developmental-rules-versus-mature-properties proves heuristically useful, the perspective could usefully redirect research attention in neuroscience toward mechanistic accounts of learning. The manuscript supplies no empirical tests, derivations, or quantitative comparisons, so its value is limited to the directional suggestion rather than any demonstrated equivalence or falsifiable prediction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their accurate summary of the manuscript and for recommending acceptance. The referee correctly notes that the work is a directional conjecture based on the code-versus-weights analogy rather than a set of empirical tests or quantitative predictions.

Circularity Check

0 steps flagged

No significant circularity

The paper is a short perspective piece that advances an explicit conjecture via analogy between the code/weights distinction in ANNs and the developmental-rules/mature-properties distinction in brains. No equations, derivations, fitted parameters, or quantitative predictions appear in the provided text. The central suggestion to neuroscience is presented as a directional implication of the analogy rather than a result derived from or equivalent to its own assumptions. No self-citation chains or load-bearing reductions are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper advances an informal analogy without introducing formal axioms, free parameters, or new postulated entities.

pith-pipeline@v0.9.0 · 5619 in / 945 out tokens · 18330 ms · 2026-05-24T21:36:43.296809+00:00 · methodology

Original abstract

We can define a neural network that can learn to recognize objects in less than 100 lines of code. However, after training, it is characterized by millions of weights that contain the knowledge about many object types across visual scenes. Such networks are thus dramatically easier to understand in terms of the code that makes them than the resulting properties, such as tuning or connections. In analogy, we conjecture that rules for development and learning in brains may be far easier to understand than their resulting properties. The analogy suggests that neuroscience would benefit from a focus on learning and development.

Figures

Figures reproduced from arXiv: 1907.06374 by Konrad P. Kording, Timothy P. Lillicrap.

**Figure 1.** The notion of compressibility. (A) Unbeatable performance at tic-tac-toe can be obtained with just three rules. These rules can be ordered by their importance. (B) Unbeatable performance at the game of Go needs many rules. So many, in fact, that we cannot know that number. Importantly, the rules are heavy tailed. If we have any small number of rules, the remaining rules will generally still carry a lot inf…

**Figure 2.** Dividing theories of brain functions into principles and data. (A) If the brain is not compressible, we can at best divide our description of it into a set of compact principles and a set of non-compressible data. We may then hope that the way the brain converts the data into computation may be understandable. (B) A very natural division is to ask for an understanding of anatomy and plasticity rules, which…

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Validating Causal Abstraction Metrics on Simulated Complex Systems
cs.LG · 2026-06 · unverdicted · novelty 6.0

Authors create a benchmark across discrete/continuous and static/dynamical systems and introduce the Causal Abstraction Error (CAE) metric that reliably distinguishes valid from invalid causal abstractions when it inc...

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · cited by 1 Pith paper · 11 internal anchors

