
arxiv: 2605.25174 · v1 · pith:LPKXKKYZnew · submitted 2026-05-24 · 🧬 q-bio.NC · cs.LG · cs.NE

Growing a Neural Network in Breadth, Depth, and Time

Pith reviewed 2026-06-29 23:20 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.LG · cs.NE
keywords recurrent convolutional networks · resource constraints · breadth depth time · differentiable costs · network growth · object recognition · human reaction times · computational graphs

The pith

Recurrent convolutional networks learn to trade off breadth, depth, and time when these resources are penalized with differentiable costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce differentiable costs for network breadth, depth, and processing time in a recurrent convolutional neural network modeled as a finite subset of an infinite lattice. Optimizing these costs together with task performance via backpropagation produces networks that can substitute any one resource for the others to reach a given accuracy. With increasing task complexity, the networks expand in all three dimensions, and they use more recurrent steps for partially occluded inputs. The computation time required by the model also correlates with human reaction times during object recognition. This framework provides a normative model for how resource constraints influence the emergence of neural architectures.

Core claim

Jointly optimizing task error with costs on breadth, depth, and time causes diverse computational graphs to emerge, with all three resources trading off against each other, networks growing in every dimension as tasks become harder, increased recurrence under occlusion, and model time correlating with human reaction times.

What carries the argument

Differentiable cost terms for breadth, depth, and time, optimized jointly with task errors via backpropagation within a recurrent convolutional network conceived as a finite subset of an infinite lattice.
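
As a sketch, the joint objective described above could be written as a weighted sum. The symbols λbreadth, λdepth, λtime and Ltime follow the figure captions; L_task, L_breadth and L_depth are names assumed here by analogy, and the weighted-sum form itself is an assumption rather than a formula quoted from the paper.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{task}}
  + \lambda_{\text{breadth}}\,\mathcal{L}_{\text{breadth}}
  + \lambda_{\text{depth}}\,\mathcal{L}_{\text{depth}}
  + \lambda_{\text{time}}\,\mathcal{L}_{\text{time}}
```

Under this reading, raising one λ makes that resource more expensive relative to task error, so the optimizer is pushed to reach the same accuracy with the other two.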

If this is right

  • Networks grow in breadth, depth, and time as task complexity increases.
  • All three resources can be traded off to achieve a target accuracy.
  • More recurrent steps are taken spontaneously when inputs are occluded.
  • Time used by the model correlates with human reaction times in object recognition.
  • Diverse computational graphs emerge under varying pressures on the three resources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar mechanisms might explain variations in brain size and recurrence across different species or cognitive demands.
  • The approach could be used to design artificial networks that balance efficiency across multiple resource dimensions.
  • Applying the method to other sensory modalities or tasks might test whether the human reaction time correlation generalizes.
  • The infinite lattice conception allows for continuous rather than discrete network scaling in theory.

Load-bearing premise

The specific differentiable costs chosen for breadth, depth, and time, along with the recurrent convolutional architecture as a finite subset of an infinite lattice, faithfully capture the resource constraints that shape neural computation.

What would settle it

A failure to observe resource trade-offs, lack of growth in all three dimensions with task complexity, or absence of correlation between model time and human reaction times on the object recognition task would falsify the central claims.

Figures

Figures reproduced from arXiv: 2605.25174 by Eivinas Butkus, Kedar Garzón Gupta, Nikolaus Kriegeskorte.

Figure 1. (a) The space of possible computational graphs can be conceptualized as an infinite lattice, extending in the space of resource use. Here we consider breadth, depth, and time. (b) Each model instance is a finite subset of the infinite lattice with its own profile of resource use. Our framework lets the network select its own position in this space by optimizing differentiable resource costs. We implement the l…
Figure 2. Model architecture. The network implements a finite subset of the infinite lattice (…
Figure 3. Breadth vs. depth. (a) Raw costs decrease smoothly with increasing λbreadth and λdepth. (b) Average weight magnitudes across layers and channels for each λ combination, with pruned model boundaries shown in red (preserving 98% above-chance accuracy). Top right of each panel shows accuracy before → after pruning. Shallow-and-wide models (top left) can achieve comparable accuracy to narrow-and-deep models (botto…
Figure 4. Time. (a) λtime vs. time cost Ltime. (b) λtime vs. time used. (c) Time used vs. accuracy: adaptive time selection dominates fixed. (d) Occlusion introduced at test time increases time used, demonstrating that the model adaptively chooses how long to compute. (e–h) Adaptive model behavior averaged across all λtime > 0. (e) Easy and hard images for several categories, defined by model time used. Model spends more time o…
Figure 5. Breadth vs. depth vs. time. (a) Accuracy as a function of λbreadth and λdepth for increasing λtime (left to right). (b) Pareto-optimal models (red) that achieve ≥70% accuracy while minimizing breadth, depth, and time used, shown in 3D resource space. (c) Pairwise 2D projections of the Pareto set. Red points spread across all projections, indicating that breadth, depth, and time are fungible. (d) Error consistency …
Figure 6. Task complexity. (a) Weight magnitude maps across layers and channels for MNIST, CIFAR-10, and Tiny ImageNet under matched resource pressures (λtime = 0.1, single model instance shown per panel). Networks grow in breadth and depth as the task becomes more complex. (b) Resources used (channels, layers, time) as a function of resource pressure for each dataset. CIFAR-10 and Tiny ImageNet use more spatial resourc…
Figure 7. Attribution map entropy as a function of accuracy and number of layers used. At matched …
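
The Figure 5 caption describes selecting Pareto-optimal models that reach at least 70% accuracy while minimizing breadth, depth, and time used. A minimal sketch of that kind of selection follows; the function names and the five example models are hypothetical and are not taken from the paper.

```python
import numpy as np

def pareto_front(costs: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated rows (lower is better in every column)."""
    keep = np.ones(costs.shape[0], dtype=bool)
    for i in range(costs.shape[0]):
        # A row dominates row i if it is no worse everywhere and better somewhere.
        no_worse = np.all(costs <= costs[i], axis=1)
        strictly_better = np.any(costs < costs[i], axis=1)
        if np.any(no_worse & strictly_better):
            keep[i] = False
    return keep

def select_models(resources, accuracy, min_accuracy=0.70):
    """Mask of models that meet the accuracy floor and are Pareto-optimal among those."""
    resources = np.asarray(resources, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    mask = np.zeros(len(accuracy), dtype=bool)
    idx = np.flatnonzero(accuracy >= min_accuracy)
    if idx.size:
        mask[idx[pareto_front(resources[idx])]] = True
    return mask

# Hypothetical models: (channels, layers, time steps) and accuracy. Not paper data.
resources = [[64, 2, 3], [16, 8, 3], [16, 2, 9], [64, 8, 9], [8, 1, 1]]
accuracy = [0.72, 0.71, 0.73, 0.80, 0.40]
mask = select_models(resources, accuracy)
```

In this toy set the wide, the deep, and the slow model all survive, while the model that is large on every axis is dominated and the smallest model fails the accuracy floor. A Pareto set that spreads along all three axes is what the caption calls fungibility.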
Original abstract

Spatial and temporal resource constraints are critical for both biological and artificial intelligent systems. Here we define differentiable cost terms for breadth, depth, and time within a recurrent convolutional neural network conceived as a finite subset of an infinite lattice. We optimize these costs jointly with task errors via backpropagation. We set different pressures on breadth, depth, and time, which leads to diverse computational graphs emerging organically through training. We find that all three resources can be traded off against each other to achieve a given level of accuracy. Networks grow in all three dimensions with task complexity and spontaneously take more recurrent steps when inputs are occluded. Surprisingly, time used by the model correlates with human reaction times in an object recognition task. Our framework provides a normative account of how resource constraints shape neural architectures, connecting to questions about brain design in neuroscience, and may help illuminate the diversity of neural solutions found in nature.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper defines differentiable cost terms for breadth, depth, and time within a recurrent convolutional network treated as a finite subset of an infinite lattice. These costs are optimized jointly with task loss via backpropagation under varying resource pressures, producing emergent computational graphs. Key findings include trade-offs among the three resources for fixed accuracy, growth in all dimensions with task complexity, increased recurrent steps under occlusion, and a correlation between model time usage and human reaction times in object recognition.

Significance. If the results hold under scrutiny of the cost functions and controls, the work supplies a normative model linking resource constraints to architectural diversity, with direct relevance to questions of brain design in neuroscience. The joint optimization and spontaneous emergence of dynamics (e.g., extra recurrent steps) are notable strengths when the cost terms can be independently justified.

major comments (2)
  1. [Abstract] The assertion that the framework supplies a 'normative account of how resource constraints shape neural architectures' is load-bearing for the central claim, yet the differentiable cost terms for breadth, depth, and time are selected for differentiability rather than derived from measured biological quantities (e.g., wiring length or metabolic rate); alternative exponents or normalizations could eliminate the reported trade-offs and spontaneous dynamics.
  2. [Abstract] The resource pressure coefficients (lambdas) are free parameters; without an independent justification or sensitivity analysis showing that the growth patterns and human-RT correlation survive changes in functional form, the normative interpretation risks circularity.
minor comments (1)
  1. [Abstract] The reported correlation between model time and human reaction times is stated without any statistical test, effect size, number of participants, or controls for task difficulty.
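
To illustrate what the minor comment asks for, here is a sketch of one way an effect size and a significance test could be reported for a model-time versus reaction-time relationship. The choice of Spearman correlation with a permutation test is an assumption, not the paper's analysis, and the data are synthetic placeholders.

```python
import numpy as np

def rank(x):
    """Ordinal ranks 0..n-1 (ties are broken by position, a simplification)."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffled pairings with |rho| at least as large."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = spearman(x, y)
    hits = sum(
        abs(spearman(x, rng.permutation(y))) >= abs(observed)
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic placeholder data, NOT results from the paper.
model_steps = np.array([1, 2, 3, 4, 5, 6, 7, 9])
human_rt_ms = np.array([410, 430, 455, 470, 520, 515, 600, 640])
rho, p = permutation_p_value(model_steps, human_rt_ms)
```

A full report would also state the number of participants and images, and would control for task difficulty, for example by correlating within difficulty bins.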

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and limitations of our normative framing. We address the two major points below and will revise the manuscript accordingly to strengthen the presentation of our cost functions and their justification.

Point-by-point responses
  1. Referee: [Abstract] The assertion that the framework supplies a 'normative account of how resource constraints shape neural architectures' is load-bearing for the central claim, yet the differentiable cost terms for breadth, depth, and time are selected for differentiability rather than derived from measured biological quantities (e.g., wiring length or metabolic rate); alternative exponents or normalizations could eliminate the reported trade-offs and spontaneous dynamics.

    Authors: We agree that the cost terms (linear penalties on breadth and depth, and a step-count penalty on time) were selected to be differentiable so that resource usage can be optimized jointly with task loss via backpropagation. The normative claim refers to the principle that explicit, optimizable resource constraints can produce emergent architectural diversity and dynamics, rather than to the claim that our exact functional forms match measured biological quantities. We will revise the abstract to qualify the normative language and add a dedicated paragraph in the discussion that (i) states the rationale for the chosen forms as tractable approximations and (ii) reports new sensitivity analyses on exponents and normalizations. These analyses will test whether the reported trade-offs, growth patterns, and spontaneous recurrence survive changes in functional form. revision: yes

  2. Referee: [Abstract] The resource pressure coefficients (lambdas) are free parameters; without an independent justification or sensitivity analysis showing that the growth patterns and human-RT correlation survive changes in functional form, the normative interpretation risks circularity.

    Authors: The lambdas are hyperparameters that set the relative strength of each resource cost. Their specific values were selected so that networks reach high accuracy while still exhibiting measurable resource usage. We acknowledge that independent biological justification for particular lambda values is not provided. We will add a sensitivity section that sweeps lambda values over an order-of-magnitude range and tests alternative cost functional forms; we will show that the core phenomena—resource trade-offs for fixed accuracy, growth in all three dimensions with task complexity, increased recurrence under occlusion, and the model-time/human-RT correlation—remain qualitatively intact. This analysis will be reported in the revised manuscript to reduce the risk of circularity. revision: yes
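
The sweep promised in the second response could be set up along the following lines. This is a sketch only: the center value 0.1 echoes the λtime = 0.1 mentioned in the Figure 6 caption, while the grid size, the function name, and the use of the same grid for all three coefficients are assumptions.

```python
import itertools
import numpy as np

def lambda_grid(center=0.1, decades=1.0, points=3):
    """Log-spaced values spanning `decades` orders of magnitude around `center`."""
    half = decades / 2.0
    return center * np.logspace(-half, half, points)

values = lambda_grid()

# Every (lambda_breadth, lambda_depth, lambda_time) combination to train and evaluate.
configs = list(itertools.product(values, repeat=3))
```

Each configuration would then be trained and checked for the same qualitative phenomena (trade-offs, growth with task complexity, extra recurrence under occlusion, and the reaction-time correlation), ideally repeated for each alternative cost form.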

Circularity Check

0 steps flagged

No significant circularity; costs explicitly defined and results emerge from joint optimization with external validation

Full rationale

The paper defines differentiable cost terms for breadth, depth, and time, then optimizes them jointly with task loss via backpropagation on a recurrent conv net treated as a lattice subset. Emergent behaviors (resource trade-offs, growth with complexity, extra recurrent steps under occlusion, human RT correlation) are simulation outcomes rather than inputs by construction. No equations or self-citations reduce any central claim to a fitted parameter renamed as prediction or to a self-referential definition. The human RT correlation supplies an independent external benchmark. This is a standard normative modeling setup with chosen but transparent functional forms; no load-bearing step collapses to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

Abstract only; the framework rests on the unstated premise that the chosen cost functions are appropriate normative models and that backpropagation through the recurrent lattice is sufficient to discover biologically plausible architectures. The abstract lists no explicit free parameters, axioms, or invented entities; the entries below are inferred from it.

free parameters (1)
  • resource pressure coefficients (lambdas for breadth, depth, time)
    The abstract states that different pressures are set on the three resources; these scalar multipliers are chosen by the experimenter and directly control the emergent architectures.
axioms (2)
  • [standard math] Backpropagation can jointly optimize task loss and differentiable resource costs
    Implicit in the statement that costs are optimized jointly with task errors via backpropagation.
  • [domain assumption] The recurrent convolutional network can be treated as a finite subset of an infinite lattice without loss of generality for the resource trade-offs
    Stated in the abstract as the modeling choice.
invented entities (1)
  • differentiable cost terms for breadth, depth, and time [no independent evidence]
    purpose: To penalize resource use so that networks grow organically under different pressures
    These terms are defined by the authors and are central to the framework; no independent evidence outside the model is provided in the abstract.


A. Compute

Each model was trained on a single GPU for approximately 2.5 hours, requiring roughly 3.3 GB of GPU memory at batch size 128. Training was conducted on a university cluster with a mix ...