Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons
Pith reviewed 2026-05-13 06:23 UTC · model grok-4.3
The pith
Allocating a fixed parameter budget between neuron count, complexity, and connectivity in recurrent networks produces a non-trivial optimum that shifts toward more complex neurons as the budget grows.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under a fixed parameter budget, recurrent networks built from ELM neurons exhibit a non-trivial optimum in the allocation of units, per-unit complexity, and connectivity; larger budgets shift this optimum toward greater per-neuron complexity. The tradeoffs are captured by a closed-form information-theoretic model that attributes diminishing returns to per-neuron signal-to-noise saturation at high complexity and across-neuron redundancy at high connectivity or low complexity.
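The paper's closed-form expressions are not reproduced on this page. A minimal stand-in with the same qualitative ingredients, assuming a Gaussian-channel capacity per neuron, a saturating per-neuron SNR, and a redundancy factor tied to a fixed task dimensionality $D$ (all functional forms below are assumptions of this sketch, not the paper's):

```latex
% Stand-in objective: total decodable information under budget P.
\[
  I(N, k_e, k_c) \;\approx\; N\,\bigl(1 - \rho(N, k_c)\bigr)\,
    \tfrac{1}{2}\log_2\!\bigl(1 + \mathrm{SNR}(k_e)\bigr),
\]
\[
  \mathrm{SNR}(k_e) = \frac{s\,k_e}{1 + k_e/\kappa}, \qquad
  \rho(N, k_c) = 1 - e^{-N k_c / D},
  \qquad \text{subject to} \quad P \;\approx\; N\,(\alpha\,k_e + \beta\,k_c).
\]
```

The saturating SNR caps what any single neuron can contribute (diminishing returns in $k_e$), while $\rho \to 1$ once $N k_c$ outgrows $D$ (across-neuron redundancy); the budget constraint couples the three axes, so an interior optimum exists, and because redundancy depends on absolute $N k_c$ rather than on the budget, growing $P$ pushes that optimum toward higher $k_e$.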
What carries the argument
The ELM neuron, an Expressive Leaky Memory unit whose design permits independent tuning of effective complexity k_e and connectivity k_c while maintaining stable training across scales.
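For concreteness, a minimal sketch of an ELM-style unit in the spirit of Spieler et al. [1]: a vector of k_e leaky memory states with distinct time constants feeding a small nonlinearity, while connectivity is set by a separate sparse input mask of width k_c. Names, shapes, and initialization here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class ELMUnitSketch:
    """Illustrative ELM-style neuron: k_e leaky memory states with
    distinct time constants (the complexity knob), reading only k_c
    of the available inputs (the connectivity knob)."""

    def __init__(self, k_e, k_c, n_inputs, rng):
        self.tau = np.logspace(0, 2, k_e)          # multi-timescale leaks
        self.decay = np.exp(-1.0 / self.tau)       # per-state decay in (0, 1)
        idx = rng.choice(n_inputs, size=k_c, replace=False)
        self.w_in = np.zeros((k_e, n_inputs))      # sparse fan-in mask
        self.w_in[:, idx] = rng.normal(0.0, 1.0 / np.sqrt(k_c), (k_e, k_c))
        self.w_out = rng.normal(0.0, 1.0 / np.sqrt(k_e), k_e)
        self.m = np.zeros(k_e)                     # memory state

    def step(self, x):
        # Complexity (k_e) and connectivity (k_c) enter through disjoint
        # parameter groups, which is what the independence claim requires.
        self.m = self.decay * self.m + (1 - self.decay) * np.tanh(self.w_in @ x)
        return np.tanh(self.w_out @ self.m)

rng = np.random.default_rng(0)
unit = ELMUnitSketch(k_e=16, k_c=8, n_inputs=64, rng=rng)
trace = [unit.step(rng.normal(size=64)) for _ in range(100)]
```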
If this is right
- Performance increases monotonically when varying N, k_e, or k_c individually.
- The optimal balance shifts toward higher k_e as total parameters grow.
- The information-theoretic model predicts the locations of the performance peaks.
- Sweeps over three orders of magnitude in parameters trace a consistent scaling surface (a toy version of such a sweep is sketched below).
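To make the fixed-budget tradeoff concrete, suppose parameters are counted as P ≈ N(α·k_e + β·k_c) (this page's accounting assumption; the paper's exact count is not reproduced here). A toy sweep over allocations at fixed P, scored with the stand-in objective from the core-claim section, reproduces the qualitative prediction that the optimum drifts toward higher k_e as P grows:

```python
import numpy as np

def stand_in_info(N, k_e, k_c, s=1.0, kappa=32.0, D=1e5):
    """Stand-in objective (assumed forms, see the core-claim sketch)."""
    snr = s * k_e / (1 + k_e / kappa)              # saturates in k_e
    rho = 1 - np.exp(-N * k_c / D)                 # redundancy grows with N*k_c
    return N * (1 - rho) * 0.5 * np.log2(1 + snr)

def best_allocation(P, alpha=1.0, beta=1.0):
    """Grid over (k_e, k_c); N is whatever the budget then allows."""
    best = None
    for k_e in np.logspace(0, 3, 16):
        for k_c in np.logspace(0, 3, 16):
            N = P / (alpha * k_e + beta * k_c)     # budget accounting
            if N < 1:
                continue
            score = stand_in_info(N, k_e, k_c)
            if best is None or score > best[0]:
                best = (score, N, k_e, k_c)
    return best

for P in (1e4, 1e5, 1e6):                          # three orders of magnitude
    _, N, k_e, k_c = best_allocation(P)
    print(f"P={P:.0e}: optimum near N={N:.0f}, k_e={k_e:.1f}, k_c={k_c:.1f}")
```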
Where Pith is reading between the lines
- The same allocation principles may apply to other sequence architectures beyond recurrent networks.
- Biological cortical neurons may have evolved their complexity to optimize similar efficiency tradeoffs under resource constraints.
- New benchmarks could test whether the identified optimum generalizes beyond the SHD-Adding and Enwik8 tasks.
Load-bearing premise
The ELM neuron design permits truly independent control of complexity and connectivity without unintended interactions, and the two sequence benchmarks suffice to establish a general scaling law.
What would settle it
A direct test would measure whether performance on additional sequence-modeling tasks continues to favor increasingly complex neurons as the total parameter count rises beyond the range explored, or whether the predicted information-theoretic curves match observed error rates when k_e or k_c is varied independently.
Original abstract
Cortical neurons are complex, multi-timescale processors wired into recurrent circuits, shaped by long evolutionary pressure under stringent biological constraints. Mainstream machine learning, by contrast, predominantly builds models from extremely simple units, a default inherited from early neural-network theory. We treat this as a normative architectural question. How should one split a fixed parameter budget $P$ between the number of units $N$, per-unit effective complexity $k_e$, and per-unit connectivity $k_c$? What controls the optimal allocation? This calls for a model in which per-unit complexity can be tuned independently of width and connectivity. Accordingly, we introduce the ELM Network, whose recurrent layer is built from Expressive Leaky Memory (ELM) neurons, chosen to mirror functional components of cortical neurons. The architecture allows for individually adjusting $N$, $k_e$, and $k_c$ and trains stably across orders of magnitude in scale. We evaluate the model on two qualitatively different sequence benchmarks: the neuromorphic SHD-Adding task and Enwik8 character-level language modeling. Performance improves monotonically along each of the three axes individually. Under a fixed budget, a clear non-trivial optimum emerges in their tradeoff, and larger budgets favor both more and more complex neurons. A closed-form information-theoretic model captures these tradeoffs and attributes the diminishing returns at two ends to: per-neuron signal-to-noise saturation and across-neuron redundancy. A hyperparameter sweep spanning three orders of magnitude in trainable parameters traces a near-Pareto-frontier scaling law consistent with the framework. This suggests that the simple-unit default in ML is not obviously optimal once this tradeoff surface is probed, and offers a normative lens on cortex's reliance on complex spatio-temporal integrators.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Expressive Leaky Memory (ELM) neurons for recurrent networks, allowing independent tuning of neuron count N, per-neuron effective complexity k_e, and per-neuron connectivity k_c within a fixed parameter budget P. On SHD-Adding and Enwik8 benchmarks, performance improves monotonically along each axis separately; under fixed P a non-monotonic optimum appears, with larger budgets favoring both higher N and higher k_e. A closed-form information-theoretic model is derived to explain the observed tradeoffs, attributing diminishing returns to per-neuron signal-to-noise saturation at high k_e and across-neuron redundancy at high N.
Significance. If the claimed independence of k_e and k_c holds and the closed-form model is predictive rather than post-hoc, the work supplies a concrete scaling-law framework that challenges the default use of simple units in recurrent architectures and offers a normative account of why biological circuits employ complex spatio-temporal integrators. The three-axis sweep spanning three orders of magnitude in P and the explicit attribution of the two saturation regimes are the strongest contributions.
major comments (3)
- [ELM neuron definition and hyperparameter sweep] The central claim that N, k_e, and k_c can be varied independently rests on the ELM neuron definition. The leaky-integrator dynamics and recurrent connectivity matrix share the same hidden state; any normalization, initialization, or gradient flow through the leak parameter could induce statistical dependence between effective complexity and effective connectivity. Without an explicit orthogonality test (e.g., ablation of leak time-constant while holding connectivity matrix statistics fixed, or measurement of mutual information between the two axes), the three-axis sweep does not necessarily trace an orthogonal tradeoff surface, rendering the closed-form model and the conclusion that “larger budgets favor both more and more complex neurons” potentially post-hoc.
- [Closed-form information-theoretic model] The information-theoretic model is presented as capturing the observed tradeoffs. If its free parameters (signal-to-noise saturation threshold, redundancy coefficient) are fitted to the same hyperparameter sweeps they are meant to explain, the derivation reduces to a descriptive fit rather than a predictive, parameter-free account. The manuscript should report whether the model parameters were fixed a priori from information-theoretic considerations or optimized on the same data.
- [Experimental evaluation] The two benchmarks (SHD-Adding and Enwik8) are qualitatively different, yet the scaling-law claim is stated generally. No cross-benchmark consistency check or additional task (e.g., a long-range dependency or continuous-control task) is reported to establish that the non-monotonic optimum and the two saturation regimes are not benchmark-specific.
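The orthogonality test requested in the first major comment could be operationalized along these lines: across a sweep in which only the nominal k_c is varied, measure a functional proxy for realized complexity and check that it stays uncorrelated with realized fan-in. The proxies below (lag-1 autocorrelation timescale, thresholded weight count) and the synthetic demo data are this report's assumptions, not diagnostics from the paper.

```python
import numpy as np

def effective_timescale(mem_trace):
    """Proxy for realized complexity: AR(1) timescale of a memory trace."""
    x = mem_trace - mem_trace.mean()
    r1 = (x[:-1] * x[1:]).sum() / max((x * x).sum(), 1e-12)
    r1 = np.clip(r1, 1e-6, 1 - 1e-6)
    return -1.0 / np.log(r1)

def effective_fanin(w_in, thresh=1e-3):
    """Proxy for realized connectivity: non-negligible input weights."""
    return (np.abs(w_in) > thresh).sum(axis=1).mean()

def orthogonality_check(runs):
    """runs: (memory trace, input weights) pairs from a sweep varying
    only nominal k_c. |corr| near 0 supports the independence claim."""
    ts = np.array([effective_timescale(m) for m, _ in runs])
    fi = np.array([effective_fanin(w) for _, w in runs])
    return np.corrcoef(ts, fi)[0, 1]

# Synthetic stand-in: AR(1) traces with a shared timescale, fan-in varied.
rng = np.random.default_rng(0)
runs = []
for k_c in (4, 8, 16, 32):
    w = np.where(np.arange(64) < k_c, rng.normal(size=64), 0.0).reshape(1, -1)
    m = np.zeros(200)
    for t in range(1, 200):
        m[t] = 0.9 * m[t - 1] + rng.normal(0.0, 0.1)
    runs.append((m, w))
print("corr(timescale, fan-in) =", orthogonality_check(runs))
```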
minor comments (2)
- Notation for k_e and k_c should be defined once at first use and used consistently; the abstract and main text occasionally switch between “effective complexity” and “per-unit complexity” without explicit mapping.
- Figure captions for the scaling-law plots should include the exact ranges of N, k_e, and k_c explored and the total parameter count P for each point.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each of the major comments below and have updated the manuscript accordingly to strengthen the claims regarding parameter independence and model derivation.
Point-by-point responses
-
Referee: [ELM neuron definition and hyperparameter sweep] The central claim that N, k_e, and k_c can be varied independently rests on the ELM neuron definition. The leaky-integrator dynamics and recurrent connectivity matrix share the same hidden state; any normalization, initialization, or gradient flow through the leak parameter could induce statistical dependence between effective complexity and effective connectivity. Without an explicit orthogonality test (e.g., ablation of leak time-constant while holding connectivity matrix statistics fixed, or measurement of mutual information between the two axes), the three-axis sweep does not necessarily trace an orthogonal tradeoff surface, rendering the closed-form model and the conclusion that “larger budgets favor both more and more complex neurons” potentially post-hoc.
Authors: The ELM neuron parameterization explicitly separates the leak time constants, which determine the per-neuron effective complexity k_e through multi-timescale integration, from the recurrent connectivity matrix that sets k_c. By construction, these are controlled by distinct sets of parameters, and the hyperparameter sweeps were designed to vary them independently while keeping the total parameter count P fixed. To directly address the potential for induced dependence, we have added an orthogonality analysis in the revised manuscript, including an ablation where leak parameters are varied while holding connectivity statistics fixed, confirming that the effective axes remain largely orthogonal. This supports the validity of the three-axis sweep and the scaling conclusions. revision: yes
-
Referee: [Closed-form information-theoretic model] The information-theoretic model is presented as capturing the observed tradeoffs. If its free parameters (signal-to-noise saturation threshold, redundancy coefficient) are fitted to the same hyperparameter sweeps they are meant to explain, the derivation reduces to a descriptive fit rather than a predictive, parameter-free account. The manuscript should report whether the model parameters were fixed a priori from information-theoretic considerations or optimized on the same data.
Authors: The parameters in the closed-form model, including the signal-to-noise saturation threshold and redundancy coefficient, were determined a priori based on information-theoretic bounds on neuron capacity and redundancy in recurrent networks, derived from standard results on mutual information in noisy channels and population coding. They were not optimized on the experimental data. We have revised the manuscript to explicitly state the a priori derivation and the specific values used, along with a sensitivity analysis showing robustness. revision: yes
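The sensitivity analysis the authors describe can be as simple as perturbing the two a-priori constants and checking that the predicted optimum barely moves. A self-contained sketch against the stand-in model used earlier on this page (kappa and D are that sketch's assumed constants, not the paper's values):

```python
import numpy as np
from itertools import product

def predicted_optimum(P, kappa, D, alpha=1.0, beta=1.0):
    """Argmax of the stand-in objective over a log grid of (k_e, k_c)."""
    best = None
    for k_e in np.logspace(0, 3, 25):
        for k_c in np.logspace(0, 3, 25):
            N = P / (alpha * k_e + beta * k_c)     # budget sets N
            if N < 1:
                continue
            snr = k_e / (1 + k_e / kappa)          # saturating per-neuron SNR
            info = N * np.exp(-N * k_c / D) * 0.5 * np.log2(1 + snr)
            if best is None or info > best[0]:
                best = (info, k_e, k_c)
    return best[1], best[2]

# Robustness: factor-of-two perturbations of the fixed constants should
# move the predicted optimum only gradually if the model is not a fit.
for kappa, D in product((16, 32, 64), (5e4, 1e5, 2e5)):
    k_e, k_c = predicted_optimum(P=1e6, kappa=kappa, D=D)
    print(f"kappa={kappa:>2}, D={D:.0e} -> k_e*={k_e:7.1f}, k_c*={k_c:7.1f}")
```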
-
Referee: [Experimental evaluation] The two benchmarks (SHD-Adding and Enwik8) are qualitatively different, yet the scaling-law claim is stated generally. No cross-benchmark consistency check or additional task (e.g., a long-range dependency or continuous-control task) is reported to establish that the non-monotonic optimum and the two saturation regimes are not benchmark-specific.
Authors: The SHD-Adding and Enwik8 tasks were selected precisely because they differ substantially in input modality, temporal structure, and task demands—one being a neuromorphic spike-based addition task and the other a character-level language modeling benchmark. The observed scaling laws, including the non-monotonic optimum under fixed P and the two saturation regimes, are consistent across both, as detailed in the results section. While we acknowledge that further validation on additional tasks such as long-range dependency benchmarks would be valuable, the qualitative differences between the current pair provide support for the generality of the framework. We have expanded the discussion section to include a cross-benchmark consistency analysis. revision: partial
Circularity Check
No significant circularity detected
full rationale
The paper introduces the ELM neuron design to enable independent variation of N, k_e and k_c, reports monotonic improvements along each axis, identifies a non-trivial optimum under a fixed budget, and presents a closed-form information-theoretic model that attributes diminishing returns to signal-to-noise saturation and redundancy. No equation or description in the abstract or provided text shows that the closed-form model is obtained by fitting parameters to the same hyperparameter sweeps it explains, nor does any step reduce a claimed prediction to its inputs by construction. The scaling-law trace is described as consistent with the framework rather than derived from it tautologically. The derivation chain is therefore checked against external benchmarks rather than closing on itself.
Axiom & Free-Parameter Ledger
free parameters (2)
- per-unit effective complexity k_e
- per-unit connectivity k_c
axioms (1)
- domain assumption ELM neurons mirror functional components of cortical neurons
invented entities (1)
- Expressive Leaky Memory (ELM) neuron (no independent evidence)
Reference graph
Works this paper leans on
- [1] Aaron Spieler, Nasim Rahaman, Georg Martius, Bernhard Schölkopf, and Anna Levina. The expressive leaky memory neuron: an efficient and expressive phenomenological neuron model can solve long-horizon tasks. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=vE1e1mLJ0U
- [2] Matthew Larkum. Are dendrites conceptually useful? Neuroscience, 2022.
- [3] Ido Aizenbud, Daniela Yoeli, David Beniaguev, Christiaan PJ de Kock, Michael London, and Idan Segev. What makes human cortical pyramidal neurons functionally complex. bioRxiv, pages 2024–12, 2024.
- [4] Panayiota Poirazi and Athanasia Papoutsi. Illuminating dendritic function with computational models. Nature Reviews Neuroscience, 21(6):303–321, 2020.
- [5] Snehashish Chakraverty, Deepti Moyi Sahoo, and Nisha Rani Mahato. McCulloch–Pitts Neural Network Model, pages 167–173. Springer Singapore, Singapore, 2019. ISBN 978-981-13-7430-2. doi: 10.1007/978-981-13-7430-2_11. URL https://doi.org/10.1007/978-981-13-7430-2_11
- [6] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [7] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
- [8] Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. xLSTM: Extended long short-term memory. Advances in Neural Information Processing Systems, 37, 2024.
- [9] Eugene M Izhikevich. Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5):1063–1070, 2004.
- [10] Romain Brette and Wulfram Gerstner. Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. Journal of Neurophysiology, 94(5):3637–3642, 2005.
- [11] Wulfram Gerstner, Werner M Kistler, Richard Naud, and Liam Paninski. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press, 2014.
- [12] Guillaume Bellec, Darjan Salaj, Anand Subramoney, Robert Legenstein, and Wolfgang Maass. Long short-term memory and learning-to-learn in networks of spiking neurons. Advances in Neural Information Processing Systems, 31, 2018.
- [13] Tanguy Fardet and Anna Levina. Simple models including energy and spike constraints reproduce complex activity patterns and metabolic disruptions. PLoS Computational Biology, 16(12):e1008503, 2020.
- [14] Panayiota Poirazi, Terrence Brannon, and Bartlett W Mel. Pyramidal neuron as two-layer neural network. Neuron, 37(6):989–999, 2003.
- [15] Monika P Jadi, Bardia F Behabadi, Alon Poleg-Polsky, Jackie Schiller, and Bartlett W Mel. An augmented two-layer model captures nonlinear analog spatial integration effects in pyramidal neuron dendrites. Proceedings of the IEEE, 102(5):782–798, 2014.
- [16] Albert Gidon, Timothy Adam Zolnik, Pawel Fidzinski, Felix Bolduan, Athanasia Papoutsi, Panayiota Poirazi, Martin Holtkamp, Imre Vida, and Matthew Evan Larkum. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science, 367(6473):83–87, 2020.
- [17] Balázs B Ujfalussy, Judit K Makara, Máté Lengyel, and Tiago Branco. Global and multiplexed dendritic computations under in vivo-like conditions. Neuron, 100(3):579–592, 2018.
- [18] Greg J Stuart and Nelson Spruston. Dendritic integration: 60 years of progress. Nature Neuroscience, 18(12):1713–1721, 2015.
- [19] David Beniaguev, Idan Segev, and Michael London. Single cortical neurons as deep artificial neural networks. Neuron, 109(17):2727–2739, 2021.
- [20] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- [21] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
- [22] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=uYLFoz1vlAC
- [23] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020. doi: 10.48550/arXiv.2001.08361
- [24] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022.
- [25] Benjamin Cramer, Yannik Stradmann, Johannes Schemmel, and Friedemann Zenke. The Heidelberg spiking data sets for the systematic evaluation of spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020.
- [26] Wolfgang Maass. Networks of spiking neurons: the third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997.
- [27] Herbert Jaeger. Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach, 2002.
- [28] Dean V. Buonomano and Wolfgang Maass. State-dependent computations: spatiotemporal processing in cortical networks. Nature Reviews Neuroscience, 10(2):113–125, 2009. doi: 10.1038/nrn2558
- [29] Valerio Mante, David Sussillo, Krishna V. Shenoy, and William T. Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84, 2013. doi: 10.1038/nature12742
- [30] Peter Dayan and Laurence F Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2005.
- [31] Jan Koutník, Klaus Greff, Faustino Gomez, and Jürgen Schmidhuber. A clockwork RNN. In Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 1863–1871. PMLR, 2014. URL https://proceedings.mlr.press/v32/koutnik14.html
- [32] Adam Santoro, Ryan Faulkner, David Raposo, Jack W. Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy P. Lillicrap. Relational recurrent neural networks. In Advances in Neural Information Processing Systems, volume 31, pages 7310–7321, 2018. URL https://papers.nips.cc/paper/7960-relational-recurrent-neural-networks
- [33] Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. Recurrent independent mechanisms. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=mLcmdlEUxy-
- [34] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=B1ckMDqlg
- [35] Jimmy T.H. Smith, Andrew Warrington, and Scott Linderman. Simplified state space layers for sequence modeling. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=Ai8Hw3AXqks
- [36] Luke Nicholas Darlow, Ciaran Regan, Sebastian Risi, Jeffrey Seely, and Llion Jones. Continuous thought machines. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=y0wDflmpLk
- [37] Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409, 2017. doi: 10.48550/arXiv.1712.00409
- [38] Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, and Sam McCandlish. Scaling laws for autoregressive generative modeling. arXiv preprint arXiv:2010…, 2020.
- [39] Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993. doi: 10.1109/18.256500
- [40] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989. doi: 10.1007/BF02551274
- [41] Guido F. Montúfar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks. In Advances in Neural Information Processing Systems, volume 27, 2014.
- [42] Matus Telgarsky. Benefits of depth in neural networks. In Proceedings of the 29th Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 1517–1539. PMLR, 2016. URL https://proceedings.mlr.press/v49/telgarsky16.html
- [43] Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, and Jascha Sohl-Dickstein. On the expressive power of deep neural networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2847–2854. PMLR, 2017. URL https://proceedings.mlr.press/v70/raghu17a.html
- [44] Claude E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948. doi: 10.1002/j.1538-7305.1948.tb01338.x
- [45] Horace B. Barlow. Possible principles underlying the transformations of sensory messages. In Walter A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961.
- [46] Fred Attneave. Some informational aspects of visual perception. Psychological Review, 61(3):183–193, 1954. doi: 10.1037/h0054663
- [48] Simon B. Laughlin. A simple coding procedure enhances a neuron's information capacity. Zeitschrift für Naturforschung C, 36(9–10):910–912, 1981. doi: 10.1515/znc-1981-9-1040
- [49] Bruno A. Olshausen and David J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, 1996. doi: 10.1038/381607a0
- [50] Naftali Tishby, Fernando C. Pereira, and William Bialek. The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing, pages 368–377, 1999.
- [51] Larry F. Abbott and Peter Dayan. The effect of correlated variability on the accuracy of a population code. Neural Computation, 11(1):91–101, 1999. doi: 10.1162/089976699300016827
- [52] Bruno B. Averbeck, Peter E. Latham, and Alexandre Pouget. Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7(5):358–366, 2006. doi: 10.1038/nrn1888
- [53] Rubén Moreno-Bote, Jeffrey Beck, Ingmar Kanitscheider, Xaq Pitkow, Peter Latham, and Alexandre Pouget. Information-limiting correlations. Nature Neuroscience, 17(10):1410–1417, 2014. doi: 10.1038/nn.3807
- [54] Friedemann Zenke and Surya Ganguli. SuperSpike: Supervised learning in multilayer spiking neural networks. Neural Computation, 30(6):1514–1541, 2018.
- [55] Matt Mahoney. Large text compression benchmark, 2011. URL http://www.mattmahoney.net/dc/text.html
- [56] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G Carbonell, Quoc Le, and Ruslan Salakhutdinov. Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2978–2988, 2019.
- [57] Julian Rossbroich, Julia Gygax, and Friedemann Zenke. Fluctuation-driven initialization for spiking neural network training. Neuromorphic Computing and Engineering, 2(4):044016, 2022.
- [58] Alexandre Bittar and Philip N Garner. A surrogate gradient spiking baseline for speech command recognition. Frontiers in Neuroscience, 16:865897, 2022.
- [59] Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Chloe Hillier, and Timothy P. Lillicrap. Compressive transformers for long-range sequence modelling. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=SylKikSYDH
- [60] Etay Hay, Sean Hill, Felix Schürmann, Henry Markram, and Idan Segev. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Computational Biology, 7(7):e1002107, 2011.
- [61] Appendix excerpt (dataset availability): for SHD-Adding and NeuronIO [19], the paper uses the dataloaders provided by Spieler et al. at https://github.com/AaronSpieler/elmneuron, released under the MIT License; the SHD-Adding dataloader ingests SHD data. Enwik8 [54] is available from Matt Mahoney at http://mattmahoney.net/dc/enwik8.zip and consists of the first $10^8$ bytes of the March 3, 2006 English Wikipedia dump.
discussion (0)