TIDE: Asymmetric Neural Circuits for Stabilized Temporal Inhibitory-Excitatory Dynamics

Alexander Kyuroson; Denis Kleyko; Marcus Liwicki

arxiv: 2605.19403 · v1 · pith:EHCBDALEnew · submitted 2026-05-19 · 💻 cs.LG


Pith reviewed 2026-05-20 07:42 UTC · model grok-4.3

classification 💻 cs.LG

keywords excitatory-inhibitory networks · neural dynamics · Wilson-Cowan model · image classification · network stability · Dale's principle · asymmetric circuits · temporal inhibition


The pith

TIDE shows that asymmetric excitatory-inhibitory networks stabilize neural dynamics while cutting training time and raising accuracy on perturbed ImageNet tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TIDE to model internal neural dynamics with asymmetric excitatory-inhibitory networks that incorporate Wilson-Cowan dynamics and lateral inhibition. This setup is expressed as energy-based systems optimized through a game-theoretic loss and enforces Dale's principle to maintain an 80:20 E-I ratio for biological realism. The goal is to add stability guarantees that earlier continuous thought machine models lacked, along with proofs of convergence and complexity bounds. If the approach holds, it would let neuro-inspired architectures achieve both theoretical stability and practical gains in efficiency and robustness under input perturbations.

Core claim

TIDE is a neuro-inspired architecture that computes internal representations through neural dynamics stabilized by asymmetric Excitatory-Inhibitory networks, Wilson-Cowan dynamics, and lateral inhibition. It balances biological realism by using Hierarchical Receptive Fields and enforcing Dale's principle to ensure a realistic 80:20 E-I balance ratio within an end-to-end trainable architecture. The paper presents proofs of convergence, stability, and complexity bounds, and reports that TIDE surpasses CTM with under 50% of the training time while improving top-1 accuracy by an average of +1.65% on ImageNet under various perturbations.
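For readers unfamiliar with Wilson-Cowan dynamics, the following is a minimal sketch of the textbook two-population rate model in a commonly used simplified form (without the refractory term), integrated with forward Euler. It is not the TIDE implementation: the function names, the coupling weights, the time constants, and the step size are all illustrative assumptions chosen only to make the example run.

```python
import math


def sigmoid(x):
    """Logistic firing-rate nonlinearity with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def wilson_cowan_step(e, i, dt, w_ee=10.0, w_ei=12.0, w_ie=10.0, w_ii=2.0,
                      p=0.5, q=0.0, tau_e=1.0, tau_i=2.0):
    """One forward-Euler step of the simplified two-population model.

        tau_e * dE/dt = -E + S(w_ee * E - w_ei * I + p)
        tau_i * dI/dt = -I + S(w_ie * E - w_ii * I + q)

    All parameter values are illustrative assumptions, not values from the paper.
    """
    de = (-e + sigmoid(w_ee * e - w_ei * i + p)) / tau_e
    di = (-i + sigmoid(w_ie * e - w_ii * i + q)) / tau_i
    return e + dt * de, i + dt * di


def simulate(steps=2000, dt=0.01, e0=0.1, i0=0.05):
    """Integrate the model and return the list of (E, I) states."""
    e, i = e0, i0
    trajectory = [(e, i)]
    for _ in range(steps):
        e, i = wilson_cowan_step(e, i, dt)
        trajectory.append((e, i))
    return trajectory


if __name__ == "__main__":
    print("final state (E, I):", simulate()[-1])
```

Because each Euler update is a convex combination of the previous rate and a sigmoid output whenever `dt` does not exceed the time constant, the rates in this sketch remain inside the unit interval. Whether the trajectory settles to a fixed point or oscillates depends on the chosen weights.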

What carries the argument

Asymmetric excitatory-inhibitory networks that embed Wilson-Cowan dynamics and lateral inhibition, formulated as energy-based systems optimized via game-theoretic loss and constrained by Dale's principle to enforce 80:20 E-I balance.
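One common way to enforce Dale's principle in a trainable weight matrix is to fix a sign for every presynaptic unit and learn only the magnitudes. The sketch below illustrates that idea with an 80:20 split. It is an assumption about how such a constraint could be implemented, not the parameterization used in the paper, and the helper names `make_dale_signs`, `apply_dale`, and `random_raw_weights` are hypothetical.

```python
import random


def make_dale_signs(n_units, excitatory_fraction=0.8):
    """Return +1.0 for excitatory units and -1.0 for inhibitory units."""
    n_exc = round(n_units * excitatory_fraction)
    return [1.0] * n_exc + [-1.0] * (n_units - n_exc)


def apply_dale(raw_weights, signs):
    """Constrain W[i][j], the weight from presynaptic unit j to unit i.

    Every outgoing weight of unit j takes the sign of that unit, so a unit
    is either purely excitatory or purely inhibitory.
    """
    return [[abs(w) * signs[j] for j, w in enumerate(row)]
            for row in raw_weights]


def random_raw_weights(n_units, seed=0):
    """Unconstrained weights standing in for learned parameters (illustrative)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_units)]
            for _ in range(n_units)]


if __name__ == "__main__":
    signs = make_dale_signs(10)
    weights = apply_dale(random_raw_weights(10), signs)
    print("excitatory units:", signs.count(1.0))
    print("inhibitory units:", signs.count(-1.0))
    print("first row:", [round(w, 2) for w in weights[0]])
```

Taking the absolute value keeps the constraint satisfied after every gradient update without a separate projection step, at the cost of a non-smooth point at zero.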

If this is right

TIDE supplies provable convergence and stability for the modeled neural dynamics.
The architecture maintains a biologically realistic 80:20 E-I ratio through Dale's principle.
Training requires under 50% of the time needed by the Continuous Thought Machine.
Top-1 accuracy rises by an average of +1.65% on ImageNet under perturbations.
Complexity bounds are established for the stabilized dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The stability mechanism could transfer to other recurrent architectures that currently lack convergence guarantees.
Enforcing Dale's principle may make internal representations more interpretable by aligning them with known biological constraints.
The game-theoretic loss offers a template for designing new objectives that directly penalize unstable dynamics in energy-based models.

Load-bearing premise

Embedding Wilson-Cowan dynamics plus lateral inhibition into asymmetric E-I networks with enforced Dale's principle will produce both provable stability and the reported empirical gains without additional post-hoc tuning.

What would settle it

Run an ablation that removes the lateral inhibition term, then check whether the claimed convergence and stability proofs still hold and whether the +1.65% accuracy lift on perturbed ImageNet vanishes.
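The ablation can be pictured with a toy switch on the inhibition term. The sketch below uses a simple subtractive lateral inhibition over a one-dimensional activity vector, where setting `strength=0.0` plays the role of the ablated model. The inhibition rule, the function name, and the input values are assumptions made for illustration and do not reproduce the paper's mechanism.

```python
def lateral_inhibition(activity, strength=0.5, radius=1):
    """Suppress each unit by the mean activity of its neighbors.

    With strength=0.0 the inhibition term vanishes and the function reduces
    to a plain rectification, which stands in for the ablated model.
    """
    n = len(activity)
    output = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighbors = [activity[j] for j in range(lo, hi) if j != i]
        inhibition = strength * sum(neighbors) / len(neighbors) if neighbors else 0.0
        output.append(max(0.0, activity[i] - inhibition))
    return output


if __name__ == "__main__":
    x = [0.2, 1.0, 0.2]  # arbitrary toy input
    print("with inhibition:", lateral_inhibition(x, strength=0.5))
    print("ablated:        ", lateral_inhibition(x, strength=0.0))
```

In this toy case the inhibited output keeps only the strongest unit, while the ablated output passes the input through unchanged. A real ablation would compare the two settings on the same training budget and evaluation data.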

Figures

Figures reproduced from arXiv: 2605.19403 by Alexander Kyuroson, Denis Kleyko, Marcus Liwicki.

**Figure 2.** Temporal evolution of mean attention as saliency per computation step for TIDE and CTM.

**Figure 3.** Robustness analysis using ImageNet-C [48] for corrupted images with perturbations. The left panel presents results for TIDE, the center panel corresponds to CTM, and the right panel reports the differences.

Ablation studies: MNIST and Fashion-MNIST are used to analyze the effects of various hyperparameter choices on learning outcomes and the stability of TIDE. All models are trained for 50 K steps, with an identical simple …

Original abstract

Recent Continuous Thought Machine architecture decouples internal computation from external inputs via neural dynamics, but relies on multi-layer perceptrons without stability guarantees. We propose to model neural dynamics using asymmetric Excitatory-Inhibitory (E-I) networks, which can be stabilized via principles from network theory and can be expressed as energy-based systems optimized through a game-theoretic loss. Building on this perspective, we introduce Temporal Inhibitory-Excitatory Dynamic Engine (TIDE), a neuro-inspired architecture that computes internal representations through neural dynamics stabilized by incorporating the Wilson-Cowan dynamics and lateral inhibition. TIDE balances biological realism by, for instance, using Hierarchical Receptive Fields and enforcing Dale's principle to ensure a realistic $80:20$ E-I balance ratio with an end-to-end trainable architecture. The aim of this paper is to introduce a new architecture that brings neuro-inspired learning to the forefront. We present proofs of convergence, stability, and complexity bounds, along with empirical ablation studies. Overall, TIDE surpasses CTM with under $50\%$ of the training time and improves $\texttt{top-1}$ accuracy by an average of $+1.65\%$ on ImageNet under various perturbations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TIDE adds E-I balance, Wilson-Cowan dynamics and lateral inhibition to CTM-style models with claimed stability and efficiency gains, but the proofs and discrete-to-continuous bridge need checking.


Colleague, the main point is that this paper introduces TIDE to stabilize internal dynamics in continuous thought machine models by building asymmetric excitatory-inhibitory circuits, enforcing an 80:20 E-I ratio under Dale's principle, and folding in Wilson-Cowan equations plus lateral inhibition. It reports faster training and modest accuracy lifts on perturbed ImageNet compared with plain CTM baselines.

The specific combination of those biological constraints inside an end-to-end trainable architecture with hierarchical receptive fields is the clearest new element. Prior CTM work did not include this explicit stabilization package. The authors also try to ground the dynamics in network-theory stability and a game-theoretic loss, which is a reasonable direction for making recurrent-style computation more reliable. That part earns some credit for trying to move beyond pure MLP dynamics.

The soft spots sit mainly in the evidence presented. The abstract states that proofs of convergence, stability, and complexity exist, yet no derivation steps or error bounds appear in the summary we have, so it is difficult to judge how tight the arguments are. The empirical claims of roughly 1.65 percent top-1 improvement and under 50 percent training time also lack full baseline tables, perturbation definitions, or significance tests, which leaves room for doubt about how much comes from the architecture versus hyperparameter choices. The stress-test concern about continuous-time assumptions surviving discretization and gradient training is fair and worth pressing; without seeing the exact implementation it is unclear whether the reported gains rest on the theory or on implicit tuning.

This work is aimed at researchers who build neuro-inspired dynamic models for vision or sequence tasks and want more built-in stability constraints. A reader already working on continuous or recurrent architectures could extract useful design patterns even if the full proofs need expansion.

The mix of formal claims and concrete experiments is enough to justify sending it to a serious referee rather than desk rejection. I would recommend peer review, with the main requests being to show the stability derivations explicitly and to tighten the empirical comparisons.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Temporal Inhibitory-Excitatory Dynamic Engine (TIDE), an architecture that incorporates asymmetric excitatory-inhibitory (E-I) networks, Wilson-Cowan dynamics, and lateral inhibition to stabilize internal representations in continuous-time neural computation. Building on the Continuous Thought Machine (CTM), TIDE enforces Dale's principle with an 80:20 E-I ratio, claims to provide proofs of convergence, stability, and complexity bounds via network theory and a game-theoretic loss, and reports empirical results showing an average +1.65% top-1 accuracy improvement and under 50% training time versus CTM on perturbed ImageNet.

Significance. If the stability guarantees transfer from continuous Wilson-Cowan dynamics to the discrete, trained implementation and the reported efficiency/accuracy gains prove robust, the work could meaningfully advance neuro-inspired architectures that prioritize biological constraints like E-I balance for more stable and efficient dynamic neural networks.
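The gap between continuous-time stability and a discrete implementation can be seen on the simplest possible case. For the scalar system dx/dt = -rate * x, which is stable for any positive rate, forward Euler multiplies the state by (1 - rate * dt) at every step, so the discrete iteration only decays when dt < 2 / rate. The sketch below demonstrates this standard fact. It says nothing about how TIDE is discretized, and the rate and step sizes are arbitrary illustrative values.

```python
def euler_decay(rate, dt, steps, x0=1.0):
    """Integrate dx/dt = -rate * x with forward Euler and return the final x."""
    x = x0
    for _ in range(steps):
        x = x + dt * (-rate * x)
    return x


def euler_is_stable(rate, dt):
    """Forward Euler contracts this linear system iff |1 - rate * dt| < 1."""
    return abs(1.0 - rate * dt) < 1.0


if __name__ == "__main__":
    rate = 10.0  # illustrative decay rate
    for dt in (0.05, 0.25):
        final = euler_decay(rate, dt, steps=100)
        print(f"dt={dt}: stable={euler_is_stable(rate, dt)}, |x_final|={abs(final):.3e}")
```

The same continuous system is therefore stable or unstable after discretization depending only on the step size, which is why a dedicated discretization analysis matters for any claim that continuous-time guarantees carry over to a trained discrete model.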

major comments (2)

[Abstract and §3 (Theoretical Analysis)] The manuscript asserts proofs of convergence, stability, and complexity bounds based on continuous-time Wilson-Cowan dynamics and network-theory principles, yet provides no derivation steps, discretization analysis, or verification that the end-to-end trained discrete implementation preserves these properties. This is load-bearing for the central claim that TIDE achieves provable stability without post-hoc tuning.
[§5 (Experiments)] The key empirical claims (+1.65% top-1 accuracy and <50% training time on ImageNet under perturbations) are stated without baseline implementation details for CTM, explicit perturbation definitions, or statistical significance measures (e.g., standard error over runs). This directly affects verifiability of the practical gains that rest on the chosen E-I ratio and Wilson-Cowan parameters.
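The statistical reporting requested in the second comment could look like the sketch below, which computes the mean and standard error of a per-seed accuracy difference between two models. The accuracy lists are synthetic placeholders invented for this example, not results from the paper, and the function names are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def paired_differences(model_a, model_b):
    """Per-seed differences, assuming both lists use the same seeds in order."""
    if len(model_a) != len(model_b):
        raise ValueError("both models need the same number of runs")
    return [a - b for a, b in zip(model_a, model_b)]


if __name__ == "__main__":
    # Synthetic placeholder accuracies (percent), NOT values from the paper.
    model_a_runs = [71.0, 72.0, 73.0]
    model_b_runs = [70.0, 70.5, 71.5]
    diff_mean, diff_sem = mean_and_standard_error(
        paired_differences(model_a_runs, model_b_runs))
    print(f"mean difference: {diff_mean:.2f} +/- {diff_sem:.2f} (standard error)")
```

Pairing the runs by seed removes seed-to-seed variance that both models share, which usually gives a tighter estimate than comparing the two means independently.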

minor comments (2)

[§2] The 80:20 E-I balance ratio is referenced as an example of biological realism but should be explicitly tied to the loss function and architecture equations in the main text for clarity.
[Figures] Figure captions and architecture diagrams would benefit from explicit annotation of lateral inhibition pathways and how they interact with the asymmetric E-I connections during training.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment point by point below and indicate the revisions we will incorporate to strengthen the manuscript.


Referee: [Abstract and §3 (Theoretical Analysis)] The manuscript asserts proofs of convergence, stability, and complexity bounds based on continuous-time Wilson-Cowan dynamics and network-theory principles, yet provides no derivation steps, discretization analysis, or verification that the end-to-end trained discrete implementation preserves these properties. This is load-bearing for the central claim that TIDE achieves provable stability without post-hoc tuning.

Authors: We appreciate the referee's emphasis on this foundational aspect. Section 3 presents the theoretical analysis based on Wilson-Cowan dynamics, network theory, and a game-theoretic loss, including outlines of the convergence and stability arguments. However, we acknowledge that the current presentation would be strengthened by including more explicit derivation steps, a dedicated discretization analysis, and verification that the stability properties carry over to the discrete trained model. We will revise §3 accordingly in the next version of the manuscript.

Revision: yes

Referee: [§5 (Experiments)] The key empirical claims (+1.65% top-1 accuracy and <50% training time on ImageNet under perturbations) are stated without baseline implementation details for CTM, explicit perturbation definitions, or statistical significance measures (e.g., standard error over runs). This directly affects verifiability of the practical gains that rest on the chosen E-I ratio and Wilson-Cowan parameters.

Authors: We agree that these details are important for reproducibility and assessment of the results. In the revised manuscript we will add full implementation details for the CTM baseline, explicit definitions of the perturbations applied to ImageNet, and statistical significance measures including standard errors computed over multiple runs. These changes will improve the verifiability of the reported accuracy and training-time improvements.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained with independent theoretical and empirical content.


The paper defines TIDE by incorporating Wilson-Cowan dynamics and lateral inhibition into asymmetric E-I networks with Dale's principle, then separately presents proofs of convergence/stability/complexity and reports empirical results on ImageNet. No quoted step shows a prediction or first-principles result reducing by construction to a fitted hyperparameter, self-citation chain, or renamed input. The stability claims rest on network-theory principles and game-theoretic loss applied to the defined architecture rather than tautological re-expression of the inputs. Empirical gains (+1.65% accuracy, <50% training time) are presented as measured outcomes distinct from the model definition. This is the expected non-finding for a paper whose central claims retain independent content from its assumptions and experiments.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 1 invented entities

The architecture rests on biological modeling choices treated as fixed rather than derived; the 80:20 ratio and Wilson-Cowan equations are imported without new justification inside the paper.

free parameters (1)

80:20 E-I balance ratio
Explicitly enforced to match biological observation; treated as a hard constraint rather than learned.

axioms (2)

Domain assumption: Wilson-Cowan dynamics stabilize asymmetric E-I networks when combined with lateral inhibition.
Invoked to guarantee convergence and stability of the temporal dynamics.
Domain assumption: Dale's principle holds and produces a realistic 80:20 E-I ratio.
Used to constrain neuron types and connection signs throughout the network.

invented entities (1)

TIDE architecture (no independent evidence)
Purpose: computes internal representations via stabilized neural dynamics.
New proposed model that integrates the listed biological constraints.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

We propose to model neural dynamics using asymmetric Excitatory-Inhibitory (E-I) networks, which can be stabilized via principles from network theory and can be expressed as energy-based systems optimized through a game-theoretic loss. ... incorporating the Wilson-Cowan dynamics and lateral inhibition.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on


[1] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[4] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, pages 5998–6008, 2017.
[5] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021.
[6] ChiYan Lee, Hideyuki Hasegawa, and Shangce Gao. Complex-valued neural networks: A comprehensive survey. IEEE/CAA Journal of Automatica Sinica, 9(8):1406–1426, 2022.
[7] Timothy P. Lillicrap, Daniel Cownden, Douglas B. Tweed, and Colin J. Akerman. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7:13276, 2016.
[8] Benjamin Scellier and Yoshua Bengio. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience, 11:24, 2017.
[9] Alexandre Payeur, Jordan Guerguiev, Friedemann Zenke, Blake A. Richards, and Richard Naud. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nature Neuroscience, 24:1010–1019, 2021.
[10] Anthony M. Zador. A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications, 10(1):3770, 2019.
[11] Jeffry S. Isaacson and Massimo Scanziani. How inhibition shapes cortical activity. Neuron, 72(2):231–243, 2011.
[12] Adam Haber and Elad Schneidman. The computational and learning benefits of Daleian neural networks. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, 2022.
[13] Jonathan Cornford, Damjan Kalajdzievski, Marco Leite, Amélie Lamarquette, Dimitri M. Kullmann, and Blake Richards. Learning to live with Dale’s principle: ANNs with separate excitatory and inhibitory units. In International Conference on Learning Representations (ICLR), pages 1–27, 2021.
[14] Bilal Haider, Alvaro Duque, Andrea R. Hasenstaub, and David A. McCormick. Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. The Journal of Neuroscience, 26(17):4535–4545, 2006.
[15] Luke Darlow, Ciaran Regan, Sebastian Risi, Jeffrey Seely, and Llion Jones. Continuous thought machines. In Advances in Neural Information Processing Systems (NeurIPS), 2025.
[16] Henry Dale. Pharmacology and nerve-endings. Proceedings of the Royal Society of Medicine, 28(3):319–332, 1935.
[17] John C. Eccles, Paul Fatt, and Kyozo Koketsu. Cholinergic and inhibitory synapses in a pathway from motor-axon collaterals to motoneurones. The Journal of Physiology, 126(3):524–562, 1954.
[18] Simone Betteti, William Retnaraj, Alexander Davydov, Jorge Cortés, and Francesco Bullo. Competition, stability, and functionality in excitatory-inhibitory neural circuits. arXiv:2512.05252, 2025.
[19] Avi Schwarzschild, Eitan Borgnia, Arjun Gupta, Furong Huang, Uzi Vishkin, Micah Goldblum, and Tom Goldstein. Can you learn an algorithm? Generalizing from easy to hard problems with recurrent networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 6695–6706, 2021.
[20] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. Deep equilibrium models. In Advances in Neural Information Processing Systems (NeurIPS), pages 688–699, 2019.
[21] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
[22] Alex Graves. Adaptive computation time for recurrent neural networks. arXiv preprint arXiv:1603.08983, 2016.
[23] Andrea Banino, Jan Balaguer, and Charles Blundell. PonderNet: Learning to ponder. In ICML Workshop on Automated Machine Learning, 2021.
[24] Hugh R. Wilson and Jack D. Cowan. Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical Journal, 12(1):1–24, 1972.
[25] Hugh R. Wilson and Jack D. Cowan. A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik, 13(2):55–80, 1973.
[26] Carl van Vreeswijk and Haim Sompolinsky. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293):1724–1726, 1996.
[27] Nicolas Brunel. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of Computational Neuroscience, 8(3):183–208, 2000.
[28] Tim P. Vogels, Henning Sprekeler, Friedemann Zenke, Claudia Clopath, and Wulfram Gerstner. Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks. Science, 334(6062):1569–1573, 2011.
[29] Gina G. Turrigiano. The self-tuning neuron: Synaptic scaling of excitatory synapses. Cell, 135(3):422–435, 2008.
[30] Kenneth D. Harris and Thomas D. Mrsic-Flogel. Cortical connectivity and sensory coding. Nature, 503(7474):51–58, 2013.
[31] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.
[32] Atlas Kazemian, Eric Elmoznino, and Michael F. Bonner. Convolutional architectures are cortex-aligned de novo. Nature Machine Intelligence, 7:1834–1844, 2025.
[33] Maximilian Riesenhuber and Tomaso Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025, 1999.
[34] Daniel L. K. Yamins, Ha Hong, Charles F. Cadieu, Ethan A. Solomon, Darren Seibert, and James J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014.
[35] Martin Schrimpf, Jonas Kubilius, Michael J. Lee, N. Apurva Ratan Murty, Robert Ajemian, and James J. DiCarlo. Integrative benchmarking to advance neurally mechanistic models of human intelligence. Neuron, 108(3):413–423, 2020.
[36] Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–38, 2025.
[37] Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, and Vahab Mirrokni. It’s all connected: A journey through test-time memorization, attentional bias, retention, and online optimization. arXiv:2504.13173, 2025.
[38] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752, 2023.
[39] Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. xLSTM: Extended long short-term memory. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
[40] Biao Zhang and Rico Sennrich. Root mean square layer normalization. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[41] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR), 2019.
[42] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations (ICLR), 2017.
[43]

Diamos, Erich Elsen, David García, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu

Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory F. Diamos, Erich Elsen, David García, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. Mixed precision training. InInternational Conference on Learning Representations (ICLR), 2018

work page 2018
[44]

Berg, and Li Fei-Fei

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge.International Journal of Computer Vision, 115(3):211–252, 2015

work page 2015
[45]

Theory of edge detection.Proceedings of the Royal Society of London

David Marr and Ellen Hildreth. Theory of edge detection.Proceedings of the Royal Society of London. Series B, Biological Sciences, 207(1167):187–217, 1980

work page 1980
[46]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Han Xiao, Kashif Rasul, and Roland V ollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms.arXiv:1708.07747, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[47]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009

work page 2009
[48]

Benchmarking neural network robustness to common corruptions and perturbations

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. InInternational Conference on Learning Representations (ICLR), 2019

work page 2019
[49]

Tiny ImageNet visual recognition challenge

Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge. Technical report, Stanford University, 2015

work page 2015
[50]

The many faces of robustness: A critical analysis of out-of- distribution generalization

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, and Justin Gilmer. The many faces of robustness: A critical analysis of out-of- distribution generalization. InIEEE/CVF International Conference on Computer Vision (ICCV), pages 8340–8349, 2021

work page 2021
[51]

Springer Monographs in Mathematics

Andrzej Granas and James Dugundji.Fixed Point Theory. Springer Monographs in Mathematics. Springer, 2003

work page 2003
[52]

Chaotic balanced state in a model of cortical circuits.Neural Computation, 10(6):1321–1371, 1998

Carl van Vreeswijk and Haim Sompolinsky. Chaotic balanced state in a model of cortical circuits.Neural Computation, 10(6):1321–1371, 1998

work page 1998
[53]

Yashar Ahmadian and Kenneth D. Miller. What is the dynamical regime of cerebral cortex?Neuron, 109(21):3373–3391, 2021

work page 2021
[54]

Chiu, Alexander Rush, and V olodymyr Kuleshov

Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T. Chiu, Alexander Rush, and V olodymyr Kuleshov. Simple and effective masked diffusion language models. In Advances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024
[55]

Khalil.Nonlinear Systems

Hassan K. Khalil.Nonlinear Systems. Prentice Hall, Upper Saddle River, NJ, 3rd edition, 2002

work page 2002
[56]

Horn and Charles R

Roger A. Horn and Charles R. Johnson.Matrix Analysis. Cambridge University Press, New York, NY , USA, 2nd edition, 2013

work page 2013
[57]

Fast global oscillations in networks of integrate-and-fire neurons with low firing rates.Neural Computation, 11(7):1621–1671, 1999

Nicolas Brunel and Vincent Hakim. Fast global oscillations in networks of integrate-and-fire neurons with low firing rates.Neural Computation, 11(7):1621–1671, 1999

work page 1999
[58]

Stephen W. Kuffler. Discharge patterns and functional organization of mammalian retina.Journal of Neurophysiology, 16(1):37–68, 1953

work page 1953
[59]

Dacey, Beth B

Dennis M. Dacey, Beth B. Peterson, Farrel R. Robinson, and Paul D. Gamlin. Fireworks in the primate retina: In vitro photodynamics reveals diverse LGN-projecting ganglion cell types.Neuron, 37(1):15–27, 2003

work page 2003
[60]

Hubel and Torsten N

David H. Hubel and Torsten N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex.The Journal of Physiology, 160(1):106–154, 1962

[61]

Tony Lindeberg. Scale-Space Theory in Computer Vision. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston, MA, 1994

NeurIPS Paper Checklist

1. Claims

Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?

Answer: [Yes]

Justification: ...


[1]

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

[2]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017

[3]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

[4]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, pages 5998–6008, 2017

[5]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021

[6]

ChiYan Lee, Hideyuki Hasegawa, and Shangce Gao. Complex-valued neural networks: A comprehensive survey. IEEE/CAA Journal of Automatica Sinica, 9(8):1406–1426, 2022

[7]

Timothy P. Lillicrap, Daniel Cownden, Douglas B. Tweed, and Colin J. Akerman. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7:13276, 2016

[8]

Benjamin Scellier and Yoshua Bengio. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience, 11:24, 2017

[9]

Alexandre Payeur, Jordan Guerguiev, Friedemann Zenke, Blake A. Richards, and Richard Naud. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nature Neuroscience, 24:1010–1019, 2021

[10]

Anthony M. Zador. A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications, 10(1):3770, 2019

[11]

Jeffry S. Isaacson and Massimo Scanziani. How inhibition shapes cortical activity. Neuron, 72(2):231–243, 2011

[12]

Adam Haber and Elad Schneidman. The computational and learning benefits of Daleian neural networks. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, 2022

[13]

Jonathan Cornford, Damjan Kalajdzievski, Marco Leite, Amélie Lamarquette, Dimitri M. Kullmann, and Blake Richards. Learning to live with Dale’s principle: ANNs with separate excitatory and inhibitory units. In International Conference on Learning Representations (ICLR), pages 1–27, 2021

[14]

Bilal Haider, Alvaro Duque, Andrea R. Hasenstaub, and David A. McCormick. Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. The Journal of Neuroscience, 26(17):4535–4545, 2006

[15]

Luke Darlow, Ciaran Regan, Sebastian Risi, Jeffrey Seely, and Llion Jones. Continuous thought machines. In Advances in Neural Information Processing Systems (NeurIPS), 2025

[16]

Henry Dale. Pharmacology and nerve-endings. Proceedings of the Royal Society of Medicine, 28(3):319–332, 1935

[17]

John C. Eccles, Paul Fatt, and Kyozo Koketsu. Cholinergic and inhibitory synapses in a pathway from motor-axon collaterals to motoneurones. The Journal of Physiology, 126(3):524–562, 1954

[18]

Simone Betteti, William Retnaraj, Alexander Davydov, Jorge Cortés, and Francesco Bullo. Competition, stability, and functionality in excitatory-inhibitory neural circuits. arXiv:2512.05252, 2025

[19]

Avi Schwarzschild, Eitan Borgnia, Arjun Gupta, Furong Huang, Uzi Vishkin, Micah Goldblum, and Tom Goldstein. Can you learn an algorithm? Generalizing from easy to hard problems with recurrent networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 6695–6706, 2021

[20]

Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. Deep equilibrium models. In Advances in Neural Information Processing Systems (NeurIPS), pages 688–699, 2019

[21]

Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems (NeurIPS), 2018

[22]

Alex Graves. Adaptive computation time for recurrent neural networks. arXiv preprint arXiv:1603.08983, 2016

[23]

Andrea Banino, Jan Balaguer, and Charles Blundell. PonderNet: Learning to ponder. In ICML Workshop on Automated Machine Learning, 2021

[24]

Hugh R. Wilson and Jack D. Cowan. Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical Journal, 12(1):1–24, 1972

[25]

Hugh R. Wilson and Jack D. Cowan. A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik, 13(2):55–80, 1973

[26]

Carl van Vreeswijk and Haim Sompolinsky. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293):1724–1726, 1996

[27]

Nicolas Brunel. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of Computational Neuroscience, 8(3):183–208, 2000

[28]

Tim P. Vogels, Henning Sprekeler, Friedemann Zenke, Claudia Clopath, and Wulfram Gerstner. Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks. Science, 334(6062):1569–1573, 2011

[29]

Gina G. Turrigiano. The self-tuning neuron: Synaptic scaling of excitatory synapses. Cell, 135(3):422–435, 2008

[30]

Kenneth D. Harris and Thomas D. Mrsic-Flogel. Cortical connectivity and sensory coding. Nature, 503(7474):51–58, 2013

[31]

J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982

[32]

Atlas Kazemian, Eric Elmoznino, and Michael F. Bonner. Convolutional architectures are cortex-aligned de novo. Nature Machine Intelligence, 7:1834–1844, 2025

[33]

Maximilian Riesenhuber and Tomaso Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025, 1999

[34]

Daniel L. K. Yamins, Ha Hong, Charles F. Cadieu, Ethan A. Solomon, Darren Seibert, and James J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014

[35]

Martin Schrimpf, Jonas Kubilius, Michael J. Lee, N. Apurva Ratan Murty, Robert Ajemian, and James J. DiCarlo. Integrative benchmarking to advance neurally mechanistic models of human intelligence. Neuron, 108(3):413–423, 2020

[36]

Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–38, 2025

[37]

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, and Vahab Mirrokni. It’s all connected: A journey through test-time memorization, attentional bias, retention, and online optimization. arXiv:2504.13173, 2025

[38]

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752, 2023

[39]

Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. xLSTM: Extended long short-term memory. In Advances in Neural Information Processing Systems (NeurIPS), 2024

[40]

Biao Zhang and Rico Sennrich. Root mean square layer normalization. In Advances in Neural Information Processing Systems (NeurIPS), 2019

[41]

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR), 2019

[42]

Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations (ICLR), 2017

[43]

Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory F. Diamos, Erich Elsen, David García, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. Mixed precision training. In International Conference on Learning Representations (ICLR), 2018

[44]

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015

[45]

David Marr and Ellen Hildreth. Theory of edge detection. Proceedings of the Royal Society of London. Series B, Biological Sciences, 207(1167):187–217, 1980

[46]

Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747, 2017

[47]

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009

[48]

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations (ICLR), 2019

[49]

Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge. Technical report, Stanford University, 2015

[50]

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, and Justin Gilmer. The many faces of robustness: A critical analysis of out-of-distribution generalization. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 8340–8349, 2021
