arxiv: 2512.01015 · v2 · submitted 2025-11-30 · 💻 cs.LG · math.DS · math.FA

Recognition: 2 theorem links

· Lean Theorem

Upper Approximation Bounds for Neural Oscillators

Zifeng Huang, Konstantin M. Zuev, Yong Xia, Michael Beer


Pith reviewed 2026-05-17 02:31 UTC · model grok-4.3

classification 💻 cs.LG · math.DS · math.FA

keywords neural oscillators · approximation bounds · second-order dynamical systems · multilayer perceptrons · causal operators · error scaling · machine learning · ODE-based models


The pith

Neural oscillators approximate stable second-order dynamical systems with error that scales polynomially in the inverse widths of two MLPs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes upper bounds on the approximation power of neural oscillators, which combine a second-order ODE with an MLP. It shows these models can approximate causal uniformly continuous operators on temporal functions and, more specifically, uniformly asymptotically incrementally stable second-order dynamical systems. The derived error bounds decrease polynomially as the widths of the MLPs grow, which the authors argue overcomes the curse of parametric complexity that typically demands exponentially more parameters for better accuracy. The same bounding technique extends directly to certain state-space models built from linear time-continuous complex recurrent networks followed by an MLP. Four numerical examples are used to check that the predicted convergence rates hold in practice.

Core claim

The neural oscillator consisting of a second-order ODE followed by a multilayer perceptron is considered. Its upper approximation bound for approximating causal and uniformly continuous operators between continuous temporal function spaces and that for approximating uniformly asymptotically incrementally stable second-order dynamical systems are derived. The established proof method of the approximation bound for approximating the causal continuous operators can also be directly applied to state-space models consisting of a linear time-continuous complex recurrent neural network followed by an MLP. Theoretical results reveal that the approximation error of the neural oscillator for the state

What carries the argument

A neural oscillator formed by solving a second-order ordinary differential equation and feeding the solution through an MLP, with approximation bounds obtained via uniform continuity and incremental stability properties of the target operators and systems.
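As a rough illustration of this architecture, the sketch below integrates a forced second-order ODE and passes its state through a small MLP readout at every time step. The right-hand side y'' = tanh(W y + V u(t) + b), the semi-implicit Euler integrator, the one-hidden-layer ReLU readout, and every name and size here are assumptions chosen for illustration; the paper's exact construction may differ.

```python
import numpy as np

def neural_oscillator(u, dt, W, V, b, mlp_params):
    """Integrate y'' = tanh(W y + V u(t) + b) from rest, then apply an MLP readout.

    u: array of shape (n_steps, input_dim) holding samples of the input signal.
    Returns an array of shape (n_steps, output_dim).
    """
    n_steps = u.shape[0]
    m = W.shape[0]
    y = np.zeros(m)   # oscillator state (assumed to start at rest)
    v = np.zeros(m)   # oscillator velocity y'
    W1, b1, W2, b2 = mlp_params
    out = np.empty((n_steps, W2.shape[0]))
    for k in range(n_steps):
        # Semi-implicit Euler step: update the velocity first, then the position.
        v = v + dt * np.tanh(W @ y + V @ u[k] + b)
        y = y + dt * v
        # One-hidden-layer ReLU MLP applied pointwise in time.
        h = np.maximum(W1 @ y + b1, 0.0)
        out[k] = W2 @ h + b2
    return out

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
m, d_in, width, d_out = 4, 1, 16, 1
W = rng.normal(size=(m, m))
V = rng.normal(size=(m, d_in))
b = rng.normal(size=m)
mlp_params = (rng.normal(size=(width, m)), rng.normal(size=width),
              rng.normal(size=(d_out, width)), rng.normal(size=d_out))

dt = 0.01
t = np.linspace(0.0, 1.0, 100, endpoint=False)
u = np.sin(2 * np.pi * t)[:, None]
z = neural_oscillator(u, dt, W, V, b, mlp_params)
```

Because each output sample is computed only from inputs up to the same time index, the map from `u` to `z` in this sketch is causal by construction, which is the structural property the bounds rely on.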

If this is right

Approximation error for the dynamical systems decreases polynomially with the reciprocals of the MLP widths.
The same proof technique yields bounds for causal uniformly continuous operators between temporal function spaces.
The bounding method applies without change to state-space models that use a linear time-continuous complex RNN followed by an MLP.
Convergence rates of the two error bounds are confirmed by four numerical test cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Wider but not exponentially larger MLPs should suffice for high-accuracy modeling of many physical systems that satisfy the stability condition.
The architecture may offer efficiency advantages over generic recurrent networks when long-term causal dependencies dominate.
Similar polynomial bounds could be pursued for higher-order or non-stable systems to broaden the theoretical coverage.

Load-bearing premise

The target operators must be causal and uniformly continuous on spaces of continuous temporal functions, while the dynamical systems must be uniformly asymptotically incrementally stable second-order systems.
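To make the stability premise concrete, the sketch below simulates a damped linear oscillator from two different initial conditions under the same input and tracks the distance between the two states. The test system x'' + 2ζωx' + ω²x = u(t), its parameters, and the integrator are hypothetical choices, not taken from the paper. A shrinking gap is what incremental stability looks like numerically; it illustrates the property rather than proving it.

```python
import numpy as np

def simulate(x0, v0, u, dt, zeta=0.2, omega=2.0):
    """Semi-implicit Euler for x'' + 2*zeta*omega*x' + omega**2*x = u(t)."""
    x, v = x0, v0
    traj = np.empty((len(u), 2))
    for k, uk in enumerate(u):
        v = v + dt * (uk - 2.0 * zeta * omega * v - omega**2 * x)
        x = x + dt * v
        traj[k] = (x, v)
    return traj

dt = 0.001
t = np.arange(20000) * dt
u = np.sin(3.0 * t)                   # the same input drives both trajectories
a = simulate(1.0, 0.0, u, dt)         # hypothetical initial condition 1
b = simulate(-2.0, 1.5, u, dt)        # hypothetical initial condition 2
gap = np.linalg.norm(a - b, axis=1)   # distance between the two states over time
print(gap[0], gap[-1])
```

For this linear example the input cancels in the difference of the two trajectories, so the gap obeys the unforced damped equation and decays regardless of `u`. Nonlinear systems need a separate argument.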

What would settle it

A numerical experiment on a uniformly asymptotically incrementally stable second-order system where the observed approximation error fails to decrease polynomially as the widths of the two MLPs are increased would falsify the claimed scaling.
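One way such a check might be scripted is to fit the slope of log(error) against log(1/width). The helper name `empirical_rate` and the data below are synthetic placeholders, not results from the paper, and the sketch assumes a single width is varied (or both widths are scaled together).

```python
import numpy as np

def empirical_rate(widths, errors):
    """Least-squares fit of log(error) = rate * log(1/width) + log(const)."""
    x = np.log(1.0 / np.asarray(widths, dtype=float))
    y = np.log(np.asarray(errors, dtype=float))
    rate, intercept = np.polyfit(x, y, 1)
    return rate, np.exp(intercept)

# Synthetic placeholder data, NOT measurements: error = 3 * width**(-0.5).
widths = np.array([8.0, 16.0, 32.0, 64.0, 128.0])
errors = 3.0 * widths ** -0.5
rate, const = empirical_rate(widths, errors)
print(rate, const)
```

A fitted rate that stays positive and roughly constant as the widths grow would be consistent with polynomial scaling. A rate drifting toward zero on a system that satisfies the stability assumption would count against the claim.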

Figures

Figures reproduced from arXiv: 2512.01015 by Konstantin M. Zuev, Michael Beer, Yong Xia, Zifeng Huang.

**Figure 2.** Neural oscillator approximation errors ε̃

Original abstract

Neural oscillators, originating from second-order ordinary differential equations (ODEs), have demonstrated strong performance in stably learning causal mappings between long-term sequences or continuous temporal functions, as well as in accurately approximating physical systems. However, theoretically quantifying the capacities of their neural network architectures remains a significant challenge. In this study, the neural oscillator consisting of a second-order ODE followed by a multilayer perceptron (MLP) is considered. Its upper approximation bound for approximating causal and uniformly continuous operators between continuous temporal function spaces and that for approximating uniformly asymptotically incrementally stable second-order dynamical systems are derived. The established proof method of the approximation bound for approximating the causal continuous operators can also be directly applied to state-space models consisting of a linear time-continuous complex recurrent neural network followed by an MLP. Theoretical results reveal that the approximation error of the neural oscillator for approximating the second-order dynamical systems scales polynomially with the reciprocals of the widths of two utilized MLPs, thus overcoming the curse of parametric complexity. The convergence rates of two established approximation error bounds are validated through four numerical cases. These results provide a robust theoretical foundation for the effective application of the neural oscillator in science and engineering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper derives polynomial upper bounds on approximation error for neural oscillators approximating stable dynamical systems and causal operators.


This paper derives upper bounds for how well a neural oscillator approximates causal operators and stable second-order dynamical systems. The error scales polynomially with the reciprocals of the widths of the two MLPs in the architecture. That polynomial scaling is the main takeaway, as it suggests these models can avoid the usual explosion in parameters needed for good approximation.

What is new is the application of approximation theory techniques to this particular setup: a second-order ODE followed by an MLP. They establish the bound for approximating causal and uniformly continuous operators on temporal function spaces, then extend the method to the dynamical systems case. They also point out that the same proof works for state-space models using linear complex recurrent networks plus an MLP. The numerical experiments in four cases validate that the convergence rates match the theory.

The paper does a decent job grounding the claims in existing work on neural network approximation and ODE models. The focus on overcoming the curse of parametric complexity through polynomial dependence is a clear strength, and the numerical checks provide some practical reassurance.

The soft spots are mostly around the assumptions and proof details. The bounds require the target operators to be causal and uniformly continuous, and the dynamical systems to be uniformly asymptotically incrementally stable. These are standard but restrictive; many real systems might only satisfy weaker versions. Without the full proofs visible here, it's possible that the hidden constants depend badly on the time horizon or the degree of stability, which could limit long-term usefulness. The abstract is straightforward, but a referee would need to verify the derivations carefully.

This work is for people studying theoretical guarantees for neural models in dynamical systems and sequence learning. A reader interested in approximation bounds for physics-informed or causal models would find it relevant. It has enough new analysis and supporting numerics to deserve serious peer review. I would recommend sending it to referees rather than desk rejecting it.

Referee Report

2 major / 2 minor

Summary. The paper considers neural oscillators formed by a second-order ODE followed by an MLP. It derives upper bounds on the approximation error when these oscillators approximate causal and uniformly continuous operators between spaces of continuous temporal functions, and when they approximate uniformly asymptotically incrementally stable second-order dynamical systems. The bounds are shown to scale polynomially with the reciprocals of the widths of the two MLPs. The same proof technique is stated to apply directly to state-space models consisting of a linear time-continuous complex RNN followed by an MLP. Convergence rates of the two bounds are validated numerically on four example cases.

Significance. If the stated polynomial scaling holds under the given assumptions, the result supplies a concrete theoretical guarantee that neural-oscillator architectures can approximate stable dynamical systems without incurring an exponential dependence on the number of parameters. This would be a useful addition to the literature on approximation theory for neural networks applied to continuous-time systems and long-horizon causal mappings.

major comments (2)

[Section deriving the bound for dynamical systems (around the statement of the main theorem)] The central claim that the approximation error for second-order dynamical systems scales polynomially in the reciprocals of the MLP widths rests on the uniform asymptotic incremental stability assumption. The manuscript should make explicit how the stability margin and the time horizon appear in the final bound (e.g., in the constant multiplying the polynomial term) to rule out a hidden exponential factor that would reintroduce the curse of complexity.
[Paragraph discussing the extension to state-space models] The assertion that the causal-operator proof technique applies directly to linear time-continuous complex RNN state-space models requires a short but self-contained argument showing that the complex-valued linear dynamics remain causal and uniformly continuous on the relevant function spaces; without this step the extension is not yet load-bearing.

minor comments (2)

[Numerical validation section] In the numerical experiments, state the precise widths of the two MLPs, the integration scheme for the ODE, and the precise error metric used to produce the reported convergence rates.
[Preliminaries / notation] Clarify whether the uniform continuity of the target operator is with respect to the sup norm or another topology on the space of continuous temporal functions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and have revised the paper to incorporate the suggested clarifications.


Referee: [Section deriving the bound for dynamical systems (around the statement of the main theorem)] The central claim that the approximation error for second-order dynamical systems scales polynomially in the reciprocals of the MLP widths rests on the uniform asymptotic incremental stability assumption. The manuscript should make explicit how the stability margin and the time horizon appear in the final bound (e.g., in the constant multiplying the polynomial term) to rule out a hidden exponential factor that would reintroduce the curse of complexity.

Authors: We agree that an explicit statement of the dependence is warranted for full transparency. The bound derived under uniform asymptotic incremental stability takes the form C(δ, T) ⋅ (1/w₁ + 1/w₂)^k, where the prefactor C(δ, T) depends on the stability margin δ and time horizon T but is independent of the MLP widths w₁, w₂ and contains no exponential dependence on the number of parameters. In the revised manuscript we will restate the main theorem with this dependence written out explicitly and add a short remark confirming that the polynomial scaling in the reciprocals of the widths is unaffected by δ or T. revision: yes
Referee: [Paragraph discussing the extension to state-space models] The assertion that the causal-operator proof technique applies directly to linear time-continuous complex RNN state-space models requires a short but self-contained argument showing that the complex-valued linear dynamics remain causal and uniformly continuous on the relevant function spaces; without this step the extension is not yet load-bearing.

Authors: We accept the referee’s observation. Although the underlying proof for causal uniformly continuous operators extends immediately once the linear dynamics are shown to map the relevant function spaces into themselves, we will add a concise, self-contained paragraph immediately after the statement of the extension. This paragraph will verify that the linear time-continuous complex RNN preserves causality (by the variation-of-constants formula) and uniform continuity (by the boundedness of the state-transition matrix on compact time intervals) on the space of continuous temporal functions. revision: yes
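The causality half of that argument can be illustrated numerically. The sketch below assumes a diagonal complex state matrix, a scalar input held constant over each step, and hypothetical eigenvalues and input weights; none of these specifics come from the paper. It uses the exact discrete form of the variation-of-constants formula and checks that perturbing the input after a given time leaves all earlier states untouched. It does not address uniform continuity.

```python
import numpy as np

def linear_complex_rnn(u, dt, lam, B):
    """States of x' = diag(lam) x + B u(t), x(0) = 0, with zero-order-hold input.

    Exact step: x_{k+1} = exp(lam*dt) * x_k + (exp(lam*dt) - 1) / lam * (B * u_k).
    """
    decay = np.exp(lam * dt)
    gain = (decay - 1.0) / lam
    x = np.zeros(lam.shape, dtype=complex)
    states = np.empty((len(u), lam.size), dtype=complex)
    for k, uk in enumerate(u):
        states[k] = x                  # state at time k*dt depends on u[0..k-1] only
        x = decay * x + gain * (B * uk)
    return states

lam = np.array([-0.5 + 3.0j, -1.0 - 2.0j])   # hypothetical stable eigenvalues
B = np.array([1.0 + 0.0j, 0.5 - 0.5j])       # hypothetical input weights
dt = 0.01
t = np.arange(500) * dt
u1 = np.sin(t)
u2 = u1.copy()
u2[250:] += 1.0                               # perturb the input only from t = 2.5 on

s1 = linear_complex_rnn(u1, dt, lam, B)
s2 = linear_complex_rnn(u2, dt, lam, B)
same_before = np.array_equal(s1[:251], s2[:251])
differs_after = not np.allclose(s1[251:], s2[251:])
print(same_before, differs_after)
```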

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper derives upper bounds on approximation error for causal uniformly continuous operators and for uniformly asymptotically incrementally stable second-order systems using the neural oscillator architecture (second-order ODE followed by MLP). These bounds are obtained by applying established proof techniques for causal operators directly to the stated assumptions of causality, uniform continuity on temporal function spaces, and incremental stability; the resulting polynomial scaling in the reciprocals of the two MLP widths follows from the analysis rather than being presupposed or fitted. No self-definitional reductions, fitted inputs relabeled as predictions, or load-bearing self-citations appear in the derivation. The numerical convergence checks are presented separately as validation and do not enter the theoretical claims.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard assumptions about the function spaces and stability properties, with no free parameters or new entities introduced in the abstract.

axioms (3)

domain assumption: The neural oscillator consists of a second-order ODE followed by an MLP.
Stated as the architecture considered in the study.
domain assumption: Target operators are causal and uniformly continuous between continuous temporal function spaces.
Required for the first approximation bound.
domain assumption: Dynamical systems are uniformly asymptotically incrementally stable second-order systems.
Required for the second approximation bound.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Theoretical results reveal that the approximation error of the neural oscillator for approximating the second-order dynamical systems scales polynomially with the reciprocals of the widths of two utilized MLPs"

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
Paper passage: "uniformly asymptotically incrementally stable second-order dynamical systems"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Upper Generalization Bounds for Neural Oscillators
cs.LG · 2026-03 · conditional · novelty 6.0

Upper generalization bounds for neural oscillators scale polynomially with MLP size and time length, avoiding the curse of parametric complexity, with numerical validation on a Bouc-Wen nonlinear system.
