
arXiv: 2503.15696 · v3 · submitted 2025-03-19 · 🧮 math.NA · cs.LG · cs.NA

Approximation properties of neural ODEs

Pith reviewed 2026-05-22 22:41 UTC · model grok-4.3

classification 🧮 math.NA · cs.LG · cs.NA
keywords neural ODEs, universal approximation property, shallow neural networks, continuous functions, Lipschitz constraints, approximation bounds, stability constraints

The pith

Composing neural ODE flow maps with embeddings and projections produces shallow networks that satisfy the universal approximation property for continuous functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that neural ODEs, which require matching input and output dimensions, can be adapted to approximate arbitrary continuous functions by sandwiching their flow map between an embedding and a projection. This turns the fixed-time flow map into the activation function of a shallow network, allowing direct application of classical universal approximation results. The authors further examine stability constraints: they prove the property survives when either the Lipschitz constant of the flow map or the weight norms are bounded on their own, whereas imposing both bounds together reduces expressiveness, a loss they quantify with explicit approximation bounds.

Core claim

The composition of an arbitrary embedding, the neural ODE flow map at fixed time, and a projection produces a shallow network whose activation function satisfies the conditions required for classical universal approximation theorems to apply directly. The universal approximation property holds for these networks, and it continues to hold when either the Lipschitz constant of the flow map or the weight norms are constrained independently; when both constraints are active simultaneously, approximation bounds quantify the resulting loss of expressiveness.

What carries the argument

The flow map of the neural ODE at the final integration time, serving as the activation function of the resulting shallow network after embedding and projection.
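The composition described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the vector field `tanh(W z + b)`, the explicit Euler integrator, the dimensions, and every function name are hypothetical choices made only for this example. The network has the form φ(x) = A_2 Φ_T(A_1 x + b_1) + b_2, with the time-T flow map Φ_T in the role of the activation.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def add(u, v):
    """Add two vectors elementwise."""
    return [a + b for a, b in zip(u, v)]


def vector_field(z, W, b):
    """Assumed neural ODE right-hand side: z' = tanh(W z + b)."""
    return [math.tanh(s) for s in add(matvec(W, z), b)]


def flow_map(z0, W, b, T=1.0, steps=100):
    """Approximate the time-T flow map with explicit Euler steps."""
    h = T / steps
    z = list(z0)
    for _ in range(steps):
        f = vector_field(z, W, b)
        z = [zi + h * fi for zi, fi in zip(z, f)]
    return z


def shallow_network(x, A1, b1, W, b, A2, b2, T=1.0, steps=100):
    """Evaluate phi(x) = A2 * Phi_T(A1 x + b1) + b2."""
    z0 = add(matvec(A1, x), b1)        # embedding R^n -> R^d
    zT = flow_map(z0, W, b, T, steps)  # flow map acts as the activation on R^d
    return add(matvec(A2, zT), b2)     # projection R^d -> R^m


def random_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1] (illustrative weights only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d, m = 3, 5, 2  # input, latent, and output dimensions (arbitrary)
A1, b1 = random_matrix(d, n, rng), [0.0] * d
W, b = random_matrix(d, d, rng), [0.0] * d
A2, b2 = random_matrix(m, d, rng), [0.0] * m

y = shallow_network([0.1, -0.2, 0.3], A1, b1, W, b, A2, b2)
print(len(y))  # 2: the output lives in R^m even though the ODE runs in R^d
```

The input and output dimensions differ here (3 and 2) while the ODE itself evolves in a 5-dimensional latent space, which is the mismatch the embedding and projection are meant to resolve. A higher-order integrator could replace the Euler loop without changing the structure.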

If this is right

  • The universal approximation property holds without constraints on the neural ODE.
  • The universal approximation property holds when the flow map is constrained only by a Lipschitz bound.
  • The universal approximation property holds when the weights are constrained only by norm bounds.
  • When both the Lipschitz bound and weight-norm bounds are imposed together, the network loses expressiveness and explicit approximation error bounds apply.
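A standard estimate from ODE theory gives some intuition for why the two constraints interact. It is offered here as background under stated assumptions, not as the paper's exact formulation: the symbols f, ν and μ (the logarithmic norm, the subject of references [34] and [35]) are introduced only for this sketch, and the matrix norms are the operator norms induced by the chosen vector norms.

```latex
% Assumption: the neural ODE is \dot z(t) = f(z(t)), z(0) = z_0 \in \mathbb{R}^d,
% with f continuously differentiable, and \mu is the logarithmic norm
% induced by the chosen vector norm.
\[
  \sup_{z \in \mathbb{R}^d} \mu\!\left(\frac{\partial f}{\partial z}(z)\right) \le \nu
  \quad\Longrightarrow\quad
  \|\Phi_T(x) - \Phi_T(y)\| \le e^{\nu T}\,\|x - y\|
  \quad \text{for all } x, y \in \mathbb{R}^d .
\]
% Consequence for the shallow network \varphi(x) = A_2 \Phi_T(A_1 x + b_1) + b_2:
\[
  \|\varphi(x) - \varphi(y)\| \le \|A_2\|\, e^{\nu T}\, \|A_1\|\, \|x - y\| .
\]
```

Read this way, bounding only the flow-map factor leaves the weight norms free to compensate, and bounding only the weight norms leaves the flow-map factor free. Bounding all three factors caps the Lipschitz constant of the whole network, and a uniform limit of functions sharing a Lipschitz constant inherits that constant, so steeper targets cannot be approximated to arbitrary accuracy.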

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This construction lets neural ODEs handle input-output dimension mismatch while preserving their dynamical interpretation.
  • The separate-constraint results suggest that stability can be enforced in one aspect without sacrificing the ability to approximate arbitrary continuous maps.
  • The joint-constraint bounds indicate a quantifiable stability-expressiveness trade-off that could be tested in numerical experiments on specific function classes.

Load-bearing premise

The flow map of the neural ODE satisfies the conditions needed for classical universal approximation theorems once composed with embedding and projection.

What would settle it

A continuous function on a compact domain that cannot be approximated to arbitrary accuracy by any shallow network whose activation is a neural ODE flow map, even without the joint constraints.

Figures

Figures reproduced from arXiv: 2503.15696 by Arturo De Marinis, Brynjulf Owren, Davide Murari, Elena Celledoni, Francesco Tudisco, Nicola Guglielmi.

Figure 1. Some images from the MNIST dataset. … f that classifies the MNIST images is a function f : ℝ^{28×28} → {0, 1, …, 9}. The representation of f is unknown, probably not even possible in terms of elementary functions. Nevertheless, we can approximate f by a suitably chosen set of functions H, such as neural networks. In this paper, H is a set of neural networks. Neural networks are parametric maps obtained by…
Figure 2. Smoothed Leaky Rectified Linear Unit (LeakyReLU) with minimal slope …
Figure 3. Comparison of the test mean squared error with varying numbers of training samples
Figure 4. Accuracy of φ̄⋆ for different values of δ on the MNIST dataset as a function of the magnitude η of the FGSM adversarial attack. As δ increases, the model φ̄⋆ gains in accuracy, but it loses in robustness and stability. In contrast, as δ decreases, the model's robustness and stability improve, but its accuracy deteriorates. See [39, Section 5] for more details.
Figure 5. Mean and standard deviation of the percentage of points of the discretised domain …
Figure 6. Two Moons dataset. … and by φ̄ the stabilized shallow neural network φ̄(x) = A_2 ϕ̄(A_1 x + b_1) + b_2, x ∈ ℝ^2. Here the compact subset K ⊂ ℝ^2 of interest is the Two Moons dataset itself, and it is already a discrete set. We now provide an answer to the following question: where does the lower bound for the approximation error ∥φ − φ̄∥_{∞,K} hold in the domain K? It is sufficient to check where Assumption 3.1 hold…
Figure 7. Two Moons dataset. The green region is that where the lower bound holds, while the …
Figure 8. Two Moons dataset. The green region is that where the lower bound holds, while the …
Figure 9. Two Moons dataset. Different classification for different values of …

Original abstract

We study the approximation properties of neural ordinary differential equations (neural ODEs) in the space of continuous functions. Since a neural ODE requires input and output dimensions to be the same, while input and output dimensions of a continuous function are generally different, we need to embed an input into the latent space of the neural ODE, and to project the output of the neural ODE into the output space. By composing the neural ODE flow map with such embedding and projection operations, we get a shallow neural network whose activation function is defined as the flow map of the neural ODE at the final time of the integration interval. Thus, the study of the approximation properties of neural ODEs leads to the study of the approximation properties of shallow neural networks with a particular choice of activation function. We prove the universal approximation property (UAP) of such shallow neural networks in the space of continuous functions. Furthermore, we investigate the approximation properties of shallow neural networks whose parameters satisfy specific constraints. In particular, we constrain the Lipschitz constant of the neural ODE's flow map and the norms of the weights to increase the network's stability. We prove that the UAP holds if we consider either constraint independently. When both are enforced, there is a loss of expressiveness, and we derive approximation bounds that quantify how accurately such a constrained network can approximate a continuous function.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript studies approximation properties of neural ODEs by composing an input embedding, the neural ODE flow map at fixed integration time, and an output projection. This yields a shallow neural network whose activation is the flow map. The authors prove the universal approximation property (UAP) for these networks in the space of continuous functions. They further prove that UAP continues to hold when the Lipschitz constant of the flow map is constrained or when weight norms are constrained, but that simultaneous enforcement of both constraints produces a loss of expressiveness, for which they derive quantitative approximation bounds.

Significance. If the central derivations are correct, the work supplies a direct theoretical bridge between neural ODEs and the classical theory of shallow-network universal approximation, together with a quantitative treatment of stability constraints that is relevant to practical use. The explicit bounds under joint constraints constitute a concrete, falsifiable contribution.

major comments (1)
  1. [Abstract] Abstract (paragraph 2): the claim that the embedding + flow map + projection composition 'produces a shallow neural network whose activation function is defined as the flow map' and that classical UAP theorems therefore apply directly is load-bearing for the entire UAP result, yet the flow map Φ_T : ℝ^d → ℝ^d is vector-valued. Standard scalar UAP theorems (Cybenko, Hornik) require a fixed scalar non-polynomial activation applied componentwise; the manuscript must either reduce the vector case to the scalar setting or supply an independent density argument for this particular vector activation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the single major comment below and will revise the manuscript to improve clarity on the UAP argument.

Point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph 2): the claim that the embedding + flow map + projection composition 'produces a shallow neural network whose activation function is defined as the flow map' and that classical UAP theorems therefore apply directly is load-bearing for the entire UAP result, yet the flow map Φ_T : ℝ^d → ℝ^d is vector-valued. Standard scalar UAP theorems (Cybenko, Hornik) require a fixed scalar non-polynomial activation applied componentwise; the manuscript must either reduce the vector case to the scalar setting or supply an independent density argument for this particular vector activation.

    Authors: We agree that the abstract's phrasing is imprecise and could be read as claiming direct applicability of scalar UAP theorems. The flow map is indeed vector-valued, so the classical scalar results do not apply verbatim. However, the manuscript supplies an independent density argument for the specific vector-valued activation arising from the neural ODE flow map (see the construction and proof in Section 3, which establishes density in C^0 by exploiting the continuous dependence on initial conditions and the non-polynomial character of the flow without reducing to the scalar case). We will revise the abstract to remove the claim that classical theorems apply directly and instead state that we prove UAP directly for this vector-valued activation. This is a clarification only; the main theorems remain unchanged.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: standard UAP theorems applied to composed flow-map activation

Full rationale

The derivation defines a shallow network via embedding + neural-ODE flow map at fixed time + projection, then states that this network's activation satisfies the hypotheses of classical scalar UAP theorems (Cybenko/Hornik et al.). No equation or step reduces the claimed approximation result to a quantity defined by fitting inside the paper, nor does any load-bearing premise rest on a self-citation chain. The central claim therefore remains independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the paper invokes standard results from functional analysis and ordinary differential equation theory to establish universal approximation; no free parameters, ad-hoc axioms, or new postulated entities are mentioned.

pith-pipeline@v0.9.0 · 5785 in / 1255 out tokens · 43504 ms · 2026-05-22T22:41:02.987832+00:00 · methodology


Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 1 internal anchor

  1. [1] L. Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  2. [2] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural Ordinary Differential Equations. In Advances in Neural Information Processing Systems, 2018.
  3. [3] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
  4. [4] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
  5. [5] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.
  6. [6] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861–867, 1993.
  7. [7] A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, 1999.
  8. [8] K. Funahashi. On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3):183–192, 1989.
  9. [9] A. R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993.
  10. [10] M. Hassoun. Fundamentals of Artificial Neural Networks. The MIT Press, 1995.
  11. [11] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 1998.
  12. [12] G. Gripenberg. Approximation by neural networks with a bounded number of nodes at each level. Journal of Approximation Theory, 122(2):260–266, 2003.
  13. [13] D. Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017.
  14. [14] Z. Lu, H. Pu, F. Wang, Z. Hu, and L. Wang. The Expressive Power of Neural Networks: A View from the Width. In Advances in Neural Information Processing Systems, 2017.
  15. [15] B. Hanin and M. Sellke. Approximating Continuous Functions by ReLU Nets of Minimal Width. arXiv preprint arXiv:1710.11278, 2017.
  16. [16] P. Kidger and T. Lyons. Universal Approximation with Deep Narrow Networks. In Conference on Learning Theory, 2020.
  17. [17] S. Park, C. Yun, J. Lee, and J. Shin. Minimum Width for Universal Approximation. In International Conference on Learning Representations, 2021.
  18. [18] Y. Cai. Achieve the minimum width of neural networks for universal approximation. In International Conference on Learning Representations, 2023.
  19. [19] D. Yarotsky. Optimal approximation of continuous functions by very deep ReLU networks. In Conference on Learning Theory, 2018.
  20. [20] Z. Shen, H. Yang, and S. Zhang. Deep Network Approximation Characterized by Number of Neurons. Communications in Computational Physics, 28(5):1768–1811, 2019.
  21. [21] D. Yarotsky and A. Zhevnerchuk. The phase diagram of approximation rates for deep neural networks. In Advances in Neural Information Processing Systems, 2020.
  22. [22] J. Lu, Z. Shen, H. Yang, and S. Zhang. Deep Network Approximation for Smooth Functions. SIAM Journal on Mathematical Analysis, 53(5):5465–5506, 2021.
  23. [23] P. Petersen and F. Voigtlaender. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks, 108:296–330, 2018.
  24. [24] Y. Yang, Z. Li, and Y. Wang. Approximation in shift-invariant spaces with deep ReLU neural networks. Neural Networks, 153:269–281, 2022.
  25. [25] H. Montanelli, H. Yang, and Q. Du. Deep ReLU Networks Overcome the Curse of Dimensionality for Generalized Bandlimited Functions. Journal of Computational Mathematics, 39(6):801–815, 2021.
  26. [26] A. B. Juditsky, O. V. Lepski, and A. B. Tsybakov. Nonparametric estimation of composite functions. The Annals of Statistics, 37(3):1360–1404, 2009.
  27. [27] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. International Journal of Automation and Computing, 14(5):503–519, 2017.
  28. [28] J. Johnson. Deep, Skinny Neural Networks are not Universal Approximators. In International Conference on Learning Representations, 2019.
  29. [29] A. Kratsios and L. Papon. Universal Approximation Theorems for Differentiable Geometric Deep Learning. Journal of Machine Learning Research, 23(196):1–73, 2022.
  30. [30] V. Maiorov and A. Pinkus. Lower bounds for approximation by MLP neural networks. Neurocomputing, 25(1-3):81–91, 1999.
  31. [31] N. J. Guliyev and V. E. Ismailov. Approximation capability of two hidden layer feedforward neural networks with fixed weights. Neurocomputing, 316:262–269, 2018.
  32. [32] N. J. Guliyev and V. E. Ismailov. On the approximation by single hidden layer feedforward neural networks with fixed weights. Neural Networks, 98:296–304, 2018.
  33. [33] Z. Shen, H. Yang, and S. Zhang. Optimal Approximation Rate of ReLU Networks in Terms of Width and Depth. Journal de Mathématiques Pures et Appliquées, 157:101–135, 2022.
  34. [34] G. Söderlind. The logarithmic norm. History and modern theory. BIT Numerical Mathematics, 46(3):631–652, 2006.
  35. [35] G. Söderlind. Logarithmic Norms. Springer, 2024.
  36. [36] H. Bass and G. Meisters. Polynomial flows in the plane. Advances in Mathematics, 55(2):173–208, 1985.
  37. [37] E. Celledoni, M. J. Ehrhardt, C. Etmann, R. I. McLachlan, B. Owren, C-B Schönlieb, and F. Sherry. Structure-preserving deep learning. European Journal of Applied Mathematics, 32(5):888–936, 2021.
  38. [38] N. Guglielmi, A. De Marinis, A. Savostianov, and F. Tudisco. Contractivity of neural ODEs: an eigenvalue optimization problem. Mathematics of Computation, 2025.
  39. [39] A. De Marinis, N. Guglielmi, S. Sicilia, and F. Tudisco. Improving the robustness of neural ODEs with minimal weight perturbation. arXiv preprint arXiv:2501.10740, 2025.
  40. [40] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion Attacks against Machine Learning at Test Time. In Machine Learning and Knowledge Discovery in Databases, 2013.
  41. [41] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing Properties of Neural Networks. In International Conference on Learning Representations, 2014.
  42. [42] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations, 2015.
  43. [43] N. M. Gottschling, V. Antun, A. C. Hansen, and B. Adcock. The Troublesome Kernel: On Hallucinations, No Free Lunches, and the Accuracy-Stability Tradeoff in Inverse Problems. SIAM Review, 67(1):73–104, 2025.