Approximation properties of neural ODEs
Pith reviewed 2026-05-22 22:41 UTC · model grok-4.3
The pith
Composing neural ODE flow maps with embeddings and projections produces shallow networks that satisfy the universal approximation property for continuous functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The composition of an arbitrary embedding, the neural ODE flow map at fixed time, and a projection produces a shallow network whose activation function satisfies the conditions required for classical universal approximation theorems to apply directly. The universal approximation property holds for these networks, and it continues to hold when either the Lipschitz constant of the flow map or the weight norms are constrained independently; when both constraints are active simultaneously, approximation bounds quantify the resulting loss of expressiveness.
What carries the argument
The flow map of the neural ODE at the final integration time, serving as the activation function of the resulting shallow network after embedding and projection.
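The composition described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The vector field f(z) = tanh(Wz + c), the forward-Euler solver, and the names neural_ode_network, flow_map, A, b, W, c and P are hypothetical choices made here. The embedding is the affine map z0 = Ax + b from ℝ^n to ℝ^d. The flow map at time T plays the role of the activation, and P projects ℝ^d to ℝ^m.

```python
import math

def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vector_field(z, W, c):
    """Hypothetical autonomous vector field f(z) = tanh(W z + c), tanh applied componentwise."""
    return [math.tanh(s + ci) for s, ci in zip(matvec(W, z), c)]

def flow_map(z0, W, c, T=1.0, n_steps=100):
    """Approximate Phi_T(z0) for dz/dt = f(z) with n_steps forward-Euler steps."""
    h = T / n_steps
    z = list(z0)
    for _ in range(n_steps):
        fz = vector_field(z, W, c)
        z = [zi + h * fi for zi, fi in zip(z, fz)]
    return z

def neural_ode_network(x, A, b, W, c, P, T=1.0, n_steps=100):
    """u(x) = P Phi_T(A x + b): embed R^n -> R^d, flow in R^d, project R^d -> R^m."""
    z0 = [s + bi for s, bi in zip(matvec(A, x), b)]  # embedding into the latent space
    zT = flow_map(z0, W, c, T, n_steps)              # flow map at the final time T
    return matvec(P, zT)                             # projection onto the output space

# Example dimensions: n = 1 input, d = 2 latent, m = 1 output (values are arbitrary).
A = [[1.0], [-0.5]]
b = [0.0, 0.1]
W = [[0.5, -1.0], [1.0, 0.3]]
c = [0.0, 0.2]
P = [[1.0, 2.0]]
print(neural_ode_network([0.3], A, b, W, c, P))
```

With W and c set to zero the vector field vanishes and Φ_T is the identity. The network then collapses to the affine map x ↦ P(Ax + b), so all of the nonlinearity comes from the flow.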
If this is right
- The universal approximation property holds without constraints on the neural ODE.
- The universal approximation property holds when the flow map is constrained only by a Lipschitz bound.
- The universal approximation property holds when the weights are constrained only by norm bounds.
- When both the Lipschitz bound and weight-norm bounds are imposed together, the network loses expressiveness and explicit approximation error bounds apply.
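A standard Grönwall-type estimate makes the link between the two constraints explicit. The block below is a textbook derivation under an assumed vector field f(z) = tanh(Wz + c), not the paper's own argument or notation. A spectral-norm bound on W yields a Lipschitz bound on the flow map, and the logarithmic norm (see Söderlind [34]) gives a sharper exponent.

```latex
% Assumed vector field (hypothetical): f(z) = \sigma(Wz + c), \sigma = \tanh applied componentwise
Df(\xi) = \operatorname{diag}\big(\sigma'(W\xi + c)\big)\, W, \qquad 0 \le \sigma' \le 1
\;\Rightarrow\; \|Df(\xi)\|_2 \le \|W\|_2 .

% Groenwall: a weight-norm bound gives a Lipschitz bound on the flow map
\|\Phi_T(z) - \Phi_T(y)\|_2 \le e^{T \|W\|_2}\, \|z - y\|_2 .

% Logarithmic-norm refinement, with \mu_2(M) = \lambda_{\max}\big((M + M^\top)/2\big)
\ell := \sup_{\xi \in \mathbb{R}^d} \mu_2\big(Df(\xi)\big) \le \|W\|_2
\;\Rightarrow\;
\|\Phi_T(z) - \Phi_T(y)\|_2 \le e^{T \ell}\, \|z - y\|_2 .
```

Because ℓ can be negative, the logarithmic-norm bound can certify that Φ_T is a contraction. The plain estimate e^{T‖W‖_2} is always at least 1 and so cannot.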
Where Pith is reading between the lines
- This construction lets neural ODEs handle input-output dimension mismatch while preserving their dynamical interpretation.
- The separate-constraint results suggest that stability can be enforced in one aspect without sacrificing the ability to approximate arbitrary continuous maps.
- The joint-constraint bounds indicate a quantifiable stability-expressiveness trade-off that could be tested in numerical experiments on specific function classes.
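The sketch below illustrates how enforcing a single constraint already pins down stability. It assumes one common mechanism, which is not necessarily the paper's: rescaling W onto a Frobenius-norm ball of radius r. Consider the hypothetical field dz/dt = tanh(Wz + c). Since ‖W‖_2 ≤ ‖W‖_F and tanh is 1-Lipschitz, each forward-Euler step z ↦ z + h tanh(Wz + c) has Lipschitz constant at most 1 + hr ≤ e^{hr}. The discrete flow map therefore satisfies Lip ≤ e^{Tr}. The helper names are illustrative only.

```python
import math
import random

def frobenius_norm(W):
    """Frobenius norm; it upper-bounds the spectral norm ||W||_2."""
    return math.sqrt(sum(w * w for row in W for w in row))

def project_to_frobenius_ball(W, radius):
    """Rescale W onto {||W||_F <= radius}; returns an unchanged copy if already inside."""
    norm = frobenius_norm(W)
    if norm <= radius:
        return [list(row) for row in W]
    scale = radius / norm
    return [[scale * w for w in row] for row in W]

def euler_flow(z0, W, c, T=1.0, n_steps=200):
    """Forward-Euler approximation of Phi_T for the hypothetical field dz/dt = tanh(W z + c)."""
    h = T / n_steps
    z = list(z0)
    for _ in range(n_steps):
        pre = [sum(w * zj for w, zj in zip(row, z)) + ci for row, ci in zip(W, c)]
        z = [zi + h * math.tanh(p) for zi, p in zip(z, pre)]
    return z

def empirical_lipschitz(W, c, T=1.0, n_pairs=200, seed=0):
    """Largest observed ratio ||Phi_T(x) - Phi_T(y)|| / ||x - y|| over random pairs.

    This is only a lower estimate of the true Lipschitz constant.
    """
    rng = random.Random(seed)
    d = len(W)
    best = 0.0
    for _ in range(n_pairs):
        x = [rng.uniform(-2.0, 2.0) for _ in range(d)]
        y = [rng.uniform(-2.0, 2.0) for _ in range(d)]
        den = math.dist(x, y)
        if den > 0.0:
            num = math.dist(euler_flow(x, W, c, T), euler_flow(y, W, c, T))
            best = max(best, num / den)
    return best

W = [[3.0, -2.0], [1.5, 4.0]]
c = [0.1, -0.2]
r, T = 1.0, 1.0
W_r = project_to_frobenius_ball(W, r)
print(empirical_lipschitz(W_r, c, T), "<=", math.exp(T * r))
```

Only the weight norm is constrained here. The Lipschitz bound on the flow map follows from it rather than being imposed separately.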
Load-bearing premise
The flow map of the neural ODE satisfies the conditions needed for classical universal approximation theorems once composed with embedding and projection.
What would settle it
A continuous function on a compact domain that cannot be approximated to arbitrary accuracy by any shallow network whose activation is a neural ODE flow map, even without the joint constraints.
Original abstract
We study the approximation properties of neural ordinary differential equations (neural ODEs) in the space of continuous functions. Since a neural ODE requires input and output dimensions to be the same, while input and output dimensions of a continuous function are generally different, we need to embed an input into the latent space of the neural ODE, and to project the output of the neural ODE into the output space. By composing the neural ODE flow map with such embedding and projection operations, we get a shallow neural network whose activation function is defined as the flow map of the neural ODE at the final time of the integration interval. Thus, the study of the approximation properties of neural ODEs leads to the study of the approximation properties of shallow neural networks with a particular choice of activation function. We prove the universal approximation property (UAP) of such shallow neural networks in the space of continuous functions. Furthermore, we investigate the approximation properties of shallow neural networks whose parameters satisfy specific constraints. In particular, we constrain the Lipschitz constant of the neural ODE's flow map and the norms of the weights to increase the network's stability. We prove that the UAP holds if we consider either constraint independently. When both are enforced, there is a loss of expressiveness, and we derive approximation bounds that quantify how accurately such a constrained network can approximate a continuous function.
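The density statement can be written as below, using notation introduced here only for illustration. The exact parameterization of the class 𝒩 (latent width, or whether several blocks are summed) is as defined in the paper and is not reproduced in this review. The last line is a standard triangle-inequality argument, not the paper's bound. It assumes a single block whose embedding and projection weights are also norm-bounded. Under that assumption every network in the class is C-Lipschitz, so a target that varies faster than C cannot be approximated to arbitrary accuracy.

```latex
% Notation assumed for illustration: K \subset \mathbb{R}^n compact, A \in \mathbb{R}^{d \times n},
% b \in \mathbb{R}^d, P \in \mathbb{R}^{m \times d}, \Phi_T : \mathbb{R}^d \to \mathbb{R}^d the flow map.
u(x) = P\,\Phi_T(Ax + b)

% Universal approximation property for a class \mathcal{N} of such networks:
\forall g \in C(K;\mathbb{R}^m),\ \forall \varepsilon > 0,\ \exists u \in \mathcal{N}:\quad
\sup_{x \in K} \|g(x) - u(x)\| < \varepsilon .

% Joint constraints (single block; A and P also norm-bounded, an assumption):
\operatorname{Lip}(u) \le \|P\|\,\operatorname{Lip}(\Phi_T)\,\|A\| =: C
\;\Rightarrow\;
\sup_{x \in K} \|g(x) - u(x)\| \ge \tfrac{1}{2}\big(\|g(x_1) - g(x_2)\| - C\,\|x_1 - x_2\|\big)
\quad \forall x_1, x_2 \in K .
```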
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies approximation properties of neural ODEs by composing an input embedding, the neural ODE flow map at fixed integration time, and an output projection. This yields a shallow neural network whose activation is the flow map. The authors prove the universal approximation property (UAP) for these networks in the space of continuous functions. They further prove that UAP continues to hold when the Lipschitz constant of the flow map is constrained or when weight norms are constrained, but that simultaneous enforcement of both constraints produces a loss of expressiveness, for which they derive quantitative approximation bounds.
Significance. If the central derivations are correct, the work supplies a direct theoretical bridge between neural ODEs and the classical theory of shallow-network universal approximation, together with a quantitative treatment of stability constraints that is relevant to practical use. The explicit bounds under joint constraints constitute a concrete, falsifiable contribution.
major comments (1)
- [Abstract] Abstract (paragraph 2): the claim that the embedding + flow map + projection composition 'produces a shallow neural network whose activation function is defined as the flow map' and that classical UAP theorems therefore apply directly is load-bearing for the entire UAP result, yet the flow map Φ_T : ℝ^d → ℝ^d is vector-valued. Standard scalar UAP theorems (Cybenko, Hornik) require a fixed scalar non-polynomial activation applied componentwise; the manuscript must either reduce the vector case to the scalar setting or supply an independent density argument for this particular vector activation.
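The referee's first option, reducing to the scalar setting, can be made concrete in one special case. Assume a decoupled vector field f(z)_i = tanh(z_i) with no weight coupling; this is hypothetical and not the paper's setting. Φ_T then acts componentwise through a scalar map φ_T, so P Φ_T(Ax + b) is a classical shallow network with activation φ_T.

The check below estimates φ_T with RK4. Its slope at the origin is e^T, from the variational equation, but tends to 1 for large inputs. A polynomial whose derivative tends to 1 would have derivative identically 1, so φ_T is continuous and non-polynomial. The Leshno–Lin–Pinkus–Schocken characterization, surveyed in Pinkus [7], then gives density in the scalar case. For coupled fields this direct reduction is unavailable.

```python
import math

def scalar_flow(x0, T=1.0, n_steps=1000):
    """RK4 approximation of phi_T(x0) for the scalar ODE dx/dt = tanh(x)."""
    h = T / n_steps
    x = x0
    for _ in range(n_steps):
        k1 = math.tanh(x)
        k2 = math.tanh(x + 0.5 * h * k1)
        k3 = math.tanh(x + 0.5 * h * k2)
        k4 = math.tanh(x + h * k3)
        x += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

def slope(x, T=1.0, eps=1e-5):
    """Central finite-difference estimate of phi_T'(x)."""
    return (scalar_flow(x + eps, T) - scalar_flow(x - eps, T)) / (2.0 * eps)

# The slope near the origin is about e^T; far from the origin it is about 1.
print(slope(0.0), math.exp(1.0))
print(slope(20.0))
```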
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. We address the single major comment below and will revise the manuscript to improve clarity on the UAP argument.
Point-by-point responses
Referee: [Abstract] Abstract (paragraph 2): the claim that the embedding + flow map + projection composition 'produces a shallow neural network whose activation function is defined as the flow map' and that classical UAP theorems therefore apply directly is load-bearing for the entire UAP result, yet the flow map Φ_T : ℝ^d → ℝ^d is vector-valued. Standard scalar UAP theorems (Cybenko, Hornik) require a fixed scalar non-polynomial activation applied componentwise; the manuscript must either reduce the vector case to the scalar setting or supply an independent density argument for this particular vector activation.
Authors: We agree that the abstract's phrasing is imprecise and could be read as claiming direct applicability of scalar UAP theorems. The flow map is indeed vector-valued, so the classical scalar results do not apply verbatim. However, the manuscript supplies an independent density argument for the specific vector-valued activation arising from the neural ODE flow map (see the construction and proof in Section 3, which establishes density in C^0 by exploiting the continuous dependence on initial conditions and the non-polynomial character of the flow without reducing to the scalar case). We will revise the abstract to remove the claim that classical theorems apply directly and instead state that we prove UAP directly for this vector-valued activation. This is a clarification only; the main theorems remain unchanged. revision: yes
Circularity Check
No circularity: standard UAP theorems applied to composed flow-map activation
full rationale
The derivation defines a shallow network via embedding + neural-ODE flow map at fixed time + projection, then states that this network's activation satisfies the hypotheses of classical scalar UAP theorems (Cybenko/Hornik et al.). No equation or step reduces the claimed approximation result to a quantity defined by fitting inside the paper, nor does any load-bearing premise rest on a self-citation chain. The central claim therefore remains independent of the paper's own inputs.
Reference graph
Works this paper leans on
[1] L. Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012
[2] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural Ordinary Differential Equations. In Advances in Neural Information Processing Systems, 2018
[3] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989
[5] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991
[7] A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, 1999
[9] A. R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993
[10] M. Hassoun. Fundamentals of Artificial Neural Networks. The MIT Press, 1995
[11] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 1998
[12] G. Gripenberg. Approximation by neural networks with a bounded number of nodes at each level. Journal of Approximation Theory, 122(2):260–266, 2003
[14] Z. Lu, H. Pu, F. Wang, Z. Hu, and L. Wang. The Expressive Power of Neural Networks: A View from the Width. In Advances in Neural Information Processing Systems, 2017
[15] B. Hanin and M. Sellke. Approximating Continuous Functions by ReLU Nets of Minimal Width. arXiv preprint arXiv:1710.11278, 2017
[16] P. Kidger and T. Lyons. Universal Approximation with Deep Narrow Networks. In Conference on Learning Theory, 2020
[17] S. Park, C. Yun, J. Lee, and J. Shin. Minimum Width for Universal Approximation. In International Conference on Learning Representations, 2021
[18] Y. Cai. Achieve the minimum width of neural networks for universal approximation. In International Conference on Learning Representations, 2023
[20] Z. Shen, H. Yang, and S. Zhang. Deep Network Approximation Characterized by Number of Neurons. Communications in Computational Physics, 28(5):1768–1811, 2019
[21] D. Yarotsky and A. Zhevnerchuk. The phase diagram of approximation rates for deep neural networks. In Advances in Neural Information Processing Systems, 2020
[22] J. Lu, Z. Shen, H. Yang, and S. Zhang. Deep Network Approximation for Smooth Functions. SIAM Journal on Mathematical Analysis, 53(5):5465–5506, 2021
[23] P. Petersen and F. Voigtlaender. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks, 108:296–330, 2018
[24] Y. Yang, Z. Li, and Y. Wang. Approximation in shift-invariant spaces with deep ReLU neural networks. Neural Networks, 153:269–281, 2022
[25] H. Montanelli, H. Yang, and Q. Du. Deep ReLU Networks Overcome the Curse of Dimensionality for Generalized Bandlimited Functions. Journal of Computational Mathematics, 39(6):801–815, 2021
[26] A. B. Juditsky, O. V. Lepski, and A. B. Tsybakov. Nonparametric estimation of composite functions. The Annals of Statistics, 37(3):1360–1404, 2009
[28] J. Johnson. Deep, Skinny Neural Networks are not Universal Approximators. In International Conference on Learning Representations, 2019
[29] A. Kratsios and L. Papon. Universal Approximation Theorems for Differentiable Geometric Deep Learning. Journal of Machine Learning Research, 23(196):1–73, 2022
[30] V. Maiorov and A. Pinkus. Lower bounds for approximation by MLP neural networks. Neurocomputing, 25(1-3):81–91, 1999
[31] N. J. Guliyev and V. E. Ismailov. Approximation capability of two hidden layer feedforward neural networks with fixed weights. Neurocomputing, 316:262–269, 2018
[32] N. J. Guliyev and V. E. Ismailov. On the approximation by single hidden layer feedforward neural networks with fixed weights. Neural Networks, 98:296–304, 2018
[33] Z. Shen, H. Yang, and S. Zhang. Optimal Approximation Rate of ReLU Networks in Terms of Width and Depth. Journal de Mathématiques Pures et Appliquées, 157:101–135, 2022
[34] G. Söderlind. The logarithmic norm. History and modern theory. BIT Numerical Mathematics, 46(3):631–652, 2006
[36] H. Bass and G. Meisters. Polynomial flows in the plane. Advances in Mathematics, 55(2):173–208, 1985
[37] E. Celledoni, M. J. Ehrhardt, C. Etmann, R. I. McLachlan, B. Owren, C.-B. Schönlieb, and F. Sherry. Structure-preserving deep learning. European Journal of Applied Mathematics, 32(5):888–936, 2021
[38] N. Guglielmi, A. De Marinis, A. Savostianov, and F. Tudisco. Contractivity of neural ODEs: an eigenvalue optimization problem. Mathematics of Computation, 2025
[39] A. De Marinis, N. Guglielmi, S. Sicilia, and F. Tudisco. Improving the robustness of neural ODEs with minimal weight perturbation. arXiv preprint arXiv:2501.10740, 2025
[41] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing Properties of Neural Networks. In International Conference on Learning Representations, 2014
[42] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations, 2015
[43] N. M. Gottschling, V. Antun, A. C. Hansen, and B. Adcock. The Troublesome Kernel: On Hallucinations, No Free Lunches, and the Accuracy-Stability Tradeoff in Inverse Problems. SIAM Review, 67(1):73–104, 2025