Approximating Continuous Functions by ReLU Nets of Minimal Width

Boris Hanin; Mark Sellke

arxiv: 1710.11278 · v2 · pith:F7RUVTRPnew · submitted 2017-10-31 · 📊 stat.ML · cs.CC· cs.LG· math.CO· math.ST· stat.TH

Approximating Continuous Functions by ReLU Nets of Minimal Width

Boris Hanin , Mark Sellke This is my paper

classification 📊 stat.ML cs.CCcs.LGmath.COmath.STstat.TH

keywords netsreluwidthcontinuousdepthfunctionhiddenminimal

0 comments

read the original abstract

This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d_{in}\geq 1,$ what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w,$ and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well? It turns out that this minimal width is exactly equal to $d_{in}+1.$ That is, if all the hidden layer widths are bounded by $d_{in}$, then even in the infinite depth limit, ReLU nets can only express a very limited class of functions, and, on the other hand, any continuous function on the $d_{in}$-dimensional unit cube can be approximated to arbitrary precision by ReLU nets in which all hidden layers have width exactly $d_{in}+1.$ Our construction in fact shows that any continuous function $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ can be approximated by a net of width $d_{in}+d_{out}$. We obtain quantitative depth estimates for such an approximation in terms of the modulus of continuity of $f$.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Approximation of Maximally Monotone Operators : A Graph Convergence Perspective
cs.LG 2026-05 unverdicted novelty 7.0

Any maximally monotone operator can be approximated in local graph convergence by continuous encoder-decoder networks, with structure-preserving versions that retain maximal monotonicity via resolvent parameterizations.
Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks
cs.LG 2026-05 unverdicted novelty 6.0

Diagonal plus Low-Rank (DLoR) neural networks achieve universal approximation for general activations by additive or multiplicative decompositions of full-rank transformations.
Man, Machine, and Mathematics
math.OC 2026-04 unverdicted novelty 5.0

A high-level outline is given for a unified theory that reduces learning to a small set of ideas from dynamical systems, geometry, and physics via definitions of solvable problems and parametrized methods.
Relocation of compact sets in $\mathbb{R}^n$ by diffeomorphisms and linear separability of datasets in $\mathbb{R}^n$
cs.LG 2026-04 unverdicted novelty 5.0

Any finite number of compact sets in R^n can be relocated to arbitrary targets by diffeomorphisms of R^n and embedded into R^{n+1} to become linearly separable, allowing width-n DNNs with Leaky-ReLU, ELU or SELU activ...
Upper Approximation Bounds for Neural Oscillators
cs.LG 2025-11 unverdicted novelty 5.0

Upper bounds are derived showing that neural oscillator approximation errors for causal operators and stable second-order dynamical systems scale polynomially with the reciprocals of the widths of the two MLPs.
Model reduction of parametric ordinary differential equations via autoencoders: representation properties and convergence analysis
math.NA 2025-09 unverdicted novelty 5.0

Autoencoders enable nonlinear dimensionality reduction for parametric ODEs, with analysis of exact representation properties and convergence of the reduced model to the original.
Approximation properties of neural ODEs
math.NA 2025-03 unverdicted novelty 5.0

Neural ODE flow maps composed with embedding and projection yield shallow networks with universal approximation in C^0, preserved under separate Lipschitz or norm constraints but with quantified loss when both are imposed.
Approximation Theory for Neural Networks: Old and New
cs.LG 2026-05 unverdicted novelty 2.0

A survey summarizing classical density results and quantitative approximation theory for feedforward networks and KANs, with emphasis on depth advantages.