pith. sign in

arxiv: 1710.11278 · v2 · pith:F7RUVTRPnew · submitted 2017-10-31 · 📊 stat.ML · cs.CC· cs.LG· math.CO· math.ST· stat.TH

Approximating Continuous Functions by ReLU Nets of Minimal Width

classification 📊 stat.ML cs.CCcs.LGmath.COmath.STstat.TH
keywords netsreluwidthcontinuousdepthfunctionhiddenminimal
0
0 comments X
read the original abstract

This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d_{in}\geq 1,$ what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w,$ and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well? It turns out that this minimal width is exactly equal to $d_{in}+1.$ That is, if all the hidden layer widths are bounded by $d_{in}$, then even in the infinite depth limit, ReLU nets can only express a very limited class of functions, and, on the other hand, any continuous function on the $d_{in}$-dimensional unit cube can be approximated to arbitrary precision by ReLU nets in which all hidden layers have width exactly $d_{in}+1.$ Our construction in fact shows that any continuous function $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ can be approximated by a net of width $d_{in}+d_{out}$. We obtain quantitative depth estimates for such an approximation in terms of the modulus of continuity of $f$.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Approximation of Maximally Monotone Operators : A Graph Convergence Perspective

    cs.LG 2026-05 unverdicted novelty 7.0

    Any maximally monotone operator can be approximated in local graph convergence by continuous encoder-decoder networks, with structure-preserving versions that retain maximal monotonicity via resolvent parameterizations.

  2. Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

    cs.LG 2026-05 unverdicted novelty 6.0

    Diagonal plus Low-Rank (DLoR) neural networks achieve universal approximation for general activations by additive or multiplicative decompositions of full-rank transformations.

  3. Man, Machine, and Mathematics

    math.OC 2026-04 unverdicted novelty 5.0

    A high-level outline is given for a unified theory that reduces learning to a small set of ideas from dynamical systems, geometry, and physics via definitions of solvable problems and parametrized methods.

  4. Relocation of compact sets in $\mathbb{R}^n$ by diffeomorphisms and linear separability of datasets in $\mathbb{R}^n$

    cs.LG 2026-04 unverdicted novelty 5.0

    Any finite number of compact sets in R^n can be relocated to arbitrary targets by diffeomorphisms of R^n and embedded into R^{n+1} to become linearly separable, allowing width-n DNNs with Leaky-ReLU, ELU or SELU activ...

  5. Upper Approximation Bounds for Neural Oscillators

    cs.LG 2025-11 unverdicted novelty 5.0

    Upper bounds are derived showing that neural oscillator approximation errors for causal operators and stable second-order dynamical systems scale polynomially with the reciprocals of the widths of the two MLPs.

  6. Model reduction of parametric ordinary differential equations via autoencoders: representation properties and convergence analysis

    math.NA 2025-09 unverdicted novelty 5.0

    Autoencoders enable nonlinear dimensionality reduction for parametric ODEs, with analysis of exact representation properties and convergence of the reduced model to the original.

  7. Approximation properties of neural ODEs

    math.NA 2025-03 unverdicted novelty 5.0

    Neural ODE flow maps composed with embedding and projection yield shallow networks with universal approximation in C^0, preserved under separate Lipschitz or norm constraints but with quantified loss when both are imposed.

  8. Approximation Theory for Neural Networks: Old and New

    cs.LG 2026-05 unverdicted novelty 2.0

    A survey summarizing classical density results and quantitative approximation theory for feedforward networks and KANs, with emphasis on depth advantages.