Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks

Andrew R. Barron; Jason M. Klusowski

arxiv: 1607.01434 · v4 · pith:LSYOLJDEnew · submitted 2016-07-05 · math.ST · stat.ML · stat.TH

classification math.ST · stat.ML · stat.TH

Let $ f^{\star} $ be a function on $ \mathbb{R}^d $ with an assumption of a spectral norm $ v_{f^{\star}} $. For various noise settings, we show that $ \mathbb{E}\|\hat{f} - f^{\star} \|^2 \leq \left(v^4_{f^{\star}}\frac{\log d}{n}\right)^{1/3} $, where $ n $ is the sample size and $ \hat{f} $ is either a penalized least squares estimator or a greedily obtained version of such using linear combinations of sinusoidal, sigmoidal, ramp, ramp-squared or other smooth ridge functions. The candidate fits may be chosen from a continuum of functions, thus avoiding the rigidity of discretizations of the parameter space. On the other hand, if the candidate fits are chosen from a discretization, we show that $ \mathbb{E}\|\hat{f} - f^{\star} \|^2 \leq \left(v^3_{f^{\star}}\frac{\log d}{n}\right)^{2/5} $. This work bridges non-linear and non-parametric function estimation and includes single-hidden layer nets. Unlike past theory for such settings, our bound shows that the risk is small even when the input dimension $ d $ of an infinite-dimensional parameterized dictionary is much larger than the available sample size. When the dimension is larger than the cube root of the sample size, this quantity is seen to improve the more familiar risk bound of $ v_{f^{\star}}\left(\frac{d\log (n/d)}{n}\right)^{1/2} $, also investigated here.
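The comparison of rates in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the three displayed expressions literally, assuming any constants hidden in the bounds equal 1 and that $ n > d > 1 $ so the logarithms are positive. The function names and the sample values of `n`, `d`, and `v` are hypothetical choices for illustration.

```python
import math


def continuum_bound(v, d, n):
    """(v^4 * log(d) / n)^(1/3): rate stated for fits chosen from a continuum."""
    return (v**4 * math.log(d) / n) ** (1.0 / 3.0)


def discretized_bound(v, d, n):
    """(v^3 * log(d) / n)^(2/5): rate stated for fits chosen from a discretization."""
    return (v**3 * math.log(d) / n) ** (2.0 / 5.0)


def classical_bound(v, d, n):
    """v * (d * log(n/d) / n)^(1/2): the more familiar rate; needs n > d."""
    return v * math.sqrt(d * math.log(n / d) / n)


if __name__ == "__main__":
    n = 10**6   # sample size (assumed); its cube root is 100
    v = 1.0     # spectral norm of the target (assumed)
    for d in (10, 100, 1000, 10000):
        print(
            f"d={d:>6}  continuum={continuum_bound(v, d, n):.4f}  "
            f"discretized={discretized_bound(v, d, n):.4f}  "
            f"classical={classical_bound(v, d, n):.4f}"
        )
```

With these assumed values, the classical expression is the smaller one for small $ d $, while the $ (\log d / n)^{1/3} $ expression is smaller once $ d $ is well above $ n^{1/3} $. Ignoring logarithmic factors and $ v_{f^{\star}} $, this matches the crossover $ n^{-1/3} \leq (d/n)^{1/2} \iff d \geq n^{1/3} $.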

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Universal approximation property of Banach space-valued random feature models including random neural networks
cs.LG 2023-12 unverdicted novelty 7.0
Banach-valued random feature models, including random single-hidden-layer networks, universally approximate elements of Bochner spaces over non-compact domains with explicit approximation rates.

Adaptive Randomized Neural Networks with Locally Activation Function: Theory and Algorithm for Solving PDEs
math.NA 2026-04 unverdicted novelty 6.0
Randomized neural networks require a sampling domain sized to target smoothness for optimal approximation, and an adaptive PIRaNN method with partition-of-unity refinement solves PDEs with limited local regularity.

Solving Inverse Parametrized Problems via Finite Elements and Extreme Learning Networks
math.NA 2026-02 unverdicted novelty 6.0
A hybrid FEM and ELM framework for parameter-dependent PDEs derives existence, uniqueness, regularity, and error estimates for inverse problems in photoacoustic tomography.

Approximation Theory for Neural Networks: Old and New
cs.LG 2026-05 unverdicted novelty 2.0
A survey summarizing classical density results and quantitative approximation theory for feedforward networks and KANs, with emphasis on depth advantages.