Explicit integral representations and quantitative bounds for two-layer ReLU networks
Pith reviewed 2026-05-13 07:37 UTC · model grok-4.3
The pith
Two-layer ReLU networks admit explicit integral representations that approximate polynomials with L2 errors controlled only by monomial coefficients and the data distribution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Any multivariate polynomial admits an explicit integral representation as a two-layer ReLU network; the sharpened form obtained by harmonic extension and projection produces L2(D) approximation errors whose only explicit dependence is on the coefficients of the monomial expansion and on the underlying distribution D, with no separate factors of dimension or degree appearing in the bound.
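To make "integral representation as a two-layer ReLU network" concrete, here is a minimal one-dimensional sketch. It uses the classical Taylor identity p(x) = p(a) + p'(a)(x - a) + ∫_a^b p''(t) ReLU(x - t) dt on [a, b], which is an assumption chosen for illustration and not the paper's multivariate construction. The helper names (`relu_network`, `poly_eval`, `poly_deriv`) are hypothetical. Discretizing the integral with the midpoint rule yields a finite network with one hidden unit per quadrature node.

```python
def relu(u):
    """ReLU activation max(u, 0)."""
    return u if u > 0.0 else 0.0


def poly_eval(coeffs, x):
    """Evaluate sum_k coeffs[k] * x**k with Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def relu_network(coeffs, a, b, n_hidden):
    """Two-layer ReLU network approximating the polynomial on [a, b].

    Discretizes p(x) = p(a) + p'(a)(x - a) + int_a^b p''(t) relu(x - t) dt
    with the midpoint rule: one hidden unit per quadrature node.
    """
    d1 = poly_deriv(coeffs)
    d2 = poly_deriv(d1)
    h = (b - a) / n_hidden
    knots = [a + (j + 0.5) * h for j in range(n_hidden)]
    # Outer-layer weight of unit j is p''(t_j) times the quadrature weight h.
    weights = [poly_eval(d2, t) * h for t in knots]
    p_a, dp_a = poly_eval(coeffs, a), poly_eval(d1, a)

    def net(x):
        hidden = sum(w * relu(x - t) for w, t in zip(weights, knots))
        # Affine part; on [a, b] the term (x - a) equals relu(x - a).
        return p_a + dp_a * (x - a) + hidden

    return net


if __name__ == "__main__":
    coeffs = [1.0, -2.0, 0.0, 3.0]  # p(x) = 1 - 2x + 3x^3
    net = relu_network(coeffs, -1.0, 1.0, 2000)
    grid = [-1.0 + 2.0 * i / 200 for i in range(201)]
    print(max(abs(net(x) - poly_eval(coeffs, x)) for x in grid))
```

In this toy setting the outer weights are read off from p'' directly, which is the sense in which the representation is explicit: no training is involved, only quadrature.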
What carries the argument
Sharpened ReLU integral representation obtained by first taking the harmonic extension of the target function and then projecting onto the network class.
Load-bearing premise
The target functions possess a monomial expansion whose coefficients alone govern the size of the approximation error, and the measure D allows the harmonic extension and projection to produce bounds free of explicit dimension or degree factors.
What would settle it
A concrete multivariate polynomial and distribution D for which the L2(D) error of the sharpened ReLU integral representation exceeds the bound stated in terms of its monomial coefficients.
Original abstract
An approach to construct explicit integral representations for two-layer ReLU networks is presented, which provides relatively simple representations for any multivariate polynomial. Quantitative bounds are provided for a particular, sharpened ReLU integral representation, which involves a harmonic extension and a projection. The bounds demonstrate that functions can be approximated with $L^{2}(\mathcal{D})$ errors that do not depend explicitly on dimension or degree, but rather the coefficients of their monomial expansions and the distribution $\mathcal{D}$. We also present a connection to the RKHS of the exponential kernel $K(x,y)=\exp\left(\left\langle x,y\right\rangle \right)$, and a very simple integral representation involving additionally multiplication via a fixed function which has better quantitative bounds.
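For orientation on why monomial coefficients are a natural quantity in the exponential-kernel setting, the standard power-series expansion of the kernel can be written out. This is a textbook computation rather than a statement taken from the paper; the multi-index notation (with α! = α_1! ⋯ α_d!), the coefficient symbols c_α, and the RKHS symbol H_K are assumptions introduced here for illustration.

```latex
\begin{align*}
K(x,y) = \exp\left(\langle x, y\rangle\right)
  &= \sum_{k \ge 0} \frac{\langle x, y\rangle^{k}}{k!}
   = \sum_{\alpha \in \mathbb{N}^{d}} \frac{x^{\alpha}\, y^{\alpha}}{\alpha!}, \\
\Bigl\| \sum_{\alpha} c_{\alpha}\, x^{\alpha} \Bigr\|_{\mathcal{H}_{K}}^{2}
  &= \sum_{\alpha} \alpha!\, c_{\alpha}^{2}.
\end{align*}
```

The second line follows because the functions x^α / sqrt(α!) form an orthonormal basis of the RKHS, so the norm of a polynomial is a weighted ℓ² norm of its monomial coefficients.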
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops explicit integral representations for two-layer ReLU networks, with particularly simple forms for any multivariate polynomial. It derives quantitative L²(D) bounds for a sharpened ReLU integral representation that incorporates a harmonic extension followed by a projection step; these bounds are controlled solely by the monomial coefficients of the target and the measure D, with no explicit dependence on dimension or degree. A connection to the RKHS of the exponential kernel K(x,y)=exp(⟨x,y⟩) is used to obtain an even simpler integral representation that additionally multiplies by a fixed function and yields better quantitative bounds.
Significance. If the stated bounds hold, the work supplies concrete, dimension-free approximation guarantees for polynomials by two-layer ReLU networks together with explicit integral constructions. The link to the exponential-kernel RKHS and the provision of quantitative constants in terms of monomial coefficients are useful contributions to the theory of neural-network approximation in high dimensions.
major comments (2)
- [§3, Theorem 3.2] The claim that the L²(D) error depends only on the monomial coefficients and D (with no explicit dimension factor) rests on the harmonic-extension operator being bounded independently of dimension; the proof must exhibit the precise norm bound for the chosen D and verify that it does not grow with d.
- [§4, Eq. (4.7)] The improved representation obtained via the exponential-kernel RKHS multiplies by a fixed function f; it is not clear whether f is independent of the target polynomial or whether its L²(D) norm introduces hidden dependence on degree or dimension that offsets the claimed constant improvement.
minor comments (2)
- [§2] The measure D is introduced in the abstract and §2 but its precise support and moment conditions are not restated before the main theorems; a short reminder paragraph would improve readability.
- [§3] Notation for the projection operator P_D is used before its definition; moving the definition to the beginning of §3 would eliminate forward references.
Simulated Author's Rebuttal
We thank the referee for the careful reading, the positive assessment of the contribution, and the recommendation for minor revision. We address each major comment below and have revised the manuscript accordingly to improve clarity and explicitness of the arguments.
Point-by-point responses
- Referee: [§3, Theorem 3.2] The claim that the L²(D) error depends only on the monomial coefficients and D (with no explicit dimension factor) rests on the harmonic-extension operator being bounded independently of dimension; the proof must exhibit the precise norm bound for the chosen D and verify that it does not grow with d.
  Authors: We agree that the dimension-free claim requires an explicit verification of the operator norm. For the distribution D (uniform on the unit ball), the harmonic extension operator H satisfies ||H g||_{L^2(D)} ≤ C ||g||_{L^2(∂B)} with C=1 independent of dimension d; this follows from the mean-value property of harmonic functions and the rotational invariance of D. We have revised the proof of Theorem 3.2 to insert this precise bound and the short verification that the constant does not depend on d, thereby making the absence of explicit dimension factors fully transparent.
  Revision: yes
- Referee: [§4, Eq. (4.7)] The improved representation obtained via the exponential-kernel RKHS multiplies by a fixed function f; it is not clear whether f is independent of the target polynomial or whether its L²(D) norm introduces hidden dependence on degree or dimension that offsets the claimed constant improvement.
  Authors: The function f appearing in (4.7) is independent of the target polynomial; it is the fixed multiplier f(x) = exp(½||x||²) arising from the reproducing kernel of the exponential-kernel RKHS. Its L²(D) norm is a numerical constant determined solely by D and is therefore independent of both degree and the monomial coefficients of the approximand. The quantitative improvement therefore stems entirely from the smaller RKHS norm of the target and is not offset by ||f||_{L^2(D)}. We have added a short remark after (4.7) stating the explicit form of f and confirming that ||f||_{L^2(D)} is a D-dependent constant only.
  Revision: yes
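The second response can be checked numerically under the assumptions the simulated rebuttal itself states: f(x) = exp(½||x||²) and D uniform on the unit ball of R^d. Under those assumptions the radius has density d·r^(d-1) on [0, 1], so ||f||²_{L²(D)} = d ∫_0^1 r^(d-1) exp(r²) dr. The sketch below evaluates this with a midpoint rule; the function name `f_norm_squared` and the node count are hypothetical choices, and nothing here is taken from the paper.

```python
import math


def f_norm_squared(d, n_nodes=20000):
    """Approximate ||f||^2 in L2(D) for f(x) = exp(||x||^2 / 2).

    Assumes D is uniform on the unit ball of R^d, so the radius r has
    density d * r**(d - 1) on [0, 1]. Uses the midpoint rule in r.
    """
    h = 1.0 / n_nodes
    total = 0.0
    for j in range(n_nodes):
        r = (j + 0.5) * h
        total += d * r ** (d - 1) * math.exp(r * r)
    return total * h


if __name__ == "__main__":
    for d in (1, 2, 10, 100, 1000):
        print(d, f_norm_squared(d))
```

Because ||x|| ≤ 1 on the unit ball, exp(||x||²) ≤ e pointwise, so the squared norm never exceeds e in any dimension. The constant does vary with d through D (it increases toward e as d grows), but it stays bounded, which is the point at issue in the referee's comment.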
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper constructs explicit integral representations for two-layer ReLU networks applied to multivariate polynomials via a sharpened representation that incorporates harmonic extension and projection operators. These steps are presented as direct derivations from the ReLU activation and the chosen measure D, with L²(D) error bounds controlled explicitly by monomial coefficients rather than by dimension or degree. The additional connection to the RKHS of the exponential kernel K(x,y)=exp(⟨x,y⟩) is introduced as a separate simplification yielding better quantitative bounds, without any reduction of the central claims to fitted parameters renamed as predictions or to self-citations that bear the load of uniqueness. No self-definitional loops, ansatz smuggling, or renaming of known results appear in the stated chain; the bounds are derived quantities whose independence from dimension follows from the norm-boundedness assumptions on the operators for the selected D. The overall argument remains internally consistent and externally falsifiable against the monomial expansion inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Harmonic extension and projection operators exist and are well-defined on the relevant function spaces for the distribution D.
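As a worked illustration of how such an operator can be bounded without a dimension factor, consider the setting named in the simulated rebuttal: D uniform on the unit ball B of R^d. The derivation below is a standard spherical-harmonics computation, not the paper's proof. It assumes that both D and the surface measure σ on ∂B are normalized to probability measures, and the symbols Y_k (degree-k spherical harmonic components of g) and σ are introduced here for illustration.

```latex
\begin{align*}
g &= \sum_{k \ge 0} Y_{k} \ \text{ on } \partial B,
\qquad
(Hg)(r\theta) = \sum_{k \ge 0} r^{k}\, Y_{k}(\theta),
\quad 0 \le r \le 1,\ \theta \in \partial B, \\
\| Hg \|_{L^{2}(\mathcal{D})}^{2}
  &= \int_{0}^{1} d\, r^{d-1} \sum_{k \ge 0} r^{2k}\, \| Y_{k} \|_{L^{2}(\sigma)}^{2}\, dr
   = \sum_{k \ge 0} \frac{d}{2k+d}\, \| Y_{k} \|_{L^{2}(\sigma)}^{2}
   \le \| g \|_{L^{2}(\sigma)}^{2}.
\end{align*}
```

The cross terms vanish because spherical harmonics of different degrees are orthogonal in L²(σ). With unnormalized Lebesgue and surface measures the factor d/(2k+d) becomes 1/(2k+d), so whether the constant is dimension-free depends on the normalization chosen.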
Reference graph
Works this paper leans on
- [1]
- [2] A. Optimal squared $L^{2}$ error for random networks. In this section we establish the minimal expected squared $L^{2}(\mathcal{D})$ error of an IID average of random functions, when $f=\int \phi(\cdot,z)\,\mu(dz)$ for some measure $\mu$ and the optimization is performed with respect to the choice of distribution of $z$. The argument is a variation on a classical argument for identifying the optimal ...
work page 2012
- [3] ... ${}^{\frac{d-3}{2}}\,dt$, and $P_{d,k}$ is the restriction of the Legendre polynomial of degree $k$ in $d$ dimensions on the unit sphere, $P_{d,k}(t)=k!\,\Gamma\left(\frac{d-1}{2}\right)\sum_{i=0}^{[k/2]}(-1)^{i}\frac{(1-t^{2})^{i}\,t^{k-2i}}{4^{i}\,i!\,(k-2i)!\,\Gamma\left(i+\frac{d-1}{2}\right)}$. Proof of Lemma 13. Lemma 35 applies with $\varphi(t)=t^{m}$, where $m=2k+i$. Using Atkinson and Han [2012, Proposition 2.26], we compute $\lambda_{d,k,i}=\frac{|S^{d-2}|}{|S^{d-1}|}\int_{-1}^{1}\varphi(t)\,P_{d,k}(t)\,(1-t$ ...
work page 2012
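The excerpt in entry [2] concerns the expected squared L²(D) error of an IID average of random functions when f = ∫ ϕ(·, z) µ(dz). A toy one-dimensional instance makes the quantities concrete. Everything specific below is an assumption for illustration and not taken from the paper: ϕ(x, z) = ReLU(x - z), µ = Uniform[0, 1], D = Uniform[0, 1], so f(x) = x²/2. For n units drawn from µ the expected squared error is (E||ϕ(·, z)||² - ||f||²)/n; the classical importance-sampling choice (density proportional to ||ϕ(·, z)||, with reweighting) replaces the first term by (E||ϕ(·, z)||)². Function names are hypothetical.

```python
import math
import random

N_GRID = 100
GRID = [(i + 0.5) / N_GRID for i in range(N_GRID)]  # midpoint nodes on [0, 1]


def relu(u):
    return u if u > 0.0 else 0.0


def l2_sq(fn):
    """Squared L2(D) norm for D = Uniform[0, 1], by the midpoint rule."""
    return sum(fn(x) ** 2 for x in GRID) / N_GRID


def target(x):
    """f(x) = int_0^1 relu(x - z) dz = x**2 / 2 for x in [0, 1]."""
    return 0.5 * x * x


def plain_constant():
    """E_z ||relu(. - z)||^2 - ||f||^2 with z ~ Uniform[0, 1] (exactly 1/30)."""
    mean_sq = sum(l2_sq(lambda x, z=z: relu(x - z)) for z in GRID) / N_GRID
    return mean_sq - l2_sq(target)


def importance_constant():
    """(E_z ||relu(. - z)||)^2 - ||f||^2 (exactly 1/300)."""
    mean_norm = sum(
        math.sqrt(l2_sq(lambda x, z=z: relu(x - z))) for z in GRID
    ) / N_GRID
    return mean_norm ** 2 - l2_sq(target)


def simulated_error(n_units, n_trials, seed=0):
    """Monte Carlo estimate of E ||(1/n) sum_i relu(. - z_i) - f||^2, z_i ~ mu."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        zs = [rng.random() for _ in range(n_units)]
        total += l2_sq(
            lambda x: sum(relu(x - z) for z in zs) / n_units - target(x)
        )
    return total / n_trials


if __name__ == "__main__":
    print(plain_constant(), importance_constant(), simulated_error(10, 1000))
```

In this toy case the two constants are 1/30 and 1/300, so choosing the sampling distribution well reduces the expected squared error by a factor of ten at the same network width.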