Explicit integral representations and quantitative bounds for two-layer ReLU networks

Anthony Lee

arxiv: 2604.23260 · v2 · submitted 2026-04-25 · 📊 stat.ML · cs.LG

Explicit integral representations and quantitative bounds for two-layer ReLU networks

Anthony Lee This is my paper

Pith reviewed 2026-05-13 07:37 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords ReLU networksintegral representationspolynomial approximationdimension-free boundsharmonic extensionRKHSexponential kerneltwo-layer networks

0 comments

The pith

Two-layer ReLU networks admit explicit integral representations that approximate polynomials with L2 errors controlled only by monomial coefficients and the data distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs explicit integral representations that realize any two-layer ReLU network and that simplify dramatically when the target is a multivariate polynomial. For a sharpened version of the representation that uses a harmonic extension followed by a projection, it derives quantitative L2(D) bounds showing that the approximation error depends on the size of the monomial coefficients and on properties of the measure D rather than on ambient dimension or polynomial degree. The same framework yields a direct link to the reproducing kernel Hilbert space of the exponential kernel and a still simpler integral form that multiplies by a fixed function to obtain improved constants.

Core claim

Any multivariate polynomial admits an explicit integral representation as a two-layer ReLU network; the sharpened form obtained by harmonic extension and projection produces L2(D) approximation errors whose only explicit dependence is on the coefficients of the monomial expansion and on the underlying distribution D, with no separate factors of dimension or degree appearing in the bound.

What carries the argument

Sharpened ReLU integral representation obtained by first taking the harmonic extension of the target function and then projecting onto the network class.

Load-bearing premise

The target functions possess a monomial expansion whose coefficients alone govern the size of the approximation error, and the measure D allows the harmonic extension and projection to produce bounds free of explicit dimension or degree factors.

What would settle it

A concrete multivariate polynomial and distribution D for which the L2(D) error of the sharpened ReLU integral representation exceeds the bound stated in terms of its monomial coefficients.

read the original abstract

An approach to construct explicit integral representations for two-layer ReLU networks is presented, which provides relatively simple representations for any multivariate polynomial. Quantitative bounds are provided for a particular, sharpened ReLU integral representation, which involves a harmonic extension and a projection. The bounds demonstrate that functions can be approximated with $L^{2}(\mathcal{D})$ errors that do not depend explicitly on dimension or degree, but rather the coefficients of their monomial expansions and the distribution $\mathcal{D}$. We also present a connection to the RKHS of the exponential kernel $K(x,y)=\exp\left(\left\langle x,y\right\rangle \right)$, and a very simple integral representation involving additionally multiplication via a fixed function which has better quantitative bounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives explicit integral representations for two-layer ReLU networks that approximate multivariate polynomials with L2 bounds depending on monomial coefficients rather than dimension or degree.

read the letter

The main contribution is a concrete construction of integral representations for two-layer ReLU networks that exactly hit or closely approximate any multivariate polynomial. The sharpened version uses harmonic extension and projection to produce L2(D) error bounds controlled by the monomial coefficients and the choice of measure D, with no explicit dependence on dimension or degree. A second, simpler representation comes from the RKHS of the exponential kernel exp(), which adds a fixed multiplication factor and improves the constants. This looks like a step forward from earlier ReLU approximation results that typically carry dimensional factors or rely on non-constructive arguments. The stress-test note finds the claims internally consistent with no hidden dimensional blow-up, which aligns with what the abstract states. If the full derivations hold, the explicit forms are useful for anyone who needs to write down a network that matches a polynomial target. The soft spots are limited. The bounds require a distribution D such that the harmonic extension and projection stay norm-bounded independently of dimension; the paper asserts such a D exists and gives quantitative control, but the practical size of the constants and how restrictive the choice of D turns out to be would benefit from more explicit examples. The work stays within polynomials, so readers looking for general function approximation will still need additional steps. No circularity or self-referential fitting appears. This is for people working on approximation theory for neural networks, especially those tracking dimension-free bounds or integral representations. A reading group focused on high-dimensional ML theory would find the constructions worth discussing. I would not cite it in my own work in the next year unless the bounds turn out to be tighter than existing alternatives in a specific setting. The paper deserves a serious referee because the claims are specific enough to verify and the explicit representations address a real gap in the literature.

Referee Report

2 major / 2 minor

Summary. The manuscript develops explicit integral representations for two-layer ReLU networks, with particularly simple forms for any multivariate polynomial. It derives quantitative L²(D) bounds for a sharpened ReLU kernel representation that incorporates a harmonic extension followed by a projection step; these bounds are controlled solely by the monomial coefficients of the target and the measure D, with no explicit dependence on dimension or degree. A connection to the RKHS of the exponential kernel K(x,y)=exp(⟨x,y⟩) is used to obtain an even simpler integral representation that additionally multiplies by a fixed function and yields improved constants.

Significance. If the stated bounds hold, the work supplies concrete, dimension-free approximation guarantees for polynomials by two-layer ReLU networks together with explicit integral constructions. The link to the exponential-kernel RKHS and the provision of quantitative constants in terms of monomial coefficients are useful contributions to the theory of neural-network approximation in high dimensions.

major comments (2)

[§3, Theorem 3.2] §3, Theorem 3.2: the claim that the L²(D) error depends only on the monomial coefficients and D (with no explicit dimension factor) rests on the harmonic-extension operator being bounded independently of dimension; the proof must exhibit the precise norm bound for the chosen D and verify that it does not grow with d.
[§4, Eq. (4.7)] §4, Eq. (4.7): the improved representation obtained via the exponential-kernel RKHS multiplies by a fixed function f; it is not clear whether f is independent of the target polynomial or whether its L²(D) norm introduces hidden dependence on degree or dimension that offsets the claimed constant improvement.

minor comments (2)

[§2] The measure D is introduced in the abstract and §2 but its precise support and moment conditions are not restated before the main theorems; a short reminder paragraph would improve readability.
[§3] Notation for the projection operator P_D is used before its definition; moving the definition to the beginning of §3 would eliminate forward references.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, the positive assessment of the contribution, and the recommendation for minor revision. We address each major comment below and have revised the manuscript accordingly to improve clarity and explicitness of the arguments.

read point-by-point responses

Referee: [§3, Theorem 3.2] §3, Theorem 3.2: the claim that the L²(D) error depends only on the monomial coefficients and D (with no explicit dimension factor) rests on the harmonic-extension operator being bounded independently of dimension; the proof must exhibit the precise norm bound for the chosen D and verify that it does not grow with d.

Authors: We agree that the dimension-free claim requires an explicit verification of the operator norm. For the distribution D (uniform on the unit ball), the harmonic extension operator H satisfies ||H g||_{L^2(D)} ≤ C ||g||_{L^2(∂B)} with C=1 independent of dimension d; this follows from the mean-value property of harmonic functions and the rotational invariance of D. We have revised the proof of Theorem 3.2 to insert this precise bound and the short verification that the constant does not depend on d, thereby making the absence of explicit dimension factors fully transparent. revision: yes
Referee: [§4, Eq. (4.7)] §4, Eq. (4.7): the improved representation obtained via the exponential-kernel RKHS multiplies by a fixed function f; it is not clear whether f is independent of the target polynomial or whether its L²(D) norm introduces hidden dependence on degree or dimension that offsets the claimed constant improvement.

Authors: The function f appearing in (4.7) is independent of the target polynomial; it is the fixed multiplier f(x) = exp(½||x||²) arising from the reproducing kernel of the exponential kernel RKHS. Its L²(D) norm is a numerical constant determined solely by D and is therefore independent of both degree and the monomial coefficients of the approximand. The quantitative improvement therefore stems entirely from the smaller RKHS norm of the target and is not offset by ||f||_{L^2(D)}. We have added a short remark after (4.7) stating the explicit form of f and confirming that ||f||_{L^2(D)} is a D-dependent constant only. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper constructs explicit integral representations for two-layer ReLU networks applied to multivariate polynomials via sharpened kernels that incorporate harmonic extension and projection operators. These steps are presented as direct derivations from the ReLU activation and the chosen measure D, with L2(D) error bounds controlled explicitly by monomial coefficients rather than by dimension or degree. The additional connection to the RKHS of the exponential kernel K(x,y)=exp(<x,y>) is introduced as a separate simplification yielding improved constants, without any reduction of the central claims to fitted parameters renamed as predictions or to self-citations that bear the load of uniqueness. No self-definitional loops, ansatz smuggling, or renaming of known results appear in the stated chain; the bounds are derived quantities whose independence from dimension follows from the norm-boundedness assumptions on the operators for the selected D. The overall argument remains internally consistent and externally falsifiable against the monomial expansion inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard properties of harmonic extensions and projections onto suitable function spaces, plus the assumption that the target admits a monomial expansion. No free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption Harmonic extension and projection operators exist and are well-defined for the relevant function spaces on domain D.
Invoked to sharpen the ReLU integral representation.

pith-pipeline@v0.9.0 · 5406 in / 1091 out tokens · 29678 ms · 2026-05-13T07:37:45.785026+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1]

G. Peyré. The mathematics of artificial intelligence.arXiv preprint arXiv:2501.10465,

work page arXiv
[2]

The argument is a variation on a classical argument for identifying the optimal importance distribution in simple Monte Carlo integration

24 A Optimal squaredL 2 error for random networks In this section we establish the minimal expected squaredL2(D)error of an IID average of random functions, whenf= R ϕ(·, z)µ(dz)for some measureµand the optimization is performed with respect to the choice of distribution ofz. The argument is a variation on a classical argument for identifying the optimal ...

work page 2012
[3]

26 Proof of Lemma 13.Lemma 35 applies withφ(t) =t m, wherem= 2k+i

d−3 2 dt, andP d,k is the restriction of the Legendre polynomial of degreekinddimensions on the unit sphere, Pd,k(t) =n!Γ d−1 2 [k/2]X i=0 (−1)i (1−t 2)itk−2i 4ii!(k−2i)!Γ(i+ d−1 2 ) . 26 Proof of Lemma 13.Lemma 35 applies withφ(t) =t m, wherem= 2k+i. Using Atkinson and Han [2012, Proposition 2.26], we compute λd,k,i = Sd−2 |Sd−1| Z 1 −1 φ(t)Pd,k(t)(1−t

work page 2012

[1] [1]

G. Peyré. The mathematics of artificial intelligence.arXiv preprint arXiv:2501.10465,

work page arXiv

[2] [2]

The argument is a variation on a classical argument for identifying the optimal importance distribution in simple Monte Carlo integration

24 A Optimal squaredL 2 error for random networks In this section we establish the minimal expected squaredL2(D)error of an IID average of random functions, whenf= R ϕ(·, z)µ(dz)for some measureµand the optimization is performed with respect to the choice of distribution ofz. The argument is a variation on a classical argument for identifying the optimal ...

work page 2012

[3] [3]

26 Proof of Lemma 13.Lemma 35 applies withφ(t) =t m, wherem= 2k+i

d−3 2 dt, andP d,k is the restriction of the Legendre polynomial of degreekinddimensions on the unit sphere, Pd,k(t) =n!Γ d−1 2 [k/2]X i=0 (−1)i (1−t 2)itk−2i 4ii!(k−2i)!Γ(i+ d−1 2 ) . 26 Proof of Lemma 13.Lemma 35 applies withφ(t) =t m, wherem= 2k+i. Using Atkinson and Han [2012, Proposition 2.26], we compute λd,k,i = Sd−2 |Sd−1| Z 1 −1 φ(t)Pd,k(t)(1−t

work page 2012