Density of Neural Network Classes on Compact Subsets of Topological Vector Spaces

Arash Ghorbanalizadeh; Mohammad Javad Baghbanbashi

arXiv: 2605.22482 · v1 · pith:ZYTYDO65 · submitted 2026-05-21 · 🧮 math.FA

Pith reviewed 2026-05-22 01:46 UTC · model grok-4.3

classification 🧮 math.FA

keywords: neural networks · density · universal approximation · topological vector spaces · continuous functions · compact sets · squashing functions · Radon measures

The pith

Neural network classes built from continuous linear functionals are dense in the continuous functions on compact subsets of topological vector spaces whose dual separates points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that neural network classes formed by applying a continuous squashing function to affine functionals f(x) + b, with f taken from the continuous dual, are dense, in the uniform norm, in the continuous functions on any compact subset, provided the dual separates points. This extends classical approximation results from Euclidean domains to general topological vector spaces. The separation property of the dual ensures that the linear parts can distinguish points of the compact set, which is the key ingredient for uniform approximation. If correct, the same class is also dense in the L^p spaces, 1 ≤ p < ∞, for any Radon probability measure supported on the compact set.

Core claim

The class Σ_X(Ψ) of all finite sums ∑ ω_j Ψ(f_j(x) + b_j), where ω_j, b_j ∈ ℝ and f_j belongs to the continuous dual X*, is dense in C(K) with respect to the uniform norm whenever K is compact in X and X* separates points on X. As a direct consequence, the same class is dense in L^p(K, μ) for every Radon probability measure μ supported on K and every 1 ≤ p < ∞.

What carries the argument

The class Σ_X(Ψ), consisting of finite linear combinations of the squashing function Ψ composed with affine functionals x ↦ f(x) + b built from elements f of the continuous dual X*.
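As a concrete illustration, the sketch below takes X = ℝ^d (whose dual separates points), represents each f_j ∈ X* by a vector a_j through f_j(x) = ⟨a_j, x⟩, and uses the logistic sigmoid as Ψ. These choices and the helper names `logistic`, `evaluate`, `add` and `scale` are assumptions for illustration, not the paper's notation. It also shows that Σ_X(Ψ) is a linear subspace, and that the zero functional yields constants.

```python
import math
from typing import List, Sequence, Tuple

# One hidden unit (omega_j, a_j, b_j); on X = R^d the functional is f_j(x) = <a_j, x>.
Unit = Tuple[float, Sequence[float], float]


def logistic(t: float) -> float:
    """Logistic sigmoid used as the squashing function Psi (overflow-safe)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def evaluate(units: List[Unit], x: Sequence[float]) -> float:
    """Return sum_j omega_j * Psi(f_j(x) + b_j) for a network in Sigma_X(Psi)."""
    total = 0.0
    for omega, a, b in units:
        f_x = sum(ai * xi for ai, xi in zip(a, x))  # f_j(x) = <a_j, x>
        total += omega * logistic(f_x + b)
    return total


def add(u: List[Unit], v: List[Unit]) -> List[Unit]:
    """Sum of two networks: concatenating their units gives another element of Sigma_X(Psi)."""
    return list(u) + list(v)


def scale(c: float, u: List[Unit]) -> List[Unit]:
    """Scalar multiple: only the outer weights omega_j change."""
    return [(c * omega, a, b) for omega, a, b in u]


if __name__ == "__main__":
    # The zero functional with b = 0 yields the constant omega * Psi(0) = omega / 2.
    const = [(2.0, [0.0, 0.0], 0.0)]
    ridge = [(1.0, [1.0, -1.0], 0.5)]
    print(evaluate(add(const, scale(3.0, ridge)), [1.0, 1.5]))  # prints 2.5
```

Representing a network as a plain list of units makes closure under addition and scalar multiplication immediate, which is why density statements about Σ_X(Ψ) concern a linear subspace of C(K).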

If this is right

Uniform approximation holds for every compact K in any topological vector space whose dual separates points.
The approximation property transfers to L^p spaces with respect to every Radon probability measure on K.
The result applies whenever the squashing function Ψ is continuous.
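The transfer from C(K) to L^p can be written out as a two-step ε/2 argument. This is a standard route, sketched under the stated hypotheses (K compact, and Hausdorff because X* separates points; μ a Radon probability measure on K; 1 ≤ p < ∞); the paper's own proof may be organized differently.

```latex
\begin{aligned}
&\text{Given } g\in L^p(K,\mu),\ \varepsilon>0:\ \exists\,\varphi\in C(K)\ \text{with}\ \|g-\varphi\|_{L^p(\mu)}<\tfrac{\varepsilon}{2}
  \quad(C(K)\text{ is dense in }L^p(K,\mu)\text{ for Radon }\mu),\\
&\exists\,h\in\Sigma_X(\Psi)\ \text{with}\ \|\varphi-h\|_{\infty}<\tfrac{\varepsilon}{2}
  \quad(\text{density in } C(K)),\\
&\|\varphi-h\|_{L^p(\mu)}=\Bigl(\int_K|\varphi-h|^p\,d\mu\Bigr)^{1/p}\le\|\varphi-h\|_\infty\,\mu(K)^{1/p}=\|\varphi-h\|_\infty,\\
&\|g-h\|_{L^p(\mu)}\le\|g-\varphi\|_{L^p(\mu)}+\|\varphi-h\|_{L^p(\mu)}<\varepsilon .
\end{aligned}
```

The case p = ∞ is excluded because C(K) is generally not dense in L^∞(K, μ).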

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The construction may be specialized to common infinite-dimensional spaces such as Banach or Fréchet spaces, where the Hahn–Banach theorem guarantees that the dual separates points.
Finite-dimensional truncations of the dual functionals could be tested numerically to observe convergence rates toward the infinite-dimensional case.
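A minimal numerical sketch of such a test, in the simplest case X = ℝ and K = [0, 1]: the target is g(t) = sin(2πt), Ψ is the logistic sigmoid, and the network is built by a classical staircase construction (steep sigmoids centred at midpoints of a uniform grid). This construction, the steepness factor 50N and all helper names are illustrative assumptions, not a method taken from the paper, and the sup-norm error is only estimated on a finite sample grid.

```python
import math
from typing import Callable, List, Tuple

# Hidden unit (omega_j, a_j, b_j) on X = R, with functional f_j(t) = a_j * t.
Unit = Tuple[float, float, float]


def logistic(t: float) -> float:
    """Logistic squashing function, split by sign to avoid overflow in math.exp."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def staircase_network(g: Callable[[float], float], n: int, steep: float = 50.0) -> List[Unit]:
    """Build units approximating g on K = [0, 1] with n steep sigmoid steps.

    The first unit uses the zero functional (a = 0, b = 0); since logistic(0) = 1/2
    it contributes the constant g(0). Unit k adds the increment g(t_k) - g(t_{k-1})
    through a sigmoid of slope steep * n centred at the midpoint of [t_{k-1}, t_k].
    """
    slope = steep * n
    units: List[Unit] = [(2.0 * g(0.0), 0.0, 0.0)]
    for k in range(1, n + 1):
        left, right = (k - 1) / n, k / n
        mid = 0.5 * (left + right)
        units.append((g(right) - g(left), slope, -slope * mid))
    return units


def evaluate(units: List[Unit], t: float) -> float:
    """Evaluate sum_j omega_j * Psi(a_j * t + b_j)."""
    return sum(omega * logistic(a * t + b) for omega, a, b in units)


def sup_error(g: Callable[[float], float], units: List[Unit], samples: int = 2001) -> float:
    """Estimate sup over [0, 1] of |g - network| on a uniform sample grid."""
    grid = (i / (samples - 1) for i in range(samples))
    return max(abs(g(t) - evaluate(units, t)) for t in grid)


def sine_target(t: float) -> float:
    return math.sin(2.0 * math.pi * t)


if __name__ == "__main__":
    for n in (10, 100):
        print(n, sup_error(sine_target, staircase_network(sine_target, n)))
```

In this construction the uniform error is at most about the oscillation of g over one grid cell (here roughly 2π/N) plus exponentially small sigmoid tails, so it should shrink as N grows. A genuinely infinite-dimensional X would first require choosing finitely many functionals, which this sketch does not attempt.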

Load-bearing premise

The continuous dual of the topological vector space separates points.

What would settle it

A continuous function on some compact K, in a space whose dual separates points, together with an ε > 0 such that no element of Σ_X(Ψ) approximates it uniformly on K to within ε.

Original abstract

We prove density results for neural-network classes on compact sets \(K\subset X\), where \(X\) is a topological vector space whose continuous dual \(X^*\) separates points. Let \(\Psi:\mathbb R\to\mathbb R\) be a continuous squashing function. We show that the class \[ \Sigma_X(\Psi) = \left\{ \sum_{j=1}^{N}\omega_j\Psi(f_j(x)+b_j): N\in\mathbb N,\ \omega_j,b_j\in\mathbb R,\ f_j\in X^* \right\} \] is dense in \(C(K)\) with respect to the uniform norm. As a consequence, if \(\mu\) is a Radon probability measure supported on \(K\), then \(\Sigma_X(\Psi)\) is dense in \(L^p(K,\mu)\) for every \(1\le p<\infty\).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper cleanly extends neural net density results to general topological vector spaces using only the separating dual assumption.

The main point is that the paper proves that the class of networks built from scaled and shifted squashing functions applied to continuous linear functionals is dense in C(K) for compact K inside a topological vector space whose dual separates points, with the usual L^p consequence for Radon probability measures on K. That is the core claim as extracted from the abstract and the stress-test note. It is new in moving past the Banach or Hilbert settings that dominate most prior work on this topic. The paper does well to isolate the minimal condition, separation of points by X*, which is exactly what makes the weak topology Hausdorff and therefore makes it coincide with the original topology on every compact set; this lets the standard approximation argument with a continuous squashing map carry over without extra structure such as local convexity. The construction is direct and parameter-free. The soft spots are limited. The abstract gives no derivation steps or explicit bounds, so the full manuscript has to supply those details without gaps; the logical chain described in the stress-test note nonetheless holds up once the topology coincidence is noted. No circularity or invented entities appear. The paper is for readers in functional analysis or approximation theory who want universal approximation statements in more general infinite-dimensional settings, and a specialist following extensions of these results will get concrete value from the minimal hypotheses. It deserves serious peer review because the statement is precise and the setting is natural for the field.
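The topology-coincidence step the note relies on can be written out in four lines. Here τ denotes the original topology of X and σ(X, X*) its weak topology; this notation is assumed for the sketch, not taken from the paper.

```latex
\begin{aligned}
&\sigma(X,X^*)\subseteq\tau \quad(\text{every } f\in X^* \text{ is } \tau\text{-continuous}),\\
&X^* \text{ separates points} \;\Longrightarrow\; (X,\sigma(X,X^*)) \text{ is Hausdorff},\\
&\mathrm{id}:(K,\tau|_K)\to(K,\sigma(X,X^*)|_K) \text{ is a continuous bijection from a compact space onto a Hausdorff space},\\
&\;\Longrightarrow\; \mathrm{id} \text{ is a closed map, hence a homeomorphism, so } \tau|_K=\sigma(X,X^*)|_K .
\end{aligned}
```

Local convexity plays no role: only compactness of K in τ and the Hausdorff property of σ(X, X*) are used.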

Referee Report

1 major / 0 minor

Summary. The manuscript proves density results for neural-network classes on compact sets K ⊂ X, where X is a topological vector space whose continuous dual X* separates points. Let Ψ: ℝ → ℝ be a continuous squashing function. It shows that the class Σ_X(Ψ) = { ∑_{j=1}^N ω_j Ψ(f_j(x) + b_j) : N ∈ ℕ, ω_j, b_j ∈ ℝ, f_j ∈ X* } is dense in C(K) with respect to the uniform norm. As a consequence, if μ is a Radon probability measure supported on K, then Σ_X(Ψ) is dense in L^p(K, μ) for every 1 ≤ p < ∞.

Significance. If the central derivation holds, the result generalizes classical neural-network approximation theorems from finite-dimensional Euclidean spaces to general topological vector spaces under the separation assumption on X*. This is of interest in functional analysis and infinite-dimensional approximation theory. The direct existence proof with no fitted parameters or self-referential definitions is a strength, as is the clean reduction to the L^p case via standard measure-theoretic arguments.

major comments (1)

§3, proof of Theorem 3.2: the argument that the algebra generated by {f|K : f ∈ X*} is dense in C(K) via Stone–Weierstrass requires an explicit verification that this algebra is closed under pointwise multiplication and separates points on K; while the separation of points follows from the hypothesis on X*, the multiplication closure step is only sketched and should be written out to confirm it does not rely on additional structure of X.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive comment, which helps improve the clarity of the proof. We address the point below and have incorporated the suggested expansion into the revised version.

Point-by-point responses

Referee: §3, proof of Theorem 3.2: the argument that the algebra generated by {f|K : f ∈ X*} is dense in C(K) via Stone–Weierstrass requires an explicit verification that this algebra is closed under pointwise multiplication and separates points on K; while the separation of points follows from the hypothesis on X*, the multiplication closure step is only sketched and should be written out to confirm it does not rely on additional structure of X.

Authors: We agree with the referee that an explicit verification strengthens the argument. In the revised manuscript, we have expanded the relevant paragraph in the proof of Theorem 3.2 to state explicitly that the set A consisting of the constants plus all finite sums of finite products of elements from {f|K : f ∈ X*} forms a subalgebra of C(K): it is closed under pointwise addition and scalar multiplication by definition, and closed under pointwise multiplication because the product of two such polynomials is again a constant plus a finite sum of products of the generating functionals. The algebra contains the constant functions because they are included as degree-zero terms (the zero functional restricts to the zero function, so it cannot supply a nonzero constant), and it separates points on K because X* separates points on X and hence on K. This verification uses only the algebraic operations on the restrictions and the definition of the algebra generated by a set; it does not invoke any further topological or linear structure on X beyond the given hypotheses. revision: yes
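Written out, the verification looks as follows. Describing A as the polynomials in finitely many restricted functionals, constant polynomials included, is an assumption about how the algebra is set up: if 0 ∈ K, every product of functionals vanishes at 0, so without the degree-zero terms A could not be dense.

```latex
\begin{aligned}
A&=\bigl\{\,P\bigl(f_1|_K,\dots,f_n|_K\bigr):\ n\in\mathbb N,\ f_1,\dots,f_n\in X^*,\ P\in\mathbb R[y_1,\dots,y_n]\,\bigr\},\\
P(f_1,\dots,f_n)\cdot Q(g_1,\dots,g_m)&=(PQ)(f_1,\dots,f_n,g_1,\dots,g_m)\in A
  \quad(\text{a polynomial in } n+m \text{ variables}),\\
c\in\mathbb R&\;\Longrightarrow\; c=P_c(f_1|_K) \text{ for the constant polynomial } P_c\equiv c,\\
x\neq y \text{ in } K&\;\Longrightarrow\;\exists f\in X^*:\ f(x)\neq f(y),\ \text{so } f|_K\in A \text{ separates } x,y.
\end{aligned}
```

Since X* separates points, X is Hausdorff, so K is compact Hausdorff and the real Stone–Weierstrass theorem gives density of A in C(K).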

Circularity Check

0 steps flagged

No circularity in direct density proof

The paper establishes density of Σ_X(Ψ) in C(K) via a direct existence argument under the explicit hypothesis that X* separates points on X. This separation ensures that the weak topology σ(X, X*) coincides with the original topology on compact K, enabling standard approximation either via the algebra generated by the functionals or via explicit constructions with the continuous squashing map Ψ. No step reduces by construction to a fitted parameter, a self-definition, or a load-bearing self-citation; the logical chain is self-contained relative to external benchmarks such as the Stone–Weierstrass theorem and basic facts about topological vector spaces.
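One standard way to turn density of the algebra into density of Σ_X(Ψ) is to reduce to the finite-dimensional universal approximation theorem for squashing functions (for example, Hornik, Stinchcombe and White, 1989). The sketch below follows that route; P, T, a_j and F_j are introduced here for illustration, and this is not necessarily the paper's own argument.

```latex
\begin{aligned}
&\text{Fix } g\in C(K),\ \varepsilon>0.\\
&\text{(1) Stone--Weierstrass: } \exists\, n,\ f_1,\dots,f_n\in X^*,\ P\in\mathbb R[y_1,\dots,y_n]:\quad
  \sup_{x\in K}\bigl|g(x)-P(f_1(x),\dots,f_n(x))\bigr|<\tfrac{\varepsilon}{2}.\\
&\text{(2) } T=(f_1,\dots,f_n):X\to\mathbb R^n \text{ is continuous, so } T(K)\subset\mathbb R^n \text{ is compact; the}\\
&\qquad\text{finite-dimensional theorem gives } \omega_j,b_j\in\mathbb R,\ a_j\in\mathbb R^n:\quad
  \sup_{y\in T(K)}\Bigl|P(y)-\sum_{j=1}^{N}\omega_j\Psi(a_j\cdot y+b_j)\Bigr|<\tfrac{\varepsilon}{2}.\\
&\text{(3) } F_j:=\sum_{i=1}^{n}a_{j,i}f_i\in X^* \text{ and } a_j\cdot T(x)=F_j(x), \text{ so }
  h(x)=\sum_{j=1}^{N}\omega_j\Psi(F_j(x)+b_j)\in\Sigma_X(\Psi),\\
&\qquad\sup_{x\in K}|g(x)-h(x)|<\varepsilon .
\end{aligned}
```

Step (3) is where linearity of X* matters: a finite linear combination of continuous functionals is again in X*, so the finite-dimensional network pulls back to an element of Σ_X(Ψ).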

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on two standard domain assumptions: that X* separates points on X and that Ψ is a continuous squashing function. No free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption: the continuous dual X* separates points on X.
Invoked in the abstract as the key hypothesis on the topological vector space X that enables the density.
domain assumption: Ψ is a continuous squashing function.
Stated as the activation function; the precise definition of squashing is assumed known from prior literature.
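For reference, the definition usually meant by a squashing function is the one of Hornik, Stinchcombe and White (1989), stated below; that the paper uses exactly this convention, with continuity added, is an assumption.

```latex
\Psi:\mathbb R\to[0,1] \text{ is a squashing function if } \Psi \text{ is nondecreasing},\quad
\lim_{t\to-\infty}\Psi(t)=0,\quad \lim_{t\to+\infty}\Psi(t)=1 .
\qquad\text{Examples: } \Psi(t)=\frac{1}{1+e^{-t}},\quad \Psi(t)=\min\{\max\{t,0\},1\}.
```

Both examples are continuous, so they fall under the hypothesis of the main theorem.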

pith-pipeline@v0.9.0 · 5682 in / 1246 out tokens · 56350 ms · 2026-05-22T01:46:56.523049+00:00

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "We prove density results for neural-network classes on compact sets K⊂X, where X is a topological vector space whose continuous dual X* separates points. ... Σ_X(Ψ) is dense in C(K) with respect to the uniform norm."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "the algebra generated by the restrictions of elements of X* to K ... By Theorem 2.4, A(X*) is dense in C(K)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Author affiliations: Mohammad Javad Baghbanbashi, Department of Mathematics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran (email: baghban.mj@iasbs.ac.ir); Arash Ghorbanalizadeh, Department of Mathematics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-667...