Compositional Approximation Can Strictly Outperform Superpositional Approximation

Dennis Elbr\"achter; Philipp Petersen

arxiv: 2606.08727 · v1 · pith:R7BBCENHnew · submitted 2026-06-07 · 🧮 math.NA · cs.LG· cs.NA

Compositional Approximation Can Strictly Outperform Superpositional Approximation

Dennis Elbr\"achter , Philipp Petersen This is my paper

Pith reviewed 2026-06-27 17:47 UTC · model grok-4.3

classification 🧮 math.NA cs.LGcs.NA

keywords approximation theorysuperpositional approximationcompositional approximationneural networksfunction classesapproximation ratesbit string encodinguniform approximation error

0 comments

The pith

Certain function classes allow compositional approximations to achieve arbitrarily better rates than superpositional ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs explicit examples of function classes where superpositional approximation methods, which build approximants as linear combinations from a dictionary, achieve strictly worse rates than compositional methods such as those used in neural networks. The gap between these rates can be made arbitrarily large even when all methods are restricted to parameter encodings whose bit length grows proportionally with the number of parameters. This challenges the common view that superpositional methods are optimal for many classically studied function classes under such encoding constraints. A sympathetic reader would care because it identifies cases where the compositional structure itself is essential for reaching the best possible approximation rates.

Core claim

Many classically studied function classes are known to be approximated optimally by superpositional methods, but the authors construct explicit function classes exhibiting structural properties that limit superpositional approximation rates to be strictly lower than compositional approximation rates, with an arbitrarily large gap between them.

What carries the argument

The structural properties of specific function classes that constrain the uniform approximation error decay achievable by linear combinations from a dictionary while permitting faster decay under compositional constructions, all subject to proportional bit-string parameter encoding.

If this is right

Superpositional methods cannot achieve the highest-order polynomial decay rates for these function classes even when parameters satisfy the proportional bit-length condition.
Compositional constructions become necessary to reach the best possible uniform approximation rates for the identified classes.
The optimality results known for superpositional methods on many classical function classes do not extend to every function class meeting the encoding constraint.
Neural-network-style compositional approximants can deliver strictly superior performance on functions with the required structural properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The result points toward function classes with limited smoothness or hierarchical structure as candidates where the gap appears.
Similar gaps might exist between other approximation paradigms once the bit-length constraint is imposed.
The explicit constructions could be used to test whether particular neural network architectures saturate the compositional rate for these classes.

Load-bearing premise

There exist function classes whose structural properties force superpositional methods to have strictly lower approximation rates than compositional methods under the bit-length constraint.

What would settle it

A demonstration that the explicit examples constructed in the paper actually admit superpositional approximations whose error decays at the same polynomial rate as compositional ones would falsify the central claim.

read the original abstract

Many classically studied function classes are known to be approximated optimally by superpositional methods, i.e. with approximants constructed as the linear combination of elements in some dictionary. Here optimality means that the uniform approximation error viewed as a function of the number of parameters used has polynomial decay of the highest order achievable by any parametrized method whose parameters can be encoded as a bit string of length proportional, up to logarithmic factors, to the number of parameters. While compositional methods like neural networks are structurally different, their approximation rates can be made comparable by imposing constraints that ensure such a proportional bit string encoding. In this work we study function classes exhibiting structural properties that limit superpositional approximation rates to be strictly lower than compositional approximation rates. In particular, we construct explicit examples for which there is an arbitrarily large gap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper constructs explicit function classes where compositional approximation rates can be made arbitrarily better than superpositional ones under bit-string encoding.

read the letter

The punchline is that this paper gives explicit examples of function classes where compositional approximation strictly outperforms superpositional approximation, with the performance gap made arbitrarily large.

The new part is the construction of these classes. Earlier results showed superpositional methods are optimal for many standard classes, but here the authors find classes whose structure forces superpositional rates to be worse while compositional ones do better. They deliver concrete instances rather than abstract existence arguments.

The work is done well in that it ties the separation directly to structural properties and then provides the examples. The approach seems straightforward and targets the question of when composition helps in approximation.

A minor soft spot is that the result depends on the precise definition of the parameter encoding as bit strings and on the specific function classes chosen to exhibit the gap. These choices are necessary for the separation, but they could make the examples feel tailored. If the constructions hold up under scrutiny, this is not a problem, but referees will want to see the details of the rate calculations.

The paper is for approximation theorists and people analyzing neural network approximation capabilities. A reader interested in strict separations between different approximation paradigms will find it relevant.

I recommend sending this to peer review. The claim is specific enough to be checked, and the topic is worth the referee time.

Referee Report

0 major / 1 minor

Summary. The paper claims that certain function classes with specific structural properties admit compositional approximation rates that are strictly superior to those achievable by superpositional methods (linear combinations from a dictionary under proportional bit-string parameter encoding), and constructs explicit examples where this performance gap can be made arbitrarily large.

Significance. If the explicit constructions and accompanying rate calculations are correct and free of encoding loopholes, the result would supply concrete counterexamples showing that superpositional optimality does not hold for all function classes and would strengthen the case for compositional structures in approximation theory. The emphasis on explicit, non-asymptotic examples is a methodological strength.

minor comments (1)

The abstract refers to 'bit string of length proportional, up to logarithmic factors, to the number of parameters'; a precise definition of the encoding model and the function classes should appear early in the introduction or preliminaries to allow immediate verification of the claimed gap.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their summary and for highlighting the potential significance of explicit constructions in this area. We address the referee's summary below and confirm that the manuscript contains no encoding loopholes.

read point-by-point responses

Referee: The paper claims that certain function classes with specific structural properties admit compositional approximation rates that are strictly superior to those achievable by superpositional methods (linear combinations from a dictionary under proportional bit-string parameter encoding), and constructs explicit examples where this performance gap can be made arbitrarily large.

Authors: The manuscript constructs explicit function classes with structural properties that enforce a strict separation: superpositional approximants (linear combinations from a dictionary with proportional bit-string encoding of parameters) are limited to lower rates, while compositional methods achieve strictly superior rates, with the gap made arbitrarily large by choice of parameters in the construction. All rate calculations are non-asymptotic and explicit, with the bit-string encoding constraint enforced uniformly to preclude loopholes. revision: no

Circularity Check

0 steps flagged

No significant circularity; direct constructions of explicit examples

full rationale

The paper's argument rests on constructing explicit function classes with structural properties that enforce strictly inferior superpositional approximation rates (under bit-string encoding) compared to compositional ones, with the gap made arbitrarily large. This is a direct existence proof via examples rather than any derivation chain, parameter fitting, or self-citation that reduces the result to its own inputs. No load-bearing steps match the enumerated circularity patterns; the claim is self-contained against external benchmarks in approximation theory.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no details on parameters, axioms, or entities are available.

pith-pipeline@v0.9.1-grok · 5661 in / 868 out tokens · 26568 ms · 2026-06-27T17:47:52.818693+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

10 extracted references · 5 canonical work pages

[1]

Kainen and Vĕra Kůrková , abstract =

Paul C. Kainen and Vĕra Kůrková , abstract =. Quasiorthogonal dimension of euclidean spaces , journal =. 1993 , issn =. doi:https://doi.org/10.1016/0893-9659(93)90023-G , url =

work page doi:10.1016/0893-9659(93)90023-g 1993
[2]

The History of Degenerate (Bipartite) Extremal Graph Problems

F \"u redi, Zolt \'a n and Simonovits, Mikl \'o s. The History of Degenerate (Bipartite) Extremal Graph Problems. Erd o s Centennial. 2013. doi:10.1007/978-3-642-39286-3_7

work page doi:10.1007/978-3-642-39286-3_7 2013
[3]

Deep Neural Network Approximation Theory , year=

Elbrächter, Dennis and Perekrestenko, Dmytro and Grohs, Philipp and Bölcskei, Helmut , journal=. Deep Neural Network Approximation Theory , year=
[4]

High-Dimensional Probability: An Introduction with Applications in Data Science , publisher=

Vershynin, Roman , year=. High-Dimensional Probability: An Introduction with Applications in Data Science , publisher=
[5]

DeVore, Ronald A. , year=. Nonlinear approximation , volume=. doi:10.1017/S0962492900002816 , journal=

work page doi:10.1017/s0962492900002816
[6]

and Vetterli, M

Donoho, D.L. and Vetterli, M. and DeVore, R.A. and Daubechies, I. , journal=. Data compression and harmonic analysis , year=
[7]

DeVore, B

DeVore, Ronald and Hanin, Boris and Petrova, Guergana , year=. Neural network approximation , volume=. doi:10.1017/S0962492921000052 , journal=

work page doi:10.1017/s0962492921000052
[8]

Error bounds for approximations with deep

Yarotsky, Dmitry , Date-Added =. Error bounds for approximations with deep. Neural Networks , Volume =
[9]

How degenerate is the parametrization of neural networks with the ReLU activation function? , url =

Elbr\". How degenerate is the parametrization of neural networks with the ReLU activation function? , url =. Advances in Neural Information Processing Systems , pages =
[10]

Topological properties of the set of functions generated by neural networks of fixed size

Petersen, \ Philipp Christian\ and Mones Raslan and Felix Voigtlaender. Topological properties of the set of functions generated by neural networks of fixed size. Foundations of Computational Mathematics. 2021. doi:10.1007/s10208-020-09461-0

work page doi:10.1007/s10208-020-09461-0 2021

[1] [1]

Kainen and Vĕra Kůrková , abstract =

Paul C. Kainen and Vĕra Kůrková , abstract =. Quasiorthogonal dimension of euclidean spaces , journal =. 1993 , issn =. doi:https://doi.org/10.1016/0893-9659(93)90023-G , url =

work page doi:10.1016/0893-9659(93)90023-g 1993

[2] [2]

The History of Degenerate (Bipartite) Extremal Graph Problems

F \"u redi, Zolt \'a n and Simonovits, Mikl \'o s. The History of Degenerate (Bipartite) Extremal Graph Problems. Erd o s Centennial. 2013. doi:10.1007/978-3-642-39286-3_7

work page doi:10.1007/978-3-642-39286-3_7 2013

[3] [3]

Deep Neural Network Approximation Theory , year=

Elbrächter, Dennis and Perekrestenko, Dmytro and Grohs, Philipp and Bölcskei, Helmut , journal=. Deep Neural Network Approximation Theory , year=

[4] [4]

High-Dimensional Probability: An Introduction with Applications in Data Science , publisher=

Vershynin, Roman , year=. High-Dimensional Probability: An Introduction with Applications in Data Science , publisher=

[5] [5]

DeVore, Ronald A. , year=. Nonlinear approximation , volume=. doi:10.1017/S0962492900002816 , journal=

work page doi:10.1017/s0962492900002816

[6] [6]

and Vetterli, M

Donoho, D.L. and Vetterli, M. and DeVore, R.A. and Daubechies, I. , journal=. Data compression and harmonic analysis , year=

[7] [7]

DeVore, B

DeVore, Ronald and Hanin, Boris and Petrova, Guergana , year=. Neural network approximation , volume=. doi:10.1017/S0962492921000052 , journal=

work page doi:10.1017/s0962492921000052

[8] [8]

Error bounds for approximations with deep

Yarotsky, Dmitry , Date-Added =. Error bounds for approximations with deep. Neural Networks , Volume =

[9] [9]

How degenerate is the parametrization of neural networks with the ReLU activation function? , url =

Elbr\". How degenerate is the parametrization of neural networks with the ReLU activation function? , url =. Advances in Neural Information Processing Systems , pages =

[10] [10]

Topological properties of the set of functions generated by neural networks of fixed size

Petersen, \ Philipp Christian\ and Mones Raslan and Felix Voigtlaender. Topological properties of the set of functions generated by neural networks of fixed size. Foundations of Computational Mathematics. 2021. doi:10.1007/s10208-020-09461-0

work page doi:10.1007/s10208-020-09461-0 2021