Quantitative Approximation Rates for Group Equivariant Learning
Pith reviewed 2026-05-15 20:06 UTC · model grok-4.3
The pith
Equivariant neural networks achieve the same quantitative approximation rates as standard ReLU networks for symmetric Hölder functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Equally-sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions. Thus, hard-coding equivariance does not result in a loss of expressivity or approximation power in these models for α-Hölder continuous targets that obey the group symmetries.
What carries the argument
Bi-Lipschitz invariant or equivariant representations of the group action that reduce the equivariant approximation task to a standard MLP approximation task without incurring additional cost.
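A small illustration of what a bi-Lipschitz invariant representation looks like, under assumptions not taken from the paper: for n scalars acted on by permutations, sorting is a well-known invariant map that preserves the distance between orbits exactly (bi-Lipschitz with both constants equal to 1). The names `sort_embedding` and `quotient_distance` are hypothetical, and the brute-force check below is only a sketch for small n.

```python
import itertools
import math
import random


def sort_embedding(x):
    """Permutation-invariant feature map: the coordinates in ascending order."""
    return sorted(x)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def quotient_distance(x, y):
    """Distance between the permutation orbits of x and y (brute force)."""
    return min(euclidean(x, p) for p in itertools.permutations(y))


random.seed(0)
n = 4  # kept small: the brute-force search visits n! permutations
max_gap = 0.0
for _ in range(200):
    x = [random.uniform(0, 1) for _ in range(n)]
    y = [random.uniform(0, 1) for _ in range(n)]

    # Invariance: reordering the input does not change the embedding.
    shuffled = random.sample(x, len(x))
    assert sort_embedding(shuffled) == sort_embedding(x)

    # Bi-Lipschitz check: embedded distance versus orbit distance.
    embedded = euclidean(sort_embedding(x), sort_embedding(y))
    max_gap = max(max_gap, abs(embedded - quotient_distance(x, y)))

print("largest |embedded distance - orbit distance|:", max_gap)
```

For point sets in higher dimensions or for rigid motions, no such simple map is available, which is where the bi-Lipschitz constants of the chosen representation start to matter.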
If this is right
- Permutation-invariant Deep Sets achieve the same approximation rates as standard MLPs for invariant functions.
- Permutation-equivariant Sumformer and Transformer architectures match MLP rates for equivariant targets.
- Frame-averaging networks preserve the standard MLP approximation rates for functions invariant under both permutations and rigid motions.
- General bi-Lipschitz invariant models exhibit no degradation in approximation speed from the invariance constraint.
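To make the first item concrete, here is a minimal sketch of the Deep Sets pattern (a per-point encoder, sum pooling, then a readout) in plain Python. The layer sizes, the random weights, and the helper names `random_layer` and `deep_sets` are illustrative assumptions; the readout is a single linear layer rather than a full MLP, and nothing here reproduces the paper's construction or its rates. It only shows why the output cannot depend on the order of the points.

```python
import random


def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def linear(weights, bias, v):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Random (weights, bias) pair, used here only as a stand-in for trained parameters."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, bias


def deep_sets(points, phi, rho):
    """Encode each point with phi, sum the features, then read out with rho."""
    features = [relu(linear(*phi, p)) for p in points]
    pooled = [sum(col) for col in zip(*features)]  # sum pooling over the set
    return linear(*rho, pooled)[0]


rng = random.Random(0)
phi = random_layer(rng, 8, 3)  # per-point encoder, R^3 -> R^8
rho = random_layer(rng, 1, 8)  # readout on the pooled feature, R^8 -> R
points = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(5)]
permuted = points[::-1]        # the same set in a different order

out_original = deep_sets(points, phi, rho)
out_permuted = deep_sets(permuted, phi, rho)
print(abs(out_original - out_permuted))  # zero up to floating-point rounding
```

Sum pooling is the only step that sees all points at once, and a sum is order-independent, so invariance holds for any choice of encoder and readout weights.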
Where Pith is reading between the lines
- Practitioners can enforce known symmetries via equivariant architectures to improve generalization while retaining the same approximation efficiency as unconstrained networks.
- The same reduction strategy may yield quantitative rates for additional continuous groups whenever suitable bi-Lipschitz representations exist.
- Empirical checks on high-dimensional symmetric data sets would indicate whether the derived rates are sharp in practice.
Load-bearing premise
The target functions are α-Hölder continuous and the group actions admit bi-Lipschitz invariant or equivariant representations that the architectures can exploit without additional approximation cost.
What would settle it
A concrete α-Hölder symmetric function for which the minimal network width or depth needed to reach a fixed approximation error is asymptotically larger for every equivariant architecture than for an ordinary ReLU MLP.
Original abstract
The universal approximation theorem establishes that neural networks can approximate any continuous function on a compact set. Later works in approximation theory provide quantitative approximation rates for ReLU networks on the class of $\alpha$-H\"older functions $f: [0,1]^N \to \mathbb{R}$. The goal of this paper is to provide similar quantitative approximation results in the context of group equivariant learning, where the learned $\alpha$-H\"older function is known to obey certain group symmetries. While there has been much interest in the literature in understanding the universal approximation properties of equivariant models, very few quantitative approximation results are known for equivariant models. In this paper, we bridge this gap by deriving quantitative approximation rates for several prominent group-equivariant and invariant architectures. The architectures that we consider include: the permutation-invariant Deep Sets architecture; the permutation-equivariant Sumformer and Transformer architectures; joint invariance to permutations and rigid motions using invariant networks based on frame averaging; and general bi-Lipschitz invariant models. Overall, we show that equally-sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions. Thus, hard-coding equivariance does not result in a loss of expressivity or approximation power in these models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives quantitative approximation rates for α-Hölder continuous functions that are equivariant or invariant under group actions, for architectures including Deep Sets, Sumformer, Transformers, frame-averaging networks for permutations and rigid motions, and general bi-Lipschitz invariant models. It concludes that these equivariant models achieve the same approximation rates as standard ReLU MLPs of equal size when approximating equivariant targets.
Significance. If the derivations hold with uniform constants independent of the group discretization, this would establish that hard-coding equivariance incurs no approximation penalty, providing a strong theoretical basis for equivariant architectures in symmetric domains. The paper addresses a notable gap in quantitative results for equivariant models.
major comments (2)
- [frame averaging and bi-Lipschitz models] The central claim that equivariant architectures match MLP rates requires the invariant/equivariant feature map φ (e.g., frame averaging) to satisfy a bi-Lipschitz bound with constant independent of discretization parameters such as the number of frames. For rigid-motion groups the orbit diameter and frame count can inflate this constant, multiplying the Hölder prefactor and breaking uniformity with the MLP baseline whose rate depends only on α and input dimension.
- The reduction from standard Hölder approximation theorems to the equivariant setting is asserted but not accompanied by explicit constants or the full chain of inequalities; without these it is impossible to confirm that no group-dependent factors are absorbed into the O(·) notation.
minor comments (1)
- Notation for the group actions and the precise definition of the bi-Lipschitz constants should be stated once at the beginning of the technical sections to avoid ambiguity when comparing rates across architectures.
Simulated Author's Rebuttal
We thank the referee for the careful review and for highlighting the need to ensure uniformity of constants in our quantitative approximation results. We address each major comment below and will incorporate clarifications in the revised manuscript.
Point-by-point responses
- Referee: [frame averaging and bi-Lipschitz models] The central claim that equivariant architectures match MLP rates requires the invariant/equivariant feature map φ (e.g., frame averaging) to satisfy a bi-Lipschitz bound with constant independent of discretization parameters such as the number of frames. For rigid-motion groups the orbit diameter and frame count can inflate this constant, multiplying the Hölder prefactor and breaking uniformity with the MLP baseline whose rate depends only on α and input dimension.
  Authors: We agree that the bi-Lipschitz constant for frame-averaging maps must be tracked carefully. In the manuscript, for any fixed discretization of the group (including a finite number of frames for rigid motions), the bi-Lipschitz constant of φ is a fixed number that depends only on the group action and the chosen discretization, not on network width, depth, or the Hölder exponent α. The approximation rate is therefore identical to that of a standard ReLU MLP in its scaling with network size; the group-dependent factor is absorbed into the overall prefactor, which is permitted to depend on all fixed problem parameters (including the group and its discretization) just as the MLP constant depends on input dimension. We will revise the relevant sections to state this dependence explicitly and to note that the rates remain uniform with respect to network parameters for any fixed group discretization.
  Revision: partial
- Referee: The reduction from standard Hölder approximation theorems to the equivariant setting is asserted but not accompanied by explicit constants or the full chain of inequalities; without these it is impossible to confirm that no group-dependent factors are absorbed into the O(·) notation.
  Authors: We acknowledge that the proof sketch in the current version does not display the complete chain of inequalities with all constants written out. In the revision we will expand the argument to include the full sequence: (i) the bi-Lipschitz property of the invariant/equivariant feature map φ with explicit constant L_φ, (ii) the composition with the standard Hölder approximation theorem applied to the pulled-back function, and (iii) the final bound showing that the network-size dependence is exactly the same as the MLP case while all group-dependent factors appear only in the multiplicative constant outside the O(·) term. This will make clear that no hidden group-dependent factors enter the asymptotic rate.
  Revision: yes
Circularity Check
No circularity: rates follow from standard approximation theory applied to equivariant maps
Full rationale
The derivation applies classical quantitative ReLU approximation bounds (for α-Hölder functions) to the listed equivariant architectures by composing them with bi-Lipschitz invariant/equivariant feature maps. No step redefines a quantity in terms of itself, renames a fitted parameter as a prediction, or reduces the central rate comparison to a self-citation chain. The bi-Lipschitz assumption is stated explicitly as an external hypothesis on the group actions and is not derived from the networks under study. All error bounds are obtained by standard triangle-inequality chaining of the MLP rate with the Lipschitz constant of the feature map; the resulting constants may depend on the group but are not forced by any internal fit or redefinition within the paper.
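The composition step described above can be written out for the invariant case. The following is a sketch under assumptions that are not taken from the paper: G is a compact group acting by isometries on the domain V, the feature map \varphi is G-invariant, f is G-invariant and α-Hölder with constant H, and the symbols c, C, d_G, \tilde f and \mathcal{N} are notation introduced here for illustration. It shows where a group-dependent constant can enter, not how the paper's proofs actually proceed.

```latex
% Assumed bi-Lipschitz property of \varphi with respect to the orbit distance d_G.
\[
  c\, d_G(x,y) \;\le\; \|\varphi(x)-\varphi(y)\| \;\le\; C\, d_G(x,y),
  \qquad
  d_G(x,y) \;=\; \min_{g \in G} \|x - g\cdot y\| .
\]
% Invariance of f lets the Hölder bound be taken along the closest orbit point.
\[
  |f(x)-f(y)| \;=\; \min_{g \in G} |f(x)-f(g\cdot y)| \;\le\; H\, d_G(x,y)^{\alpha} .
\]
% Hence \tilde f(\varphi(x)) := f(x) is well defined on \varphi(V) and α-Hölder there.
\[
  |\tilde f(u)-\tilde f(v)| \;\le\; H\, c^{-\alpha}\, \|u-v\|^{\alpha},
  \qquad u, v \in \varphi(V) .
\]
% For any network \mathcal{N}, the invariant error equals an ordinary approximation error.
\[
  \sup_{x \in V} \bigl| f(x) - \mathcal{N}(\varphi(x)) \bigr|
  \;=\;
  \sup_{u \in \varphi(V)} \bigl| \tilde f(u) - \mathcal{N}(u) \bigr| .
\]
```

Read this way, the lower constant c rescales the Hölder constant by c^{-\alpha} and the upper constant C bounds the diameter of \varphi(V); both affect the prefactor only, while the dependence on network size comes from whichever MLP bound is applied to \tilde f.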
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Target functions belong to the α-Hölder class on a compact domain
- domain assumption: Group actions are bi-Lipschitz or admit suitable invariant/equivariant frames
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We show that equally-sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions... quantitative approximation rates for Deep Sets, Sumformer, Transformer, frame averaging, and general bi-Lipschitz invariant models (Abstract, Table 1).
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Approximation rates of ∼(1/ϵ)^{N_G/2α} on quotient space V/G with covering number bounds (Proposition 1, Corollary 3).
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.