arXiv: 2602.20370 · v2 · submitted 2026-02-23 · 💻 cs.LG · cs.NA · math.NA

Quantitative Approximation Rates for Group Equivariant Learning

Pith reviewed 2026-05-15 20:06 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA
keywords group equivariant learning · quantitative approximation rates · ReLU networks · universal approximation · permutation equivariance · frame averaging · Deep Sets · Transformers

The pith

Equivariant neural networks achieve the same quantitative approximation rates as standard ReLU networks for symmetric Hölder functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives quantitative approximation rates for several group-equivariant and invariant neural network architectures on the class of α-Hölder continuous functions. It establishes that these rates match those obtained by equally sized ordinary ReLU multilayer perceptrons when the target functions respect the relevant group symmetries. The architectures examined include permutation-invariant Deep Sets, permutation-equivariant Sumformers and Transformers, frame-averaging networks for joint permutation and rigid-motion invariance, and general bi-Lipschitz invariant models. The central result is that embedding the symmetry constraint directly into the model does not reduce expressivity or slow the rate of approximation relative to unconstrained networks.

Core claim

Equally-sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions. Thus, hard-coding equivariance does not result in a loss of expressivity or approximation power in these models for α-Hölder continuous targets that obey the group symmetries.

What carries the argument

Bi-Lipschitz invariant or equivariant representations of the group action that reduce the equivariant approximation task to a standard MLP approximation task without incurring additional cost.
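As a concrete illustration of what a bi-Lipschitz invariant representation looks like, the sketch below uses the simplest case known to the editor: sorting a vector of scalars, which is permutation invariant and preserves the orbit distance exactly. This is an illustrative example chosen here, not a construction taken from the paper; the names `sort_embedding` and `quotient_distance` are hypothetical.

```python
import itertools
import numpy as np

def sort_embedding(x):
    # Permutation-invariant map: any reordering of x gives the same output.
    return np.sort(x)

def quotient_distance(x, y):
    # Orbit distance: smallest Euclidean distance between x and any
    # permutation of y (brute force, so only suitable for tiny n).
    return min(
        np.linalg.norm(x - y[list(p)])
        for p in itertools.permutations(range(len(y)))
    )

rng = np.random.default_rng(0)
x = rng.uniform(size=5)
y = rng.uniform(size=5)
perm = rng.permutation(5)

# Invariance: permuting the input does not change the embedding.
invariant = np.allclose(sort_embedding(x), sort_embedding(x[perm]))

# Distance preservation: for scalar entries, sorting is an isometry with
# respect to the orbit distance (a consequence of the rearrangement inequality),
# so both bi-Lipschitz constants equal 1 in this special case.
d_embed = np.linalg.norm(sort_embedding(x) - sort_embedding(y))
d_quot = quotient_distance(x, y)
print(invariant, d_embed, d_quot)
```

Once such a map is available, an invariant target can be written as an ordinary function of the embedded input, which is the kind of reduction to a standard MLP problem described above.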

If this is right

  • Permutation-invariant Deep Sets achieve the same approximation rates as standard MLPs for invariant functions.
  • Permutation-equivariant Sumformer and Transformer architectures match MLP rates for equivariant targets.
  • Frame-averaging networks preserve the standard MLP approximation rates for functions invariant under both permutations and rigid motions.
  • General bi-Lipschitz invariant models exhibit no degradation in approximation speed from the invariance constraint.
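To make the first item concrete, here is a minimal NumPy sketch of a Deep Sets style model: a shared per-element network, sum pooling, and a readout network. The layer sizes, random weights, and the function name `deep_sets` are assumptions made for illustration; the sketch demonstrates permutation invariance only and says nothing about approximation rates.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deep_sets(X, params):
    # X has shape (n, d): a set of n elements, each in R^d.
    W1, b1, W2, b2, w3, b3 = params
    H = relu(X @ W1 + b1)            # per-element features, shape (n, h)
    pooled = H.sum(axis=0)           # sum pooling removes the element order
    hidden = relu(pooled @ W2 + b2)  # readout network on the pooled vector
    return float(hidden @ w3 + b3)

rng = np.random.default_rng(0)
n, d, h = 7, 3, 8
params = (
    rng.normal(size=(d, h)), rng.normal(size=h),
    rng.normal(size=(h, h)), rng.normal(size=h),
    rng.normal(size=h), rng.normal(),
)
X = rng.normal(size=(n, d))
perm = rng.permutation(n)

out = deep_sets(X, params)
out_perm = deep_sets(X[perm], params)  # same set, rows reordered
print(out, out_perm)
```

The two outputs agree up to floating-point rounding, because the sum over elements does not depend on their order.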

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners can enforce known symmetries via equivariant architectures to improve generalization while retaining the same approximation efficiency as unconstrained networks.
  • The same reduction strategy may yield quantitative rates for additional continuous groups whenever suitable bi-Lipschitz representations exist.
  • Empirical checks on high-dimensional symmetric data sets would indicate whether the derived rates are sharp in practice.

Load-bearing premise

The target functions are α-Hölder continuous and the group actions admit bi-Lipschitz invariant or equivariant representations that the architectures can exploit without additional approximation cost.
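For reference, the two hypotheses in this premise can be written out as follows. These are the standard textbook definitions; the symbols $L$, $c$, $C$, $d_G$, $K$, and $\varphi$ are notation chosen here for illustration and may differ from the paper's.

```latex
% alpha-Holder continuity on a compact set K, exponent 0 < \alpha \le 1, constant L
\[
  |f(x) - f(y)| \le L \, \|x - y\|^{\alpha}
  \quad \text{for all } x, y \in K .
\]
% Bi-Lipschitz G-invariant map \varphi, measured against the orbit distance d_G
\[
  d_G(x, y) = \inf_{g \in G} \|x - g \cdot y\| , \qquad
  c \, d_G(x, y) \le \|\varphi(x) - \varphi(y)\| \le C \, d_G(x, y) ,
  \quad 0 < c \le C < \infty .
\]
```

Here $d_G$ is the usual distance between orbits, which is well behaved when $G$ acts by isometries; that is an assumption of this sketch.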

What would settle it

A concrete α-Hölder symmetric function for which the minimal network width or depth needed to reach a fixed approximation error is asymptotically larger for every equivariant architecture than for an ordinary ReLU MLP.

Original abstract

The universal approximation theorem establishes that neural networks can approximate any continuous function on a compact set. Later works in approximation theory provide quantitative approximation rates for ReLU networks on the class of $\alpha$-H\"older functions $f: [0,1]^N \to \mathbb{R}$. The goal of this paper is to provide similar quantitative approximation results in the context of group equivariant learning, where the learned $\alpha$-H\"older function is known to obey certain group symmetries. While there has been much interest in the literature in understanding the universal approximation properties of equivariant models, very few quantitative approximation results are known for equivariant models. In this paper, we bridge this gap by deriving quantitative approximation rates for several prominent group-equivariant and invariant architectures. The architectures that we consider include: the permutation-invariant Deep Sets architecture; the permutation-equivariant Sumformer and Transformer architectures; joint invariance to permutations and rigid motions using invariant networks based on frame averaging; and general bi-Lipschitz invariant models. Overall, we show that equally-sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions. Thus, hard-coding equivariance does not result in a loss of expressivity or approximation power in these models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript derives quantitative approximation rates for α-Hölder continuous functions that are equivariant or invariant under group actions, for architectures including Deep Sets, Sumformer, Transformers, frame-averaging networks for permutations and rigid motions, and general bi-Lipschitz invariant models. It concludes that these equivariant models achieve the same approximation rates as standard ReLU MLPs of equal size when approximating equivariant targets.

Significance. If the derivations hold with uniform constants independent of the group discretization, this would establish that hard-coding equivariance incurs no approximation penalty, providing a strong theoretical basis for equivariant architectures in symmetric domains. The paper addresses a notable gap in quantitative results for equivariant models.

major comments (2)
  1. [frame averaging and bi-Lipschitz models] The central claim that equivariant architectures match MLP rates requires the invariant/equivariant feature map φ (e.g., frame averaging) to satisfy a bi-Lipschitz bound with constant independent of discretization parameters such as the number of frames. For rigid-motion groups the orbit diameter and frame count can inflate this constant, multiplying the Hölder prefactor and breaking uniformity with the MLP baseline whose rate depends only on α and input dimension.
  2. The reduction from standard Hölder approximation theorems to the equivariant setting is asserted but not accompanied by explicit constants or the full chain of inequalities; without these it is impossible to confirm that no group-dependent factors are absorbed into the O(·) notation.
minor comments (1)
  1. Notation for the group actions and the precise definition of the bi-Lipschitz constants should be stated once at the beginning of the technical sections to avoid ambiguity when comparing rates across architectures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for highlighting the need to ensure uniformity of constants in our quantitative approximation results. We address each major comment below and will incorporate clarifications in the revised manuscript.

Point-by-point responses
  1. Referee: [frame averaging and bi-Lipschitz models] The central claim that equivariant architectures match MLP rates requires the invariant/equivariant feature map φ (e.g., frame averaging) to satisfy a bi-Lipschitz bound with constant independent of discretization parameters such as the number of frames. For rigid-motion groups the orbit diameter and frame count can inflate this constant, multiplying the Hölder prefactor and breaking uniformity with the MLP baseline whose rate depends only on α and input dimension.

    Authors: We agree that the bi-Lipschitz constant for frame-averaging maps must be tracked carefully. In the manuscript, for any fixed discretization of the group (including a finite number of frames for rigid motions), the bi-Lipschitz constant of φ is a fixed number that depends only on the group action and the chosen discretization, not on network width, depth, or the Hölder exponent α. The approximation rate is therefore identical to that of a standard ReLU MLP in its scaling with network size; the group-dependent factor is absorbed into the overall prefactor, which is permitted to depend on all fixed problem parameters (including the group and its discretization) just as the MLP constant depends on input dimension. We will revise the relevant sections to state this dependence explicitly and to note that the rates remain uniform with respect to network parameters for any fixed group discretization. revision: partial
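The idea of averaging a backbone over a fixed finite frame can be illustrated with a toy case. The sketch below uses cyclic shifts of a vector and takes the frame to be the whole group, which is plain group averaging, the simplest special case of frame averaging. It is not the paper's construction for rigid motions; the group, the backbone, and all names (`backbone`, `cyclic_frame`, `frame_average`) are assumptions made for illustration.

```python
import numpy as np

def backbone(x, W, b, v):
    # An arbitrary one-hidden-layer ReLU MLP from R^n to R; not invariant by itself.
    return float(np.maximum(W @ x + b, 0.0) @ v)

def cyclic_frame(x):
    # Frame consisting of every cyclic shift of x (the full group Z_n).
    return [np.roll(x, k) for k in range(len(x))]

def frame_average(x, W, b, v):
    # Average the backbone over the frame; the result is shift invariant.
    return float(np.mean([backbone(gx, W, b, v) for gx in cyclic_frame(x)]))

rng = np.random.default_rng(0)
n, h = 6, 16
W = rng.normal(size=(h, n))
b = rng.normal(size=h)
v = rng.normal(size=h)

x = rng.normal(size=n)
shifted = np.roll(x, 2)  # a group element applied to x

fa_x = frame_average(x, W, b, v)
fa_shifted = frame_average(shifted, W, b, v)
print(fa_x, fa_shifted)
```

The averaged outputs agree up to rounding because `x` and `shifted` generate the same set of shifts. The cost of evaluation grows with the frame size, which is one reason the dependence on the number of frames is worth stating explicitly.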

  2. Referee: [—] The reduction from standard Hölder approximation theorems to the equivariant setting is asserted but not accompanied by explicit constants or the full chain of inequalities; without these it is impossible to confirm that no group-dependent factors are absorbed into the O(·) notation.

    Authors: We acknowledge that the proof sketch in the current version does not display the complete chain of inequalities with all constants written out. In the revision we will expand the argument to include the full sequence: (i) the bi-Lipschitz property of the invariant/equivariant feature map φ with explicit constant L_φ, (ii) the composition with the standard Hölder approximation theorem applied to the pulled-back function, and (iii) the final bound showing that the network-size dependence is exactly the same as the MLP case while all group-dependent factors appear only in the multiplicative constant outside the O(·) term. This will make clear that no hidden group-dependent factors enter the asymptotic rate. revision: yes
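One way the three steps listed in this response could fit together, for the invariant case, is sketched below. This is an illustrative reconstruction under stated assumptions, not the paper's proof. It assumes that $G$ acts by isometries, that $f$ is $G$-invariant and α-Hölder with constant $L$ on a compact set $K$, that $\varphi$ is $G$-invariant with lower bi-Lipschitz constant $c > 0$ relative to the orbit distance $d_G(x, y) = \inf_{g \in G} \|x - g \cdot y\|$, and that $\varepsilon$ is whatever sup-norm error a standard ReLU MLP result delivers for the pulled-back function. The symbols $L$, $c$, $d_G$, $h$, $\hat{h}$, and $\varepsilon$ are notation introduced here.

```latex
% (i) Invariance lets the Holder bound be taken over the whole orbit of y:
\[
  |f(x) - f(y)| = |f(x) - f(g \cdot y)| \le L \, \|x - g \cdot y\|^{\alpha}
  \ \ \text{for all } g \in G
  \quad \Longrightarrow \quad
  |f(x) - f(y)| \le L \, d_G(x, y)^{\alpha} .
\]
% (ii) Define h on \varphi(K) by h(\varphi(x)) = f(x). It is well defined because
%      \varphi(x) = \varphi(y) forces d_G(x, y) = 0, and it is alpha-Holder:
\[
  |h(\varphi(x)) - h(\varphi(y))| \le L \, d_G(x, y)^{\alpha}
  \le L \, c^{-\alpha} \, \|\varphi(x) - \varphi(y)\|^{\alpha} .
\]
% (iii) If an MLP \hat{h} satisfies \sup_{u \in \varphi(K)} |h(u) - \hat{h}(u)| \le \varepsilon, then
\[
  \sup_{x \in K} |f(x) - \hat{h}(\varphi(x))|
  = \sup_{x \in K} |h(\varphi(x)) - \hat{h}(\varphi(x))| \le \varepsilon .
\]
```

In this sketch the group enters through the Hölder constant $L \, c^{-\alpha}$ of $h$, while the exponent α is unchanged. Any rate that depends on input dimension would also depend on the dimension of the space that $\varphi$ maps into, so that quantity would need to be tracked as well.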

Circularity Check

0 steps flagged

No circularity: rates follow from standard approximation theory applied to equivariant maps

full rationale

The derivation applies classical quantitative ReLU approximation bounds (for α-Hölder functions) to the listed equivariant architectures by composing them with bi-Lipschitz invariant/equivariant feature maps. No step redefines a quantity in terms of itself, renames a fitted parameter as a prediction, or reduces the central rate comparison to a self-citation chain. The bi-Lipschitz assumption is stated explicitly as an external hypothesis on the group actions and is not derived from the networks under study. All error bounds are obtained by standard triangle-inequality chaining of the MLP rate with the Lipschitz constant of the feature map; the resulting constants may depend on the group but are not forced by any internal fit or redefinition within the paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from approximation theory plus domain assumptions about group actions.

axioms (2)
  • domain assumption Target functions belong to the α-Hölder class on a compact domain
    Invoked to obtain quantitative rates; standard in the field.
  • domain assumption Group actions are bi-Lipschitz or admit suitable invariant/equivariant frames
    Required for the frame-averaging and bi-Lipschitz invariant models to preserve approximation rates.


