Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power

Dacheng Tao; Fengxiang He; Tian Qin; Xinmei Tian; Yuzhu Chen

REVIEW · 2 major objections · 3 minor · 10 references

Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.

T0 review · grok-4.3

Enforcing equivariance in 2-layer ReLU networks can reduce expressive power. Enlarging the model restores it, with proven upper bounds on the required enlargement, and the enlarged networks have a lower-dimensional hypothesis space.

2026-05-21 17:42 UTC pith:4O4KALXY

load-bearing objection: The paper shows that equivariance cuts linear regions in 2-layer ReLU nets via restricted boundary hyperplanes, proves width bounds to recover capacity, and claims the fix lowers hypothesis dimension, but stays inside that single shallow case.

arXiv 2512.09673 v3 · pith:4O4KALXY · submitted 2025-12-10 · cs.LG cs.AI cs.NE stat.ML

Yuzhu Chen, Tian Qin, Xinmei Tian, Fengxiang He, Dacheng Tao

keywords: equivariant neural networks, expressive power, ReLU networks, hypothesis space dimensionality, model size compensation, symmetry constraints, generalizability

verification ladder: T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how requiring networks to respect data symmetries through equivariance affects their ability to represent varied functions. In two-layer ReLU networks, analysis of boundary hyperplanes and channel vectors shows that these constraints narrow the set of representable mappings. The authors prove that increasing network width compensates for this reduction and supply upper bounds on the necessary enlargement. They further establish that the resulting wider equivariant networks occupy a hypothesis space of lower dimensionality, which carries implications for improved generalization.

Core claim

Focusing on 2-layer ReLU networks, enforcing equivariance constraints undermines the expressive power as revealed through examination of boundary hyperplanes and channel vectors. This drawback can be compensated for by enlarging the model size, for which upper bounds on the required enlargement are proven. The enlarged neural architectures exhibit reduced hypothesis space dimensionality, implying even better generalizability.

What carries the argument

Boundary hyperplanes and channel vectors in 2-layer ReLU networks, used to demonstrate the restrictions that equivariance imposes on the functions a network can compute.

Load-bearing premise

The analysis assumes that studying boundary hyperplanes and channel vectors in 2-layer ReLU networks is enough to capture the effect of equivariance on expressive power and that the compensation bounds from this case apply more generally.

What would settle it

A count of the distinct linear regions or representable functions for equivariant versus non-equivariant 2-layer ReLU networks of matched and increased widths to check whether the predicted enlargement restores the original count.
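
A count of this kind can be approximated by sampling. The sketch below is a minimal illustration and not the paper's procedure: it counts the distinct activation patterns of the hidden layer on a grid of points, which gives a lower bound on the number of cells cut out by the boundary hyperplanes. The function names, the sample grid, and the second ("tied") weight set are assumptions made for illustration. The first weight set uses the hidden units of the function shown in Figure 2. The tied set is a made-up degenerate case in which two units share a boundary, and it is not the paper's equivariant construction.

```python
def activation_pattern(weights, biases, point):
    """Return which hidden ReLU units are active (pre-activation > 0) at `point`."""
    return tuple(
        sum(w * x for w, x in zip(row, point)) + b > 0
        for row, b in zip(weights, biases)
    )


def count_sampled_regions(weights, biases, points):
    """Count distinct activation patterns observed on `points`.

    Each pattern marks one cell of the hyperplane arrangement, so the result
    is a lower bound on the number of cells (sampling can miss small cells).
    """
    return len({activation_pattern(weights, biases, p) for p in points})


# Sample grid chosen so that no point lies on x = 0, y = 0, or y = x.
grid = [(i + 0.25, j + 0.5) for i in range(-5, 5) for j in range(-5, 5)]

# Hidden units of F(x, y) = relu(x) + relu(-y) + relu(-x + y) from Figure 2.
figure2_weights = [(1.0, 0.0), (0.0, -1.0), (-1.0, 1.0)]
figure2_biases = [0.0, 0.0, 0.0]

# Hypothetical degenerate case (an assumption, not from the paper):
# same width, but two units share the boundary y = x.
tied_weights = [(1.0, 0.0), (1.0, -1.0), (-1.0, 1.0)]
tied_biases = [0.0, 0.0, 0.0]

figure2_regions = count_sampled_regions(figure2_weights, figure2_biases, grid)
tied_regions = count_sampled_regions(tied_weights, tied_biases, grid)
print(figure2_regions, tied_regions)  # 6 4
```

Running the same counter on equivariant and unconstrained networks at matched and enlarged widths would be one way to probe the predicted compensation, although a sampled count only bounds the true number from below.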

If this is right

Equivariant networks require larger widths to match the expressive power of unconstrained networks.
Explicit upper bounds quantify the model enlargement needed to offset the loss from equivariance.
The compensated larger equivariant architectures operate with lower-dimensional hypothesis spaces.
Lower hypothesis space dimensionality supports stronger generalization performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The size-compensation result may guide layer-width choices when applying equivariance to new symmetry groups.
If the dimensionality reduction persists in deeper networks, it could explain observed generalization advantages of equivariant models in practice.
Designers might deliberately scale equivariant architectures beyond the minimum compensation point to exploit the dimensionality benefit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

The paper shows that equivariance cuts linear regions in 2-layer ReLU nets via restricted boundary hyperplanes, proves width bounds to recover capacity, and claims the fix lowers hypothesis dimension, but stays inside that single shallow case.

The main takeaway is straightforward. For two-layer ReLU networks, forcing equivariance shrinks the set of functions you can represent because the boundary hyperplanes and channel vectors become constrained. The authors give a constructive argument for this reduction and then prove explicit upper bounds on how much wider the hidden layer needs to be to match the original number of regions. They also show that the compensated equivariant network has strictly lower hypothesis-space dimension than the unconstrained version, which they link to improved generalization.

Referee Report

2 major / 3 minor

Summary. Focusing on 2-layer ReLU networks, the paper claims that enforcing equivariance constraints undermines expressive power, as shown constructively via analysis of boundary hyperplanes and channel vectors. This drawback can be compensated by enlarging model size, with explicit upper bounds proven on the required enlargement factor. The paper further shows that the resulting enlarged equivariant architectures have strictly lower hypothesis-space dimensionality than unconstrained counterparts, implying improved generalizability.

Significance. If the results hold, this work supplies a useful geometric perspective on the trade-off between symmetry constraints and expressivity, together with concrete compensation bounds and the unexpected finding of dimensionality reduction. The constructive demonstrations and the proofs of upper bounds on enlargement are explicit strengths that advance understanding of how equivariance shapes the geometry of linear regions in the 2-layer ReLU setting.

major comments (2)

[§4] The upper bound on the width-enlargement factor needed to recover the same number of linear regions is derived only inside the 2-layer ReLU case; the manuscript supplies neither an inductive argument nor a counter-example showing that the same factor (or a comparable one) suffices once depth, group representation, or activation choice changes.
[§5] The claim that the enlarged equivariant hypothesis space has strictly lower dimension than the unconstrained counterpart rests on the 2-layer geometry; without an explicit dimension formula or comparison that survives changes in depth, it is unclear whether the generalizability implication extends beyond the restricted setting.

minor comments (3)

[Introduction] Add a short paragraph contrasting the present geometric counting argument with prior expressivity results for equivariant networks (e.g., those based on group Fourier analysis).
[Figure 2] The caption should explicitly state the group action and input dimension used to generate the depicted hyperplanes.
[§2.1] The definition of channel vectors would be clearer if the precise mapping from weight matrix columns to hyperplane normals were written as an equation.
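
As a sketch of the kind of equation the last comment asks for, the standard relation between a hidden unit and its boundary hyperplane in a 2-layer ReLU network can be written as follows. The symbols $w_i$, $b_i$, $a_i$, and $m$ are assumed notation, not taken from the paper, and the paper's own definition of channel vectors may differ.

```latex
f(x) = \sum_{i=1}^{m} a_i \, \sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
H_i = \left\{ x : w_i^{\top} x + b_i = 0 \right\}
```

Here $H_i$ is the boundary hyperplane of hidden unit $i$ and $w_i$ is its normal. For the function in Figure 2, the normals $(1, 0)$, $(0, -1)$, and $(-1, 1)$ with zero bias give the boundaries $x = 0$, $y = 0$, and $y = x$.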

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and for highlighting the geometric perspective and constructive proofs as strengths of the work. Our manuscript is explicitly scoped to 2-layer ReLU networks, as stated in the abstract and throughout the text, to enable a detailed analysis of boundary hyperplanes and channel vectors. We address the major comments below by clarifying scope and committing to textual revisions that make this limitation more explicit.

Referee: [§4] The upper bound on the width-enlargement factor needed to recover the same number of linear regions is derived only inside the 2-layer ReLU case; the manuscript supplies neither an inductive argument nor a counter-example showing that the same factor (or a comparable one) suffices once depth, group representation, or activation choice changes.

Authors: We agree that the upper bound on the enlargement factor is proven only for the 2-layer ReLU case. The manuscript contains no inductive argument or counter-example for deeper networks, different group representations, or other activations. This is because the paper deliberately restricts attention to the 2-layer ReLU setting to obtain explicit geometric characterizations and concrete bounds. We will revise the introduction and conclusion to state this scope limitation more prominently and to list generalization to other depths and activations as an open direction for future work.

Revision: partial

Referee: [§5] The claim that the enlarged equivariant hypothesis space has strictly lower dimension than the unconstrained counterpart rests on the 2-layer geometry; without an explicit dimension formula or comparison that survives changes in depth, it is unclear whether the generalizability implication extends beyond the restricted setting.

Authors: The referee correctly observes that the strict dimensionality reduction and the associated generalizability claim are established using the 2-layer geometry. No dimension formula valid for arbitrary depth is provided. Within the 2-layer ReLU setting, however, the reduction is shown by comparing the number of free parameters after accounting for the equivariance constraints and the enlargement factor. We will add a clarifying sentence in §5 (and a corresponding remark in the conclusion) that explicitly limits the generalizability implication to the 2-layer case analyzed and identifies extension to deeper architectures as future work.

Revision: partial
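
The free-parameter comparison mentioned in the last response can be illustrated with a toy count. The sketch below is not the paper's dimension formula. It assumes the group acts by permutations on both the input and the hidden layer, ignores biases and the output layer, and uses the fact that a weight matrix commuting with such an action has one free entry per orbit of (hidden index, input index) pairs. The function name and the example groups are hypothetical.

```python
def equivariant_weight_dim(generators, n_in, n_hidden):
    """Count free entries of an n_hidden x n_in weight matrix W that satisfies
    W[p_hid[h]][p_in[i]] == W[h][i] for every generator (p_in, p_hid).

    The count equals the number of orbits of the generated permutation group
    acting on (hidden index, input index) pairs.
    """
    seen = set()
    orbits = 0
    for h in range(n_hidden):
        for i in range(n_in):
            if (h, i) in seen:
                continue
            orbits += 1
            stack = [(h, i)]
            seen.add((h, i))
            # Flood-fill the orbit of (h, i) under the generators.
            while stack:
                ch, ci = stack.pop()
                for p_in, p_hid in generators:
                    nxt = (p_hid[ch], p_in[ci])
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return orbits


# Assumed toy group: swapping the two input coordinates, (a, b) -> (b, a).
swap_in = (1, 0)

# Width 2: the swap also exchanges the two hidden units.
dim_width2 = equivariant_weight_dim([(swap_in, (1, 0))], n_in=2, n_hidden=2)

# Width 4: the swap exchanges hidden units 0 <-> 1 and 2 <-> 3.
dim_width4 = equivariant_weight_dim([(swap_in, (1, 0, 3, 2))], n_in=2, n_hidden=4)

print(dim_width2, 2 * 2)  # 2 4  (tied vs. unconstrained entries at width 2)
print(dim_width4, 4 * 2)  # 4 8  (tied vs. unconstrained entries at width 4)
```

At equal width the tied matrix has half as many free entries in this toy case. Whether a reduction survives the enlargement needed to recover expressive power is the question the paper's bounds address, and this count does not reproduce those bounds.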

Circularity Check

0 steps flagged

No circularity: direct geometric proofs within 2-layer ReLU setting are self-contained.

The paper restricts its analysis to 2-layer ReLU networks and derives its claims via direct constructive examination of boundary hyperplanes and channel vectors to show reduced representable functions under equivariance, followed by explicit upper bounds on width enlargement to recover linear regions and a comparison of hypothesis space dimensions. These steps consist of internal geometric arguments and proofs that do not rely on parameters fitted to the target quantities, self-definitional reductions, or load-bearing self-citations. The derivation remains self-contained against the paper's stated scope and assumptions, with no reduction of results to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract invokes standard assumptions from neural-network approximation theory (ReLU networks, boundary hyperplanes, channel vectors) without introducing new free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5658 in / 1052 out tokens · 77051 ms · 2026-05-21T17:42:27.664151+00:00

Original abstract

Equivariant neural networks encode the intrinsic symmetry of data as an inductive bias, which has achieved impressive performance across a wide range of domains. However, the understanding of their expressive power remains premature. Focusing on 2-layer ReLU networks, this paper investigates the impact of enforcing equivariance constraints on the expressive power. By examining the boundary hyperplanes and the channel vectors, we constructively demonstrate that enforcing equivariance constraints could undermine the expressive power. Naturally, this drawback can be compensated for by enlarging the model size; we further prove upper bounds on the required enlargement for compensation. Surprisingly, we show that the enlarged neural architectures have reduced hypothesis space dimensionality, implying even better generalizability.

Figures

Figures reproduced from arXiv: 2512.09673 by Dacheng Tao, Fengxiang He, Tian Qin, Xinmei Tian, Yuzhu Chen.

**Figure 1.** An equivariant function that satisfies s((a, b)⊤) = s((b, a)⊤) and s((a, b)⊤) = s((−a, −b)⊤). The left subfigure is a 3D plot, while the right subfigure is a 2D contour map.

… function s(x). Specifically, we examine whether the following inequalities hold: $\inf_{\theta \in \Theta} \mathbb{E}_{x \sim P}\left[\|F_\theta(x) - s(x)\|_2^2\right] < \inf_{\theta \in \Theta \cap \mathrm{GENs}} \mathbb{E}_{x \sim P}\left[\|F_\theta(x) - s(x)\|_2^2\right]$ and $\inf_{\theta \in \Theta \cap \mathrm{GENs}} \mathbb{E}_{x \sim P}\left[\|F_\theta(x) - s(x)\|_2^2\right] < \inf_{\theta \in \Theta \cap \mathrm{LENs}} \mathbb{E}_{x \sim P}[\|F_\theta(x) - \dots$
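
The two symmetries named in the Figure 1 caption can be checked numerically for any candidate target. The sketch below is a minimal illustration: the checker name, the sample grid, and both test functions are assumptions, and `candidate` is a made-up example that happens to satisfy both relations, not the paper's s.

```python
def has_figure1_symmetries(s, points, tol=1e-12):
    """Check s(a, b) == s(b, a) and s(a, b) == s(-a, -b) on the sample points."""
    return all(
        abs(s(a, b) - s(b, a)) <= tol and abs(s(a, b) - s(-a, -b)) <= tol
        for a, b in points
    )


# Sample points on a square grid around the origin.
points = [(0.1 * i, 0.1 * j) for i in range(-10, 11) for j in range(-10, 11)]


def candidate(a, b):
    """Hypothetical target (not the paper's s): symmetric under swap and negation."""
    return abs(a + b)


def counterexample(a, b):
    """Hypothetical function that breaks the swap symmetry."""
    return a - b


print(has_figure1_symmetries(candidate, points))       # True
print(has_figure1_symmetries(counterexample, points))  # False
```

Passing the check on a finite grid is only evidence of the symmetry, not a proof of it.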

**Figure 2.** A visualization of the feature function F of F(x, y) = σ(x) + σ(−y) + σ(−x + y), where the left subfigure is F1 and the right subfigure is F2. As shown, there are three boundary hyperplanes: x = 0, y = 0, and y = x.

4 Boundary Hyperplanes and Channel Vectors. To establish a clear foundation for our analysis of expressive power, we discuss two tools, symmetry boundary hyperplanes and symmetry channel vectors, in this…

