Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power
Pith reviewed 2026-05-21 17:42 UTC · model grok-4.3
The pith
Enforcing equivariance in 2-layer ReLU networks can reduce expressive power. Enlarging the model restores it, with proven upper bounds on the required enlargement, and the enlarged equivariant networks still have a lower-dimensional hypothesis space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In 2-layer ReLU networks, enforcing equivariance constraints can undermine expressive power, as shown constructively by examining boundary hyperplanes and channel vectors. This drawback can be compensated for by enlarging the model size, and upper bounds on the required enlargement are proven. The enlarged architectures have reduced hypothesis space dimensionality, which the paper takes to imply even better generalizability.
What carries the argument
Boundary hyperplanes and channel vectors in 2-layer ReLU networks, used to demonstrate the restrictions that equivariance imposes on the functions a network can compute.
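As a point of reference, a 2-layer ReLU network of width m can be written as follows. The symbols w_i, b_i, a_i, and H_i are notation assumed here for illustration, and reading the channel vectors as the output weights a_i is a guess, since the paper's own definition is not reproduced on this page.

```latex
f(x) = \sum_{i=1}^{m} a_i \, \mathrm{ReLU}\!\left(w_i^{\top} x + b_i\right),
\qquad
H_i = \left\{ x \in \mathbb{R}^{d} : w_i^{\top} x + b_i = 0 \right\}.
```

Each H_i is a boundary hyperplane on which unit i switches on or off. If f is equivariant under invertible linear group actions on input and output, the set of points where f is not differentiable must be mapped to itself by the input action, which limits where the hyperplanes can sit.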
If this is right
- Equivariant networks require larger widths to match the expressive power of unconstrained networks.
- Explicit upper bounds quantify the model enlargement needed to offset the loss from equivariance.
- The compensated larger equivariant architectures operate with lower-dimensional hypothesis spaces.
- Lower hypothesis space dimensionality supports stronger generalization performance.
Where Pith is reading between the lines
- The size-compensation result may guide layer-width choices when applying equivariance to new symmetry groups.
- If the dimensionality reduction persists in deeper networks, it could explain observed generalization advantages of equivariant models in practice.
- Designers might deliberately scale equivariant architectures beyond the minimum compensation point to exploit the dimensionality benefit.
Load-bearing premise
The analysis assumes that studying boundary hyperplanes and channel vectors in 2-layer ReLU networks is enough to capture the effect of equivariance on expressive power and that the compensation bounds from this case apply more generally.
What would settle it
A count of the distinct linear regions or representable functions for equivariant versus non-equivariant 2-layer ReLU networks of matched and increased widths to check whether the predicted enlargement restores the original count.
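One rough way to approach such a count numerically is to sample inputs and count the distinct on/off patterns of the hidden units, which gives a lower bound on the number of activation regions. The sketch below assumes, purely for illustration, the cyclic group acting on inputs by coordinate shifts, so that an equivariant hidden layer has a circulant weight matrix and a constant bias. The function names and the sampling box are hypothetical and not taken from the paper.

```python
import numpy as np

def count_activation_patterns(W, b, points):
    """Count distinct hidden-unit on/off patterns seen over the sample points."""
    pre = points @ W.T + b              # shape: (num_points, width)
    patterns = pre > 0                  # boolean activation pattern per point
    return len({row.tobytes() for row in patterns})

def cyclic_equivariant_weights(first_row):
    """Circulant matrix whose rows are cyclic shifts of first_row.

    Such a matrix commutes with the cyclic shift of coordinates
    (the group action assumed in this sketch).
    """
    n = len(first_row)
    return np.stack([np.roll(first_row, k) for k in range(n)])

rng = np.random.default_rng(0)
d = 3                                   # input dimension and hidden width
points = rng.uniform(-3.0, 3.0, size=(50_000, d))

# Unconstrained hidden layer: d*d free weights plus d free biases.
W_free = rng.normal(size=(d, d))
b_free = rng.normal(size=d)

# Shift-equivariant hidden layer: d free weights plus one shared bias.
W_eq = cyclic_equivariant_weights(rng.normal(size=d))
b_eq = np.full(d, rng.normal())

n_free = count_activation_patterns(W_free, b_free, points)
n_eq = count_activation_patterns(W_eq, b_eq, points)
print("unconstrained patterns seen:", n_free)
print("equivariant patterns seen:  ", n_eq)
```

Sampling can miss small or distant regions, so the counts are estimates from below. Repeating the experiment with a wider equivariant layer would be the numerical analogue of checking whether enlargement restores the count.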
Original abstract
Equivariant neural networks encode the intrinsic symmetry of data as an inductive bias, which has achieved impressive performance in wide domains. However, the understanding to their expressive power remains premature. Focusing on 2-layer ReLU networks, this paper investigates the impact of enforcing equivariance constraints on the expressive power. By examining the boundary hyperplanes and the channel vectors, we constructively demonstrate that enforcing equivariance constraints could undermine the expressive power. Naturally, this drawback can be compensated for by enlarging the model size -- we further prove upper bounds on the required enlargement for compensation. Surprisingly, we show that the enlarged neural architectures have reduced hypothesis space dimensionality, implying even better generalizability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. Focusing on 2-layer ReLU networks, the paper claims that enforcing equivariance constraints undermines expressive power, as shown constructively via analysis of boundary hyperplanes and channel vectors. This drawback can be compensated by enlarging model size, with explicit upper bounds proven on the required enlargement factor. The paper further shows that the resulting enlarged equivariant architectures have strictly lower hypothesis-space dimensionality than unconstrained counterparts, implying improved generalizability.
Significance. If the results hold, this work supplies a useful geometric perspective on the trade-off between symmetry constraints and expressivity, together with concrete compensation bounds and the unexpected finding of dimensionality reduction. The constructive demonstrations and the proofs of upper bounds on enlargement are explicit strengths that advance understanding of how equivariance shapes the geometry of linear regions in the 2-layer ReLU setting.
Major comments (2)
- [§4] The upper bound on the width-enlargement factor needed to recover the same number of linear regions is derived only inside the 2-layer ReLU case; the manuscript supplies neither an inductive argument nor a counter-example showing that the same factor (or a comparable one) suffices once depth, group representation, or activation choice changes.
- [§5] The claim that the enlarged equivariant hypothesis space has strictly lower dimension than the unconstrained counterpart rests on the 2-layer geometry; without an explicit dimension formula or comparison that survives changes in depth, it is unclear whether the generalizability implication extends beyond the restricted setting.
Minor comments (3)
- [Introduction] Add a short paragraph contrasting the present geometric counting argument with prior expressivity results for equivariant networks (e.g., those based on group Fourier analysis).
- [Figure 2] The caption should explicitly state the group action and input dimension used to generate the depicted hyperplanes.
- [§2.1] The definition of channel vectors would be clearer if the precise mapping from weight matrix columns to hyperplane normals were written as an equation.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and for highlighting the geometric perspective and constructive proofs as strengths of the work. Our manuscript is explicitly scoped to 2-layer ReLU networks, as stated in the abstract and throughout the text, to enable a detailed analysis of boundary hyperplanes and channel vectors. We address the major comments below by clarifying scope and committing to textual revisions that make this limitation more explicit.
Point-by-point responses
- Referee: [§4] The upper bound on the width-enlargement factor needed to recover the same number of linear regions is derived only inside the 2-layer ReLU case; the manuscript supplies neither an inductive argument nor a counter-example showing that the same factor (or a comparable one) suffices once depth, group representation, or activation choice changes.
  Authors: We agree that the upper bound on the enlargement factor is proven only for the 2-layer ReLU case. The manuscript contains no inductive argument or counter-example for deeper networks, different group representations, or other activations. This is because the paper deliberately restricts attention to the 2-layer ReLU setting to obtain explicit geometric characterizations and concrete bounds. We will revise the introduction and conclusion to state this scope limitation more prominently and to list generalization to other depths and activations as an open direction for future work.
  Revision: partial
- Referee: [§5] The claim that the enlarged equivariant hypothesis space has strictly lower dimension than the unconstrained counterpart rests on the 2-layer geometry; without an explicit dimension formula or comparison that survives changes in depth, it is unclear whether the generalizability implication extends beyond the restricted setting.
  Authors: The referee correctly observes that the strict dimensionality reduction and the associated generalizability claim are established using the 2-layer geometry. No dimension formula valid for arbitrary depth is provided. Within the 2-layer ReLU setting, however, the reduction is shown by comparing the number of free parameters after accounting for the equivariance constraints and the enlargement factor. We will add a clarifying sentence in §5 (and a corresponding remark in the conclusion) that explicitly limits the generalizability implication to the 2-layer case analyzed and identifies extension to deeper architectures as future work.
  Revision: partial
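The kind of free-parameter comparison described in the response can be illustrated with a small sketch. It assumes, for illustration only, the cyclic group acting on R^n by coordinate shifts, and computes the dimension of the space of weight matrices that commute with the shift. The enlargement factor k below is a placeholder, not the bound proven in the paper, and the helper names are hypothetical.

```python
import numpy as np

def shift_matrix(n):
    """Permutation matrix P with (P x)[i] = x[i-1], a cyclic shift by one."""
    return np.roll(np.eye(n), 1, axis=0)

def equivariant_dim(P):
    """Dimension of {W : W P = P W}, the nullity of the map W -> P W - W P."""
    n = P.shape[0]
    I = np.eye(n)
    # With column-major vec: vec(P W) = (I kron P) vec(W)
    # and vec(W P) = (P^T kron I) vec(W).
    A = np.kron(I, P) - np.kron(P.T, I)
    return n * n - int(np.linalg.matrix_rank(A))

n = 4
dim_eq = equivariant_dim(shift_matrix(n))   # free weights in one equivariant block
dim_free = n * n                            # free weights in an unconstrained n-by-n layer

k = 2                                       # hypothetical enlargement factor (assumption)
dim_eq_enlarged = k * dim_eq                # k stacked equivariant blocks, width k*n

print("unconstrained:", dim_free)
print("equivariant:", dim_eq)
print("equivariant, enlarged by k:", dim_eq_enlarged)
```

Under these assumptions the enlarged equivariant layer has fewer free weights than the unconstrained one whenever k is smaller than n. Whether the paper's actual bounds fall in that regime is not something this page states.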
Circularity Check
No circularity: direct geometric proofs within 2-layer ReLU setting are self-contained.
The paper restricts its analysis to 2-layer ReLU networks and derives its claims via direct constructive examination of boundary hyperplanes and channel vectors to show reduced representable functions under equivariance, followed by explicit upper bounds on width enlargement to recover linear regions and a comparison of hypothesis space dimensions. These steps consist of internal geometric arguments and proofs that do not rely on parameters fitted to the target quantities, self-definitional reductions, or load-bearing self-citations. The derivation remains self-contained against the paper's stated scope and assumptions, with no reduction of results to their own inputs by construction.