pith · machine review for the scientific record

arXiv: 2604.14037 · v1 · submitted 2026-04-15 · 💻 cs.LG · math.AG · math.CO

Recognition: unknown

A Complete Symmetry Classification of Shallow ReLU Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 13:43 UTC · model grok-4.3

classification 💻 cs.LG · math.AG · math.CO
keywords neural network symmetries · ReLU activation · shallow networks · parameter identifiability · neuromanifold · symmetry classification · piecewise linear networks

The pith

Exploiting the non-differentiability of ReLU yields a complete classification of symmetries in shallow ReLU networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a full classification of parameter symmetries for shallow networks using the ReLU activation function. Symmetries are distinct parameter sets that produce identical input-output mappings. A sympathetic reader would care because these equivalences shape the geometry of the space of networks, which in turn influences how optimization moves through parameters. Prior approaches relied on analytic activations and therefore excluded ReLU; the present work instead uses the locations where ReLU changes its slope to enumerate every symmetry exactly.

Core claim

By exploiting the non-differentiability of the ReLU activation, a complete classification of all symmetries in shallow ReLU networks is obtained. This classification enumerates every distinct set of weights and biases that realize the same function, without imposing restrictions on network width or requiring analyticity of the activation.
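
As a minimal illustration (not drawn from the paper), the sketch below numerically checks two well-known shallow-ReLU symmetries, relabeling the hidden neurons and rescaling each neuron by a positive factor, and confirms they leave the realized function unchanged. The network form f(x) = Σ_i a_i ReLU(w_i·x + b_i) + c, the function names, and all numbers are our assumptions; the paper's classification is claimed to cover every symmetry, not only these two generators.

```python
import numpy as np

rng = np.random.default_rng(0)

def shallow_relu(x, W, b, a, c):
    # f(x) = sum_i a_i * ReLU(w_i . x + b_i) + c, for x of shape (N, d)
    return np.maximum(x @ W.T + b, 0.0) @ a + c

d, n = 2, 3                        # input dimension, hidden width (arbitrary choices)
W = rng.normal(size=(n, d))        # hidden-layer weights
b = rng.normal(size=n)             # hidden-layer biases
a = rng.normal(size=n)             # output weights
c = rng.normal()                   # output bias

perm = np.array([2, 0, 1])         # symmetry 1: relabel the hidden neurons
s = rng.uniform(0.5, 2.0, size=n)  # symmetry 2: positive per-neuron rescaling

# Rescale (w_i, b_i) by s_i, divide a_i by s_i, then permute all three together.
W2, b2, a2 = (s[:, None] * W)[perm], (s * b)[perm], (a / s)[perm]

x = rng.normal(size=(1000, d))
assert np.allclose(shallow_relu(x, W, b, a, c), shallow_relu(x, W2, b2, a2, c))
```

The rescaling step relies only on positive homogeneity, ReLU(s·z) = s·ReLU(z) for s > 0, which is why the factor must be positive.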

What carries the argument

The non-differentiability of ReLU, which creates piecewise-linear regions whose boundaries are used to match and classify all equivalent parameter configurations.
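
A rough sketch of why these boundaries constrain the parameters, using our own hypothetical one-dimensional example rather than anything from the paper: neuron i is non-differentiable exactly where w_i·x + b_i = 0, so for a 1-D input the realized function has kinks at x = -b_i/w_i, and parameterizations related by a neuron permutation and positive rescaling trace out the same kink set.

```python
import numpy as np

def breakpoints(W, b):
    # Kink locations x = -b_i / w_i of a shallow ReLU network with 1-D input,
    # one per hidden neuron with a nonzero input weight.
    W, b = np.asarray(W, float), np.asarray(b, float)
    return np.sort(-b[W != 0] / W[W != 0])

# Two parameterizations related by positive rescaling and a neuron permutation
# (hypothetical numbers) produce the same arrangement of kinks on the line.
W1, b1 = np.array([1.0, -2.0, 0.5]), np.array([0.3, 1.0, -0.2])
scale = np.array([2.0, 0.5, 3.0])    # positive factors leave -b_i / w_i unchanged
perm = np.array([1, 2, 0])
W2, b2 = (scale * W1)[perm], (scale * b1)[perm]

print(breakpoints(W1, b1))   # [-0.3  0.4  0.5]
print(breakpoints(W2, b2))   # the same set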

If this is right

  • Every symmetry of any shallow ReLU network can be identified explicitly.
  • The geometry of the neuromanifold becomes fully describable for the shallow ReLU case.
  • Optimization trajectories can be analyzed with complete knowledge of all equivalent parameter points.
  • No extra regularity conditions on weights or width are required for the classification to hold.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same piecewise-linear matching technique could be adapted to classify symmetries in deeper ReLU architectures.
  • Symmetry-aware regularization or initialization schemes could be derived from the explicit list of equivalences.
  • The classification supplies a concrete testbed for studying how symmetries affect generalization and convergence rates.

Load-bearing premise

The non-differentiability of ReLU suffices by itself to produce an exhaustive list of all symmetries for shallow networks of arbitrary width.

What would settle it

A concrete shallow ReLU network whose set of symmetries cannot be fully recovered by the classification procedure would refute the central claim.

Figures

Figures reproduced from arXiv: 2604.14037 by Pranavkrishnan Ramakrishnan.

Figure 1
Figure 1. The bent hyperplane arrangements of θ1 and θ2. The region where the first hidden neuron is activated is shown in red, the second in blue, and the third in green; the set ρ(2,3,1)(θi)(x, y) = 0 is drawn in black.
Original abstract

Parameter space is not function space for neural network architectures. This fact, investigated as early as the 1990s under terms such as "reverse engineering" or "parameter identifiability", has led to the natural question of parameter space symmetries: the study of distinct parameters in neural architectures which realize the same function. Indeed, the quotient space obtained by identifying parameters giving rise to the same function, called the neuromanifold, has been shown in some cases to have rich geometric properties, impacting optimization dynamics. Thus far, techniques towards complete classifications have required the analyticity of the activation function, notably excising the important case of ReLU. Here, in contrast, we exploit the non-differentiability of the ReLU activation to provide a complete classification of the symmetries in the shallow case.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that by exploiting the non-differentiability of the ReLU activation function, it is possible to provide a complete classification of the symmetries in the parameter space of shallow ReLU networks, i.e., distinct parameters that realize the same function, in contrast to prior approaches requiring analytic activations.

Significance. If the classification is exhaustive for arbitrary shallow ReLU networks, the result would be significant for understanding the geometry of the neuromanifold in the practically important ReLU case, with potential implications for parameter identifiability and optimization dynamics.

major comments (1)
  1. [Abstract] The assertion of a 'complete classification' for the shallow case does not address whether genericity assumptions on weights/biases are required. Degenerate configurations (multiple neurons with identical or linearly dependent weights) merge activation regions and can enlarge the stabilizer group with additional discrete symmetries not generated by permutations and sign flips; without an explicit treatment or proof that such cases are covered, the exhaustiveness of the classification is not established.
minor comments (1)
  1. The abstract is dense and would benefit from a concise statement of the classified symmetry group (e.g., generated by what operations under what conditions) to clarify the main result.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for recognizing the potential significance of a complete symmetry classification for shallow ReLU networks. We address the major comment below and will revise the manuscript accordingly to strengthen the presentation.

read point-by-point responses
  1. Referee: [Abstract] The assertion of a 'complete classification' for the shallow case does not address whether genericity assumptions on weights/biases are required. Degenerate configurations (multiple neurons with identical or linearly dependent weights) merge activation regions and can enlarge the stabilizer group with additional discrete symmetries not generated by permutations and sign flips; without an explicit treatment or proof that such cases are covered, the exhaustiveness of the classification is not established.

    Authors: We thank the referee for this observation. Our classification exploits the locations of non-differentiability of the realized function to recover the hyperplane arrangement and the associated symmetries; this geometric approach does not rely on differentiability or on any genericity assumption on the weights and biases. When neurons have identical or linearly dependent weights, the activation regions merge, but the only additional freedoms are permutations among the identical neurons (already accounted for in the symmetry group) and sign flips; any other discrete transformation would alter the piecewise-linear slopes or the locations of the kinks, which can be detected directly from the function. We nevertheless agree that an explicit treatment would improve clarity. In the revised manuscript we will add a dedicated subsection proving that no further discrete symmetries arise in degenerate configurations, thereby confirming that the classification remains exhaustive without genericity restrictions. revision: yes
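
To make the rebuttal's point that kink locations "can be detected directly from the function" concrete, the sketch below (our construction, not the authors' procedure) recovers the kinks of a 1-D shallow ReLU network purely from samples of its realized function, using finite-difference slopes; the network parameters, grid, and tolerance are assumptions chosen for illustration.

```python
import numpy as np

def relu_net(x, W, b, a, c=0.0):
    # Realized function of a shallow ReLU network with a 1-D input x.
    return np.maximum(np.outer(x, W) + b, 0.0) @ a + c

def detect_kinks(f, lo=-5.0, hi=5.0, n=20001, tol=1e-6):
    # Locate slope changes of a piecewise-linear function from samples alone.
    x = np.linspace(lo, hi, n)
    slopes = np.diff(f(x)) / np.diff(x)
    idx = np.nonzero(np.abs(np.diff(slopes)) > tol)[0] + 1
    # A kink falling strictly inside a grid cell triggers two adjacent indices;
    # merge detections that are within one grid step of each other.
    groups = np.split(idx, np.nonzero(np.diff(idx) > 1)[0] + 1)
    return np.array([x[g].mean() for g in groups])

W = np.array([1.0, -2.0, 0.5])   # hypothetical parameters
b = np.array([0.3, 1.0, -0.2])
a = np.array([0.7, -1.1, 2.0])

found = detect_kinks(lambda x: relu_net(x, W, b, a))
print(np.round(found, 3))        # matches sorted(-b / W): [-0.3  0.4  0.5]
```

On a uniform grid the estimated kinks agree with the analytic breakpoints -b_i/w_i up to the grid spacing.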

Circularity Check

0 steps flagged

No circularity: the classification follows from ReLU non-differentiability without assuming the result in its inputs or relying on self-citations

full rationale

The abstract and description present a direct mathematical exploitation of ReLU kinks to enumerate symmetries for shallow networks. No equations or steps are shown that define a quantity in terms of itself, rename a fitted parameter as a prediction, or rely on load-bearing self-citations for the completeness claim. The derivation chain is self-contained as an exhaustive case analysis on activation regions, with no evidence of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; the central move is the use of ReLU non-differentiability as a distinguishing property.

axioms (1)
  • Domain assumption: non-differentiability of ReLU suffices for a complete symmetry classification in the shallow case.
    The abstract contrasts this property with the analyticity requirement of earlier work.

pith-pipeline@v0.9.0 · 5433 in / 1193 out tokens · 30488 ms · 2026-05-10T13:43:55.722623+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Most ReLU Networks Admit Identifiable Parameters

    cs.LG · 2026-05 · accept · novelty 8.0

    For ReLU networks with input and hidden widths at least 2, most parameters are identifiable up to symmetry, so the functional dimension equals the parameter count minus the number of hidden neurons.

Reference graph

Works this paper leans on

18 extracted references · 12 canonical work pages · cited by 1 Pith paper

  1. [1]

    For neural networks, function determines form

    Francesca Albertini and Eduardo D. Sontag. "For neural networks, function determines form". In: Neural Networks 6.7 (1993), pp. 975–990. ISSN: 0893-6080. DOI: 10.1016/S0893-6080(09)80007-5. URL: https://www.sciencedirect.com/science/article/pii/S0893608009800075

  2. [2]

    Joachim Bona-Pellissier, François Malgouyres, and François Bachoc. Geometry-induced Regularization in Deep ReLU Neural Networks. 2026. arXiv: 2402.08269 [cs.AI]. URL: https://arxiv.org/abs/2402.08269

  3. [3]

    Joachim Bona-Pellissier, François Malgouyres, and François Bachoc. Local Identifiability of Deep ReLU Neural Networks: the Theory. 2022. arXiv: 2206.07424 [math.ST]. URL: https://arxiv.org/abs/2206.07424

  4. [4]

    Functional dimension of feedforward ReLU neural networks

    J. Elisenda Grigsby et al. "Functional dimension of feedforward ReLU neural networks". In: Advances in Mathematics 482 (Dec. 2025), p. 110636. ISSN: 0001-8708. DOI: 10.1016/j.aim.2025.110636. URL: http://dx.doi.org/10.1016/j.aim.2025.110636

  5. [5]

    Recovering a Feed-Forward Net From Its Output

    Charles Fefferman and Scott Markel. "Recovering a Feed-Forward Net From Its Output". In: Advances in Neural Information Processing Systems. Ed. by J. Cowan, G. Tesauro, and J. Alspector. Vol. 6. Morgan-Kaufmann, 1993. URL: https://proceedings.neurips.cc/paper_files/paper/1993/file/e49b8b4053df9505e1f48c3a701c0682-Paper.pdf

  6. [6]

    Activation degree thresholds and expressiveness of polynomial neural networks

    Bella Finkel et al. "Activation degree thresholds and expressiveness of polynomial neural networks". In: Algebraic Statistics 16.2 (Sept. 2025), pp. 113–130. ISSN: 2693-2997. DOI: 10.2140/astat.2025.16.113. URL: http://dx.doi.org/10.2140/astat.2025.16.113

  7. [7]

    Charles Godfrey et al. On the Symmetries of Deep Learning Models and their Internal Representations. 2023. arXiv: 2205.14258 [cs.LG]. URL: https://arxiv.org/abs/2205.14258

  8. [8]

    Hidden Symmetries of ReLU Networks

    Elisenda Grigsby, Kathryn Lindsey, and David Rolnick. "Hidden Symmetries of ReLU Networks". In: Proceedings of the 40th International Conference on Machine Learning. Ed. by Andreas Krause et al. Vol. 202. Proceedings of Machine Learning Research. PMLR, 23–29 Jul 2023, pp. 11734–11760. URL: https://proceedings.mlr.press/v202/grigsby23a.html

  9. [9]

    Joe Kileel, Matthew Trager, and Joan Bruna. On the Expressive Power of Deep Polynomial Neural Networks. 2019. arXiv: 1905.12207 [cs.LG]. URL: https://arxiv.org/abs/1905.12207

  10. [10]

    Introduction to Smooth Manifolds

    John M. Lee. Introduction to Smooth Manifolds. Springer, 2013

  11. [11]

    Marissa Masden. Algorithmic Determination of the Combinatorial Structure of the Linear Regions of ReLU Neural Networks. 2022. arXiv: 2207.07696 [cs.LG]. URL: https://arxiv.org/abs/2207.07696

  12. [12]

    Notes on the Symmetries of 2-Layer ReLU-Networks

    Henning Petzka, Martin Trimmel, and Cristian Sminchisescu. "Notes on the Symmetries of 2-Layer ReLU-Networks". In: Proceedings of the Northern Lights Deep Learning Workshop 1 (Feb. 2020), p. 6. DOI: 10.7557/18.5150

  13. [13]

    Functional vs. parametric equivalence of ReLU networks

    Mary Phuong and Christoph H. Lampert. "Functional vs. parametric equivalence of ReLU networks". In: International Conference on Learning Representations. 2020. URL: https://openreview.net/forum?id=Bylx-TNKvH

  14. [14]

    Reverse-engineering deep ReLU networks

    David Rolnick and Konrad Kording. "Reverse-engineering deep ReLU networks". In: Proceedings of the 37th International Conference on Machine Learning. Ed. by Hal Daumé III and Aarti Singh. Vol. 119. Proceedings of Machine Learning Research. PMLR, 13–18 Jul 2020, pp. 8178–8187. URL: https://proceedings.mlr.press/v119/rolnick20a.html

  15. [15]

    An Embedding of ReLU Networks and an Analysis of Their Identifiability

    Pierre Stock and Rémi Gribonval. "An Embedding of ReLU Networks and an Analysis of Their Identifiability". In: Constructive Approximation 57.2 (July 2022), pp. 853–899. ISSN: 1432-0940. DOI: 10.1007/s00365-022-09578-1. URL: http://dx.doi.org/10.1007/s00365-022-09578-1

  16. [16]

    Yani Zhang and Helmut Bölcskei. Complete Identification of Deep ReLU Neural Networks by Many-Valued Logic. 2026. arXiv: 2602.00266 [cs.AI]. URL: https://arxiv.org/abs/2602.00266

  17. [17]

    Bo Zhao, Robin Walters, and Rose Yu. Symmetry in Neural Network Parameter Spaces. 2025. arXiv: 2506.13018 [cs.LG]. URL: https://arxiv.org/abs/2506.13018

  18. [18]

    Bo Zhao et al. Symmetries, flat minima, and the conserved quantities of gradient flow. 2023. arXiv: 2210.17216 [cs.LG]. URL: https://arxiv.org/abs/2210.17216