For ReLU networks with width at least two in input and hidden layers, an open set of parameters is identifiable, implying functional dimension equals parameter count minus hidden neurons.
Symmetry in neural network parameter spaces
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6verdicts
UNVERDICTED 6roles
background 1polarities
background 1representative citing papers
A complete classification of symmetries in shallow ReLU networks is achieved by using the non-differentiability of ReLU.
Dead-Direction Conditioners provide gauge-equivariant preconditioning by conditioning optimizer state on symmetry orbits, yielding improved resistance to over-training collapse and higher detection of dead directions compared to AdamW and Muon.
Neural networks admit large families of approximately equivalent solutions via neuron identifiability even without structural symmetry, enabling linear low-loss merging paths without prior alignment.
In overparameterized quadratic networks, one-pass SGD escapes generalization plateaus only modestly faster and selects the initialization-closest zero-loss solution due to a conserved quantity in the overlap ODEs.
A bidirectional optimization method using parameterized transformations enables near-zero loss barriers for linear mode connectivity in medium-scale language models and small barriers in billion-parameter transformers.
citing papers explorer
-
Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks
In overparameterized quadratic networks, one-pass SGD escapes generalization plateaus only modestly faster and selects the initialization-closest zero-loss solution due to a conserved quantity in the overlap ODEs.