Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape

Johanni Brea, Berfin Simsek, Bernd Illing, Wulfram Gerstner · 2019 · cs.LG · arXiv 1907.02911

5 Pith papers cite this work.

abstract

The permutation symmetry of neurons in each layer of a deep neural network gives rise not only to multiple equivalent global minima of the loss function, but also to first-order saddle points located on the path between the global minima. In a network of $d-1$ hidden layers with $n_k$ neurons in layers $k = 1, \ldots, d$, we construct smooth paths between equivalent global minima that lead through a 'permutation point' where the input and output weight vectors of two neurons in the same hidden layer $k$ collide and interchange. We show that such permutation points are critical points with at least $n_{k+1}$ vanishing eigenvalues of the Hessian matrix of second derivatives, indicating a local plateau of the loss function. We find that a permutation point for the exchange of neurons $i$ and $j$ transitions into a flat valley (or, more generally, an extended plateau of $n_{k+1}$ flat dimensions) that enables all $n_k!$ permutations of neurons in a given layer $k$ at the same loss value. Moreover, we introduce high-order permutation points by exploiting the recursive structure in neural network functions, and find that the number of $K^{\text{th}}$-order permutation points is larger than the (already huge) number of equivalent global minima by at least a factor of $\sum_{k=1}^{d-1}\frac{1}{2!^K}{n_k-K \choose K}$. In two tasks, we illustrate numerically that some of the permutation points correspond to first-order saddles ('permutation saddles'): first, in a toy network with a single hidden layer on a function approximation task and, second, in a multilayer network on the MNIST task. Our geometric approach yields a lower bound on the number of critical points generated by weight-space symmetries and provides a simple intuitive link between previous mathematical results and numerical observations.
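The two quantitative ingredients of the abstract can be checked with a small sketch: exchanging two hidden neurons leaves the network function unchanged, and the stated factor can be evaluated for concrete layer sizes. This is not the authors' code. The function names (`forward`, `swap_neurons`, `permutation_point_factor`), the tanh activation, the omission of biases, and the layer sizes are all assumptions made for illustration. The sketch does not construct the permutation point itself or verify its Hessian.

```python
import math
import random
from fractions import Fraction


def forward(w_in, w_out, x):
    """One hidden tanh layer, scalar output: sum_i w_out[i] * tanh(w_in[i] . x)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return sum(wo * h for wo, h in zip(w_out, hidden))


def swap_neurons(w_in, w_out, i, j):
    """Return copies of the weights with hidden neurons i and j exchanged."""
    w_in = [list(row) for row in w_in]
    w_out = list(w_out)
    w_in[i], w_in[j] = w_in[j], w_in[i]
    w_out[i], w_out[j] = w_out[j], w_out[i]
    return w_in, w_out


def permutation_point_factor(hidden_sizes, order):
    """Evaluate sum_k (1 / 2!^K) * C(n_k - K, K) over the hidden layers.

    Transcribes the expression in the abstract; assumes n_k >= K for every layer.
    """
    return sum(Fraction(math.comb(n - order, order), 2 ** order) for n in hidden_sizes)


# Illustrative sizes and random weights (assumptions, not values from the paper).
rng = random.Random(0)
n_inputs, n_hidden = 3, 4
w_in = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_out = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]

# Swapping neurons 0 and 1 reorders the hidden sum but not its value.
y_original = forward(w_in, w_out, x)
y_swapped = forward(*swap_neurons(w_in, w_out, 0, 1), x)
outputs_match = math.isclose(y_original, y_swapped, rel_tol=1e-12, abs_tol=1e-12)

# n_k! orderings of one hidden layer give the same function: 4! = 24.
equivalent_orderings = math.factorial(n_hidden)

# Two hidden layers of 10 neurons, first order (K = 1): 9/2 + 9/2 = 9.
factor = permutation_point_factor([10, 10], 1)

print(outputs_match, equivalent_orderings, factor)
```

The comparison uses `math.isclose` rather than `==` because exchanging neurons changes the order of floating-point additions. `Fraction` keeps the factor exact when the binomial coefficient is not divisible by $2^K$.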

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

A Theory of Saddle Escape in Deep Nonlinear Networks

cs.LG · 2026-05-02 · unverdicted · novelty 8.0 · 3 refs

Derives an exact Frobenius-norm imbalance identity for deep nonlinear networks, classifies activations into four classes, and obtains a critical-depth escape-time law τ★ = Θ(ε^{-(r-2)}) by reduction to a scalar ODE on a permutation-symmetric submanifold.

Beyond Structural Symmetries: Linear Mode Connectivity via Neuron Identifiability

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Neural networks admit large families of approximately equivalent solutions via neuron identifiability even without structural symmetry, enabling linear low-loss merging paths without prior alignment.

A Geometric Characterization of the Stationary Plateau for Two-Layer Neural Networks

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

A geometric classification of stationary points on neuron-splitting plateaus in two-layer NN loss landscapes using the inner Hessian.

The Platonic Representation Hypothesis

cs.LG · 2024-05-13 · unverdicted · novelty 5.0

Representations learned by large AI models are converging toward a shared statistical model of reality.

Nora: Normalized Orthogonal Row Alignment for Scalable Matrix Optimizer

cs.LG · 2026-05-05 · unverdicted · novelty 4.0

Nora is a matrix optimizer that stabilizes weight norms and angular velocities through row-wise momentum projection onto the orthogonal complement of the weights while approximating structured preconditioning with O(mn) complexity and proven scalability.

