pith. machine review for the scientific record.

arXiv: 2505.21722 · v2 · submitted 2025-05-27 · 💻 cs.LG · cs.AI · stat.ML

Recognition: unknown

Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape

keywords: escape, deep, first, relu, bias, directions, dynamics, low-rank

When a deep ReLU network is initialized with small weights, gradient descent (GD) is at first dominated by the saddle at the origin in parameter space. We study the so-called escape directions along which GD leaves the origin, which play a role similar to that of the eigenvectors of the Hessian for strict saddles. We show that the optimal escape direction features a low-rank bias in its deeper layers: the first singular value of the $\ell$-th layer weight matrix is at least $\ell^{\frac{1}{4}}$ larger than any other singular value. We also prove a number of related results about these escape directions. We suggest that deep ReLU networks exhibit saddle-to-saddle dynamics, with GD visiting a sequence of saddles with increasing bottleneck rank (Jacot, 2023).
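
As a rough illustration of the singular-value statement in the abstract, the following NumPy sketch checks whether a given weight matrix has a dominant first singular value. It assumes the bound is read as a ratio, i.e. $s_1 \ge \ell^{1/4} s_2$, which is an interpretation and not something the abstract spells out. The function names `singular_value_gap` and `satisfies_low_rank_bias`, and the example matrix, are hypothetical and do not come from the paper.

```python
import numpy as np


def singular_value_gap(W):
    """Return s_1 / s_2, the ratio of the two largest singular values of W.

    If s_2 is exactly zero, NumPy returns inf (with a runtime warning).
    """
    # np.linalg.svd returns singular values sorted in descending order.
    s = np.linalg.svd(W, compute_uv=False)
    return s[0] / s[1]


def satisfies_low_rank_bias(W, layer_index):
    """Check s_1 >= layer_index**(1/4) * s_2.

    Assumption: the abstract's bound is interpreted as a ratio between the
    first singular value and every other one; s_2 is the largest of those.
    """
    s = np.linalg.svd(W, compute_uv=False)
    return bool(s[0] >= layer_index ** 0.25 * s[1])


# Hypothetical example: a rank-one matrix plus a small random perturbation,
# standing in for the weight matrix of layer 4.
rng = np.random.default_rng(0)
u = rng.standard_normal(8)
v = rng.standard_normal(8)
W = np.outer(u, v) + 1e-3 * rng.standard_normal((8, 8))

print("s_1 / s_2:", singular_value_gap(W))
print("bound holds for layer 4:", satisfies_low_rank_bias(W, layer_index=4))
```

The check only inspects one matrix at a time; applying it to the layers of an actual escape direction would require the paper's construction, which is not reproduced here.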

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Theory of Saddle Escape in Deep Nonlinear Networks

    cs.LG 2026-05 unverdicted novelty 7.0

    Derives an exact norm-imbalance identity for deep nonlinear nets, classifying activations into four classes and yielding the escape-time law τ★ = Θ(ε^{-(r-2)}), governed by the bottleneck depth r.

  2. A Theory of Saddle Escape in Deep Nonlinear Networks

    cs.LG 2026-05 conditional novelty 7.0

    An exact norm-imbalance identity classifies activations into four classes and reduces deep nonlinear training flow to a scalar ODE that predicts saddle escape time scaling as ε^{-(r-2)} for r bottle...