arxiv: 2506.08764 · v3 · pith:QZTGQLTInew · submitted 2025-06-10 · 💻 cs.LG

On the Stability of the Jacobian Matrix in Deep Neural Networks

classification 💻 cs.LG
keywords networks · neural · stability · deep · jacobian · initialization · matrix · schemes

Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.
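The link between initialization scale and Jacobian stability can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes a deep linear network with square Gaussian weight matrices of variance `sigma_w**2 / width`, and the function name `jacobian_mean_sq_singular_value` and the `keep_prob` Bernoulli mask (a crude stand-in for pruning, rescaled to preserve the per-entry variance) are hypothetical choices made for illustration.

```python
import numpy as np


def jacobian_mean_sq_singular_value(depth, width, sigma_w, keep_prob=1.0, seed=0):
    """Mean squared singular value of the input-output Jacobian.

    Assumption of this sketch: the network is linear, so the Jacobian is
    simply the product of the weight matrices.
    """
    rng = np.random.default_rng(seed)
    jac = np.eye(width)
    for _ in range(depth):
        # i.i.d. Gaussian weights with variance sigma_w**2 / width.
        w = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        if keep_prob < 1.0:
            # Hypothetical pruning model: drop entries independently and
            # rescale so each entry keeps variance sigma_w**2 / width.
            mask = rng.random((width, width)) < keep_prob
            w = w * mask / np.sqrt(keep_prob)
        jac = w @ jac
    singular_values = np.linalg.svd(jac, compute_uv=False)
    return float(np.mean(singular_values ** 2))


if __name__ == "__main__":
    for sigma_w in (0.5, 1.0, 2.0):
        value = jacobian_mean_sq_singular_value(depth=30, width=64, sigma_w=sigma_w)
        print(f"sigma_w={sigma_w}: mean squared singular value = {value:.3e}")
    sparse = jacobian_mean_sq_singular_value(30, 64, 1.0, keep_prob=0.5)
    print(f"sigma_w=1.0, keep_prob=0.5: {sparse:.3e}")
```

In this toy setting the expected mean squared singular value scales as `sigma_w ** (2 * depth)`, so `sigma_w < 1` drives the Jacobian toward zero, `sigma_w > 1` makes it blow up, and `sigma_w = 1` plays the role of a critical initialization. The masked variant only shows that the same scaling survives a variance-preserving sparsification in this simplified model; it makes no claim about the weakly correlated weights treated in the paper.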

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction

    cs.CV 2026-05 conditional novelty 7.0

    Omni-DuplexEval creates a new benchmark and LLM-as-a-Judge framework for real-time duplex omni-modal interaction, revealing that current models score below 40% overall and struggle especially with proactive responses.

  2. Geometric Asymmetry in MoE Specialization: Functional Decorrelation and Representational Overlap

    cs.LG 2026-05 unverdicted novelty 7.0

    MoE experts in pretrained Transformers exhibit functional decorrelation with near-zero Jacobian alignment yet occupy partially overlapping representation subspaces, with routing sparsity modulating the geometry.