Early directional convergence in deep homogeneous neural networks for small initializations

[KH24b] Akshay Kumar, Jarvis Haupt · 2024 · arXiv 2403.08121

2 Pith papers cite this work.

representative citing papers

The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Gradient descent in deep networks implicitly drives features toward target-linear structure as captured by the weight Gram matrix and a derived virtual covariance.

An overview of condensation phenomenon in deep learning

cs.LG · 2025-04-13 · unverdicted · novelty 2.0

Neural networks exhibit condensation of neurons into clusters with similar outputs whose number increases monotonically during training, facilitated by small initializations or dropout, providing insights into generalization and reasoning.

citing papers explorer

The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks

cs.LG · 2026-05-07 · unverdicted · none · ref 24

An overview of condensation phenomenon in deep learning

cs.LG · 2025-04-13 · unverdicted · none · ref 8
