Early directional convergence in deep homogeneous neural networks for small initializations
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Neural networks exhibit condensation of neurons into clusters with similar outputs whose number increases monotonically during training, facilitated by small initializations or dropout, providing insights into generalization and reasoning.
citing papers explorer
- The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks
Gradient descent in deep networks implicitly drives features toward target-linear structure as captured by the weight Gram matrix and a derived virtual covariance.
- An overview of condensation phenomenon in deep learning
Neural networks exhibit condensation of neurons into clusters with similar outputs whose number increases monotonically during training, facilitated by small initializations or dropout, providing insights into generalization and reasoning.