Title resolution pending
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Representative citing paper: Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification
Citing papers explorer:
- The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks
  Gradient descent in deep networks implicitly drives features toward target-linear structure, as captured by the weight Gram matrix and a derived virtual covariance.
- Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification
  The optimal softmax temperature is determined analytically from feature dimensionality, then adjusted by fitted coefficients and batch normalization, for model- and domain-robust classification.