Exploring the impact of temperature scaling in softmax for classification and adversarial robustness
2 Pith papers cite this work.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification
  The optimal softmax temperature is determined analytically from the feature dimensionality and adjusted using fitted coefficients and batch normalization, for classification that is robust across models and domains.
- Sinkhorn doubly stochastic attention rank decay analysis
  Sinkhorn-normalized (doubly stochastic) attention preserves rank better than row-stochastic softmax attention, although in both cases the rank decays to one doubly exponentially with network depth.
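To make the terms in these summaries concrete, here is a minimal pure-Python sketch under stated assumptions. It is an illustration only, not code from the cited work or from either citing paper. `softmax_rows` applies the standard temperature-scaled softmax, softmax(z / T), to each row of a score matrix, so every row sums to 1 (row-stochastic). `sinkhorn` applies Sinkhorn-Knopp iterations, alternately normalizing the rows and columns of exp(z / T). For a square matrix with positive entries, this converges toward a doubly stochastic matrix. The function names, the fixed iteration count of 50, and the example scores are hypothetical choices. The sketch does not implement the analytical temperature rule or the rank-decay analysis described in the papers.

```python
import math


def softmax_rows(scores, temperature=1.0):
    """Row-wise softmax of scores / temperature (temperature > 0).

    Each output row sums to 1 (row-stochastic); column sums are unconstrained.
    """
    out = []
    for row in scores:
        scaled = [s / temperature for s in row]
        m = max(scaled)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sinkhorn(scores, temperature=1.0, n_iters=50):
    """Sinkhorn-Knopp normalization of exp(scores / temperature) (temperature > 0).

    Alternately rescales rows and columns so each sums to 1. For a square matrix
    with strictly positive entries this converges toward a doubly stochastic matrix.
    n_iters=50 is an illustrative fixed budget, not a tuned value.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    m = max(max(row) for row in scores) / temperature  # global max for stability
    mat = [[math.exp(s / temperature - m) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: divide each row by its sum.
        row_sums = [sum(row) for row in mat]
        mat = [[v / row_sums[i] for v in row] for i, row in enumerate(mat)]
        # Column step: divide each column by its sum.
        col_sums = [sum(mat[i][j] for i in range(n_rows)) for j in range(n_cols)]
        mat = [[mat[i][j] / col_sums[j] for j in range(n_cols)] for i in range(n_rows)]
    return mat


def row_and_col_sums(mat):
    """Return (row sums, column sums) of a matrix given as a list of lists."""
    return [sum(r) for r in mat], [sum(c) for c in zip(*mat)]


# Hypothetical 3x3 attention scores used only for illustration.
scores = [[0.0, 0.5, 1.0],
          [1.0, 0.0, 0.5],
          [0.2, 0.9, 0.1]]

for name, mat in [("softmax", softmax_rows(scores)), ("sinkhorn", sinkhorn(scores))]:
    rows, cols = row_and_col_sums(mat)
    print(name, "rows:", [round(r, 3) for r in rows], "cols:", [round(c, 3) for c in cols])
```

Running it prints the row and column sums of each result. The softmax output has unit row sums, but its column sums drift away from 1, whereas the Sinkhorn output has row and column sums that are both close to 1. Raising the temperature pushes both outputs toward the uniform matrix; lowering it makes each softmax row closer to one-hot.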