Available: http://arxiv.org/abs/2305.01939
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Towards the Connection between Activation Sparsity and Flat Minima
MLP activation sparsity equals augmented flatness divided by input norm times gradient; the ratio falls during training and can be reduced further by three plug-and-play changes, yielding higher sparsity on ImageNet and C4.