StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44–2.14× speedups in FP8 LLM pretraining.
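SignSGD updates each coordinate by the sign of its gradient, which keeps the direction but discards the magnitude, so sign(g) is a biased estimate of g. The title points to unbiasedness as the fix. The sketch below contrasts the plain SignSGD step with the classic unbiased stochastic-sign estimator: given a bound B >= |g|, return +1 with probability (1 + g/B)/2 and -1 otherwise, so that B times its expectation equals g. This is a hypothetical illustration of unbiased sign stochasticity in general. It is not StoSignSGD's "structural" construction, which this summary does not describe. The bound `bound` and all function names are assumptions.

```python
import random


def sign(x: float) -> float:
    """Deterministic sign used by SignSGD (sign(0) is taken as 0)."""
    return float((x > 0) - (x < 0))


def stochastic_sign(g: float, bound: float, rng: random.Random) -> float:
    """Unbiased stochastic sign (hypothetical stand-in, NOT StoSignSGD's method).

    Returns +1 with probability (1 + g/bound)/2 and -1 otherwise, so
    bound * E[stochastic_sign(g)] == g whenever |g| <= bound.
    """
    if bound <= 0 or abs(g) > bound:
        raise ValueError("require bound > 0 and |g| <= bound")
    p_plus = 0.5 * (1.0 + g / bound)
    return 1.0 if rng.random() < p_plus else -1.0


def signsgd_step(x, grad, lr):
    """SignSGD update, coordinate-wise: x <- x - lr * sign(grad)."""
    return [xi - lr * sign(gi) for xi, gi in zip(x, grad)]


def stochastic_signsgd_step(x, grad, lr, bound, rng):
    """Same update with the unbiased stochastic sign in place of sign()."""
    return [xi - lr * stochastic_sign(gi, bound, rng) for xi, gi in zip(x, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    g, bound, n = 0.3, 1.0, 100_000
    # The deterministic sign loses magnitude: sign(0.3) is 1.0, not 0.3.
    print("sign(g):", sign(g))
    # A Monte Carlo average of bound * stochastic_sign(g) should land near g.
    est = bound * sum(stochastic_sign(g, bound, rng) for _ in range(n)) / n
    print("unbiased estimate of g:", est)
```

The stochastic version trades bias for variance. Each output is still only ±1, so it keeps the one-bit-per-coordinate property that makes sign methods attractive, but averaged over many steps it tracks the true gradient rather than only its sign.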
Beyond first-order methods, gradient-free approaches have also been proposed and analyzed for this challenging setting [Lin et al., 2022, Chen et al., 2023a, Liu et al., 2024d,e].
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models