SOAP runs Adam in the eigenbasis of Shampoo's preconditioner, cutting iterations by over 40% versus AdamW on 360M-660M language models while adding only one hyperparameter.
The best-performing run among all of these achieved a final loss of 3.12 while the best Shampoo run achieved a final loss of 3.10.
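To make "Adam in the eigenbasis of Shampoo's preconditioner" concrete, here is a minimal NumPy sketch for a single 2-D weight matrix. It is an illustration under stated assumptions, not the authors' implementation: the names `init_state`, `soap_like_step`, and `precond_freq` are hypothetical, bias correction is omitted, the eigenbasis is recomputed with a full `np.linalg.eigh` call, and `precond_freq` (how often the basis is refreshed) is assumed to play the role of the one extra hyperparameter.

```python
import numpy as np


def init_state(m, n):
    """Optimizer state for one m x n weight matrix (hypothetical layout)."""
    return {
        "t": 0,
        "L": np.zeros((m, m)),   # EMA of G @ G.T (left Shampoo factor)
        "R": np.zeros((n, n)),   # EMA of G.T @ G (right Shampoo factor)
        "QL": np.eye(m),         # eigenvectors of L
        "QR": np.eye(n),         # eigenvectors of R
        "M": np.zeros((m, n)),   # first moment, kept in the original basis
        "V": np.zeros((m, n)),   # second moment, kept in the rotated basis
    }


def soap_like_step(W, G, state, lr=0.05, beta1=0.9, beta2=0.99,
                   eps=1e-8, precond_freq=10):
    """One simplified SOAP-style update; returns the new weights."""
    state["t"] += 1

    # Shampoo-style factor statistics, tracked as exponential moving averages.
    state["L"] = beta2 * state["L"] + (1 - beta2) * (G @ G.T)
    state["R"] = beta2 * state["R"] + (1 - beta2) * (G.T @ G)

    # Refresh the eigenbases only every `precond_freq` steps (assumption:
    # a plain eigendecomposition is used here for simplicity).
    if state["t"] == 1 or state["t"] % precond_freq == 0:
        _, state["QL"] = np.linalg.eigh(state["L"])
        _, state["QR"] = np.linalg.eigh(state["R"])
    QL, QR = state["QL"], state["QR"]

    # Adam moments: momentum in the original basis, variance in the rotated one.
    state["M"] = beta1 * state["M"] + (1 - beta1) * G
    G_rot = QL.T @ G @ QR
    M_rot = QL.T @ state["M"] @ QR
    state["V"] = beta2 * state["V"] + (1 - beta2) * G_rot ** 2

    # Adam-style normalized step in the eigenbasis, rotated back afterwards.
    N_rot = M_rot / (np.sqrt(state["V"]) + eps)
    return W - lr * (QL @ N_rot @ QR.T)
```

With `QL` and `QR` fixed to identity matrices, the step reduces to an Adam update without bias correction, which is the sense in which the method is Adam applied in a rotated coordinate system.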
Cited by 1 Pith paper (field: cs.LG; year: 2024; verdict: ACCEPT). Representative citing paper:
SOAP: Improving and Stabilizing Shampoo using Adam