SOAP runs Adam in the eigenbasis of Shampoo's preconditioner, cutting iterations by over 40% versus AdamW on 360M-660M language models while adding only one hyperparameter.
In the second sweep we observe small improvements in performance by using β2 = βshampoo = 0.99, so our final numbers use β2 = βshampoo = 0.99.
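The idea in the summary above, Adam run in the eigenbasis of Shampoo's preconditioner, can be sketched for a single 2-D weight matrix as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `soap_step`, the `state` dictionary layout, the learning rate, `beta1`, and the default `precond_freq` are all hypothetical, bias correction and weight decay are omitted, and the eigenbasis is recomputed with a full `numpy.linalg.eigh` call. The defaults `beta2 = beta_shampoo = 0.99` follow the values quoted above.

```python
import numpy as np

def soap_step(W, G, state, lr=3e-3, beta1=0.95, beta2=0.99,
              beta_shampoo=0.99, eps=1e-8, precond_freq=10):
    """One simplified SOAP-style update for weight matrix W with gradient G.

    `state` is a plain dict that is filled in on the first call.
    All names and defaults here are illustrative assumptions.
    """
    m, n = W.shape
    if not state:
        state.update(step=0,
                     M=np.zeros((m, n)),   # momentum, kept in the original basis
                     V=np.zeros((m, n)),   # second moment, kept in the rotated basis
                     L=np.zeros((m, m)),   # running average of G @ G.T
                     R=np.zeros((n, n)),   # running average of G.T @ G
                     QL=np.eye(m), QR=np.eye(n))

    # Shampoo-style statistics: exponential moving averages of the Gram matrices.
    state["L"] = beta_shampoo * state["L"] + (1 - beta_shampoo) * (G @ G.T)
    state["R"] = beta_shampoo * state["R"] + (1 - beta_shampoo) * (G.T @ G)

    # Refresh the eigenbases only every `precond_freq` steps to amortize the cost.
    # V is left untouched on refresh in this sketch.
    if state["step"] % precond_freq == 0:
        _, state["QL"] = np.linalg.eigh(state["L"])
        _, state["QR"] = np.linalg.eigh(state["R"])
    QL, QR = state["QL"], state["QR"]

    # Adam moments: rotate the gradient and momentum into the eigenbasis.
    state["M"] = beta1 * state["M"] + (1 - beta1) * G
    G_rot = QL.T @ G @ QR
    M_rot = QL.T @ state["M"] @ QR
    state["V"] = beta2 * state["V"] + (1 - beta2) * G_rot ** 2

    # Adam-style normalized step in the eigenbasis, rotated back to weight space.
    update = QL @ (M_rot / (np.sqrt(state["V"]) + eps)) @ QR.T
    state["step"] += 1
    return W - lr * update
```

Calling `soap_step(W, grad, state)` repeatedly with the same `state` dict carries the moments and eigenbases across steps. The refresh interval `precond_freq` trades the cost of the eigendecompositions against how stale the basis is allowed to become.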
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2024 (1)
Verdicts: ACCEPT (1)
Representative citing papers
- SOAP: Improving and Stabilizing Shampoo using Adam