
In the second sweep, we observe small performance improvements from setting β2 = βshampoo = 0.99, so our final numbers use β2 = βshampoo = 0.99.

1 Pith paper cites this work.


representative citing papers

SOAP: Improving and Stabilizing Shampoo using Adam

cs.LG · 2024-09-17 · accept · novelty 8.0 · ref 12

SOAP runs Adam in the eigenbasis of Shampoo's preconditioner, cutting iterations by over 40% relative to AdamW on 360M- to 660M-parameter language models while adding only one hyperparameter.
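As a rough illustration of what "Adam in the eigenbasis of Shampoo's preconditioner" can mean, here is a minimal NumPy sketch of one update for a single weight matrix. It is not SOAP's reference implementation. The names `SoapLikeState` and `soap_like_step`, the refresh interval `precond_freq`, β1 = 0.95, recomputing the bases with `np.linalg.eigh`, and keeping the first moment in original coordinates while the second moment lives in the rotated ones are all assumptions made for illustration; β2 = βshampoo = 0.99 follows the setting quoted at the top of this page.

```python
import numpy as np


class SoapLikeState:
    """Hypothetical per-matrix state for a SOAP-style step (illustrative sketch only)."""

    def __init__(self, shape, beta1=0.95, beta2=0.99, beta_shampoo=0.99,
                 eps=1e-8, precond_freq=10):
        m, n = shape
        self.beta1, self.beta2, self.beta_shampoo = beta1, beta2, beta_shampoo
        self.eps, self.precond_freq = eps, precond_freq
        self.L = np.zeros((m, m))   # Shampoo left factor: EMA of G @ G.T
        self.R = np.zeros((n, n))   # Shampoo right factor: EMA of G.T @ G
        self.QL = np.eye(m)         # eigenbasis of L
        self.QR = np.eye(n)         # eigenbasis of R
        self.M = np.zeros(shape)    # Adam first moment, original coordinates (assumption)
        self.V = np.zeros(shape)    # Adam second moment, rotated coordinates (assumption)
        self.t = 0


def soap_like_step(W, G, state, lr=3e-3):
    """Return updated weights after one Adam step taken in Shampoo's eigenbasis."""
    s = state
    s.t += 1
    # EMA of Shampoo's Kronecker factors, controlled by beta_shampoo.
    s.L = s.beta_shampoo * s.L + (1 - s.beta_shampoo) * (G @ G.T)
    s.R = s.beta_shampoo * s.R + (1 - s.beta_shampoo) * (G.T @ G)
    # Refresh the eigenbases only occasionally, since eigendecomposition is costly.
    # Note: V is not re-projected when the basis changes in this sketch.
    if s.t == 1 or s.t % s.precond_freq == 0:
        _, s.QL = np.linalg.eigh(s.L)
        _, s.QR = np.linalg.eigh(s.R)
    # First moment in original space, then rotate gradient and moment into the eigenbasis.
    s.M = s.beta1 * s.M + (1 - s.beta1) * G
    G_rot = s.QL.T @ G @ s.QR
    M_rot = s.QL.T @ s.M @ s.QR
    s.V = s.beta2 * s.V + (1 - s.beta2) * G_rot ** 2
    # Standard Adam bias correction.
    m_hat = M_rot / (1 - s.beta1 ** s.t)
    v_hat = s.V / (1 - s.beta2 ** s.t)
    # Adam direction in the eigenbasis, rotated back to the original coordinates.
    step_rot = m_hat / (np.sqrt(v_hat) + s.eps)
    return W - lr * (s.QL @ step_rot @ s.QR.T)
```

On the first step the two eigenbases coincide with the singular vectors of `G`, so (for distinct singular values) the update is approximately `lr` times the orthogonal polar factor of `G`; later steps blend in the accumulated moments and only occasionally refreshed bases.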