The best-performing run among all of these achieved a final loss of 3.12, while the best Shampoo run achieved a final loss of 3.10.

We did a cross product sweep over learning rate ( 3

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

SOAP: Improving and Stabilizing Shampoo using Adam

cs.LG · 2024-09-17 · accept · novelty 8.0 · ref 17

SOAP runs Adam in the eigenbasis of Shampoo's preconditioner, cutting iterations by over 40% versus AdamW on 360M-660M language models while adding only one hyperparameter.
