LAMB optimizer trains BERT with batch size 32868, reducing training time to 76 minutes on TPUv3 Pod without performance loss.
LAMB optimizer is able to achieve 94.08% test accuracy in 24 epochs, which is better than other adaptive optimizers and momentum SGD
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2019 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
LAMB optimizer trains BERT with batch size 32868, reducing training time to 76 minutes on TPUv3 Pod without performance loss.