The LAMB optimizer trains BERT with a batch size of 32868, reducing training time to 76 minutes on a TPUv3 Pod without loss of performance.
Published as a conference paper at ICLR 2020.

Figure 7: This figure shows the training loss curve of the LAMB optimizer
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
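The summary does not spell out how LAMB differs from Adam. As a rough sketch, assuming the commonly described form of the update (Adam-style bias-corrected moments, decoupled weight decay, and a per-layer trust ratio with the scaling function taken as the identity), one step for a single layer might look like the following. The function name `lamb_layer_step` and the default hyperparameters are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import List, Tuple


def lamb_layer_step(
    w: List[float],
    g: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-6,
    weight_decay: float = 0.01,
) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical sketch of one LAMB step for one layer.

    Returns (new_w, new_m, new_v). `t` is the 1-based step count.
    """
    # Adam-style exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias correction for the zero-initialised moments.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Adam direction plus decoupled weight decay.
    update = [
        mh / (math.sqrt(vh) + eps) + weight_decay * wi
        for mh, vh, wi in zip(m_hat, v_hat, w)
    ]
    # Layer-wise trust ratio ||w|| / ||update|| (scaling function assumed to be identity).
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    u_norm = math.sqrt(sum(ui * ui for ui in update))
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    # Apply the rescaled update to this layer's weights.
    new_w = [wi - lr * trust * ui for wi, ui in zip(w, update)]
    return new_w, m, v


# Example: one step on a two-parameter layer with weight norm 5.
new_w, m, v = lamb_layer_step(
    [3.0, 4.0], [0.1, -0.2], [0.0, 0.0], [0.0, 0.0], t=1, weight_decay=0.0
)
print(new_w)
```

Because the update is rescaled by ||w|| / ||update||, each layer moves by a distance of lr · ||w|| per step (when both norms are nonzero), independent of the raw gradient magnitude. This per-layer normalisation is the intuition usually given for why LAMB tolerates very large batch sizes.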