Published as a conference paper at ICLR 2020. Figure 7: This figure shows the training loss curve of LAMB optimizer

Based on our comprehensive tuning results, we conclude the existing adaptive solvers do not perform well on ImageNet training or at least it is hard to tune them · 2020

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

cs.LG · 2019-04-01 · conditional · novelty 6.0

LAMB optimizer trains BERT with batch size 32868, reducing training time to 76 minutes on TPUv3 Pod without performance loss.

citing papers explorer

Showing 1 of 1 citing paper.

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

cs.LG · 2019-04-01 · conditional · none · ref 27

LAMB optimizer trains BERT with batch size 32868, reducing training time to 76 minutes on TPUv3 Pod without performance loss.
