We train all models with weight decay 1e−5, as suggested in (Tan & Le, 2019), but we reduce the learning rate to 0.016 because the models tend to diverge at higher values.

Published as a conference paper at ICLR 2021

representative citing papers

Sharpness-Aware Minimization for Efficiently Improving Generalization cs.LG · 2020-10-03 · conditional · none · ref 56
SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.