
We train all models with weight decay 1e−5 as suggested in (Tan & Le, 2019), but we reduce the learning rate to 0.016 because the models tend to diverge at higher values.
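The two hyperparameters quoted above can be written down as a small configuration. The following is a minimal sketch only: the quoted sentence does not name the optimizer, so plain SGD with L2 weight decay is assumed here purely for illustration, and the names `TrainConfig` and `sgd_step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Both values are taken from the quoted sentence.
    weight_decay: float = 1e-5
    learning_rate: float = 0.016


def sgd_step(weights, grads, cfg):
    """Apply one plain SGD update with L2 weight decay.

    Assumption: the source does not state the optimizer; SGD is used
    here only to show where each hyperparameter enters the update.
    """
    return [
        w - cfg.learning_rate * (g + cfg.weight_decay * w)
        for w, g in zip(weights, grads)
    ]


cfg = TrainConfig()
# Toy weights and gradients, chosen only to exercise the update rule.
updated = sgd_step([1.0, -2.0], [0.5, 0.0], cfg)
```

In this sketch the weight decay term adds `weight_decay * w` to each gradient, and the learning rate scales the whole step, which is why a large learning rate can make training diverge.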

1 Pith paper cites this work. Polarity classification is still indexing.


fields: cs.LG (1)
years: 2020 (1)
verdicts: CONDITIONAL (1)
