For fine-tuning, we use Adam (Kingma & Ba) as the optimizer with weight decay (Loshchilov & Hutter, 2018).

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

cs.CL · 2021-11-18 · accept · novelty 6.0

DeBERTaV3 improves DeBERTa by switching to replaced token detection pre-training and using gradient-disentangled embedding sharing, reaching 91.37% on GLUE and new SOTA on XNLI zero-shot.

