Race: Large-scale reading comprehension dataset from examinations

Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy · 2017

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

cs.CL · 2020-06-05 · unverdicted · novelty 7.0

DeBERTa improves BERT-style models by separating content and relative position in attention and adding absolute positions to the decoder, yielding consistent gains on NLU and NLG tasks and the first single-model superhuman score on SuperGLUE.

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

cs.CL · 2021-11-18 · accept · novelty 6.0

DeBERTaV3 improves DeBERTa by switching to replaced token detection pre-training and using gradient-disentangled embedding sharing, reaching 91.37% on GLUE and new SOTA on XNLI zero-shot.

An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

cs.CL · 2023-08-17 · unverdicted · novelty 5.0

Empirical tests show LLMs from 1B to 7B parameters exhibit catastrophic forgetting during continual instruction tuning, with forgetting severity increasing with scale and decoder-only models retaining more than encoder-decoder models.

citing papers explorer

Showing 3 of 3 citing papers.

DeBERTa: Decoding-enhanced BERT with Disentangled Attention cs.CL · 2020-06-05 · unverdicted · none · ref 16
DeBERTa improves BERT-style models by separating content and relative position in attention and adding absolute positions to the decoder, yielding consistent gains on NLU and NLG tasks and the first single-model superhuman score on SuperGLUE.
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing cs.CL · 2021-11-18 · accept · none · ref 12
DeBERTaV3 improves DeBERTa by switching to replaced token detection pre-training and using gradient-disentangled embedding sharing, reaching 91.37% on GLUE and new SOTA on XNLI zero-shot.
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning cs.CL · 2023-08-17 · unverdicted · none · ref 9
Empirical tests show LLMs from 1B to 7B parameters exhibit catastrophic forgetting during continual instruction tuning, with forgetting severity increasing with scale and decoder-only models retaining more than encoder-decoder models.

Race: Large-scale reading comprehension dataset from examinations

fields

years

verdicts

representative citing papers

citing papers explorer