pith. machine review for the scientific record.

arxiv: 1907.04829 · v1 · submitted 2019-07-10 · 💻 cs.CL

Recognition: unknown

BAM! Born-Again Multi-Task Networks for Natural Language Understanding

Authors on Pith: no claims yet
classification: 💻 cs.CL
keywords: multi-task, single-task, model, distillation, method, networks, training, address
Original abstract

It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.
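The teacher-annealing objective described in the abstract lends itself to a compact sketch. Below is a minimal PyTorch illustration of the idea, assuming the target is a mixture that shifts linearly from the teacher's predictions to the gold labels over training; function and variable names are illustrative, not from the authors' released code.

```python
import torch
import torch.nn.functional as F

def teacher_annealing_loss(
    student_logits: torch.Tensor,   # [batch, num_classes], multi-task student
    teacher_logits: torch.Tensor,   # [batch, num_classes], single-task teacher
    gold_labels: torch.Tensor,      # [batch], integer class ids
    step: int,
    total_steps: int,
) -> torch.Tensor:
    # Anneal weight: 0 (pure distillation) -> 1 (pure supervised learning).
    lam = step / total_steps
    gold_one_hot = F.one_hot(gold_labels, num_classes=student_logits.size(-1)).float()
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    # Mixed target: lambda * gold + (1 - lambda) * teacher predictions.
    target = lam * gold_one_hot + (1.0 - lam) * teacher_probs
    log_probs = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy against the soft mixed target.
    return -(target * log_probs).sum(dim=-1).mean()
```

In the paper's setup this kind of loss would replace plain cross-entropy during multi-task fine-tuning of BERT; the sketch only shows the mixing schedule, not the multi-task batching or the teacher inference pipeline.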

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. OPT: Open Pre-trained Transformer Language Models

    cs.CL · 2022-05 · unverdicted · novelty 7.0

    OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

  2. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

    cs.CL · 2019-09 · accept · novelty 7.0

    ALBERT reduces BERT parameters via embedding factorization and layer sharing, adds inter-sentence coherence pretraining, and reaches SOTA on GLUE, RACE, and SQuAD with fewer parameters than BERT-large.

  3. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

    cs.CL · 2019-05 · accept · novelty 6.0

    SuperGLUE is a new benchmark with more difficult language understanding tasks, a software toolkit, and a public leaderboard to drive further progress beyond GLUE.