Pre-trained Language Model Representations for Language Generation

Alexei Baevski; Michael Auli; Sergey Edunov

arxiv: 1903.09722 · v2 · pith:M2LDPC6Tnew · submitted 2019-03-22 · 💻 cs.CL

classification 💻 cs.CL

keywords language, pre-trained, representations, abstractive, machine, model, sequence, summarization

Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN/DailyMail.
