
arxiv: 2106.13736 · v2 · pith:ZZCJIX4Bnew · submitted 2021-06-25 · 💻 cs.CL

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

classification 💻 cs.CL
keywords pretrained · encoders · deltalm · generation · language · tasks · translation · encoder-decoder

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both the large-scale monolingual data and bilingual data, we adopt the span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks, including machine translation, abstractive text summarization, data-to-text, and question generation. The code and pretrained models are available at https://aka.ms/deltalm.
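The two pre-training tasks named in the abstract can be illustrated with a small sketch. This is not the released DeltaLM code: the function names, the `<mask_i>` sentinel format, and the `</s>` separator are assumptions made for illustration, and the spans are passed in explicitly rather than sampled. The sketch assumes the common formulation in which span corruption replaces each chosen span with a sentinel and asks the decoder to reproduce the removed tokens, and in which translation span corruption applies the same procedure to a concatenated parallel sentence pair.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, length) span with a sentinel token.

    Returns (corrupted_input, target). The target lists every sentinel
    followed by the tokens it replaced. Spans are assumed not to overlap.
    The "<mask_i>" sentinel format is a hypothetical choice.
    """
    corrupted, target = [], []
    cursor = 0
    for i, (start, length) in enumerate(sorted(spans)):
        sentinel = f"<mask_{i}>"
        corrupted.extend(tokens[cursor:start])       # keep text before the span
        corrupted.append(sentinel)                   # hide the span itself
        target.append(sentinel)
        target.extend(tokens[start:start + length])  # decoder must recover it
        cursor = start + length
    corrupted.extend(tokens[cursor:])                # keep the remaining tail
    return corrupted, target


def translation_span_corrupt(src_tokens, tgt_tokens, spans, sep="</s>"):
    """Join a parallel pair with a separator, then corrupt the joined sequence.

    The separator token is an assumption; span positions index into the
    concatenated sequence src + [sep] + tgt.
    """
    return span_corrupt(src_tokens + [sep] + tgt_tokens, spans)


# Monolingual example: mask "quick brown".
mono_in, mono_out = span_corrupt("the quick brown fox jumps".split(), [(1, 2)])
# mono_in  == ['the', '<mask_0>', 'fox', 'jumps']
# mono_out == ['<mask_0>', 'quick', 'brown']

# Bilingual example: mask one token on each side of the pair.
bi_in, bi_out = translation_span_corrupt(
    ["hello", "world"], ["hallo", "welt"], [(1, 1), (3, 1)]
)
# bi_in  == ['hello', '<mask_0>', '</s>', '<mask_1>', 'welt']
# bi_out == ['<mask_0>', 'world', '<mask_1>', 'hallo']
```

In the bilingual case a masked token on one side can be recovered from its translation on the other side, which is how a task of this shape can make use of parallel data.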



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation

    cs.CL 2026-04 unverdicted novelty 7.0

    MERIT combines token prefixing, fine-tuning, and reward-guided group optimization to outperform model scaling for Chinese-centric low-resource machine translation.

  2. Adding Robust Code-Switching Capabilities to High Performance Multilingual ASR

    cs.CL 2026-06 unverdicted novelty 5.0

    Proposes Bayesian factorized adaptation for multilingual ASR to handle code-switching, reporting 32.87% fewer errors on switched words and 5.31% better overall WER while preserving monolingual accuracy with small synt...