DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

Alexandre Muzio; Dongdong Zhang; Furu Wei; Hany Hassan Awadalla; Li Dong; Saksham Singhal; Shaohan Huang; Shuming Ma; Xia Song

arxiv: 2106.13736 · v2 · pith:ZZCJIX4Bnew · submitted 2021-06-25 · 💻 cs.CL

classification 💻 cs.CL

keywords pretrained, encoders, DeltaLM, generation, language, tasks, translation, encoder-decoder

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both the large-scale monolingual data and bilingual data, we adopt the span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks, including machine translation, abstractive text summarization, data-to-text, and question generation. The code and pretrained models are available at https://aka.ms/deltalm.
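The two pre-training tasks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The sentinel format `<mask_i>`, the function names, and the fixed (non-random) span positions are assumptions made for illustration. The abstract does not specify span lengths or masking rates, and the sketch assumes that translation span corruption applies the same masking to a concatenated translation pair.

```python
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Replace each non-overlapping [start, end) span with a sentinel token.

    Returns (source, target): the source keeps the unmasked tokens plus
    sentinels; the target lists each sentinel followed by the tokens it hides.
    The "<mask_i>" sentinel naming is hypothetical.
    """
    source: List[str] = []
    target: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<mask_{i}>"
        source.extend(tokens[cursor:start])  # copy text before the span
        source.append(sentinel)              # hide the span in the input
        target.append(sentinel)              # the decoder must recover it
        target.extend(tokens[start:end])
        cursor = end
    source.extend(tokens[cursor:])           # copy any trailing text
    return source, target


def translation_span_corrupt(
    src_tokens: Sequence[str],
    tgt_tokens: Sequence[str],
    spans: Sequence[Tuple[int, int]],
) -> Tuple[List[str], List[str]]:
    """Assumed variant: corrupt spans over a concatenated translation pair."""
    return span_corrupt(list(src_tokens) + list(tgt_tokens), spans)


if __name__ == "__main__":
    mono = "the cat sat on the mat".split()
    print(span_corrupt(mono, [(1, 2), (4, 6)]))
    print(translation_span_corrupt(["hello", "world"], ["bonjour", "monde"], [(1, 2), (3, 4)]))
```

In an actual pre-training setup the spans would be sampled randomly rather than passed in; fixing them here keeps the example deterministic.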

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation
cs.CL 2026-04 unverdicted novelty 7.0

MERIT combines token prefixing, fine-tuning, and reward-guided group optimization to outperform model scaling for Chinese-centric low-resource machine translation.

Adding Robust Code-Switching Capabilities to High Performance Multilingual ASR
cs.CL 2026-06 unverdicted novelty 5.0

Proposes Bayesian factorized adaptation for multilingual ASR to handle code-switching, reporting 32.87% fewer errors on switched words and 5.31% better overall WER while preserving monolingual accuracy with small synt...