Understanding Back-Translation at Scale

David Grangier; Michael Auli; Myle Ott; Sergey Edunov

arxiv: 1808.09381 · v2 · pith:MTUNM4IVnew · submitted 2018-08-28 · 💻 cs.CL


classification 💻 cs.CL

keywords: data, sentences, synthetic, back-translation, back-translations, beam, effective, monolingual


Abstract

An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target-language sentences. This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences. We find that in all but resource-poor settings, back-translations obtained via sampling or noised beam outputs are most effective. Our analysis shows that sampling or noisy synthetic data gives a much stronger training signal than data generated by beam or greedy search. We also compare synthetic data with genuine bitext and study various domain effects. Finally, we scale to hundreds of millions of monolingual sentences and achieve a new state of the art of 35 BLEU on the WMT'14 English-German test set.
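The "noised beam outputs" idea in the abstract can be illustrated with a minimal Python sketch. It assumes a word-level noise model (random deletion, replacement by a filler token, and a local shuffle) applied to the output of a back-translation system. The names `noise_tokens`, `make_synthetic_pairs`, and `back_translate`, the filler token `<BLANK>`, and the default probabilities and shift window are all illustrative assumptions. None of them are stated in the abstract.

```python
import random


def noise_tokens(tokens, p_drop=0.1, p_blank=0.1, max_shift=3,
                 filler="<BLANK>", rng=None):
    """Apply word-level noise to a token list (hypothetical helper).

    Each token is dropped with probability p_drop, replaced by `filler`
    with probability p_blank, and otherwise kept. The surviving tokens
    are then locally shuffled so that no token moves more than
    `max_shift` positions.
    """
    if rng is None:
        rng = random.Random()
    kept = []
    for tok in tokens:
        r = rng.random()
        if r < p_drop:
            continue                      # delete the word
        if r < p_drop + p_blank:
            kept.append(filler)           # replace the word with a filler
        else:
            kept.append(tok)              # keep the word unchanged
    # Local shuffle: sort by (index + uniform noise in [0, max_shift]).
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


def make_synthetic_pairs(target_sentences, back_translate, noise=None):
    """Build (synthetic source, genuine target) training pairs.

    `back_translate` is an assumed callable mapping a target-language
    sentence to a list of source-language tokens, e.g. the beam or
    sampled output of a reverse translation model.
    """
    pairs = []
    for tgt in target_sentences:
        src_tokens = back_translate(tgt)
        if noise is not None:
            src_tokens = noise(src_tokens)
        pairs.append((" ".join(src_tokens), tgt))
    return pairs
```

In use, the resulting pairs would be concatenated with the genuine parallel corpus before training the forward model. Passing `noise=None` corresponds to using the back-translation output unchanged.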


Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Language Models are Few-Shot Learners
cs.CL · 2020-05 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
cs.LG · 2019-10 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...

Model-Agnostic Meta Learning for Class Imbalance Adaptation
cs.CL · 2026-04 · conditional · novelty 5.0

HAMR combines meta-learning with hardness-aware weighting and neighborhood resampling to improve minority-class performance on imbalanced NLP datasets.

Movie Gen: A Cast of Media Foundation Models
cs.CV · 2024-10 · unverdicted · novelty 5.0

A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
cs.CL · 2019-07 · unverdicted · novelty 5.0

A single multilingual NMT model for 103 languages trained on 25B examples demonstrates transfer learning benefits for low-resource languages.

Enhancing Scientific Discourse: Machine Translation for the Scientific Domain
cs.CL · 2026-05 · conditional · novelty 4.0

Development of domain-specific scientific corpora for English-Spanish, English-French, and English-Portuguese and their application to fine-tuning NMT models.

Data-Centric Foundation Models in Computational Healthcare: A Survey
cs.LG · 2024-01 · unverdicted · novelty 3.0

The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.