The Missing Ingredient in Zero-Shot Neural Machine Translation

Ankur Bapna; Melvin Johnson; Naveen Arivazhagan; Orhan Firat; Roee Aharoni; Wolfgang Macherey

arXiv: 1903.07091 · v1 · pith:ZLIM5NNN · submitted 2019-03-17 · cs.CL · cs.AI · cs.LG

Abstract

Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French-German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.
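The abstract's central idea, an auxiliary loss on the encoder that pulls the representations of a sentence and its translation toward each other, can be illustrated with a minimal pure-Python sketch. It assumes per-token encoder states arrive as lists of floats, mean-pools them into one sentence vector, and uses cosine distance as the alignment penalty, added to the translation loss with a hypothetical weight `weight`. The function names are illustrative only, and the paper's exact loss, pooling, and weighting may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(states: Sequence[Vector]) -> Vector:
    """Average per-token encoder states into a single sentence vector."""
    if not states:
        raise ValueError("need at least one encoder state")
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cos(u, v): 0 for same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def alignment_loss(src_states: Sequence[Vector], tgt_states: Sequence[Vector]) -> float:
    """Hypothetical auxiliary loss: distance between pooled encodings of a translation pair."""
    return cosine_distance(mean_pool(src_states), mean_pool(tgt_states))


def total_loss(
    translation_loss: float,
    src_states: Sequence[Vector],
    tgt_states: Sequence[Vector],
    weight: float = 1.0,
) -> float:
    """Usual translation loss plus the weighted encoder alignment penalty."""
    return translation_loss + weight * alignment_loss(src_states, tgt_states)


if __name__ == "__main__":
    # Orthogonal pooled encodings are maximally misaligned for this toy case.
    print(total_loss(2.0, [[1.0, 0.0]], [[0.0, 1.0]], weight=0.5))
```

Matching pooled encodings give an alignment loss near 0 and orthogonal ones give 1, so minimizing the combined objective pushes translations of the same sentence toward the same region of the encoder's space, which is the kind of language invariance the abstract describes.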

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models
cs.CL 2019-06 unverdicted novelty 4.0

Task-specific decoder parameters outperform fully shared decoder parameters in both supervised and zero-shot multilingual translation performance.

Improving Zero-shot Translation with Language-Independent Constraints
cs.CL 2019-06 unverdicted novelty 4.0

Language-independent constraints and regularization in multilingual Transformer NMT yield a 2.23 BLEU average gain on zero-shot pairs from the IWSLT 2017 dataset.