
arXiv: 1904.02338 · v2 · pith:2VCRL2UDnew · submitted 2019-04-04 · 💻 cs.LG · cs.CL · cs.NE · stat.ML

Consistency by Agreement in Zero-shot Neural Machine Translation

classification 💻 cs.LG · cs.CL · cs.NE · stat.ML
keywords translation, zero-shot, models, multilingual, often, training, agreement-based, consistency

Generalization and reliability of multilingual translation often highly depend on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We test our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often results in 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions.
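The agreement idea described in the abstract can be illustrated with a toy sketch. It is not the paper's implementation, and several assumptions are made here: a hypothetical "model" gives, for each source sentence, independent per-position probability distributions over a tiny auxiliary-language vocabulary (a real NMT decoder is autoregressive); greedy decoding stands in for whatever search the paper actually uses; and the helper names greedy_decode, log_prob and agreement_loss are illustrative only. For a parallel pair (x_en, x_fr), each side's decoded auxiliary-language translation is scored under the other side's distribution, so the term is low when both sources induce the same translation.

```python
import math
from typing import List, Sequence

# Hypothetical toy "model" output: for one source sentence, a list of
# per-position probability distributions over a tiny auxiliary-language
# vocabulary. Positions are treated as independent (a simplifying assumption).
Dist = List[List[float]]


def greedy_decode(dist: Dist) -> List[int]:
    """Pick the most probable token at each position (stand-in for greedy/beam search)."""
    return [max(range(len(p)), key=p.__getitem__) for p in dist]


def log_prob(dist: Dist, tokens: Sequence[int]) -> float:
    """Log-probability of a token sequence under the toy per-position model."""
    return sum(math.log(p[t]) for p, t in zip(dist, tokens))


def agreement_loss(dist_from_en: Dist, dist_from_fr: Dist) -> float:
    """Symmetric agreement term for a parallel pair (x_en, x_fr).

    Each side's decoded auxiliary translation is scored under the other
    side's distribution; the loss is the negated sum of both log-probabilities.
    In gradient-based training the decoded sequences would be treated as
    constants (this sketch computes only the loss value).
    """
    z_from_fr = greedy_decode(dist_from_fr)
    z_from_en = greedy_decode(dist_from_en)
    return -(log_prob(dist_from_en, z_from_fr) + log_prob(dist_from_fr, z_from_en))


# Usage: two sources that agree on the auxiliary translation vs. two that do not.
agree_en = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
agree_fr = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
disagree_fr = [[0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]

print(agreement_loss(agree_en, agree_fr))     # lower: both decode to [0, 1]
print(agreement_loss(agree_en, disagree_fr))  # higher: decodings differ
```

In a full training setup, a term like this would presumably be combined with the usual supervised likelihood on the available parallel pairs; how the paper weights and combines the terms is not stated on this page.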

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models

    cs.CL 2019-06 unverdicted novelty 4.0

    Task-specific decoder parameters outperform fully shared decoder parameters in both supervised and zero-shot multilingual translation performance.

  2. Improving Zero-shot Translation with Language-Independent Constraints

    cs.CL 2019-06 unverdicted novelty 4.0

    Language-independent constraints and regularization in multilingual Transformer NMT yield a 2.23 BLEU average gain on zero-shot pairs from the IWSLT 2017 dataset.