Improving Zero-shot Translation with Language-Independent Constraints

Alex Waibel; Jan Niehues; Ngoc-Quan Pham; Thanh-Le Ha

arxiv: 1906.08584 · v1 · pith:H4EQOZ5Knew · submitted 2019-06-20 · 💻 cs.CL

Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, Alex Waibel

Pith reviewed 2026-05-25 19:43 UTC · model grok-4.3

classification 💻 cs.CL

keywords zero-shot translation · multilingual NMT · Transformer regularization · language-independent constraints · IWSLT 2017 · neural machine translation

The pith

Regularization constraints make multilingual NMT models robust for zero-shot translation between unseen language pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the zero-shot translation ability of multilingual neural machine translation models, where systems must handle language pairs absent from training data. It first tests an encoder built to be independent of the source language, revealing how models can learn shared multilingual representations. From this, the authors develop regularization methods applied to the Transformer that enforce language independence throughout the model. These changes produce an average 2.23 BLEU gain across 12 language pairs on the IWSLT 2017 dataset relative to a strong multilingual baseline, with gains holding even when multiple pivots are involved. A reader would care because the approach supplies a direct alternative to pivot-based translation and clarifies cross-language information flow inside the network.

Core claim

By first constructing a source-language-independent encoder and then introducing regularization methods that enforce language independence in the standard Transformer, the model becomes robust under zero-shot conditions and delivers an average improvement of 2.23 BLEU points across 12 language pairs on the IWSLT 2017 multilingual dataset compared with the zero-shot performance of a state-of-the-art multilingual system; the same effect is confirmed for language pairs that require multiple intermediate pivots.

What carries the argument

Language-independent constraints realized as regularization methods that encourage the production of representations independent of any specific language.
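
A constraint of this kind can be pictured as an auxiliary penalty added to the usual translation loss. The sketch below is illustrative only: it assumes the penalty is a mean squared error between mean-pooled hidden states of a source sentence and of its parallel target sentence, in line with the "MSE loss on attention/decoder states" passage quoted further down this page. The function names, the pooling step, and the `weight` parameter are assumptions, not the paper's stated formulation.

```python
from typing import List

Vector = List[float]


def mean_pool(states: List[Vector]) -> Vector:
    """Average a sequence of hidden-state vectors over time steps."""
    if not states:
        raise ValueError("need at least one hidden state")
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def regularized_loss(
    translation_loss: float,
    src_states: List[Vector],
    tgt_states: List[Vector],
    weight: float = 1.0,  # hypothetical trade-off hyperparameter
) -> float:
    """Translation loss plus a penalty on the distance between the pooled
    representations of the same sentence in two languages (assumed form)."""
    penalty = mse(mean_pool(src_states), mean_pool(tgt_states))
    return translation_loss + weight * penalty
```

Pooling over time sidesteps the length mismatch between a sentence and its translation. When the two pooled representations coincide, the penalty vanishes and the objective reduces to the plain translation loss.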

If this is right

The full architecture becomes more robust under zero-shot conditions.
Gains persist for language pairs that require multiple intermediate pivots.
The method supplies a direct alternative to pivot translation.
It yields clearer insight into how the model captures information across languages.
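
The contrast between pivot translation and direct zero-shot translation can be sketched as follows. This is a minimal illustration, assuming a hypothetical `translate(text, src, tgt)` callable that stands in for any multilingual model; it is not an API from the paper.

```python
from typing import Callable

# Hypothetical signature: translate(text, source_language, target_language).
Translate = Callable[[str, str, str], str]


def pivot_translate(
    translate: Translate, text: str, src: str, tgt: str, pivot: str = "en"
) -> str:
    """Two decoding passes: source -> pivot, then pivot -> target."""
    intermediate = translate(text, src, pivot)
    return translate(intermediate, pivot, tgt)


def zero_shot_translate(translate: Translate, text: str, src: str, tgt: str) -> str:
    """One decoding pass on a direction the model never saw in training."""
    return translate(text, src, tgt)
```

The pivot route pays for two decoding passes and can compound errors from the first pass, which is why a robust direct route is attractive.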

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same regularization approach could be tested on other multilingual sequence tasks such as classification or generation.
If the constraints generalize, training data requirements for covering many language pairs could be reduced.
Explicit independence penalties may prove useful in other encoder-decoder architectures beyond translation.

Load-bearing premise

The regularization methods enforce genuine language independence that improves zero-shot performance without hurting accuracy on language pairs seen during training.

What would settle it

Applying the same regularization methods to a different multilingual dataset and observing no consistent BLEU gains on its unseen language pairs would falsify the central claim.

Figures

Figures reproduced from arXiv: 1906.08584 by Alex Waibel, Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha.

**Figure 2.** Three different constraints for language-independent decoders. The model is run twice as translation...

**Figure 3.** The STAR setup (left) with English as the...

Original abstract

An important concern in training multilingual neural machine translation (NMT) is to translate between language pairs unseen during training, i.e., zero-shot translation. Improving this ability kills two birds with one stone by providing an alternative to pivot translation which also allows us to better understand how the model captures information between languages. In this work, we carried out an investigation on this capability of the multilingual NMT models. First, we intentionally create an encoder architecture which is independent with respect to the source language. Such experiments shed light on the ability of NMT encoders to learn multilingual representations, in general. Based on such proof of concept, we were able to design regularization methods into the standard Transformer model, so that the whole architecture becomes more robust in zero-shot conditions. We investigated the behaviour of such models on the standard IWSLT 2017 multilingual dataset. We achieved an average improvement of 2.23 BLEU points across 12 language pairs compared to the zero-shot performance of a state-of-the-art multilingual system. Additionally, we carry out further experiments in which the effect is confirmed even for language pairs with multiple intermediate pivots.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper adds regularization to push a multilingual Transformer toward language-independent behavior and reports a 2.23 BLEU zero-shot lift on IWSLT 2017.

The main thing to know is that they get a clear average gain of 2.23 BLEU across 12 zero-shot pairs by adding regularization that encourages language independence in the Transformer, on top of a multilingual baseline. They first test an explicitly source-language-independent encoder as a proof of concept, then carry the idea over to regularization terms in the standard model. The extra checks with multiple intermediate pivots are a useful addition and show the effect is not limited to the simplest cases. The work stays on public IWSLT 2017 data, so the numbers are at least comparable to other papers in the area.

This is straightforward empirical work that targets a real practical problem in multilingual NMT. The regularization approach is presented as a new design choice rather than a direct lift from earlier papers, and the reported improvement is the central result. The abstract leaves out the exact formulation of the regularization and any ablation tables, so it is hard to judge how much of the gain traces to the new terms versus other tuning choices. IWSLT covers a limited set of mostly European languages, which means the zero-shot pairs may not be the hardest test cases. It would also help to see the supervised-direction numbers to confirm there is no hidden trade-off.

Readers who build or tune multilingual systems will find the concrete method and the measured numbers worth checking. The paper shows clear thinking on the problem and honest engagement with the zero-shot setting, so it deserves referee time even if the gains turn out to be modest once the details are examined.

Referee Report

2 major / 2 minor

Summary. The paper investigates zero-shot translation in multilingual NMT. It first constructs a source-language-independent encoder as a proof of concept, then introduces regularization methods into the standard Transformer to promote language-independent representations. Experiments on the IWSLT 2017 multilingual dataset report an average gain of 2.23 BLEU on 12 zero-shot pairs relative to a state-of-the-art multilingual baseline, with further confirmation on pairs requiring multiple pivots.

Significance. If the gains prove robust under proper controls and ablations, the work offers a practical, data-free route to better zero-shot performance in multilingual NMT. The empirical focus on a public dataset and the explicit comparison to a strong baseline are strengths; the approach could reduce reliance on pivot translation for low-resource directions.

major comments (2)

[Abstract and Results] The abstract reports a 2.23 BLEU average gain but supplies no details on the precise regularization formulation, the exact language-pair splits used for training vs. zero-shot evaluation, or statistical significance of the improvements. These elements are load-bearing for the central empirical claim and must be presented with full training details and ablation tables.
[Experiments] It is unclear whether the reported supervised-pair performance remains unchanged or degrades after regularization; any claim that the method enforces language independence without harming seen directions requires explicit before/after numbers on the supervised directions.

minor comments (2)

[Methods] Notation for the regularization terms should be introduced consistently and tied to the equations in the methods section.
[Table 1 or equivalent] The paper should include a clear table listing all 12 zero-shot pairs, their pivot status, and the exact baseline system used for comparison.
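
The count of 12 zero-shot pairs is consistent with an English-centric training setup over five languages, and the requested table could start from an enumeration like the one below. The language list is an assumption taken from the IWSLT 2017 multilingual task (English, German, Dutch, Italian, Romanian); this page itself does not list the languages or the training directions.

```python
from itertools import permutations
from typing import List, Tuple

# Assumed language set and pivot; not stated on this page.
LANGUAGES = ["en", "de", "nl", "it", "ro"]
PIVOT = "en"

Direction = Tuple[str, str]


def split_directions(
    languages: List[str], pivot: str
) -> Tuple[List[Direction], List[Direction]]:
    """Split all directed pairs into supervised (touching the pivot)
    and zero-shot (pivot on neither side)."""
    supervised, zero_shot = [], []
    for src, tgt in permutations(languages, 2):
        if pivot in (src, tgt):
            supervised.append((src, tgt))
        else:
            zero_shot.append((src, tgt))
    return supervised, zero_shot


supervised, zero_shot = split_directions(LANGUAGES, PIVOT)
print(len(supervised), len(zero_shot))  # 8 12
```

Under these assumptions, four non-English languages give 4 × 3 = 12 directed zero-shot pairs, alongside 8 supervised directions to and from English.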

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation of minor revision. We address each major comment below and will update the manuscript to incorporate the requested clarifications and additional results.

Referee: [Abstract and Results] The abstract reports a 2.23 BLEU average gain but supplies no details on the precise regularization formulation, the exact language-pair splits used for training vs. zero-shot evaluation, or statistical significance of the improvements. These elements are load-bearing for the central empirical claim and must be presented with full training details and ablation tables.

Authors: We agree that the abstract is too concise and that the central claims require more supporting detail. In the revision we will expand the abstract to briefly describe the regularization formulation and the training/zero-shot splits. We will also add a dedicated subsection with full hyperparameter and training details, complete ablation tables, and statistical significance results computed via paired bootstrap resampling over the test sets.

revision: yes

Referee: [Experiments] It is unclear whether the reported supervised-pair performance remains unchanged or degrades after regularization; any claim that the method enforces language independence without harming seen directions requires explicit before/after numbers on the supervised directions.

Authors: We acknowledge that the manuscript does not currently report supervised-direction results before and after regularization. We will add a table comparing BLEU scores on all supervised pairs for the baseline multilingual model versus the regularized models. If the numbers show any degradation, we will discuss it explicitly; otherwise we will note that performance is preserved within statistical noise.

revision: yes
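
The paired bootstrap resampling mentioned in the first response can be sketched as below. This is a simplified illustration, not the authors' procedure: it assumes one score per test sentence for each system and compares mean scores, whereas a BLEU-based test would recompute corpus-level BLEU on every resample. The function name and parameters are hypothetical.

```python
import random
from typing import Sequence


def paired_bootstrap(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    num_samples: int = 1000,
    seed: int = 0,
) -> float:
    """Return the fraction of resampled test sets on which system B's
    mean score is strictly higher than system A's."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(num_samples):
        # Draw sentence indices with replacement; both systems are scored
        # on the same resampled sentences, which is what makes the test paired.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_b > mean_a:
            wins += 1
    return wins / num_samples
```

A win fraction close to 1.0 suggests the improvement is unlikely to be an artifact of the particular test sentences; one minus that fraction is commonly read as an approximate p-value.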

Circularity Check

0 steps flagged

No significant circularity

The paper is an empirical study that trains multilingual NMT models on IWSLT 2017, introduces regularization for language independence, and reports measured BLEU gains on zero-shot pairs. No derivation chain, equations, or first-principles results are claimed; the central result is an observed average +2.23 BLEU improvement that can be checked against the public dataset and baseline. No self-citation load-bearing steps, fitted inputs renamed as predictions, or ansatz smuggling appear in the provided text. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/LogicAsFunctionalEquation.lean · SatisfiesLawsOfLogic / Translation Theorem · unclear

Paper passage: We achieve an average improvement of 2.23 BLEU points across 12 language pairs... by designing regularization methods into the standard Transformer model

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel / Jcost functional equation · unclear

Paper passage: MSE loss on attention/decoder states to force language-independent representations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 14 internal anchors

[3] Maruan Al-Shedivat and Ankur P Parikh. 2019. Consistency by agreement in zero-shot neural machine translation. arXiv preprint arXiv:1904.02338
[4] Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, and Wolfgang Macherey. 2019. The missing ingredient in zero-shot neural machine translation. arXiv preprint arXiv:1903.07091
[5] D. Bahdanau, K. Cho, and Y. Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473
[6] Mauro Cettolo, Marcello Federico, Luisa Bentivogli, Jan Niehues, Sebastian Stüker, Katsuhito Sudoh, Koichiro Yoshino, and Christian Federmann. 2017. Overview of the IWSLT 2017 evaluation campaign. In International Workshop on Spoken Language Translation, pages 2–14
[7] Yun Chen, Yang Liu, Yong Cheng, and Victor OK Li. 2017. A teacher-student framework for zero-resource neural machine translation. arXiv preprint arXiv:1705.00753
[8] Yun Chen, Yang Liu, and Victor OK Li. 2018. Zero-resource neural machine translation with multi-agent communication game. In Thirty-Second AAAI Conference on Artificial Intelligence
[9] Raj Dabre, Fabien Cromieres, and Sadao Kurohashi. 2017. Kyoto University MT system description for IWSLT 2017. Proc. of IWSLT, Tokyo, Japan
[10] Tobias Domhan and Felix Hieber. 2017. Using target-side monolingual data for neural machine translation through multi-task learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1500–1505
[11] Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016. Multi-way, multilingual neural machine translation with a shared attention mechanism. arXiv preprint arXiv:1601.01073
[12] Yarin Gal and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1019–1027
[13] Jiatao Gu, Hany Hassan, Jacob Devlin, and Victor OK Li. 2018. Universal neural machine translation for extremely low resource languages. arXiv preprint arXiv:1802.05368
[14] Thanh-Le Ha, Jan Niehues, and Alexander Waibel. 2016. Toward multilingual neural machine translation with universal encoder and decoder. In Proceedings of the 13th International Workshop on Spoken Language Translation (IWSLT 2016), Seattle, USA
[15] Thanh-Le Ha, Jan Niehues, and Alexander Waibel. 2017. Effective strategies in zero-shot neural machine translation. arXiv preprint arXiv:1711.07893
[16] Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. 2016. Dual learning for machine translation. In Advances in Neural Information Processing Systems, pages 820–828
[17] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. B. Viegas, M. Wattenberg, G. Corrado, M. Hughes, and J. Dean. 2016. Google's multilingual neural machine translation system: Enabling zero-shot translation. CoRR, abs/1611.04558
[18] Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In EMNLP, volume 3, page 413
[19] Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. 2016. Neural machine translation in linear time. arXiv preprint arXiv:1610.10099
[20] Yunsu Kim, Yingbo Gao, and Hermann Ney. 2019. Effective cross-lingual transfer of neural machine translation models without shared vocabularies. In Proceedings of the Annual Meeting on Association for Computational Linguistics (ACL 2019)
[21] Yichao Lu, Phillip Keung, Faisal Ladhak, Vikas Bhardwaj, Shaonan Zhang, and Jason Sun. 2018. A neural interlingua for multilingual machine translation. arXiv preprint arXiv:1804.08198
[22] Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025
[23] Jan Niehues and Eunah Cho. 2017. Exploiting linguistic resources for neural machine translation using multi-task learning. In Proceedings of the Second Conference on Machine Translation, pages 80–89
[24] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch
[25] Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, and Tom Mitchell. 2018. Contextual parameter generation for universal neural machine translation. arXiv preprint arXiv:1808.08493
[26] Holger Schwenk and Matthijs Douze. 2017. Learning joint multilingual sentence representations with neural machine translation. arXiv preprint arXiv:1704.04154
[27] I. Sutskever, O. Vinyals, and Q. V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, pages 3104–3112, Quebec, Canada
[28] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762
[29] Rui Wang, Andrew Finch, Masao Utiyama, and Eiichiro Sumita. 2017. Sentence embedding for neural machine translation domain adaptation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 560–566, Vancouver, Canada. Association for Computational Li... https://doi.org/10.18653/v1/P17-2089
[30] Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. 2016. Transfer learning for low-resource neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), pages 1568–1575, Austin, USA
