Robust Machine Translation with Domain Sensitive Pseudo-Sources: Baidu-OSU WMT19 MT Robustness Shared Task System Report

Baigong Zheng; Hairong Liu; Liang Huang; Mingbo Ma; Renjie Zheng

arxiv: 1906.08393 · v2 · pith:FZ75BTRDnew · submitted 2019-06-19 · 💻 cs.CL

Renjie Zheng, Hairong Liu, Mingbo Ma, Baigong Zheng, Liang Huang

Pith reviewed 2026-05-25 19:56 UTC · model grok-4.3

classification 💻 cs.CL

keywords machine translation · domain adaptation · social media · robustness · back-translation · pseudo data · WMT shared task

The pith

A domain-sensitive training approach with pseudo noisy sources improves English-French and French-English machine translation for social media by more than 10 BLEU points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper describes a system for translating social media text, which differs in style from standard parallel corpora such as news, contains various kinds of noise, and has very little parallel data available. The authors combine large amounts of parallel data from popular domains with the limited social media data using domain-sensitive training. They also create additional training data by back-translating monolingual data into pseudo noisy sources with a similarly trained domain-sensitive model. The reported result is a gain of more than 10 BLEU over the baselines in the WMT 2019 robustness task.

Core claim

The system leverages a domain sensitive training method to combine large out-of-domain parallel data with limited in-domain social media data, and generates pseudo noisy source sentences via back-translation from monolingual data using a similarly trained model, achieving more than 10 BLEU improvement in En-Fr and Fr-En translation compared to baseline methods.

What carries the argument

Domain sensitive training procedure that transfers knowledge from out-of-domain corpora and generates pseudo-sources through back-translation.
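The text above does not say how domain sensitivity is implemented. As a minimal sketch of one common way to do it, the Python below prepends a domain tag token to each source sentence and repeats the small in-domain set before shuffling. The tag strings `<news>` and `<social>`, the function names, and the `in_domain_copies` factor are assumptions made for illustration, not details taken from the paper.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_pairs(pairs: List[Pair], domain_tag: str) -> List[Pair]:
    """Prepend a domain tag token to every source sentence."""
    return [(f"{domain_tag} {src}", tgt) for src, tgt in pairs]


def mix_domains(
    out_domain: List[Pair],
    in_domain: List[Pair],
    in_domain_copies: int = 2,  # hypothetical oversampling factor
    seed: int = 0,
) -> List[Pair]:
    """Build one training corpus from tagged out-of-domain and in-domain data."""
    tagged = tag_pairs(out_domain, "<news>")
    # Repeat the scarce in-domain pairs so they are not drowned out.
    tagged += tag_pairs(in_domain, "<social>") * in_domain_copies
    random.Random(seed).shuffle(tagged)
    return tagged


# Toy sentences, invented for this example.
news = [("The parliament met today.", "Le parlement s'est réuni aujourd'hui.")]
social = [("omg thx sooo much!!", "oh mon dieu merci beaucoup !!")]
corpus = mix_domains(news, social, in_domain_copies=2)
```

At test time a model trained this way would be given the `<social>` tag on every input, so the tag acts as a switch for the target style while the shared parameters still benefit from the large out-of-domain corpus.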

If this is right

The method allows effective use of abundant out-of-domain data for low-resource noisy domains.
Back-translated pseudo noisy sources help model the noise characteristics of social media.
Performance gains hold for both translation directions between English and French.
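The back-translation step behind these points can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `build_pseudo_parallel`, `toy_reverse_model`, the lookup table, and the `<social>` tag are all hypothetical, and the toy model is a stand-in for a real target-to-source NMT system.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (pseudo source sentence, genuine target sentence)


def build_pseudo_parallel(
    monolingual_targets: List[str],
    reverse_translate: Callable[[str], str],
    domain_tag: str = "<social>",  # assumed tag requesting noisy-style output
) -> List[Pair]:
    """Pair each genuine target sentence with a back-translated pseudo source."""
    pairs = []
    for tgt in monolingual_targets:
        pseudo_src = reverse_translate(f"{domain_tag} {tgt}")
        if pseudo_src.strip():  # drop sentences the reverse model could not handle
            pairs.append((pseudo_src, tgt))
    return pairs


# Stand-in for a trained French-to-English model; a real system would decode here.
TOY_TABLE = {
    "merci beaucoup": {
        "<social>": "thx sooo much",
        "<news>": "thank you very much",
    },
}


def toy_reverse_model(tagged_sentence: str) -> str:
    """Look up a canned output for the given domain tag, or return ''."""
    tag, _, text = tagged_sentence.partition(" ")
    return TOY_TABLE.get(text, {}).get(tag, "")


pseudo = build_pseudo_parallel(["merci beaucoup", "phrase inconnue"], toy_reverse_model)
```

The target side of each pair stays human-written; only the source side is synthetic, which is why the noise style of the reverse model matters for the forward model's robustness.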

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar techniques could improve translation in other user-generated content domains such as forums or reviews.
The approach may reduce the need for large in-domain parallel corpora in new applications.
Further experiments could test the method's effectiveness on additional language pairs or noise types.

Load-bearing premise

The domain-sensitive training procedure can successfully transfer knowledge from large out-of-domain parallel corpora to the social media domain without introducing harmful mismatch or overfitting to the limited in-domain data.

What would settle it

A direct comparison showing that the proposed domain-sensitive method with pseudo-sources fails to produce more than 10 BLEU improvement over baselines on the En-Fr and Fr-En social media test sets would falsify the central claim.
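Stated as a check, the criterion requires the gain to exceed 10 BLEU in both directions, not on average. A minimal sketch, with the function name `claim_holds` and all scores chosen purely for illustration (the numbers are placeholders, not results from the paper):

```python
from typing import Dict


def claim_holds(
    baseline: Dict[str, float],
    system: Dict[str, float],
    min_gain: float = 10.0,
) -> bool:
    """True only if the system beats the baseline by more than `min_gain`
    BLEU in every translation direction listed in `baseline`."""
    return all(system[d] - baseline[d] > min_gain for d in baseline)


# Placeholder scores for illustration only; NOT taken from the paper.
baseline = {"en-fr": 20.0, "fr-en": 22.0}
passing = {"en-fr": 31.5, "fr-en": 33.0}
failing = {"en-fr": 31.5, "fr-en": 29.0}

print(claim_holds(baseline, passing))  # True
print(claim_holds(baseline, failing))  # False
```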

Original abstract

This paper describes the machine translation system developed jointly by Baidu Research and Oregon State University for WMT 2019 Machine Translation Robustness Shared Task. Translation of social media is a very challenging problem, since its style is very different from normal parallel corpora (e.g. News) and also include various types of noises. To make it worse, the amount of social media parallel corpora is extremely limited. In this paper, we use a domain sensitive training method which leverages a large amount of parallel data from popular domains together with a little amount of parallel data from social media. Furthermore, we generate a parallel dataset with pseudo noisy source sentences which are back-translated from monolingual data using a model trained by a similar domain sensitive way. We achieve more than 10 BLEU improvement in both En-Fr and Fr-En translation compared with the baseline methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a standard WMT shared-task system report that gets large BLEU gains on noisy social media MT by mixing domains and back-translating with a domain-sensitive model, but the abstract gives almost no experimental detail.

The main point is that this Baidu-OSU entry for the WMT19 robustness task reports more than 10 BLEU improvement on En-Fr and Fr-En by training on a mix of large out-of-domain parallel data and limited social media data, then generating pseudo noisy sources via back-translation from a similarly domain-sensitive model. The approach directly targets the style mismatch and noise that make social media translation hard when in-domain parallels are scarce.

What they do well is apply the domain-sensitive idea consistently to both the main model and the data augmentation step, which is a sensible practical extension of back-translation rather than a brand-new framework. The paper clearly states the problem and the constraints, and the claimed gains are large enough to be worth checking.

The soft spots are straightforward. The abstract supplies no data sizes, baseline definitions, ablation results, or significance tests, so the central claim cannot be judged from what is shown. Without those details it is impossible to know whether the domain sensitivity itself drives the improvement or whether the gains come from more data and tuning. As a system report the work is incremental; it combines established techniques without new theory or first-principles derivation.

This paper is for people who build or evaluate robust MT systems, especially for user-generated content or shared tasks. A reader already working on domain adaptation or back-translation might pick up a useful recipe, but it is unlikely to change how the field thinks about the problem. It deserves peer review because the task is real and the empirical result is presented as strong, even if the paper would need substantial revision for experimental transparency.

Referee Report

1 major / 0 minor

Summary. The manuscript describes the Baidu-OSU submission to the WMT 2019 MT Robustness Shared Task. It proposes domain-sensitive training that mixes large out-of-domain parallel corpora with limited social-media parallel data, plus generation of pseudo-noisy source sentences via back-translation from a similarly trained model, and reports more than 10 BLEU improvement over baselines for both En-Fr and Fr-En directions on social-media test data.

Significance. If the empirical gains prove robust under controlled ablations and statistical testing, the work would illustrate a practical, low-overhead recipe for transferring knowledge from abundant out-of-domain data to noisy, low-resource domains; such transfer methods remain relevant for production MT systems facing domain shift.

major comments (1)

[Abstract] The central claim of a '>10 BLEU improvement' is stated without defining the baseline systems, training hyperparameters, or data sizes, and without statistical significance tests, so the reported delta cannot be evaluated from the given text.
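For readers unfamiliar with the kind of significance test the comment asks for, paired bootstrap resampling is a common choice in MT evaluation. The sketch below is a simplification under stated assumptions: it compares sums of per-sentence scores, whereas BLEU is a corpus-level metric, so a real test would recompute corpus BLEU on each resample. The function name, arguments, and defaults are hypothetical.

```python
import random
from typing import Sequence


def paired_bootstrap(
    baseline_scores: Sequence[float],
    system_scores: Sequence[float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of resampled test sets on which the system outscores the baseline.

    Both sequences hold one score per test sentence, aligned by index.
    """
    if len(baseline_scores) != len(system_scores):
        raise ValueError("score lists must be aligned and of equal length")
    rng = random.Random(seed)
    n = len(baseline_scores)
    wins = 0
    for _ in range(n_resamples):
        # Draw sentence indices with replacement; reuse them for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        base_total = sum(baseline_scores[i] for i in idx)
        sys_total = sum(system_scores[i] for i in idx)
        if sys_total > base_total:
            wins += 1
    return wins / n_resamples
```

A win rate close to 1.0 suggests the improvement is unlikely to be an artifact of which test sentences happened to be sampled; a rate near 0.5 suggests the two systems are hard to tell apart on this test set.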

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive suggestion. We address the major comment below.

Referee: [Abstract] The central claim of a '>10 BLEU improvement' is stated without defining the baseline systems, training hyperparameters, or data sizes, and without statistical significance tests, so the reported delta cannot be evaluated from the given text.

Authors: We agree that the abstract, as a standalone summary, would be clearer with additional context on the baselines. In the revised version we will add a sentence specifying that the baseline is a standard Transformer NMT model trained only on the large out-of-domain News parallel corpora (approximately 40M sentence pairs), while our system additionally incorporates the small social-media parallel set (approximately 100k pairs) plus the generated pseudo-noisy data. We will also note that the >10 BLEU gain is measured on the official WMT 2019 robustness test sets and is consistent with the detailed ablations and data-size information already reported in Sections 3 and 4. Because of the strict length limit of the abstract, we cannot include full hyperparameter tables or significance tests there, but we will reference the relevant sections.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper is an empirical system report for a WMT shared task on robust MT. Its central claims consist of reported BLEU gains (>10 points) on En-Fr and Fr-En test sets obtained via domain-sensitive training on mixed corpora plus back-translated pseudo-sources. No equations, formal derivations, or fitted parameters are defined such that any reported quantity reduces to itself by construction. All performance numbers are external test-set measurements against baselines; the method description contains no self-definitional steps, self-citation load-bearing premises, or renaming of known results. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, background axioms, or newly postulated entities.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 12 internal anchors

[3] Timothy Baldwin, Paul Cook, Marco Lui, Andrew MacKinlay, and Li Wang. 2013. How noisy social media text, how diffrnt social media sources? In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 356–364.
[4] Yonatan Belinkov and Yonatan Bisk. 2017. Synthetic and natural noise both break neural machine translation. arXiv preprint arXiv:1711.02173.
[5] Denny Britz, Quoc Le, and Reid Pryzant. 2017. Effective domain mixing for neural machine translation. In Proceedings of the Second Conference on Machine Translation, pages 118–126.
[6] Chenhui Chu, Raj Dabre, and Sadao Kurohashi. 2017. An empirical comparison of simple domain adaptation methods for neural machine translation. arXiv preprint arXiv:1701.03214.
[7] Chenhui Chu and Rui Wang. 2018. A survey of domain adaptation for neural machine translation. arXiv preprint arXiv:1806.00258.
[8] Jacob Eisenstein. 2013. What to do about bad language on the internet. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 359–369.
[9] Liang Huang, Kai Zhao, and Mingbo Ma. 2017. When to finish? Optimal beam search for neural text generation (modulo beam size). In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2134–2139.
[10] G. Klein, Y. Kim, Y. Deng, J. Senellart, and A. M. Rush. 2017. OpenNMT: Open-Source Toolkit for Neural Machine Translation. ArXiv e-prints. http://arxiv.org/abs/1701.02810
[11] Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, et al. 2018. Phrase-based & neural unsupervised machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5039–5049.
[12] Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir K. Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan M. Pino, and Hassan Sajjad. 2019. Findings of the first shared task on machine translation robustness. In Proceedings of the Fourth Conference on Machine Translation, Volume 2: Shared Task Papers, Florence, Italy. Association ...
[13] Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, and Zhongjun He. 2019a. Robust neural machine translation with joint textual and phonetic embedding. ACL.
[14] Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2017. Adversarial multi-task learning for text classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1–10.
[15] Yuchen Liu, Hao Xiong, Zhongjun He, Jiajun Zhang, Hua Wu, Haifeng Wang, and Chengqing Zong. 2019b. End-to-end speech translation with knowledge distillation. arXiv preprint arXiv:1904.08075.
[16] Mingbo Ma, Liang Huang, Hao Xiong, Renjie Zheng, Kaibo Liu, Baigong Zheng, Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, Hua Wu, and Haifeng Wang. 2018. STACL: Simultaneous translation with integrated anticipation and controllable latency. ACL.
[17] Paul Michel and Graham Neubig. 2018. MTNT: A testbed for machine translation of noisy text. arXiv preprint arXiv:1809.00388.
[18] Matt Post. 2018. A call for clarity in reporting BLEU scores. arXiv preprint arXiv:1804.08771.
[19] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015a. Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709.
[20] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015b. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
[21] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Edinburgh neural machine translation systems for WMT 16. arXiv preprint arXiv:1606.02891.
[22] Sumeet Singh, Craig Stewart, Graham Neubig, et al. 2019. Improving robustness of machine translation with synthetic noise. arXiv preprint arXiv:1902.09508.
[23] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30.
[24] Rui Wang, Masao Utiyama, Lemao Liu, Kehai Chen, and Eiichiro Sumita. 2017. Instance weighting for neural machine translation domain adaptation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1482–1488.
[25] Xinyi Wang, Hieu Pham, Zihang Dai, and Graham Neubig. 2018. SwitchOut: an efficient data augmentation algorithm for neural machine translation. arXiv preprint arXiv:1808.07512.
[26] Ziang Xie, Guillaume Genthial, Stanley Xie, Andrew Ng, and Dan Jurafsky. 2018. Noising and denoising natural language: Diverse backtranslation for grammar correction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 619–628.
[27] Baigong Zheng, Renjie Zheng, Mingbo Ma, and Liang Huang. 2019. Simultaneous translation with flexible policy via restricted imitation learning. ACL.
[28] Renjie Zheng, Junkun Chen, and Xipeng Qiu. 2018a. Same representation, different attentions: shareable sentence representation learning from multiple tasks. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 4616–4622. AAAI Press.
[29] Renjie Zheng, Mingbo Ma, and Liang Huang. 2018b. Multi-reference training with pseudo-references for neural translation and text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3188–3197.
[30] Renjie Zheng, Yilin Yang, Mingbo Ma, and Liang Huang. 2018c. Ensemble sequence level training for multimodal MT: OSU-Baidu WMT18 multimodal machine translation system report. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 632–636.
[31] Long Zhou, Wenpeng Hu, Jiajun Zhang, and Chengqing Zong. 2017. Neural system combination for machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 378–384.
