Robust Machine Translation with Domain Sensitive Pseudo-Sources: Baidu-OSU WMT19 MT Robustness Shared Task System Report
Pith reviewed 2026-05-25 19:56 UTC · model grok-4.3
The pith
A domain-sensitive training approach with pseudo noisy sources improves English-French and French-English machine translation for social media by more than 10 BLEU points.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The system uses a domain-sensitive training method to combine large out-of-domain parallel data with limited in-domain social media data. It also generates pseudo-noisy source sentences by back-translating monolingual data with a similarly trained model. Together these yield a gain of more than 10 BLEU over the baseline methods in both En-Fr and Fr-En translation.
What carries the argument
A domain-sensitive training procedure that transfers knowledge from out-of-domain corpora, together with pseudo-sources generated through back-translation by a model trained in a similar way.
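The summary here does not spell out how the training is made domain sensitive. As a minimal sketch, assuming one common approach (prepending a domain tag to each source sentence and oversampling the small in-domain set), the data mixing could look like the following. The tag strings, the `mix_corpora` helper, and the oversampling factor are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

# Hypothetical domain tags; the paper's actual mechanism may differ.
OUT_TAG = "<news>"
IN_TAG = "<social>"


def tag_pairs(pairs: List[Pair], tag: str) -> List[Pair]:
    """Prepend a domain tag to the source side of every pair."""
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]


def mix_corpora(out_domain: List[Pair], in_domain: List[Pair],
                oversample: int = 2, seed: int = 0) -> List[Pair]:
    """Tag both corpora, repeat the small in-domain set, and shuffle."""
    tagged_out = tag_pairs(out_domain, OUT_TAG)
    tagged_in = tag_pairs(in_domain, IN_TAG) * oversample
    mixed = tagged_out + tagged_in
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    news = [("The parliament met today.", "Le parlement s'est réuni aujourd'hui.")]
    social = [("omg thx so much!!", "oh merci beaucoup !!")]
    for src, tgt in mix_corpora(news, social):
        print(src, "=>", tgt)
```

Oversampling is only one way to keep the small in-domain set from being swamped; instance weighting or fine-tuning would be alternatives under the same assumption.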
If this is right
- The method allows effective use of abundant out-of-domain data for low-resource noisy domains.
- Back-translated pseudo noisy sources help model the noise characteristics of social media.
- Performance gains hold for both translation directions between English and French.
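The back-translation step behind the second point can be sketched as follows, assuming a target-to-source model is available as a plain callable. `reverse_translate` and `toy_reverse_model` are hypothetical stand-ins for illustration, not an API from the paper or from any toolkit; a real system would call a trained translation model there.

```python
from typing import Callable, Iterable, List, Tuple


def build_pseudo_parallel(
    mono_target: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each clean target sentence with a machine-generated source.

    The target side stays human-written; only the source side is synthetic.
    """
    pairs = []
    for tgt in mono_target:
        tgt = tgt.strip()
        if not tgt:
            continue  # skip blank lines
        pseudo_src = reverse_translate(tgt)
        pairs.append((pseudo_src, tgt))
    return pairs


def toy_reverse_model(sentence: str) -> str:
    """Toy stand-in that only imitates noisy output; it does not translate."""
    return sentence.lower().replace("you", "u")


if __name__ == "__main__":
    mono = ["Thank you very much.", "", "See you tomorrow."]
    for src, tgt in build_pseudo_parallel(mono, toy_reverse_model):
        print(src, "=>", tgt)
```

The design point is that any noise lands on the source side, so the model still learns to produce clean target text.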
Where Pith is reading between the lines
- Similar techniques could improve translation in other user-generated content domains such as forums or reviews.
- The approach may reduce the need for large in-domain parallel corpora in new applications.
- Further experiments could test the method's effectiveness on additional language pairs or noise types.
Load-bearing premise
The domain-sensitive training procedure can successfully transfer knowledge from large out-of-domain parallel corpora to the social media domain without introducing harmful mismatch or overfitting to the limited in-domain data.
What would settle it
A direct comparison showing that the proposed domain-sensitive method with pseudo-sources fails to produce more than 10 BLEU improvement over baselines on the En-Fr and Fr-En social media test sets would falsify the central claim.
Original abstract
This paper describes the machine translation system developed jointly by Baidu Research and Oregon State University for WMT 2019 Machine Translation Robustness Shared Task. Translation of social media is a very challenging problem, since its style is very different from normal parallel corpora (e.g. News) and also include various types of noises. To make it worse, the amount of social media parallel corpora is extremely limited. In this paper, we use a domain sensitive training method which leverages a large amount of parallel data from popular domains together with a little amount of parallel data from social media. Furthermore, we generate a parallel dataset with pseudo noisy source sentences which are back-translated from monolingual data using a model trained by a similar domain sensitive way. We achieve more than 10 BLEU improvement in both En-Fr and Fr-En translation compared with the baseline methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the Baidu-OSU submission to the WMT 2019 MT Robustness Shared Task. It proposes domain-sensitive training that mixes large out-of-domain parallel corpora with limited social-media parallel data, plus generation of pseudo-noisy source sentences via back-translation from a similarly trained model, and reports more than 10 BLEU improvement over baselines for both En-Fr and Fr-En directions on social-media test data.
Significance. If the empirical gains prove robust under controlled ablations and statistical testing, the work would illustrate a practical, low-overhead recipe for transferring knowledge from abundant out-of-domain data to noisy, low-resource domains; such transfer methods remain relevant for production MT systems facing domain shift.
major comments (1)
- [Abstract] The central claim of a '>10 BLEU improvement' is stated without defining the baseline systems, training hyperparameters, or data sizes, and without statistical significance tests, so the reported delta cannot be evaluated from the abstract alone.
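One standard way to address the significance part of this comment is paired bootstrap resampling over the test set. The sketch below is illustrative only: it uses per-sentence scores and their mean as a stand-in metric, whereas a real BLEU comparison would recompute corpus-level BLEU on each resample. The function name and its parameters are assumptions, not something reported in the paper.

```python
import random
from typing import Sequence


def paired_bootstrap_win_rate(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of resampled test sets on which system A outscores system B."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Sample sentence indices with replacement, shared by both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples
```

A win rate close to 1.0 for the proposed system over the baseline would suggest the gain is unlikely to be an artifact of which test sentences happened to be sampled.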
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive suggestion. We address the major comment below.
- Referee: [Abstract] The central claim of a '>10 BLEU improvement' is stated without defining the baseline systems, training hyperparameters, or data sizes, and without statistical significance tests, so the reported delta cannot be evaluated from the abstract alone.
Authors: We agree that the abstract, as a standalone summary, would be clearer with additional context on the baselines. In the revised version we will add a sentence specifying that the baseline is a standard Transformer NMT model trained only on the large out-of-domain News parallel corpora (approximately 40M sentence pairs), while our system additionally incorporates the small social-media parallel set (approximately 100k pairs) plus the generated pseudo-noisy data. We will also note that the >10 BLEU gain is measured on the official WMT 2019 robustness test sets and is consistent with the detailed ablations and data-size information already reported in Sections 3 and 4. Because of the strict length limit of the abstract we cannot include full hyper-parameter tables or significance tests there, but we will reference the relevant sections.
Revision: yes
Circularity Check
No significant circularity
The paper is an empirical system report for a WMT shared task on robust MT. Its central claims consist of reported BLEU gains (>10 points) on En-Fr and Fr-En test sets obtained via domain-sensitive training on mixed corpora plus back-translated pseudo-sources. No equations, formal derivations, or fitted parameters are defined such that any reported quantity reduces to itself by construction. All performance numbers are external test-set measurements against baselines; the method description contains no self-definitional steps, self-citation load-bearing premises, or renaming of known results. The derivation chain is therefore self-contained and non-circular.