CUNI System for the WMT19 Robustness Task
Pith reviewed 2026-05-25 18:50 UTC · model grok-4.3
The pith
A Transformer model trained on clean news data is already far more robust to noisy input than an LSTM baseline, and can be further improved by fine-tuning on noisy data without harming clean-domain quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.
What carries the argument
CUNI Transformer neural machine translation model fine-tuned on noisy in-domain data.
If this is right
- Transformer models exhibit greater robustness to noisy input than LSTM models in machine translation.
- Fine-tuning on noisy data raises accuracy on noisy test sets.
- News-domain performance remains stable after the noisy fine-tuning step.
Where Pith is reading between the lines
- The same fine-tuning recipe might transfer to other noisy-input sequence tasks such as speech recognition or summarization.
- If confirmed across more noise distributions, it would reduce the need for separate robustness modules in production MT systems.
- Comparing the effect on different noise types (typos, ASR errors, code-switching) would test how general the observed robustness is.
Load-bearing premise
Fine-tuning on noisy data leaves translation quality on clean news data unchanged.
What would settle it
Measuring a drop in BLEU on a standard news test set after the noisy-data fine-tuning step would show that the no-influence claim does not hold.
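The check described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation pipeline: it assumes whitespace tokenization, a single reference per sentence, and unsmoothed corpus-level BLEU, so its scores are not comparable to officially reported WMT numbers. The function names `corpus_bleu` and `news_quality_drop` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale.

    Assumptions (illustrative only): whitespace tokens, one reference
    per hypothesis, uniform weights over n-gram orders 1..max_n.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((h_ng & r_ng).values())
            totals[n - 1] += sum(h_ng.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


def news_quality_drop(references, before, after):
    """BLEU before fine-tuning minus BLEU after; positive means degradation."""
    return corpus_bleu(before, references) - corpus_bleu(after, references)
```

A positive value returned by `news_quality_drop` on a news test set would count against the no-influence claim; whether a given drop is meaningful still depends on test-set size and score variance.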
Original abstract
We present our submission to the WMT19 Robustness Task. Our baseline system is the Charles University (CUNI) Transformer system trained for the WMT18 shared task on News Translation. Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports the CUNI submission to the WMT19 Robustness Task. The baseline is the authors' WMT18 News Translation Transformer system, which is stated to be substantially more robust to noisy input than the task-provided LSTM baseline. Fine-tuning this model on in-domain noisy data is reported to yield further robustness gains while leaving translation quality on the news domain unchanged.
Significance. If the reported scores are accurate and the evaluation protocol is sound, the work supplies a concrete empirical data point on the relative robustness of Transformer versus LSTM architectures in MT and on the utility of targeted fine-tuning for robustness without clean-domain degradation. This is useful for practitioners building production MT systems that must handle noisy input.
major comments (1)
- [Abstract] Final sentence: the central claim that fine-tuning on noisy data leaves news-domain quality unchanged rests on an unstated verification procedure, test-set split, and before/after metric values. Without these details the claim, which is load-bearing for the paper's contribution, cannot be assessed.
minor comments (2)
- The manuscript should include the actual BLEU (or other metric) values, standard deviations if available, and the exact news-domain test set used for the 'no degradation' claim, preferably in a dedicated results table.
- Clarify the source and size of the 'in-domain noisy data' used for fine-tuning and whether any overlap exists with the robustness test sets.
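The overlap question in the last comment can be checked mechanically. The sketch below is an assumption about how one might do it, not a procedure described in the paper: it treats two sentences as identical when they match after lowercasing and whitespace normalization, and the names `normalize` and `overlap_report` are hypothetical.

```python
def normalize(sentence):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(sentence.lower().split())


def overlap_report(train_sentences, test_sentences):
    """Return the test sentences that also occur in the fine-tuning data,
    together with the fraction of the test set they make up."""
    train = {normalize(s) for s in train_sentences}
    shared = [s for s in test_sentences if normalize(s) in train]
    rate = len(shared) / len(test_sentences) if test_sentences else 0.0
    return shared, rate
```

Exact matching of this kind only finds verbatim duplicates; near-duplicates would need a fuzzier comparison.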
Simulated Author's Rebuttal
We thank the referee for the constructive comment on the abstract. We address the point below and will revise accordingly.
- Referee: [Abstract] Final sentence: the central claim that fine-tuning on noisy data leaves news-domain quality unchanged rests on an unstated verification procedure, test-set split, and before/after metric values. Without these details the claim, which is load-bearing for the paper's contribution, cannot be assessed.
Authors: We agree that the abstract claim requires explicit supporting details to be assessable. The verification used the WMT18 news test set and the BLEU metric; before/after scores will be reported in the revised abstract and methods section. This is a straightforward addition that does not alter the empirical findings.
Revision: yes
Circularity Check
No significant circularity
The paper is a standard empirical system description for a shared task. It reports observed BLEU scores on robustness test sets for a pre-existing Transformer baseline (from WMT18) and after fine-tuning on noisy data. No equations, derivations, predictions, or first-principles claims exist. The baseline citation is descriptive only and does not support any load-bearing theoretical step. All central claims reduce directly to measured performance numbers on external test data.