CUNI System for the WMT19 Robustness Task

Jindřich Helcl; Jindřich Libovický; Martin Popel

arxiv: 1906.09246 · v1 · pith:JZLE5QSTnew · submitted 2019-06-21 · 💻 cs.CL


Pith reviewed 2026-05-25 18:50 UTC · model grok-4.3

classification 💻 cs.CL

keywords machine translation · robustness · transformer · noisy input · fine-tuning · WMT19


The pith

A Transformer model trained on clean news data is already far more robust to noisy input than an LSTM baseline, and can be further improved by fine-tuning on noisy data without harming clean-domain quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports the CUNI submission to the WMT19 Robustness Task in machine translation. Their existing Transformer system, trained only on clean news data for a prior shared task, already outperforms the task's LSTM baseline on noisy inputs by a wide margin. Additional fine-tuning on in-domain noisy data raises performance on the robustness test set while leaving news-domain translation quality unchanged.

Core claim

Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.

What carries the argument

CUNI Transformer neural machine translation model fine-tuned on noisy in-domain data.

If this is right

Transformer models exhibit greater robustness to noisy input than LSTM models in machine translation.
Fine-tuning on noisy data raises accuracy on noisy test sets.
News-domain performance remains stable after the noisy fine-tuning step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fine-tuning recipe might transfer to other noisy-input sequence tasks such as speech recognition or summarization.
If confirmed across more noise distributions, it would reduce the need for separate robustness modules in production MT systems.
Comparing the effect on different noise types (typos, ASR errors, code-switching) would test how general the observed robustness is.
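One way to start on the noise-type comparison is to generate controlled synthetic typos and score the same model at several noise rates. The sketch below is illustrative only and is not taken from the paper: the function names `swap_adjacent` and `add_typos`, the adjacent-character swap as the noise model, and the `rate` and `seed` parameters are all assumptions.

```python
import random


def swap_adjacent(word: str, rng: random.Random) -> str:
    """Swap two adjacent inner characters, keeping the first and last in place."""
    if len(word) < 4:
        return word  # too short to have two inner characters
    i = rng.randrange(1, len(word) - 2)  # index of the first swapped character
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def add_typos(sentence: str, rate: float, seed: int = 0) -> str:
    """Apply swap_adjacent to each whitespace token with probability `rate`.

    Hypothetical noise model for illustration; real user-generated noise
    is far more varied than character swaps.
    """
    rng = random.Random(seed)  # seeded so the noisy set is reproducible
    words = [
        swap_adjacent(word, rng) if rng.random() < rate else word
        for word in sentence.split()
    ]
    return " ".join(words)


if __name__ == "__main__":
    clean = "translation systems should handle noisy input"
    for rate in (0.0, 0.5, 1.0):
        print(rate, add_typos(clean, rate, seed=1))
```

Translating the output at each rate and plotting the metric against `rate` would give a rough robustness curve per noise type; other noise types would need their own generators.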

Load-bearing premise

Fine-tuning on noisy data leaves translation quality on clean news data unchanged.

What would settle it

Measuring a drop in BLEU on a standard news test set after the noisy-data fine-tuning step would show that the no-influence claim does not hold.
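The check described above can be sketched in a few lines. This is a simplified, self-contained corpus BLEU (single reference, whitespace tokenization, no smoothing), so its scores are not comparable with the ones reported for the shared task; an established scorer should be used for real evaluation. The function names and the 0.5 BLEU tolerance are assumptions chosen for illustration, not values from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale (one reference per segment)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngr, r_ngr = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((h_ngr & r_ngr).values())
            totals[n - 1] += sum(h_ngr.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


def quality_dropped(bleu_before, bleu_after, tolerance=0.5):
    """True if BLEU fell by more than `tolerance` points (hypothetical threshold)."""
    return (bleu_before - bleu_after) > tolerance


if __name__ == "__main__":
    refs = ["the cat sat on the mat today"]
    before = corpus_bleu(["the cat sat on the mat today"], refs)
    after = corpus_bleu(["the cat sat on the mat yesterday"], refs)
    print(before, after, quality_dropped(before, after))
```

In practice `before` and `after` would be computed on the same news test set with the original and the fine-tuned model, and the tolerance would be replaced by a significance test.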

Figures

Figures reproduced from arXiv: 1906.09246 by Jindřich Helcl, Jindřich Libovický, Martin Popel.

Original abstract

We present our submission to the WMT19 Robustness Task. Our baseline system is the Charles University (CUNI) Transformer system trained for the WMT18 shared task on News Translation. Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a standard WMT system paper showing their prior Transformer already beats the LSTM baseline on noise and that fine-tuning helps without hurting clean news quality, but the method is routine with no new ideas.


The main takeaway is that their WMT18 Transformer system already shows much better robustness to noisy input than the LSTM baseline from the task organizers. Fine-tuning on the provided noisy data then lifts performance on the robustness sets while leaving news-domain quality unchanged. That combination of observations is the paper's concrete contribution as a shared-task submission.

It gives other teams a practical data point on how a strong existing model behaves under noise and how simple adaptation can improve it further. Checking that clean performance does not drop is a sensible control that many adaptation papers omit. The work is honest about what it did and sticks to reporting the observed scores from their submission.

The approach itself is not new. Fine-tuning on in-domain or noisy data is a standard technique already used across MT papers, and the authors do not add new components, analysis of why the Transformer is more robust, or comparisons to other adaptation strategies. The abstract states the claims clearly, but the strength of the evidence depends on the full tables and evaluation details that are not visible here. As a factual report of one team's results on the task data, the central claims do not require hidden assumptions to hold.

This paper is mainly useful for people who work on WMT tasks or who need quick, reproducible ways to make MT systems more tolerant of noisy real-world input. It is not aimed at readers looking for methodological advances or general insights into robustness. I would not bring it to a reading group. I would not cite it in my own work. It still deserves peer review as a system description because shared-task papers are the normal channel for documenting these kinds of empirical results and the claims appear to be direct reports of measured scores rather than overstated generalizations.

Referee Report

1 major / 2 minor

Summary. The manuscript reports the CUNI submission to the WMT19 Robustness Task. The baseline is the authors' WMT18 News Translation Transformer system, which is stated to be substantially more robust to noisy input than the task-provided LSTM baseline. Fine-tuning this model on in-domain noisy data is reported to yield further robustness gains while leaving translation quality on the news domain unchanged.

Significance. If the reported scores are accurate and the evaluation protocol is sound, the work supplies a concrete empirical data point on the relative robustness of Transformer versus LSTM architectures in MT and on the utility of targeted fine-tuning for robustness without clean-domain degradation. This is useful for practitioners building production MT systems that must handle noisy input.

major comments (1)

[Abstract] Abstract, final sentence: the central claim that fine-tuning on noisy data leaves news-domain quality unchanged rests on an unstated verification procedure, test-set split, and before/after metric values. Without these details the claim cannot be assessed and is load-bearing for the paper's contribution.

minor comments (2)

The manuscript should include the actual BLEU (or other metric) values, standard deviations if available, and the exact news-domain test set used for the 'no degradation' claim, preferably in a dedicated results table.
Clarify the source and size of the 'in-domain noisy data' used for fine-tuning and whether any overlap exists with the robustness test sets.
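The overlap question in the last comment can be checked mechanically. The sketch below flags test sentences that also occur in the fine-tuning data after lowercasing and whitespace normalization. The function names and the normalization choice are assumptions for illustration, and near-duplicates (for example sentences differing by one typo) would not be caught.

```python
def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(sentence.lower().split())


def find_overlap(train_sentences, test_sentences):
    """Return the test sentences whose normalized form appears in the training data."""
    train_set = {normalize(s) for s in train_sentences}
    return [s for s in test_sentences if normalize(s) in train_set]


if __name__ == "__main__":
    # Toy data, not from the task corpora.
    train = ["LOL  that was great", "see you tmrw"]
    test = ["lol that was great", "a new sentence"]
    leaked = find_overlap(train, test)
    print(f"{len(leaked)} of {len(test)} test sentences also occur in training data")
```

Reporting the count of overlapping sentences next to the results table would let readers judge whether the robustness gains could be inflated by leakage.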

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We address the point below and will revise accordingly.


Referee: [Abstract] Abstract, final sentence: the central claim that fine-tuning on noisy data leaves news-domain quality unchanged rests on an unstated verification procedure, test-set split, and before/after metric values. Without these details the claim cannot be assessed and is load-bearing for the paper's contribution.

Authors: We agree that the abstract claim requires explicit supporting details to be assessable. The verification used the WMT18 news test set and the BLEU metric; before/after scores will be reported in the revised abstract and methods section. This is a straightforward addition that does not alter the empirical findings.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper is a standard empirical system description for a shared task. It reports observed BLEU scores on robustness test sets for a pre-existing Transformer baseline (from WMT18) and after fine-tuning on noisy data. No equations, derivations, predictions, or first-principles claims exist. The baseline citation is descriptive only and does not support any load-bearing theoretical step. All central claims reduce directly to measured performance numbers on external test data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work rests on standard neural MT assumptions (Transformer architecture, supervised fine-tuning improves robustness) without introducing new free parameters or entities visible in the abstract.



Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 3 internal anchors

[3]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. https://arxiv.org/abs/1409.0473 Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473

[4]

Yonatan Belinkov and Yonatan Bisk. 2018. https://openreview.net/forum?id=BJ8vJebC- Synthetic and natural noise both break neural machine translation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings

[5]

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, and Marco Turchi. 2015. https://doi.org/10.18653/v1/W15-3001 Findings of the 2015 workshop on statistical machine translation. In Proceedings of the Ten...

[6]

Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, and Christof Monz. 2018. https://www.aclweb.org/anthology/W18-6401 Findings of the 2018 conference on machine translation (WMT18). In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 272–303, Belgium, Brussels. Association...

[7]

Ondřej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, and Aleš Tamchyna. 2014. https://doi.org/10.3115/v1/W14-3302 Findings of the 2014 workshop on statistical machine translation. In Proceedings of the Ninth Workshop on ...

[8]

Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, and Yang Liu. 2018. https://www.aclweb.org/anthology/P18-1163 Towards robust neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1756–1766, Melbourne, Australia. Association for Computational Linguistics

[9]

Huda Khayrallah and Philipp Koehn. 2018. https://www.aclweb.org/anthology/W18-2709 On the impact of various types of noise on neural machine translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 74–83, Melbourne, Australia. Association for Computational Linguistics

[10]

Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir K. Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan M. Pino, and Hassan Sajjad. 2019. Findings of the first shared task on machine translation robustness. In Proceedings of the Fourth Conference on Machine Translation, Volume 2: Shared Task Papers, Florence, Italy. Association ...

[11]

Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, and Zhongjun He. 2018. http://arxiv.org/abs/1810.06729 Robust neural machine translation with joint textual and phonetic embedding. CoRR, abs/1810.06729

[12]

Paul Michel and Graham Neubig. 2018. https://www.aclweb.org/anthology/D18-1050 MTNT: A testbed for machine translation of noisy text. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 543–553, Brussels, Belgium. Association for Computational Linguistics

[13]

Martin Popel. 2018. https://www.aclweb.org/anthology/W18-6424 CUNI transformer neural MT system for WMT18. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 482–487, Belgium, Brussels. Association for Computational Linguistics

[14]

Martin Popel and Ondřej Bojar. 2018. https://doi.org/10.2478/pralin-2018-0002 Training Tips for the Transformer Model. The Prague Bulletin of Mathematical Linguistics, 110:43–70

[15]

Vaibhav, Sumeet Singh, Craig Stewart, and Graham Neubig. 2019. Improving robustness of machine translation with synthetic noise. CoRR, abs/1902.09508

[16]

Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, and Jakob Uszkoreit. 2018. https://www.aclweb.org/anthology/W18-1819 Tensor2Tensor for neural machine translation. In Proceedings of the 13th Conference of the Association for M...

[17]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 6000–6010, Long Beach, CA, USA. Curran Associates, Inc
