arxiv: 1906.09246 · v1 · pith:JZLE5QSTnew · submitted 2019-06-21 · 💻 cs.CL

CUNI System for the WMT19 Robustness Task

Pith reviewed 2026-05-25 18:50 UTC · model grok-4.3

classification 💻 cs.CL
keywords machine translation · robustness · transformer · noisy input · fine-tuning · WMT19

The pith

A Transformer model trained on clean news data is already far more robust to noisy input than an LSTM baseline, and can be further improved by fine-tuning on noisy data without harming clean-domain quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports the CUNI submission to the WMT19 Robustness Task in machine translation. The authors' existing Transformer system, trained only on clean news data for the WMT18 News Translation shared task, already outperforms the task's LSTM baseline on noisy inputs by a wide margin. Additional fine-tuning on in-domain noisy data raises performance on the robustness test set while leaving news-domain translation quality unchanged.

Core claim

Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.

What carries the argument

CUNI Transformer neural machine translation model fine-tuned on noisy in-domain data.

If this is right

  • Transformer models exhibit greater robustness to noisy input than LSTM models in machine translation.
  • Fine-tuning on noisy data raises accuracy on noisy test sets.
  • News-domain performance remains stable after the noisy fine-tuning step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fine-tuning recipe might transfer to other noisy-input sequence tasks such as speech recognition or summarization.
  • If confirmed across more noise distributions, it would reduce the need for separate robustness modules in production MT systems.
  • Comparing the effect on different noise types (typos, ASR errors, code-switching) would test how general the observed robustness is.
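
One way to start on the last point is to inject a single, controlled noise type into clean input and re-score the system on it. The sketch below covers only typos, modeled as swaps of adjacent inner characters; ASR errors and code-switching are not modeled. The function names `swap_adjacent` and `add_typos` are hypothetical, and this is an illustration of the editorial extension above, not a procedure taken from the paper.

```python
import random


def swap_adjacent(word, rng):
    """Swap two adjacent inner characters, keeping the first and last in place."""
    if len(word) < 4:
        return word  # too short to have two inner characters
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both inner positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def add_typos(sentence, rate, seed=0):
    """Apply swap_adjacent to each whitespace-separated word with probability `rate`."""
    rng = random.Random(seed)  # seeded so the noisy set is reproducible
    words = sentence.split()
    return " ".join(
        swap_adjacent(w, rng) if rng.random() < rate else w for w in words
    )


clean = "translation robustness matters"
print(add_typos(clean, rate=1.0, seed=1))
```

Sweeping `rate` from 0 to 1 and scoring the translations at each level would give a degradation curve per noise type, which can then be compared between the baseline and the fine-tuned model.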

Load-bearing premise

Fine-tuning on noisy data leaves translation quality on clean news data unchanged.

What would settle it

Measuring a drop in BLEU on a standard news test set after the noisy-data fine-tuning step would show that the no-influence claim does not hold.
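
The check can be sketched as a before/after comparison on the same news test set. The code below is a simplified, self-contained corpus-level BLEU (one reference per sentence, whitespace tokenization, no smoothing) and is not the scorer used in the paper; for reportable numbers a standard tool such as sacreBLEU would be the usual choice. The names `corpus_bleu` and `bleu_drop` and the toy sentences are illustrative assumptions.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale (single reference, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tok, n)
            ref_ngrams = ngram_counts(ref_tok, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


def bleu_drop(references, before, after):
    """Positive value: the system scored lower after fine-tuning."""
    return corpus_bleu(before, references) - corpus_bleu(after, references)


# Toy data standing in for news-test references and two systems' outputs.
refs = ["the cat sat on the mat", "a dog barked at the mailman"]
out_before = ["the cat sat on the mat", "a dog barked at the mailman"]
out_after = ["the cat sat on the mat", "a dog barked at the postman"]
print(bleu_drop(refs, out_before, out_after))
```

A single difference in corpus BLEU says little about whether the drop is real; repeating the comparison over resampled subsets of the test set would indicate how stable it is.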

Figures

Figures reproduced from arXiv: 1906.09246 by Jindřich Helcl, Jindřich Libovický, Martin Popel.

Figure 1: Learning curves showing the progress of fine...

Original abstract

We present our submission to the WMT19 Robustness Task. Our baseline system is the Charles University (CUNI) Transformer system trained for the WMT18 shared task on News Translation. Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript reports the CUNI submission to the WMT19 Robustness Task. The baseline is the authors' WMT18 News Translation Transformer system, which is stated to be substantially more robust to noisy input than the task-provided LSTM baseline. Fine-tuning this model on in-domain noisy data is reported to yield further robustness gains while leaving translation quality on the news domain unchanged.

Significance. If the reported scores are accurate and the evaluation protocol is sound, the work supplies a concrete empirical data point on the relative robustness of Transformer versus LSTM architectures in MT and on the utility of targeted fine-tuning for robustness without clean-domain degradation. This is useful for practitioners building production MT systems that must handle noisy input.

major comments (1)
  1. [Abstract, final sentence] The central claim, that fine-tuning on noisy data leaves news-domain quality unchanged, rests on an unstated verification procedure, test-set split, and before/after metric values. The claim is load-bearing for the paper's contribution and cannot be assessed without these details.
minor comments (2)
  1. The manuscript should include the actual BLEU (or other metric) values, standard deviations if available, and the exact news-domain test set used for the 'no degradation' claim, preferably in a dedicated results table.
  2. Clarify the source and size of the 'in-domain noisy data' used for fine-tuning and whether any overlap exists with the robustness test sets.
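
The overlap question in minor comment 2 can be checked mechanically. The sketch below is one simple approach, assuming both data sets are available as lists of source sentences: it compares sentences after lowercasing and whitespace normalization, so it catches exact and near-exact duplicates only. The function names and the toy sentences are hypothetical.

```python
def normalize(sentence):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(sentence.lower().split())


def find_overlap(train_sentences, test_sentences):
    """Return the test sentences whose normalized form also occurs in the training data."""
    train_set = {normalize(s) for s in train_sentences}
    return [s for s in test_sentences if normalize(s) in train_set]


def overlap_rate(train_sentences, test_sentences):
    """Fraction of test sentences that also appear in the training data."""
    if not test_sentences:
        return 0.0
    return len(find_overlap(train_sentences, test_sentences)) / len(test_sentences)


# Toy data standing in for the fine-tuning set and a robustness test set.
train = ["Hello  world", "lol that's gr8"]
test = ["hello world", "a sentence seen only at test time"]
print(find_overlap(train, test), overlap_rate(train, test))
```

Paraphrases and partial matches would need a fuzzier comparison, for example n-gram overlap, which this sketch does not attempt.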

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We address the point below and will revise accordingly.

Point-by-point responses
  1. Referee: [Abstract, final sentence] The central claim, that fine-tuning on noisy data leaves news-domain quality unchanged, rests on an unstated verification procedure, test-set split, and before/after metric values. The claim is load-bearing for the paper's contribution and cannot be assessed without these details.

    Authors: We agree that the abstract claim requires explicit supporting details to be assessable. The verification used the WMT18 news test set and the BLEU metric; before/after scores will be reported in the revised abstract and methods section. This is a straightforward addition that does not alter the empirical findings.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a standard empirical system description for a shared task. It reports observed BLEU scores on robustness test sets for a pre-existing Transformer baseline (from WMT18) and after fine-tuning on noisy data. No equations, derivations, predictions, or first-principles claims exist. The baseline citation is descriptive only and does not support any load-bearing theoretical step. All central claims reduce directly to measured performance numbers on external test data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work rests on standard neural MT assumptions (Transformer architecture, supervised fine-tuning improves robustness) without introducing new free parameters or entities visible in the abstract.


