The University of Edinburgh's Submissions to the WMT19 News Translation Task

Alexandra Birch; Antonio Valerio Miceli Barone; Faheem Kirefu; Nikolay Bogoychev; Rachel Bawden; Roman Grundkiewicz; Ulrich Germann

arxiv: 1907.05854 · v1 · pith:ARJ2SBOHnew · submitted 2019-07-12 · 💻 cs.CL

Pith reviewed 2026-05-24 22:17 UTC · model grok-4.3

classification 💻 cs.CL

keywords: machine translation, back-translation, news translation, WMT19, German-English, synthetic data, tokenisation

The pith

Vast amounts of back-translated data continue to raise German-to-English news translation quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports the University of Edinburgh's entries in the WMT19 news translation shared task for six language directions. Across all pairs the authors added back-translations of target-side monolingual data as extra training material. The central investigation concerns German-to-English: whether translation quality keeps rising when the volume of this synthetic data is increased to very large scales, and what additional patterns emerge beyond earlier scaling studies.

Core claim

For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018).

What carries the argument

Back-translation of large monolingual target-language corpora to create synthetic parallel training data.
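As a minimal sketch of this idea, the snippet below pairs each genuine target-language sentence with a machine-generated source sentence. The names `back_translate`, `build_training_set`, and the `reverse_translate` callable are hypothetical placeholders for illustration; the paper's actual systems and tooling are not described on this page.

```python
from typing import Callable, Iterable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[SentencePair]:
    """Create synthetic parallel data from target-side monolingual text.

    `reverse_translate` is an assumed target-to-source translation function
    (for German-to-English training it would translate English into German).
    """
    pairs: List[SentencePair] = []
    for target in target_monolingual:
        target = target.strip()
        if not target:
            continue  # skip blank lines
        synthetic_source = reverse_translate(target)
        # The target side is genuine text; only the source side is synthetic.
        pairs.append((synthetic_source, target))
    return pairs


def build_training_set(
    genuine: Iterable[SentencePair],
    synthetic: Iterable[SentencePair],
) -> List[SentencePair]:
    """Concatenate genuine parallel data with back-translated pairs."""
    return list(genuine) + list(synthetic)
```

The design point worth noting is that the model still learns to produce human-written target text; noise is confined to the source side, which is why the quality of the reverse system matters as volume grows.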

If this is right

For high-resource pairs, simply scaling back-translated data remains an effective route to better systems.
Character-based tokenization can be directly compared against sub-word segmentation for Chinese source and target text.
Cross-lingual language-model pre-training plus pivoting through Hindi offers an alternative path for English-Gujarati.
Different pre-processing and tokenisation choices can be tested for English-to-Czech without changing the core back-translation approach.
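To make the Chinese tokenisation contrast concrete, here is a small sketch of the two regimes. The greedy longest-match segmenter is a stand-in chosen for brevity, not the paper's method: real sub-word systems typically learn their vocabulary from data (for example with BPE), and the `vocab` argument and function names here are hypothetical.

```python
from typing import List, Set


def char_tokenize(text: str) -> List[str]:
    """Character-based tokenisation: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def greedy_subword_tokenize(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Toy sub-word segmentation by greedy longest match against `vocab`.

    Falls back to a single character when no longer piece is in the vocabulary.
    """
    s = "".join(text.split())  # drop whitespace before segmenting
    tokens: List[str] = []
    i = 0
    while i < len(s):
        for length in range(min(max_len, len(s) - i), 0, -1):
            piece = s[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens
```

Character tokenisation needs no learned vocabulary and yields longer sequences, while sub-word segmentation shortens sequences at the cost of depending on a vocabulary learned beforehand.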

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the scaling result holds, future high-resource systems may be limited mainly by the supply of clean monolingual target text rather than by model capacity.
The same back-translation pipeline could be applied to other high-resource pairs where large monolingual corpora already exist.
Low-resource directions may still require the additional techniques tested here once back-translation data becomes scarce.

Load-bearing premise

The quality of the back-translated synthetic data stays high enough at very large volumes that extra data keeps helping rather than adding noise.

What would settle it

A controlled scaling curve for German-to-English in which BLEU or human scores stop rising or begin to fall once back-translated data exceeds a few hundred million sentences.
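One way such a check could be automated, sketched under stated assumptions: given measured pairs of (amount of back-translated data, quality score), report the first data size at which the score fails to improve by a chosen margin. The function name `first_plateau` and the `min_gain` threshold are hypothetical, and no values from the paper are used.

```python
from typing import List, Optional, Tuple


def first_plateau(
    curve: List[Tuple[float, float]],
    min_gain: float = 0.1,
) -> Optional[float]:
    """Return the first data size where the score gain drops below `min_gain`.

    `curve` holds (data_size, score) pairs in any order. Returns None when
    every step up in data size improves the score by at least `min_gain`.
    """
    points = sorted(curve)  # order by data size
    for (_, prev_score), (size, score) in zip(points, points[1:]):
        if score - prev_score < min_gain:
            return size  # the gain stalled or reversed at this size
    return None
```

A result of `None` on a curve that extends well past a few hundred million sentences would support the scaling claim; an early plateau would count against it. The threshold should reflect the run-to-run variance of the metric in use.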

Original abstract

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English-to-Czech, we compared different pre-processing and tokenisation regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a standard WMT19 system description paper that applies back-translation and tokenization variants to the task pairs without new methods or verifiable quantitative claims.

The main point is that this paper simply reports what the Edinburgh team submitted to the WMT19 news translation task across six directions. They used back-translated monolingual data as synthetic training for all pairs and added a few targeted experiments on top of that baseline approach. For German-English they scaled up the back-translated data volume and say they picked up a couple of extra observations relative to Edunov et al. For Chinese they compared character-level tokenization to subword segmentation. Gujarati gets some semi-supervised work with cross-lingual language model pre-training plus pivoting through Hindi, and Czech gets a comparison of preprocessing choices. These are all practical decisions for the shared task rather than new techniques.

The paper does a reasonable job laying out the setups in plain terms so others can see what data and pipelines were used. That is the useful part for anyone who wants the details of one participating system.

The soft spots are straightforward. The abstract contains no numbers, tables, or error bars, so the claimed additional insights on back-translation scaling cannot be checked from the text provided. This is typical for system description papers, but it limits how much weight the work can carry on its own. There is also no attempt to derive anything general or to falsify the assumption that more back-translated data stays helpful at extreme scale.

The paper is aimed at readers who follow the WMT shared tasks or who need implementation notes for these specific language pairs. It does not reorganize any subfield. I would not bring it to a general reading group and would not cite it in my own work. It still deserves peer review because shared-task system reports form part of the official record for the task and the experimental setups are described honestly.

Referee Report

0 major / 0 minor

Summary. The manuscript describes the University of Edinburgh's submissions to the WMT19 News Translation shared task across six language pairs (En-Gu, Gu-En, En-Zh, Zh-En, De-En, En-Cs). It reports the creation and use of back-translated monolingual data as synthetic training data for all directions, plus pair-specific experiments: semi-supervised MT with cross-lingual LM pre-training and Hindi pivoting for English-Gujarati; character-based vs. sub-word tokenisation for Chinese; a scaling study on large volumes of back-translated data for German-English (extending Edunov et al. 2018); and comparisons of pre-processing and tokenisation regimes for English-Czech.

Significance. As a shared-task system description, the paper documents concrete engineering choices and the existence of a scaling study on back-translation volume for German-English. Provided the reported experimental outcomes and the additional insights relative to prior work are reproducible from the full system details, it supplies useful reference material for the community on practical NMT data-augmentation and tokenisation decisions.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review of our WMT19 system description paper and for recommending acceptance. We are pleased that the manuscript is viewed as supplying useful reference material on practical NMT choices for data augmentation and tokenisation.

Circularity Check

0 steps flagged

No significant circularity

The paper is a factual system-description report of WMT19 submissions. It contains no equations, no fitted parameters renamed as predictions, no derivations, and no load-bearing self-citations or uniqueness theorems. All claims reduce to the existence of the described experiments and comparisons with prior external work (e.g., Edunov et al. 2018), which are independent of the present manuscript.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new theoretical claims; the work rests on standard neural MT training assumptions and the premise that automatic metrics track human quality, none of which are derived in the paper.

pith-pipeline@v0.9.0 · 5709 in / 1015 out tokens · 20607 ms · 2026-05-24T22:17:12.248678+00:00 · methodology
