Lingua Custodia at WMT'19: Attempts to Control Terminology

Franck Burlot

arxiv: 1907.04618 · v1 · pith:ZZH3SZTLnew · submitted 2019-07-10 · 💻 cs.CL


Pith reviewed 2026-05-24 23:59 UTC · model grok-4.3


keywords: machine translation, terminology control, constrained decoding, backtranslation, WMT shared task, German-French translation, EU elections domain


The pith

Backtranslation with constrained decoding guarantees correct translation of specific unseen terms in machine translation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports experiments adapting machine translation terminology for a German-to-French news task on EU elections, a topic with no provided in-domain parallel training data. The core method generates backtranslations using a decoding approach that inserts constraints to force accurate rendering of particular entities such as political parties and person names. This targets cases where the needed terms do not appear in the training data. A reader would care because the technique offers a way to steer output terminology for restricted domains without collecting new parallel text.

Core claim

Our primary submission to the shared task uses backtranslation generated with a type of decoding allowing the insertion of constraints in the output in order to guarantee the correct translation of specific terms that are not necessarily observed in the data.

What carries the argument

Constrained decoding during backtranslation that permits insertion of required terms into the generated output.
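
To make the mechanism concrete, here is a minimal sketch of lexically constrained beam search. It is not the author's decoder: the toy bigram table `TOY_MODEL`, the `FALLBACK` distribution, the floor score `UNSEEN_LOGPROB`, and the function name `constrained_beam_search` are all hypothetical, and constraints are limited to single tokens for brevity. The sketch shows the two ingredients such decoders usually share: unmet constraint tokens are always proposed as candidates, and a hypothesis may only finish once every constraint has been produced.

```python
import math
from collections import defaultdict

# Toy next-token log-probabilities keyed by the previous token (made-up numbers).
TOY_MODEL = {
    "<s>": {"die": math.log(0.6), "eine": math.log(0.4)},
    "die": {"Partei": math.log(0.7), "AfD": math.log(0.05), "Wahl": math.log(0.25)},
    "eine": {"Partei": math.log(0.5), "Wahl": math.log(0.5)},
    "Partei": {"gewinnt": math.log(0.6), "</s>": math.log(0.4)},
    "AfD": {"gewinnt": math.log(0.8), "</s>": math.log(0.2)},
    "Wahl": {"</s>": math.log(1.0)},
    "gewinnt": {"</s>": math.log(1.0)},
}
# Distribution used after a token the toy model has never seen (assumption).
FALLBACK = {"gewinnt": math.log(0.6), "</s>": math.log(0.4)}
# Floor score for a constraint token that the model does not propose itself.
UNSEEN_LOGPROB = math.log(1e-4)


def constrained_beam_search(constraints, beam_size=2, max_len=6):
    """Return the best finished hypothesis containing every constraint token."""
    constraints = frozenset(constraints)
    beam = [(0.0, ("<s>",), frozenset())]  # (score, tokens, constraints met)
    finished = []
    for _ in range(max_len):
        banks = defaultdict(list)  # hypotheses grouped by number of constraints met
        for score, tokens, met in beam:
            candidates = dict(TOY_MODEL.get(tokens[-1], FALLBACK))
            # Always propose unmet constraints, even if the model would not.
            for term in constraints - met:
                candidates.setdefault(term, UNSEEN_LOGPROB)
            for tok, logp in candidates.items():
                new_met = met | ({tok} & constraints)
                hyp = (score + logp, tokens + (tok,), new_met)
                if tok == "</s>":
                    if new_met == constraints:  # unfinished constraints: reject
                        finished.append(hyp)
                else:
                    banks[len(new_met)].append(hyp)
        # Keep the top hypotheses per bank so constrained ones survive pruning.
        beam = [
            h
            for bank in banks.values()
            for h in sorted(bank, key=lambda x: x[0], reverse=True)[:beam_size]
        ]
        if not beam:
            break
    if not finished:
        return None
    best = max(finished, key=lambda h: h[0])
    return list(best[1][1:-1])  # strip <s> and </s>


print(constrained_beam_search(set()))     # unconstrained output
print(constrained_beam_search({"AfD"}))   # low-probability term is forced in
print(constrained_beam_search({"Volt"}))  # term absent from the toy model
```

Grouping hypotheses into banks by the number of satisfied constraints is one common way to stop low-scoring constrained hypotheses from being pruned before they can finish. Whether the paper's decoder uses this allocation scheme is not stated in the abstract.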

If this is right

Terminology can be adapted to a narrow topic such as EU elections without any in-domain parallel data.
Specific entities including political parties and person names receive more accurate translations than would be produced by unconstrained generation.
The same constrained backtranslation pipeline applies to the German-to-French news translation direction.
Terms absent from training data can still be rendered correctly by direct insertion during decoding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same constraint mechanism could be tested on other language pairs where terminology consistency matters more than raw fluency.
If term lists can be extracted automatically from source text or glossaries, the method might reduce reliance on manual term identification.
Combining the constrained decoder with later fine-tuning steps could produce additive gains on domain-specific accuracy.
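
The glossary idea above can be sketched as a lookup step that turns a term list into per-sentence constraints. This is an illustration only: the function `find_constraints` and the two glossary entries are hypothetical, and the sketch ignores inflection, which would matter for German output terms.

```python
import re


def find_constraints(sentence, glossary):
    """Return output-side terms for every glossary entry found in the sentence.

    Longer entries are matched first and masked out, so a short entry does
    not fire inside a longer one that already matched.
    """
    constraints = []
    remaining = sentence
    for term in sorted(glossary, key=len, reverse=True):
        # Match the term only when it is not embedded in a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
        if pattern.search(remaining):
            constraints.append(glossary[term])
            remaining = pattern.sub(" ", remaining)
    return constraints


# Illustrative French-to-German entries (assumed, not taken from the paper).
glossary = {
    "Commission européenne": "EU-Kommission",
    "Commission": "Kommission",
}
print(find_constraints("La Commission européenne publie son rapport.", glossary))
```

The returned list could then be handed to a constrained decoder as the set of terms that must appear in the output.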

Load-bearing premise

The approach assumes that the specific terms requiring constraints can be reliably identified in advance and that forcing their inclusion via constrained decoding will not degrade overall translation quality, fluency, or adequacy on the rest of the sentence.

What would settle it

A side-by-side comparison in which the same backtranslation model is run once with constraints and once without, then measured for whether the constrained version produces lower automatic scores or human judgments on overall sentence quality.
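
One half of that comparison, term accuracy, is simple to compute. The sketch below is a hypothetical helper (`term_accuracy`) run on made-up outputs; it is not a metric reported in the paper. Overall sentence quality would still need a separate measurement, such as an automatic score or human judgment, on the same two outputs.

```python
def term_accuracy(hypotheses, required_terms):
    """Fraction of required terms that appear verbatim in their hypothesis."""
    total = 0
    hits = 0
    for hyp, terms in zip(hypotheses, required_terms):
        for term in terms:
            total += 1
            hits += term in hyp  # exact substring match; ignores inflection
    return hits / total if total else 0.0


# Made-up system outputs for two sentences, with the terms each must contain.
required = [["EU-Kommission"], ["AfD", "Europawahl"]]
unconstrained = ["Die Kommission legt ihren Bericht vor.",
                 "Die Partei gewinnt die Europawahl."]
constrained = ["Die EU-Kommission legt ihren Bericht vor.",
               "Die AfD gewinnt die Europawahl."]

print(term_accuracy(unconstrained, required))
print(term_accuracy(constrained, required))
```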

Original abstract

This paper describes Lingua Custodia's submission to the WMT'19 news shared task for German-to-French on the topic of the EU elections. We report experiments on the adaptation of the terminology of a machine translation system to a specific topic, aimed at providing more accurate translations of specific entities like political parties and person names, given that the shared task provided no in-domain training parallel data dealing with the restricted topic. Our primary submission to the shared task uses backtranslation generated with a type of decoding allowing the insertion of constraints in the output in order to guarantee the correct translation of specific terms that are not necessarily observed in the data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a standard WMT system paper applying constrained decoding during backtranslation to force specific terms into synthetic data for a German-French news task, with no new method and no results shown in the abstract.


The one thing to know is that the paper describes using constrained decoding on backtranslated data so that terms like political party names get inserted correctly even when absent from training. They target the EU elections domain for German-to-French where no in-domain parallels were supplied by the shared task. That setup is reasonable for the problem they faced.

What they do well is lay out a concrete pipeline that combines two established techniques (backtranslation plus constraint insertion) to address out-of-vocabulary entities in a narrow news topic. The description is clear about identifying the terms in advance and forcing their placement in the output. Credit for focusing on a real shared-task constraint rather than a toy setting.

The soft spots are exactly where the stress-test note flags them. The abstract supplies no BLEU numbers, no term-accuracy figures, and no comparison between constrained and unconstrained backtranslation, so there is no evidence yet that the forced insertions preserve overall quality or fluency. The central assumption, that pre-identified terms can be added without harming the rest of the sentence, remains untested in the provided text. If the full paper contains those ablations and human judgments, the claim strengthens; otherwise it stays descriptive.

This paper is for MT engineers who need to adapt systems to specific events or domains and want to see one group's implementation choices. A reader already working on terminology control might borrow the constraint mechanism, but the work does not change how the field thinks about the underlying problem.

I would send it to peer review for the WMT proceedings because system papers like this document what was tried in the task and give others a starting point, even with limited novelty.

Referee Report

2 major / 1 minor

Summary. The manuscript describes Lingua Custodia's submission to the WMT'19 German-to-French news translation shared task on the EU elections topic. It reports experiments on terminology adaptation via backtranslation generated with constrained decoding, with the goal of forcing correct translations of specific entities (political parties, person names) that are absent from the provided parallel training data.

Significance. If the constrained-decoding backtranslation is shown to preserve overall translation quality while guaranteeing term accuracy, the approach would supply a practical, deployable technique for domain-specific terminology control when in-domain parallel data are unavailable. The work is a standard system-description contribution to a shared-task track.

major comments (2)

[Abstract] The central claim that constrained decoding 'guarantees the correct translation of specific terms that are not necessarily observed in the data' is presented without any quantitative support (BLEU, TER, or human adequacy/fluency scores) comparing the constrained backtranslation to an unconstrained baseline; this absence prevents verification that the synthetic data remain usable for final model training.
[Method description, primary submission paragraph] No ablation or diagnostic is reported on the interaction between forced term insertion and the remainder of the sentence; without such analysis it is impossible to confirm the weakest assumption that constraint placement does not degrade fluency or adequacy on non-constrained tokens.

minor comments (1)

The manuscript should include the official WMT automatic and human evaluation scores for the submitted system, together with a brief comparison to the unconstrained backtranslation baseline used in the same pipeline.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments on our WMT'19 system description. We respond to each major comment below.


Referee: [Abstract] The central claim that constrained decoding 'guarantees the correct translation of specific terms that are not necessarily observed in the data' is presented without any quantitative support (BLEU, TER, or human adequacy/fluency scores) comparing the constrained backtranslation to an unconstrained baseline; this absence prevents verification that the synthetic data remain usable for final model training.

Authors: The guarantee is a direct consequence of the constrained decoding procedure, which forces inclusion of the specified terms regardless of their presence in training data. The manuscript reports the final system performance on the shared-task test set, which was trained using the constrained backtranslations; this serves as the quantitative evidence of usability. No separate BLEU comparison isolating the backtranslation step was conducted, as the paper is a system description rather than an ablation study. We will revise the abstract to explicitly link the claim to the reported final scores. (revision: yes)

Referee: [Method description, primary submission paragraph] No ablation or diagnostic is reported on the interaction between forced term insertion and the remainder of the sentence; without such analysis it is impossible to confirm the weakest assumption that constraint placement does not degrade fluency or adequacy on non-constrained tokens.

Authors: We agree that a targeted diagnostic on how constraints affect surrounding tokens would be informative. As a concise shared-task system paper, the manuscript prioritizes description of the submitted pipeline and its overall results over internal ablations. The final system scores provide an indirect check on overall quality. We will add a short limitations paragraph in the revision acknowledging the absence of such diagnostics. (revision: yes)

Circularity Check

0 steps flagged

No circularity: empirical system description with no derivations or self-referential reductions.


The paper is a WMT shared-task system description that reports an engineering pipeline (backtranslation + constrained decoding for terminology). No equations, fitted parameters, uniqueness theorems, or ansatzes are present. The central claim is an empirical assertion about term insertion, not a derivation that reduces to its own inputs by construction. No self-citation load-bearing steps or renaming of known results occur. The work is self-contained as a practical report; absence of ablations is a separate evidence question, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical machine translation system paper; the abstract contains no mathematical derivations, free parameters, axioms, or invented entities.

