Lingua Custodia at WMT'19: Attempts to Control Terminology
Pith reviewed 2026-05-24 23:59 UTC · model grok-4.3
The pith
Backtranslation with constrained decoding guarantees correct translation of specific unseen terms in machine translation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our primary submission to the shared task uses backtranslation generated with a type of decoding allowing the insertion of constraints in the output in order to guarantee the correct translation of specific terms that are not necessarily observed in the data.
What carries the argument
Constrained decoding during backtranslation that permits insertion of required terms into the generated output.
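To make the mechanism concrete, here is a minimal sketch of lexically constrained beam search over a toy next-token model. It is an illustration only, not the authors' decoder: the toy probabilities, the function name `constrained_beam_search`, and the restriction to single-token constraints are all assumptions. Hypotheses are grouped by how many constraints they already cover, so constrained candidates are not pruned by cheaper unconstrained ones, and a hypothesis may finish only once every constraint is covered.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical toy model: log-probability of the next token given the previous one.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"the": -0.2, "party": -2.0, "afd": -5.0},
    "the": {"party": -0.3, "afd": -4.0, "won": -3.0},
    "party": {"won": -0.4, "afd": -4.0, "</s>": -1.5},
    "afd": {"won": -0.5, "party": -1.0, "</s>": -2.0},
    "won": {"</s>": -0.1, "the": -3.0},
}

Hyp = Tuple[float, List[str], FrozenSet[str]]  # (score, tokens, covered constraints)


def constrained_beam_search(
    constraints: Sequence[str],
    model: Dict[str, Dict[str, float]] = TOY_MODEL,
    beam_size: int = 2,
    max_len: int = 6,
) -> Optional[List[str]]:
    """Return the best-scoring output that contains every constraint token."""
    required = frozenset(constraints)
    beams: List[Hyp] = [(0.0, ["<s>"], frozenset())]
    finished: List[Hyp] = []
    for _ in range(max_len):
        candidates: List[Hyp] = []
        for score, tokens, covered in beams:
            for tok, logp in model.get(tokens[-1], {}).items():
                new_cov = covered | ({tok} & required)
                hyp = (score + logp, tokens + [tok], new_cov)
                if tok == "</s>":
                    # Only hypotheses that satisfy all constraints may finish.
                    if new_cov == required:
                        finished.append(hyp)
                else:
                    candidates.append(hyp)
        # One bank per number of covered constraints; keep the top few in each.
        banks: Dict[int, List[Hyp]] = {}
        for hyp in candidates:
            banks.setdefault(len(hyp[2]), []).append(hyp)
        beams = []
        for bank in banks.values():
            bank.sort(key=lambda h: h[0], reverse=True)
            beams.extend(bank[:beam_size])
        if not beams:
            break
    if not finished:
        return None
    best = max(finished, key=lambda h: h[0])
    return best[1][1:-1]  # strip <s> and </s>


unconstrained = constrained_beam_search([])
constrained = constrained_beam_search(["afd"])
```

In this toy setting the unconstrained search never emits the low-probability token `afd`, while the constrained search is forced to place it and then picks the most probable sentence around it. Whether the surrounding words stay fluent under that forcing is exactly what the sketch does not show.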
If this is right
- Terminology can be adapted to a narrow topic such as EU elections without any in-domain parallel data.
- Specific entities including political parties and person names receive more accurate translations than would be produced by unconstrained generation.
- The same constrained backtranslation pipeline applies to the German-to-French news translation direction.
- Terms absent from training data can still be rendered correctly by direct insertion during decoding.
Where Pith is reading between the lines
- The same constraint mechanism could be tested on other language pairs where terminology consistency matters more than raw fluency.
- If term lists can be extracted automatically from source text or glossaries, the method might reduce reliance on manual term identification.
- Combining the constrained decoder with later fine-tuning steps could produce additive gains on domain-specific accuracy.
Load-bearing premise
The approach assumes that the specific terms requiring constraints can be reliably identified in advance and that forcing their inclusion via constrained decoding will not degrade overall translation quality, fluency, or adequacy on the rest of the sentence.
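The first half of this premise, identifying terms in advance, can be illustrated with a simple glossary lookup. The sketch below is an assumption-laden example, not the paper's procedure: the glossary entries, the function name `constraints_for`, and the choice of French input with German output terms are all hypothetical. It scans an input sentence for glossary entries and returns the output-side terms that would be passed to the decoder as constraints.

```python
import re
from typing import Dict, List, Tuple


def constraints_for(sentence: str, glossary: Dict[str, str]) -> List[str]:
    """Return output-side terms for glossary entries found in the input sentence,
    ordered by where each entry first appears in the sentence."""
    found: List[Tuple[int, str]] = []
    for in_term, out_term in glossary.items():
        # Match whole words only, ignoring case.
        pattern = r"(?<!\w)" + re.escape(in_term) + r"(?!\w)"
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), out_term))
    return [term for _, term in sorted(found)]


# Hypothetical glossary: input-language term -> required output-language term.
GLOSSARY = {
    "élections européennes": "Europawahl",
    "Verts": "Grünen",
}

terms = constraints_for("Les Verts progressent aux élections européennes.", GLOSSARY)
no_terms = constraints_for("Il fait beau aujourd'hui.", GLOSSARY)
```

Exact string matching like this misses inflected or abbreviated forms, which is one way the "reliably identified in advance" assumption could fail in practice.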
What would settle it
A side-by-side comparison in which the same backtranslation model is run once with constraints and once without, then scored with automatic metrics and human judgments of overall sentence quality to check whether the constrained version comes out worse.
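One half of such a comparison, the term-accuracy side, is straightforward to measure. The sketch below assumes paired outputs from a constrained and an unconstrained run; the example sentences, term lists, and the function name `term_coverage` are hypothetical and not taken from the paper. It reports the fraction of required terms that appear verbatim in each set of outputs.

```python
from typing import Sequence


def term_coverage(outputs: Sequence[str], required_terms: Sequence[Sequence[str]]) -> float:
    """Fraction of required terms found verbatim (case-insensitive) in the paired output."""
    hits = 0
    total = 0
    for output, terms in zip(outputs, required_terms):
        lowered = output.lower()
        for term in terms:
            total += 1
            if term.lower() in lowered:
                hits += 1
    return hits / total if total else 0.0


# Hypothetical toy data: required terms per sentence and the two system outputs.
required = [
    ["Alternative für Deutschland"],
    ["Europawahl", "Europäische Volkspartei"],
]
unconstrained_out = [
    "Die Alternative gewann Stimmen.",
    "Die Volkspartei tritt bei der Europawahl an.",
]
constrained_out = [
    "Die Alternative für Deutschland gewann Stimmen.",
    "Die Europäische Volkspartei tritt bei der Europawahl an.",
]

coverage_unconstrained = term_coverage(unconstrained_out, required)
coverage_constrained = term_coverage(constrained_out, required)
```

Term coverage alone cannot settle the question: the quality of the rest of each sentence still has to be scored separately, with an automatic metric or human judgments, on the same two sets of outputs.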
Original abstract
This paper describes Lingua Custodia's submission to the WMT'19 news shared task for German-to-French on the topic of the EU elections. We report experiments on the adaptation of the terminology of a machine translation system to a specific topic, aimed at providing more accurate translations of specific entities like political parties and person names, given that the shared task provided no in-domain training parallel data dealing with the restricted topic. Our primary submission to the shared task uses backtranslation generated with a type of decoding allowing the insertion of constraints in the output in order to guarantee the correct translation of specific terms that are not necessarily observed in the data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes Lingua Custodia's submission to the WMT'19 German-to-French news translation shared task on the EU elections topic. It reports experiments on terminology adaptation via backtranslation generated with constrained decoding, with the goal of forcing correct translations of specific entities (political parties, person names) that are absent from the provided parallel training data.
Significance. If the constrained-decoding backtranslation is shown to preserve overall translation quality while guaranteeing term accuracy, the approach would supply a practical, deployable technique for domain-specific terminology control when in-domain parallel data are unavailable. The work is a standard system-description contribution to a shared-task track.
major comments (2)
- [Abstract] The central claim that constrained decoding 'guarantees the correct translation of specific terms that are not necessarily observed in the data' is presented without any quantitative support (BLEU, TER, or human adequacy/fluency scores) comparing the constrained backtranslation to an unconstrained baseline; this absence prevents verification that the synthetic data remain usable for final model training.
- [Method description, primary submission paragraph] No ablation or diagnostic is reported on the interaction between forced term insertion and the remainder of the sentence; without such analysis it is impossible to confirm the weakest assumption that constraint placement does not degrade fluency or adequacy on non-constrained tokens.
minor comments (1)
- The manuscript should include the official WMT automatic and human evaluation scores for the submitted system, together with a brief comparison to the unconstrained backtranslation baseline used in the same pipeline.
Simulated Author's Rebuttal
We thank the referee for the comments on our WMT'19 system description. We respond to each major comment below.
Point-by-point responses
- Referee: [Abstract] The central claim that constrained decoding 'guarantees the correct translation of specific terms that are not necessarily observed in the data' is presented without any quantitative support (BLEU, TER, or human adequacy/fluency scores) comparing the constrained backtranslation to an unconstrained baseline; this absence prevents verification that the synthetic data remain usable for final model training.
  Authors: The guarantee is a direct consequence of the constrained decoding procedure, which forces inclusion of the specified terms regardless of their presence in training data. The manuscript reports the performance on the shared-task test set of the final system, which was trained on the constrained backtranslations; this serves as the quantitative evidence of usability. No separate BLEU comparison isolating the backtranslation step was conducted, as the paper is a system description rather than an ablation study. We will revise the abstract to explicitly link the claim to the reported final scores.
  Revision: yes
- Referee: [Method description, primary submission paragraph] No ablation or diagnostic is reported on the interaction between forced term insertion and the remainder of the sentence; without such analysis it is impossible to confirm the weakest assumption that constraint placement does not degrade fluency or adequacy on non-constrained tokens.
  Authors: We agree that a targeted diagnostic on how constraints affect surrounding tokens would be informative. As a concise shared-task system paper, the manuscript prioritizes description of the submitted pipeline and its overall results over internal ablations. The final system scores provide an indirect check on overall quality. We will add a short limitations paragraph in the revision acknowledging the absence of such diagnostics.
  Revision: yes
Circularity Check
No circularity: empirical system description with no derivations or self-referential reductions.
Full rationale
The paper is a WMT shared-task system description that reports an engineering pipeline (backtranslation + constrained decoding for terminology). No equations, fitted parameters, uniqueness theorems, or ansatzes are present. The central claim is an empirical assertion about term insertion, not a derivation that reduces to its own inputs by construction. No self-citation load-bearing steps or renaming of known results occur. The work is self-contained as a practical report; absence of ablations is a separate evidence question, not circularity.