Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation

Martin Volk; Rico Sennrich; Samuel Läubli

arxiv: 1808.07048 · v1 · pith:IMRJ7AQEnew · submitted 2018-08-21 · 💻 cs.CL

classification 💻 cs.CL

keywords: translation, evaluation, human, machine, document-level, documents, parity, sentences

Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese--English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence level become decisive in discriminating between the quality of different translation outputs.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Translationese in Machine Translation Evaluation
cs.CL 2019-06 unverdicted novelty 6.0

Translationese in MT test sets biases evaluations, supporting exclusion of reverse-created data, re-evaluation of human-parity claims, and power analysis for reliable significance testing.

An Explainable Approach to Document-level Translation Evaluation with Topic Modeling
cs.CE 2026-04 unverdicted novelty 5.0

A topic-modeling framework measures document-level thematic consistency in translations by aligning key tokens across languages with a bilingual dictionary and scoring via cosine similarity, providing explainable insi...