Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point

Christian Hardmeier; Liane Guillou

arXiv: 1808.04164 · v1 · pith:VMJCXG2Inew · submitted 2018-08-13 · cs.CL



keywords: metrics, automatic, human, judgements, performance, pronoun, test, translation


We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limit the performance of the automated metrics. We therefore recommend the use of semi-automatic metrics and test suites in place of fully automatic metrics.
