Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.
Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CL 2years
2026 2roles
background 1polarities
background 1representative citing papers
Nsanku benchmark shows current LLMs achieve only modest zero-shot translation scores on 43 Ghanaian languages, with no model reaching both high average performance and high cross-language consistency.
citing papers explorer
-
Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations
Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.
-
Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages
Nsanku benchmark shows current LLMs achieve only modest zero-shot translation scores on 43 Ghanaian languages, with no model reaching both high average performance and high cross-language consistency.