HardMTBench is a difficulty-aware benchmark of 20,000 directional test items across 12 domains that widens GEMBA score ranges by a factor of two and reveals domain-specific weaknesses in 22 MT systems.
CoRR, abs/2601.22931
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
HardMTBench: Stress-Testing Chinese-English Translation on Knowledge-Intensive Domains