Round-trip translation evaluation shows that existing multilingual benchmarks measure reasoning and recall rather than language skills; the new LiT benchmark correlates with LMArena ratings at rho = 0.94.
Is the target text linguistically correct and natural?
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss