COMPASS is a new reproducible benchmarking framework for speech-to-speech translation (S2ST). It deploys 46 metrics on 1248 configurations, shows that single-metric rankings mislead, reduces the metric set to 10 per direction, and finds that domain-specific metrics match human judgments better than standalone MOS predictors.
1 Pith paper cites this work. Polarity classification is still being indexed.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- Benchmarking Speech-to-Speech Translation Models