Unified no-reference models assess audio aesthetics across speech, music, and sound via four perceptual axes and achieve performance comparable or superior to human mean opinion scores.
The t05 system for the V oice MOS C hallenge 2024: Transfer learning from deep image classifier to naturalness MOS prediction of high-quality synthetic speech
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SD 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
Unified no-reference models assess audio aesthetics across speech, music, and sound via four perceptual axes and achieve performance comparable or superior to human mean opinion scores.