Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

2026 · eess.AS · arXiv 2606.19951

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

Mean opinion score (MOS) prediction models are widely used as proxy metrics in text-to-speech (TTS) research, yet their ability to capture quality differences beyond acoustic fidelity remains unclear. We investigate this via controlled perturbations on speech: acoustic degradation, prosodic errors, and manipulation of speaker-specific characteristics such as pitch and speaking rate. We obtained MOS predictions for these speech samples from both human listeners and the model, and analyzed the differences in their perceptual characteristics. Results show that most models track acoustic degradation well, while all are insensitive to prosodic errors despite large subjective score drops. For speaker characteristics, models exhibit a double dissociation: strong mean fundamental frequency (F0) biases absent in human ratings, yet insensitivity to speaking rate and F0 variability that humans notice. These findings highlight limitations of scalar MOS prediction beyond acoustic fidelity.

representative citing papers

Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

eess.AS · 2026-06-18 · unverdicted · novelty 5.0

Most MOS models track acoustic degradation as humans do, but all are insensitive to prosodic errors. On speaker characteristics they show a double dissociation: a strong mean-F0 bias absent in human ratings, alongside insensitivity to speaking rate and F0 variability that humans notice.

citing papers explorer

Showing 1 of 1 citing paper.

Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations
eess.AS · 2026-06-18 · unverdicted · none · ref 2 · internal anchor
