SpeechEditBench provides seven atomic editing tasks, compositional multi-operation instructions, and an anchor-based protocol that yields target success, preservation success, and joint success metrics. Evaluations show that no model excels across all dimensions and that compositional editing is especially difficult.
arXiv preprint arXiv:2407.17172
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: unverdicted (2)
Representative citing papers
UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling
UniSAE unifies speaker, emotion, and multi-granularity content editing in speech via a new discrete phonetic posteriorgram representation and diffusion-based rendering.