MINT-Bench is a new benchmark using hierarchical taxonomy, multi-stage data pipeline, and hybrid evaluation to assess instruction-following TTS systems, revealing major gaps in compositional and paralinguistic controls.
Vevo2: A unified and controllable frame- work for speech and singing voice generation
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
eess.AS 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
YingMusic-Singer-Plus is a diffusion model for singing voice synthesis that preserves melody from a reference clip while allowing flexible lyric changes without manual alignment, outperforming Vevo2 and introducing the LyricEditBench benchmark.
citing papers explorer
-
MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
MINT-Bench is a new benchmark using hierarchical taxonomy, multi-stage data pipeline, and hybrid evaluation to assess instruction-following TTS systems, revealing major gaps in compositional and paralinguistic controls.
-
YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance
YingMusic-Singer-Plus is a diffusion model for singing voice synthesis that preserves melody from a reference clip while allowing flexible lyric changes without manual alignment, outperforming Vevo2 and introducing the LyricEditBench benchmark.