Hierspeech++: Bridging the gap between semantic and acoustic representation of speech by hierarchical variational inference for zero-shot speech synthesis

Sang-Hoon Lee, Ha-Yeong Choi, Seung-Bin Kim, Seong-Whan Lee, “Hierspeech++: Bridging the gap between semantic and acoustic representation of speech by hierarchical variational inference for zero-shot speech synthesis,” IEEE Transa · 2025

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Cross-modal Consistency Guidance for Robust Emotion Control in Auto-Regressive TTS Models

cs.CL · 2025-10-15 · unverdicted · novelty 5.0 · 2 refs

Introduces CCG-CFG with inconsistency-based dynamic scales and hard-sample mining distillation to boost emotional alignment in auto-regressive TTS, reporting up to 12% absolute gains in emotion recognition accuracy.
