AgentSteerTTS proposes a multi-agent framework with adversarial disentanglement, dual-stream anchoring via acoustic prototypes, and fast-slow feedback to achieve intent-faithful expressive TTS for composite instructions.
Mm-tts: A unified framework for multimodal, prompt-induced emotional text-to- speech synthesis
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
A two-stage static-then-dynamic prompt selection strategy using prosodic features, LLM coherence scores, and similarity metrics improves emotion intensity and speaker consistency in zero-shot TTS.
citing papers explorer
-
AgentSteerTTS: A Multi-Agent Closed-Loop Framework for Composite-Instruction Text-to-Speech
AgentSteerTTS proposes a multi-agent framework with adversarial disentanglement, dual-stream anchoring via acoustic prototypes, and fast-slow feedback to achieve intent-faithful expressive TTS for composite instructions.
-
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
A two-stage static-then-dynamic prompt selection strategy using prosodic features, LLM coherence scores, and similarity metrics improves emotion intensity and speaker consistency in zero-shot TTS.